Quote Originally Posted by mickey View Post
You said that I can do with 2 files A and B. So I work with 256000 lines. Sort they and put in file A; then read other 256000 lines, sort(), and put in file B. Now using mergesort to merge two file and write they in the first 500.000 lines of file X. Now I go on. 256.000*2 = 512.000; now the first 512000 lines of X are sorted! But the file isn't finish. Here begin what I don't understand....
Before you start to merge anything, you have to split the whole file, not only the first blocks.

If you have a file 'X' with 2048k of lines, you have to do the following:
  • repeat 4 times:
    • lines := read 256k lines from X,
    • sort lines
    • write lines to file A
    • lines := read 256k lines from X,
    • sort lines
    • write lines to file B
  • rewind files
  • repeat 4 times:
    • merge 256k line blocks from A and B and append them to X
  • rewind files
  • repeat 2 times:
    • append 512k lines from X to A
    • append 512k lines from X to B
  • rewind files
  • repeat 2 times:
    • merge 512k line blocks from A and B and append them to X
  • rewind files
  • append 1024k lines from X to A
  • append 1024k lines from X to B
  • rewind files
  • merge 1024k line blocks from A and B and append them to X


Quote Originally Posted by mickey View Post
It seems me slow...
Of course it's slow. If you can't fit your data in memory, everything will be slow, but other methods might be event slower.

Although there might be other solution. If you find such function f, that if line1 < line2, then f(line1) < f(line2), you can sort values returned by that function instead.