Before you start to merge anything, you have to split the whole file, not only the first blocks.
If you have a file 'X' with 2048k of lines, you have to do the following:
- repeat 4 times:
- lines := read 256k lines from X,
- sort lines
- write lines to file A
- lines := read 256k lines from X,
- sort lines
- write lines to file B
- rewind files
- repeat 4 times:
- merge 256k line blocks from A and B and append them to X
- rewind files
- repeat 2 times:
- append 512k lines from X to A
- append 512k lines from X to B
- rewind files
- repeat 2 times:
- merge 512k line blocks from A and B and append them to X
- rewind files
- append 1024k lines from X to A
- append 1024k lines from X to B
- rewind files
- merge 1024k line blocks from A and B and append them to X
Of course it's slow. If you can't fit your data in memory, everything will be slow, but other methods might be event slower.
Although there might be other solution. If you find such function f, that if line1 < line2, then f(line1) < f(line2), you can sort values returned by that function instead.
Bookmarks