
Reading a file with multiple threads



weixj2003ld
15th April 2009, 02:19
I read a binary file from disk into memory.
1. Can multiple threads speed this up?
2. If not, how can I obtain a speedup?

weixj2003ld
15th April 2009, 03:09
There are 10 binary files to read. If I read them with multiple threads, will that speed things up?

lni
15th April 2009, 03:11
I read a binary file from disk into memory.
1. Can multiple threads speed this up?
2. If not, how can I obtain a speedup?

It will in fact slow down. I don't know why, but I just tested this yesterday.

I have a file containing floating-point values, of size 1000 x 1000 x 1000 x 335; if I read it with one process (reading a block of 1000 x 1000 at a time), it took less than an hour.

However, if I have 2 processes running, each reading a file of the same size as above (read the same way), each takes 28 hours!

Someone who understands hardware please give an explanation. Thanks!

weixj2003ld
15th April 2009, 03:31
It will in fact slow down. I don't know why, but I just tested this yesterday.

I have a file containing floating-point values, of size 1000 x 1000 x 1000 x 335; if I read it with one process (reading a block of 1000 x 1000 at a time), it took less than an hour.

However, if I have 2 processes running, each reading a file of the same size as above (read the same way), each takes 28 hours!

Someone who understands hardware please give an explanation. Thanks!
Is it because of the cost of synchronizing the threads? If there are 10 files, I think it can speed things up. Am I right or not?

lni
15th April 2009, 03:58
Is it because of the cost of synchronizing the threads? If there are 10 files, I think it can speed things up. Am I right or not?

I don't know very much about hardware, but I think a hard drive has a limit on its sustained transfer rate. It is like a water pipe with a fixed diameter that limits how much water can flow through it each second: if you turn on 10 faucets at the end of the pipe simultaneously, each faucet gets only 1/10 of the water, and some is lost along the way... I don't know if this explanation makes sense; hope someone has a better idea.

Check this - I didn't read it (too much for me), so you can summarize it for us :)
http://www.pcguide.com/ref/hdd/perf/perf/spec/transSTR-c.html

wysota
15th April 2009, 07:53
It will in fact slow down. I don't know why, but I just tested this yesterday.
Because disk access is not multithreaded. There is a single head (or a 'single' set of heads) in the disk that reads data sequentially based on pending requests. If you read two or more files that sit in different sectors of the disk, the head either has to jump back and forth between the sectors or the scheduler serializes the reads (fetching a few sectors from one file while the other files are starved), and in both cases the whole process slows down. The situation can be worsened by cache thrashing, where you constantly overwrite cached data that you are going to need in a moment with data that you need right now.

If you were reading those files from different physical devices, multithreading would have made things faster. You can optimize reading by fetching the data in large chunks and caching it in memory (e.g. by using mmap()), or by reading from a ramdisk (try comparing compilation of a large project from disk and from a ramdisk). There are also prefetching calls in your operating system you can use to ask the system to prefetch a file into its buffers for you. Using a different operating system with a different disk driver might also help; it's no discovery that Linux reads FAT disks faster than Windows does :)

By the way - you can see this "feature" in action on a system that is constantly swapping memory to/from disk. One problem is that the transfer rate drops because more data needs to be read/written; another is that the head spends most of its time moving, not accessing data. You can hear the disk become much louder when the machine is swapping and doing something else at the same time. The same issue applies to fragmented file systems - the head jumps back and forth and the cached data is useless. That's why adding memory to your computer makes a bigger difference than a faster disk or a faster CPU. Of course once your computer's memory is saturated the difference gets much smaller, although it is still significant, because decent operating systems buffer file data in memory that isn't allocated to other processes. That's one of the reasons why Vista's memory usage seems so high.
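A minimal sketch of the "read in large chunks, ask the system to prefetch" idea, assuming POSIX (posix_fadvise() is Linux/POSIX-only; the chunk size and the idea of processing each chunk in place are my own choices, not from the thread):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Read a whole file in large sequential chunks, hinting the kernel to
// read ahead. Returns the number of bytes read, or -1 on error.
long long readInChunks(const char *path, size_t chunkSize = 8 * 1024 * 1024)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    // Tell the kernel we will read sequentially so it prefetches aggressively.
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    std::vector<char> buffer(chunkSize);
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buffer.data(), buffer.size())) > 0)
        total += n;    // process buffer[0 .. n) here
    close(fd);
    return total;
}
```

With one thread and big chunks the head moves mostly forward, which is the best case for a spinning disk.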

lni
15th April 2009, 14:09
Because disk access is not multithreaded. There is a single head (or a 'single' set of heads) in the disk that reads data sequentially based on pending requests. If you read two or more files that sit in different sectors of the disk, the head either has to jump back and forth between the sectors or the scheduler serializes the reads (fetching a few sectors from one file while the other files are starved), and in both cases the whole process slows down. The situation can be worsened by cache thrashing, where you constantly overwrite cached data that you are going to need in a moment with data that you need right now.

If you were reading those files from different physical devices, multithreading would have made things faster. You can optimize reading by fetching the data in large chunks and caching it in memory (e.g. by using mmap()), or by reading from a ramdisk (try comparing compilation of a large project from disk and from a ramdisk). There are also prefetching calls in your operating system you can use to ask the system to prefetch a file into its buffers for you. Using a different operating system with a different disk driver might also help; it's no discovery that Linux reads FAT disks faster than Windows does :)

By the way - you can see this "feature" in action on a system that is constantly swapping memory to/from disk. One problem is that the transfer rate drops because more data needs to be read/written; another is that the head spends most of its time moving, not accessing data. You can hear the disk become much louder when the machine is swapping and doing something else at the same time. The same issue applies to fragmented file systems - the head jumps back and forth and the cached data is useless. That's why adding memory to your computer makes a bigger difference than a faster disk or a faster CPU. Of course once your computer's memory is saturated the difference gets much smaller, although it is still significant, because decent operating systems buffer file data in memory that isn't allocated to other processes. That's one of the reasons why Vista's memory usage seems so high.

Excellent explanation, thanks!

For files on different physical devices, are you sure multithreading will be faster? My observation is that it will be even slower due to bandwidth limitations.

If those disks belong to different computers then sure, it will be faster, but we don't need to discuss that scenario :)

We have a cluster with more than 2000 compute nodes and one giant disk server; if all 2000 nodes read/write at the same time, you may just have to kill the job...

I just got to my office, so here are the actual numbers from my test:
File size 1098 x 1168 x 1001 x 335 (floats); each read/write moves a block of 1098 x 1168 points.

If only one process reads/writes, it takes 15333 seconds (4h15'33"), speed = 107 MB/second

If two processes read/write two files of the same size:
process 1 takes 52300 seconds (14h31'40"), speed = 31.37 MB/second
process 2 takes 55254 seconds (15h20'54"), speed = 29.69 MB/second
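As a sanity check on these figures (small helpers of my own, not from the thread): the file is 1098 x 1168 x 1001 x 335 floats of 4 bytes each, about 1.72 TB, and dividing by the elapsed times reproduces 107, 31.37 and 29.69 exactly when "MB" is taken as mebibytes (2^20 bytes).

```cpp
#include <cstdio>
#include <string>

// Sustained throughput in MiB/s (2^20 bytes) given a point count,
// bytes per point, and elapsed seconds.
double throughputMiB(long long points, int bytesPerPoint, double seconds)
{
    return points * (double)bytesPerPoint / seconds / (1024.0 * 1024.0);
}

// Render a second count as h'm's", e.g. 15333 -> 4h15'33".
std::string hms(long long s)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lldh%02lld'%02lld\"",
                  s / 3600, (s % 3600) / 60, s % 60);
    return buf;
}
```

For example, throughputMiB(1098LL * 1168 * 1001 * 335, 4, 15333.0) gives about 107.0, and hms(15333) gives 4h15'33".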

wysota
15th April 2009, 16:27
For files on different physical devices, are you sure multithreading will be faster? My observation is that it will be even slower due to bandwidth limitations.
If they are different physical devices (better yet, connected to different IDE buses) then they don't share bandwidth. Especially since the bus bandwidth is much wider than the amount of data you actually want to transfer at once. Having DMA enabled (which is standard nowadays) allows fully asynchronous data transfer without involving the CPU, so the CPU can schedule a transfer and go think about other things (like context switching to another process/thread, which takes time anyway), and by the time it's done the data will most likely already have been taken off the data bus.


If those disks belong to different computers then sure, it will be faster, but we don't need to discuss that scenario :)
Each IDE-enabled board has at least two distinct IDE buses, so if you connect the disks with separate cables they won't share bandwidth at the IDE layer (they might at the data-bus layer, but it's unlikely that would be slower than reading from a single disk).


We have a cluster with more than 2000 compute nodes and one giant disk server; if all 2000 nodes read/write at the same time, you may just have to kill the job...
Two and two thousand differ by a factor of a thousand ;)



If only one process reads/writes, it takes 15333 seconds (4h15'33"), speed = 107 MB/second

If two processes read/write two files of the same size:
process 1 takes 52300 seconds (14h31'40"), speed = 31.37 MB/second
process 2 takes 55254 seconds (15h20'54"), speed = 29.69 MB/second
So you can see this is not saturation of the data bus, since the two rates don't even add up to the single-process rate. You are getting massive losses from the disk cache, the CPU data cache, disk seeking and context switching.

lni
15th April 2009, 17:05
So you can see this is not saturation of the data bus, since the two rates don't even add up to the single-process rate. You are getting massive losses from the disk cache, the CPU data cache, disk seeking and context switching.

Those tests used local disks. I will try to put the data on a server and run the same test to see what happens...
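A minimal harness for that kind of comparison (the file paths are hypothetical, and the result depends heavily on whether the files are already in the OS page cache, so use files larger than RAM or drop caches for a fair test):

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Read a whole file in 1 MiB chunks; returns bytes read.
long long readAll(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20);
    long long total = 0;
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
        total += in.gcount();
    return total;
}

// Time reading two files one after the other vs. in two threads.
// Returns {sequentialMs, threadedMs}.
std::pair<double, double> compareReads(const std::string &a,
                                       const std::string &b)
{
    using clock = std::chrono::steady_clock;
    auto ms = [](clock::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };

    auto t0 = clock::now();
    readAll(a);
    readAll(b);
    auto t1 = clock::now();

    std::thread ta([&a] { readAll(a); });
    std::thread tb([&b] { readAll(b); });
    ta.join();
    tb.join();
    auto t2 = clock::now();

    return { ms(t1 - t0), ms(t2 - t1) };
}
```

On a single spinning disk the threaded run is often slower (head seeks between the two files); on separate devices or an SSD the gap can shrink or reverse.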

wysota
16th April 2009, 00:03
Ok, we're way off-topic here :)

weixj2003ld
16th April 2009, 02:12
I have done a test before.
If I write a file with one thread and run it on a dual-core computer, it takes about 19 s, and I see 50% CPU usage while my program is running.
If I write the file with two threads on the same computer, it takes about 11 s and CPU usage is 100%. Why?
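The 50% vs 100% CPU figures suggest the test was partly CPU-bound (e.g. generating the data to write), which is where the second core helps. A sketch of a two-thread write using pwrite(), which takes an explicit offset so the threads need no shared file position (POSIX-only; the path, sizes and fill bytes are arbitrary choices of mine, not from the thread):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <thread>
#include <vector>

// Writes `size` bytes of `fill` starting at `offset` within the open file.
static void writeHalf(int fd, off_t offset, size_t size, char fill)
{
    std::vector<char> buf(size, fill);
    size_t done = 0;
    while (done < size) {
        ssize_t n = pwrite(fd, buf.data() + done, size - done, offset + done);
        if (n <= 0)
            break;
        done += n;
    }
}

// Two threads each write one disjoint half of the file.
bool writeWithTwoThreads(const char *path, size_t totalSize)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;
    size_t half = totalSize / 2;
    std::thread t1(writeHalf, fd, (off_t)0, half, 'a');
    std::thread t2(writeHalf, fd, (off_t)half, totalSize - half, 'b');
    t1.join();
    t2.join();
    close(fd);
    return true;
}
```

Whether this beats a single thread depends on the same head-seek argument as for reading; when the bottleneck is the disk itself, the second thread mostly overlaps buffer preparation with I/O.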