Thread: reading file with multi-thread

  1. #1
    Join Date
    Mar 2009
    Location
    Gansu,China
    Posts
    176
    Qt products
    Qt4
    Platforms
    Windows

    Default reading file with multi-thread

    I am reading a binary file from disk into memory.
    1. Can multithreading speed this up?
    2. If not, how can I make it faster?

  2. #2
    Join Date
    Mar 2009
    Location
    Gansu,China
    Posts
    176
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: reading file with multi-thread

    There are 10 binary files to read. If I read them with multiple threads, will that speed things up?

  3. #3
    Join Date
    Dec 2006
    Posts
    426
    Thanks
    8
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Default Re: reading file with multi-thread

    Quote Originally Posted by weixj2003ld View Post
    I am reading a binary file from disk into memory.
    1. Can multithreading speed this up?
    2. If not, how can I make it faster?
    It will in fact slow things down. I don't know why, but I tested this just yesterday.

    I have a file of floating-point values with dimensions 1000 x 1000 x 1000 x 335. Reading it with one process, one 1000 x 1000 block at a time, took less than an hour.

    However, with 2 processes running, each reading a file of the same size in the same way, each one takes 28 hours!

    Could someone who understands the hardware please explain? Thanks!

  4. #4
    Join Date
    Mar 2009
    Location
    Gansu,China
    Posts
    176
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: reading file with multi-thread

    Is it because of the cost of synchronizing the threads? If there are 10 files, I think multithreading can speed up the reading. Am I right?
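
    One way to settle the question for a particular machine is to measure it. The following is a minimal sketch, not a rigorous benchmark: the names `read_file`, `read_sequential`, `read_threaded` and the chunk size are assumptions made for illustration, and the scratch files are tiny. On a real test the operating system's file cache can hide the disk entirely, so results for small or recently read files say little about the disk itself.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # bytes per read call; an assumed value, tune for your disk


def read_file(path):
    """Read one file front to back and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total


def read_sequential(paths):
    """Read the files one after another in the calling thread."""
    return sum(read_file(p) for p in paths)


def read_threaded(paths, workers):
    """Read the files concurrently, one task per file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_file, paths))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Create 10 small scratch files so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(10):
            p = os.path.join(tmp, f"data{i}.bin")
            with open(p, "wb") as f:
                f.write(os.urandom(1024 * 1024))
            paths.append(p)
        n1, t1 = timed(read_sequential, paths)
        n2, t2 = timed(read_threaded, paths, 4)
        print(f"sequential: {n1} bytes in {t1:.3f} s")
        print(f"threaded:   {n2} bytes in {t2:.3f} s")
```

    To get meaningful numbers, point `paths` at the real files and use files much larger than the machine's RAM, or reboot between runs.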

  5. #5
    Join Date
    Dec 2006
    Posts
    426
    Thanks
    8
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Default Re: reading file with multi-thread

    Quote Originally Posted by weixj2003ld View Post
    Is it because of the cost of synchronizing the threads? If there are 10 files, I think multithreading can speed up the reading. Am I right?
    I don't know much about hardware, but I think a hard drive has a limit on its sustained transfer rate. It is like a water pipe whose fixed diameter limits how much water can flow through it each second. If you open 10 faucets at the end of the pipe simultaneously, each faucet gets only 1/10 of the water, and some is lost along the way... I don't know if this explanation makes sense; I hope someone has a better idea.

    Check this link, though I didn't read it myself (too much for me). Maybe you can summarize it for us.
    http://www.pcguide.com/ref/hdd/perf/...ransSTR-c.html
    Last edited by lni; 15th April 2009 at 04:03.

  6. #6
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,363
    Thanks
    3
    Thanked 5,012 Times in 4,791 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: reading file with multi-thread

    Quote Originally Posted by lni View Post
    It will in fact slow down. I don't know why, but I just tested this yesterday.
    Because disk access is not multithreaded. There is a single (or 'single') head in the disk that reads data sequentially based on pending requests. If you read two or more files located in different sectors of the disk, the head either has to jump back and forth between the sectors, or the requests get reordered so that reads are served in series (a few sectors from one file while the other files starve). In both cases the whole process slows down. The situation can be made worse by cache thrashing, where data you will need in a moment is constantly overwritten in the cache by data you need right now.
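
    A toy model makes the cost of head movement visible. This is only an illustration: `head_travel`, the block numbers and the two file layouts are made up, and the model ignores rotation, caching and request reordering, so only the trend is meaningful.

```python
def head_travel(requests):
    """Total distance the head moves to serve block requests in the given order.

    Each request is a block number and the head starts at block 0.
    """
    position = 0
    travel = 0
    for block in requests:
        travel += abs(block - position)
        position = block
    return travel


# Two hypothetical files of 100 contiguous blocks each, far apart on the disk.
file_a = list(range(0, 100))
file_b = list(range(100_000, 100_100))

# One reader: all of file A, then all of file B.
one_after_another = file_a + file_b

# Two readers whose requests alternate block by block.
interleaved = [block for pair in zip(file_a, file_b) for block in pair]

print(head_travel(one_after_another))  # 100099
print(head_travel(interleaved))        # 19899901
```

    In this model the interleaved order moves the head roughly 200 times farther for the same amount of data.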

    If you were reading those files from different physical devices, multithreading would make things faster. You can optimize reading by fetching the data in large chunks and caching it in memory (e.g. by using mmap()) or by reading from a ramdisk (try comparing compilation of a large project from disk and from ramdisk). Your operating system also provides prefetching calls that ask it to load a file into its system buffers ahead of time. Using a different operating system with a different disk driver might also help. It's no secret that Linux reads FAT disks faster than Windows does.
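
    A rough sketch of two of these ideas, large sequential chunks with a prefetch hint and a memory-mapped read, might look like the following. The function names and the 8 MiB chunk size are assumptions for illustration. `os.posix_fadvise` exists only on some Unix systems, so the hint is treated as best-effort and skipped elsewhere.

```python
import mmap
import os
import tempfile

CHUNK = 8 * 1024 * 1024  # assumed chunk size (8 MiB); tune for your hardware


def read_in_chunks(path, chunk_size=CHUNK):
    """Read a file front to back in large chunks; return the byte count."""
    total = 0
    with open(path, "rb") as f:
        # Optional prefetch hint. It is best-effort: unavailable on some
        # platforms (e.g. Windows) and allowed to fail on others.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_WILLNEED)
            except OSError:
                pass
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            total += len(block)  # real code would process the block here
    return total


def read_with_mmap(path, chunk_size=CHUNK):
    """Map a file into memory and walk it in slices; return the byte count."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be mapped
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), chunk_size):
                total += len(mm[offset:offset + chunk_size])
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(3 * 1024 * 1024 + 5))
        print(read_in_chunks(path, 1024 * 1024))
        print(read_with_mmap(path, 1024 * 1024))
```

    Neither variant is guaranteed to be faster; which one wins depends on the operating system, the file system and the access pattern, so it is worth timing both.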

    By the way, you can see this "feature" in action on a system that is constantly swapping memory to/from disk. One factor is that the effective data transfer rate drops because more data needs to be read/written; another is that the head spends most of its time moving rather than accessing data. You can hear the disk become much louder when the machine is swapping and doing something else at the same time. The same issue applies to fragmented file systems: the head jumps back and forth and the cached data is useless. That's why adding memory to your computer makes a larger difference than replacing the disk or the CPU with a faster one. Of course, once your computer's memory requirements are satisfied, the difference is much smaller, although still significant, because decent operating systems buffer data in memory that is not allocated to other processes. That's one of the reasons why Vista's memory usage seems so high.
    Last edited by wysota; 15th April 2009 at 08:02.
    Your biological and technological distinctiveness will be added to our own. Resistance is futile.

    Please ask Qt related questions on the forum and not using private messages or visitor messages.


  7. #7
    Join Date
    Dec 2006
    Posts
    426
    Thanks
    8
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Default Re: reading file with multi-thread

    Excellent explanation, thanks!

    For files on different physical devices, are you sure multithreading will be faster? My observation is that it is even slower because of bandwidth limitations.

    If those disks belong to different computers, then sure, it will be faster, but we don't need to discuss that scenario.

    We have a cluster of more than 2000 compute nodes with one giant disk server. If all 2000 nodes read/write at the same time, you may just have to kill the job...

    I just got to my office, so here are the actual numbers from my test:
    File size 1098 x 1168 x 1001 x 335 (floating-point values); each read/write handles one block of 1098 x 1168 points.

    With only one process reading/writing, it takes 15333 seconds (4 h 15 min 33 s), speed = 107 MB/second

    With two processes reading/writing two files of the same size:
    process 1 takes 52300 seconds (14 h 31 min 40 s), speed = 31.37 MB/second
    process 2 takes 55254 seconds (15 h 20 min 54 s), speed = 29.69 MB/second
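
    For anyone who wants to check the arithmetic, here is a small sketch. It assumes 4-byte floats and that "MB" means 2**20 bytes; neither is stated above, but under those assumptions the speeds come out as quoted. The helper names are made up.

```python
BYTES_PER_FLOAT = 4   # assumption: 32-bit floats
MIB = 1024 * 1024     # assumption: the quoted "MB" is 2**20 bytes

points = 1098 * 1168 * 1001 * 335
file_bytes = points * BYTES_PER_FLOAT


def throughput_mib_s(seconds):
    """Average transfer rate for one pass over the file."""
    return file_bytes / MIB / seconds


def hms(seconds):
    """Format a whole number of seconds as hours, minutes, seconds."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h} h {m} min {s} s"


runs = [("single process", 15333), ("process 1", 52300), ("process 2", 55254)]
for label, secs in runs:
    print(f"{label}: {hms(secs)}, {throughput_mib_s(secs):.2f} MiB/s")
```
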
    Last edited by lni; 15th April 2009 at 14:17.

  8. #8
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,363
    Thanks
    3
    Thanked 5,012 Times in 4,791 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: reading file with multi-thread

    Quote Originally Posted by lni View Post
    For files on different physical devices, are you sure multithreading will be faster? My observation is that it is even slower because of bandwidth limitations.
    If they are different physical devices (better yet, connected to different IDE buses), they don't share bandwidth, especially since the available bandwidth is much larger than the amount of data you want to transfer at once. Having DMA enabled (which is standard nowadays) allows fully asynchronous data transfer without involving the CPU. The CPU can thus schedule a transfer and go do other things (like context switching to another process/thread, which takes time anyway), and by the time it is done, the data will most likely already have been taken from the data bus.

    If those disks belong to different computers, then sure, it will be faster, but we don't need to discuss that scenario.
    Every IDE-enabled board has at least two distinct IDE buses, so if you connect the disks with separate cables, they won't share bandwidth at the IDE layer (they might at the data bus layer, but that is unlikely to be slower than reading from a single disk).

    We have a cluster of more than 2000 compute nodes with one giant disk server. If all 2000 nodes read/write at the same time, you may just have to kill the job...
    Two and two thousand differ by a factor of a thousand.


    With only one process reading/writing, it takes 15333 seconds (4 h 15 min 33 s), speed = 107 MB/second

    With two processes reading/writing two files of the same size:
    process 1 takes 52300 seconds (14 h 31 min 40 s), speed = 31.37 MB/second
    process 2 takes 55254 seconds (15 h 20 min 54 s), speed = 29.69 MB/second
    So you see, this is not saturation of the data bus, because the two values don't add up to the original one. You get massive losses in the disk cache, CPU data cache, disk seeking and context switching.
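
    To put a number on it, the quoted speeds can simply be added up. The variable names are made up; the figures are the ones from the test above.

```python
single = 107.0               # MB/s with one process
concurrent = [31.37, 29.69]  # MB/s for each of the two simultaneous processes

combined = sum(concurrent)
lost = single - combined

print(f"combined: {combined:.2f} MB/s")
print(f"lost:     {lost:.2f} MB/s ({lost / single:.0%} of the single-process rate)")
```

    If the bus were the only limit, the two rates would add up to about 107 MB/s. Together they reach only about 61 MB/s, so roughly 43% of the throughput is lost.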

  9. #9
    Join Date
    Dec 2006
    Posts
    426
    Thanks
    8
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Default Re: reading file with multi-thread

    Quote Originally Posted by wysota View Post
    So you see this is not a saturation of the data bus as the values don't sum up to the initial value. You get massive losses in the disk cache, cpu data cache, disk seeking and context switching.
    Those tests used local disks. I will try putting the data on a server and run the same test to see what happens...

  10. #10
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,363
    Thanks
    3
    Thanked 5,012 Times in 4,791 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: reading file with multi-thread

    Ok, we're way offtopic here

  11. #11
    Join Date
    Mar 2009
    Location
    Gansu,China
    Posts
    176
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: reading file with multi-thread

    I ran a test earlier.
    If I write a file with one thread on a dual-core computer, it takes about 19 s, and CPU usage is 50% while my program is running.
    If I write the file with two threads on the same computer, it takes about 11 s, and CPU usage is 100% while my program is running. Why?
