Today I’ve been unzipping over 90 GB of data from a hard drive onto that same drive. Unfortunately, as I’ve seen time and again, huge bottlenecks appear whenever large quantities of data are involved.
In fact, if you don’t know the details of how an operating system works, you may be surprised to see your computer grind to a halt when moving a large file. This happens especially when extracting a zip archive onto the same drive it is stored on, since the disk has to handle reading the input and writing the output at the same time.
I’m not a user of Linux or FreeBSD, the latter of which I’ve heard has better file-handling management, given its main use as a high-availability enterprise server OS. So I can’t tell you what the situation is like over there, but I can tell you that on Windows 7 and OS X Snow Leopard, the latest versions as of this writing, the situation is pretty bleak.
Because the transfer consumes so much of the disk’s I/O capacity, little is left for other operations that need data from the hard drive, which turns out to be practically anything you do, so the system as a whole stalls. It’s still usable, but you have to put up with painfully slow load times caused by the saturated disk.
However, this struck me as kind of weird. Shouldn’t an OS be intelligent enough to balance the I/O used by a file transfer against that needed for loading applications, so that you can keep working while transferring or extracting data? It’s almost as if there were no SMP on the system. Yes, my computers used to stall back when they had only one processor and software wasn’t well optimized for SMP (multi-processing), but I’m still stunned by how dumb Windows and Mac OS X are when it comes to hard drive I/O (input/output of file data).
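Short of the OS balancing this for you, one user-space workaround is to throttle the copy itself so it never claims the whole disk. The Python sketch below is only an illustration of that idea, not how Windows or OS X actually schedule I/O; the function name `throttled_copy`, the 20 MB/s default cap and the 1 MB chunk size are hypothetical choices, not measured values.

```python
import time


def throttled_copy(src, dst, max_bytes_per_sec=20 * 1024 * 1024, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, sleeping so the average rate stays at or below max_bytes_per_sec.

    Hypothetical helper for illustration; the defaults are arbitrary, not tuned values.
    Returns the number of bytes copied.
    """
    start = time.monotonic()
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # How long the copy should have taken so far at the target rate.
            expected = copied / max_bytes_per_sec
            elapsed = time.monotonic() - start
            # If we're ahead of schedule, pause and leave the disk to other processes.
            if expected > elapsed:
                time.sleep(expected - elapsed)
    return copied
```

The pauses between chunks are what give other programs a chance to get their reads serviced; the trade-off is that the transfer itself takes longer.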
In any case, this is what I learned:
- A surprising number of apps and processes rely excessively on the hard drive instead of caching data in the large amounts of available RAM
- Windows’ memory management for applications that have already been opened isn’t very good; e.g. Google Chrome takes a long time to reopen when I/O is saturated, while the problem is less severe on Mac OS X
- The hard drive is a major performance bottleneck
- Modern desktop operating systems offer few capabilities for managing I/O; they’re horribly inefficient at it
- Current HDDs are slow, really slow
If I get the chance, I will test this on other operating systems, notably Ubuntu, Fedora and FreeBSD.