Yet another informative post
Oh, do I love myself when I write these intelligent articles that actually inform people rather than annoy them. Well, it turns out they are not such a common occurrence on this blog, but hey, they’re usually longer. So, again, I’m not asking anything here, unlike what the title seems to indicate; I’m actually teaching you something. Now, you could go ahead and read everything about it in your favorite encyclopedia, cough, Wikipedia, cough, but doing that yourself and deciphering the jumble of information down to just what you want to know can be daunting. If you haven’t already, take a look at my PS3 Sound Article; it’s all stuff straight out of encyclopedias and very technical articles, but reformulated so the average person can understand it without having to do all the nasty research. So, let’s dive in!
Why should I know this?
The big question: why should you, of all people, care about kb, kB and KiB? What are they, and what do they have to do with your life? Turns out, if you use a computer, it concerns you. If you use a cell phone with any form of data on it, it concerns you. If you pay an Internet bill, it concerns you. If you use an MP3 player (e.g. an iPod), it concerns you. And if you’re reading this blog, this is sure to come in handy. Even if you’re pretty well versed in computing, there may be some things that’ll surprise you, so read on.
First off, the bit and the byte
In computers, data such as your Microsoft Office Word documents, pictures, videos and music all consist of 1s and 0s stored somewhere in the memory of your computer (e.g. your hard disk drive). This is called a binary system. Every digital machine capable of computing something relies on a binary system to do it. Yup, from your digital watch to your cell phone, from your calculator to your oven’s display (if it’s digital), and of course your computer, they all use a binary system.
The reason is pretty simple. The easiest way to turn electricity into computable data is to use its simplest property: whether it’s on or off. This is represented by a 1 for on and a 0 for off. Then, with all sorts of complicated mechanisms not explained here (cabling, chipsets and boards), along with billions of ones and zeros, everything digital you use in your daily life comes to life, like your brand new HDTV or your gaming console.
Let’s go back to computer data. Earlier we said it was all 1s and 0s. Well, each time you have either a 1 or a 0, you have a bit. A single bit of data can have two possible meanings, 1 or 0. On a CD, roughly speaking, this is represented by a hole (1) and no hole (0). This is why we call it “burning a CD”, because we literally burn in a hole every time we get a bit representing a 1. And by combining all those bits together, we get a CD with stuff on it, like music.
However, a bit alone cannot represent much, which is why in computing bits are usually grouped into series (technically a word, the smallest addressable sub-field, or usable piece of data, of a computer). The most common series is one of 8 bits. For example, in the 8 bit ASCII system used to represent raw text in its Western flavor, the letter a is represented by 01100001, which is a binary sequence of 8 bits, because it is a combination of 8 ones and zeros.
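If you want to check the letter a example for yourself, here is a small Python sketch. It only uses built-in functions; the variable names are mine and purely illustrative.

```python
# Look up the numeric code of the letter "a" and show it as 8 binary digits.
letter = "a"
code = ord(letter)           # the ASCII code of "a" is 97
bits = format(code, "08b")   # write it in binary, zero-padded to 8 digits

print(letter, code, bits)    # a 97 01100001
```

Swap in any other plain Western letter and you will get a different series of 8 ones and zeros.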
A single bit is represented by the lower case letter b.
Eventually however, counting data strictly in bits becomes tiresome. Just two letters is 16 bits, and 12 bits of text would equal one letter and a half, which is impossible. This means that in an 8 bit system, counting individual bits is of little use, since you can’t really split them up; they have to come in groups of 8, always. And so, some brilliant guy in the past invented the byte. The byte is simply a measure of a single bit series. Thus, in an 8 bit system, 1 byte is equal to 8 bits, so 2 letters is equal to 2 bytes, which is a bit more representative. Additionally, the byte prevents impossible splitting, since you cannot split 1 byte without getting a fraction (e.g. 0.5), and half a letter doesn’t exist.
A single byte is represented by the upper case letter B.
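To make the bit and byte counting concrete, here is a minimal Python sketch. It assumes text stored as plain ASCII with one 8 bit byte per letter; the function name text_size is made up for this example.

```python
BITS_PER_BYTE = 8  # assumption: the usual 8 bit byte described above

def text_size(text):
    """Return (bytes, bits) for text stored as one ASCII byte per letter."""
    data = text.encode("ascii")
    return len(data), len(data) * BITS_PER_BYTE

print(text_size("ab"))  # (2, 16)
```

Two letters come out as 2 bytes, or 16 bits, and there is no way to end up with a letter and a half.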
A bit of history
Historically, a byte could represent any series of bits, as long as it matched its system’s base. If data was represented by a 4 bit system, a byte would equal 4 bits; if data was represented by a 16 bit system, a byte would equal 16 bits, and so on. However, due to the 8 bit architecture of IBM’s System/360 in the 1960s and the explosion in popularity of microcomputers based on 8 bit microprocessors in the 1980s, 8 bits remained the standard by which we measure data. Other things, like the more recent microcomputers, also known as your common desktop PC, and microprocessors, known today simply as processors or CPUs (e.g. the Intel Pentium), have however migrated to a higher number of bits per bit series for higher efficiency and bigger memory. You’ll probably hear a lot about that in the coming years with the mainstreaming of 64 bit microprocessor architectures to replace 32 bit systems, mainly due to the memory limitations of 32 bit architectures.
In other words, even though file systems (the things that handle your files) are today reaching 128 bit architectures, data is still represented in an 8 bit format, where a single byte represents 8 bits.
Handling larger numbers
Once computer innovation started to ramp up, the quantity of data we work with kept expanding. Before engineers knew it, they had to deal with thousands of bytes, creating the need for yet another way to sum up data into smaller numbers. Unlike with money, which has existed for far longer than bits and bytes, it was easy to foresee that we would eventually reach much more than billions of bytes of data. To sum things up better, engineers decided to borrow from the SI system, otherwise known as “Le Système International d’Unités”, or International System of Units, a French invention whose prefixes sum up large numbers in the metric system. If you live anywhere other than the United States, you’re already familiar with this notation appearing everywhere.
In the SI, a lower case k represents a thousand, a capital M represents a million, a capital G represents a billion, etc. Each prefix has its full written form too, which can be added to any unit. k stands for kilo, so 1000 meters equals 1 kilometer (km) and 1000 grams equals 1 kilogram (kg). Since the Earth’s circumference is only roughly 40,000 km, some measures like the megameter (Mm) have yet to see any real use. 40 megameters seems less impressive than 40,000 km.
The importance of case sensitivity: The SI system makes great use of lower and upper case in its symbols. The best way to show this is with an example. A million, or mega, is represented by an upper case M. A megameter is thus written Mm. However, a very similar-looking measure, written mm, is the millimeter. A milli, or 1/1000th, is represented by a lower case m. And so, subtle case distinctions like the lower case k of a thousand aren’t so unimportant anymore. You wouldn’t want to use a capital K because it means kelvin. This is often overlooked in the computer world however, where subdivisions of a unit are impossible. Indeed, a millibyte or a millibit does not exist.
In the case of the bit, the SI system does well. Since SI prefixes are always powers of 10, they can be applied to the bit in a perfectly standard way. 1000 bits is equal to 1 kilobit (kb).
However, when it comes to bytes, the story is a bit different. Because of the computer’s binary nature, memory addresses (where the data is physically located on a memory chip) are written as binary sequences. As such, the number of addressable memory locations is counted in powers of 2. For example, an 8 bit address space has a total addressable memory of 256 bytes. The newer 16 bit memory architectures of the time could break through the 256 byte limit and provide a much more potent 65,536 bytes of address space. Applying the SI system in powers of 10 to this produced a somewhat awkward number, 65.536 kilobytes (kB). To remedy the situation, engineers took the SI prefixes and applied them in powers of 2 to match addressable memory, so that a kilobyte would equal 1024 bytes while a kilobit still equaled 1000 bits. This gave the 16 bit architecture a nice round 64 kilobytes of memory, and brought a confusion that still reigns in the world of computing today.
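The arithmetic behind those numbers is easy to reproduce. Here is a rough Python sketch, assuming one byte per memory address; the function name addressable_bytes is my own and only there for illustration.

```python
def addressable_bytes(address_bits):
    """Number of distinct addresses (one byte each) for a given address width."""
    return 2 ** address_bits

for width in (8, 16):
    total = addressable_bytes(width)
    # Divide by 1000 for the power of 10 reading, by 1024 for the power of 2 one.
    print(width, "bit:", total, "bytes =", total / 1000, "or", total / 1024)
```

For 16 bits you get 65,536 bytes, which reads as an awkward 65.536 when divided by 1000 and as a clean 64 when divided by 1024.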
Controversy
Invented in America, where the metric system has not yet been adopted as of 2009, the secluded world of computing did not wake anyone up with its awkward borrowing from the SI system until 1995, when the IUPAC and the NIST proposed a newer, unambiguous system to the IEC. It was only accepted in 1999, giving birth to the kibibyte and the mebibyte, a play on the words kilobinary and megabinary, later followed by the other higher multiples in 2000.
However, by that time, the computer industry had already been using the SI multiples in powers of 2 for 40 years. 10 years after the standardization of the IEC format, adoption has been nearly nonexistent, with the only systems using the measure being a few very recent Linux distributions. Even the latest Windows and Mac OS X still use the SI prefixes for bytes.
But using the SI prefixes for bytes isn’t necessarily wrong, since the IEC standard allows it. You just have to use them in powers of 10, so one kilobyte equals 1000 bytes and one kibibyte equals 1024 bytes. It goes without saying that this is a highly contested and controversial standard. It does bring clarity and remove ambiguity, but it also brings confusion for legacy systems.
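To see how far apart the two readings drift as the numbers grow, here is a small Python sketch. The 8 GB figure is just an example value I picked, and gb_to_gib is a made-up helper name.

```python
KILOBYTE = 1000   # SI prefix, power of 10
KIBIBYTE = 1024   # IEC prefix, power of 2

def gb_to_gib(gigabytes):
    """Convert gigabytes (1000^3 bytes) to gibibytes (1024^3 bytes)."""
    return gigabytes * KILOBYTE**3 / KIBIBYTE**3

print(KIBIBYTE - KILOBYTE)      # 24 bytes of difference at the kilo level
print(round(gb_to_gib(8), 2))   # 7.45
```

A gap of only 24 bytes per kilobyte becomes more than half a gigabyte once you are talking about 8 GB.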
For example, MP3 player users are used to seeing their storage capacity in GB when it should be in GiB. Adding the i would obviously raise a lot of questions in stores where another brand might not use the same notation. Most store clerks would probably be unable to even explain the difference. This is why marketing forces at Apple and Microsoft, along with almost every hardware manufacturer, decided to keep the original measurement.
Additionally, there’s very little explanation as to why this system should be implemented at all. After all, in traditional computing, a kilobyte never meant anything other than 1024 bytes, so there’s very little proof of how the IEC’s newer system would improve the situation. Memory manufacturers won’t start making memory in multiples of 10, and so many think that the confusion would just be transferred over to the kibibyte, which would often be mistaken for a thousand bytes.
Adoption
However, when it comes to the educated world, standards go a long way. Most of today’s technical documentation and teaching material has adopted the IEC’s standard, referring to kilobytes as powers of 10 and kibibytes as powers of 2. This means that computer classes in school will start teaching it that way, and that eventually, operating system makers and memory companies will have to adapt, regardless of controversy.
Conclusion
While your computer probably still uses the wrong notation, assuming you’re reading this article as of 2009, future systems will probably adopt the IEC standard, so let’s review the whole thing. I’ve also included the bit version of the IEC prefixes, known as the kibibit (Kib) and its brothers, in this review, along with an in-depth explanation of capitalization rules.
Note on k/K: Although the SI system strictly uses a lower case k for a thousand because the upper case K stands for kelvin, remember that the computer use of the SI system has never been standardized into the SI’s base or derived units, and many sources suggest that a capital K can also be used for a thousand.
A KB or a Kb cannot be mistaken for a kelvinbyte or kelvinbit because it doesn’t make sense. Also, since the binary system doesn’t make use of subdivisions and thus does not possess the lower case d, c, m and other prefixes, only the k being lower case would make for an inconsistent notation.
For these reasons, the IEC chose to write the kibibyte with a capital K, and many people use a capital K for the kilobyte and the kilobit too. Note that the use of a lower case k (i.e. kiB) for a kibibyte is not accepted.
As the writer of this article though, I am very much a purist when it comes to standards. Since the kilobyte and the kilobit both borrow from the SI standard, I think they should use a lower case k, regardless of application. However, this hasn’t been standardized, and you can use whichever you think is better. The review below uses a lower case k.
1 bit (b) = 1 bit (b)
1 byte (B) = 8 bits (b)
1 kilobyte (kB) = 1000 bytes (B)
1 megabyte (MB) = 1000 kilobytes (kB)
1 gigabyte (GB) = 1000 megabytes (MB)
1 terabyte (TB) = 1000 gigabytes (GB)
1 kibibyte (KiB) = 1024 bytes (B)
1 mebibyte (MiB) = 1024 kibibytes (KiB)
1 gibibyte (GiB) = 1024 mebibytes (MiB)
1 tebibyte (TiB) = 1024 gibibytes (GiB)
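If you’d rather let the computer apply this review for you, here is a minimal Python sketch that prints a byte count with either set of prefixes. The function format_size and its binary parameter are hypothetical names of my own, not part of any standard library.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]       # steps of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]  # steps of 1024

def format_size(num_bytes, binary=False):
    """Format a byte count with SI prefixes, or IEC prefixes if binary=True."""
    step = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(num_bytes)
    index = 0
    # Climb one prefix at a time until the value fits or we run out of units.
    while value >= step and index < len(units) - 1:
        value /= step
        index += 1
    return f"{value:.2f} {units[index]}"

print(format_size(65536))               # 65.54 kB
print(format_size(65536, binary=True))  # 64.00 KiB
```

The same 65,536 bytes come out as 65.54 kB or 64.00 KiB, depending only on which row of the review you follow.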
Higher prefixes can be seen here for the SI system, and here for the IEC system.
Kibibits, mebibits and the rest also exist, effectively meaning bit multiples in powers of 2, exactly like kibibytes, mebibytes and company. However, the IEC pretty much created this standard just for the sake of it, as it isn’t really useful. Maybe in the future 1024 bit architectures will be called 1 kibibit architectures, but most architectures are far from 1024 bit in any case.
Also, another notation exists for bits instead of b: literally using the full word bit, which is invariable. The SI prefixes are the traditional k, M, G, etc. and the IEC prefixes are Ki, Mi, Gi, etc. So it goes like this: Kibit, Mibit, Gibit. The IEC seems to particularly encourage this notation to further distinguish the byte from the bit, which traditionally differ only by lower and upper case.
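For completeness, here is a short Python sketch of the bit multiples written in that full word notation. The table BIT_FACTORS and the function bits_in are illustrative names I made up, and the 8 bit byte is assumed as everywhere else in this article.

```python
BITS_PER_BYTE = 8  # assumption: the usual 8 bit byte

# Number of bits in one unit, SI prefixes first, then IEC prefixes.
BIT_FACTORS = {
    "kbit": 1000, "Mbit": 1000**2, "Gbit": 1000**3,
    "Kibit": 1024, "Mibit": 1024**2, "Gibit": 1024**3,
}

def bits_in(count, unit):
    """Return how many single bits are in `count` of the given unit."""
    return count * BIT_FACTORS[unit]

print(bits_in(1, "kbit"), bits_in(1, "Kibit"))   # 1000 1024
print(bits_in(1, "Kibit") // BITS_PER_BYTE)      # 128
```

So one Kibit holds 1024 bits, which is 128 bytes, while one kbit holds only 1000 bits.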