The world of computing uses many metrics to measure data. Probably the most significant is the number of bits contained in a given file. However, no one has ever truly agreed on how to write these measures, and efforts to standardize them are only recent.
Nevertheless, many governments and companies have started moving ahead with the adoption of more standard practices, even though companies like Microsoft have not yet adapted.
Why Should You Know This?
But the real question is: why should you, as a user, know this? Why should you care about kb, kB and KiB at all? It turns out that the moment you use a computer, it concerns you. Better yet, with today’s caps on Internet bandwidth, understanding the difference between the various measures is of crucial importance to understanding your telecom bill. There’s more to it than simple measures, but it’s at least part of the puzzle.
First off, the bit and the byte
In computers, all data (documents, pictures, videos and music) consists of binary information stored somewhere in the memory of your computer. Binary information is represented by 1s and 0s.
These 1s and 0s are understood by your computer via electricity. In its most basic state, 0 represents off, and 1 represents on. Various techniques are used to represent such data on different media. For example, compact discs present a series of microscopic holes on a circular plate, where a hole may represent 1 and no hole may represent 0.
One such 1 or 0 is called a bit. Fun trivia: a standard CD has 5.6 billion possible holes, or in other words a capacity of 5.6 billion bits.
But a bit alone doesn’t mean much. All it can do is tell the computer whether it represents 1 or 0, which is not very exciting. This is where the word comes in. The word represents the natural unit of data understood by a processor. It’s a group of bits that represents the most basic data set a processor can understand. For example, in typical systems, the letter A is represented by the 8-bit word 01000001. It’s called 8-bit because it has a length of 8 bits. Therefore, if you throw 16 bits at the processor, it will divide them into two distinct data sets and try to understand each of them.
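To make the idea of a word concrete, here is a minimal Python sketch. The function names `char_to_bits` and `split_into_words` are made up for this illustration, and the ASCII encoding is assumed for the letter A. It shows the 8-bit pattern for A and how a run of 16 bits is cut into two 8-bit words.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-bit binary pattern of a single ASCII character."""
    return format(ord(ch), "08b")


def split_into_words(bits: str, word_size: int = 8) -> list:
    """Cut a string of 1s and 0s into consecutive words of word_size bits."""
    if len(bits) % word_size != 0:
        # A leftover group shorter than a word cannot be interpreted.
        raise ValueError("bit string length is not a multiple of the word size")
    return [bits[i:i + word_size] for i in range(0, len(bits), word_size)]


print(char_to_bits("A"))                     # 01000001
print(split_into_words("0100000101000010"))  # ['01000001', '01000010']
```

The error raised for a leftover group is only a stand-in for a processor that cannot make sense of a partial word.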
But a processor cannot understand individual bits. It is bound by its word size, meaning that if you throw 7 bits at a processor designed to handle 8-bit words, it won’t understand.
Because of this limitation, bits alone are not a good way to represent data when programming a computer. To solve this, we need to use the byte, a unit meant to represent a set of bits.
Historically, because the byte was not defined as a fixed size, it was analogous to a word. However, in modern systems, a byte is universally 8 bits, stemming from the once-popular 8-bit word size. But modern processors use varying word sizes, so the use of words to quantify data is ill-advised.
In this way, word is a unit relegated to the specifics of memory management in processor architectures, while byte is strictly used as a general unit of data representing 8 bits.
To sum it all up, what’s important to know is that on a computer with an 8-bit word length, data can only be represented in multiples of 8 bits. Therefore, bits are seldom used to represent data, in favor of the byte, except perhaps when measuring Internet connection speeds, usually in bits per second.
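Since connection speeds are quoted in bits while files are sized in bytes, a quick conversion helps when reading a telecom bill. The sketch below is only an illustration: the function names and the sample figures (a 700,000,000-byte file and a 10,000,000 bits-per-second line) are assumptions, not numbers from any real provider.

```python
BITS_PER_BYTE = 8


def bytes_per_second(bits_per_second: float) -> float:
    """Convert a line speed in bits per second to bytes per second."""
    return bits_per_second / BITS_PER_BYTE


def download_seconds(size_bytes: float, bits_per_second: float) -> float:
    """Ideal transfer time for a file, ignoring any protocol overhead."""
    return size_bytes * BITS_PER_BYTE / bits_per_second


print(bytes_per_second(8_000_000))                # 1000000.0
print(download_seconds(700_000_000, 10_000_000))  # 560.0
```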
Handling Larger Numbers
As computer innovation ramped up, the quantity of data we work with kept expanding. Before engineers knew it, they had to deal with thousands of bytes, creating the need for a way to express data with smaller numbers. Unlike money, which has existed far longer than bits and bytes, it was easy to foresee that we would eventually reach far more than billions of bytes of data. To sum things up better, engineers decided to borrow from the SI system, formally “Le Système International d’Unités”, or International System of Units, a French invention whose prefixes express large numbers in the metric system. If you live anywhere other than the United States, you’re probably already familiar with this notation, which appears nearly everywhere.
In the SI system, a lower case k represents a thousand, a capital M represents a million, a capital G represents a billion, etc. Each measure has its full written form too, which consists of a prefix to add to any measure. For data, in order to represent a thousand bytes, you can say a kilobyte, a megabyte for a million, and so on.
The importance of case sensitivity: The SI system makes great use of lower and upper case in its symbols. For example, while a million is represented by an upper case M, or mega, a milli, or 1/1000th, is represented by a lower case m. When combined with the unit of distance, the very similar Mm and mm represent vastly different things, namely a megameter and a millimeter, the latter being one billion times smaller.
In the case of the bit, the SI system works well. Since SI prefixes are always powers of 10, they can be applied to the bit in a perfectly standard way: 1000 bits equal 1 kilobit (kb).
However, when it comes to bytes, the story is a bit different. Because of the computer’s binary nature, memory addresses (where the data is physically located on a memory chip) are written as binary sequences. As such, the number of addressable memory locations is a power of 2. For example, an 8-bit address space has a total addressable memory of 256 bytes. The newer 16-bit memory architectures of the time could break past the 256-byte limit and offer a much more potent 65,536 bytes of address space. Applying the power-of-10 SI prefixes to this produced a somewhat awkward number, 65.536 kilobytes (kB). To remedy the situation, engineers took the SI prefixes and applied them at powers of 2 to match addressable memory, so that a kilobyte would equal 1024 bytes while a kilobit still equaled 1000 bits. This gave the 16-bit architecture a nice round 64 kilobytes of memory, and brought a confusion that still reigns in the world of computing today.
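The arithmetic behind that awkward number can be checked with a short Python sketch. The helper name `addressable_bytes` is made up for this illustration, and one byte per address is assumed.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct addresses (one byte each) for a given address width."""
    return 2 ** address_bits


for bits in (8, 16):
    total = addressable_bytes(bits)
    # Same quantity, divided by the decimal and the binary "kilo".
    print(bits, total, total / 1000, total / 1024)

# 8 256 0.256 0.25
# 16 65536 65.536 64.0
```

Dividing by 1024 instead of 1000 is what turns 65.536 into a round 64.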
Controversy
Invented in America, where the metric system has not yet been adopted as of 2009, the secluded world of computing did not wake anyone up with its awkward borrowing from the SI system until 1995, when the IUPAC and the NIST proposed a newer, unambiguous system to the IEC. It was only accepted in 1999, giving birth to the kibibyte and the mebibyte, plays on the words kilobinary and megabinary, later followed by the other higher multiples in 2000.
However, by that time, the computer industry had already been using the SI multiples at powers of 2 for 40 years. Ten years after the standardization of the IEC format, adoption has been nearly nonexistent, with the only systems using the measure being a few very recent Linux distributions. Even the latest Windows and Mac OS X still use the SI prefixes for bytes.
But using the SI prefixes for bytes isn’t necessarily wrong, since the IEC standard allows it. You just have to use them at powers of 10, so one kilobyte equals 1000 bytes, and one kibibyte equals 1024 bytes. It goes without saying that this is a highly contested and controversial standard. It does bring clarity and remove ambiguity, but it also brings confusion for legacy systems.
For example, MP3 player users are used to seeing their storage capacity in GB when it should be in GiB. Adding the i would obviously raise a lot of questions in stores where another brand might not use the same notation. Most store clerks would probably be unable to even explain the difference. This is why marketing forces at Apple and Microsoft, along with almost every hardware manufacturer, decided to keep the original measurement.
Additionally, there’s very little explanation as to why this system should be implemented at all. After all, in traditional computing, a kilobyte never meant anything other than 1024 bytes, so there’s very little proof of how the IEC’s newer system may improve the situation. Memory manufacturers won’t start making memory in powers of 10, and so many think that the confusion would just be transferred over to the kibibyte, which would often be mistaken for a thousand bytes.
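To see why the distinction matters more as sizes grow, the following Python sketch compares each binary multiple with its decimal counterpart. The function names and the 500 GB sample value are assumptions chosen for illustration.

```python
PREFIX_PAIRS = [("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi")]


def binary_excess_percent(step: int) -> float:
    """How much larger 1024**step is than 1000**step, in percent."""
    return (1024 ** step / 1000 ** step - 1) * 100


def gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to gibibytes (2**30 bytes)."""
    return gigabytes * 1000 ** 3 / 1024 ** 3


for step, (si, iec) in enumerate(PREFIX_PAIRS, start=1):
    print(f"{iec} is {binary_excess_percent(step):.1f}% larger than {si}")

# kibi is 2.4% larger than kilo
# mebi is 4.9% larger than mega
# gibi is 7.4% larger than giga
# tebi is 10.0% larger than tera

print(round(gb_to_gib(500), 1))  # 465.7
```

The gap is small at the kilo level but reaches about 10% at the tera level, which is why a capacity quoted in GB looks smaller once a system counts it in GiB.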
Adoption
However, when it comes to the educated world, standards go a long way. Most of today’s technical documentation and teaching material has adopted the IEC’s standard, referring to kilobytes as powers of 10 and kibibytes as powers of 2. This means that computer classes in school will start teaching it that way, and that eventually, operating system makers and memory companies will have to adapt, regardless of controversy.
Conclusion
While your computer probably still uses the wrong notation, assuming you’re reading this article as of 2009, future systems will probably adopt the IEC standard, so let’s review the whole thing. I’ve also included the bit version of the IEC, known as the kibibit (Kib) and its brothers, in this review, along with an in-depth explanation of capitalization rules.
Note on k/K: Although the SI system strictly uses a lower case k for a thousand because the upper case K stands for the kelvin, remember that the computer use of the SI prefixes has never been standardized as part of the SI’s base or derived units, and many sources suggest that a capital K can also be used for a thousand.
A KB or a Kb cannot be mistaken for a kelvinbyte or kelvinbit because that wouldn’t make sense. Also, since the binary system doesn’t make use of subdivisions and thus does not possess the lower case d, c, m and other prefixes, having only the k in lower case would make for an inconsistent notation.
For these reasons, the IEC chose to write the kibibyte with a capital K, and many often use a capital K for a kilobyte and a kilobit. Note that the use of a lower case k (i.e. kiB) for a kibibyte is not accepted.
As the writer of this article, though, I am very much a purist when it comes to standards. Since the kilobyte and the kilobit both borrow from the SI standard, I think they should use a lower case k, regardless of application. However, this hasn’t been standardized, and you can use whichever you think is better. The review below uses a lower case k.
1 bit (b) = 1 bit (b)
1 byte (B) = 8 bits (b)
1 kilobyte (kB) = 1000 bytes (B)
1 megabyte (MB) = 1000 kilobytes (kB)
1 gigabyte (GB) = 1000 megabytes (MB)
1 terabyte (TB) = 1000 gigabytes (GB)
1 kibibyte (KiB) = 1024 bytes (B)
1 mebibyte (MiB) = 1024 kibibytes (KiB)
1 gibibyte (GiB) = 1024 mebibytes (MiB)
1 tebibyte (TiB) = 1024 gibibytes (GiB)
Higher prefixes can be seen here for the SI system, and here for the IEC system.
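The review above can be turned into a small formatter that prints the same byte count in either notation. This is a minimal sketch: `format_bytes` and its `binary` parameter are hypothetical names, and only the prefixes up to tera/tebi listed above are covered.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]       # powers of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]  # powers of 1024


def format_bytes(n: int, binary: bool = False) -> str:
    """Format a byte count with SI prefixes, or IEC prefixes if binary is True."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(n)
    index = 0
    # Step up one prefix at a time until the value fits or we run out of units.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_bytes(65536))                       # 65.54 kB
print(format_bytes(65536, binary=True))          # 64.00 KiB
print(format_bytes(1_000_000_000, binary=True))  # 953.67 MiB
```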
Kibibits, mebibits and so on also exist, meaning bit multiples at powers of 2, exactly like kibibytes, mebibytes and company. However, the IEC pretty much created this standard just for the sake of it, as it isn’t really useful. Maybe in the future 1024-bit architectures will be called 1-kibibit architectures, but most architectures are far from 1024 bits in any case.
Also, another notation exists for bits instead of b: literally using the full word bit, which stays invariable. The SI prefixes are the traditional k, M, G, etc. and the IEC prefixes are Ki, Mi, Gi, etc. So it goes like this: Kibit, Mibit, Gibit. The IEC seems to particularly encourage this notation to further distinguish between the byte and the bit, whose symbols traditionally differ only by upper and lower case.