If you use a computer regularly, you know that every document you write, every music file you listen to and every photo you see is stored as a file. You may also know that a file consists of bytes and that bytes consist of smaller building blocks called bits. But, what are bits and bytes really? How do they work, why do they work that way, and how does storing them create text, sound and images? To find out we have to start with something very basic: counting.
Consider what happens if you count using ten fingers. Once you reach ten you have no more fingers left and thus, when you reach eleven, you remember one times ten and keep up one finger. This same thing happens over and over again as you keep counting upwards. This is also reflected in the way that we count when we write down numbers. Though in that case we use ten symbols instead of ten fingers: 0 up to including 9. Once we reach nine and want to express ten, we no longer have any symbols. So, we remember 1, which for writing means: we shift the number 1 to the left, and start over again with a 0, giving us 10. The difference with finger counting is that for zero we use no fingers at all, whereas when writing numbers we use the symbol 0 to denote zero.
The way we write down numbers is just a convention. The number system that we use most often is the decimal numeral system, because it has, as discussed, ten symbols that we enumerate. Consider that there is nothing stopping us from defining a counting system with only five symbols: 0 up to including 4. In this case we will run out of symbols when we want to express five, and we will have to do the same thing we did before: shift 1 to the left and then start over again. This gives us (again) 10. But since we now count with five symbols, this means that the two symbols 10 actually represent the value five. If you find this confusing, try counting from zero to five using your fingers on only one hand and use the first finger for the zero. Notice that reaching five forces you to remember one and continue counting by resetting your hand to one finger again for zero.
The ten-symbol decimal counting system is just one of an infinite number of possible counting systems. However, there are only several such systems in common usage. Namely, the octal system that uses eight symbols, the hexadecimal system which uses sixteen symbols (letters A through F are used to denote 10 up to 16) and also the binary system, which uses only two symbols: 0 and 1. Having only two symbols is similar to counting with only two fingers. So, how would we count from zero to three using only two symbols? Zero would just be 0, one would just be 1. For two it becomes more complicated. Since we have now run out of symbols we need to shift 1 left and start over with zero, giving us 10. Finally, for three: since we still have a next symbol for the rightmost position, we only have to replace the 0 with a 1 to express three in binary, giving us 11. If we now want to go up to four, we can not increase the existing symbols anymore. Hence, we have to set them to 0, and add a new position with a 1, giving us 100. In the table below we show the values for counting up to including five. If this is not immediately clear: do not worry, a more intuitive explanation follows.
The binary system brings us to our first definition: the binary digit, commonly shortened to ‘bit’. One bit is thus simply a single-digit binary number. Glancing at the table above we see that higher decimal numbers need more bits to be expressed in binary notation. Specifically, to store the number two or three we need two bits, and to store the number four or five we need three bits. The key insight is that by concatenating bits, that themselves can only take two values, we can represent any whole number, no matter how small or large, by using a sufficient amount of bits.
there are only 10 kinds of people in the world: those who understand binary, and those who do notnumeral base joke
An intuitive way to think of bits is as switches that turn on or off values. Take a look at the table below. The first row contains numbers, the second row contains a 0 if the number is ‘off’ and a 1 if the number is ‘on’. Try to work out the decimal value by summing the values of the first row for all the ‘on’ switches.
The second row represents a number (01001010) written in binary notation. The first row consists of powers of two if you consider them from right to left. In fact we can simply rewrite the first row of the table as follows:
Why powers of two? Well, since we have two symbols: 0 and 1. Starting at the right, each step towards the left increases the value by one power of two. If we would have three symbols, each position step would represent a power of three. Consider what happens when we have not two, but ten symbols, as in the discussed decimal system: each step towards the left is then an increase of a power of ten. An example:
If we try to express this as a decimal number we get 1000 + 100 = 1100, which in fact is exactly the same as the on/off switches concatenated (1100), as each switch represents a power of ten.
Back to bits: it may seem as though storing numbers in binary is inefficient, as we need more bits to store higher numbers. However, the number of bits required does not increase linearly, instead it decreases exponentially. To see this, let us write the decimal number ten in binary: 1010. This requires four bits, now a thousand: 11 1110 1000, requires only ten positions despite being a magnitude hundred larger than ten. This in another key insight: we can represent very large numbers with relatively few bits.
You can think of a bit as the smallest possible piece of information. Many things map nicely on the two states that a single bit can have. For example: a logic statement can be either true or false. However, by adding additional bits we can express more states than just two and perform actual arithmetic with them. If we want to add the numbers 1 and 2 in binary we can simply do this by turning on all the switches that are ‘on’ in the representation of both 1 and 2 of the result. If we look at 1 and 2 in the table below, we see that to form 3, the sum of these numbers, we can look at the two rows above 3 and set the bit to 1 if either the bit in the row of 1 OR the row of 2 is one. This is why this is called an OR operation.
There are in fact four of these logical operations we can do on bits:
- OR : turns on the switch in the result if either of the two input switches are on.
- AND : turns on the switch in the result if both of the two input switches are on.
- XOR : turns on the switch in the result if either of the two input switches are on, but not both. This is called an eXclusive-OR.
- NOT : takes one input row and inverts all the switches, turning all zeros to 1 and all ones to 0.
With these four operations we can implement common mathematical operations on whole numbers. A hardware implementation of one of these operations is called a logic gate and your computer has many of them: billions.
Bytes & Text
Now we have some feeling for bits, let us turn to bytes. Computers only work with these binary numbers expressed as groups of bits. So, how do we get from groups of bits to stuff such as text, audio and images? The answer: mappings. You are probably familiar with many systems in the real world where a number ‘maps’ to something else. For example: a telephone number maps to a person or a company, a postal code maps to some area within a city, and in some restaurants a number maps to a specific dish. The same mechanism of mapping a number to something with a specific semantic interpretation is used in computers.
For storing text on a computer we need to map the numbers to the alphabet. There a twenty-six letters in the alphabet. However, if we need to express both upper and lowercase letters, we would need to double that amount: fifty-two. How many bits would that take? The closest power of two is 64 which is , thus we need at least 6 bits. However, we also need to express many other symbols. Take a look at your keyboard and you will find that it contains at least a 100 or so keys, and some combinations of keys can produce yet other symbols. Though, historically six bits were indeed used to store text, over time many more symbols were added. This eventually led to an 7-bit standard known as ASCII, commonly extended with 1 extra bit for additional language-specific symbols. The modern day successor of this is Unicode which can use up to 32 bits per character, allowing for many more symbols. Yet, the first 7-bits of Unicode still map to the same symbols as ASCII.
Early microprocessors were designed to operate on eight bits at a time. Early personal computers used eight bits as well. Since these groups of eight bits form a natural unit we refer to them as a byte. A byte can take distinct numeric values, which in practice means 0 up to including 255. Half a byte, four bits, is sometimes referred to as a nibble, which is a tong-in-the-cheek reference to the larger ‘bite’.
A byte is thus just a group of eight bits. Any meaning that a byte has really depends on what we are expressing. In a text file the bytes are characters, in a music file they are audio samples and in a digital photo they represent the colours of a specific part of the image.
Image files are quite intuitive to understand. If you take a photo and put a grid over it, you can give each small square in the grid a specific colour. Zoom very far into a digital photo, and you will see this grid, the small squares are often referred to as pixels: a portmanteau of ‘picture element’. If a digital photo has more pixels, it contains more information and we say it has a higher resolution.
The colours of pixels can be encoded in bits. Indeed, colours are often expressed in the Red Green Blue (RGB) format. In a commonly used variant of this format each colour is one byte and can take a value from 0 to 255 that maps to the intensity of the colour. So, for each square in the grid we use three bytes: twenty-four bits. For example the colour would represent pure white and is pure black as in these cases all colours are mixed evenly. However, would be bright red, and so on.
If you have ever edited a web page and had to change a colour, you probably noticed that colours can be expressed as a hashtags followed by some combination of characters and numbers, for example: . This takes us back to the discussion of counting systems. The hexadecimal system is another way to represent numbers. For this system we count 0 through 9 and then A through F, giving us 16 possibilities in total. Perhaps you already see what those symbols behind the hashtag mean.
Let us look at . This actually consists of three groups of two symbols. The first group is FF, we know that F maps to 15 in decimal, so the hexadecimal value of FF is . The other two groups are 0. These three groups in this notation are the three colours! The first group is red, with value 255 (), the second is green with a value of 0, and the last is blue, also with value 0 (). Hence, again we have bright red , but now expressed in hexadecimal as .
Modern images take up a lot of storage space when we would just store them in RGB format. A modern camera easily snaps photos at 16 million pixels, for each of these pixels we need to store three bytes. So that’s over 48 million bytes (megabytes) for one photo. Storing video is similarly challenging, as that is essentially a collection of still images in sequence, typically at least twenty-four per second. Fortunately, digital compression techniques exists that can make images and video files much smaller. These techniques take advantage of repetition and patterns within and across images and remove small differences that can not easily be seen by the human eye.
Sound & Samples
How sound is represented digitally is a bit more complex than text and images. To get a better impression of this, imagine you are sitting on a swing. You are swinging right to left from my perspective. Let’s say that I want to take pictures, so I can later play them back as a filmstrip. A choice I have to make is how many pictures I take relative to the time you are swinging.
If it takes you ten seconds to swing back and forth, and I only take a picture once every ten seconds, all pictures would have you frozen in the exact same place, since I am missing the the 9.99 seconds where you are actually swinging. If I take a picture every second, I’d see you swing, but it would look a bit choppy. Taking pictures in more rapid succession would fix this and yield smooth motion. However, the minimum amount of pictures I’d need to snap to be able to see you move is actually longer than one second: half the time it takes for a swing, which would be every 5 seconds. I’d catch you at times 0, 5 and 10, respectively for ‘up to the left’, ‘centered’, and ‘up to the right’, and so forth. Differently worded: I’d need to snap pictures twice as fast as the rate at which you are swinging.
The process of determining how many pictures to take in an audio context is called sampling, and instead of pictures we record the amplitude of the audio signal at specific points in time. The signal consists of a mix of sound frequencies that relate to the pitch of what you are hearing. The speed of the swing is comparable with the highest possible frequency of the audio signal. Frequencies are expressed in Hertz, 1 Hertz = 1 swing per second.
If you imagine many people sitting on swings next to you, going back and forth at different speeds, we’d need to snap pictures so we can catch the fastest one: snapping twice as fast as that speed. This corresponds to the highest frequency for audio: we need to record a certain minimum amount of samples in order to reconstruct the original recording. If our sampling rate is too low, we will not be able to record sounds with a higher frequency. Like with the swings: we need to sample at least twice the rate of the highest frequency we want to be able to record. This is one of the main reasons that telephone audio sounds rather poor: the sampling rate is low: 8000 samples are taken each second, limiting the range of the actual audio to 4000 Hertz. Human hearing can distinguish sounds from 1 up to 20 000 Hertz. This is the reason Compact Disc audio sounds so good: it takes 44 100 samples per second, enabling an actual range up to 22 050 Hertz.
This does not get us the actual bytes yet that we need for audio, which takes us to the other part of representing audio digitally: how precise a value we are going to record for each sample. The more bits we use for a sample, the more accurately we can model the original signal. If we would only use one bit we can only record either no sound, or the loudest possible sound: it’s on or off. With two bits we can record four levels, etc. Compact Disc quality audio records samples with 16 bit resolution, giving possible levels.
Like with video files, sound files also quickly grow large. In Compact Disc quality we need to store two bytes for every sample of which there are many thousands every second. Fortunately, as with video there are compression techniques to reduce the sizes of such files, the most famous of which is undoubtedly mp3 which takes advantage of differences that the human ear can not easily distinguish.
In this post you have learned about counting in the familiar decimal system, but also in the likely less familiar binary counting system. You have also seen how a number raised to a certain exponent relates to the counting system: the decimal system uses ten as the base number and the binary system uses two. These two possibilities act like an on/off switch, and each of these switches is referred to as a bit. You have an understanding of bitwise operations that can be performed to implement basic arithmetic. Finally, we have seen how groups of bits form bytes, and how bytes are often mapped to various things such as characters in text, colours in images and samples in sound. I hope this gives you a better feeling of how the basic primitives of modern computing, bits and bytes, relate to the things you see and read on screens and hear through speakers and headphones.