When I lived at my parents’ place, I was always amazed by how much time my father spent on the daily news. Once during breakfast, usually in the form of a newspaper, a second time during lunch, and a third time by watching the evening news. He must have spent upwards of an hour every day consuming news. Why did he do that?

In this post I will dig into attention mechanisms. Specifically: where they come from, how they work and what you can do to better cope with them. To get a better grip on this, we will first dive into some history.


In 1833 the leading newspaper of New York was the New York Enquirer, costing about six cents a copy. Not very expensive by today’s standards, but at that time it was considered a luxury item: an amount not many ordinary people had to spare. Hence, it is no surprise that someone saw a business opportunity in this [1].

Benjamin Day

That someone was Benjamin Day. In that same year he launched a new newspaper: the New York Sun. By late 1834 his new paper had become the leading one in the city, amassing five thousand daily readers. Why? A copy of the New York Sun cost only one cent, vastly more affordable for the average person back then. His rivals were amazed: how could he produce a newspaper that cheaply, below the cost of printing? One word: advertising. Day was not in the business of selling a newspaper to his readers; he was in the business of selling the attention of his readers to his advertisers. He did this by mixing news, often news of the sensational kind that easily catches attention, with advertisements.

Advertising, of course, is nothing new. Though, back then it was mostly just text, reminiscent of personal ads. That quickly changed throughout the years. During the forties and fifties, large billboards became increasingly common. Though our homes were still a sanctuary devoid of much of that visual onslaught, this quickly changed as televisions became commonplace. Advertisements sneaked into our living rooms, and that was not the end of it. The last two decades have seen ads move even physically closer: into our very hands, in the form of one of the most effective advertisement delivery devices invented: smartphones.

Britney Spears in a Pepsi television ad

Nevertheless, let’s be honest: no one really wants to see those ads. People do not voluntarily choose to watch advertisements. So, how do we get them to do this anyway?


To understand the mechanism behind this, we go back in history once again. This time to the early nineteen-sixties. Meet B. F. Skinner, working at the time at Harvard University. Skinner had ambitions to become a writer, but became disillusioned with the craft, and instead ended up becoming an influential figure in behavioural psychology. During the sixties he conducted experiments with animals, and one such experiment focused on pigeons.

B. F. Skinner and his pigeons

In this particular experiment Skinner placed a pigeon in a box. This box contained a small circular disc that could be pecked. In the first condition, pecking this disc would release a little food pellet for the pigeon to eat. In the second condition it would sometimes release a food pellet and sometimes it would not: an unpredictable pattern. Which condition do you think made the pigeon learn the behaviour of pecking the disc the fastest? That’s right, it was the second condition, the one where whether or not the behaviour would be rewarded with a food pellet was unpredictable.

Do those results on pigeons translate to humans? What’s the closest analogue we have? That would be gambling. Remember those rows of old ladies sitting at one-armed bandits in casinos? The reason they keep sitting there, gambling away their money, is the exact same reason those pigeons kept pecking that circular disc: a variable reward. Sometimes you get something, sometimes you lose something, and you don’t know which one is coming next.


Now you may be thinking: I don’t gamble, I never go to the casino, so: no risk for me. Well, you may not go to the casino, but there is a game you play every day. I am referring to the game of scrolling through news feeds, responding to notifications on your telephone, and checking if you have any new e-mails. All of these share the same characteristic: sometimes there’s something interesting for you, sometimes there’s not, and you don’t know which one is coming next.

I think it is this variable reward that got my father hooked on the news: even if most of it is not interesting, or repeats something he had already seen or read that same day, that is not the point. As long as every so often there is something new in there, that is what kept him coming back for more.

Ads in the mix

Now, think back to the advertisements. Like a news reel or newspaper, a news feed is the prime example of something that gives you a variable reward. It keeps you scrolling so you can get a small dopamine reward for finding those interesting things. If you put some advertisements in that news feed, regardless of whether they are interesting or relevant, people will be exposed to them, giving you the perfect ‘voluntary’ ad delivery mechanism.

A rule of advertising is that the more times people see your ad the better the memorization effect is. That’s the reason why the same commercial is repeated on broadcast television in the same block. It does not matter what you think of the ad, it matters that you start recognizing it and that it starts to feel familiar.

The combination of advertising with other content of mixed relevance to the reader is what drives many modern social media platforms: Facebook, Instagram, LinkedIn, anything with a feed. I hope you see that this is essentially no different from the mechanism Benjamin Day used with the New York Sun to outwit his competition. If you are not paying for it, you can be sure that it is your attention that is being sold to advertisers.


Diving deeper: how would you design a product that exploits this variable reward vulnerability that people have, both in news feeds and beyond? There is a recipe for this that has proven to work time and time again, called the hook model. It consists of four steps: trigger, action, reward and investment. Let’s go through these steps by using an example.

The Hook Model [2]

A trigger may be the notification sound made by your telephone: a message that pops up telling you that you have a new message from someone, possibly even showing part of that message to draw you in further. This is your first little hit of dopamine, similar to what you get when you scroll through a feed.

The action is your response. You open the message, read it and start typing something back. You re-read what you typed, think if it is an appropriate response, and then finally hit the send button.

Now comes the reward. You get a message back, or if you have posted something on a social network, you may get likes or comments on your post. That feels good: getting approval from others also gives you little shots of dopamine.

All of this feels so good, that it is now time for the last step: you invest. You post something new: a picture, a video or a message to get more of that dopamine. Your content is now in the app that you are using, regardless of whether it is Facebook, Instagram or WhatsApp. All use the same basic mechanism, some in more elaborate ways than others.

Since you are now invested, you are more likely to respond to the next trigger, and you go through the loop again, ad infinitum, reinforcing your behaviour. Feels creepy perhaps, but sounds familiar?


Now, you may ask: okay, but is it all that bad? Because: I get a lot of value out of the news and social media. Well, that depends on how you look at it. This graph from The Economist shows that the more time people spend on their social media apps, the less happy they are about it.

The findings are based on a time tracking app called Moment. This is admittedly not the easiest chart to interpret, but there are some clear patterns to spot. Most social media score quite poorly in terms of how happy people are with the amount of time they spend on these platforms; for example, a full two thirds of Instagram users are unhappy with the amount of time they spend there. Furthermore, we see the more general trend that more time spent translates into less happiness for nearly all apps.

Graphs like this always raise the valid question: what is the direction of causality? Perhaps unhappy people are simply drawn to these platforms and stay there for a longer amount of time, or is it the other way around: does staying on these platforms make people less happy? There seems to be at least some evidence towards this latter conclusion [5].

What can you do?

Going back in time, what would I recommend to my father to break his news watching habit? Well, the first step is awareness. Nowadays, that awareness applies not only to television and news served on the remains of dead trees, but particularly to the kind that comes to you via screens, be it laptops, tablets or smartphones. You have already taken the first step by reading this post, so you are at least aware of the what and the how: good.

In a broader sense: there is genuine concern nowadays about people having short(er) attention spans and showing addictive behaviours towards their social media feeds, all based on this exploitable mass vulnerability to variable rewards [7]. Fortunately, there are also some concrete things that you can do. I’ll leave you with three tips.

Infinite Feed by Attila Kristó

Three tips

Firstly, limit your time scrolling through feeds. Recognize when you are doing it, realize you are reinforcing the behaviour and ask yourself the question: what am I accomplishing by doing this? People far too often grab their phone or tablet, because they don’t dare allow themselves to get bored. Do the opposite: embrace boredom.

Secondly, turn off as many notifications as you can. Particularly all notifications not generated by another human being: those generated by an algorithm, or those not specifically directed at you, like the app groups that you are part of.

Thirdly, I realize that it may not be easy, but it really does not hurt to put away your smartphone or tablet for a while and do something else. The best way to cure yourself and others of these reinforced behaviours is simply to stop responding, and have something better to do.

In closing, remember: your life is what you pay attention to, so make sure you pay attention to the right things, those things that really matter to you.


  1. Wu, T. (2016). The Attention Merchants.
  2. Eyal, N. (2013). Hooked.
  3. Holiday, R. (2014). If you watch the news …
  4. Cain, D. (2016). Five things you notice when you quit the news.
  5. Economist (2018). How heavy use of social media is linked to mental illness.
  6. Tigelaar, A. S. (2016). Breaking free: how to rewire your brain.
  7. Center for Humane Technology (2018). App Ratings.


Best Documentaries 2018

I like documentaries as much as I do (fiction) movies, particularly if they make me think, are well crafted and have a coherent narrative. Here are my recommendations based on documentaries that I have watched in the past year.

1. Avicii: True Stories

Avicii: True Stories focuses on the Swedish DJ Tim Bergling, better known as Avicii. In contrast to the uplifting music he made, this film highlights the darker side of the trade-off of becoming famous: an artist who cares about music gets dropped into a shark tank where the sharks care about money, and money only. If anything it serves as a big warning not to make your passion your profession. Contrast that with popular advice! It is similar in spirit to Amy and Cobain: Montage of Heck, taking the same personal perspective. This makes it all the more impactful. Sadly, like Amy Winehouse and Kurt Cobain, Avicii also joined the list of young people that took their own life. Watching this, one can’t help but think: wouldn’t he have been better off being less famous?

2. Generation Wealth

Documentary maker Lauren Greenfield explores excessive wealth with a surprisingly personal perspective. Make no mistake: much of Generation Wealth is Lauren’s self-reflection, her choices in life, relationship with her family and participation in the ‘rat race’. That may sound corny, but for this subject matter: it really does work. If anything, she tries to understand her subjects through her own lens. By doing so, she reveals truths about human nature that may not be surprising in the end, but that are good to be reminded of.

3. HyperNormalisation

If you don’t mind something a bit longer that uses its time to make you think, then look no further than this 2016 documentary. Expertly crafted by Adam Curtis, HyperNormalisation leads us through a large part of modern history to make one key point about the world: the difference between how we perceive it versus how things really are, and the consequences of this. Curtis interleaves major global developments: technological, political and cultural, to construct a tapestry of arguments to support his conclusions. Whether you agree with these or not does not really matter, as what he offers is a perspective that leaves you pondering for days about its deeper implications.


Representing Data: Bits and Bytes

If you use a computer regularly, you know that every document you write, every music file you listen to and every photo you see is stored as a file. You may also know that a file consists of bytes and that bytes consist of smaller building blocks called bits. But, what are bits and bytes really? How do they work, why do they work that way, and how does storing them create text, sound and images? To find out we have to start with something very basic: counting.

Decimal Counting

Consider what happens if you count using ten fingers. Once you reach ten you have no more fingers left; thus, when you reach eleven, you remember one times ten and hold up one finger. This same thing happens over and over again as you keep counting upwards. It is also reflected in the way that we count when we write down numbers, though in that case we use ten symbols instead of ten fingers: 0 up to including 9. Once we reach nine and want to express ten, we no longer have any symbols. So, we remember 1, which for writing means: we shift the number 1 to the left, and start over again with a 0, giving us 10. The difference with finger counting is that for zero we use no fingers at all, whereas when writing numbers we use the symbol 0 to denote zero.

Counting Conventions

The way we write down numbers is just a convention. The number system that we use most often is the decimal numeral system, which, as discussed, has ten symbols that we enumerate. Consider that there is nothing stopping us from defining a counting system with only five symbols: 0 up to including 4. In this case we will run out of symbols when we want to express five, and we will have to do the same thing we did before: shift 1 to the left and then start over again. This gives us (again) 10. But since we now count with five symbols, the two symbols 10 actually represent the value five. If you find this confusing, try counting from zero to five using the fingers on only one hand and use the first finger for zero. Notice that reaching five forces you to remember one and continue counting by resetting your hand to one finger again for zero.
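As a small sketch of this idea, Python’s built-in int() can interpret a string of digits in any base, which lets us check that the two symbols ‘10’ mean five when counting with only five symbols:

```python
# Python's int() can parse a string of digits in any base from 2 to 36.
# The same two symbols "10" mean different values in different systems.
print(int("10", 10))  # ten symbols (decimal): ten
print(int("10", 5))   # five symbols: the digits "10" mean five
print(int("44", 5))   # largest two-digit base-five number: 4*5 + 4 = 24
```
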

Binary Counting

The ten-symbol decimal counting system is just one of an infinite number of possible counting systems. However, only a few such systems are in common usage: the octal system, which uses eight symbols; the hexadecimal system, which uses sixteen symbols (the letters A through F denote 10 up to including 15); and the binary system, which uses only two symbols: 0 and 1. Having only two symbols is similar to counting with only two fingers. So, how would we count from zero to three using only two symbols? Zero would just be 0, one would just be 1. For two it becomes more complicated. Since we have now run out of symbols, we need to shift 1 left and start over with zero, giving us 10. Finally, for three: since we still have a next symbol for the rightmost position, we only have to replace the 0 with a 1 to express three in binary, giving us 11. If we now want to go up to four, we can not increase the existing symbols anymore. Hence, we have to set them to 0 and add a new position with a 1, giving us 100. In the table below we show the values for counting up to including five. If this is not immediately clear: do not worry, a more intuitive explanation follows.

  decimal   binary
  0         0
  1         1
  2         10
  3         11
  4         100
  5         101

The binary system brings us to our first definition: the binary digit, commonly shortened to ‘bit’. One bit is thus simply a single-digit binary number. Glancing at the table above we see that higher decimal numbers need more bits to be expressed in binary notation. Specifically, to store the number two or three we need two bits, and to store the number four or five we need three bits. The key insight is that by concatenating bits, that themselves can only take two values, we can represent any whole number, no matter how small or large, by using a sufficient amount of bits.

“There are only 10 kinds of people in the world: those who understand binary, and those who do not.” (numeral base joke)

An intuitive way to think of bits is as switches that turn values on or off. Take a look at the table below. The first row contains numbers, the second row contains a 0 if the number is ‘off’ and a 1 if the number is ‘on’. Try to work out the decimal value by summing the values of the first row for all the ‘on’ switches.

  128   64   32   16    8    4    2    1
    0    1    0    0    1    0    1    0

Solution:

128 * 0 + 64 * 1 + 32 * 0 + 16 * 0 + 8 * 1 + 4 * 0 + 2 * 1 + 1 * 0 = 64 + 8 + 2 = 74



The second row represents a number (01001010) written in binary notation. The first row consists of powers of two if you consider them from right to left. In fact we can simply rewrite the first row of the table as follows:

2^7 = 128, 2^6 = 64, 2^5 = 32, 2^4 = 16, 2^3 = 8, 2^2 = 4, 2^1 = 2, 2^0 = 1
Why powers of two? Well, since we have two symbols: 0 and 1. Starting at the right, each step towards the left increases the value by one power of two. If we would have three symbols, each position step would represent a power of three. Consider what happens when we have not two, but ten symbols, as in the discussed decimal system: each step towards the left is then an increase of a power of ten. An example:

10^3 = 1000, 10^2 = 100, 10^1 = 10, 10^0 = 1

  1000   100   10    1
     1     1    0    0

If we try to express this as a decimal number we get 1000 + 100 = 1100, which in fact is exactly the same as the on/off switches concatenated (1100), as each switch represents a power of ten.
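The weighted sums above can be sketched in a few lines of Python. Note that positional_value is a hypothetical helper name introduced here for illustration, not a standard function:

```python
def positional_value(digits, base):
    """Sum each digit times its base power; the rightmost digit counts base**0."""
    value = 0
    for digit in digits:
        # Shifting left by one position multiplies the accumulated value by the base.
        value = value * base + digit
    return value

# The binary switches 0 1 0 0 1 0 1 0 from the exercise above:
print(positional_value([0, 1, 0, 0, 1, 0, 1, 0], 2))   # 74
# The decimal switches 1 1 0 0 on powers of ten:
print(positional_value([1, 1, 0, 0], 10))              # 1100
```

The same routine works for any base, which is the whole point of positional notation: only the weights change, not the mechanism.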

Bitwise Operations

Back to bits: it may seem as though storing numbers in binary is inefficient, as we need more bits to store higher numbers. However, the number of bits required does not grow linearly with the value; it grows only logarithmically, since each extra bit doubles the range of numbers we can represent. To see this, let us write the decimal number ten in binary: 1010. This requires four bits. Now a thousand: 11 1110 1000 requires only ten bits despite being a hundred times larger than ten. This is another key insight: we can represent very large numbers with relatively few bits.
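A quick way to see this logarithmic growth is Python’s int.bit_length(), which reports how many bits a number needs:

```python
# Each thousandfold increase in the value adds only about ten bits.
for n in [10, 1000, 1_000_000, 1_000_000_000]:
    print(n, bin(n), n.bit_length())
# ten needs 4 bits, a thousand 10, a million 20, a billion just 30.
```
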

You can think of a bit as the smallest possible piece of information. Many things map nicely onto the two states that a single bit can have. For example: a logic statement can be either true or false. However, by adding additional bits we can express more states than just two and perform actual arithmetic with them. If we want to add the numbers 1 and 2 in binary, we can do this by turning on, in the result, all the switches that are ‘on’ in the representation of either 1 or 2. If we look at 1 and 2 in the table below, we see that to form 3, the sum of these numbers, we can look at the two rows above 3 and set a bit to 1 if either the bit in the row of 1 OR the row of 2 is one. This is why this is called an OR operation. Note that this shortcut only works here because 1 and 2 have no ‘on’ switches in common; general binary addition also needs to carry, just like decimal addition does.

1 = 0001
2 = 0010
3 = 0011


There are in fact four of these logical operations we can do on bits:

  • OR : turns on the switch in the result if either of the two input switches are on.
  • AND : turns on the switch in the result if both of the two input switches are on.
  • XOR : turns on the switch in the result if either of the two input switches are on, but not both. This is called an eXclusive-OR.
  • NOT : takes one input row and inverts all the switches, turning all zeros to 1 and all ones to 0.

With these four operations we can implement common mathematical operations on whole numbers. A hardware implementation of one of these operations is called a logic gate and your computer has many of them: billions.
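The four operations map directly onto Python’s bitwise operators; a small sketch using the numbers 1 and 2 from the table above:

```python
a, b = 0b0001, 0b0010   # the numbers 1 and 2 in binary

print(bin(a | b))        # OR : 0b11 (three). Acts as addition here only
                         # because 1 and 2 share no 'on' switches.
print(bin(a & b))        # AND: 0b0 (no switch is on in both inputs)
print(bin(a ^ b))        # XOR: 0b11 (on in either input, but not both)
print(bin(~a & 0b1111))  # NOT: invert all switches; Python ints have no
                         # fixed width, so we mask to four bits -> 0b1110
```
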

Bytes & Text

Now we have some feeling for bits, let us turn to bytes. Computers only work with these binary numbers expressed as groups of bits. So, how do we get from groups of bits to stuff such as text, audio and images? The answer: mappings. You are probably familiar with many systems in the real world where a number ‘maps’ to something else. For example: a telephone number maps to a person or a company, a postal code maps to some area within a city, and in some restaurants a number maps to a specific dish. The same mechanism of mapping a number to something with a specific semantic interpretation is used in computers.


For storing text on a computer we need to map numbers to the alphabet. There are twenty-six letters in the alphabet. However, if we need to express both upper and lowercase letters, we need to double that amount: fifty-two. How many bits would that take? The closest power of two is 64, which is 2^6, thus we need at least 6 bits. However, we also need to express many other symbols. Take a look at your keyboard and you will find that it contains a hundred or so keys, and some combinations of keys can produce yet other symbols. Though historically six bits were indeed used to store text, over time many more symbols were added. This eventually led to a 7-bit standard known as ASCII, commonly extended with 1 extra bit for additional language-specific symbols. The modern-day successor of this is Unicode, which can use up to 32 bits per character, allowing for many more symbols. Yet, the first 128 code points of Unicode still map to the same symbols as ASCII.
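Python exposes this character-to-number mapping directly through ord() and chr(), which makes it easy to poke at:

```python
# ord() maps a character to its number, chr() goes the other way.
print(ord('A'))   # 65: within the original 7-bit ASCII range (0-127)
print(chr(97))    # 'a'
# Unicode extends the same mapping idea far beyond ASCII:
print(ord('€'))   # 8364: a code point that no longer fits in one byte
```
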

Processing Units

Early microprocessors were designed to operate on eight bits at a time. Early personal computers used eight bits as well. Since these groups of eight bits form a natural unit, we refer to them as a byte. A byte can take 2^8 = 256 distinct numeric values, which in practice means 0 up to including 255. Half a byte, four bits, is sometimes referred to as a nibble: a tongue-in-cheek reference to the larger ‘bite’.

A byte is thus just a group of eight bits. Any meaning that a byte has really depends on what we are expressing. In a text file the bytes are characters, in a music file they are audio samples and in a digital photo they represent the colours of a specific part of the image.


Image files are quite intuitive to understand. If you take a photo and put a grid over it, you can give each small square in the grid a specific colour. Zoom very far into a digital photo, and you will see this grid; the small squares are often referred to as pixels: a portmanteau of ‘picture element’. If a digital photo has more pixels, it contains more information and we say it has a higher resolution.


The colours of pixels can be encoded in bits. Indeed, colours are often expressed in the Red Green Blue (RGB) format. In a commonly used variant of this format each colour is one byte and can take a value from 0 to 255 that maps to the intensity of the colour. So, for each square in the grid we use three bytes: twenty-four bits. For example, the colour (R=0, G=0, B=0) represents pure black and (R=255, G=255, B=255) pure white, as in these cases all colours are mixed evenly. However, (R=255, G=0, B=0) is bright red, and so on.


If you have ever edited a web page and had to change a colour, you probably noticed that colours can be expressed as a hash sign followed by some combination of letters and numbers, for example: #FF0000. This takes us back to the discussion of counting systems. The hexadecimal system is another way to represent numbers. For this system we count 0 through 9 and then A through F, giving us 16 possibilities in total. Perhaps you already see what those symbols behind the hash sign mean.

Let us look at #FF0000. This actually consists of three groups of two symbols. The first group is FF; we know that F maps to 15 in decimal, so the hexadecimal value of FF is 15 * 16^1 + 15 * 16^0 = 255. The other two groups are both 00. These three groups are the three colours! The first group (FF) is red, with value 255, the second is green with a value of 0, and the last is blue, also with value 0. Hence, again we have bright red (R=255, G=0, B=0), but now expressed in hexadecimal as #FF0000.
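The group-splitting described above can be sketched as follows; parse_hex_colour is a hypothetical helper name used only for this illustration:

```python
def parse_hex_colour(colour):
    """Split a '#RRGGBB' string into three decimal byte values."""
    colour = colour.lstrip('#')
    # Each pair of hexadecimal symbols is one byte: one colour channel.
    r = int(colour[0:2], 16)
    g = int(colour[2:4], 16)
    b = int(colour[4:6], 16)
    return r, g, b

print(parse_hex_colour('#FF0000'))  # (255, 0, 0): bright red
print(parse_hex_colour('#FFFFFF'))  # (255, 255, 255): pure white
```
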

Practical Considerations

Modern images take up a lot of storage space if we were to store them in raw RGB format. A modern camera easily snaps photos at 16 million pixels, and for each of these pixels we need to store three bytes. That is 48 million bytes, about 48 megabytes, for one photo. Storing video is similarly challenging, as it is essentially a sequence of still images, typically at least twenty-four per second. Fortunately, digital compression techniques exist that can make image and video files much smaller. These techniques take advantage of repetition and patterns within and across images, and remove small differences that can not easily be seen by the human eye.
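The arithmetic above can be checked with a back-of-the-envelope sketch; the 16-megapixel figure is the illustrative number from the text, not any specific camera:

```python
# Rough storage arithmetic for uncompressed RGB images.
pixels = 16_000_000          # a 16-megapixel photo
bytes_per_pixel = 3          # one byte each for R, G and B

photo_bytes = pixels * bytes_per_pixel
print(photo_bytes)           # 48 000 000 bytes, about 48 megabytes

# Uncompressed video at 24 such frames per second:
print(photo_bytes * 24)      # over a gigabyte for a single second
```
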

Sound & Samples

How sound is represented digitally is a bit more complex than text and images. To get a better impression of this, imagine you are sitting on a swing. You are swinging right to left from my perspective. Let’s say that I want to take pictures, so I can later play them back as a filmstrip. A choice I have to make is how many pictures I take relative to the time you are swinging.

If it takes you ten seconds to swing back and forth, and I only take a picture once every ten seconds, all pictures would have you frozen in the exact same place, since I am missing the 9.99 seconds where you are actually swinging. If I take a picture every second, I’d see you swing, but it would look a bit choppy. Taking pictures in more rapid succession would fix this and yield smooth motion. However, the fewest pictures I could take and still see you move is one every half swing: every 5 seconds. I’d catch you at times 0, 5 and 10: ‘up to the left’, ‘up to the right’, and ‘up to the left’ again, and so forth. Differently worded: I’d need to snap pictures at least twice as fast as the rate at which you are swinging.


The process of determining how many pictures to take in an audio context is called sampling, and instead of pictures we record the amplitude of the audio signal at specific points in time. The signal consists of a mix of sound frequencies that relate to the pitch of what you are hearing. The speed of the swing is comparable with the highest possible frequency of the audio signal. Frequencies are expressed in Hertz, 1 Hertz = 1 swing per second.

If you imagine many people sitting on swings next to you, going back and forth at different speeds, we’d need to snap pictures fast enough to catch the fastest one: snapping twice as fast as that speed. This corresponds to the highest frequency for audio: we need to record a certain minimum number of samples in order to reconstruct the original recording. If our sampling rate is too low, we will not be able to record sounds with a higher frequency. Like with the swings: we need to sample at least twice the rate of the highest frequency we want to be able to record. This is one of the main reasons that telephone audio sounds rather poor: the sampling rate is low. Only 8000 samples are taken each second, limiting the range of the actual audio to 4000 Hertz. Human hearing can distinguish sounds from roughly 20 up to 20 000 Hertz. This is the reason Compact Disc audio sounds so good: it takes 44 100 samples per second, enabling an actual range up to 22 050 Hertz.


This does not yet get us the actual bytes that we need for audio, which takes us to the other part of representing audio digitally: how precise a value we record for each sample. The more bits we use per sample, the more accurately we can model the original signal. If we used only one bit we could record either no sound or the loudest possible sound: it’s on or off. With two bits we can record four levels, and so on. Compact Disc quality audio records samples with 16-bit resolution, giving 2^16 = 65536 possible levels.
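A small sketch tying the sampling figures together (assuming stereo, two channels, which is how Compact Disc audio is stored):

```python
# Compact Disc audio: 44 100 samples per second, 16 bits (2 bytes) each.
sample_rate = 44_100
bytes_per_sample = 2
channels = 2                 # stereo: a left and a right channel

print(sample_rate / 2)       # highest recordable frequency: 22050.0 Hertz
print(2 ** 16)               # 65536 possible levels per 16-bit sample

bytes_per_second = sample_rate * bytes_per_sample * channels
print(bytes_per_second)      # 176400 bytes for one second of audio
```

This is why uncompressed sound files grow large so quickly: every minute of CD-quality stereo needs over ten megabytes.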

Practical Considerations

Like with video files, sound files also quickly grow large. In Compact Disc quality we need to store two bytes for every sample of which there are many thousands every second. Fortunately, as with video there are compression techniques to reduce the sizes of such files, the most famous of which is undoubtedly mp3 which takes advantage of differences that the human ear can not easily distinguish.


In this post you have learned about counting in the familiar decimal system, but also in the likely less familiar binary counting system. You have also seen how a number raised to a certain exponent relates to the counting system: the decimal system uses ten as the base number and the binary system uses two. These two possibilities act like an on/off switch, and each of these switches is referred to as a bit. You have an understanding of bitwise operations that can be performed to implement basic arithmetic. Finally, we have seen how groups of bits form bytes, and how bytes are often mapped to various things such as characters in text, colours in images and samples in sound. I hope this gives you a better feeling of how the basic primitives of modern computing, bits and bytes, relate to the things you see and read on screens and hear through speakers and headphones.