A golden ratio for line charts with a truncated y-axis

While we hold that bar charts should start at zero, there is no such expectation for line charts. But you’ll always find someone on social media to denounce a “dishonest” line chart that enhances the variations by truncating the y-axis. There is usually ample pushback, but it often leads to the question: Where should it start then?

The simple answer should be: low enough. As I wrote in this tweetstorm (I should have blogged), the scale is a matter of context and it’s hard to make a rule without knowing the data and the reality. It should be low enough so that the perception of the graph is close to that of the reality it seeks to represent.

But that sounds too vague and people love rules, so I’m turning it into one:

The bottom third of the plot area should remain empty for a line chart that doesn't start at zero.

Now let’s see why.

Here is a data set with high numbers and relatively low variability plotted on a line chart with a scale starting at zero.

2 Zero baseline.png

Seems like not much happened in the last 20 years because the full scale hides relatively small variations (which may be the point, but that’s for another time). It is nearly useless as a visualization and a table would tell us more.

Now let’s go to the other extreme and set the minimum value of the scale at the minimum value of the data.

1 Down to zero.png

At first glance, and outside of a post about scales, you would think that widget sales crashed to zero before going back up. That is, until you read the scale and realize that it starts at 55,000. This misperception is a kind of Stroop effect. Based on your experience of reality, the baseline of a chart appears to be ground zero when it is visually used like this.

We need a middle point: a range that is small enough to show variations, but with a starting point that is low enough to avoid the misperception that the data reached zero. That’s why I suggest that we design the range and the starting point so that no data appear in the bottom third of the plotting area. It looks like this.

3 Voila rule.png

Why one third? Perhaps because it’s the golden ratio, where the empty space at the bottom is one half of the graphing space at the top. It also corresponds to the rule of thirds in visual arts. By following this rule, two thirds of the plotting area are dedicated to the data and one third to the secondary fact that the line is floating in space far above the ground. I believe that data visualization has a lot to learn from older arts, languages and other practices and this just might be one example.

And finally, according to a few quick tests, it seems to be the default in MS Excel, so maybe they took it from somewhere (although Excel sometimes truncates the y-axis for bar charts too).

Compare this with an empty space at 20% of the plotting area (which was my initial suggestion before I started testing and thinking it through).

5 20 percent.png

It is a matter of judgement, but it does appear that the line veers very close to the bottom, like a plane that’s flying too low. The eye is drawn to the baseline and the perception of ground zero is triggered again. One third seems to preserve enough space so that our eyes don’t go to the baseline and connect it to the data line.

Now, I would need 37 random students from the University of Austin to test this and truly turn it into an unimpeachable truth, but in the meantime we can say that we apply the golden ratio.


Now with annotations and calculations.

Here is the formula to calculate the value of the scale.

Formula text.png

Expressed in algebra:

 
Formula.png
 

And here is the minimum value of the scale (b) isolated:

 
formula simplified.png
 

In my example, it works like this:

 
formula simplified applied.png
 


So my scale will end at 57,000 and start at 54,000.
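
For anyone who would rather compute this than do the algebra by hand, here is a minimal Python sketch of the same rule. It assumes that the top of the scale is already chosen (57,000 in my example), that the lowest data point is 55,000, and that the empty band below the lowest point should be a fixed fraction of the plot height. The function name axis_minimum and its parameter names are made up for illustration; only b, the minimum value of the scale, comes from the formula above.

```python
def axis_minimum(data_min, axis_max, empty_fraction=1 / 3):
    """Return b, the axis minimum that leaves `empty_fraction` of the
    plot height empty below the lowest data point.

    Hypothetical helper: it solves (data_min - b) / (axis_max - b) = f for b.
    """
    if not 0 <= empty_fraction < 1:
        raise ValueError("empty_fraction must be in [0, 1)")
    if axis_max <= data_min:
        raise ValueError("axis_max must be greater than data_min")
    # Rearranging the equation above gives b = (data_min - f * axis_max) / (1 - f).
    return (data_min - empty_fraction * axis_max) / (1 - empty_fraction)


# Values from the widget example: lowest point 55,000, top of scale 57,000.
b = axis_minimum(55_000, 57_000)
print(round(b))  # 54000
```

With empty_fraction=0.2, my earlier 20% suggestion, the same data would give a scale starting at about 54,500.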

Thanks to my friend Robert Couillard, Eng., for isolating b more elegantly than the calculator did.


Thanks to Lisa Charlotte Rost for prompting me to think about this a little further and write about it when I tweeted something in jest in response to a Datawrapper update.

Redesigning an important graph about a carbon tax

Data visualisation is a language for humans. Computers and machines are fine with raw data. On the other hand, humans are often interested in data simply because it is visualized. Many will understand the underlying phenomenon — a trend, an outlier, a missed target — when they see it in a graph.

Humans speak data visualization from birth and through learning. There are right and wrong ways of doing it. When trying to communicate, it’s important to know what works and what doesn’t, just as it is with any other language.

Which brings me to this chart, included in a report from Resources for the Future, an independent, nonprofit research institution in Washington, DC.

Page 53

I became aware of it when this tweet appeared in my timeline. It’s not from the report author, but from another climate specialist who marvels at the finding he sees in the graph.

One look at the graph and I knew it would be hard to decipher. It also looks like it was made in a hurry: it’s a quick Excel graph with all the default features. This upset me because it seemed to be an important piece of information, and this issue is very close to my heart.

Here is what’s wrong with this graph, in no particular order.

Appearance

The look of the graph is the first thing that made me realize something was not quite right. It uses the Excel defaults, with some drop shadows. It tells me that little care has been put into preparing this graph for publication. When reaching out to an external audience with the goal of attracting and retaining their attention, and to be taken seriously, aesthetics is not a luxury, it’s a step you can’t miss.

Labelling

It took me a while to realize that Q1-Q2-Q3-Q4-Q5 do not represent quarters. Our natural instinct is often to assume that the horizontal axis represents time, until proven otherwise. Here, Q1 means Quintile 1, but does Q1 mean the top quintile or the bottom quintile? For people who do not study economics, it may not be obvious. This could be made clear by labeling Q1 “Poorest” and Q5 “Richest”.

Q1 Q2.png

Title

Noah Kaufman had to translate the graph for his Twitter audience in part because the graph did not make clear what it was showing. The title is purely descriptive, something newspapers would never do since it’s their job to communicate clearly, even in their reporting sections.

Title.png

Here it seems that the title should be something more analytical like “The poorest will benefit from any of the measures”, to follow Noah Kaufman’s analysis.

Annotations

What annotations? There are no annotations to help us understand what’s going on, no pointers at the relevant parts of the graph for instance. Annotations are especially important in graphs that do not make it clear where we should be looking for what.

Vertical text.png

Text orientation

Research suggests that it takes twice as long to read vertical text as horizontal text. But the real issue is that we simply don’t read vertical text. Our eyes glaze over these meaningless shapes in a way that they can’t when text is horizontal. The vertical axis has a vertical label, and it took me several attempts, and the resolve to understand this graph, before I truly read and understood it.

Colors

Colors are not neutral. Green is associated with positive meaning, like money, growth, and ecology. Red’s connotations in Western culture are negative: blood, violence, losses (and love, yes). Red should not be used randomly, but in the present case it simply identifies one of the approaches (“Payroll”) that is no better or worse than the other three. It creates a Stroop effect where we initially assume something before having to discard it.

Legend colors.png

Horizontal axis

One of the main features of this graph is to differentiate between positive and negative impacts. The baseline should be a strong visual feature so that the eye immediately sees where it’s crossed. In reality, it’s barely differentiated from the grid. The shadows obscure it further.

Horizontal axis.png
Vertical axis.png

Vertical axis

The scale is inconsistent: it starts with a 0.25% interval and then moves to a 0.5% interval. It’s best to use consistent and intuitive increments. Another interpretation is that the scale has increments of 0.5% (-0.25% to 0.25% to 0.75%, etc.) but that the zero baseline is completely ignored, which is not much better.
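
As a rough sketch of what consistent increments could look like, the Python snippet below builds evenly spaced tick values that always land on multiples of the step, so zero is one of the ticks whenever the range crosses it. The function name even_ticks and the range used in the example (-0.25 to 1.75) are assumptions for illustration, not values taken from the report.

```python
import math


def even_ticks(low, high, step):
    """Return evenly spaced ticks covering [low, high], aligned on multiples of step.

    Hypothetical helper: because every tick is a multiple of `step`,
    zero is always a tick when the range spans negative and positive values.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    first = math.floor(low / step)  # index of the first tick at or below `low`
    last = math.ceil(high / step)   # index of the last tick at or above `high`
    # Rounding removes tiny floating-point residue such as 0.30000000000000004.
    return [round(i * step, 10) for i in range(first, last + 1)]


# Assumed data range in percent, with a 0.5-point step.
ticks = even_ticks(-0.25, 1.75, 0.5)
print(ticks)  # [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
```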


Redesign

Here is my suggested redesign, with the caveat that I did not speak with the authors, nor have I read the entire report in detail. Also, I couldn’t find the data so I eyeballed it from the original graph.

A few observations:

  • Font: Dax

  • Colors: Wes Anderson palette from a scene in Hotel Chevalier.

  • Green for positive values and yellow for negative ones help the eye immediately identify this crucial distinction.

  • With a panel chart, I like to use a faint background for the graph areas to help the eye differentiate them.

  • There are no annotations because they don’t seem necessary now that the chart is clearer.

  • The horizontal axis does not carry the label because it is mentioned in the graph’s subtitle.

  • Using horizontal bar charts gives space for the labels, without having to repeat them or use different colors for the different measures.

  • I would have preferred to use a vertical layout, with the graph for the poorest quintile at the bottom, because that’s how we think of income groups (top 1%, bottom 50%, etc.), but that put the most important graph at the bottom, so it was not worth it.

  • It was all created in Excel; no need for fancy tools.

Many countries are considering a carbon tax to address climate change. In fact, Canada is embarking on a major political debate about a carbon tax and its distributional impact. One year ahead of the election, the party in power appears willing to bet the house on a measure that might be good for the environment but politically controversial. This type of communication can make a difference in how the public understands what they are voting about.

The latest report from the IPCC makes it clear that the future and safety of millions are at risk because humanity is failing to take decisive action to address climate change. The work that RFF and the like do is tremendously important and helpful. This issue deserves to be explained and promoted with the same care that we, silly humans, put into advertising cars.

Chart redesign: From the number to the reality

Data are not just numbers.

While a number is only an abstraction, a data point is the quantification of a reality or an idea. Think of twenty-five percent of women, one million dollars, 130 km/h.

Data visualization therefore depends on the subject that the data represent. It must give shape to that reality and respect the subject. The information designer must take the nature of the data into account when making design choices.

Take for example the chart below, from a Radio-Canada article on the proportion of men among primary school teachers in Quebec. It is a correct representation of the numbers behind the chart. Yet something is missing, because the visual does not reinforce the message, namely that very few men teach in primary school. It says only one thing, that the trend is fairly stable, which the article treats as secondary information.

What is missing is a scale that would put this proportion of men in visual perspective, that would put it in context while keeping in mind the reality the number represents. The redesign below emphasizes how small the 12% is, since it stands against the 88% of women.

The same situation occurs further down in the article, where a similar line chart reports the percentage of male students in teaching programs. Ironically, the line appears higher in the chart because the scale is smaller, whereas the message is on the contrary that this proportion is even lower than that of working teachers. At least this time, the stability is mentioned in the article as important information.

Switching to a stacked column chart again solves the problem, highlighting how small the proportion of male students is, while using the same scale as the other chart (0-100%), thereby showing that this proportion is even smaller. Note also that by keeping the column width roughly similar, we automatically signal to the reader that this chart contains more data than the previous one, which was not visible in the original.

The two original Radio-Canada charts were not wrong. But they treated the data as mere numbers, obscuring their message and the reality they represent.

Eyeo 2018: Conferencing in the age of the Internet

Holding my plate, I spotted a participant eating alone. “Thanks for saving me a seat”, I said as I sat with him. As we engaged in a lively conversation about online groceries, what’s recyclable and, of course, our jobs and the conference, four people sat next to us looking either at their phone or their computer. 

All the talks from Eyeo Festival 2018 will be available online. Why come here and not engage?

This is a tech conference unlike many others as questions of ethics, bias, inclusion and impact are brought up in a large proportion of sessions (one exception that stood out for me was that of David Ha on neural networks).

Setting the tone, Manoush Zomorodi’s keynote shed light on the ways technology can worsen or enable our worst human traits. Our mobile phones can turn into a time suck and, not surprisingly, her projects found a segment of the population with an appetite for relief. One could sense agreement in the crowd, but maybe it was guilt.

Eyeo is a special moment in time. It’s one of the most hyped conferences in several fields and it sells out quickly. Yet, so many of us stand in the middle of this rare mix of people, looking at our phones. Having conversations with people at home. Keeping strangers at a safe distance on Twitter. Watching or interacting mildly with acquaintances on Facebook. Plunging back into work on Slack, as if our brains didn’t remain there afterwards.

I asked one participant what she thought of the previous talk that we had both attended: “I zoned out. I was thinking about work…” she confessed.

As with the joke about the aliens, confused about who’s the master between the dog and the human picking up the poop, an observer could wonder who’s in control: technology or the user.

The irony of course is that power was the underlying question of so many talks: Who holds the power in tech? How is it wielded? To the benefit of whom?

Jane Friedhoff turned the power fantasies on their head in her games, defining the audience not as a benevolent majority gracefully willing to empathize, but as the members of the oppressed group that needs catharsis. To redistribute power, Matt Mitchell teaches cryptography to African American populations accustomed to being monitored. Meredith Whittaker conveyed a sense of urgency about the biases in the data we feed to our new overlords of artificial intelligence.

In the description of their talk, Dynamicland laid it out plainly: “Increasingly, working on a computer isolates us more than it connects us”. But it doesn’t have to be like this, so they take the technology out of the computers and into the real world. No more screens: let the humans share the space with technology. And it works.

The Eyeo organizers have gone to great lengths to make the experience of attending more real and less virtual. The workshops on Monday were very interactive, forcing the participants to get to know one another from the beginning. The delightful personalized buttons designed by Giorgia Lupi based on our answers to a survey were playful conversation starters. The program of the conference printed on the back of our name tags gave us one less excuse to pull out our phones to check the schedule and then slip into email or social media.

The Eyeo app created a private space for the participants to connect, away from the chaos of Twitter. There was no live-tweeting function, thank goodness, so the speakers were given the full 45 minutes to communicate one on one with each member of their audience before we had our opinions distracted and shaped by the perceptions of others. Both sides deserved this time.

The app was also the place to organize meetups of people with shared interests, to find the show and tell sessions of the participants, or to organize spontaneous dinner plans. All ways to ricochet off the virtual and back into the real world.

Efforts to reach out to strangers were richly rewarded. I was energized by the enthusiasm of Hannah for data visualization, seemingly unaware of her talent as we discussed her recent piece. How can I not be short on time when discussing with Kim, an architect who now teaches board game design? It was exciting to see the possibilities in Jamie’s project of moving to New York City. Did I even scratch the surface with Anni, a coding artist from Mexico who briefly lived in South Korea? How much more could I have learned from Brian’s experience of leading a dataviz team? I could go on.

I walked up to Amanda Cox, Casey Reas, Giorgia Lupi and Stefanie Posavec because they give me inspiration and energy and I wanted to gather some of it live but also to give back just a little by thanking them in person. I’m taking the memory of our interactions home with me.

So many speakers have challenged us to think about how the tech we create distributes power amongst owners and users. We also need to think as users about how much power we yield to tech. As I looked around at people standing side by side, staring at their phones, I asked myself: What will it take?

The upward climb of Max Verstappen in Monaco

It doesn't matter if you kick a ball, ride a horse, dance with skates or race down a hill: all sports tell the same stories. The underdog. The comeback. The dynasty. The near-miss. The feuds. The rivalry. To get interested in any sport, get to know its stories.

One such story is the young prodigy. A new competitor, often young, who comes with tremendous talent. In Formula 1 racing, it is the recent story of Max Verstappen. The son of a former Formula 1 driver (a whole other story), he was the youngest F1 driver in history, at seventeen and a half years old. I pass the mic to Wikipedia:

He is also the youngest driver to lead a lap during a Formula One Grand Prix, youngest driver to set the fastest lap during a Formula One Grand Prix, youngest driver to score points, youngest driver to secure a podium and youngest Formula One Grand Prix winner in history.
— Wikipedia

He's generating a lot of excitement, needless to say. His aggressive style of driving is responsible both for his successes and failures and it will be interesting to see if he manages to adjust.

At the most prestigious grand prix of the season, in Monaco, he made one such mistake and crashed his car during qualifying, meaning that he had to start from the back of the grid. This is very bad news in Monaco, where passing is notoriously difficult because the race takes place in the middle of a city.

This is what makes the performance of Max Verstappen so interesting. He started in 20th position and ended in 9th, meaning that he scored points towards the championship, which is very important for a team and a career.

The official graph of the race prepared by the Fédération internationale de l'automobile, which oversees F1, does not really show this, even if it shows all the data (as with many bad graphs). Verstappen is the electric blue line, but his progression gets lost in the traffic.

Lap Chart by FIA

It seemed like removing some information irrelevant to the story at hand would help to make it more visible. For instance, we are here interested in a single driver. Also, we are only comparing the start and finish positions. It seemed like a slopegraph would do the job just fine, as it conveys the idea of ranking clearly, on a vertical axis. I used the colours of Verstappen's racing team, Red Bull, because they are bright and attractive and because this graph is about him.

Picture1.png

I worked quite a bit on the title because it was going to give the spin to the graph, to reveal what's interesting about it. The result is a title that both conveys the scale of the challenge and the boldness of Verstappen. Then my subtitle is more descriptive, getting at the point.

I did not write the positions of any other driver. The line for Verstappen is very contrasting, in dynamic red. It passes over several lines and its angle is steeper than any other line. It conveys a very simple message: he started in the back and made it unusually high in the ranks. Also, the horizontal lines of the top 6 racing drivers suggest that it's a race where the starting order tends to determine the finishing order.

It turned out that Cole Nussbaumer chose slopegraphs for her #swdchallenge a few days later, so that's my submission.