What happened when I used an inclusion rider at a data conference

This is the blog post that I wish I had found when I was invited to speak at a data conference and decided to send the organizers an inclusion rider for women and minorities. I hope this is how you found it.

(Also, obviously, I’m virtue signalling and if that bothers you, you can stop reading here.)

I was quite happy to be invited to speak about data visualization in front of a large crowd (it turned out to be about 150 people). It had been my goal recently to speak beyond my training classes, to be recognized not just for my expertise, but for my capacity to inspire others.

What I’m trying to say is that this was valuable to me. I was a tad hesitant to risk it with an inclusion rider — I’m no celebrity for whom a conference would bend over backwards. But I knew I couldn’t live with myself if I didn’t even try.

Writing the inclusion rider

I had been inspired by the discussions in the film industry, but also by those in my field, notably by Jon Schwabish (here’s an insightful blog post about his attempt to foster diversity and inclusion at a conference). So on the same day that I accepted the invitation, I started looking around for resources with a call for examples on Twitter.

I did not find a ready-made letter, so I wrote my own. I was inspired by that of Kyle McDonald and by the Annenberg Inclusion Initiative of the University of Southern California's School for Communication and Journalism, which provides a template (PDF) for the film industry.

My main challenge was to ask clearly enough without coming across as imposing on them. Perhaps I had every reason to be forceful because it's the right thing to do, but I could only go as far as saying that this is important to me, as I hoped it was to them, and that I would like to know what they were doing about it. I drew my line at no more than 75% white men in the speaker lineup, an easy threshold to clear given that they already had 26% women or minorities the previous year.

I shared my idea and draft with people close to me and all were supportive, except for one person who felt that taking identity into account is detrimental to open debate. We had a productive discussion in which we came to better understand each other's perspective, and I'm happy to report that I did not lose this friend. Others were new to the idea and thought it was interesting and a step in the right direction. One person asked if I had enough star power to make such demands, and another wanted to know if I should be clearer in my letter about the consequences if the conference was not diverse enough for my taste.

You can read my inclusion rider here, in English and in the original French. It took me a couple of hours to put it together.

Discussions with the organizers

I sent the letter by email the day after I accepted the invitation. A week later, I had a call with an organizer to discuss the content of my talk. I brought up the inclusion rider at the end and he explained that they shared my concern and that a woman on the organizing committee was forceful about it. I thanked him and said that I would be keeping up with their efforts.

About a week later, I received an email response from another organizer. She thanked me for bringing up the issue and said that they were inspired to see speakers take concrete steps. She went on a tangent about other events organized by the same people that were promoting women in IT. About the conference at hand, she explained that they already had 25% women speakers and that 50% of their pending invitations had gone to women. She also mentioned that they bring up the Women in Tech Manifesto with their sponsors (IT companies), encouraging them to send as many women representatives as possible.

I was pleased that they had taken the time to think about it and reply at length. I was a tad disappointed that they were still only at 25% women and that they did not bring up the question of minorities. I wanted to tell them so, but after hesitating over the right approach, it slipped my mind.

tarte.png

Since my talk was about data visualization, I decided to use the data about the final gender distribution of the speakers for one of my slides. It turned out that women made up almost 32% of the speakers, which is an improvement over the previous year. That figure does not count the representation of minorities, which is harder for me to measure as an outsider.

While at the conference, I discovered that my letter had started a conversation among the organizing committee. They had questioned how much they do to include women and it led to the creation of a panel on women in IT. The panel consisted of four women, lasted 35 minutes and was held in the largest conference room of the event, under the theme of “inspiring conferences”. Unfortunately, I was not able to attend. One organizer also said that he, also a white male, would consider having an inclusion rider the next time he's invited to speak.

An assessment of the experience

I’m a person with very little power, not a celebrity and not an A-lister anywhere but in my family’s heart. Yet, I’m a white man. By bringing this up, I showed that this is an issue that bothers more than just the people who are excluded.

I’m satisfied with the outcome, especially that it started a conversation among the committee members leading to tangible action. I want to praise the Datavore team for their openness and how they reacted in general. I did not take it for granted when I sent my letter.

I could have done better. I did not go very far in my inclusion rider. I simply asked them to match their performance of the previous year and I did not follow up on the question of the minorities. Another issue of course is that this is my story, that of a white man again, and that the focus should be on the missing women and minorities, not me.

Still, I hope that this post can inspire and inform others who may have heard of inclusion riders and are wondering whether it is something they can do to help improve diversity and inclusion in their field. My advice is this: just do it.

A golden ratio for line charts with truncated y-axis

While we hold that bar charts should start at zero, there is no such expectation for line charts. But you’ll always find someone on social media to denounce a “dishonest” line chart that enhances the variations by truncating the y-axis. There is usually ample pushback, but it often leads to the question: Where should it start then?

The simple answer should be: low enough. As I wrote in this tweetstorm (I should have blogged), the scale is a matter of context and it’s hard to make a rule without knowing the data and the reality. It should be low enough so that the perception of the graph is close to that of the reality it seeks to represent.

But that sounds too vague and people love rules, so I’m turning it into one:

The bottom third of the plot area should remain empty for a line chart that doesn't start at zero.

Now let’s see why.

Here is a data set with high numbers and relatively low variability plotted on a line chart with a scale starting at zero.

2 Zero baseline.png

Seems like not much happened in the last 20 years because the full scale hides relatively small variations (which may be the point, but that’s for another time). It is nearly useless as a visualization and a table would tell us more.

Now let’s go to the other extreme and set the minimum value of the scale at the minimum value of the data.

1 Down to zero.png

At first glance, and outside of a post about scales, you would think that the widget sales crashed to zero before going back up. That is, until you read the scale and realize that it starts at 55,000. This misperception is a kind of Stroop effect. Based on your experience of reality, the baseline of a chart appears to be ground zero when it is visually used like this.

We need a middle point. A range that is small enough to show variations, but with a starting point that is low enough to avoid the misperception that the data reached zero. That’s where I suggest that we design the range and the starting point so that no data appear in the bottom third of the plotting area. It looks like this.

3 Voila rule.png

Why one third? Perhaps because it echoes the golden ratio: the empty space at the bottom is one half of the graphing space at the top, a 1:2 split that is close to, though not exactly, the golden ratio's roughly 38:62. It also corresponds to the rule of thirds in visual arts. By following this rule, two thirds of the plotting area are dedicated to the data and one third to the secondary fact that the line is floating in space far above the ground. I believe that data visualization has a lot to learn from older arts, languages and other practices and this just might be one example.

And finally, according to a few quick tests, it seems to be the default in MS Excel, so maybe they took it from somewhere (although it sometimes truncates the y-axis for bar charts too).

Compare this with an empty space at 20% of the plotting area (which was my initial suggestion before I started testing and thinking it through).

5 20 percent.png

It is a matter of judgement, but it does appear that the line veers very close to the bottom, like a plane that’s flying too low. The eye is drawn to the baseline and the perception of ground zero is triggered again. One third seems to preserve enough space so that our eyes don’t go to the baseline and connect it to the data line.

Now, I would need 37 random students from the University of Austin to test this and truly turn it into an unimpeachable truth, but in the meantime we can say that we apply the golden ratio.


Now with annotations and calculations.

Here is the formula to calculate the value of the scale.

Formula text.png

Expressed in algebra:

 
Formula.png
 

And here is the minimum value of the scale (b) isolated:

 
formula simplified.png
 

In my example, it works like this:

 
formula simplified applied.png
 


So my scale will end at 57,000 and start at 54,000.
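
As a rough sketch of that calculation in code: if the lowest data value must sit a given fraction of the way up the plot area, then (data_min - b) / (axis_max - b) equals that fraction, and b can be isolated. The function name, the parameter names, and the assumption that the top of the scale is chosen first are illustrative choices, not taken from the formula images above.

```python
def truncated_axis_min(data_min, axis_max, empty_fraction=1 / 3):
    """Return the axis minimum b that leaves `empty_fraction` of the plot
    area empty below the lowest data point.

    Solves (data_min - b) / (axis_max - b) = empty_fraction for b.
    Names and signature are illustrative assumptions.
    """
    if not 0 <= empty_fraction < 1:
        raise ValueError("empty_fraction must be in [0, 1)")
    if axis_max <= data_min:
        raise ValueError("axis_max must be above data_min")
    return (data_min - empty_fraction * axis_max) / (1 - empty_fraction)


# Values from the example: lowest data point 55,000, top of scale 57,000.
b = truncated_axis_min(55_000, 57_000)
print(round(b))  # 54000
```

Passing empty_fraction=0.2 gives the 20% variant discussed earlier, which makes it easy to try both on the same data.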

Thanks to my friend Robert Couillard, Eng., for isolating b more elegantly than the calculator did.


Thanks to Lisa Charlotte Rost for prompting me to think about this a little further and write about it when I tweeted something in jest in response to a Datawrapper update.


This blog post was quoted by The Economist in their review of their data visualization in March 2019.

Redesigning an important graph about a carbon tax

Data visualization is a language for humans. Computers and machines are fine with raw data. On the other hand, humans are often interested in data simply because it is visualized. Many will understand the underlying phenomenon (a trend, an outlier, a missed target) when they see it in a graph.

Humans speak data visualization from birth and through learning. There are right and wrong ways of doing it. When trying to communicate, it’s important to know what works and what doesn’t, just as with any other language.

Which brings me to this chart, included in a report from Resources for the Future, an independent, nonprofit research institution in Washington, DC.

Page 53


I became aware of it when this tweet appeared in my timeline. It’s not from the report author, but from another climate specialist who marvels at the finding he sees in the graph.

One look at the graph and I knew it would be hard to decipher. It also looks like it was done in a hurry: it’s just an Excel graph with all the default features. This upset me because it seemed to be an important piece of information and this issue is very close to my heart.

Here is what’s wrong with this graph, in no particular order.

Appearance

The look of the graph is the first thing that made me realize something was not quite right. It uses the Excel defaults, with some drop shadows. It tells me that little care has been put into preparing this graph for publication. When reaching out to an external audience with the goal of attracting and retaining their attention, and of being taken seriously, aesthetics is not a luxury. It’s a step you can’t miss.

Labelling

It took me a while to realize that Q1-Q2-Q3-Q4-Q5 do not represent quarters. Our natural instinct is often to assume that the horizontal axis represents time, until proven otherwise. Here, Q1 means Quintile 1, but does Q1 mean the top quintile or the bottom quintile? For people who do not study economics, it may not be obvious. This could be made clear by labeling Q1 “Poorest” and Q5 “Richest”.

Q1 Q2.png

Title

Noah Kaufman had to translate the graph for his Twitter audience in part because the graph did not make clear what it was showing. The title is purely descriptive, something newspapers would never do since it’s their job to communicate clearly, even in their reporting sections.

Title.png

Here it seems that the title should be something more analytical like “The poorest will benefit from any of the measures”, to follow Noah Kaufman’s analysis.

Annotations

What annotations? There are no annotations to help us understand what’s going on, no pointers at the relevant parts of the graph for instance. Annotations are especially important in graphs that do not make it clear where we should be looking for what.

Vertical text.png

Text orientation

Research suggests that it takes twice as long to read vertical text as horizontal text. But the real issue is that we simply don’t read vertical text. Our eyes glaze over these meaningless shapes in a way that they can’t when text is horizontal. The vertical axis has a vertical label, and it took me several attempts, and the resolve to understand this graph, before I truly read and understood it.

Colors

Colors are not neutral. Green is associated with positive meaning, like money, growth, and ecology. Red’s connotations in Western culture are negative: blood, violence, losses (and love, yes). It should not be used randomly, but in the present case it simply identifies one of the approaches (“Payroll”) that is no better or worse than the other three. It creates a Stroop effect where we initially assume something before having to discard it.

Legend colors.png

Horizontal axis

One of the main features of this graph is to differentiate between positive and negative impacts. The baseline should be a strong visual feature so that the eye immediately sees where it’s crossed. In reality, it’s barely differentiated from the grid. The shadows obscure it further.

Horizontal axis.png
Vertical axis.png

Vertical axis

The scale is inconsistent: it starts with a 0.25% interval and then moves to a 0.5% interval. It’s best to use consistent and intuitive increments. Another interpretation is that the scale has increments of 0.5% (-0.25% to 0.25% to 0.75%, etc.) but that the zero baseline is completely ignored, which is not much better.


Redesign

Here is my suggested redesign, with the caveat that I did not speak with the authors, nor have I read the entire report in detail. Also, I couldn’t find the data so I eyeballed it from the original graph.

A few observations:

  • Font: Dax

  • Colors: Wes Anderson palette from a scene in Hotel Chevalier.

  • Green for positive values and yellow for negative ones help the eye immediately identify this crucial distinction.

  • With a panel chart, I like to use a faint background for the graph areas to help the eye differentiate them.

  • There are no annotations because they don’t seem necessary now that the chart is clearer.

  • The horizontal axis does not carry the label because it is mentioned in the graph’s subtitle.

  • Using horizontal bar charts gives space for the labels, without having to repeat them, nor use different colors for the different measures.

  • I would have preferred to use a vertical layout, with the graph for the poorest quintile at the bottom, because that’s how we think of income groups (top 1%, bottom 50%, etc.), but it would have put the most important graph at the bottom, so it was not worth it.

  • It was all created in Excel; no need for fancy tools.

Many countries are considering a carbon tax to address climate change. In fact, Canada is embarking on a major political debate about a carbon tax and its distributional impact. One year ahead of the election, the party in power appears willing to bet the house on a measure that might be good for the environment but is politically controversial. This type of communication can make a difference in how the public understands what they are voting on.

The latest report from the IPCC makes it clear that the future and safety of millions are at risk because humanity is failing to take decisive action to address climate change. The work that RFF and the like do is tremendously important and helpful. This issue deserves to be explained and promoted with the same care that we, silly humans, put into advertising cars.

Graphic redesign: From numbers to reality

Data are not just numbers.

While a number is only an abstraction, a data point is the quantification of a reality or an idea. Think of twenty-five percent women, one million dollars, 130 km/h.

Data visualization therefore depends on the subject that the data represent. It must give shape to that reality and respect the subject. The information designer must therefore take the nature of the data into account when making design choices.

Take, for example, the graph below from a Radio-Canada article on the proportion of men among elementary school teachers in Quebec. It is a correct representation of the numbers behind the graph. Yet something is missing, because the visual does not reinforce the message, namely that very few men teach in elementary school. It says only one thing, that the trend is fairly stable, which the article treats as secondary information.

 

What is missing is a scale that would put this proportion of men in visual perspective, that would put it in context while keeping in mind the reality that the number represents. The redesign below emphasizes how small the 12% is, since it stands against the 88% of women.

The same situation occurs further down in the article, where a similar line chart reports the percentage of male students in teaching programs. Ironically, the line appears higher in the graph because the scale is smaller, whereas the message is, on the contrary, that this proportion is even lower than that of working teachers. At least this time, the stability is mentioned in the article as an important piece of information.

Adopting a stacked column chart again solves the problem, highlighting how small the proportion of male students is while using the same scale as the other graph (0-100%), thereby showing that this proportion is even lower. Note also that by keeping the column widths roughly similar, we automatically signal to the reader that this graph contains more data than the previous one, which was not visible in the original.

The two original Radio-Canada graphs were not wrong. But they treated the data as mere numbers, obscuring their message and the reality they represent.

Eyeo 2018: Conferencing in the age of the Internet

Holding my plate, I spotted a participant eating alone. “Thanks for saving me a seat”, I said as I sat with him. As we engaged in a lively conversation about online groceries, what’s recyclable and, of course, our jobs and the conference, four people sat next to us looking either at their phone or their computer. 

All the talks from Eyeo Festival 2018 will be available online. Why come here and not engage?

This is a tech conference unlike many others as questions of ethics, bias, inclusion and impact are brought up in a large proportion of sessions (one exception that stood out for me was that of David Ha on neural networks).

Setting the tone, Manoush Zomorodi’s keynote shed light on the ways technology can worsen or enable our worst human traits. Our mobile phones can turn into a time suck and, not surprisingly, her projects found a segment of the population with an appetite for relief. One could sense agreement in the crowd, but maybe it was guilt.

Eyeo is a special moment in time. It’s one of the most hyped conferences in several fields and it sells out quickly. Yet, so many of us stand in the middle of this rare mix of people, looking at our phones. Having conversations with people at home. Keeping strangers at a safe distance on Twitter. Watching or interacting mildly with acquaintances on Facebook. Plunging back into work on Slack, as if our brains didn’t remain there afterwards.

I asked one participant what she thought of the previous talk that we had both attended: “I zoned out. I was thinking about work…” she confessed.

As with the joke about the aliens, confused about who’s the master between the dog and the human picking up the poop, an observer could wonder who’s in control: technology or the user.

The irony of course is that power was the underlying question of so many talks: Who holds the power in tech? How is it wielded? To whose benefit?

Jane Friedhoff turned power fantasies on their head in her games, defining the audience not as a benevolent majority gracefully willing to empathize, but as the members of the oppressed group that needs catharsis. To redistribute power, Matt Mitchell teaches cryptography to African American populations accustomed to being monitored. Meredith Whittaker conveyed a sense of urgency about the biases in the data we feed to our new overlords of artificial intelligence.

In the description of their talk, Dynamicland put it plainly: “Increasingly, working on a computer isolates us more than it connects us”. But it doesn’t have to be like this, so they take the technology out of the computers and into the real world. No more screens: let the humans share the space with technology. And it works.

The Eyeo organizers have gone to great lengths to make the experience of attending more real and less virtual. The workshops on Monday were very interactive, forcing the participants to get to know one another from the beginning. The delightful personalized buttons designed by Giorgia Lupi based on our answers to a survey were playful conversation starters. The program of the conference printed on the back of our name tags gave us one less excuse to pull out our phones to check the schedule and then slip into email or social media.

The Eyeo app created a private space for the participants to connect, away from the chaos of Twitter. There was no live-tweeting function, thank goodness, so the speakers were given the full 45 minutes to communicate one on one with each member of their audience before we had our opinions distracted and shaped by the perceptions of others. Both sides deserved this time.

The app was also the place to organize meetups of people with shared interests, to find the show and tell sessions of the participants, or to organize spontaneous dinner plans. All ways to ricochet off the virtual and back into the real world.

Efforts to reach out to strangers were richly rewarded. I was energized by the enthusiasm of Hannah for data visualization, seemingly unaware of her talent as we discussed her recent piece. How can I not be short on time when discussing with Kim, an architect who now teaches board game design? It was exciting to see the possibilities in Jamie’s project of moving to New York City. Did I even scratch the surface with Anni, a coding artist from Mexico who briefly lived in South Korea? How much more could I have learned from Brian’s experience of leading a dataviz team? I could go on.

I walked up to Amanda Cox, Casey Reas, Giorgia Lupi and Stefanie Posavec because they give me inspiration and energy, and I wanted to gather some of it live but also to give back just a little by thanking them in person. I’m taking the memory of our interactions home with me.

So many speakers have challenged us to think about how the tech we create distributes power amongst owners and users. We also need to think as users about how much power we yield to tech. As I looked around at people standing side by side, staring at their phones, I asked myself: What will it take?