How we visualized world inequality

In a way, this blog post started over 15 years ago when I studied inequality and the United Nations’ Human Development Index. So, naturally, the data visualization challenge about inequality announced by the UN caught my eye. It seemed like the perfect opportunity to collaborate with Erica, a recent graduate of the computer engineering program at Polytechnique Montreal, who came to work at Voilà for a few weeks. We could discuss the topic, iterate on the concept, and Erica could practice her programming skills.

The final project is available here and below is a presentation of our process. The code is available on GitHub.

- Francis


final-vis.PNG

by Erica Bugden and Francis Gagnon

Inequality is both numbers and reality: abstract enough to be invisible, but real enough to be experienced. A perfect topic for a visualization.

The UN wanted to make these numbers more real for a global audience using visualizations with some interactivity so that viewers would engage with the data themselves.

So here is our process and how we went from one concept to another, changed our minds, went back, took too long to find obvious solutions and ended up with a final product that could never really be final.

Getting started: Brainstorming and sketches

From the start, we knew that we would focus on economic inequalities. We had to keep it simple because of our limited means (two people with other (paid) things to do) and time (two weeks). Here are a few ideas that we sketched as we were thinking out loud.

small-multiples.PNG

Panel charts allow for a high density of visualized data without the presentation becoming too noisy. They also allow for reasonably good high-level comparisons between different countries. With so many countries to compare, it could be a valuable technique.

different-plots.PNG

It can also be interesting to combine several types of plots or measurements to get a more complete picture of each country.

We explored a couple of different chart designs for showing the evolution of inequality, including a connected scatter plot showing the relationship between inequality and growth over time, inspired by the work of Hannah Fairfield and Alberto Cairo, but we eventually gravitated towards a series of bars.

connected-scatter-plot.PNG

Sequential bars could show the evolution of disparities while keeping time as the horizontal axis. This could make the visualization more intuitive because, when showing an evolution, time is often assumed to be the horizontal axis.

sequential-bars.PNG

Quintiles are the five equal groups into which a population can be divided based on a particular variable. They are often used to compare richer and poorer groups when discussing economic inequalities.
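To make the definition concrete, here is a small JavaScript sketch. It sorts a hypothetical list of individual incomes, splits it into five equal groups and returns each group's share of total income. The project itself worked from shares already published by the World Bank, so `quintileShares` only illustrates the concept and is not part of the project's pipeline.

```javascript
// Illustration only: split individual incomes into five equal-sized groups
// (quintiles) and return each group's share of total income, in percent.
// For simplicity, the list length must be a multiple of 5.
function quintileShares(incomes) {
  if (incomes.length === 0 || incomes.length % 5 !== 0) {
    throw new Error("Expected a non-empty list whose length is a multiple of 5");
  }
  const sorted = [...incomes].sort((a, b) => a - b); // poorest first
  const groupSize = sorted.length / 5;
  const sum = (values) => values.reduce((acc, x) => acc + x, 0);
  const total = sum(sorted);
  const shares = [];
  for (let q = 0; q < 5; q++) {
    const group = sorted.slice(q * groupSize, (q + 1) * groupSize);
    shares.push((100 * sum(group)) / total);
  }
  return shares; // [bottom 20%, second, middle, fourth, top 20%]
}

// Ten hypothetical incomes, so each quintile holds two people.
const exampleShares = quintileShares([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]);
console.log(exampleShares.map((s) => s.toFixed(1)).join(", "));
```

The shares always add up to 100%. That is why a stacked bar of quintile shares has a meaningful total height.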

middle60-1.PNG

We thought it would be interesting to show the evolution of the income share of each quintile by stacking the percentages owned by each group. This sketch shows two years of data for a given country. The scribbled values are just us trying to make sense of the numbers.

We initially leaned towards showing the evolution of the proportion owned by the three middle quintiles and having the top and bottom quintiles invisible. It would create floating blocks that could be visually interesting. The smaller the block, the higher the inequality.

middle60-2.PNG
middle60-3.PNG
top-bottom-20.PNG

However, doing the opposite, that is, showing the top and bottom quintiles, seemed more relevant for focusing on the inequality between the richest and poorest. The eye could compare the height of the top columns to that of the bottom columns, and their respective evolutions of course. We were getting somewhere.

We had a concept for visualizing shares, but it didn’t show how relatively rich a country is, limiting the usefulness of international comparisons in our panel charts. So we thought we could add annual income to our graphs, adjusted for purchasing power parity.

These bars can be stacked in the same way as the income percentages, except that, in contrast with the percentages of income, which always add up to 100%, the total height of the bar has no meaning (it’s only the sum of the five quintile averages). If a comparable currency is used to calculate average incomes, then incomes in a poorer country will be visibly much lower than those in a richer country.

avg-income-1.PNG
avg-income-2.PNG

At this point, we felt that we had our concept down and we were ready to start playing with the data.

Data processing

All the data processing in this project was done using Python scripts. The data was sourced from the World Bank Open Data database, to which the UN database was connected. Describing the income distribution meant we needed to know the share of income held by each quintile.

To be able to calculate the annual income of each quintile, we also needed to know how many people were living in the country and how much money the country was making.
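With the quintile shares, the population and the country's total income in hand, the average annual income of a quintile is its share of total income divided by its share of the population (always 20%), multiplied by income per capita. Below is a minimal sketch. It assumes the country's total income is available as a single PPP-adjusted figure, and the function name and example country are hypothetical.

```javascript
// Minimal sketch: average annual income of each quintile.
// Each quintile holds 20% of the population, so its average income is
// (share of income / 20%) * income per capita, i.e. share * 5 * perCapita / 100.
// `totalIncome` is assumed to be a PPP-adjusted aggregate for the country.
function quintileAverageIncomes(sharesPercent, totalIncome, population) {
  const perCapita = totalIncome / population;
  return sharesPercent.map((share) => (share * 5 * perCapita) / 100);
}

// Hypothetical country: 10 million people, 200 billion in total income,
// quintile shares listed from poorest to richest.
const averages = quintileAverageIncomes([6, 10, 15, 22, 47], 200e9, 10e6);
console.log(averages); // poorest quintile first
```

The five averages themselves average back to income per capita. Their sum is five times income per capita, which is why the total height of a stack of averages carries no particular meaning.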

As is often the case when playing with data, we quickly saw the datasets’ limitations, especially in terms of completeness (thankfully, World Bank data is as good as it gets in the real world), so we had to make choices.

Given that inequality data became more available over time, we had to choose a cutoff date at which to start our visualization. So, we created a histogram in Excel showing the number of countries included in the database per year. We chose 1992 as the start date because of the drop in numbers before that year, and because it would give us a nice round 25 years of data.
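The project did this count in Excel. Here is an equivalent sketch in JavaScript. It assumes the data has been flattened into `{country, year}` records, one per available observation; that record shape is hypothetical.

```javascript
// Sketch: number of distinct countries with data in each year, i.e. the
// values behind a "countries per year" histogram.
// Records are assumed to look like { country: "Country A", year: 1994 }.
function countCountriesPerYear(records) {
  const countriesByYear = new Map();
  for (const { country, year } of records) {
    if (!countriesByYear.has(year)) countriesByYear.set(year, new Set());
    countriesByYear.get(year).add(country); // a Set ignores duplicate rows
  }
  const years = [...countriesByYear.keys()].sort((a, b) => a - b);
  return years.map((year) => ({ year, countries: countriesByYear.get(year).size }));
}

// Hypothetical records: one country in 1994, two in 1995.
const counts = countCountriesPerYear([
  { country: "Country A", year: 1994 },
  { country: "Country A", year: 1995 },
  { country: "Country B", year: 1995 },
]);
console.log(counts);
```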

The second issue was that some countries simply didn’t have a lot of data available during that period. We wanted to have enough data in each country to see representative trends so we decided that 10 years of data was a strict minimum. This led us to exclude some (normally) data-rich countries like Canada and Israel as well as large ones like India.
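The two inclusion rules, the 1992 cutoff and the ten-year minimum, amount to a simple filter. Here is a minimal sketch, again assuming hypothetical `{country, year}` records:

```javascript
// Sketch: keep observations from the cutoff year onward, then keep only the
// countries that still have at least `minYears` distinct years of data.
function selectCountries(records, cutoffYear = 1992, minYears = 10) {
  const yearsByCountry = new Map();
  for (const { country, year } of records) {
    if (year < cutoffYear) continue; // drop data before the cutoff
    if (!yearsByCountry.has(country)) yearsByCountry.set(country, new Set());
    yearsByCountry.get(country).add(year);
  }
  return [...yearsByCountry]
    .filter(([, years]) => years.size >= minYears)
    .map(([country]) => country)
    .sort();
}

// Hypothetical data: "A" has ten years since 1992, "B" has only four.
const records = [];
for (let year = 1992; year <= 2001; year++) records.push({ country: "A", year });
for (let year = 1985; year <= 1995; year++) records.push({ country: "B", year });
console.log(selectCountries(records)); // countries that pass both rules
```

Counting distinct years rather than rows keeps duplicated observations from pushing a country over the threshold.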

With this, we had our concept and our data. We were ready to start visualizing.

Prototyping

Having already developed a concept on paper, we made some quick visualization prototypes using the real data and Plotly’s Python graphing library in order to get a feel for the effectiveness of the different options.

Income share, version 1 (bad)

We started with some ideas that turned out to make little sense once we saw them. One of our first shots in the dark shows the top quintile’s income share with an invisible cutout at the bottom corresponding to the bottom quintile’s share. In practice, the length of the blue bars shows the difference between the shares owned by the top and the bottom 20%. But the top of the bar really only shows the top 20% share.

income-share-bad1.png
income-share-bad2.png

The result is a bar that floats, but the floating is barely visible in countries where the poorest have a very small share.

Income share, version 2 (good)

Here we show the distribution of income (%) between the top 20% (orange), the bottom 20% (blue) and the middle 60% (the space between the orange and blue). This would be our basic design going forward.

income-share-good-1.png
income-share-good2.png

Average incomes

Here, we show the average income ($) for that year of the bottom 20% (blue), the middle 60% (invisible) and the top 20% (orange).

avg-income1.png

In this second design, we have an idea of how rich the country is in comparison to other countries, but it is difficult to understand the distribution of income within the country. Also, the total length of each stack has no particular meaning apart from giving an idea of how rich the country is.

Both seemed interesting and the challenge called for interactivity, so we eventually decided to show both thanks to a toggle button.

Developing the design

Some readers may already have recognized our diverging color palette from ColorBrewer. At this stage, we wanted something that would just work, knowing that we would later adjust it for aesthetics. But first, we wanted to experiment and see if we could effectively encode both share and levels of wealth in the same graph.

Lightness for income levels

First we tried varying the lightness of the bars based on a single scale that varied linearly from the minimum measured income to the maximum measured income out of all the different countries. Lighter bars meant that a country was poorer while darker bars meant that a country was richer.

lightness.PNG

We found that because the differences between the incomes of the richest and the poorest countries are so big, it was too difficult to see the evolution of income in individual countries on a linear scale: the changes were too slight.
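A small numeric sketch shows why. The function below maps income linearly onto HSL lightness. The income range (500 to 100,000) and the lightness bounds are hypothetical values chosen only to illustrate the effect; they are not the project's figures.

```javascript
// Sketch of a linear lightness scale: the poorest income maps to the lightest
// shade and the richest to the darkest. Bounds are illustrative assumptions.
function linearLightness(income, minIncome, maxIncome, lightest = 90, darkest = 30) {
  const t = (income - minIncome) / (maxIncome - minIncome); // 0 to 1
  return lightest + t * (darkest - lightest); // HSL lightness, in percent
}

// A poor country doubling its income from 1,000 to 2,000 moves by well
// under one point of lightness on a 500-to-100,000 scale.
const before = linearLightness(1000, 500, 100000);
const after = linearLightness(2000, 500, 100000);
console.log(`Lightness change: ${(before - after).toFixed(2)} points`);
```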

Lightness for income groups

Since a linear scale made the changes too subtle we tried using a discrete scale instead. More specifically, the bars were colored based on the country’s World Bank income group classification which splits countries into four income groups: low, lower-middle, upper-middle, and high.
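With a discrete scale, each of the four groups simply gets its own fixed lightness. Here is a minimal sketch. The hue and lightness values are hypothetical placeholders, not the project's colours.

```javascript
// Sketch of a discrete lightness scale keyed on the four World Bank income
// groups. The lightness values and default hue are illustrative assumptions.
const LIGHTNESS_BY_GROUP = new Map([
  ["low", 85],
  ["lower-middle", 70],
  ["upper-middle", 55],
  ["high", 40],
]);

function incomeGroupColor(incomeGroup, hue = 30) {
  if (!LIGHTNESS_BY_GROUP.has(incomeGroup)) {
    throw new Error(`Unknown income group: ${incomeGroup}`);
  }
  // Darker means richer.
  return `hsl(${hue}, 80%, ${LIGHTNESS_BY_GROUP.get(incomeGroup)}%)`;
}

console.log(incomeGroupColor("lower-middle")); // hsl(30, 80%, 70%)
```

Because the steps are fixed and well separated, a country moving up a group produces a visible jump. Under the linear scale, the same change was lost.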

income-groups.PNG

This approach shows the changes in income group of a particular country and can be compared with other countries. One minor issue with this iteration was that the dark orange used for the rich countries looked red which had a negative feel even though a country having high income is not fundamentally negative.

Red line for a threshold

We also wanted to incorporate more visual references to give viewers a better idea of whether or not a country had a reasonable distribution of income.

First we tried the simple solution of placing a red line so that it is easy to see if the top 20% had more than 40% of the income.

red-line.PNG

However, this does not solve the problem of the rich countries looking red. Also, it draws a lot of attention to this made-up threshold.

Color for concentration of income

For this iteration, we added another nuance to the lightness of the income groups. We used colour to encode the concentration of income, duplicating the encoding in the bar length. Purple bars mean that the top 20% had less than 40% of the income. Lightness still refers to income groups.

purple-bars.PNG

This representation was too difficult to understand and to read. The colour lightness scale represents the income group of the country, but confusingly the hue change from orange to purple indicates having passed an income distribution threshold which has nothing to do with the country’s overall income. Additionally, the colour changes can look really drastic when a country briefly dips below the inequality threshold even if it is just a slight variation.

Finally, we also abandoned the idea of using bar colour to show inequality between countries because we couldn’t find a representation that felt sufficiently intuitive.

Some other fill options

Here are some other fill options that were considered for showing the severity of the inequality. All of them rely on some visual transition at 20%, 40% and 60% of income concentration. In the example on the right, the color changes at 40% because it means that the richest quintile has more than twice its equal share of income.

other-fills.PNG

In the end, it did not seem right to define a firm but arbitrary threshold for acceptable inequality. We wanted something that would be more neutral in terms of judgement, but that still helped people quickly and intuitively evaluate the severity of the inequality.

Colour as grid

Our solution was to simply add more reference lines. We used the colour of the bar itself to show the severity of the inequality. This approach quickly draws attention to where inequalities are more severe.

color-grid.PNG

In this first version, the colour of the top 20% section of the bar is blue from 0% to 20%, to mirror the blue used for the bottom 20%, orange from 20% to 40%, and red once the share owned by the top 20% surpasses 40% of the country’s overall income.

Lovecraftian programming

We have minimal web programming experience, and this will probably be apparent in this section as we attempt to describe a few of the technical details of how the visualizations are implemented. It’s important to note that during the development process we made frequent use of the following reference books:

orly-books.PNG
feature-complete.PNG

… usually in that order. All that to say that there are some unspeakable horrors in the code that would probably drive Lovecraft himself to madness.

The final individual charts are generated using the outdated third version of D3 (it’s at version five now), while the rest of the chart layout and the chart toggle are plain old HTML, CSS and JavaScript. We used this older version of D3 for the predictable reason that it was the version used in the example that inspired much of the code, and we didn’t get around to figuring out what changes were needed to port it to a more recent version of D3.

When the transition was made from prototyping with Plotly to using D3, the Python scripts used to process the data were adjusted to organize the data into JSON objects that could easily be plotted with D3. Not having to manipulate large amounts of data in the browser makes generating the visualizations faster.
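To illustrate why preprocessing helps, here is a hypothetical shape for one country's record and the small amount of work it leaves for the browser. The field names are invented for this sketch, not taken from the project's JSON. The sketch also assumes the top 20% bar hangs from the top edge of the chart and the bottom 20% bar rises from the bottom, with positions in percent of the chart height.

```javascript
// Hypothetical preprocessed record for one country (illustrative field names).
const record = {
  country: "Exampleland",
  years: [
    { year: 1992, bottom20: 6.1, top20: 45.3 },
    { year: 1993, bottom20: 6.4, top20: 44.8 },
  ],
};

// Turn one year into two bar segments, in percent measured from the top of the
// chart: the top 20% runs down from 0, the bottom 20% runs up from 100.
function shareSegments({ year, bottom20, top20 }) {
  return {
    year,
    top: { from: 0, to: top20 },
    bottom: { from: 100 - bottom20, to: 100 },
  };
}

const segments = record.years.map(shareSegments);
console.log(JSON.stringify(segments[0]));
```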

Arguably the most interesting “temporary workaround” in the final code is the one used to colour the top 20% bars. In brief, the colour of the bars is defined using a linearGradient element. By specifying that a transition starts and ends at the exact same position, you get an instantaneous colour switch, and each colour switch in the bars (including the “gridlines”) is a transition specified in this way. Here’s what defining the blue and orange bar sections looks like in the code:

	// Define bar colors in the most elegant way possible...
	// (Assumes `gradient` is a D3 selection of an SVG <linearGradient> element and
	// `blue`, `orange` and `gridlineColor` are colour strings defined elsewhere.)
	gradient
		.append('stop')
		.attr('stop-color', blue)
		.attr('offset', '19.5%');
	gradient
		.append('stop')
		.attr('stop-color', gridlineColor)
		.attr('offset', '19.5%');
	gradient
		.append('stop')
		.attr('stop-color', gridlineColor)
		.attr('offset', '20%');
	gradient
		.append('stop')
		.attr('stop-color', orange)
		.attr('offset', '20%');
	gradient
		.append('stop')
		.attr('stop-color', orange)
		.attr('offset', '39.5%');
	// ...the remaining stops follow the same pattern.
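The stops in the excerpt above follow a regular pattern: one colour per band, plus a thin gridline just before each boundary. A hypothetical refactoring could generate them from a list of bands. The 0.5-point gridline width mirrors the excerpt. The band list in the usage lines is an assumption based on the 20% and 40% transitions described earlier, and the colour names are placeholders.

```javascript
// Sketch: build the hard-stop list for a band-coloured linearGradient.
// Two stops at the same offset produce an instant colour switch.
function hardStops(bands, gridlineColor, gridlineWidth = 0.5) {
  const stops = [];
  let start = 0;
  bands.forEach(({ end, color }, i) => {
    const isLast = i === bands.length - 1;
    const colorEnd = isLast ? end : end - gridlineWidth;
    stops.push({ offset: start, color }, { offset: colorEnd, color });
    if (!isLast) {
      // Thin gridline between this band and the next.
      stops.push({ offset: colorEnd, color: gridlineColor });
      stops.push({ offset: end, color: gridlineColor });
    }
    start = end;
  });
  return stops.map(({ offset, color }) => ({ offset: `${offset}%`, color }));
}

const stops = hardStops(
  [
    { end: 20, color: "blue" },
    { end: 40, color: "orange" },
    { end: 100, color: "red" },
  ],
  "white"
);
console.log(stops.map((s) => `${s.color}@${s.offset}`).join(" "));

// With D3, each entry would become one <stop> element, for example:
// stops.forEach((s) => gradient.append('stop').attr('stop-color', s.color).attr('offset', s.offset));
```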

To make sure only the top 20% bars are coloured using this gradient, we use a set of rectangular clipPath elements corresponding to the top 20% bars to define where the gradient should show through. Though this hack works fairly well when the visualization doesn’t require any interactivity, it made trying to implement hovering on the bars more complicated than we had time for. The subtitle on the third reference book above was more relevant than expected: “Not thinking about how much pain this is going to cause in the future”.

All joking aside, although there is much room for improvement, the code remains quite legible and our coding skills have definitely improved over the course of this project. After implementing the full D3 version of the visualization, we moved on to making the last visual adjustments.

Polishing the design

The “feature complete” iteration of the visualization below has a colour change every 20%. The chart layout no longer includes gridlines, as they are essentially built into the visualization. Additionally, the charts have grey country names as well as a light grey background, both to group the title with the chart and to define the chart area, since there are no visible axes.

Colors

We ended up developing our own color palette with Adobe Color. ColorBrewer is an excellent technical tool for choosing colours with high contrast and legibility. In this project, however, those characteristics were less critical, and we wanted a more original look with a stronger emphasis on aesthetics. We were still trying to convey a certain progression from acceptable to bad. We chose the palette on the left.

color-palette.png

The income per capita visualization reuses the same colours, but differently. We colour the bars in the annual income version (below right) the same colour as the bottom of the top 20%’s bar in the distribution visualization (below left). Conveying information about the income concentration in the annual income charts creates a visual and logical link between the two styles of charts.
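In code, this amounts to looking up which band the top 20%'s share falls in. Here is a minimal sketch, assuming bands every 20% as in the final design. The colour names are placeholders for the Adobe Color palette, which is not reproduced here. A share exactly on a boundary is assigned to the lower band, which is an arbitrary choice.

```javascript
// Sketch: colour for an annual-income bar, taken from the band in which the
// top 20%'s share of income ends (the bottom of its share bar).
// Colour names are placeholders, not the project's palette.
const SHARE_BANDS = [
  { end: 20, color: "band-1" },
  { end: 40, color: "band-2" },
  { end: 60, color: "band-3" },
  { end: 80, color: "band-4" },
  { end: 100, color: "band-5" },
];

function incomeBarColor(top20Share, bands = SHARE_BANDS) {
  const band = bands.find(({ end }) => top20Share <= end);
  if (band === undefined || top20Share < 0) {
    throw new RangeError(`Share out of range: ${top20Share}`);
  }
  return band.color;
}

console.log(incomeBarColor(45.3)); // band-3
```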

avg-income-bar-color.PNG

Organization and layout

The countries are ordered from most unequal to least unequal according to the most recent data available. The size of the charts is kept small enough to allow for comparing several countries without having to scroll, yet large enough to remain legible.
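The text doesn't specify which measure ranks the countries. The sketch below assumes the top 20%'s share in each country's most recent year, with each country as a hypothetical `{country, years: [{year, top20}]}` record. Both choices are assumptions for illustration.

```javascript
// Sketch: order countries from most to least unequal, using each country's
// most recent year. The measure (top 20% share) is an assumption.
function latestTop20(country) {
  const latest = country.years.reduce((a, b) => (b.year > a.year ? b : a));
  return latest.top20;
}

function orderByInequality(countries) {
  return [...countries].sort((a, b) => latestTop20(b) - latestTop20(a));
}

// Hypothetical data: A was more unequal in 2014 but B is ahead on latest data.
const ordered = orderByInequality([
  { country: "A", years: [{ year: 2014, top20: 50 }, { year: 2016, top20: 41 }] },
  { country: "B", years: [{ year: 2015, top20: 45 }] },
]);
console.log(ordered.map((c) => c.country));
```

Using each country's latest year, rather than a common year, means countries are ranked on different years when their data ends at different times.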

Below is an excerpt from the final visualization.

Share of income

final-share-income.png

Income per capita

final-avg-income.png

In a way, the process is more valuable than the final version of the graph because we used this project to experiment a little and to learn about our own processes, our tools and ourselves. Still, we are happy that we have something to show for it.

What happened when I used an inclusion rider at a data conference

This is the blog post that I wish I had found when I was invited to speak at a data conference and decided to send the organizers an inclusion rider for women and minorities. I hope this is how you found it.

(Also, obviously, I’m virtue signalling and if that bothers you, you can stop reading here.)

I was quite happy to be invited to speak about data visualization in front of a large crowd (it turned out to be around 150 people). It had recently been my goal to speak beyond my training classes, to be recognized not just for my expertise, but for my capacity to inspire others.

What I’m trying to say is that this was valuable to me. I was a tad hesitant to risk it with an inclusion rider — I’m no celebrity for whom a conference would bend over backwards. But I knew I couldn’t live with myself if I didn’t even try.

Writing the inclusion rider

I had been inspired by the discussions in the film industry, but also by those in my field, notably by Jon Schwabish (here’s an insightful blog post about his attempt to foster diversity and inclusion at a conference). So on the same day that I accepted the invitation, I started looking around for resources with a call for examples on Twitter.

I did not find a ready-made letter, so I wrote my own. I was inspired by Kyle McDonald’s and by the Annenberg Inclusion Initiative of the University of Southern California’s Annenberg School for Communication and Journalism, which provides a template (PDF) for the film industry.

My main challenge was to ask clearly without coming across as imposing. Perhaps I had every reason to be forceful because it’s the right thing, but I could only go as far as saying that this was important to me, as I hoped it was to them, and that I would like to know what they were doing about it. I drew my line at no more than 75% white men in the speaker lineup, an easy threshold to clear given that they already had 26% women or minorities the previous year.

I shared my idea and draft with people close to me and all were supportive, except for one person who felt that taking identity into account is detrimental to open debate. We had a productive discussion in which we each came to better understand the other’s perspective, and I’m happy to report that I did not lose this friend. Others were new to the idea and thought it was interesting and a step in the right direction. One person asked if I had enough star power to make such demands, and another wondered whether I should be clearer in my letter about the consequences if the conference was not diverse enough for my taste.

You can read my inclusion rider here, in English and in the original French. It took me a couple of hours to put it together.

Discussions with the organizers

I sent the letter by email the day after I accepted the invitation. A week later, I had a call with an organizer to discuss the content of my talk. I brought up the inclusion rider at the end and he explained that they shared my concern and that a woman on the organizing committee was forceful about it. I thanked him and said that I would be keeping up with their efforts.

About a week later, I received an email response from another organizer. She thanked me for bringing up the issue and said that they were inspired to see speakers take concrete steps. She went on a tangent about other events organized by the same people that promoted women in IT. About the conference at hand, she explained that they already had 25% women speakers and that 50% of their pending invitations were to women. She also mentioned that they bring up the Women in Tech Manifesto with their sponsors (IT companies), encouraging them to send as many women representatives as possible.

I was pleased that they had taken the time to think about it and reply at length. I was a tad disappointed that they were still only at 25% women and that they did not bring up the question of minorities. I wanted to tell them so, but while I hesitated over the right approach, it slipped my mind.

tarte.png

Since my talk was about data visualization, I decided to use the data about the final gender distribution of the speakers for one of my slides. It turned out that women made up almost 32% of the speakers, an improvement over the previous year. That figure does not account for the representation of minorities, which is harder for me to measure as an outsider.

While at the conference, I discovered that my letter had started a conversation among the organizing committee. They had questioned how much they do to include women, and it led to the creation of a panel on women in IT. The panel consisted of four women, lasted 35 minutes and was held in the largest conference room of the event, under the theme of “inspiring conferences”. Unfortunately, I was not able to attend. One organizer, also a white man, said that he would consider having an inclusion rider next time he’s invited to speak.

An assessment of the experience

I’m a person with very little power, not a celebrity and not an A-lister anywhere but in my family’s heart. Yet, I’m a white man. By bringing this up, I showed that this issue bothers more than just the people who are excluded.

I’m satisfied with the outcome, especially the fact that it started a conversation among the committee members that led to tangible action. I want to praise the Datavore team for their openness and how they reacted in general. I did not take that for granted when I sent my letter.

I could have done better. I did not go very far in my inclusion rider. I simply asked them to match their performance of the previous year and I did not follow up on the question of the minorities. Another issue of course is that this is my story, that of a white man again, and that the focus should be on the missing women and minorities, not me.

Still, I hope that this post can inspire and inform others who may have heard of inclusion riders and are wondering if it is something they can do to help improve diversity and inclusion in their field. My advice is this: just do it.

A golden ratio for line charts with truncated y-axis

While we hold that bar charts should start at zero, there is no such expectation for line charts. But you’ll always find someone on social media to denounce a “dishonest” line chart that enhances the variations by truncating the y-axis. There is usually ample pushback, but it often leads to the question: Where should it start then?

The simple answer should be: low enough. As I wrote in this tweetstorm (I should have blogged), the scale is a matter of context and it’s hard to make a rule without knowing the data and the reality. It should be low enough so that the perception of the graph is close to that of the reality it seeks to represent.

But that sounds too vague and people love rules, so I’m turning it into one:

The bottom third of the plot area should remain empty for a line chart that doesn't start at zero.

Now let’s see why.

Here is a data set with high numbers and relatively low variability plotted on a line chart with a scale starting at zero.

2 Zero baseline.png

Seems like not much happened in the last 20 years because the full scale hides relatively small variations (which may be the point, but that’s for another time). It is nearly useless as a visualization and a table would tell us more.

Now let’s go to the other extreme and set the minimum value of the scale at the minimum value of the data.

1 Down to zero.png

At first glance, and outside a post about scales, you would think that widget sales crashed to zero before going back up. That is, until you read the scale and realize that it starts at 55,000. This misperception is a kind of Stroop effect. Based on your experience of reality, the baseline of a chart appears to be ground zero when it is visually used like this.

We need a middle ground: a range that is small enough to show variations, but with a starting point that is low enough to avoid the misperception that the data reached zero. That’s where I suggest that we design the range and the starting point so that no data appear in the bottom third of the plotting area. It looks like this.

3 Voila rule.png

Why one third? Perhaps because it loosely echoes the golden ratio: the empty space at the bottom is half the height of the plotting space above it (a 2:1 split, whereas the true golden ratio is about 1.618:1). It also corresponds to the rule of thirds in visual arts. By following this rule, two thirds of the plotting area are dedicated to the data and one third to the secondary fact that the line is floating into space far above the ground. I believe that data visualization has a lot to learn from older arts, languages and other practices, and this just might be one example.

And finally, according to a few quick tests, it seems to be the default in MS Excel, so maybe they got it from somewhere (although Excel sometimes truncates the y-axis for bar charts too).

Compare this with an empty space at 20% of the plotting area (which was my initial suggestion before I started testing and thinking it through).

5 20 percent.png

It is a matter of judgement, but it does appear that the line veers very close to the bottom, like a plane that’s flying too low. The eye is drawn to the baseline and the perception of ground zero is triggered again. One third seems to preserve enough space so that our eyes don’t go to the baseline and connect it to the data line.

Now, I would need 37 random students from the University of Austin to test this and truly turn it into an unimpeachable truth, but in the meantime we can say that we apply the golden ratio.


Now with annotations and calculations.

Here is the formula to calculate the value of the scale.

Formula text.png

Expressed in algebra:

 
Formula.png
 

And here is the minimum value of the scale (b) isolated:

 
formula simplified.png
 

In my example, it works like this:

 
formula simplified applied.png
 


So my scale will end at 57,000 and start at 54,000.
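Written out, the rule and its solution look like this. The symbols d_min (lowest data value) and t (top of the scale) are names chosen for this sketch; only b, the minimum of the scale, comes from the text. Requiring the empty band below the data to fill one third of the plot height gives the first equation. With the lowest data value of 55,000 from the truncated example chart and a scale ending at 57,000, it gives back the 54,000 starting point.

```latex
\[
\frac{d_{\min} - b}{t - b} = \frac{1}{3}
\quad\Longrightarrow\quad
3\,(d_{\min} - b) = t - b
\quad\Longrightarrow\quad
b = \frac{3\,d_{\min} - t}{2}
\]
\[
b = \frac{3 \times 55{,}000 - 57{,}000}{2} = \frac{108{,}000}{2} = 54{,}000
\]
```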

Thanks to my friend Robert Couillard, Eng., for isolating b more elegantly than the calculator did.
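The same calculation can be written as a small JavaScript function. This is a sketch with illustrative names. It assumes you have already chosen the top of the scale, presumably a round number above the highest data value, as with 57,000 here.

```javascript
// Sketch of the one-third rule: given the lowest data value and the chosen
// top of the scale, return the bottom of the scale that leaves the lowest
// third of the plot area empty. Names are illustrative.
function oneThirdBaseline(dataMin, scaleMax) {
  if (scaleMax <= dataMin) {
    throw new RangeError("The top of the scale must be above the lowest data value");
  }
  return (3 * dataMin - scaleMax) / 2;
}

// Share of the plot height left empty below the data.
function emptyFraction(dataMin, scaleMin, scaleMax) {
  return (dataMin - scaleMin) / (scaleMax - scaleMin);
}

const b = oneThirdBaseline(55000, 57000);
console.log(b, emptyFraction(55000, b, 57000)); // 54000 and one third
```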


Thanks to Lisa Charlotte Rost for prompting me to think about this a little further and write about it when I tweeted something in jest in response to a Datawrapper update.


This blog post was quoted by The Economist in its review of its own data visualizations in March 2019.

Redesigning an important graph about a carbon tax

Data visualisation is a language for humans. Computers and machines are fine with raw data. On the other hand, humans are often interested in data simply because it is visualized. Many will understand the underlying phenomenon — a trend, an outlier, a missed target — when they see it in a graph.

Humans speak data visualization partly by instinct and partly through learning. There are right and wrong ways of doing it. When trying to communicate, it’s important to know what works and what doesn’t, just as it is with any other language.

Which brings me to this chart, included in a report from Resources for the Future, an independent, nonprofit research institution in Washington, DC.

Page 53

I became aware of it when this tweet appeared in my timeline. It’s not from the report’s author, but from another climate specialist who marvels at the finding he sees in the graph.

One look at the graph and I knew it would be hard to decipher. It also looks like it was made in a hurry: it’s an Excel graph with all the default settings. This upset me because it seemed like an important piece of information, and this issue is very close to my heart.

Here is what’s wrong with this graph, in no particular order.

Appearance

The look of the graph is the first thing that made me realize something was not quite right. It uses the Excel defaults, with some drop shadows. It tells me that little care has been put into preparing this graph for publication. When reaching out to an external audience with the goal of attracting and retaining their attention, and of being taken seriously, aesthetics is not a luxury; it’s a step you can’t skip.

Labelling

It took me a while to realize that Q1-Q2-Q3-Q4-Q5 do not represent quarters. Our natural instinct is often to assume that the horizontal axis represents time, until proven otherwise. Here, Q1 means Quintile 1, but does Q1 mean the top quintile or the bottom quintile? For people who do not study economics, it may not be obvious. This could be made clear by labeling Q1 “Poorest” and Q5 “Richest”.

Q1 Q2.png

Title

Noah Kaufman had to translate the graph for his Twitter audience in part because the graph did not make clear what it was showing. The title is purely descriptive, something newspapers would never do since it’s their job to communicate clearly, even in their reporting sections.

Title.png

Here it seems that the title should be something more analytical like “The poorest will benefit from any of the measures”, to follow Noah Kaufman’s analysis.

Annotations

What annotations? There are no annotations to help us understand what’s going on, no pointers at the relevant parts of the graph for instance. Annotations are especially important in graphs that do not make it clear where we should be looking for what.

Vertical text.png

Text orientation

Research suggests that it takes twice as long to read vertical text as horizontal text. But the real issue is that we simply don’t read vertical text. Our eyes glaze over these meaningless shapes in a way they can’t when text is horizontal. The vertical axis has a vertical label, and it took me several attempts, and a determination to understand this graph, before I truly read and understood that label.

Colors

Colors are not neutral. Green has positive associations, like money, growth, and ecology. Red’s connotations in Western culture are negative: blood, violence, losses (and love, yes). Red should not be used randomly, but in the present case it simply identifies one of the approaches (“Payroll”), which is no better or worse than the other three. It creates a Stroop effect, where we initially assume something before having to discard it.

Legend colors.png

Horizontal axis

One of the main features of this graph is to differentiate between positive and negative impacts. The baseline should be a strong visual feature so that the eye immediately sees where it’s crossed. In reality, it’s barely differentiated from the grid. The shadows obscure it further.

Horizontal axis.png
Vertical axis.png

Vertical axis

The scale is inconsistent: it starts with a 0.25% interval and then moves to a 0.5% interval. It’s best to use consistent and intuitive increments. Another interpretation is that the scale has increments of 0.5% (-0.25 to 0.25% to 0.75% etc.) but that the zero baseline is completely ignored, which is not much better.
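One way to guarantee consistent increments that include the zero baseline is to generate ticks as whole multiples of a single step. Here is a minimal sketch. The step and range in the example are hypothetical values, in percent.

```javascript
// Sketch: evenly spaced ticks at whole multiples of `step`, so that zero is
// always a tick when the range spans it. Multiplying an integer index by the
// step avoids the drift that comes from repeatedly adding decimals.
function evenTicks(min, max, step) {
  const ticks = [];
  for (let i = Math.ceil(min / step); i <= Math.floor(max / step); i++) {
    ticks.push(i * step);
  }
  return ticks;
}

console.log(evenTicks(-0.5, 2, 0.5)); // -0.5 up to 2 in steps of 0.5
```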


Redesign

Here is my suggested redesign, with the caveat that I did not speak with the authors, nor have I read the entire report in detail. Also, I couldn’t find the data, so I eyeballed it from the original graph.

A few observations:

  • Font: Dax

  • Colors: Wes Anderson palette from a scene in Hotel Chevalier.

  • Green for positive values and yellow for negative ones help the eye immediately identify this crucial distinction.

  • With a panel chart, I like to use a faint background for the graph areas to help the eye differentiate them.

  • There are no annotations because they don’t seem necessary now that the chart is clearer.

  • The horizontal axis has no label because it is stated in the graph's subtitle.

  • Using horizontal bar charts gives space for the labels, without having to repeat them or use different colors for the different measures.

  • I would have preferred a vertical layout, with the graph for the poorest quintile at the bottom, because that's how we think of income groups (top 1%, bottom 50%, etc.). But that would have put the most important graph at the bottom, so it was not worth it.

  • It was all created in Excel; no need for fancy tools.

Many countries are considering a carbon tax to address climate change. In fact, Canada is embarking on a major political debate about a carbon tax and its distributional impact. One year ahead of the election, the party in power appears willing to bet the house on a measure that may be good for the environment but is politically controversial. This type of communication can make a difference in how the public understands what it is voting on.

The latest report from the IPCC makes it clear that the future and safety of millions are at risk because humanity is failing to take decisive action on climate change. The work that RFF and similar organizations do is tremendously important and helpful. This issue deserves to be explained and promoted with the same care that we, silly humans, put into advertising cars.

Redesign: From numbers to reality

Data are not just numbers.

Whereas a number is only an abstraction, a data point is the quantification of a reality or an idea. Think of twenty-five percent of women, a million dollars, 130 km/h.

Data visualization therefore depends on the subject the data represent. It must give shape to that reality and respect the subject. Information designers must therefore take the nature of the data into account when making design choices.

Take, for example, the chart below, from a Radio-Canada article on the proportion of men among primary school teachers in Quebec. It is an accurate representation of the numbers behind the chart. Yet something is missing: the visual does not reinforce the message, namely that very few men teach in primary school. It says only one thing, that the trend is fairly stable, which the article treats as secondary information.

 

What is missing is a scale that puts this proportion of men in visual perspective, placing it in context while keeping in mind the reality the number represents. The redesign below emphasizes how small the 12% is, set against the 88% of women.

The same situation occurs later in the article, where a similar line chart reports the percentage of male students in teacher education. Ironically, the line appears higher on the chart because the scale is smaller, whereas the message is the opposite: this proportion is even lower than among current teachers. At least this time, the article does mention the stability as an important finding.

Once again, switching to a stacked column chart solves the problem, highlighting how small the proportion of male students is while using the same scale as the other chart (0-100%), and thereby showing that this proportion is even lower. Notice also that by keeping the columns roughly the same width, we automatically signal to the reader that this chart contains more data than the previous one, which was not visible in the original.

Radio-Canada's two original charts were not wrong. But they treated the data as mere numbers, obscuring their message and the reality they represent.