Interview: Stephanie Evergreen wants to build a 30-foot dataviz out of clay

I got to cross an item off the bucket list when I was offered the opportunity to work with another data visualization professional. Luck struck when a client invited me to work on a project with Stephanie Evergreen. And she was everything I had hoped for: inspiring, professional, dynamic.

Just look at what she's done: she's written one of the books in my library, she delivers workshops around the world, blogs profusely, is cooking up a dataviz academy, and more. She's been on the PolicyViz podcast of course. But now is when she really achieves fame by agreeing to answer a few questions for this blog. 

1. Of the last 50 tweets in your home timeline, which is your favorite and why?

My friend Chris Lysy posted a cartoon and short blog about the drama I recently had in which some pure statisticians got angry over the way guests on my site worded their regression results. It’s adding humor to something that became a little stressful and I always appreciate that.

2. What’s the one thing you would improve on the Minard map? 

A “Translate This” button, because I don’t know French.

3. Who needs data visualization most and doesn’t know it?

The guy sitting next to me in first class. I tweeted a pic of one of these situations:

4. If you were to organize a conference on data visualization, what would you call it and why?

I’d call it the “Free Kittens Conference” and make it such that everyone sends the one person they know who really needs to learn more about data visualization but is reluctant to change. Do you think the guy in first class will come to a Free Kittens Conference? Maybe it needs to be called Free Martinis.

5. What else would you do if you were not remotely in this field?

Nature guide. I have a knack for identifying flora and fauna. I can barely remember my mother’s birthday but I can detail how to distinguish a vulture from a hawk from an eagle when they fly overhead (hint: it's in the wing tips).

6. How would you introduce yourself to Edward Tufte?

Hard pass on this question, Francis.

7. Who has fewer than 300 followers on Twitter and should have many more?

Deven Wisner. He’s been blogging on dataviz lately and he’s got solid info to share.




8. (From Andy Kirk) What’s the one data visualisation passion project that you’ve never had the time, opportunity or capability to undertake but would dearly love to do so one day?

After launching the Data Viz Academy and the Chart Chooser Cards, I’ve gotten to realize many of my dreams. But I’ve always wanted to work with a client who would let me build a 30-foot data visualization out of clay, showcasing some great results, for display in their lobby.

9. What question should I ask my next guest? 

What’s the freshest good data visualization idea you’ve heard lately?

Tapestry 2017: Connecting the Dots

Data visualization is not an easy job. This can get lost in the quibbles about whether bar charts are boring and pie charts are useless, as we risk thinking that our role is limited to finding which encoding is perceived most precisely by the human eye. In fact, it is to give humans access to something written in a machine language: data.

The richness of a conference like Tapestry is a reminder of the multifaceted job of information designers. This year, the thread that ran through the conference seemed to be the multiple connections that we need to establish if we hope to succeed at our task.

The data visualization community is a motley crew and Catherine d’Ignazio made the most of it as she trained art students to play with data and illustrate it with animated GIFs. The result was engaging, amusing even. How else would responses to a survey become a spectacular, months-long exhibit in a public hall?

This happened because the data had a purpose: in this case, public transit. In fact, purpose seems a sine qua non if we want our content to connect with our audience. Finding something important to say about schools and discrimination in Florida is what got the Tampa Bay Times a Pulitzer Prize for Failure Factories, the project shown by Nathaniel Lash.

An important topic is not enough to guarantee a connection. How matters. Our tool is data visualization and our audience is humans: certain ways will connect the two better than others. Michelle Borkin brought the valuable perspective from academia as she explored what visual elements are most easily recalled in a visualization. Again, the findings were about connection: people would recall a title that was more story than description, and human-recognizable objects such as animals and silhouettes. Who would have remembered anything from Cole Nussbaumer’s “typical business presentation” if she hadn’t translated it later into a story about vocal and insightful dissatisfied customers? The fact that her satire was a reflection of the real world reminds us that understanding a story might be intuitive, but knowing how to tell one is learnt.

Jewel Loree opened up about her stumbles in sharing her enthusiasm for data about a local radio station. Fellow fans of the station weren't thrilled by the mere existence of the data and its superficial findings. She had to dig deeper until a story about the connection between the artists, the DJs and the audience emerged. 

The conference started and ended by challenging us to connect with ourselves. Self-awareness is a rhetorical tool, and not the least of them. Still, it remains elusive. Are we even aware of our own biases and emotions, asked Lena Groeger and Neil Halloran? Are we ready to face them? The question of biases seems especially relevant to a field where many of us work alone or in very small teams, where diversity is either impossible or very difficult to achieve, but no less important. Exposing ourselves to different perspectives, through reading, listening and experiencing, is merely a start.

On the day of my return from Tapestry, I witnessed a grown man, a professional hockey player, cry in front of thousands, and millions more on television, because he was reminded of a past connection to a city where he had lived for years. Connection is what moves us and, as we attempt to move people, we’ll have to learn to connect: to our subjects, to our audiences and to ourselves.

Much like the meal at a dinner with friends, talks are central to a conference, but they remain an excuse to meet people. For a lone freelancer, these opportunities to get together with peers are a fountain of youth, to use a metaphor local to our host city. Sinuous career paths caught in the attraction of a passion, infectious enthusiasm, and similar interest in mundane details are all things that remind me of my connection to a community. For these reasons, I can’t stress enough how important it was to meet again with some and to get to know others. With the certainty of overlooking some, let me say another hello to Andy Kirk, my partner in crime for the poster, Cole Nussbaumer Knaflic, Jon Schwabish (and his mom), Neil Halloran (and his dad), Andy Cotgreave, RJ Andrews, Domonique Meeks, Ben Jones, Chad Skelton, Catherine Madden, Enrico Bertini, Robert Kosara, Jewel Loree, Lena Groeger, Alberto Cairo, Chris Mast, Naomi Robbins, Jeffrey Osborn, Jeffrey Shaffer and my rideshare team: Chrys Wu, Blake Esselstyn and Lori Navarro. I came to see you. Thanks to the organizers for inviting me.

Also: My thoughts about Tapestry 2014.

What can we do?

See Andy Kirk's own blog post about this topic.

Truthiness was coined in 2005 and was named word of the year in 2006. We had almost gotten used to climate change deniers and anti-vaxxers who challenge scientific consensus and empirical evidence. Now, it seems that daily we are asked to ponder what's real.

How many people are affected by the US Muslim ban? How high is the murder rate in the US? How much money is there for the NHS in Brexit? Soon: What is the unemployment rate? Growth rate? And this is not to mention minor topics like crowd sizes and extent of media coverage.

Wrapped in the daily puzzlement is a challenge to fundamental values. Not long ago, refugees were people in dire need deserving of our greatest generosity. Now, they are presented as the threat they are running from. The right to vote was the very foundation of democracy and today some are openly trying to limit it for political or even racist reasons.

In the Western world, the ideals of liberal democracy and the Enlightenment are under pressure in ways few of us can remember. 

Just one look at the Twitter feed of the Tapestry 2017 participants and the anxiety is palpable in the dataviz community. Here are the first few tweets as I'm writing this on a Monday morning:

These are the first four tweets, no selection. That's not surprising for a community whose business is often to present facts, to apply a certain intellectual discipline to analysis, to discover news in data. The civic streak of data visualization practitioners runs deep, from the historical work of John Snow and Florence Nightingale to that of W.E.B. Du Bois and, more recently, of Hans Rosling, who promoted a "fact-based worldview".

Naturally, this topic immediately came up when chatting with Andy Kirk about our participation in Tapestry on March 1st. What can be done about it? Instead of pontificating, we thought it best to use this rare opportunity of having so many dataviz people together to ask them what they think we can do.

Our poster idea was accepted by the organizers: a mostly-blank poster with a single question:

What can we do?

As citizens but especially as data visualization specialists, what can we do to promote progress through knowledge, to ensure a fair and free press, to defend the values that have brought us peace and prosperity?

We'll provide post-its and pens to the participants to write what we can do more of and less of. After the conference, we'll collate the answers and make them available to the community. Until then and after, we hope that the discussion can continue on social media with the hashtag #whatcanwedo.

Pardon the cliché, but really: we can't wait to hear what you have to propose.

How to create parallel coordinates in Excel

tl;dr: Draw a line graph based on normalized data for each category.

What are parallel coordinates?

Parallel coordinates resemble line graphs for time series, except that the horizontal axis represents discrete categories rather than time. While they can appear confusing at first sight, especially given our familiarity with time series, they can often be quite rich on closer inspection.

The first example of parallel coordinates on Wikipedia as of this writing.

The current dataset

This graph is meant to show how wages in Nigeria compare to other countries across several professions related to the housing market. Each country is represented by a color, in default Excel fashion.

This graph shows all the data but it is difficult to read, let alone to see how any given country compares. At first, this seems like a good opportunity for small multiples, with either jobs or country groupings. But reading the text, it appears that the goal is to compare Nigeria against other countries for each profession, and not across professions.

Labor costs in Nigeria are not high. At N2,500 or US$16 per day (US$2 per hour) for artisans and N1,500 or US$10 per day (US$1.25 per hour) for laborers, labor rates in Nigeria appear to be low in comparison with competitor countries (Figure 5).

The bar graph does not clearly show that costs in Nigeria are lower across professions than in most other markets. On a parallel coordinates graph, though, it becomes quite clear where Nigeria stands against other countries:

By highlighting Nigeria, the message becomes clearly visible. The reader is no longer invited to parse the entire data set to find something of relevance: the graph makes it clear and corresponds to the accompanying text. The other countries are in shades of gray, with minimal differentiation, because they are only relevant in their position relative to Nigeria.

If parallel coordinates can work so well, why do we see them so seldom? First, they have the counterintuitive feature that lines (series) do not represent a progression. The categories are nominal and can be reordered, changing the shape of each series. Also, I personally rarely come across a dataset that lends itself to a good parallel coordinates graph. Finally, I'm not aware of any popular software with built-in support for such graphs. This is why we'll have to trick MS Excel a little to create this one.

How to do it in Excel

Prepare the dataset

The first step is to prepare the dataset. The original one looks like this:

Original dataset

It is in the right structure and can be used to create the graph below. The ranges of the data vary somewhat, with some professions reaching $28/hour while others top out at $6/hour.

Single vertical range

The range for the site foreman data distorts the rest of the data, compressing it into a reduced portion of the vertical axis. Most importantly, parallel coordinates are not meant to compare amplitudes across categories, but position along a range for each category. In fact, parallel coordinates are often used to compare categories with different measures, such as in this example about cars.

To make full use of this type of graph, we need to normalize the data so that each profession uses the full range of the vertical axis. To do this, start by copying your entire dataset below the existing one. Then in the copy, replace the value of the first data point with a normalizing formula. Here's a short explanation of how to do it. Here's what it looks like in my spreadsheet.

After you've applied the formula and copied it to the entire table, it will look like this:

Table with normalized data.

Note how each category has a 1 value, representing the country with the highest value in each category (profession), and a 0 value, representing the country with the lowest value. 
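For readers who want to check the arithmetic outside of Excel, here is a minimal Python sketch of the same idea. It assumes the usual min-max formula, (value - min) / (max - min), applied separately to each profession, which is consistent with the 1 and 0 values described above. The country names and wages are invented for illustration and are not the figures from the report; the function name `normalize_by_category` is also my own.

```python
def normalize_by_category(table):
    """Min-max normalize each category (column) to the 0..1 range.

    `table` maps a series name (country) to {category: value}.
    """
    categories = list(next(iter(table.values())))
    lows = {c: min(row[c] for row in table.values()) for c in categories}
    highs = {c: max(row[c] for row in table.values()) for c in categories}
    return {
        name: {
            # Guard against a category where every country has the same value.
            c: (row[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] != lows[c] else 0.0
            for c in categories
        }
        for name, row in table.items()
    }


# Hypothetical hourly wages in US$, made up for this example only.
wages = {
    "Country A": {"Site foreman": 2.0, "Plumber": 1.5},
    "Country B": {"Site foreman": 10.0, "Plumber": 6.0},
    "Country C": {"Site foreman": 6.0, "Plumber": 3.0},
}

normalized = normalize_by_category(wages)
for country, row in normalized.items():
    print(country, row)
```

In this made-up table, Country B gets a 1 in both professions and Country A a 0, while Country C lands somewhere in between on each axis, which is exactly the position-within-range reading that parallel coordinates are meant to give.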

We'll immediately add two rows to our data that will be used later for labelling purposes:

  • Top: 1
  • Bottom: 0

It looks like this:

Draw the graph

Technically, a parallel coordinates graph is a simple line graph. Select your entire dataset and create a line graph. It will look like this after you've switched rows and columns.

The basics are there, but it needs some love.

  1. Remove the horizontal gridlines.
  2. Add vertical gridlines, one per category.
  3. Set the axis position to "On tick marks" (under "Format Axis").
  4. Change the color scheme to monochromatic gray.
  5. Reduce the size of the lines to 0.75 pts.
  6. Set the range of the vertical axis from 0 to 1, then make it disappear.
  7. Erase the legend.
  8. Make the plot area border and the horizontal axis line disappear.
  9. Choose your font. In my case, it's Source Sans Pro 8.

By now, my graph looks like this.

Next step: labeling.

  1. Label the last point of each series with the series name (excluding "Top" and "Bottom" of course).
  2. Adjust the width of the plot area and graph to place the labels on the outside of the plot area.
  3. To label the top of the vertical axis, select the "Top" series and label it with its value.
  4. Manually change the value to the maximum value of each range (site foreman: 28, plumber: 11, etc.) by double-clicking on each number.
  5. To label the bottom of the vertical axis, select the "Bottom" series and label it with its value.
  6. Manually change the value to the minimum value of each range (site foreman: 1.7, plumber: 1.2, etc.) by double-clicking on each number.
  7. Distance your category label away from the axis (value: 600), so that the bottom labels no longer overlap.
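Typing the top and bottom labels by hand is error-prone when there are many categories, so it can help to compute them first. The sketch below is one possible way to list the maximum and minimum of each category from the original (non-normalized) table. The data is again invented for illustration, and `axis_labels` is a hypothetical helper name, not something from Excel.

```python
def axis_labels(table):
    """Return (top, bottom): the max and min of each category as label text."""
    categories = list(next(iter(table.values())))
    # The "g" format drops trailing zeros, so 10.0 is shown as "10".
    top = {c: f"{max(row[c] for row in table.values()):g}" for c in categories}
    bottom = {c: f"{min(row[c] for row in table.values()):g}" for c in categories}
    return top, bottom


# Hypothetical hourly wages in US$, made up for this example only.
wages = {
    "Country A": {"Site foreman": 2.0, "Plumber": 1.5},
    "Country B": {"Site foreman": 10.0, "Plumber": 6.0},
    "Country C": {"Site foreman": 6.0, "Plumber": 3.0},
}

top, bottom = axis_labels(wages)
print("Top labels:", top)
print("Bottom labels:", bottom)
```

The resulting values are what you would type over the 1s of the "Top" series and the 0s of the "Bottom" series.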

My graph now looks like this:

Next step is colouring.

  1. Set the width of the "Top" and "Bottom" lines to 0 to make them disappear.
  2. Use 50% grey for your axis top and bottom values.
  3. Give a contrasting color to your vertical gridlines. The point is to make them stand out, as opposed to normal gridlines that act as mere reference points.
  4. If relevant, give a contrasting color to your main series.
  5. Adjust the shades and dashes of all series for clarity. Color your series labels accordingly.

It now looks like this:

The final touch is to add a title that clarifies content and message:

This graph can be found in the report "Housing Finance in Nigeria", World Bank Group, 2016.

Thank you, Hans Rosling

I have just learned of Hans Rosling's passing and this is as close to a professional bereavement as it gets. There's no doubt that he is among the few that brought me to the field of information design.

Long before I was working full time in that field, I had been struck by his original TED talk, one of their original viral videos. A few years later, I was thrilled to see that he was on the schedule of a philanthropy conference I had to attend for work. I was hoping to catch a glimpse of him live, like some would want to hear their favorite artist live. As I recall, we arrived too late for his talk. But later, luck struck and I saw him walking by, so I broke from my group to introduce myself and talk to him.

I wanted to tell him all the potential that I saw for his methods to teach in multiple fields. What an idiot: what I had to do was listen. And listen I did because as soon as he heard that I was working for the World Bank Group, he launched into a spirited speech about the importance of freeing their data. It was before the heyday of open data at the World Bank Group and Rosling was incensed that researchers and the general public had to pay to access the datasets. He told me about well-meaning people with wrong ideas. He saw it as a moral obligation for the Bank, a key part of its mandate to reduce poverty by spreading knowledge. And he was so clear-minded and enthusiastic about it, he made me an internal advocate for the cause. It turns out that he was very right: some time later, after the Bank created its portal for open data, it became the most visited page, ahead of the home page and the jobs page (!).

I teach data visualization to managers and the first part of my session is spent justifying why they should care about it. We discuss examples of leaders who use it to make their mark and the high point is a video from the BBC where Rosling summarizes his famous talk on development trends.

A few seconds after the start of the video, I pause to draw attention to his first few sentences.

Visualization is right at the heart of my own work too. I teach global health. And I know, having the data is not enough. I have to show it in a way people both enjoy and understand.

Who would have thought that teaching global health meant that data visualization is "right at the heart" of their job? Yet, this realization is what made him special and a thought leader. This is food for thought for everyone who thinks they don't need good data visualization. And communicating data is not enough: doing it right can be the difference between being heard and not.

This video is a turning point where participants understand that communicating with data does not have to be boring, that it can be interesting and influential. Suddenly they can see themselves being heard by doing this right.

Rosling was first and foremost a storyteller. He would use whatever means he needed to get a point across, be it boxes or a washing machine. He stumbled onto data visualization because that's what he needed to convey what he saw in the world. In fact, he developed the software behind Gapminder with his son and daughter-in-law, a true example that one must do "whatever it takes" to explain.

For the data visualization community, losing Rosling is the equivalent of losing Robin Williams: he was a star with a big heart and a big talent. We can be thankful he was one of us and showed us a thing or two.