How to create parallel coordinates in Excel

tl;dr: Draw a line graph based on normalized data for each category.

What are parallel coordinates?

Parallel coordinates resemble line graphs for time series, except that the horizontal axis represents discrete categories rather than time. While they can appear confusing at first sight, especially given our familiarity with time series, they can often be quite rich on closer inspection.

The first example of parallel coordinates on Wikipedia as of this writing.

The current dataset

This graph is meant to show how wages in Nigeria compare to other countries across several professions related to the housing market. Each country is represented by a color, in default Excel fashion.

This graph shows all the data but it is difficult to read, let alone to see how any given country compares. At first, this seems like a good opportunity for small multiples, with either jobs or country groupings. But reading the text, it appears that the goal is to compare Nigeria against other countries for each profession, and not across professions.

Labor costs in Nigeria are not high. At N2,500 or US$16 per day (US$2 per hour) for artisans and N1,500 or US$10 per day (US$1.25 per hour) for laborers, labor rates in Nigeria appear to be low in comparison with competitor countries (Figure 5).

The bar graph does not clearly show that costs in Nigeria are lower across professions than in most other markets. On a parallel coordinates graph, though, it becomes quite clear where Nigeria stands against other countries:

By highlighting Nigeria, the message becomes clearly visible. The reader is no longer invited to parse the entire data set to find something of relevance: the graph makes it clear and corresponds to the accompanying text. The other countries are in shades of gray, with minimal differentiation, because they are only relevant in their position relative to Nigeria.

If parallel coordinates can work so well, why do we see them so seldom? First, they have the counterintuitive feature that lines (series) do not represent a progression. The categories are nominal and can be reordered, changing the shape of each series. Also, I personally rarely come across a dataset that lends itself to a good parallel coordinates graph. Finally, I'm not aware of any popular software with built-in support for such graphs. This is why we'll have to trick MS Excel a little to create this one.

How to do it in Excel

Prepare the dataset

The first step is to prepare the dataset. The original one looks like this:

Original dataset

It is in the right structure and can be used to create the graph below. The ranges of the data vary somewhat, with some professions reaching $28/hour while others top out at $6/hour.

Single vertical range

The range for the site foreman data distorts the rest of the data, compressing it into a small portion of the vertical axis. More importantly, parallel coordinates are not meant to compare amplitudes across categories, but positions along a range within each category. In fact, parallel coordinates are often used to compare categories with different measures, such as in this example about cars.

To make full use of this type of graph, we need to normalize the data so that each profession uses the full range of the vertical axis. To do this, start by copying your entire dataset below the existing one. Then, in the copy, replace the value of the first data point with a normalization formula: subtract the category's minimum from the value, then divide by the category's range (maximum minus minimum). Here's a short explanation of how to do it. Here's what it looks like in my spreadsheet.

After you've applied the formula and copied it to the entire table, it will look like this:

Table with normalized data.

Note how each category has a 1 value, representing the country with the highest value in each category (profession), and a 0 value, representing the country with the lowest value. 
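For readers who want to check the spreadsheet result outside Excel, here is a minimal Python sketch of the same min-max normalization. The country names and most of the wage values are made up for illustration (only the 28 and 1.7 site foreman bounds and the 11 and 1.2 plumber bounds come from this post), and the function name normalize_columns is my own. In Excel, with a hypothetical layout where one profession occupies cells B2 to B9, the equivalent formula would be something like =(B2-MIN(B$2:B$9))/(MAX(B$2:B$9)-MIN(B$2:B$9)).

```python
def normalize_columns(table):
    """Min-max normalize each category (profession) to the 0..1 range.

    `table` maps a country name to a dict of {profession: hourly wage}.
    """
    professions = list(next(iter(table.values())))
    normalized = {country: {} for country in table}
    for profession in professions:
        values = [row[profession] for row in table.values()]
        low, high = min(values), max(values)
        span = high - low
        for country, row in table.items():
            # If every country has the same value, place it mid-axis
            # instead of dividing by zero (an arbitrary choice).
            normalized[country][profession] = (
                (row[profession] - low) / span if span else 0.5
            )
    return normalized


# Illustrative values only, not the report's dataset.
wages = {
    "Country A": {"Site foreman": 28.0, "Plumber": 11.0},
    "Country B": {"Site foreman": 1.7, "Plumber": 6.1},
    "Country C": {"Site foreman": 14.85, "Plumber": 1.2},
}
normalized = normalize_columns(wages)
for country, row in normalized.items():
    print(country, {p: round(v, 2) for p, v in row.items()})
```

As in the spreadsheet, every profession ends up with exactly one country at 1 (the highest wage) and one at 0 (the lowest).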

We'll immediately add two rows to our data that will be used later for labelling purposes:

  • Top: 1
  • Bottom: 0

It looks like this:

Draw the graph

Technically, a parallel coordinates graph is a simple line graph. Select your entire dataset and create a line graph. It will look like this after you've switched rows and columns.

The basics are there, but it needs some love.

  1. Remove the horizontal gridlines.
  2. Add vertical gridlines, one per category.
  3. Set the axis position to "On tick marks" (in "Format Axis").
  4. Change the color scheme to monochromatic gray.
  5. Reduce the size of the lines to 0.75 pts.
  6. Set the range of the vertical axis from 0 to 1, then make it disappear.
  7. Erase the legend.
  8. Make the plot area border and the horizontal axis line disappear.
  9. Choose your font. In my case, it's Source Sans Pro 8.
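The same look can be approximated outside Excel. The sketch below is not the method used for the graph in this post; it is a rough stand-in that builds an SVG string using only the Python standard library, assuming the data is already normalized to 0..1. The function name, the dimensions, the colors, and the sample values are all hypothetical. It mirrors the steps above: one vertical rule per category, series anchored on those rules, thin gray lines, and no vertical axis, legend, or border.

```python
from html import escape


def parallel_coordinates_svg(normalized, categories,
                             width=480, height=240, margin=30):
    """Return an SVG string with one gray polyline per series.

    `normalized` maps a series name to {category: value in 0..1}.
    Needs at least two categories.
    """
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin
    step = plot_w / (len(categories) - 1)
    # One x position per category, so series points sit on the rules.
    xs = [margin + i * step for i in range(len(categories))]

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" font-family="sans-serif" font-size="8">'
    ]
    for x, name in zip(xs, categories):
        # Vertical gridline plus the category label underneath it.
        parts.append(
            f'<line x1="{x:.1f}" y1="{margin}" x2="{x:.1f}" '
            f'y2="{height - margin}" stroke="#bbb" stroke-width="0.5"/>'
        )
        parts.append(
            f'<text x="{x:.1f}" y="{height - margin + 14}" '
            f'text-anchor="middle">{escape(name)}</text>'
        )
    for row in normalized.values():
        # SVG y grows downward, so a value of 1 maps to the top margin.
        points = " ".join(
            f"{x:.1f},{margin + (1 - row[c]) * plot_h:.1f}"
            for x, c in zip(xs, categories)
        )
        parts.append(
            f'<polyline points="{points}" fill="none" '
            f'stroke="#888" stroke-width="0.75"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up normalized values for three hypothetical countries.
categories = ["Site foreman", "Plumber", "Laborer"]
sample = {
    "Country A": {"Site foreman": 1.0, "Plumber": 1.0, "Laborer": 0.4},
    "Country B": {"Site foreman": 0.0, "Plumber": 0.5, "Laborer": 1.0},
    "Country C": {"Site foreman": 0.5, "Plumber": 0.0, "Laborer": 0.0},
}
svg = parallel_coordinates_svg(sample, categories)
```

Writing svg to a file with an .svg extension and opening it in a browser shows the bare graph, without the labels and colors that come next.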

By now, my graph looks like this.

Next step: labeling.

  1. Label the last point of each series with the series name (excluding "Top" and "Bottom" of course).
  2. Adjust the width of the plot area and graph to place the labels on the outside of the plot area.
  3. To label the top of the vertical axis, select the "Top" series and label it with its value.
  4. Manually change the value to the maximum value of each range (site foreman: 28, plumber: 11, etc.) by double-clicking on each number.
  5. To label the bottom of the vertical axis, select the "Bottom" series and label it with its value.
  6. Manually change the value to the minimum value of each range (site foreman: 1.7, plumber: 1.2, etc.) by double-clicking on each number.
  7. Distance your category label away from the axis (value: 600), so that the bottom labels no longer overlap.
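The values typed into the "Top" and "Bottom" labels are simply the maximum and minimum of each category in the original, un-normalized table. A small Python sketch, using a hypothetical helper name and the same illustrative data shape as a country-to-wages mapping, shows how to list them before editing the labels by hand. Only the site foreman and plumber bounds below come from this post; the rest is made up.

```python
def axis_end_labels(table):
    """Return {profession: (minimum, maximum)} from the original data."""
    professions = list(next(iter(table.values())))
    return {
        p: (
            min(row[p] for row in table.values()),
            max(row[p] for row in table.values()),
        )
        for p in professions
    }


# Illustrative values only, not the report's dataset.
wages = {
    "Country A": {"Site foreman": 28.0, "Plumber": 11.0},
    "Country B": {"Site foreman": 1.7, "Plumber": 6.1},
    "Country C": {"Site foreman": 14.85, "Plumber": 1.2},
}
labels = axis_end_labels(wages)
for profession, (low, high) in labels.items():
    print(f"{profession}: bottom label {low:g}, top label {high:g}")
```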

My graph now looks like this:

The next step is coloring.

  1. Set the width of the "Top" and "Bottom" lines to 0 to make them disappear.
  2. Use 50% grey for your axis top and bottom values.
  3. Give a contrasting color to your vertical gridlines. The point is to make them stand out, as opposed to normal gridlines that act as mere reference points.
  4. If relevant, give a contrasting color to your main series.
  5. Adjust the shades and dashes of all series for clarity. Color your series labels accordingly.

It now looks like this:

The final touch is to add a title that clarifies content and message:

This graph can be found in the report "Housing Finance in Nigeria", World Bank Group, 2016.

Thank you, Hans Rosling

I have just learned of Hans Rosling's passing and this is as close to a professional bereavement as it gets. There's no doubt that he is among the few that brought me to the field of information design.

Long before I was working full time in that field, I had been struck by his original TED talk, one of their original viral videos. A few years later, I was thrilled to see that he was on the schedule of a philanthropy conference I had to attend for work. I was hoping to catch a glimpse of him live, like some would want to hear their favorite artist live. As I recall, we arrived too late for his talk. But later, luck struck and I saw him walking by, so I broke from my group to introduce myself and talk to him.

I wanted to tell him about all the potential I saw for his methods as a teaching tool in multiple fields. What an idiot: what I had to do was listen. And listen I did because as soon as he heard that I was working for the World Bank Group, he launched into a spirited speech about the importance of freeing their data. It was before the heyday of open data at the World Bank Group and Rosling was incensed that researchers and the general public had to pay to access the datasets. He told me about well-meaning people with wrong ideas. He saw it as a moral obligation for the Bank, a key part of its mandate to reduce poverty by spreading knowledge. And he was so clear-minded and enthusiastic about it, he made me an internal advocate for the cause. It turns out that he was very right: some time later, after the Bank created its portal for open data, it became the most visited page, ahead of the home page and the jobs page (!).

I teach data visualization to managers and the first part of my session is spent justifying why they should care about it. We discuss examples of leaders who use it to make their mark and the high point is a video from the BBC where Rosling summarizes his famous talk on development trends.

A few seconds after the start of the video, I pause to draw attention to his first two sentences.

Visualization is right at the heart of my own work tool. I teach global health. And I know, having the data is not enough. I have to show it in a way people both enjoy and understand.

Who would have thought that teaching global health meant that data visualization is "right at the heart" of their job? Yet, this realization is what made him special and a thought leader. This is food for thought for everyone who thinks they don't need good data visualization. And communicating data is not enough: doing it right can be the difference between being heard and not.

This video is a turning point where participants understand that communicating with data does not have to be boring, that it can be interesting and influential. Suddenly they can see themselves being heard by doing this right.

Rosling was first and foremost a storyteller. He would use whatever means he needed to get a point across, be it boxes or a washing machine. He stumbled onto data visualization because that's what he needed to convey what he saw in the world. In fact, he developed the software behind Gapminder with his son and daughter-in-law, a true example that one must do "whatever it takes" to explain.

For the data visualization community, losing Rosling is the equivalent of losing Robin Williams: he was a star with a big heart and a big talent. We can be thankful he was one of us and showed us a thing or two.

Interview: Andy Kirk's favorite tweet and dataviz pick up lines

Andy Kirk, from VisualisingData.com, is very well known in the data visualization community, as shown by his 14,400 Twitter followers. He built his fame by giving public and private workshops and writing a book (he's working on the next one). His monthly best-of has become a reference since he started it in February 2010. There are also the speaking slots at conferences, the most popular likely being his talk on the visualization of nothing at OpenVis 2014 in Boston.

In fact, he is so famous that it didn't seem useful to ask him the usual questions about him and his work. He's given interviews, participated in podcasts, and has a complete bio on his website. At the same time, I was trying to refresh the format of the interviews I've conducted so far. Doubtful that people make it to the end, I wanted something less predictable. Hopefully, the format remains relevant and you'll learn something about Andy and data visualization.

1. Of the last 50 tweets in your home timeline, which is your favorite and why?

On the morning of 21st July, I chose this: 

The reason is simply that I see this as a positive sign that football is increasingly embracing the value of data, analytics and visualisation. Congratulations to Bruno and the Metrica Sports team.

2. What's the one thing you would improve on the Minard map? 

Well you’ve covered off most big issues in your (may I say) excellent ’13 facts’ article so let me focus on a pet hate – double spacing between words. There is a glaring double-space felony screaming out between “au dessous” and “de zero”. The lack of salience between the chart boundary lines and amongst the horizontal and vertical gridlines kind of irks me too but I’ll give Charlie a free pass on that one.

3. Who needs data visualization most and doesn't know it?

Well that’s a super question. My first instinct was to consider those in the world who might be considered information poor, perhaps looking at populations from countries considered to belong to the 3rd world. That’s quite a lazy answer though. Instead, I’ll turn towards Whitney Houston – something I suspect not many data visualisation interviewees do – who once stated “I believe the children are the future”. Whitney really was on to something and I think it is particularly resonant for visualisation. The next generation of kids going through all levels of education need to have access to better visualisation literacy, both as consumers and creators. Data will never go away. Tools will come and go but we’ll always need humans in some capacity to read and write visual portrayals of data. So let’s help get the next generation heading in the right direction.

4. If you were to organize a conference on data visualization, what would you call it and why?

Blame Word and me, not Andy.

I went through all sorts of derivatives of things like “The Assembly of Visualisers” and “The Congress of Visualisers”. I did the obligatory “VisFest” and even plummeted to the depths of “Izzy Wizzy, Let's Get Vizzy”. But I think I will settle on “The Data Visualisation Show” because I would look to organize a particularly-strongly curated and choreographed “show”, something that would be an extravaganza of creativity (music, performance, dance, all the senses) showcasing data visualisation across the entire spectrum of its application. Also has the (really very clever) double implication of the word “show”.

5. What else would you do if you were not remotely in this field?

My previous life was in Information Management and Data Analysis so, along with my undergraduate study in Operational Research, I’ve never been too far away from some of the core disciplines of this game. I’ll therefore answer this question more like ‘in another life maybe I would have been…’. If I was to take a leap of imagination, I think I would like to have had a go at being a detective (Columbo more than a Rust Cohle type) where I think some of my instincts might serve me well, plus I’ve a good eye for facial recognition. Journalist would be another but that’s too close as is, I guess, being in advertising. Sometimes, on a nice spring/summer day I would like to just drive a heavy duty mower and cut the grass on football fields or golf courses. That would be quite pleasant for a short while. I’ve had many ideas for business ventures and inventions down the years but thankfully I never pursued them. They would have been moderate failures at best. Maybe a football pundit. Let’s go with that.

6. How would you introduce yourself to Edward Tufte?

Genuinely, I would thank him for his early books, explain they were very influential in helping me fall in love with this subject. Then I’d shake his hand and walk away to avoid risking any further interaction that might tarnish the experience.

7. What would be your pickup line for a data visualization geek?

“Well, hi, really nice to BUMP into you. I’m not from the AREA but I saw you in the LINE for the BAR. You popped up on my RADAR when I heard you ordering the bottle of BUBBLEy. Sorry if I sound like I’ve lost the PLOT, I’m a bit SCATTERY and this probably sounds like a STREAM of nonsense – maybe it is the HEAT – but would you be interested in going for a PIE sometime? I fully understand if I’m not your ISOTYPE but, if you fancy it, VENN would be a good time to meet up?”

Smooooth.

8. Who has fewer than 300 followers on Twitter and should have many more?

As you have not offered any constraints on this question I will select four very smart and impressive individuals.

9. What question should I ask my next guest?

What’s the one data visualisation passion project that you’ve never had the time, opportunity or capability to undertake but would dearly love to do so one day?