Mis-employing radar charts to distinguish multidimensional data


One alternative approach to this problem is a parallel coordinates chart, which is roughly an unrolled version of our hypothetical chart:


Source: https://en.wikipedia.org/wiki/Parallel_coordinates#/media/File:ParCorFisherIris.png

One reason I prefer the radar version is that it produces a compact representation: the polygon that the radial line forms is almost like a data-generated symbol. This makes it possible to use condensed, sparkline-like versions of the chart within another application. Another reason is that I don’t think individual lines are as memorable as polygon shapes.
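
To make the “data-generated symbol” idea concrete, here is a minimal sketch, in plain JavaScript rather than d3, of how a radar polygon’s vertices can be computed: each axis gets an evenly spaced angle, and each value (assumed to be already normalized to [0, 1]) sets the distance from the center along that axis. The helper names `radarPolygonPoints` and `toSvgPoints` are hypothetical and not taken from the post’s code.

```javascript
// Hypothetical helper: turn one data point's normalized values (each in [0, 1])
// into the vertex list of its radar polygon, relative to the chart center.
function radarPolygonPoints(values, radius) {
  const n = values.length;
  return values.map((v, i) => {
    // Axis i sits at angle 2*pi*i/n, starting at "north" and going clockwise.
    // SVG's y-axis points down, so "north" means negative y.
    const angle = (2 * Math.PI * i) / n;
    return [radius * v * Math.sin(angle), -radius * v * Math.cos(angle)];
  });
}

// Serialize vertices in the "x,y x,y ..." format used by an SVG <polygon points="...">.
function toSvgPoints(points) {
  return points.map(([x, y]) => `${x.toFixed(1)},${y.toFixed(1)}`).join(" ");
}

// Example: a 4-axis glyph drawn at sparkline size (radius 10).
const glyph = radarPolygonPoints([0.8, 0.3, 1, 0.5], 10);
console.log(toSvgPoints(glyph));
```

Because the result is just a short list of points, the same glyph can be rendered at full size or shrunk down to sparkline scale.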

Another interesting option is Chernoff faces, which are in a similar vein and are hilarious, but hard for people to take seriously:


Multivariate data abound!

Apparently, there are also Chernoff fish. Anyway, excited by the idea, I got to work. I started out with Nadieh Bremer’s excellent radar chart article and sample code using d3:


The code was so nicely commented!

I removed the radial grid circles to avoid implying continuity between the axes; after all, each axis now represents a different quantity. I also got rid of most of the aesthetic touches that won’t matter once the chart is scaled down.

In a similar vein, I toyed with the idea of also removing the radial lines and areas between the axes entirely and just drawing large dots on each axis. In the end, though, the lines served as a visual aid: they form an overall polygon that seemed easier to compare than colored dots alone.

I had to change a lot of the code to support multiple different axes. Configuring each axis by hand for every dataset also sounded painful, so I added some code to generate scales and axes automatically from the dataset itself. With some simple criteria (currently the type of the variable: is it a string, a boolean, or a number?), you can make a semi-educated decision about what kind of axis to use. You can also find the minimum and maximum values (for a linear scale) or all unique values (for a categorical variable). You could conceivably extend this to be smarter, e.g. by measuring the skewness of a variable and deciding to plot it on a log axis instead.
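
The automatic axis generation could look roughly like the sketch below. This is not the post’s actual code, and several choices are assumptions: `inferAxes`, `sampleSkewness` and `scaleToUnit` are hypothetical names, every row is assumed to be a plain object with the same keys, all-numeric columns become linear axes over their observed minimum and maximum, everything else (strings, booleans) becomes a categorical axis over its unique values in first-seen order, and the skewness check uses an arbitrary threshold of 1 and only switches to a log axis when all values are positive.

```javascript
// Population skewness (third standardized moment) of a numeric array.
function sampleSkewness(xs) {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  let m2 = 0;
  let m3 = 0;
  for (const x of xs) {
    const d = x - mean;
    m2 += d * d;
    m3 += d * d * d;
  }
  m2 /= n;
  m3 /= n;
  return m2 === 0 ? 0 : m3 / Math.pow(m2, 1.5);
}

// Hypothetical sketch: infer one axis spec per column from a non-empty array of rows.
function inferAxes(rows, skewThreshold = 1) {
  const keys = Object.keys(rows[0]);
  return keys.map((key) => {
    const values = rows.map((r) => r[key]);
    if (values.every((v) => typeof v === "number")) {
      const min = Math.min(...values);
      const max = Math.max(...values);
      // Strictly positive, strongly right-skewed data: use a log axis instead.
      const useLog = min > 0 && sampleSkewness(values) > skewThreshold;
      return { key, type: useLog ? "log" : "linear", domain: [min, max] };
    }
    // Strings, booleans, or mixed types: categorical axis over the unique values.
    return { key, type: "categorical", domain: [...new Set(values)] };
  });
}

// Map a raw value to [0, 1] along its axis (0 = center, 1 = rim).
function scaleToUnit(axis, value) {
  if (axis.type === "categorical") {
    // (index + 1) / n keeps every category off the center point.
    return (axis.domain.indexOf(value) + 1) / axis.domain.length;
  }
  let [lo, hi] = axis.domain;
  let v = value;
  if (axis.type === "log") {
    [lo, hi, v] = [Math.log(lo), Math.log(hi), Math.log(v)];
  }
  return hi === lo ? 1 : (v - lo) / (hi - lo);
}
```

Placing categories at `(index + 1) / n` rather than starting at 0 is also an assumption; it simply keeps any category from collapsing onto the origin.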

Here’s an intermediate result, plotting multiple data points on the same set of axes, some categorical, some linear:


This isn’t really practical for more than a few data points, of course, since the polygons start to overlap. A more useful test is a small multiples example with some actual data. My colleague was trying to visualize different configuration parameters of data structures for a research project. Because I know a bit about the subject matter of this dataset, I would expect to see distinct groups of charts that look similar within a group but different from the other groups.


I think it mostly accomplishes the main goal we set out with: nodes that have drastically different settings look drastically different (e.g. Trie vs. Range Partitioning), and nodes that have similar settings look similar (e.g. all the B-tree variants, all the data page variants, linked list and skip list).

Will it scale up to 40 dimensions? I’m doubtful:


This isn’t that surprising: 40 is a lot. I suspect how well it works depends on the structure of the data you’re looking at. Removing dimensions does feel a bit like cheating, but even then we are still showing many more dimensions than most alternative visualization types.

One thing to consider is the ordering of the axes: since the variables have no inherent order, it’s unclear which ordering is optimal. However, keeping related axes close to each other might help generate more distinguishable shapes. Grouping the categorical axes together, and likewise the ordinal ones, might also help.
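
One cheap way to experiment with axis ordering is a stable sort of the axis specs, first by a group label and then by axis type. In this hypothetical sketch, `orderAxes` and the `groupOf` callback are illustrative names; the group labels are assumed to come from hand-assigned domain knowledge, and axes are assumed to be objects with `key` and `type` fields (`categorical`, `linear`, or `log`).

```javascript
// Hypothetical ranking: categorical axes first, then numeric (linear/log) axes.
const AXIS_TYPE_RANK = { categorical: 0, linear: 1, log: 1 };

// Reorder axes so that axes in the same user-assigned group are adjacent and,
// within a group, axes of the same type are adjacent.
function orderAxes(axes, groupOf = () => "") {
  // Array.prototype.sort is stable (guaranteed since ES2019), so axes that
  // compare equal keep their original relative order.
  return [...axes].sort((a, b) => {
    const ga = groupOf(a.key);
    const gb = groupOf(b.key);
    if (ga !== gb) return ga < gb ? -1 : 1;
    return (AXIS_TYPE_RANK[a.type] ?? 2) - (AXIS_TYPE_RANK[b.type] ?? 2);
  });
}
```

Whether a given ordering actually yields more distinguishable shapes would still have to be judged by eye on real data.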

Another aspect is color. Giving similar shapes similar colors could be a great improvement: e.g. North-South-heavy instances like linked list and skip list could be colored differently from East-West-heavy instances like data page and trie. It’s not obvious how best to do this.
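
As one possible heuristic, which is an assumption and not from the post: treat the polygon’s vertices as a point cloud, take the orientation of its principal axis from the second moments, and use that orientation as the hue and the degree of elongation as the saturation. North-South-heavy and East-West-heavy glyphs then land on opposite sides of the color wheel. The function name `orientationColor` is hypothetical, and the vertices are assumed to be given relative to the chart center.

```javascript
// Hypothetical heuristic: color a radar glyph by the direction it is stretched in.
// Input: polygon vertices [x, y] relative to the chart center.
function orientationColor(points) {
  // Second moments of the vertex cloud about the center.
  let sxx = 0;
  let syy = 0;
  let sxy = 0;
  for (const [x, y] of points) {
    sxx += x * x;
    syy += y * y;
    sxy += x * y;
  }
  const total = sxx + syy;
  if (total === 0) return "hsl(0, 0%, 50%)"; // degenerate glyph: plain grey
  // Principal-axis angle, in [-pi/2, pi/2]: 0 = East-West, +/-pi/2 = North-South.
  const theta = 0.5 * Math.atan2(2 * sxy, sxx - syy);
  // Elongation: 0 for an evenly spread shape, 1 for a shape collapsed onto a line.
  const anisotropy = Math.hypot(sxx - syy, 2 * sxy) / total;
  // Orientation is axial (N-S is the same as S-N), so a half-turn of theta
  // is mapped onto the full 360-degree hue wheel.
  const hue = Math.round((((theta / Math.PI) * 360) % 360 + 360) % 360);
  const saturation = Math.round(anisotropy * 100);
  return `hsl(${hue}, ${saturation}%, 50%)`;
}
```

Near-circular glyphs come out grey, since they have no dominant direction to encode. This only captures elongation along a single axis, so two shapes with very different dips could still receive the same color.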

What I don’t like about this chart is that it makes categorical variables look ordinal: one way or another, you have to place the category options along the axis line in some defined order. This is more of a semantic nitpick, though. Since the aim is to make different data points look different, it doesn’t necessarily matter much; the individual values matter less than a successful comparison.

We can do something smarter here: the least common option can go closest to the origin and the most common farthest from it. That way, the most “average” data points form something like a large, evenly-radiused polygon, while a data point’s uncommon values show up as “dips” that are easy to tell apart.


Slightly subtler styling. The first 4 shapes remind me of Darwin’s finches for some reason.
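
The frequency-based ordering described above is straightforward to compute: count how often each category occurs and sort ascending, so the rarest option sits nearest the origin. Here is a minimal sketch with hypothetical function names; placing categories at `(index + 1) / n` instead of starting at 0 is an assumed design choice, so that even the rarest option stays off the center point.

```javascript
// Hypothetical sketch: order a categorical axis by frequency, rarest first,
// so the rarest category sits closest to the center and the most common at the rim.
function frequencyOrderedDomain(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  // Ascending by count. Map keeps first-seen order and sort is stable,
  // so ties stay in the order in which the categories first appeared.
  return [...counts.entries()]
    .sort((a, b) => a[1] - b[1])
    .map(([value]) => value);
}

// Position of a value along its axis, in (0, 1]: the most common category maps to 1.
function categoryRadius(domain, value) {
  return (domain.indexOf(value) + 1) / domain.length;
}
```

A data point whose values are all the most common options then traces the outer, evenly-radiused polygon, and each uncommon setting appears as an inward dip on its axis.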

All in all, though, not bad for a relatively small amount of effort! If you want to try it, here are some demos and the code:

It’s not bulletproof, as I didn’t get to spend much time on it (“the last 20% takes 80% of the time”), but it only takes a few lines of code to draw a chart, and it should do the job!

Acknowledgements: Thanks to Nicky Case, John Christian, Katherina Nguyen, Will Strimling and Justin Woodbridge for providing valuable help, advice, and feedback!



Posted from my blog with SteemPress: https://selfscroll.com/mis-employing-radar-charts-to-distinguish-multidimensional-data/
