Last week the NDSR Boston cohort visited with Helen Bailey, a digital curation analyst at MIT. In her spare time, Helen has become a data visualization expert. Helen provides data visualization support to the MIT Libraries and is sharing her knowledge of data visualization through presentations and workshops. If you think you are unfamiliar with data visualization, think again. I guarantee you have used data visualizations and maybe even created a few.
To set the record straight, let’s define the term before giving some examples and talking about why data visualizations are useful and what it takes to produce them. Helen offered the following two definitions:
“Information visualization is a mapping between discrete data and a visual representation.” from Lev Manovich, “What Is Visualization”
“Information visualization is a set of technologies that use visual computing to amplify human cognition with abstract information.” from Stuart Card, “Information Visualization”
While both definitions make sense, I prefer the second because a well-chosen visualization provides meaning and understanding where there might otherwise be only information overload. It seems to me that the age-old saying “a picture is worth a thousand words” is appropriate when discussing the purpose and usefulness of data visualizations.
Helen notes that visualizing data can be used to summarize a data set, highlight specific aspects of the data, and identify patterns and outliers. Data, typically organized in tables or spreadsheets, can be almost impossible to digest, especially in large quantities. Even smaller data sets can contain so many rows and columns that they run right off your screen, making it difficult to draw conclusions or spot trends. Organizing the raw data into visual representations is often the most practical way to make the data useful.
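To make the point about patterns and outliers concrete, here is a minimal sketch in Python of a text-only histogram. The function name `text_histogram` and the sample numbers are my own inventions for illustration; they do not come from Helen's presentation or from any real data set.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Group integer values into fixed-width bins and draw one '#' per value."""
    # Map each value to the lower edge of its bin, then count values per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from the lowest to the highest so empty bins show as gaps.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        bar = "#" * counts[start]  # Counter returns 0 for bins with no values
        lines.append(f"{start:>3}-{end:<3} {bar}".rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical values, invented for illustration only.
    sample = [12, 15, 17, 18, 21, 22, 22, 24, 25, 27, 29, 31, 33, 78]
    for line in text_histogram(sample, 10):
        print(line)
```

Read as a raw list, the one unusually large value is easy to miss. Drawn as bars, it sits alone after a run of empty bins, which is the kind of outlier a visualization is meant to surface.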
The first steps in creating data visualizations are to determine:
- What questions will the data answer?
- How will the visualization be used?
- What is the best type of visualization to use?
- Who is the target audience using the visualization?
It’s important to answer each of these questions because there are so many types of visualizations available. You can’t be a one-trick pony, reusing the same representation for all occasions. Certain representations work better for temporal (when), geospatial (where), topical/statistical (what, how much), relational (with whom) and hierarchical (ordered relationships) data.
The types of representations range from simple to complex and from traditional to innovative. The names of the visualizations in the list below will either allow you to draw a picture in your mind’s eye or send you for a dictionary. A few types of the many available to consider are:
- Gantt Charts, Stream Graphs and Alluvial Diagrams for temporal representations
- Choropleths and Cartograms for geospatial representations
- Histograms, Pie Charts and Heat Maps for topical/statistical representations
- Node-link, Chord and Arc Diagrams for relational representations
- Dendrograms, Treemaps and Radial Trees for hierarchical representations
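If you like to keep this kind of cheat sheet in code, the list above can be sketched as a simple lookup table in Python. The names `REPRESENTATIONS` and `suggest_representations` are hypothetical helpers of my own, and the table holds only the examples listed above, not a complete catalog.

```python
# Representation types grouped by the kind of data they suit.
# Entries are copied from the list above; this is not an exhaustive catalog.
REPRESENTATIONS = {
    "temporal": ["Gantt chart", "stream graph", "alluvial diagram"],
    "geospatial": ["choropleth", "cartogram"],
    "topical/statistical": ["histogram", "pie chart", "heat map"],
    "relational": ["node-link diagram", "chord diagram", "arc diagram"],
    "hierarchical": ["dendrogram", "treemap", "radial tree"],
}


def suggest_representations(data_kind):
    """Return candidate representations for a kind of data (hypothetical helper)."""
    key = data_kind.strip().lower()
    if key not in REPRESENTATIONS:
        raise ValueError(f"unknown data kind: {data_kind!r}")
    # Return a copy so callers cannot accidentally modify the table.
    return list(REPRESENTATIONS[key])


if __name__ == "__main__":
    print(suggest_representations("geospatial"))
```

A lookup like this only narrows the field. Choosing among the candidates still depends on your questions and your audience.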
It’s easy to be overwhelmed by the choices. Helen presented a decision tree designed to help identify which representation to use depending on the parameters of your project. Do you need to show comparisons in your data over time, with just a few periods of time but many categories? Try a Column or Line Chart. But remember what Ben Fry mentions in Visualizing Data: data visualization is just another form of communication, and it will only be successful if the representations make sense to your audience.
Are you interested in creating a data visualization for your project? Four members of our group were, and one of us has already created one of her own. Simple data visualizations, such as line, bar and pie charts, can be created with the spreadsheet application installed on your computer. If you have more complex data and feel like challenging yourself, there are several online tools available. Helen recommended and gave brief introductions to Voyager (http://vega.github.io/voyager), Tableau (http://www.tableau.com) and RAW (http://raw.densitydesign.org/), to mention only a few. Be forewarned, though: some of these data visualization tools have a steep learning curve and may be easier to use if you have some experience with coding and scripting.
If all else fails, use your Photoshop skills and convert your favorite data visualization into a piece of modern art or a poster to hang on your wall.
Thank you, Helen Bailey, for introducing NDSR Boston to data vis! And thank you for reading.
- Spreadsheet image produced from a data set from the US Dept. of Labor, Bureau of Labor Statistics retrieved from http://www.bls.gov
- U.S. map image, Unemployment data visualization, created by Mike Bostock, retrieved from http://bl.ocks.org/mbostock/4060606
- Alluvial Diagram image retrieved from http://www.mapequation.org/apps/AlluvialGenerator.html
- Cartogram image retrieved from http://www.stephabegg.com/home/projects/cartograms
- Arc Diagram image retrieved from http://www.chrisharrison.net/index.php/Visualizations/BibleViz