Data Visualization

A Guide for Creating Great Visualizations

Introduction to EDA

Exploratory data analysis is a state of mind, a way of thinking about data analysis--and also a way of doing it. Certain techniques facilitate the exploration of data, but their use alone does not make one an exploratory data analyst. Instead, it requires a certain approach to the analysis of data, a certain perspective.
- Frederick Hartwig and Brian Dearing

Before you visualize a dataset, exploratory data analysis (EDA) can help you make sense of your data. EDA helps an analyst find mistakes in the data, check existing assumptions, and select appropriate visualizations.

Many of the tools that help you build data visualizations can also be used to explore data. Analysts also rely on more statistically focused tools, including MATLAB, SPSS, SAS, and JMP. JMP and MATLAB are available free of charge through TERPware; SPSS and SAS are available at a reduced rate to UMD students and staff.

Basic Techniques

In his 2018 open-source textbook, Howard Seltman of Carnegie Mellon University identifies some basic techniques for exploring different types of data. Some examples of his techniques are summarized below.

1. For categorical data, begin by finding the frequency of data in each category:

[Figure: frequency of observations in each category]

2. For a single variable with a range of values, observe the data's spread by plotting a histogram. Experiment with several bin widths: bins that are too wide can hide gaps, outliers, or multiple peaks, while bins that are too narrow can bury the overall shape in noise. The histograms below plot temperature distributions with bin widths of 1 °F and 5 °F:

[Figure: temperature histograms with bin widths of 1 °F and 5 °F]

3. Box-and-whisker plots can show the distribution of many variables at once:

[Figure: box-and-whisker plots of several variables]

4. Scatterplots can show the correlation between pairs of variables.

[Figure: scatterplot]
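Each of the four techniques above reduces to a small computation that can be run before any chart is drawn. The sketch below is not taken from Seltman's text. It uses only the Python standard library, and the sample data and function names (`frequency_table`, `histogram_counts`, `five_number_summary`, `pearson_r`) are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical sample data (not from the guide): daily weather observations.
conditions = ["sunny", "rain", "sunny", "cloudy", "rain", "sunny", "cloudy", "sunny"]
temps_f = [61, 63, 64, 64, 66, 67, 68, 70, 71, 71, 72, 74, 75, 78, 83]
humidity = [80, 78, 79, 75, 74, 72, 70, 69, 66, 67, 64, 60, 61, 55, 50]


def frequency_table(values):
    """Technique 1: count how often each category occurs."""
    return Counter(values)


def histogram_counts(values, bin_width):
    """Technique 2: count values per bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Technique 3: the minimum, quartiles, and maximum that a box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def pearson_r(xs, ys):
    """Technique 4: linear correlation, the trend a scatterplot makes visible.

    Raises ZeroDivisionError if either variable is constant.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / (spread_x * spread_y) ** 0.5


print(frequency_table(conditions))
print(histogram_counts(temps_f, 1))   # narrow bins: one count per distinct value
print(histogram_counts(temps_f, 5))   # wider bins: smoother overall shape
print(five_number_summary(temps_f))
print(pearson_r(temps_f, humidity))
```

Comparing the two `histogram_counts` results shows how bin width changes the apparent shape of the same data. A plotting tool would draw these same quantities as a bar chart, histograms, a box plot, and a scatterplot.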
