
Data visualization through graphs, charts, and other visualization tools is essential for understanding the full story behind the data.
The course “The Complete Tableau Bootcamp for Data Visualization” provides two excellent examples that illustrate the importance of data visualization.
The first set, known as Anscombe’s Quartet and developed by statistician Francis Anscombe, comprises four distinct datasets. While the values in each dataset are different, the descriptive statistics of the four datasets are nearly identical. Based on those statistics alone, we would expect the datasets to look very similar. When the datasets are graphed, however, the four graphs look very different despite their similar summary statistics. This example shows that, with descriptive statistics and numerical information alone, we cannot always see the full picture.
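To make this concrete, here is a minimal sketch in Python that computes the summary statistics for each of the four datasets. It is not taken from the course. The values are the ones published for Anscombe's Quartet, and the helper names (`pearson_r`, `summarize`, `summaries`) are my own choices for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's Quartet. Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the summary statistics usually quoted for the quartet."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}

for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows should be almost indistinguishable from one another, which is exactly why a table of these numbers hides what a scatter plot of each dataset reveals.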
The second set is known as the Datasaurus Dozen. It was created by Justin Matejka and George Fitzmaurice and inspired by the original Datasaurus dataset created by Alberto Cairo. The Datasaurus is a set of data points that exhibit normal-seeming statistics. However, when these data points are plotted, they show the shape of a dinosaur. The Datasaurus Dozen is composed of 13 datasets (the Datasaurus, plus 12 others). While each dataset appears drastically different from the others, they all have the same summary statistics (X/Y mean, X/Y standard deviation, and Pearson’s correlation) to two decimal places.
These examples demonstrate that we should not rely on summary statistics alone and that data visualization tools are critical to proper data analysis. Visualizing data makes similarities and differences among datasets apparent.
To illustrate the importance of data visualization, I created the “datasaurus” plots in Tableau using the data provided by the authors mentioned above:
- Datasaurus
- Datasaurus Dozen (use the slider to switch among 13 datasets)