Keep Calm and Become Data Savvy

This workshop will help you fill in the context that makes numerical data and statistics more than something to skip over or run away from.

What does it say?

When reading data tables, charts, or graphs in a published article, consider these three steps:

  1. What information is presented: title, units, caption, source?
  2. In the text of the article, how does the author describe the data and the way it is presented in the table, chart, or graph?
  3. In the text of the article, how does the author interpret or analyze the data?

Example: What does this data table say?

Ignatow, Gabe, Sarah M. Webb, Michelle Poulin, et al. 2012. Public Libraries and Democratization in Three Developing Countries: Exploring the Role of Social Capital. Libri. 62(1): 67-80.

Example: What does this graph say?

screen capture from 200 countries in 200 years video

What's going on with these numbers?

First things first: slow down. Don't focus on the numbers in the table right away. Instead, carefully review the details around the edges: what information is given by the title or header? What are the row and column labels? Are there any footnotes or references underneath the table? All of this information can help you understand the context of the numbers that are inside the table.

Questions to ask (and answer!) when looking at numerical data or statistics:

  • What's being counted or summarized?
  • What units are being used: thousands of dollars (CAD? USD?), individual spectators, percentage change (from what?), percentage of total, etc.?
  • Who collected and/or summarized the data?
  • What questions were asked, or what sources were used, to find, solicit, compile, collect, or create the data?
  • What was the purpose of collecting the data in the first place?
  • How does all of that fit with what YOU want to do with these numbers?
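The question about units matters because "percentage change" and "percentage of total" can be computed from the same count and give very different answers. Below is a minimal Python sketch of the two calculations. The function names and the attendance figures are hypothetical and chosen only for illustration; they do not come from any data set discussed here.

```python
def percent_change(old_value: float, new_value: float) -> float:
    """Percentage change from old_value to new_value (the 'from what?' is old_value)."""
    if old_value == 0:
        raise ValueError("percentage change is undefined when the starting value is 0")
    return (new_value - old_value) / old_value * 100


def percent_of_total(part: float, total: float) -> float:
    """Share that `part` represents of `total`, expressed as a percentage."""
    if total == 0:
        raise ValueError("percentage of total is undefined when the total is 0")
    return part / total * 100


# Hypothetical attendance counts, for illustration only.
spectators_last_year = 400
spectators_this_year = 500
all_spectators_this_year = 4000  # assumed total across every venue

change = percent_change(spectators_last_year, spectators_this_year)
share = percent_of_total(spectators_this_year, all_spectators_this_year)

print(f"Change from last year: {change}%")   # 25.0%
print(f"Share of this year's total: {share}%")  # 12.5%
```

The same count of 500 spectators is a 25% increase over last year but only 12.5% of this year's total, so a table that reports "%" without saying which one it means cannot be read reliably.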

Limitations to keep in mind

Many factors affect what data is collected and why, and other factors affect what can be shared with others. Four common issues that arise with published data and statistics are (1) the need to protect privacy, (2) the effort to control for accuracy and precision, (3) mandated measurements (such as the census), and (4) pre-existing categories used to organize the data.

1. Because of privacy concerns, some data may be restricted: when the population being counted is very small, it would be possible to identify an individual person or business.

2. If there are concerns about the methods used to collect the data, or if it wasn't possible to confirm the accuracy of the data, it might not be made available. Some statistical calculations must meet specific criteria to be considered valid: for example, if the number of data points is too small, or if the method of obtaining the data was inconsistent, the resulting statistic isn't considered accurate and may not be published.

3. Many surveys, including the national census, are required by law or regulation; in some cases, the specific questions and responses collected are explicitly outlined by a government agency or ruling. These regulations can change over time, so the questions asked 10 or 20 years ago might not be the same as those asked today. Consequently, comparing data over time may be complicated or impossible.

4. Standardized methods and categories are used by many groups so that data sets and statistics are easier to share and compare. It is convenient to use these standards, but they might not perfectly match your specific question. For example, NAICS codes are commonly used in Canada and the US to collect economic and labour statistics based on industry. Each specialized industry has a single NAICS code that is a subset of a larger category, which is in turn part of an even larger category, and so on. The hierarchical arrangement defined by NAICS might not always fit the way you would like to categorize the industry. These codes can also change over time, which is something to keep in mind if you are looking at statistics from different decades: NAICS 1997 had 3 codes for internet-related industries, while NAICS 2012 has 57.
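To make the nesting described in point 4 concrete, here is a small Python sketch that splits a NAICS code into the broader categories it belongs to. It relies on the way NAICS codes are built: each additional digit narrows the category, from a 2-digit sector down to a 6-digit national industry. The function name is hypothetical, and the sample code string is used only to show the mechanics; check the official NAICS manual for the edition you are working with before relying on any particular code.

```python
# Names of the NAICS hierarchy levels, keyed by the number of digits in the code.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}


def naics_ancestry(code: str) -> list[tuple[str, str]]:
    """Return (level name, code prefix) pairs from the broadest category to the given code."""
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError("expected a NAICS code of 2 to 6 digits")
    return [(NAICS_LEVELS[n], code[:n]) for n in range(2, len(code) + 1)]


# Sample 6-digit code, used here only to illustrate the prefix structure.
for level, prefix in naics_ancestry("519130"):
    print(f"{level:>17}: {prefix}")
```

Statistics published at the 3-digit level combine every industry whose code starts with those three digits, so if the grouping you care about cuts across prefixes, you may need to assemble it yourself from the more detailed codes (where those are published).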