What's in the Guide
This guide provides access to, and information about, a range of resources for finding and using numerical data and statistics.
In addition to the resources available here, please also check the Business research guides for information about specific business and financial data sources. The Map Library has several guides and tutorials for finding and using geospatial data and GIS tools.
Is there a difference between statistics and data?
The two terms are often used interchangeably. Although there are some commonly understood distinctions, there are also grey areas: statistics are certainly a kind of data, and data are used to generate statistics.
Statistics often are:
- facts or figures
- time series
- tables, charts, or graphs
- used to support an argument
- 'ready to use'
Data can generally be used to:
- test hypotheses
- generate custom tables
- look at responses of individuals
- analyze in SPSS, SAS, or Stata
- run regressions, t-tests, ANOVA, etc.
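To make the distinction concrete, here is a minimal Python sketch that starts from individual responses (data) and produces a small custom table of group averages (a statistic). The records, field names, and the `mean_by_group` helper are all made up for illustration; they do not come from any real survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical microdata: one record per survey respondent (made-up values).
responses = [
    {"region": "East", "hours_online": 10},
    {"region": "East", "hours_online": 14},
    {"region": "West", "hours_online": 6},
    {"region": "West", "hours_online": 8},
    {"region": "West", "hours_online": 10},
]

def mean_by_group(records, group_key, value_key):
    """Summarize individual records into one average per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}

# The resulting table is a "ready to use" statistic; the records above are the data.
table = mean_by_group(responses, "region", "hours_online")
```

With the individual records in hand you could regroup by a different field or run a different analysis; with only the finished table you could not.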
What's going on with these numbers?
First things first: slow down. Don't focus on the numbers in the table right away. Instead, carefully review the details around the edges: what information is given by the title or header? What are the row and column labels? Are there any footnotes or references underneath the table? All of this information can help you understand the context of the numbers that are inside the table.
Questions to ask (and answer!) when looking at numerical data or statistics:
- What's being counted or summarized?
- What units are being used? Thousands of dollars (CAD? USD?), individual spectators, percentage change (from what?), percentage of total, etc.
- Who collected and/or summarized the data?
- What questions were asked, or what sources were used, to find, solicit, compile, collect, or create the data?
- What was the purpose of collecting the data in the first place?
- How does all of that fit with what YOU want to do with these numbers?
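The question about units matters because the same figure can mean very different things. The short Python sketch below uses made-up attendance numbers and two hypothetical helper functions to show that "25%" may be a percentage change from an earlier value or a percentage of a total.

```python
def percent_change(old, new):
    """Percentage change from an earlier value to a later one."""
    if old == 0:
        raise ValueError("percentage change is undefined when the starting value is 0")
    return (new - old) / old * 100

def percent_of_total(part, total):
    """Share of a total, expressed as a percentage."""
    if total == 0:
        raise ValueError("percentage of total is undefined when the total is 0")
    return part / total * 100

# Made-up figures: attendance at one venue rose from 2,000 to 2,500,
# while attendance across all venues this year was 10,000.
change = percent_change(2000, 2500)
share = percent_of_total(2500, 10000)
```

Both results equal 25.0, yet one describes growth over time and the other describes a share of the whole, which is why a table's labels and footnotes deserve a careful read.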
Limitations to keep in mind
Many factors can affect what data is collected and why, and other factors affect what can be shared with others. A few common issues that arise with published data and statistics include 1. the need to protect privacy, 2. the effort to control for accuracy and precision, 3. mandated measurements (such as the census), and 4. pre-existing categories with which to organize the data.
1. Because of privacy concerns, some data may be restricted: when the population being counted is very small, it would be possible to identify an individual person or business.
2. If there are concerns regarding the methods used to collect the data, or if it wasn't possible to confirm the accuracy of the data, it might not be made available. Some statistical calculations must meet specific criteria to be considered valid: for example, if the number of data points is too small, or if the method of obtaining the data was inconsistent, the resulting statistic isn't considered accurate and may not be published.
3. Many surveys, including the national census, are required by law or regulation; in some cases, the specific questions and responses collected are explicitly outlined by a government agency or ruling. These regulations can change over time, so the questions asked 10 or 20 years ago might not be the same as those asked today. Consequently, comparing data over time may be complicated or impossible.
4. Standardized methods and categories are often used by many groups to more easily share and compare data sets and statistics. It is convenient to use these standards, but they might not perfectly match your specific question. For example, NAICS codes are commonly used in Canada and the US to collect economic and labour statistics based on industry. Each specialized industry will have a single NAICS code that is a subset of a larger category, which is in turn part of an even larger category, etc. The hierarchical arrangement defined by NAICS might not always fit the way you would like to categorize the industry. These codes can also change over time, something to keep in mind if you are looking at statistics from different decades: NAICS 1997 had 3 codes for internet-related industries. NAICS 2012 has 57.
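The nesting described in item 4 can be sketched in Python. NAICS codes are up to six digits long, and each shorter prefix names a broader level (2 digits for the sector, 3 for the subsector, 4 for the industry group, 5 for the NAICS industry, 6 for the national industry). The function names and the establishment counts below are made up for illustration, and the example codes are used only to show the prefix structure.

```python
from collections import defaultdict

# Level name for each prefix length of a 6-digit NAICS code.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}

def naics_hierarchy(code):
    """Return (level name, code prefix) pairs, from broadest to most specific."""
    if not (code.isdigit() and len(code) == 6):
        raise ValueError("expected a 6-digit NAICS code")
    return [(level, code[:digits]) for digits, level in NAICS_LEVELS.items()]

def roll_up(counts, digits):
    """Aggregate counts keyed by 6-digit code up to a broader prefix level."""
    totals = defaultdict(int)
    for code, count in counts.items():
        totals[code[:digits]] += count
    return dict(totals)

# Made-up establishment counts keyed by 6-digit code.
counts = {"541511": 120, "541512": 80, "541611": 50}
by_group = roll_up(counts, 4)
by_sector = roll_up(counts, 2)
```

Rolling up only works along the published hierarchy. If your own definition of an industry cuts across these prefixes, you would need to pick the individual codes yourself, and check whether those codes existed in the NAICS version used for the years you are comparing.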
Maps, Data & GIS
Please feel free to contact anyone in Maps, Data & GIS; we want to hear from you:
General Inquiries: Maps & GIS
General Inquiries: Data
Linda Lowry, Business & Economics Librarian
on leave July 2014 - June 2015
Evelyn Smith, Data Research Assistant
Phone: 905-688-5550 Ext. 4897
Sharon Janzen, Map Library Associate/Geospatial Data Coordinator
Phone: 905-688-5550 Ext. 5890