Sources and support for finding and using numerical, spatial, and statistical data.

The terms "data" and "statistics" are often used interchangeably. Although there are some commonly understood distinctions, there are also grey areas: statistics are certainly a kind of data, and data are used to generate statistics.

Statistics often are:

- facts or figures
- time series
- tables, charts, or graphs
- used to support an argument
- 'ready to use'

Data can generally be used to:

- test hypotheses
- generate custom tables
- look at responses of individuals
- analyze in SPSS, SAS, or Stata
- run regression, t-tests, ANOVA, etc.

Another distinction to consider is whether you need microdata or aggregate data. Microdata is the original information, unprocessed except to protect the privacy of participants: for example, the income reported by each household, or the height and species of each tree in a park. Aggregate data is summarized and combined in some way: for example, the average income in a census block, or the number of oak trees in a city park.
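As a minimal sketch of the difference, the Python snippet below starts from a few household-level records (microdata) and summarizes them into one average per block (aggregate data). The records, the block labels, and the function name `average_income_by_block` are all invented for illustration.

```python
from statistics import mean

# Hypothetical microdata: one record per household (values are invented).
households = [
    {"block": "A", "income": 52000},
    {"block": "A", "income": 61000},
    {"block": "B", "income": 48000},
    {"block": "B", "income": 75000},
    {"block": "B", "income": 57000},
]


def average_income_by_block(records):
    """Summarize household-level records into one average per block."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["block"], []).append(record["income"])
    return {block: mean(incomes) for block, incomes in grouped.items()}


# Aggregate data: the individual households are no longer visible.
aggregate = average_income_by_block(households)
print(aggregate)
```

Note that the step only goes one way: the averages can be computed from the household records, but the household records cannot be recovered from the averages.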

First things first: **slow down**. Don't focus on the numbers in the table right away. Instead, carefully review the details around the edges: what information is given by the title or header? What are the row and column labels? Are there any footnotes or references underneath the table? All of this information can help you understand the context of the numbers that are inside the table.

Questions to ask (and answer!) when looking at numerical data or statistics:

- What's being counted or summarized?
- What units are being used: thousands of dollars (CAD? USD?), individual spectators, percentage change (from what?), percentage of total, etc.?
- Who collected and/or summarized the data?
- What questions were asked, or what sources were used, to find, solicit, compile, collect, or create the data?
- What was the purpose of collecting the data in the first place?
- How does all of that fit with what **YOU** want to do with these numbers?
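To see why the question about units matters, here is a small worked example in Python. The attendance figures and the function names are hypothetical; the point is only that the same "25%" can come from two different calculations.

```python
def percent_change(old, new):
    """Change from `old` to `new`, as a percentage of `old`."""
    return (new - old) / old * 100


def percent_of_total(part, total):
    """Share of `total` represented by `part`, as a percentage."""
    return part / total * 100


# Hypothetical spectator counts (invented for illustration).
last_season = 2000
this_season = 2500
league_total_this_season = 10000

growth = percent_change(last_season, this_season)               # compared with last season
share = percent_of_total(this_season, league_total_this_season)  # compared with the league
print(growth, share)
```

Both results are 25.0, yet one describes growth over time and the other describes a share of a whole, so a table that reports "25%" needs its label or footnote to say which one is meant.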

Many factors can affect what data is collected and why, and other factors affect what can be shared with others. A few common issues that arise with published data and statistics include (1) the need to protect privacy, (2) the effort to control for accuracy and precision, (3) mandated measurements (such as the census), and (4) pre-existing categories with which to organize the data.

1. Because of privacy concerns, some data may be restricted when the population being counted is so small that it would be possible to identify an individual person or business.

2. If there are concerns regarding the methods used to collect the data, or if it wasn't possible to confirm the accuracy of the data, it might not be made available. Some statistical calculations must meet specific criteria to be considered valid: for example, if the number of data points is too small, or if the method of obtaining the data was inconsistent, the resulting statistic isn't considered accurate and may not be published.

3. Many surveys, including the national census, are required by law or regulation; in some cases, the specific questions and responses collected are explicitly outlined by a government agency or ruling. These regulations can change over time, so the questions asked 10 or 20 years ago might not be the same as those asked today. Consequently, comparing data over time may be complicated or impossible.

4. Standardized methods and categories are often used by many groups to more easily share and compare data sets and statistics. It is convenient to use these standards, but they might not perfectly match your specific question. For example, NAICS codes are commonly used in Canada and the US to collect economic and labour statistics based on industry. Each specialized industry will have a single NAICS code that is a subset of a larger category, which is in turn part of an even larger category, etc. The hierarchical arrangement defined by NAICS might not always fit the way you would like to categorize the industry. These codes can also change over time, something to keep in mind if you are looking at statistics from different decades: NAICS 1997 had 3 codes for internet-related industries. NAICS 2012 has 57.
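The hierarchical arrangement of NAICS can be sketched in Python as follows. NAICS codes are up to six digits long, and the leading digits identify the broader category (two digits for the sector, three for the subsector). The establishment counts and the helper `roll_up` are invented for illustration, and the codes are treated purely as strings; the sketch also ignores the fact that a few sectors span more than one two-digit prefix.

```python
from collections import defaultdict

# Hypothetical establishment counts keyed by six-digit NAICS-style codes.
# The counts are invented for illustration; they are not real statistics.
counts = {
    "511210": 40,
    "518210": 25,
    "519130": 10,
    "541511": 120,
}


def roll_up(counts_by_code, digits):
    """Sum counts by the first `digits` characters of each code."""
    totals = defaultdict(int)
    for code, count in counts_by_code.items():
        totals[code[:digits]] += count
    return dict(totals)


by_sector = roll_up(counts, 2)     # broadest level in this sketch
by_subsector = roll_up(counts, 3)  # one level more specific
print(by_sector)
print(by_subsector)
```

The roll-up only ever follows the prefixes the standard defines. If your own idea of an "industry" cuts across those prefixes, you would need to build the grouping yourself from the most detailed codes available.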

- Last Updated: Sep 22, 2020 11:29 AM
- URL: https://researchguides.library.brocku.ca/data