MAT 221
January 20th, 2016
Elementary Probability and Statistics I
Overview

Statistics is the science of learning from data.

Statistics includes collecting data, organizing data, analyzing data, drawing conclusions
from data and presenting study results.

We often want to know a specific piece of information about a large group of people or
things. If we could get this information from every person or thing in the group, then we
would not need statistics.
The two main activities of statistics:

Estimating a characteristic of the population

Testing a hypothesis or claim about the population
Chapter 1: Looking at Data – Distributions
1.1 Data

In a study, we collect information (data) from cases. Cases can be individuals, companies,
animals, plants, or any objects of interest.

A variable is any characteristic of a case. A variable varies among cases.

A label is a special variable used in some data sets to distinguish the different cases. Each
case has a unique label. Also, a label is not a variable that we are interested in studying. It
is only used to tell the cases apart.

The distribution of a variable tells us what values the variable takes and how often it
takes these values.

Data consists of numbers (or categories) recorded for the cases along with the context.

A quantitative variable is a variable that is given by numerical values for which
arithmetic operations, such as adding and averaging, make sense.

A categorical variable is a variable that is given by one of several categories. What we can
summarize is the count or proportion of cases in each category.
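
To make the idea of a distribution concrete, the sketch below tallies the count and proportion of cases in each category of a categorical variable. The data set is invented for illustration: the cases are hypothetical students, `id` plays the role of the label, and `major` is an assumed categorical variable. The helper name `distribution` is also just an illustrative choice.

```python
from collections import Counter

# Hypothetical data set (not from the course): each case is a student.
# "id" is the label; "major" is a categorical variable.
cases = [
    {"id": 1, "major": "Math"},
    {"id": 2, "major": "Biology"},
    {"id": 3, "major": "Math"},
    {"id": 4, "major": "History"},
    {"id": 5, "major": "Math"},
]


def distribution(cases, variable):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(case[variable] for case in cases)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


dist = distribution(cases, "major")
for category, (count, proportion) in dist.items():
    print(f"{category}: count={count}, proportion={proportion:.0%}")
```

The proportions always sum to 1 (that is, 100%), because every case falls in exactly one category.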
January 22nd, 2016
1.2 Displaying Distributions with Graphs

To present categorical data, use bar graphs and pie charts
o The percentages of all categories must add up to 100%

To correctly interpret a graph, you must analyze the numerical information given in the
graph, so as not to be misled by the graph’s shape
o Read labels and units on the axes

Numerical scales such as timelines should progress from left to right or bottom to top

The slices in a pie chart should represent nonoverlapping parts of a whole

To present quantitative data, use stem plots and histograms
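
As a rough illustration of a stem plot, the sketch below splits each value into a stem (the tens) and a leaf (the ones digit). It assumes non-negative whole numbers, and both the function name `stemplot` and the exam scores are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def stemplot(values):
    """Return stem plot lines for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include stems that have no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:2d} | {leaf_text}")
    return lines


# Hypothetical exam scores, invented for illustration.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 91, 98]
for line in stemplot(scores):
    print(line)
```

Unlike a histogram, a stem plot keeps the individual data values visible, which is practical for small data sets.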

Each vertical bar in a histogram is called a bin; the width of each bin is called the bin size
o Rule of thumb: start with 5 to 10 bins, look at the distribution, and refine your bins
o There isn’t a unique or “perfect” solution
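
The rule of thumb can be tried out with a small sketch that splits the range of the data into equal-width bins and counts the observations in each one. The function name `histogram_counts` and the exam scores are assumptions made for illustration, and equal-width bins spanning the minimum to the maximum are only one possible choice.

```python
def histogram_counts(values, num_bins):
    """Split the range of values into equal-width bins and count each bin."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal; cannot form bins")
    bin_size = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value goes into the last bin instead of a new one.
        index = min(int((v - low) / bin_size), num_bins - 1)
        counts[index] += 1
    return bin_size, counts


# Hypothetical exam scores, invented for illustration.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 91, 98]

# Try two bin counts and compare the resulting shapes.
for num_bins in (5, 10):
    bin_size, counts = histogram_counts(scores, num_bins)
    print(f"{num_bins} bins (bin size {bin_size:.1f}):")
    for i, count in enumerate(counts):
        start = min(scores) + i * bin_size
        print(f"  from {start:5.1f} | {'*' * count}")
```

Comparing the two printouts shows why the number of bins is a judgment call: fewer bins give a smoother picture, while more bins show finer detail but can leave some bins nearly empty.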

Outliers are observations (numbers) that lie outside the overall pattern of a distribution.