StatsIQ · AP Statistics

Lesson 1: What Is Statistics?

Unit 1 · Phase 1 · Statistical Practice: 1 — Formulate Questions

Topics: Categorical vs. quantitative variables (discrete vs. continuous), individuals/observational units, population vs. sample, parameter vs. statistic, descriptive vs. inferential statistics, what makes a good statistical question, formulating an investigative question, the investigative process (ask → collect → analyze → interpret)

Calculator: Light intro — what a TI-84 will and won't do for you. No heavy computation this lesson.

Objectives:

Tell the difference between a categorical and a quantitative variable — and between discrete and continuous — for any data you meet.
Sort out who's who in a study: individuals, population, sample, parameter, and statistic, without mixing them up.
Turn a vague curiosity ("Are athletes healthier?") into a sharp, answerable investigative question that data can actually settle.

(a) Warm-Up

Open your phone's screen-time report. You're looking at a tiny statistical study — and you didn't even design it. Your phone recorded a number of minutes every day, sorted them by app, and handed you an average.

Now answer this honestly: is your screen time going up or down?

Here's the thing — to answer that, you need more than a number. You need to know which days got measured (just this week? this month?), what counts as screen time (does a podcast playing in your pocket count?), and what "going up" even means (up compared to what?).

That fuzzy feeling you have right now — "I'm not totally sure what I'm even being asked" — is exactly the problem statistics exists to solve. Every good statistical study starts by pinning down a vague wondering into a question that data can actually answer. That's where we start today, and it's the first thing the new AP exam tests.

(b) Core Concept

Statistics is the science of learning from data. But "data" doesn't mean a spreadsheet of random numbers — it means measurements taken on individuals to answer a question. Let's build the vocabulary, because on the AP exam, using the wrong word costs points.

Individuals and variables

The individuals (also called observational units) are the people, animals, or things you collect data about. They don't have to be people — they could be cars, countries, tweets, or basketball games.

A variable is any characteristic you record about each individual. If your individuals are students, your variables might be height, GPA, favorite sport, and number of siblings.

Variables come in two flavors, and telling them apart is a skill you'll use in every single lesson:

A categorical variable places each individual into a group or category. Eye color, jersey number, team, whether someone passed a test (yes/no), zip code. The values are labels, not amounts, and "doing math" on them (like averaging zip codes) is meaningless.
A quantitative variable is a numerical measurement where arithmetic makes sense. Height, number of siblings, points scored, temperature. You can meaningfully add them and average them.

Watch out: numbers aren't automatically quantitative. A jersey number is a label (categorical) even though it's written with digits. Ask yourself: "Does averaging these produce something meaningful?" Average jersey number = nonsense. Average height = useful. That's your test.

Quantitative variables split further:

Discrete variables can only take separated, countable values — usually whole numbers you get by counting. Number of siblings, number of texts sent today, number of cars in a lot. You can't have 2.5 siblings.
Continuous variables can take any value in an interval — usually things you get by measuring. Height, weight, time, temperature. Between 5.0 and 6.0 feet, infinitely many heights are possible (5.43 ft, 5.431 ft, ...).
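
One way to see why the type matters is to summarize each kind of variable. The sketch below is a minimal Python illustration using a made-up four-student roster (the field names and values are hypothetical, not real data). Categorical variables are summarized with counts per category; quantitative variables with arithmetic such as a mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical roster: each dict is one individual, each key is a variable.
students = [
    {"sport": "soccer", "jersey": 10, "siblings": 2, "height_cm": 170.2},
    {"sport": "tennis", "jersey": 4, "siblings": 0, "height_cm": 162.5},
    {"sport": "soccer", "jersey": 23, "siblings": 1, "height_cm": 181.0},
    {"sport": "track", "jersey": 7, "siblings": 3, "height_cm": 175.3},
]

# Categorical: count how many individuals fall in each category.
sport_counts = Counter(s["sport"] for s in students)

# Quantitative, discrete: a count per individual, so the mean is meaningful.
mean_siblings = mean(s["siblings"] for s in students)

# Quantitative, continuous: a measurement, so the mean is meaningful.
mean_height = mean(s["height_cm"] for s in students)

# Python will average jersey numbers too, but the result describes nothing.
mean_jersey = mean(s["jersey"] for s in students)

print(sport_counts)
print(mean_siblings, mean_height, mean_jersey)
```

The software computes a "mean jersey number" (11 for this roster) without complaint, which is why the averaging test has to be applied by you and not by the computer.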

Population vs. sample

The population is the entire group you want to know about. The sample is the part of the population you actually collect data from.

Why not just measure the whole population? Usually it's impossible or wildly expensive. You can't survey all of the roughly 17 million U.S. high schoolers, so you study a sample of, say, 1,500 of them and use that sample to learn about the whole population. That move, from sample to population, is the engine of the entire course.

Parameter vs. statistic

This is the distinction students mess up most, so slow down here.

A parameter is a number that describes the population. It's usually unknown, which is why we're doing statistics. Parameters usually get Greek letters (population mean μ, population standard deviation σ); the population proportion is the exception and is written with a plain p.
A statistic is a number that describes a sample. It's something we can actually compute from data. We use Roman letters: sample mean x̄, sample proportion p̂, sample standard deviation s.

A memory hook: population goes with parameter; sample goes with statistic. Both pairs share a first letter.

Worked mini-example — parameter or statistic?

A polling company wants to know what fraction of all registered U.S. voters approve of a new law. The true (unknown) fraction is p, a parameter — it describes the whole population.

They can't ask everyone, so they survey 1,200 randomly chosen voters and find that 624 approve. The fraction 624/1200 = 0.52 is p̂ = 0.52, a statistic — it describes only the sample.

The whole point of the study is to use the statistic (0.52) to estimate the parameter (p, unknown). The statistic is what we have; the parameter is what we want.
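
The arithmetic behind the statistic is a single division. A short Python sketch of it, using the poll numbers above (the variable names are illustrative):

```python
# Numbers from the polling example.
n_surveyed = 1200   # sample size
n_approve = 624     # voters in the sample who approve

# The statistic: the sample proportion, p-hat.
p_hat = n_approve / n_surveyed

print(f"p-hat = {p_hat:.2f}")  # prints "p-hat = 0.52"
```

Nothing in this code can produce p itself. That would take data on every registered voter, which is exactly what the polling company does not have.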

Descriptive vs. inferential statistics

There are two big jobs in statistics:

Descriptive statistics summarize and display the data you have — means, graphs, percentages. You're describing this sample and nothing more. "In our sample of 1,200 voters, 52% approved." That's a fact about the sample.
Inferential statistics use the sample to draw conclusions about the larger population — with a stated level of uncertainty. "We're 95% confident that between 49% and 55% of all registered voters approve." That leap from sample to population is inference, and it's the heart of Units 3, 4, and 5.
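
Where might an interval like "49% to 55%" come from? The sketch below assumes the standard large-sample interval for a proportion, p̂ ± 1.96·√(p̂(1 − p̂)/n). This lesson has not introduced that formula, so treat it as a preview under that assumption and not as a statement of how the sentence above was produced. It reuses the poll numbers (624 of 1,200).

```python
from math import sqrt

n = 1200
p_hat = 624 / n          # descriptive: a fact about the sample

# Inferential preview (assumed method: large-sample z interval, 95% level).
z_star = 1.96
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"sample proportion: {p_hat:.2f}")
print(f"95% interval for p: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval runs from about 0.49 to 0.55, consistent with the rounded figures quoted above. The first printed line is descriptive; the second is inferential.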

A good statistical question

A statistical question is one that anticipates variability — it expects the answers to differ from individual to individual, and it can only be answered by collecting data.

"How tall is the principal?" is not a statistical question — there's one answer, no variability. But "How tall are the students at this school?" is statistical: heights vary, and you'd need to collect data to describe them.

A good statistical question is:

Answerable with data (not opinion or a single fact),
Specific about the variable, the individuals, and the population,
Anticipating variability in the responses.

NEW for 2027 — formulating an investigative question

The revised AP framework adds an explicit skill: formulating an investigative question as the first step of a real statistical study. This is Practice 1, and it shows up on FRQ 1. Don't skip it.

An investigative question is the precise, data-answerable question that drives an entire study. Going from a vague wondering to a sharp investigative question usually means nailing down four things:

The individuals — who or what are we measuring? ("U.S. high school seniors")
The variable(s) — exactly what are we recording, and how? ("hours of sleep on a typical school night, self-reported")
The population — what larger group do we want to generalize to?
The comparison or relationship — are we describing one group, comparing groups, or looking for a relationship?

Watch a vague wondering become an investigative question:

Vague: "Do athletes sleep better?"
Investigative: "Among students at Lincoln High, do varsity athletes get more hours of sleep on a typical school night than non-athletes?"

The second version names the individuals (Lincoln High students), the variable (hours of sleep on a typical school night), and the comparison (athletes vs. non-athletes). Now you could actually go collect data and answer it.
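
The four pieces can be treated as a checklist. The sketch below is one hypothetical way to write that checklist in Python; the class name `InvestigativeQuestion` and its method are inventions for illustration, and the population entry ("all students at Lincoln High") is an assumption, since the question above does not name a population separately.

```python
from dataclasses import dataclass, fields


@dataclass
class InvestigativeQuestion:
    """Hypothetical checklist for the four pieces of an investigative question."""
    individuals: str = ""
    variables: str = ""
    population: str = ""
    comparison: str = ""

    def missing_pieces(self) -> list[str]:
        """Return the names of any pieces that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# "Do athletes sleep better?" only hints at a comparison.
vague = InvestigativeQuestion(comparison="athletes vs. non-athletes")

# The sharpened Lincoln High version fills in every piece.
sharp = InvestigativeQuestion(
    individuals="students at Lincoln High",
    variables="hours of sleep on a typical school night",
    population="all students at Lincoln High",  # assumed for illustration
    comparison="varsity athletes vs. non-athletes",
)

print(vague.missing_pieces())  # prints ['individuals', 'variables', 'population']
print(sharp.missing_pieces())  # prints []
```

A question that still has missing pieces is not ready for data collection, however interesting it sounds.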

The investigative process

Every statistical study moves through four stages — this is your roadmap for the whole course:

Ask — formulate the investigative/statistical question (Practice 1).
Collect — design a study and gather data well (Practice 2, Lessons 6–8).
Analyze — graph and compute; build models (Practice 3, most of Units 1–5).
Interpret — draw conclusions in context, with honest uncertainty (Practice 4).

We'll spiral through these four moves again and again. Today is almost entirely about Ask — because a study built on a fuzzy question can't be rescued by fancy analysis later.

What the calculator does (and doesn't) do

A quick orientation, since the new exam allows a TI-84 on every question. Your calculator will crunch means, standard deviations, probabilities, and full inference procedures for you — we'll build those skills lesson by lesson.

What it will not do: decide whether a variable is categorical or quantitative, choose the right population, formulate your question, or interpret a result in context. The calculator gives numbers. You give meaning. That division of labor is the single most important idea in this course — and it's why "interpretation is the point" will be a drumbeat from here to the exam.

(c) Worked Examples

Example 1 (easy) — Classify the variables

Problem. A school records the following for each student: (i) grade level (9, 10, 11, 12), (ii) number of AP courses taken, (iii) height in centimeters, (iv) primary mode of transport to school (bus, car, walk, bike).

For each, state whether it's categorical or quantitative; if quantitative, state discrete or continuous.

Strategy. For each variable, ask: "Is averaging these meaningful?" If no → categorical. If yes → quantitative, then ask "counted (discrete) or measured (continuous)?"

Solution.

(i) Grade level — the values 9–12 look numeric, but they're really labels for class standing; averaging them isn't meaningful in context. Categorical. (Acceptable to treat as a category here — it's an ordered label.)
(ii) Number of AP courses — a count of whole units. Quantitative, discrete.
(iii) Height in cm — a measurement that can land anywhere in an interval. Quantitative, continuous.
(iv) Mode of transport — labels with no arithmetic. Categorical.

Interpretation. Notice (i): digits don't make a variable quantitative. Always check whether arithmetic means something.

Example 2 (easy–medium) — Population, sample, parameter, statistic

Problem. A streaming service wants to know the average number of hours all its 40 million subscribers watch per week. It pulls a random sample of 5,000 subscribers and finds their mean is x̄ = 11.2 hours.

Identify the (a) individuals, (b) population, (c) sample, (d) parameter, (e) statistic.

Solution.

(a) Individuals: the subscribers.
(b) Population: all 40 million subscribers.
(c) Sample: the 5,000 subscribers who were measured.
(d) Parameter: the mean weekly hours for all subscribers, μ — unknown.
(e) Statistic: the sample mean, x̄ = 11.2 hours — known.

Interpretation. The company would like to know μ (the parameter) but can only compute x̄ (the statistic). Using 11.2 hours to estimate μ is inference — exactly what Units 3–5 make rigorous.
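
To make the parameter/statistic split concrete, here is a minimal simulation sketch in Python. Everything in it is invented for illustration: the population is simulated, scaled down from 40 million to 400,000 subscribers, and drawn from a gamma distribution chosen only because it gives non-negative viewing hours with a mean near 11.2. In a real study the population values would not be available, which is the whole point.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulated stand-in for the subscriber population (hypothetical values).
population = [random.gammavariate(4.0, 2.8) for _ in range(400_000)]

# Parameter: computable here only because we invented the whole population.
mu = sum(population) / len(population)

# Statistic: computed from a simple random sample of 5,000 subscribers.
sample = random.sample(population, 5_000)
x_bar = sum(sample) / len(sample)

print(f"mu (parameter)    = {mu:.2f}")
print(f"x-bar (statistic) = {x_bar:.2f}")
```

The two printed values should be close but not identical. The statistic changes if a different sample is drawn (try another seed), while the parameter stays fixed for a given population.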

Example 3 (medium) — Statistical question or not?

Problem. Classify each as a statistical question (anticipates variability, needs data) or not, and fix the non-statistical ones.

(i) "What was LeBron James's point total in last night's game?"

(ii) "How many points per game do NBA starters average this season?"

(iii) "What is my resting heart rate right now?"

Solution.

(i) Not statistical — one fixed answer, look it up, no variability. Fix: "How many points per game do NBA players score this season?"
(ii) Statistical — point totals vary across starters; you must collect data. ✓
(iii) Not statistical — a single measurement on one person at one moment. Fix: "How does my resting heart rate vary across the days of a month?" (now there's variability to study).

Interpretation. The test is variability: if every reasonable observation gives the same answer, it isn't a statistical question.

Example 4 (AP-style) — Build the investigative question

Problem. A health teacher wonders, "Does energy-drink use hurt students' grades?" Rewrite this as a precise investigative question suitable for a statistical study, and identify the individuals, variable(s), population, and the comparison or relationship being studied.

Strategy. Pin down the four pieces: individuals, variable(s) + how measured, population, and the comparison/relationship. Make every term something you could actually record.

Solution.

"Among the 1,800 students at Riverside High this semester, is there an association between the number of energy drinks a student consumes in a typical week (self-reported) and their semester GPA?"

Individuals: students at Riverside High.
Variables: energy drinks per typical week (quantitative, discrete) and semester GPA (quantitative, continuous).
Population: all 1,800 Riverside High students this semester.
Relationship studied: the association between energy-drink use and GPA.

Interpretation. The original wording ("hurt") sneaks in causation and is too vague to measure. The rewrite is measurable and honest — it asks about association, not cause. (Why we can't jump to "causes" yet is Lesson 8. For now: a good investigative question never claims more than the data can deliver.)

(d) Common Mistakes

1. Treating any number as quantitative.

Wrong: "Zip code is quantitative because it's a number." Why it's wrong: averaging zip codes is meaningless; the digits are a label. Fix: apply the averaging test — if the average is nonsense, it's categorical.

2. Swapping parameter and statistic.

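Wrong: calling the sample mean a parameter. Why it's wrong: a parameter describes the population (usually unknown); a statistic describes the sample (computed from data). Fix: use the letter cues. Greek letters (μ, σ) signal parameters and Roman letters (x̄, s, p̂) signal statistics; the one exception is the population proportion p, a plain Roman letter that is still a parameter. Remember the pairs: population goes with parameter, sample goes with statistic.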

3. Confusing the sample with the population.

Wrong: "The population is the 5,000 people surveyed." Why it's wrong: those 5,000 are the sample; the population is the whole group you want to learn about. Fix: ask "Who do we ultimately want a conclusion about?" — that's the population.

4. Writing a question with no variability.

Wrong: "How tall is the tallest player?" as a statistical question. Why it's wrong: it has one fixed answer. Fix: phrase it so answers vary across individuals ("How do the heights of the players vary?").

5. Sneaking causation into an investigative question too early.

Wrong: "Does sugar cause hyperactivity in kids?" from survey data. Why it's wrong: observational data usually can't establish cause (Lesson 8). Fix: ask about an association unless the study is a designed experiment.

(e) Practice Problems

Question 1

Which of the following is a categorical variable?

(A) Number of pets a person owns
(B) A person's blood type (A, B, AB, O)
(C) A person's height in inches
(D) The time it takes to run a mile

Question 2

Which is a quantitative discrete variable?

(A) Eye color
(B) Number of text messages sent yesterday
(C) Weight in kilograms
(D) Country of birth

Question 3

A researcher measures the exact reaction time (in seconds) of each participant. This variable is:

(A) Categorical
(B) Quantitative and discrete
(C) Quantitative and continuous
(D) Neither

Question 4

A factory makes 200,000 phone cases per day. An inspector examines 300 of them and finds 12 defective. The number 200,000 best describes the size of the:

(A) Sample
(B) Population
(C) Statistic
(D) Parameter

Question 5

In Question 4, the proportion 12/300 = 0.04 defective is a:

(A) Parameter
(B) Population
(C) Statistic
(D) Variable

Question 6

The true proportion of all phone cases that are defective is denoted p. This p is a:

(A) Statistic
(B) Sample
(C) Parameter
(D) Individual

Question 7

Which statement describes inferential (not descriptive) statistics?

(A) "In our sample of 300 cases, 4% were defective."
(B) "We're 95% confident the defective rate for all cases is between 2% and 6%."
(C) "The 300 sampled cases had a mean weight of 31 g."
(D) "Here is a bar graph of defect types found in the sample."

Question 8

Which of the following is a statistical question?

(A) "What is the population of Texas?"
(B) "How many hours did I sleep last night?"
(C) "How much do students at our school vary in daily sleep?"
(D) "What is the boiling point of water at sea level?"

Question 9

A jersey number on a basketball team is best classified as:

(A) Quantitative and continuous
(B) Quantitative and discrete
(C) Categorical
(D) A parameter

Question 10

Which is the best investigative question for a statistical study?

(A) "Is coffee good for you?"
(B) "What is the caffeine content of one espresso shot?"
(C) "Among adults at our gym, is there an association between weekly coffee consumption and resting heart rate?"
(D) "Do you like coffee?"

Question 11

A poll surveys 1,000 of a city's 600,000 adults about a transit plan. Identify each of the following as population, sample, parameter, or statistic:

(a) the 600,000 adults; (b) the 1,000 surveyed; (c) the true % of all adults who support the plan; (d) the 58% of surveyed adults who support it.

Question 12

For each variable on hospital patients, state categorical or quantitative; if quantitative, discrete or continuous:

(a) number of nights stayed; (b) blood pressure (mm Hg); (c) insurance provider; (d) body temperature (°F).

Question 13

(Interpretation) A nutritionist claims: "Based on my sample of 50 clients, people who eat breakfast weigh less." Explain, in context, the difference between what this describes about the sample and what it would take to make an inference about all people. Use the words descriptive, inferential, sample, and population.

Question 14

(Interpretation) A student writes the investigative question: "Are phones bad?" Explain why this is not yet a usable statistical question, then rewrite it as a precise investigative question. Identify the individuals, the variable(s) and how you'd measure them, and the population.

Question 15

Which step of the investigative process (ask → collect → analyze → interpret) does each describe?

(a) Choosing a random sample of 200 voters and recording their party.

(b) Writing the question "Do seniors and juniors differ in average commute time?"

(d) Making a boxplot of commute times by grade.

---

🔑 Answer Key

1. B. Blood type sorts people into labeled groups → categorical. A, C, D are all numerical measurements where arithmetic is meaningful → quantitative.

2. B. Number of texts is a count of whole units → quantitative discrete. A (eye color) and D (country) are categorical. C (weight) is quantitative but continuous, not discrete.

3. C. Reaction time is measured and can take any value in an interval → quantitative continuous. B is the classic trap: time is measured, not counted, so it's continuous, not discrete.

4. B. The 200,000 daily cases are the entire group of interest → population. A would be the 300 inspected; C/D are numbers describing data, not a group size.

5. C. 0.04 is computed from the sample of 300 → statistic. A (parameter) describes the whole population; B (population) names a group, not a number; D (variable) is a characteristic recorded on each individual, not this computed value.

6. C. p is the proportion for all cases (the population), usually unknown → parameter. A is the sample analog (p̂); B/D name groups/individuals, not this number.

7. B. Stating confidence about all cases generalizes from sample to population → inferential. A, C, D only summarize the sample itself → descriptive.

8. C. Sleep varies across students and needs data → statistical question. A and D have single fixed answers; B is one value for one person on one night — no variability.

9. C. A jersey number is a label, not an amount (averaging jersey numbers is meaningless) → categorical. A and B wrongly treat the digits as quantitative; D (parameter) is a number describing a population, not a type of variable.

10. C. It names individuals (gym adults), measurable variables (weekly coffee, resting heart rate), and a relationship → a strong investigative question. A is vague and uses "good"; B has one fixed answer (not statistical); D is a yes/no opinion item, not a study question.

11. (a) population (all 600,000 adults); (b) sample (the 1,000 surveyed); (c) parameter (true % of all adults — describes the population, unknown); (d) statistic (58% computed from the sample).

12. (a) quantitative, discrete (counted nights); (b) quantitative, continuous (measured); (c) categorical (insurance provider is a label); (d) quantitative, continuous (measured temperature).

13. Sample answer. The claim "in my sample of 50 clients, breakfast-eaters weigh less" is descriptive: it summarizes a pattern within the sample of 50 and says nothing certain beyond them. To make an inferential claim, that breakfast-eaters weigh less in the whole population of people, the nutritionist would need a well-designed study (ideally random selection, and an experiment to address cause) and a procedure that accounts for uncertainty. These 50 clients were not randomly selected from all people, so the claim can describe the sample but cannot reliably be generalized to the population. (Full credit: correctly separates a within-sample description from a population-level generalization using all four terms.)

14. Sample answer. "Are phones bad?" can't be answered with data: "bad" isn't a measurable variable, no individuals or population are specified, and there's no variable to record. A usable rewrite: "Among students at our school, is there an association between hours of daily phone use (self-reported) and self-reported hours of sleep on school nights?" — Individuals: students at our school. Variables: daily phone-use hours and nightly sleep hours, both quantitative, gathered by survey. Population: all students at our school. (Full credit: explains why "bad" isn't measurable, then gives a measurable question with individuals, measured variable(s), and population.)

15. (a) Collect; (b) Ask; (c) Interpret; (d) Analyze.

---

StatsIQ · Lesson 1 of 30 · Unit 1 · Aligned to the 2026–27 AP Statistics framework. Not affiliated with the College Board. AP is a registered trademark of the College Board. Content pending statistical-accuracy review (Isaac).
