AP Statistics · Lesson 7 of 30
StatsIQ · AP Statistics

Lesson 7: Experimental Design

Unit 1 · Phase 1 · Statistical Practice 2: Collect Data
Topics: Observational study vs. experiment; explanatory vs. response variables; factors, levels, and treatments; the three principles (control, randomization, replication); completely randomized design; blocking and matched-pairs design; confounding variables; placebo effect, blinding, and double-blind designs
Calculator: Using TI-84 `randInt(` to randomly assign subjects to treatment groups
Objectives:
  • Explain why only a well-designed **experiment** — not an observational study — can establish a cause-and-effect relationship.
  • Design a **completely randomized experiment**, correctly identifying factors, levels, and treatments, and applying control, randomization, and replication.
  • Convert a completely randomized design into a **matched-pairs design** and explain how blocking reduces the effect of a confounding variable.

(a) Warm-Up

Here's a headline you've probably seen some version of: "Study finds people who drink coffee live longer." Sounds great — pour another cup, you're buying yourself years.

But pump the brakes. The study followed thousands of people, recorded who drank coffee, and tracked how long they lived. Nobody was assigned to drink coffee. So ask yourself: what kind of person chooses to drink lots of coffee? Maybe people with steady jobs, regular routines, and the income to buy a daily latte. Maybe those same things — not the coffee — are what's extending their lives.

That gap between "coffee drinkers live longer" and "coffee causes longer life" is the single most important idea in this lesson. In Lesson 6 you learned how to select a sample without bias. Now you'll learn how to design a study that can actually answer a cause-and-effect question. The tool that does it is the experiment, and its secret weapon is random assignment. By the end, you'll be able to look at any study and say exactly what it can — and cannot — prove.


(b) Core Concept

Two kinds of studies

An observational study measures variables of interest but does not attempt to influence the responses. The researcher watches and records — who already drinks coffee, who already exercises — but assigns nothing.

An experiment deliberately imposes some treatment on subjects in order to observe their response. The researcher is in control: you decide who gets the new drug and who gets the sugar pill.

This distinction is everything, because of one rule the AP exam tests relentlessly:

Only a well-designed experiment can establish a cause-and-effect (causal) relationship. Observational studies can show association, but never causation.

Why? Because of confounding.

Confounding — define it precisely

A confounding variable is a variable that is associated with both the explanatory variable and the response variable, in such a way that its effect on the response cannot be separated from the effect of the explanatory variable.

In the coffee example, lifestyle/income is confounded with coffee drinking. Coffee drinkers tend to have more stable lifestyles, and stable lifestyles tend to extend life — so we can't tell whether longer lives come from the coffee or the lifestyle. Their effects are tangled together.

Note the precise wording. It is not enough to say "a confounding variable affects the response." It must affect the response and be linked to the explanatory variable. That dual link is what makes the two effects impossible to untangle.
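To see confounding in action, here is a small Python simulation built on made-up assumptions: a standardized income score raises lifespan, richer people are more likely to choose coffee, and coffee itself has no effect at all. Every number in it (78 years, the 0.8 and 0.2 coffee probabilities, and so on) is hypothetical. In the observational version, coffee drinkers still appear to live longer. When a coin flip assigns coffee instead, the gap essentially disappears.

```python
import random
from statistics import mean


def simulate(n=20_000, seed=1):
    """Return (observational gap, randomized gap) in mean lifespan, in years.

    Hypothetical model: income drives both coffee drinking and lifespan,
    while coffee itself has NO effect on lifespan.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        income = rng.gauss(0, 1)                      # standardized income score
        lifespan = 78 + 3 * income + rng.gauss(0, 5)  # coffee never enters this formula
        # Observational world: people choose, and richer people choose coffee more often.
        chooses_coffee = rng.random() < (0.8 if income > 0 else 0.2)
        # Experimental world: a fair coin decides, ignoring income.
        assigned_coffee = rng.random() < 0.5
        people.append((lifespan, chooses_coffee, assigned_coffee))

    def gap(column):
        """Mean lifespan of coffee drinkers minus mean lifespan of everyone else."""
        drinkers = [p[0] for p in people if p[column]]
        others = [p[0] for p in people if not p[column]]
        return mean(drinkers) - mean(others)

    return gap(1), gap(2)


obs_gap, rand_gap = simulate()
print(f"Observational gap: {obs_gap:.2f} years")  # clearly positive, produced entirely by income
print(f"Randomized gap:    {rand_gap:.2f} years")  # close to 0
```

The coin flip ignores income, so it cuts the link between the confounder and the explanatory variable. That is exactly the job random assignment does in a real experiment.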

The vocabulary of experiments

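Running example. A sports scientist wants to know whether caffeine improves sprinting reaction time. She recruits 60 athletes. Each will receive a pill 30 minutes before a reaction-time test (measured in milliseconds; lower is faster). The pill contains 0 mg (a placebo), 100 mg, or 200 mg of caffeine.

  • Experimental units: the objects to which treatments are assigned; here, the 60 athletes. When the units are people, we call them subjects.
  • Explanatory variable (factor): the variable the researcher deliberately changes; here, caffeine dose.
  • Levels: the specific values of a factor used in the experiment; here, 0 mg, 100 mg, and 200 mg.
  • Treatments: the specific conditions applied to the units. With one factor, each level is a treatment, so there are three treatments. With two or more factors, each treatment is a combination of one level from each factor.
  • Response variable: the outcome measured after treatment; here, reaction time in milliseconds.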

The three principles of experimental design

1. Control. Keep other variables the same for every group so they can't become confounders. Same test, same time of day, same room temperature, same pill appearance. Control also means including a control group — a baseline for comparison. Here, the 0 mg placebo group is the control group: it tells us what reaction times look like with no caffeine, so we can measure caffeine's effect against it.

2. Randomization. Randomly assign subjects to treatments. This is the heart of the experiment. Random assignment balances out confounding variables — both the ones we know about (athletes' fitness, age, sleep) and the ones we never even thought of. On average, the fit athletes, the tired athletes, and the genetically fast athletes get spread evenly across all three groups, so no group has an unfair edge. This is what lets us conclude causation.

3. Replication. Use enough experimental units in each group so that chance differences average out and a real effect can show through. Sixty athletes (20 per group) is far better than three. Replication also means the experiment could, in principle, be repeated and reproduce the result.
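A quick Python simulation, a sketch with hypothetical numbers, shows randomization and replication working together. Each repetition recruits fresh made-up athletes whose lurking "fitness" score is drawn from a normal distribution (mean 50, SD 10), randomly splits them into two groups, and records how far apart the groups' mean fitness ends up. Random assignment keeps the groups balanced on average, and larger groups keep the chance imbalance smaller.

```python
import random
from statistics import mean


def avg_imbalance(group_size, reps=2000, seed=2):
    """Average absolute gap in mean fitness between two randomly assigned groups.

    'Fitness' is a hypothetical lurking variable the researcher never measures.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        athletes = [rng.gauss(50, 10) for _ in range(2 * group_size)]  # fresh recruits
        rng.shuffle(athletes)                  # random assignment
        group_a = athletes[:group_size]
        group_b = athletes[group_size:]
        total += abs(mean(group_a) - mean(group_b))
    return total / reps


small = avg_imbalance(3)    # 3 athletes per group
large = avg_imbalance(20)   # 20 per group, as in the caffeine study
print(f"3 per group:  mean fitness differs by {small:.1f} points on average")
print(f"20 per group: mean fitness differs by {large:.1f} points on average")
```

The gap never has to be exactly zero for randomization to work; it only has to be small and due to chance, which is what replication buys.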

A completely randomized design

In a completely randomized design, all subjects are allocated to the treatments purely by chance — no grouping first. Our caffeine study, completely randomized:

[GRAPH: Flowchart diagram of a completely randomized design. A box on the left labeled "60 athletes" has an arrow labeled "Random assignment" pointing right to three stacked boxes: "Group 1: 20 athletes — 0 mg (placebo)", "Group 2: 20 athletes — 100 mg", "Group 3: 20 athletes — 200 mg". Each of the three boxes has an arrow pointing right to a single box labeled "Measure reaction time (ms)". A final arrow points to "Compare mean reaction times across the three groups."]
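Outside the calculator, the same completely randomized design can be sketched in Python as "drawing names from a hat": shuffle the athlete numbers, then deal them into equal groups. This is one illustrative method, not the only valid one; the function name and the seed are arbitrary choices.

```python
import random


def completely_randomized(n_subjects, treatments, seed=None):
    """Shuffle subject IDs 1..n_subjects, then deal them into equal groups,
    one group per treatment (like drawing names from a hat)."""
    if n_subjects % len(treatments):
        raise ValueError("subjects must split evenly across the treatments")
    rng = random.Random(seed)
    ids = list(range(1, n_subjects + 1))
    rng.shuffle(ids)
    size = n_subjects // len(treatments)
    return {t: sorted(ids[i * size:(i + 1) * size]) for i, t in enumerate(treatments)}


groups = completely_randomized(60, ["0 mg (placebo)", "100 mg", "200 mg"], seed=7)
for dose, athletes in groups.items():
    print(f"{dose}: {len(athletes)} athletes, starting {athletes[:5]}")
```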

Placebos and blinding

Belief is powerful. Subjects who think they got caffeine may try harder — that's the placebo effect, a response to the idea of treatment rather than the treatment itself. We defend against it with a placebo: a dummy treatment (the 0 mg pill) that looks, tastes, and feels identical to the real thing.

Blinding means a participant doesn't know which treatment they received. In a double-blind experiment, neither the subjects nor the people who interact with them and measure the response know who got what. Double-blinding prevents both the subjects' expectations and the researchers' (perhaps unconscious) biases from contaminating the results — for instance, an experimenter who knows a runner had caffeine might subtly cheer louder or round the stopwatch down.
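One common way to arrange double-blinding is to have a third party, who neither runs the test nor records results, hold the assignment key while everyone else sees only coded pill labels. The Python sketch below models that idea. The "PILL-" label format, the function name, and the third-party role are hypothetical details for illustration.

```python
import random


def make_blind_codes(subject_ids, treatments, seed=None):
    """Hypothetical double-blind setup.

    Returns (labels, key):
      labels: what subjects and measuring staff see, one opaque code per subject
      key:    subject -> treatment, held sealed by a third party until the end
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)                                       # random assignment
    codes = rng.sample(range(10_000, 100_000), len(ids))  # unique 5-digit codes
    labels, key = {}, {}
    for position, (sid, code) in enumerate(zip(ids, codes)):
        key[sid] = treatments[position % len(treatments)]  # deal treatments in turn
        labels[sid] = f"PILL-{code}"
    return labels, key


labels, key = make_blind_codes(range(1, 61), ["0 mg", "100 mg", "200 mg"], seed=3)
print(labels[1], labels[2])   # staff see codes only, never doses
```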

Blocking and the matched-pairs design

Randomization balances confounders on average — but if we already know a variable matters a lot, we can do better by blocking.

A block is a group of experimental units that are similar in a way expected to affect the response. We form blocks first, then randomly assign treatments within each block. This removes that variable's effect from the comparison entirely, rather than just averaging it out.

Suppose elite athletes react much faster than recreational athletes regardless of caffeine. We could block by athlete type: split the 60 into the elite block and the recreational block, then randomly assign the three doses within each block. Now caffeine's effect is measured separately inside each type, so the athlete-type differences can't muddy the comparison.
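Here is a Python sketch of that blocked assignment. The split of 24 elite and 36 recreational athletes is a made-up example (the lesson doesn't give block sizes), chosen so each block divides evenly among the three doses. The key feature is that shuffling happens separately inside each block.

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomized block design: shuffle each block separately, then deal its
    members evenly across the treatments."""
    rng = random.Random(seed)
    plan = {}
    for block_name, members in blocks.items():
        if len(members) % len(treatments):
            raise ValueError(f"block {block_name!r} does not split evenly")
        shuffled = list(members)
        rng.shuffle(shuffled)                 # randomization happens WITHIN the block
        size = len(shuffled) // len(treatments)
        for i, t in enumerate(treatments):
            plan[(block_name, t)] = sorted(shuffled[i * size:(i + 1) * size])
    return plan


# Hypothetical split: athletes 1-24 are elite, 25-60 are recreational.
blocks = {"elite": range(1, 25), "recreational": range(25, 61)}
plan = randomize_within_blocks(blocks, ["0 mg", "100 mg", "200 mg"], seed=11)
for (block, dose), athletes in plan.items():
    print(f"{block:>12} | {dose:>6} | {len(athletes)} athletes")
```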

The matched-pairs design is a special, powerful kind of blocking. There are two common forms:

  1. Pairs of similar subjects: match subjects into pairs that are alike (e.g., two athletes with nearly identical baseline reaction times), then randomly assign one of each pair to each of two treatments.
  2. Each subject is their own pair (before/after): every subject receives both treatments, and we compare the two results within the same person. The order is randomized to control for fatigue or practice effects.

Matched-pairs for the caffeine study (using two treatments, 0 mg vs. 200 mg): each athlete completes the reaction test twice — once after a placebo and once after 200 mg of caffeine — on two separate days, with a coin flip deciding which they get first. We then look at each athlete's placebo-minus-caffeine difference. Because we compare an athlete to themselves, every personal trait — fitness, age, leg length, motivation — is held perfectly constant. The athlete is their own control, which makes any caffeine effect much easier to detect.
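The arithmetic of a matched-pairs analysis is just one subtraction per athlete. The Python sketch below uses five athletes with made-up reaction times (not real data) to show the coin flip for order, the placebo-minus-caffeine differences, and why the differences are easier to read: they vary far less than the raw times do.

```python
import random
from statistics import mean, stdev

rng = random.Random(5)
athletes = ["A", "B", "C", "D", "E"]

# Coin flip per athlete decides which pill comes first (guards against practice/fatigue effects).
order = {a: rng.choice(["placebo first", "caffeine first"]) for a in athletes}

# Hypothetical reaction times in ms (lower is faster).
placebo = {"A": 250, "B": 310, "C": 275, "D": 340, "E": 290}
caffeine = {"A": 242, "B": 301, "C": 270, "D": 329, "E": 283}

diffs = {a: placebo[a] - caffeine[a] for a in athletes}   # placebo minus caffeine
mean_diff = mean(list(diffs.values()))

print(order)
print("Differences (ms):", diffs)       # {'A': 8, 'B': 9, 'C': 5, 'D': 11, 'E': 7}
print("Mean difference (ms):", mean_diff)                               # 8
print("SD of placebo times:", round(stdev(list(placebo.values())), 1))  # 34.2
print("SD of differences:", round(stdev(list(diffs.values())), 1))      # 2.2
```

Athlete-to-athlete differences (about 34 ms of spread here) drop out of the differences, leaving only about 2 ms of spread around the caffeine effect.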

Random assignment vs. random sampling

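One last crucial pairing: random sampling and random assignment.

  • Random sampling decides who is in the study. It lets us generalize the results from the sample to the population it was drawn from.
  • Random assignment decides which treatment each subject receives. It lets us conclude cause and effect.

They do different jobs. An experiment needs random assignment to establish cause and effect; it needs random sampling to extend that conclusion beyond the people in the room.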
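The two kinds of randomness happen at different steps, which a short Python sketch makes concrete. It assumes a hypothetical league of 5,000 athletes numbered 1 to 5000: step 1 randomly samples 60 of them, and step 2 randomly assigns those 60 to the three doses.

```python
import random

rng = random.Random(42)
population = range(1, 5001)        # hypothetical league of 5,000 athletes

# Step 1, random SAMPLING: decides WHO is in the study (supports generalizing to the league).
sample = rng.sample(population, 60)

# Step 2, random ASSIGNMENT: decides WHICH TREATMENT each one gets (supports cause and effect).
rng.shuffle(sample)
groups = {"0 mg": sample[:20], "100 mg": sample[20:40], "200 mg": sample[40:]}

print({dose: len(ids) for dose, ids in groups.items()})  # {'0 mg': 20, '100 mg': 20, '200 mg': 20}
```

Skipping step 1 (using volunteers) still supports a causal conclusion about those volunteers; skipping step 2 removes the basis for any causal claim.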

TI-84: randomly assigning subjects

To randomly assign our 60 athletes, number them 1–60 and use randInt(:

TI-84: MATH → PROB → 5:randInt(
Syntax: randInt(lower, upper, n)
randInt(1, 60, 20)

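This draws 20 integers from 1 to 60; those athletes go to Group 1 (placebo). Because randInt( can return the same number more than once, ignore any repeats and keep drawing until you have 20 distinct numbers. For Group 2 (100 mg), keep drawing, skipping repeats and any number already in Group 1, until you have 20 more distinct numbers. The remaining 20 athletes form Group 3 (200 mg). To make the assignment reproducible, seed the random number generator before you draw:

TI-84: 7 [STO→] [MATH] → PROB → 1:rand [ENTER]
(The 7 is just an example seed; any number works. The same seed always produces the same sequence of random numbers.)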
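To check the logic of this draw-and-skip routine, the Python sketch below mimics it: one `randint` call stands in for one randInt( press, and repeats or already-assigned numbers are skipped. Python's generator differs from the TI-84's, so seeding Python with 7 will not reproduce the calculator's numbers; the seed here only makes the sketch repeatable.

```python
import random


def draw_distinct(rng, low, high, k, exclude=()):
    """Mimic the TI-84 routine: keep drawing integers from low to high
    (inclusive), skipping repeats and already-used numbers, until k are kept."""
    chosen = []
    used = set(exclude)
    while len(chosen) < k:
        x = rng.randint(low, high)   # one draw, like pressing randInt(low, high)
        if x not in used:            # ignore repeats and numbers already assigned
            used.add(x)
            chosen.append(x)
    return chosen


rng = random.Random(7)   # Python's generator: seed 7 will NOT match a TI-84 seeded with 7
group1 = draw_distinct(rng, 1, 60, 20)                          # 0 mg (placebo)
group2 = draw_distinct(rng, 1, 60, 20, exclude=group1)          # 100 mg
group3 = sorted(set(range(1, 61)) - set(group1) - set(group2))  # 200 mg: everyone left
print(sorted(group1), sorted(group2), group3, sep="\n")
```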

(c) Worked Examples

Example 1 (easy) — Identify factor, levels, treatments, response

Problem. A bakery tests whether oven temperature affects how tall its bread loaves rise. It bakes loaves at 350°F, 375°F, and 400°F and measures the height of each loaf in centimeters.

Strategy. Find what's manipulated (factor), its specific values (levels), the combinations applied (treatments), and what's measured (response).

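Solution.
  • Factor (explanatory variable): oven temperature.
  • Levels: 350°F, 375°F, and 400°F.
  • Treatments: three, one per temperature, because there is only one factor.
  • Response variable: loaf height, in centimeters.
  • Experimental units: the loaves of bread.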

Interpretation. Because the bakery actively imposes the temperatures, this is an experiment, so a well-designed version could show that temperature causes a change in rise.
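Counting treatments is a small combinatorics step that Python's `itertools.product` makes explicit. With Example 1's single factor, the treatments are just the three temperatures. The second factor below (baking time, with two made-up levels) is hypothetical and only shows how treatments multiply when factors are added.

```python
from itertools import product

# Example 1: one factor with three levels, so each level is a treatment.
temperatures = ["350°F", "375°F", "400°F"]
one_factor = list(product(temperatures))
print(len(one_factor))               # 3

# Hypothetical second factor, for illustration only: baking time with two levels.
bake_times = ["25 min", "35 min"]
two_factors = list(product(temperatures, bake_times))
print(len(two_factors))              # 6, because 3 levels x 2 levels = 6 combinations
for treatment in two_factors:
    print(treatment)
```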

Example 2 (medium) — Spot the confounding

Problem. A school notices that students who eat breakfast in the cafeteria score higher on tests than students who skip breakfast. The principal concludes that cafeteria breakfast causes higher scores. Identify the type of study and explain why the conclusion is flawed, naming a possible confounding variable.

Strategy. Check whether a treatment was imposed. Then look for a variable linked to both breakfast-eating and test scores.

Solution. No treatment was assigned — the school merely recorded who already ate breakfast. This is an observational study, so it cannot establish causation. A plausible confounding variable is family income/home stability: students from more stable, higher-income homes may be more likely to arrive early enough for cafeteria breakfast and more likely to score higher (tutoring, quiet study space, sleep). Income is associated with both the explanatory variable (breakfast) and the response (scores), so its effect can't be separated from breakfast's.

Interpretation. The data show an association, not a causal link. To prove causation, the school would need to randomly assign students to eat or skip breakfast.

Example 3 (AP-style) — Outline a completely randomized design

Problem. A researcher wants to know whether a new study app raises AP Statistics scores. She has 80 volunteer students and access to the app. Outline a completely randomized experiment, addressing control, randomization, and replication.

Strategy. Define two treatments, describe random assignment explicitly, control outside variables, and ensure adequate group sizes. Always describe how you randomize.

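Solution.
  • Treatments: (1) study with the new app; (2) control: study as usual, without the app.
  • Randomization: number the students 1–80. On a TI-84, use randInt(1, 80), ignoring repeats, until 40 distinct numbers are drawn. Those 40 students use the app; the other 40 form the control group.
  • Control: give both groups the same instructions, the same 4-week study period, and the same practice exam under the same testing conditions.
  • Replication: 40 students per group, so chance differences between the groups tend to average out.
  • Response and comparison: each student's practice-exam score; compare the mean scores of the two groups.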

Interpretation. Because students are randomly assigned, the two groups should be similar on confounders (prior ability, motivation), so a higher mean in the app group would be evidence the app causes improvement.

Example 4 (AP-style) — Convert to matched pairs and justify it

Problem. In Example 3, a colleague worries that prior math ability — which varies enormously among the 80 students — could swamp the app's effect even with randomization. Redesign the study as a matched-pairs experiment and explain why it helps.

Strategy. Use blocking via matching. Pair students on the troublesome variable (prior ability), then randomize within pairs.

Solution. Give all 80 students a baseline AP Statistics quiz and rank them by score. Pair the two highest scorers, the next two, and so on, forming 40 matched pairs of similar ability. Within each pair, flip a coin (or use randInt(1, 2)): one student gets the new app, the other gets the control. After 4 weeks, record each student's practice-exam score and compute the difference within each pair (app score − control score).

Why it helps. Pairing students of nearly equal prior ability holds ability essentially constant within each comparison, so that confounder's effect is removed from the comparison rather than just balanced on average. Because each pair's difference no longer carries the large ability gaps between students, the differences vary much less than individual scores do, which makes a genuine app effect far easier to detect.
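The pairing procedure (rank, pair neighbors, flip a coin) is mechanical enough to sketch in Python. The baseline quiz scores below are randomly generated stand-ins, not real data, and the student IDs S01 to S80 are hypothetical. Students with tied scores simply keep their existing order.

```python
import random


def matched_pairs(baseline, seed=None):
    """Rank students by baseline score, pair neighbors (1st with 2nd, 3rd with
    4th, ...), then flip a coin inside each pair to decide who gets the app.
    With an odd number of students, the lowest-ranked one is left unpaired."""
    rng = random.Random(seed)
    ranked = sorted(baseline, key=baseline.get, reverse=True)
    pairs = []
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:              # the coin flip, like randInt(1, 2)
            first, second = second, first
        pairs.append({"app": first, "control": second})
    return pairs


# Hypothetical baseline quiz scores for 80 students.
score_rng = random.Random(1)
baseline = {f"S{i:02d}": score_rng.randint(40, 100) for i in range(1, 81)}

pairs = matched_pairs(baseline, seed=4)
print(len(pairs), "pairs")   # 40 pairs
print(pairs[0])              # the two top scorers, one assigned to each treatment
```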


(d) Common Mistakes

1. Confusing association with confounding — or claiming causation from an observational study. Students see two variables move together and either jump to "X causes Y" or vaguely wave at "confounding." Remember: an observational study can only show association. And a true confounding variable must be linked to both the explanatory and response variables — not just "something else that affects the response." State both links explicitly.

2. Misusing "control group." "Control" has two meanings and students blur them. Controlling a variable means holding it constant for everyone. A control group is the baseline group that receives no treatment, a placebo, or the current standard treatment (for example, a team's usual warm-up). A study can control variables without having a control group, but the control group is what gives the treatment effect meaning. Don't call just any comparison group a "control group" unless it serves as that baseline.

3. Confusing blocking with stratifying. They're analogous ideas — group similar units to reduce variability — but they live in different settings. Stratifying is for sampling (Lesson 6): you split the population into strata, then sample within each. Blocking is for experiments: you split subjects into blocks, then randomly assign treatments within each. If treatments are involved, it's blocking; if you're only selecting a sample, it's stratifying.

4. Confusing random ASSIGNMENT with random SAMPLING. On the exam, "randomly select 60 athletes from the population" (sampling → generalize) is not the same as "randomly assign the 60 athletes to three groups" (assignment → causation). Many students describe one and think they've covered both. To both generalize and establish cause and effect, you need both.

5. Describing randomization without saying how. "We randomly assigned subjects" earns little credit. Say how: "number subjects 1–60 and use randInt / draw names from a hat / use a random number table." Always make the mechanism concrete.


(e) Practice Problems

Question 1
Which of the following is true?
Question 2
A factor in an experiment has 4 levels and is the only factor. How many treatments are there?
Question 3
The main purpose of random assignment in an experiment is to:
Question 4
A confounding variable is one that:
Question 5
In a double-blind experiment:
Question 6
A gardener splits 40 tomato plants into a "south-facing window" block and a "north-facing window" block, then randomly assigns two fertilizers within each block. This is an example of:
Question 7
A study finds that teens who use social media more report more anxiety. A reporter writes, "Social media causes teen anxiety." The best critique is that:
Question 8
Which design has each subject receive both treatments?
Question 9
The chief reason to include a placebo group is to:
Question 10
A researcher uses randInt(1, 50, 25) on a TI-84 to assign subjects to a treatment group and gets a repeated number. She should:

Short answer

A pharmaceutical company tests a new pain reliever. It gives the drug to 200 patients at Clinic A and a placebo to 200 patients at Clinic B, then compares pain scores. Identify one serious flaw in this design and explain how to fix it.

(In context) A coach believes a new warm-up routine reduces injuries. He has 50 players. (a) Describe how to carry out a completely randomized design, including how you would use the TI-84 to assign players. (b) Name one variable you would control.

Explain, in one or two sentences, the difference between random sampling and random assignment and what each one lets you conclude.

(In context) Researchers studying a sleep supplement worry that age strongly affects sleep quality. They have 60 volunteers ranging from 18 to 70. Explain how they could use blocking to handle the age variable, and why blocking is better than ignoring it here.

A study reports: "Students who take music lessons have higher GPAs." A school board member wants to require music lessons to raise GPAs. What kind of study produced this claim, what can it actually show, and what design would be needed to justify the board member's plan?

---

## (f) FRQ Practice

> This is a full 10-point free-response question in the format of the May 2027 AP Statistics exam, focused on Practice 2: Collect Data.

FRQ. A nutrition company claims its new energy bar improves endurance. A fitness blogger decides to test the claim. She posts an announcement, and the first 40 readers who reply are sent a free box of the energy bars. She asks each of them to eat one bar before their next run and report how many minutes they ran before tiring. She compares these times to the average run time those same readers reported in a survey from the previous month and concludes, "The energy bar increased average run time by 7 minutes — it works!"

(a) Explain why the blogger's study does not provide convincing evidence that the energy bar causes longer run times. Identify the type of study and one specific confounding variable. (3 points)

(b) Design a completely randomized experiment to test whether the energy bar improves endurance, using 40 volunteers. Clearly state the treatments, describe how you would randomly assign volunteers to the treatments (including a calculator method), and identify the response variable. (4 points)

(c) Explain the purpose of including a placebo and making the experiment double-blind in your design from part (b). (2 points)

(d) Other than random assignment, state one thing you would hold constant (control) for all volunteers, and briefly explain why. (1 point)

---

### Model Response

(a) This is not a well-designed experiment. Although the blogger told readers to eat the bar, every participant received the same treatment, so there is no comparison (control) group; the readers also chose to participate rather than being randomly assigned, and the "before" times come from a month-old self-reported survey. The study is really an uncontrolled before-and-after comparison that behaves like an observational study, so the apparent 7-minute gain cannot be attributed to the bar. A confounding variable is training progression over time: the readers ran a month after the survey, so improved fitness from a month of additional training (or warmer weather, or simply trying harder because they knew they were being tested) is associated both with being in the "energy bar" condition and with longer run times. Its effect cannot be separated from any effect of the bar.

(b) Recruit the 40 volunteers and number them 1–40.

- Treatments: (1) eat one energy bar before running; (2) eat one placebo bar — identical in look, taste, and texture but with no active ingredients — before running.

- Random assignment: Using a TI-84, enter randInt(1, 40) repeatedly, ignoring any repeats, until 20 distinct numbers are drawn. Those volunteers form the energy-bar group; the remaining 20 form the placebo group.

- Response variable: the number of minutes each volunteer runs before tiring, measured under the same conditions.

- Compare the mean run times of the two groups.

(c) A placebo group provides a baseline so we can separate the effect of the actual bar's ingredients from the placebo effect — the improvement people show simply because they believe they took something helpful. Making the study double-blind (neither the volunteers nor the staff who time the runs know who got the real bar) prevents both the volunteers' expectations and the timers' possible bias from influencing the measured run times.

(d) I would have all volunteers run on the same course (or a treadmill at the same settings) under the same weather and temperature conditions. I would control these because course difficulty and temperature strongly affect run time; holding them constant helps ensure that any difference between the groups is due to the treatment, not to running conditions.

---

### Rubric (10 points)

Part (a) — 3 points

- 1 pt: Correctly identifies the study as observational / lacking a randomly assigned comparison (control) group, so it cannot show causation.

- 1 pt: Names a plausible confounding variable (e.g., additional month of training, weather, self-selection/motivation).

- 1 pt: Explains that the confounder is linked to both the treatment condition and the response, so effects can't be separated (not just "something else affects run time").

Part (b) — 4 points

- 1 pt: States two clear treatments, including a control/comparison (energy bar vs. placebo or no bar).

- 1 pt: Describes a valid random assignment mechanism (numbering subjects + randInt / random table / drawing names) that produces two groups.

- 1 pt: Correctly handles the mechanism (e.g., ignores repeats; results in ~20 per group).

- 1 pt: Identifies the response variable (minutes run before tiring) and indicates comparing group means.

Part (c) — 2 points

- 1 pt: Explains the placebo isolates the treatment effect from the placebo effect (belief-driven response).

- 1 pt: Explains double-blind prevents bias from subjects' expectations and from those measuring the response.

Part (d) — 1 point

- 1 pt: Names one reasonable controlled variable (course, distance, time of day, temperature, instructions) and gives a brief reason tied to fair comparison.

### Where Students Lose Points

- Part (a): Naming a confounder but failing to explain its link to both variables. Just saying "other factors could matter" is too vague for the third point.

- Part (b): Saying "randomly assign the volunteers" without describing how. The exam wants a concrete mechanism. Also, forgetting a control/comparison group costs the first point.

- Part (b): Describing random sampling ("randomly select 40 people") instead of random assignment — these are different and only assignment is asked for here.

- Part (c): Defining double-blind correctly but only mentioning the subjects, not the researchers/measurers — you need both for the point.

- Part (d): Naming a controlled variable but giving no reason, or "controlling" the treatment itself.

---

🔑 Answer Key

### Multiple Choice & Short Answer

1. B. Only a well-designed experiment, through random assignment, can establish causation. (A) wrong — sample size never converts an observational study into a causal one. (C) reverses the truth. (D) random sampling supports generalization, not causation.

2. C (4). With a single factor, each level is its own treatment, so 4 levels give 4 treatments. (D) 8 would require a second factor (for example, 4 levels × 2 levels = 8 treatment combinations); there is only one factor.

3. C. Random assignment balances confounders across groups, which is what permits a causal conclusion. (A) describes random sampling/generalization. (B) assignment doesn't change sample size. (D) it tends to make groups similar in size but doesn't guarantee exact equality, and that isn't its purpose.

4. B. A confounder is tied to both the explanatory and response variables. (C) is the classic trap — affecting only the response is not confounding. (A) and (D) are unrelated.

5. C. Double-blind = neither subjects nor those interacting with/measuring them know the assignment. (A) and (B) each describe single-blinding.

6. C. Forming blocks (window orientations) and randomizing treatments within them is a randomized block design. (A) stratifying is for sampling, not experiments. (B) a completely randomized design wouldn't block first. (D) matched pairs would pair individual plants or give each plant both fertilizers.

7. B. It's observational, so a confounder (e.g., less sleep, which raises anxiety and correlates with more screen time) could explain the link. (A), (C), (D) are not the central flaw.

8. C. In the before/after matched-pairs design each subject receives both treatments and serves as their own control. (A) each subject gets only one treatment. (B) stratifying is sampling. (D) not a real design type.

9. B. The placebo lets us separate the real treatment effect from the placebo effect. (A) a placebo is a control, not an extra factor level of interest. (C) a placebo enables but does not by itself create double-blinding. (D) concerns generalization.

10. C. Ignore repeats and keep drawing until you have 25 distinct subjects. (A) a subject can't be assigned twice. (B) starting over is unnecessary. (D) changes the question entirely.

11. Flaw: The treatment (drug vs. placebo) is confounded with the clinic. Any difference in pain scores could be due to differences between Clinic A and Clinic B (different doctors, patient populations, regions) rather than the drug. Fix: Within each clinic (or pooling all 400 patients), randomly assign patients to the drug or the placebo, so each clinic contains both groups. Then clinic differences are balanced across treatments.

12. (a) Number the 50 players 1–50. On a TI-84, enter randInt(1, 50) repeatedly, ignoring repeats, until 25 distinct numbers are drawn; those players use the new warm-up routine before games, and the other 25 use the usual warm-up (control). Track each group's injury rate over the season and compare. (b) Control example: have both groups play the same number of games / practice on the same field / be coached the same way — so injury differences reflect the warm-up, not workload or conditions.

13. Random sampling is how subjects are chosen from the population; it allows the results to be generalized to that population. Random assignment is how subjects are sorted into treatment groups; it allows a cause-and-effect conclusion. Sampling addresses who's in the study; assignment addresses who gets which treatment.

14. Sort the 60 volunteers into age blocks — e.g., 18–35, 36–55, 56–70 — then randomly assign the supplement and the placebo within each block. This removes age's effect from the treatment comparison, because within a block everyone is similar in age. Blocking is better than ignoring age here because age strongly influences sleep quality; left unblocked, age-driven variation would add noise that could hide the supplement's true effect.

15. This is an observational study (no one was assigned to take lessons), so it can show only an association between music lessons and GPA — not that lessons cause higher GPA. A confounder such as family resources or student motivation could drive both. To justify requiring lessons, you would need a randomized experiment: randomly assign students to take music lessons or not, control other factors, and compare GPAs. Only random assignment supports the causal claim the board member needs.

### FRQ Rubric (restated)

- (a) 3 pts: observational/no random control group (1) + names confounder (1) + explains link to both variables (1).

- (b) 4 pts: two treatments incl. control (1) + valid random-assignment mechanism (1) + correct execution/group sizes (1) + response variable + compare means (1).

- (c) 2 pts: placebo isolates placebo effect (1) + double-blind removes subject and measurer bias (1).

- (d) 1 pt: one controlled variable with a valid reason (1).

- Total: 10 points.

---

StatsIQ · Lesson 7 of 30 · Unit 1: Exploring One-Variable Data & Collecting Data · Phase 1: Data & Design

This lesson aligns to the revised 2026–27 AP Statistics Course and Exam Description (first exam May 2027). "AP" is a registered trademark of the College Board, which was not involved in the production of, and does not endorse, this product.

Accuracy review: All counts, treatment combinations, and TI-84 randInt usage in this lesson were independently recomputed and verified. Reviewed for statistical accuracy by Isaac, retired actuary.
