Two AP Stats classes take the same quiz. You're told both classes have a mean of 80. So the two classes performed the same, right? Not so fast.
Class A scores: 78, 79, 80, 81, 82.
Class B scores: 60, 70, 80, 90, 100.
Both average to 80. But Class A is packed tightly around 80, while Class B is spread all over the map. If you were a student deciding which class curve to gamble on, that difference would matter a lot.
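The comparison above can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the helper name `score_range` is made up for illustration, and the range (largest minus smallest) is used here only as a quick first look at spread.

```python
from statistics import mean

class_a = [78, 79, 80, 81, 82]
class_b = [60, 70, 80, 90, 100]

def score_range(scores):
    """Range = largest score minus smallest score (hypothetical helper)."""
    return max(scores) - min(scores)

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    # Same center for both classes, very different spread
    print(f"{name}: mean = {mean(scores)}, range = {score_range(scores)}")
```

Both classes report the same mean, while the range of Class B is ten times that of Class A.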
In Lesson 2 you learned to describe a distribution's shape, center, and spread by eye. Now you'll put exact numbers on center and spread — and learn that a single number like "the mean is 80" almost never tells the whole story. A summary needs at least a center and a spread, and which pair you pick depends on the shape. By the end of this lesson you'll know exactly which numbers to report, how to compute them by hand, and how to pull them off your calculator in seconds.
A measure of center is a single number meant to represent a "typical" value.
- Mean (x̄ for a sample, μ for a population): add the values, divide by how many there are. x̄ = Σx / n. The mean is the balance point of the distribution: the spot where the data would balance if the dots had weight.
- Median: the middle value of the sorted data (with an even count, the average of the two middle values).

The single most important idea here: the median is resistant (also called robust), so extreme values barely move it. The mean is not resistant: a single huge value drags it along.
Quick proof. Take 1, 2, 3, 4, 5. Mean = 3, median = 3. Now change the 5 to a 500: the data are 1, 2, 3, 4, 500. The median is still 3 (the middle position didn't change), but the mean jumps to 102. The median shrugged off the outlier; the mean got hijacked by it. Remember this — it drives every "which summary should I use" decision in this course.
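The same quick proof, sketched with Python's standard `statistics` module (the variable names are illustrative only):

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # the 5 replaced by 500

for data in (original, with_outlier):
    # The median depends only on the middle position; the mean uses every value
    print(data, "mean =", mean(data), "median =", median(data))
```

Changing one value moves the mean from 3 to 102 and leaves the median at 3.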
Center tells you where; spread tells you how variable.
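- Range: maximum − minimum. It is built entirely from the two most extreme values, so it is not resistant.
- Interquartile range (IQR): Q3 − Q1, the spread of the middle 50% of the data. Because it ignores the top and bottom quarters, the IQR is resistant.
- Standard deviation: roughly the typical distance of the values from the mean. There are two versions, and the difference is the denominator.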
Population standard deviation (you have the whole population):
σ = sqrt( Σ(x − μ)² / N )
Sample standard deviation (you have a sample and want to estimate the population's spread):
s = sqrt( Σ(x − x̄)² / (n − 1) )
Why divide by n − 1 instead of n for a sample? A sample tends to be a little less spread out than the full population it came from, so dividing by the smaller number n − 1 nudges the estimate up to compensate. That n − 1 is called the degrees of freedom, and it makes the sample variance s² an unbiased estimator of the population variance σ². On the AP exam you are almost always working with a sample, so you almost always use s (the n − 1 version). On the TI-84 this is the value labeled Sx; the population version is σx.
Let's compute the sample standard deviation of this clean data set:
{ 2, 4, 4, 6, 9 }, with n = 5.
Step 1 — Find the mean.
x̄ = (2 + 4 + 4 + 6 + 9) / 5 = 25 / 5 = 5
Step 2 — Find each deviation (x − x̄).
| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 2 | −3 | 9 |
| 4 | −1 | 1 |
| 4 | −1 | 1 |
| 6 | 1 | 1 |
| 9 | 4 | 16 |
Step 3 — Add up the squared deviations.
Σ(x − x̄)² = 9 + 1 + 1 + 1 + 16 = 28
Step 4 — Divide by n − 1 to get the sample variance.
s² = 28 / (5 − 1) = 28 / 4 = 7
Step 5 — Take the square root to get the sample standard deviation.
s = sqrt(7) ≈ 2.6458
So s ≈ 2.65. Interpretation: the values typically fall about 2.65 units away from the mean of 5. (If you had instead treated these five numbers as an entire population, you'd divide by N = 5: σ = sqrt(28/5) = sqrt(5.6) ≈ 2.3664. Notice the population value is smaller; that's the n − 1 adjustment at work.)
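The five steps above can be mirrored line by line in code. This is a sketch, not a required method; it cross-checks the hand computation against `stdev` (sample, n − 1) and `pstdev` (population, n) from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import stdev, pstdev

data = [2, 4, 4, 6, 9]
n = len(data)

x_bar = sum(data) / n                      # Step 1: the mean
deviations = [x - x_bar for x in data]     # Step 2: each x minus the mean
sum_sq = sum(d ** 2 for d in deviations)   # Step 3: sum of squared deviations
sample_variance = sum_sq / (n - 1)         # Step 4: divide by n - 1
s = sqrt(sample_variance)                  # Step 5: square root

sigma = sqrt(sum_sq / n)                   # population version: divide by n

# Cross-check against the library versions
assert abs(s - stdev(data)) < 1e-12
assert abs(sigma - pstdev(data)) < 1e-12

print(f"s = {s:.4f}, sigma = {sigma:.4f}")
```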
The five-number summary captures a distribution with five values:
Minimum, Q1, Median, Q3, Maximum
AP/TI-84 convention for quartiles: when there's an odd number of values, the overall median is excluded from both halves before finding Q1 and Q3.
A boxplot is the picture of the five-number summary: a box from Q1 to Q3 (with a line at the median), and "whiskers" extending to the min and max — or, in a modified boxplot, to the most extreme values that are not outliers, with outliers drawn as separate dots.
Take the set { 12, 15, 15, 18, 20, 22, 25 }. The median is 18. The lower half is {12, 15, 15} so Q1 = 15; the upper half is {20, 22, 25} so Q3 = 22. Five-number summary: Min 12, Q1 15, Median 18, Q3 22, Max 25. IQR = 22 − 15 = 7.
[GRAPH: Horizontal boxplot of {12, 15, 15, 18, 20, 22, 25}. Number line from 10 to 26. Left whisker from 12 to the box at 15. Box spans Q1=15 to Q3=22 with a vertical median line at 18. Right whisker from 22 to 25. Distribution looks roughly symmetric, slightly right-skewed.]
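For checking hand work, here is a minimal sketch of the quartile convention described above (overall median excluded from both halves when the count is odd). The function name `five_number_summary` is an assumption for illustration, and it expects at least two values. Library quantile functions may use other interpolation conventions, so their quartiles will not always match this one.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) with the median excluded from the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # values before the middle position
    upper = data[half + 1:] if n % 2 else data[half:]   # skip the median when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

summary = five_number_summary([12, 15, 15, 18, 20, 22, 25])
q1, q3 = summary[1], summary[3]
print(summary, "IQR =", q3 - q1)
```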
You don't get to eyeball outliers — there's a rule. A value is an outlier if it falls below the lower fence or above the upper fence:
Lower fence = Q1 − 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR
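Any data value outside [lower fence, upper fence] is flagged as an outlier; a value sitting exactly on a fence is not. On the AP exam, calling a value an outlier requires a stated rule like this one, and the 1.5 × IQR rule is the one used throughout this lesson. "It just looks far away" is not an answer.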
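A short sketch of the fence rule, assuming Q1 and Q3 have already been found. The function names `fences` and `outliers` are illustrative, and the data are the seven values from the boxplot example (Q1 = 15, Q3 = 22).

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """List the values that fall strictly outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [12, 15, 15, 18, 20, 22, 25]
lower, upper = fences(15, 22)
flagged = outliers(data, 15, 22)
print("fences:", lower, upper, "outliers:", flagged)
```

For this set the fences are 4.5 and 32.5, so nothing is flagged.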
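Because the mean and SD get pulled toward extreme values while the median and IQR don't, the shape of the distribution determines how the two centers compare:
- Roughly symmetric: the mean and median are close together.
- Skewed right: the long right tail pulls the mean above the median.
- Skewed left: the long left tail pulls the mean below the median.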
This leads to the decision rule that shows up constantly on the exam:
Symmetric, no outliers → report the mean and standard deviation.
Skewed or has outliers → report the median and IQR (the resistant pair).
The p-th percentile is the value at or below which p% of the data fall. If your SAT score is at the 90th percentile, 90% of test-takers scored at or below you. The median is the 50th percentile; Q1 is the 25th; Q3 is the 75th. Percentiles let you compare a single value to a whole distribution regardless of units.
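The "at or below" definition translates directly into a count. This is a sketch only: the function name `percent_at_or_below` is made up for illustration, and with a data set this small the percentages are coarse.

```python
def percent_at_or_below(values, x):
    """Percent of the data values that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

data = [12, 15, 15, 18, 20, 22, 25]
# 5 of the 7 values are at or below 20
print(f"{percent_at_or_below(data, 20):.1f}% of the values are at or below 20")
```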
STAT → 1:Edit: type your data into list L1.
STAT → CALC → 1:1-Var Stats: set List: L1, leave FreqList blank, choose Calculate. The results screen lists:
- x̄ = the mean
- Sx = the sample standard deviation (the n − 1 version — your default)
- σx = the population standard deviation (the n version)
- n = sample size
- Scroll down for minX, Q1, Med, Q3, maxX — your whole five-number summary
2nd → Y= (STAT PLOT) → Plot1 → On, choose the modified boxplot icon (the one that shows outliers as dots), set Xlist: L1, then ZOOM → 9:ZoomStat.

Problem. Seven students reported hours of sleep: 12, 15, 15, 18, 20, 22, 25 (yes, a couple of overachievers). Find the mean, median, mode, range, and IQR.
Strategy. Sort (already sorted), then apply each definition.
Solution.
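- Mean: (12+15+15+18+20+22+25)/7 = 127/7 ≈ 18.14.
- Median: the 4th of the 7 sorted values, 18.
- Mode: 15 (the only value that appears twice).
- Range: 25 − 12 = 13.
- IQR: the lower half {12, 15, 15} gives Q1 = 15 and the upper half {20, 22, 25} gives Q3 = 22, so IQR = 22 − 15 = 7.

Interpretation. A typical value is around 18 hours, and the middle 50% of students span a 7-hour-wide window.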
Problem. A barista records minutes customers waited: 18, 20, 21, 22, 23, 24, 25, 26, 48. Is the 48 an outlier by the 1.5×IQR rule?
Strategy. Build the five-number summary, get IQR, compute the fences, compare.
Solution. Sorted, n = 9. Median = 5th value = 23. Exclude it. Lower half {18, 20, 21, 22} → Q1 = (20+21)/2 = 20.5. Upper half {24, 25, 26, 48} → Q3 = (25+26)/2 = 25.5.
IQR = 25.5 − 20.5 = 5.
Upper fence = Q3 + 1.5 × IQR = 25.5 + 1.5(5) = 25.5 + 7.5 = 33.
Lower fence = Q1 − 1.5 × IQR = 20.5 − 7.5 = 13.
Since 48 > 33, the 48 is an outlier. No other value falls outside [13, 33].
Interpretation. That 48-minute wait is a genuine outlier, not just a big number — one customer waited far longer than the typical pattern. (For the record: x̄ ≈ 25.22 but the median is 23; the outlier pulls the mean up, which is exactly why median + IQR is the better summary here.)
Problem. Nine employees' annual salaries (in $1,000s): 32, 35, 38, 40, 42, 45, 48, 52, 120. Should you summarize with mean + SD or median + IQR? Justify, and report your choice.
Strategy. Check shape/outliers first, then report the matching pair.
Solution. Median = 5th value = 42. Q1 = median of {32,35,38,40} = (35+38)/2 = 36.5; Q3 = median of {45,48,52,120} = (48+52)/2 = 50. IQR = 50 − 36.5 = 13.5. Upper fence = 50 + 1.5(13.5) = 50 + 20.25 = 70.25. Since 120 > 70.25, the $120k salary is an outlier, and the data are clearly right-skewed.
Because there's an outlier and right skew, use the median and IQR: median = $42k, IQR = $13.5k. (For contrast, the mean is x̄ ≈ $50.2k — pulled well above the median by that one executive salary, which is exactly why it's the wrong summary here.)
Interpretation. "A typical employee earns about $42,000, and the middle half of salaries span $13,500." Reporting the mean of $50.2k would overstate what a typical employee actually makes.
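The reasoning in this problem can be sketched as a small function. It is a simplification under stated assumptions: it only tests for outliers with the 1.5 × IQR rule and does not judge skew, which you still check from a graph. The names `quartiles` and `choose_summary` are hypothetical.

```python
from statistics import mean, median, stdev

def quartiles(values):
    """Q1 and Q3 with the overall median excluded when the count is odd."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)

def choose_summary(values):
    """Pick median + IQR if any value is outside the fences, else mean + SD."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    has_outlier = any(x < lower_fence or x > upper_fence for x in values)
    if has_outlier:
        return {"center": ("median", median(values)), "spread": ("IQR", iqr)}
    return {"center": ("mean", mean(values)), "spread": ("SD", stdev(values))}

salaries = [32, 35, 38, 40, 42, 45, 48, 52, 120]  # in $1,000s
result = choose_summary(salaries)
print(result, "| mean for contrast:", round(mean(salaries), 1))
```

For the salary data this selects the median (42) and the IQR (13.5), matching the hand solution.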
Problem. A student enters daily high temperatures 68, 70, 72, 73, 75, 76, 78, 90 into L1 and runs 1-Var Stats. The calculator shows x̄ = 75.25, Sx ≈ 6.78, σx ≈ 6.34, Med = 74, Q1 = 71, Q3 = 77. (a) Which standard deviation should the student report and why? (b) Use the 1.5×IQR rule to check whether 90° is an outlier.
Strategy. Identify sample vs. population; then run the fence test.
Solution.
(a) These eight days are a sample of daily highs, so report Sx ≈ 6.78, the n − 1 version. σx would only be right if these eight days were the entire population of interest.
(b) IQR = Q3 − Q1 = 77 − 71 = 6. Upper fence = 77 + 1.5(6) = 77 + 9 = 86. Since 90 > 86, 90° is an outlier.
Interpretation. That 90° day stands out as unusually hot relative to the other seven days, by the formal rule and not just because it "looks high."
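As a cross-check on the calculator screen, the same quantities can be reproduced with Python's standard `statistics` module. In this sketch `stdev` plays the role of Sx and `pstdev` the role of σx; Q1 and Q3 are taken from the calculator output quoted in the problem rather than recomputed.

```python
from statistics import mean, stdev, pstdev

temps = [68, 70, 72, 73, 75, 76, 78, 90]

x_bar = mean(temps)        # calculator's x-bar
s_x = stdev(temps)         # Sx: sample SD, divides by n - 1
sigma_x = pstdev(temps)    # sigma-x: population SD, divides by n
n = len(temps)

q1, q3 = 71, 77            # from the 1-Var Stats output in the problem
upper_fence = q3 + 1.5 * (q3 - q1)

print(f"n = {n}, mean = {x_bar}, Sx = {s_x:.2f}, sigma_x = {sigma_x:.2f}")
print(f"upper fence = {upper_fence}, 90 is an outlier: {90 > upper_fence}")
```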
1. Dividing by n when you should divide by n − 1. Students grab the population formula (or read σx off the calculator) for sample data. On the AP exam your data are almost always a sample, so use s with n − 1 — that's Sx, not σx. Mixing these up changes your answer and can cost points.
2. Calling any extreme-looking value an "outlier" without the rule. "1200 is way bigger than the others, so it's an outlier" earns nothing. You must compute Q1 − 1.5×IQR and Q3 + 1.5×IQR and show the value falls outside the fences. The rule is the answer; the eyeball is not.
3. Using the mean (and SD) to summarize skewed data. If a distribution is skewed or has an outlier, the mean gets dragged toward the tail and misrepresents "typical." Switch to the resistant pair: median and IQR.
4. Forgetting to sort, or including the median in the halves. Quartiles require sorted data. And with an odd count, the AP/TI-84 convention is to exclude the overall median before finding Q1 and Q3. Skipping either step gives wrong quartiles, a wrong IQR, and wrong fences.
5. Reporting a center with no spread (or vice versa). "The mean is 80" is half an answer. A complete summary always pairs a center with a matching spread (mean + SD, or median + IQR) — and ideally a comment on shape.
1. For the data set 4, 6, 6, 7, 8, 9, 11, the mean is closest to:
5. For the data set 4, 6, 6, 7, 8, 9, 11, the five-number summary is Min 4, Q1 6, Med 7, Q3 9, Max 11. The IQR is:
6. For the same data, the upper fence (Q3 + 1.5×IQR) is:
8. (in context) A clinic records patient ages: 10, 12, 13, 14, 15, 16, 40. Q1 = 12, Q3 = 16, so IQR = 4. Is the age 40 an outlier by the 1.5×IQR rule, and which summary should the clinic report?
(A) Not an outlier; report mean + SD
(B) Outlier; report mean + SD
(C) Outlier; report median + IQR
(D) Not an outlier; report median + IQR
9. (in context) Home prices in a neighborhood are strongly right-skewed because of one mansion. A realtor wants a single number for the "typical" home price. She should report the:
(A) mean, because it uses all the data
(B) median, because it resists the outlier
(C) range, because it shows the full span
(D) mode, because it's the most common price
10. 1-Var Stats reports Sx = 5.2 and σx = 4.9 for a sample. Which should you report as the standard deviation, and why?
13. Compute the sample standard deviation of 3, 3, 4, 5, 8 by hand. (Hint: mean = 4.6.)
14. (in context) Eight quiz scores: 7, 8, 8, 9, 10, 10, 10, 12. Find the mean, the median, and the mode. Which is largest, and what does that tell you about the shape?
15. (in context) A coach times nine sprinters' 100m dashes (seconds): 11.2, 11.4, 11.5, 11.6, 11.8, 12.0, 12.1, 12.3, 15.0. Describe how the 15.0 affects the mean versus the median, and state which center the coach should report.
---
1. (B) 7.3. (4+6+6+7+8+9+11)/7 = 51/7 ≈ 7.29 ≈ 7.3. (A) is the median, not the mean. (C) and (D) come from miscounting or dropping a value.
2. (B) Median. The median is positional, so extreme values can't move it. (A) and (D) are built from the mean and are dragged by outliers. (C) range is the least resistant — it's defined entirely by the two extremes.
3. (B) n − 1. This is the sample (degrees-of-freedom) version, which makes the sample variance s² an unbiased estimator of σ². (A) n is the population formula. (C) and (D) are not used.
4. (C) mean > median. A long right tail pulls the (non-resistant) mean above the (resistant) median. (A) describes left skew; (B) describes symmetry.
5. (B) 3. IQR = Q3 − Q1 = 9 − 6 = 3. (A) is the range (11 − 4 = 7); (D) is Q3 alone.
6. (C) 13.5. Q3 + 1.5×IQR = 9 + 1.5(3) = 9 + 4.5 = 13.5. (A) just restates the max; (B) and (D) use the wrong IQR or arithmetic.
7. (D) 23. IQR = 16 − 12 = 4. Fences: lower = 12 − 1.5(4) = 12 − 6 = 6; upper = 16 + 1.5(4) = 16 + 6 = 22. An outlier must fall strictly outside [6, 22]. Since 23 > 22, 23 is the outlier. (18 lies inside the fences; 6 and 22 sit exactly on a fence, and a value on the boundary is not beyond it, so (A) 6 is not an outlier.)
8. (C) Outlier; report median + IQR. IQR = 16 − 12 = 4; upper fence = 16 + 1.5(4) = 22. Since 40 > 22, age 40 is an outlier, so the resistant pair (median + IQR) is appropriate. (A) and (D) wrongly clear the outlier; (B) reports the non-resistant pair despite an outlier.
9. (B) median, because it resists the outlier. With strong right skew from one mansion, the mean (A) is inflated and misrepresents "typical." Range (C) and mode (D) aren't measures of a typical price here.
10. (B) Sx, because the data are a sample (n − 1). σx (A, D) is only correct for a full population. They are not interchangeable (C) — they use different denominators.
11. (B) 80% of test-takers scored at or below you. A percentile is a rank, not a raw score (A, D) and not the percent above you (C).
12. (B) mean + SD. For symmetric, outlier-free data the mean and SD are the standard, most informative summary. The resistant pair (A) is reserved for skew/outliers.
13. s ≈ 2.07. Mean = (3+3+4+5+8)/5 = 23/5 = 4.6. Deviations: −1.6, −1.6, −0.6, 0.4, 3.4. Squares: 2.56, 2.56, 0.36, 0.16, 11.56, sum = 17.2. Sample variance = 17.2/(5−1) = 17.2/4 = 4.3. s = sqrt(4.3) ≈ 2.0736 ≈ 2.07. (If you got ≈ 1.85, you divided by n = 5 and found σ, the population SD — wrong for a sample.)
14. Sorted: 7, 8, 8, 9, 10, 10, 10, 12. Mean = 74/8 = 9.25. Median = (9 + 10)/2 = 9.5. Mode = 10 (appears three times). The mode (10) is largest, then the median (9.5), then the mean (9.25). Because the mean < median, there's a mild left skew: the lower scores (the 7 and the two 8s) pull the mean below the median.
15. The 15.0-second time is an outlier by the 1.5×IQR rule: Q1 = (11.4 + 11.5)/2 = 11.45, Q3 = (12.1 + 12.3)/2 = 12.2, IQR = 0.75, and the upper fence is 12.2 + 1.5(0.75) = 13.325, which 15.0 exceeds. It drags the mean upward to x̄ ≈ 12.1 s (it would be about 11.74 s without that time), while the median (the 5th of 9 values, 11.8 s) is essentially unaffected. Because the data are right-skewed by this outlier, the coach should report the median (11.8 s) as the typical time; the mean overstates how slow a typical sprinter is.
---
StatsIQ · Lesson 3 of 30 · Unit 1: Exploring One-Variable Data & Collecting Data · Phase 1: Data & Design
Disclaimer: AP® is a trademark registered by the College Board, which is not affiliated with, and does not endorse, this product. Content is aligned to the 2026–27 revised AP Statistics Course and Exam Description.
Accuracy review: All statistics in this lesson were recomputed by hand and cross-checked. The fully worked sample standard deviation of {2, 4, 4, 6, 9} is s = √7 ≈ 2.6458. All five-number summaries use the AP/TI-84 quartile convention (median excluded from halves when n is odd). Reviewed for statistical accuracy by Isaac, retired actuary.