StatsIQ · AP Statistics

Lesson 3: Summarizing One-Variable Data

Unit 1 · Phase 1 · Statistical Practice: 3 — Analyze Data

Topics: Measures of center (mean, median, mode) and resistance; measures of spread (range, IQR, variance, standard deviation); population vs. sample SD; the 5-number summary and boxplots; the 1.5×IQR rule for outliers; effect of skew/outliers on mean vs. median; percentiles; choosing the right summary.

Calculator: TI-84 `STAT → CALC → 1-Var Stats`; reading x̄, Sx, σx, and the five-number summary; building a boxplot.

Objectives:

Compute and interpret measures of center and spread, including a standard deviation worked fully by hand.
Build a 5-number summary and boxplot, and use the 1.5×IQR rule to flag outliers correctly.
Choose between mean + SD and median + IQR based on shape, and read all of these off a TI-84.

(a) Warm-Up

Two AP Stats classes take the same quiz. You're told both classes have a mean of 80. Same performance, right? Not so fast.

Class A scores: 78, 79, 80, 81, 82.

Class B scores: 60, 70, 80, 90, 100.

Both average to 80. But Class A is packed tightly around 80, while Class B is spread all over the map. If you were a student deciding which class curve to gamble on, that difference would matter a lot.

In Lesson 2 you learned to describe a distribution's shape, center, and spread by eye. Now you'll put exact numbers on center and spread — and learn that a single number like "the mean is 80" almost never tells the whole story. A summary needs at least a center and a spread, and which pair you pick depends on the shape. By the end of this lesson you'll know exactly which numbers to report, how to compute them by hand, and how to pull them off your calculator in seconds.

(b) Core Concept

Measures of Center

A measure of center is a single number meant to represent a "typical" value.

Mean (x̄ for a sample, μ for a population): add the values, divide by how many there are. x̄ = Σx / n. The mean is the balance point of the distribution — the spot where the data would balance if the dots had weight.
Median: the middle value when the data are sorted. With an odd count, it's the literal middle; with an even count, it's the average of the two middle values. The median is the positional center: at least half the data sit at or below it, and at least half sit at or above it.
Mode: the most frequently occurring value. A distribution can have one mode, several, or none. The mode is the only center that works for categorical data.

The single most important idea here: the median is resistant (also called robust) — extreme values barely move it. The mean is not resistant — a single huge value drags it along.

Quick proof. Take 1, 2, 3, 4, 5. Mean = 3, median = 3. Now change the 5 to a 500: the data are 1, 2, 3, 4, 500. The median is still 3 (the middle position didn't change), but the mean jumps to 102. The median shrugged off the outlier; the mean got hijacked by it. Remember this — it drives every "which summary should I use" decision in this course.
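If you want to verify the quick proof away from the calculator, a few lines of Python will do it. This is a minimal sketch using the standard library's statistics module; the variable names are illustrative and not part of the lesson.

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]   # the 5 has been replaced by 500

# Print both centers for each data set to see which one moves.
for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label, "-> mean =", mean(data), ", median =", median(data))
```

The mean goes from 3 to 102 while the median stays at 3, which is the resistance idea in numbers.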

Measures of Spread

Center tells you where; spread tells you how variable.

Range = maximum − minimum. Dead simple, but it uses only the two most extreme values, so a single outlier blows it up.
Interquartile Range (IQR) = Q3 − Q1, the spread of the middle 50% of the data. Because it ignores the top and bottom quarters, the IQR is resistant.
Variance and Standard Deviation: these measure roughly the average distance of the data points from the mean. Standard deviation is the workhorse spread measure, and it is not resistant (it's built from the mean and from squared distances).

Population vs. Sample Standard Deviation

There are two versions, and the difference is the denominator.

Population standard deviation (you have the whole population):

σ = sqrt( Σ(x − μ)² / N )

Sample standard deviation (you have a sample and want to estimate the population's spread):

s = sqrt( Σ(x − x̄)² / (n − 1) )

Why divide by n − 1 instead of n for a sample? A sample tends to be a little less spread out than the full population it came from, so dividing by the smaller number n − 1 nudges the estimate up to compensate. That n − 1 is called the degrees of freedom, and it makes the sample variance s² an unbiased estimator of the population variance σ². (Strictly speaking, s itself still slightly underestimates σ on average, but it is the standard estimate.) On the AP exam you are almost always working with a sample, so you almost always use s (the n − 1 version). On the TI-84 this is the value labeled Sx; the population version is σx.
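To see the two denominators side by side, here is a small sketch that runs both formulas on the Warm-Up classes. It assumes Python's standard statistics module, where stdev divides by n − 1 and pstdev divides by n; treating each class as a sample or as a population is purely for illustration.

```python
from statistics import pstdev, stdev

class_a = [78, 79, 80, 81, 82]
class_b = [60, 70, 80, 90, 100]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    s = stdev(scores)        # divides by n - 1 (the TI-84's Sx)
    sigma = pstdev(scores)   # divides by n     (the TI-84's σx)
    print(f"{name}: s = {s:.3f}, sigma = {sigma:.3f}")
```

Class A comes out near s ≈ 1.58 and Class B near s ≈ 15.81, so the two classes with the same mean differ in spread by a factor of ten. In both classes the n − 1 version is the larger of the two values.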

One Standard Deviation, Fully By Hand

Let's compute the sample standard deviation of this clean data set:

{ 2, 4, 4, 6, 9 }, with n = 5.

Step 1 — Find the mean.

x̄ = (2 + 4 + 4 + 6 + 9) / 5 = 25 / 5 = 5

Step 2 — Find each deviation (x − x̄).

x	x − x̄	(x − x̄)²
2	−3	9
4	−1	1
4	−1	1
6	1	1
9	4	16

Step 3 — Add up the squared deviations.

Σ(x − x̄)² = 9 + 1 + 1 + 1 + 16 = 28

Step 4 — Divide by n − 1 to get the sample variance.

s² = 28 / (5 − 1) = 28 / 4 = 7

Step 5 — Take the square root to get the sample standard deviation.

s = sqrt(7) ≈ 2.6458

So s ≈ 2.65. Interpretation: the values typically fall about 2.65 units away from the mean of 5. (If you had instead treated these five numbers as an entire population, you'd divide by N = 5: σ = sqrt(28/5) = sqrt(5.6) ≈ 2.3664. Notice the population value is smaller; that's the n − 1 adjustment at work.)
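The five steps translate almost line for line into code. The sketch below mirrors the hand calculation rather than calling a library function, so each intermediate value can be compared with the table; the variable names are my own.

```python
import math

data = [2, 4, 4, 6, 9]
n = len(data)

x_bar = sum(data) / n                        # Step 1: mean
deviations = [x - x_bar for x in data]       # Step 2: x - x̄ for each value
squared = [d ** 2 for d in deviations]       #         (x - x̄)² for each value
ss = sum(squared)                            # Step 3: sum of squared deviations
sample_variance = ss / (n - 1)               # Step 4: divide by n - 1
s = math.sqrt(sample_variance)               # Step 5: square root

sigma = math.sqrt(ss / n)                    # population version, for contrast

print(x_bar, ss, sample_variance, round(s, 4), round(sigma, 4))
```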

The 5-Number Summary and Boxplots

The five-number summary captures a distribution with five values:

Minimum, Q1, Median, Q3, Maximum

Q1 (first quartile) is the median of the lower half of the data — the 25th percentile.
Q3 (third quartile) is the median of the upper half — the 75th percentile.

AP/TI-84 convention for quartiles: when there's an odd number of values, the overall median is excluded from both halves before finding Q1 and Q3.

A boxplot is the picture of the five-number summary: a box from Q1 to Q3 (with a line at the median), and "whiskers" extending to the min and max — or, in a modified boxplot, to the most extreme values that are not outliers, with outliers drawn as separate dots.

Take the set { 12, 15, 15, 18, 20, 22, 25 }. The median is 18. The lower half is {12, 15, 15} so Q1 = 15; the upper half is {20, 22, 25} so Q3 = 22. Five-number summary: Min 12, Q1 15, Median 18, Q3 22, Max 25. IQR = 22 − 15 = 7.

[GRAPH: Horizontal boxplot of {12, 15, 15, 18, 20, 22, 25}. Number line from 10 to 26. Left whisker from 12 to the box at 15. Box spans Q1=15 to Q3=22 with a vertical median line at 18. Right whisker from 22 to 25. Distribution looks roughly symmetric, slightly right-skewed.]
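The median-of-halves convention is easy to get wrong, so here is a hedged sketch of it in Python. The function name five_number_summary is hypothetical, and the code assumes at least two data values. Python's built-in statistics.quantiles uses interpolation rules that do not always agree with this convention, which is why the halves are split by hand here.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, skip the overall median when forming the upper half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([12, 15, 15, 18, 20, 22, 25]))
```

For the set above this returns 12, 15, 18, 22, 25, matching the hand-built summary.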

The 1.5 × IQR Rule for Outliers

You don't get to eyeball outliers — there's a rule. A value is an outlier if it falls below the lower fence or above the upper fence:

Lower fence = Q1 − 1.5 × IQR

Upper fence = Q3 + 1.5 × IQR

Any data value outside [lower fence, upper fence] is flagged as an outlier. On the AP exam, an outlier claim has to be backed by a rule like this one. "It just looks far away" is not an answer.
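As a sketch, the fence rule fits in two short functions. The names iqr_fences and flag_outliers are hypothetical, and the quartiles are passed in rather than computed, so any quartile convention can be used upstream. The value 40 in the second call is made up purely to show a flagged point.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    low, high = iqr_fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [12, 15, 15, 18, 20, 22, 25]          # Q1 = 15, Q3 = 22 from the lesson
print(iqr_fences(15, 22))                    # fences for this data set
print(flag_outliers(data, 15, 22))           # nothing is flagged
print(flag_outliers(data + [40], 15, 22))    # hypothetical extra value 40
```

With Q1 = 15 and Q3 = 22 the fences are 4.5 and 32.5, so none of the seven original values is an outlier.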

Skew, Outliers, and Choosing Your Summary

Because the mean and SD get pulled toward extreme values while the median and IQR don't:

In a right-skewed distribution (long tail to the right, e.g. incomes), the mean > median.
In a left-skewed distribution, the mean < median.
In a roughly symmetric distribution, mean ≈ median.

This leads to the decision rule that shows up constantly on the exam:

Symmetric, no outliers → report the mean and standard deviation.

Skewed or has outliers → report the median and IQR (the resistant pair).

Percentiles

The p-th percentile is the value at or below which p% of the data fall. If your SAT score is at the 90th percentile, 90% of test-takers scored at or below you. The median is the 50th percentile; Q1 is the 25th; Q3 is the 75th. Percentiles let you compare a single value to a whole distribution regardless of units.
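The "at or below" definition can be sketched as a simple count. The function name percentile_rank is hypothetical, and the example reuses the Class B quiz scores from the Warm-Up. With very small data sets the result is lumpy, and other software may use slightly different percentile definitions.

```python
def percentile_rank(values, x):
    """Percent of the data values that are at or below x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

class_b = [60, 70, 80, 90, 100]
print(percentile_rank(class_b, 90))   # 4 of the 5 scores are at or below 90
```

Here a score of 90 sits at the 80th percentile of Class B, because four of the five scores are at or below it.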

TI-84: Get Everything at Once

STAT → 1:Edit — type your data into list L1.
STAT → CALC → 1:1-Var Stats, set List: L1, leave FreqList blank, choose Calculate.
Read the output:

- x̄ = the mean

- Sx = the sample standard deviation (the n − 1 version — your default)

- σx = the population standard deviation (the n version)

- n = sample size

- Scroll down for minX, Q1, Med, Q3, maxX — your whole five-number summary

For a boxplot: 2nd → Y= (STAT PLOT) → Plot1 → On, choose the modified boxplot icon (the one that shows outliers as dots), set Xlist: L1, then ZOOM → 9:ZoomStat.

(c) Worked Examples

Example 1 (easy) — Center and spread by hand

Problem. Seven students reported hours of sleep: 12, 15, 15, 18, 20, 22, 25 (yes, a couple of overachievers). Find the mean, median, mode, range, and IQR.

Strategy. Sort (already sorted), then apply each definition.

Solution.

Mean: (12+15+15+18+20+22+25)/7 = 127/7 ≈ 18.14.
Median: middle of 7 values = the 4th = 18.
Mode: 15 appears twice; everything else once → mode = 15.
Range: 25 − 12 = 13.
Q1 = 15, Q3 = 22 → IQR = 7.

Interpretation. A typical value is around 18, and the middle 50% of students span a 7-unit-wide window.

Example 2 (medium) — Apply the 1.5×IQR rule

Problem. A barista records minutes customers waited: 18, 20, 21, 22, 23, 24, 25, 26, 48. Is the 48 an outlier by the 1.5×IQR rule?

Strategy. Build the five-number summary, get IQR, compute the fences, compare.

Solution. Sorted, n = 9. Median = 5th value = 23. Exclude it. Lower half {18, 20, 21, 22} → Q1 = (20+21)/2 = 20.5. Upper half {24, 25, 26, 48} → Q3 = (25+26)/2 = 25.5.

IQR = 25.5 − 20.5 = 5.

Upper fence = Q3 + 1.5 × IQR = 25.5 + 1.5(5) = 25.5 + 7.5 = 33.

Lower fence = Q1 − 1.5 × IQR = 20.5 − 7.5 = 13.

Since 48 > 33, the 48 is an outlier. No other value falls outside [13, 33].

Interpretation. That 48-minute wait is a genuine outlier, not just a big number — one customer waited far longer than the typical pattern. (For the record: x̄ ≈ 25.22 but the median is 23; the outlier pulls the mean up, which is exactly why median + IQR is the better summary here.)
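Here is the whole Example 2 pipeline as one short script, as a way to double-check the arithmetic. It is a sketch that hard-codes the odd-count case (n = 9), so the overall median is skipped when the halves are formed; the variable names are illustrative.

```python
from statistics import median

waits = sorted([18, 20, 21, 22, 23, 24, 25, 26, 48])
mid = len(waits) // 2                 # index of the median (n is odd here)

q1 = median(waits[:mid])              # lower half, median excluded
q3 = median(waits[mid + 1:])          # upper half, median excluded
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in waits if w < lower_fence or w > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```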

Example 3 (medium) — Pick the right summary for a skewed set

Problem. Nine employees' annual salaries (in $1,000s): 32, 35, 38, 40, 42, 45, 48, 52, 120. Should you summarize with mean + SD or median + IQR? Justify, and report your choice.

Strategy. Check shape/outliers first, then report the matching pair.

Solution. Median = 5th value = 42. Q1 = median of {32,35,38,40} = (35+38)/2 = 36.5; Q3 = median of {45,48,52,120} = (48+52)/2 = 50. IQR = 50 − 36.5 = 13.5. Upper fence = 50 + 1.5(13.5) = 50 + 20.25 = 70.25. Since 120 > 70.25, the $120k salary is an outlier, and the data are clearly right-skewed.

Because there's an outlier and right skew, use the median and IQR: median = $42k, IQR = $13.5k. (For contrast, the mean is x̄ ≈ $50.2k — pulled well above the median by that one executive salary, which is exactly why it's the wrong summary here.)

Interpretation. "A typical employee earns about $42,000, and the middle half of salaries span $13,500." Reporting the mean of $50.2k would overstate what a typical employee actually makes.
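A what-if comparison makes the resistance argument concrete. The sketch below recomputes both centers with and without the $120k salary. Dropping the value is only a thought experiment to show sensitivity, not a recommendation to delete data.

```python
from statistics import mean, median

salaries = [32, 35, 38, 40, 42, 45, 48, 52, 120]     # in $1,000s
without_exec = [s for s in salaries if s != 120]     # what-if: no outlier

print("all nine:   mean =", round(mean(salaries), 1), " median =", median(salaries))
print("without 120: mean =", round(mean(without_exec), 1), " median =", median(without_exec))
```

Removing one salary moves the mean from about 50.2 to 41.5 but moves the median only from 42 to 41, so the median is the steadier description of a typical employee.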

Example 4 (AP-style) — Reading the calculator + reasoning

Problem. A student enters daily high temperatures 68, 70, 72, 73, 75, 76, 78, 90 into L1 and runs 1-Var Stats. The calculator shows x̄ = 75.25, Sx ≈ 6.78, σx ≈ 6.34, Med = 74, Q1 = 71, Q3 = 77. (a) Which standard deviation should the student report and why? (b) Use the 1.5×IQR rule to check whether 90° is an outlier.

Strategy. Identify sample vs. population; then run the fence test.

Solution.

(a) These eight days are a sample of daily highs, so report Sx ≈ 6.78, the n − 1 version. σx would only be right if these eight days were the entire population of interest.

(b) IQR = Q3 − Q1 = 77 − 71 = 6. Upper fence = 77 + 1.5(6) = 77 + 9 = 86. Since 90 > 86, 90° is an outlier.

Interpretation. That 90° day stands out as unusually hot relative to the other seven days by the formal rule, not just because it "looks high."
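The calculator readout in this example can be cross-checked with Python's standard statistics module. In this sketch, Q1 and Q3 are copied from the calculator output in the problem rather than recomputed; the variable names are my own.

```python
from statistics import mean, median, pstdev, stdev

temps = [68, 70, 72, 73, 75, 76, 78, 90]

x_bar = mean(temps)        # calculator: x̄ = 75.25
sx = stdev(temps)          # calculator: Sx ≈ 6.78 (n - 1 version)
sigma_x = pstdev(temps)    # calculator: σx ≈ 6.34 (n version)
med = median(temps)        # calculator: Med = 74

q1, q3 = 71, 77            # taken from the calculator output in the problem
upper_fence = q3 + 1.5 * (q3 - q1)
is_outlier = 90 > upper_fence

print(x_bar, round(sx, 2), round(sigma_x, 2), med, upper_fence, is_outlier)
```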

(d) Common Mistakes

1. Dividing by n when you should divide by n − 1. Students grab the population formula (or read σx off the calculator) for sample data. On the AP exam your data are almost always a sample, so use s with n − 1 — that's Sx, not σx. Mixing these up changes your answer and can cost points.

2. Calling any extreme-looking value an "outlier" without the rule. "1200 is way bigger than the others, so it's an outlier" earns nothing. You must compute Q1 − 1.5×IQR and Q3 + 1.5×IQR and show the value falls outside the fences. The rule is the answer; the eyeball is not.

3. Using the mean (and SD) to summarize skewed data. If a distribution is skewed or has an outlier, the mean gets dragged toward the tail and misrepresents "typical." Switch to the resistant pair: median and IQR.

4. Forgetting to sort, or including the median in the halves. Quartiles require sorted data. And with an odd count, the AP/TI-84 convention is to exclude the overall median before finding Q1 and Q3. Skipping either step gives wrong quartiles, a wrong IQR, and wrong fences.

5. Reporting a center with no spread (or vice versa). "The mean is 80" is half an answer. A complete summary always pairs a center with a matching spread (mean + SD, or median + IQR) — and ideally a comment on shape.

(e) Practice Problems

Question 1

For the data set 4, 6, 6, 7, 8, 9, 11, the mean is closest to:

(A) 7.0
(B) 7.3
(C) 8.0
(D) 6.0

Question 2

Which measure of center is most resistant to outliers?

(A) Mean
(B) Median
(C) Range
(D) Standard deviation

Question 3

The sample standard deviation formula divides the sum of squared deviations by:

(A) n
(B) n − 1
(C) n + 1
(D) √n

Question 4

A distribution is right-skewed. Which is true?

(A) mean < median
(B) mean ≈ median
(C) mean > median
(D) mean = mode

Question 5

For 4, 6, 6, 7, 8, 9, 11, the five-number summary is Min 4, Q1 6, Med 7, Q3 9, Max 11. The IQR is:

(A) 7
(B) 3
(C) 5
(D) 9

Question 6

Using the data and IQR from Question 5, the upper fence (Q3 + 1.5×IQR) is:

(A) 11.0
(B) 12.0
(C) 13.5
(D) 15.0

Question 7

A data set has Q1 = 12, Q3 = 16. By the 1.5×IQR rule, which value below is an outlier?

(A) 6
(B) 18
(C) 22
(D) 23

8. (in context) A clinic records patient ages: 10, 12, 13, 14, 15, 16, 40. Q1 = 12, Q3 = 16, so IQR = 4. Is the age 40 an outlier by the 1.5×IQR rule, and which summary should the clinic report?

(A) Not an outlier; report mean + SD

(B) Outlier; report mean + SD

(C) Outlier; report median + IQR

(D) Not an outlier; report median + IQR

9. (in context) Home prices in a neighborhood are strongly right-skewed because of one mansion. A realtor wants a single number for the "typical" home price. She should report the:

(A) mean, because it uses all the data

(B) median, because it resists the outlier

(C) range, because it shows the full spread of prices

(D) mode, because it's the most common price

Question 10

On the TI-84, 1-Var Stats reports Sx = 5.2 and σx = 4.9 for a sample. Which should you report as the standard deviation, and why?

(A) σx, because it's smaller
(B) Sx, because the data are a sample (n − 1)
(C) Either — they're interchangeable
(D) σx, because it uses all the data

Question 11

Your test score is at the 80th percentile. This means:

(A) you scored 80%
(B) 80% of test-takers scored at or below you
(C) 80% of test-takers scored above you
(D) you scored 80 points

Question 12

A symmetric distribution with no outliers is best summarized by:

(A) median + IQR
(B) mean + SD
(C) mode + range
(D) Q1 + Q3

13. Compute the sample standard deviation of 3, 3, 4, 5, 8 by hand. (Hint: mean = 4.6.)

14. (in context) Eight quiz scores: 7, 8, 8, 9, 10, 10, 10, 12. Find the mean, the median, and the mode. Which is largest, and what does that tell you about the shape?

15. (in context) A coach times nine sprinters' 100m dashes (seconds): 11.2, 11.4, 11.5, 11.6, 11.8, 12.0, 12.1, 12.3, 15.0. Describe how the 15.0 affects the mean versus the median, and state which center the coach should report.

---

🔑 Answer Key

1. (B) 7.3. (4+6+6+7+8+9+11)/7 = 51/7 ≈ 7.29 ≈ 7.3. (A) is the median, not the mean. (C) and (D) come from miscounting or dropping a value.

2. (B) Median. The median is positional, so extreme values barely move it. (A) The mean is dragged by outliers. (C) Range and (D) standard deviation are measures of spread, not center, and neither is resistant: the range is defined entirely by the two extremes, and the SD is built from the mean.

3. (B) n − 1. This is the sample (degrees-of-freedom) version, which makes s² an unbiased estimator of σ². (A) n is the population formula. (C) and (D) are not used.

4. (C) mean > median. A long right tail pulls the (non-resistant) mean above the (resistant) median. (A) describes left skew; (B) describes symmetry.

5. (B) 3. IQR = Q3 − Q1 = 9 − 6 = 3. (A) is the range (11 − 4 = 7); (D) is Q3 alone.

6. (C) 13.5. Q3 + 1.5×IQR = 9 + 1.5(3) = 9 + 4.5 = 13.5. (A) just restates the max; (B) and (D) use the wrong IQR or arithmetic.

7. (D) 23. IQR = 16 − 12 = 4. Fences: lower = 12 − 1.5(4) = 12 − 6 = 6; upper = 16 + 1.5(4) = 16 + 6 = 22. An outlier must fall strictly outside [6, 22], and only 23 does, since 23 > 22. (A) 6 and (C) 22 sit exactly on the fences, so they are on the boundary and not beyond it; (B) 18 lies inside the fences.

8. (C) Outlier; report median + IQR. IQR = 16 − 12 = 4; upper fence = 16 + 1.5(4) = 22. Since 40 > 22, age 40 is an outlier, so the resistant pair (median + IQR) is appropriate. (A) and (D) wrongly clear the outlier; (B) reports the non-resistant pair despite an outlier.

9. (B) median, because it resists the outlier. With strong right skew from one mansion, the mean (A) is inflated and misrepresents "typical." Range (C) and mode (D) aren't measures of a typical price here.

10. (B) Sx, because the data are a sample (n − 1). σx (A, D) is only correct for a full population. They are not interchangeable (C) — they use different denominators.

11. (B) 80% of test-takers scored at or below you. A percentile is a rank, not a raw score (A, D) and not the percent above you (C).

12. (B) mean + SD. For symmetric, outlier-free data the mean and SD are the standard, most informative summary. The resistant pair (A) is reserved for skew/outliers.

13. s ≈ 2.07. Mean = (3+3+4+5+8)/5 = 23/5 = 4.6. Deviations: −1.6, −1.6, −0.6, 0.4, 3.4. Squares: 2.56, 2.56, 0.36, 0.16, 11.56, sum = 17.2. Sample variance = 17.2/(5−1) = 17.2/4 = 4.3. s = sqrt(4.3) ≈ 2.0736 ≈ 2.07. (If you got ≈ 1.85, you divided by n = 5 and found σ, the population SD — wrong for a sample.)

14. Sorted: 7, 8, 8, 9, 10, 10, 10, 12. Mean = 74/8 = 9.25. Median = (9 + 10)/2 = 9.5. Mode = 10 (appears three times). The mode (10) is largest, then the median (9.5), then the mean (9.25). Because the mean < median, there's a mild left skew: the lower scores (the 7 and the two 8s) pull the mean down below the median.

15. The 15.0-second time is an outlier by the 1.5×IQR rule: Q1 = (11.4 + 11.5)/2 = 11.45 and Q3 = (12.1 + 12.3)/2 = 12.2, so IQR = 0.75 and the upper fence is 12.2 + 1.5(0.75) = 13.325, and 15.0 > 13.325. It drags the mean upward to x̄ ≈ 12.1 s, above the median, while the median (the 5th of 9 values, 11.8 s) is essentially unaffected. Because the data are right-skewed by this outlier, the coach should report the median (11.8 s) as the typical time; the mean overstates how slow a typical sprinter is.

---

StatsIQ · Lesson 3 of 30 · Unit 1: Exploring One-Variable Data & Collecting Data · Phase 1: Data & Design

Disclaimer: AP® is a trademark registered by the College Board, which is not affiliated with, and does not endorse, this product. Content is aligned to the 2026–27 revised AP Statistics Course and Exam Description.

Accuracy review: All statistics in this lesson were recomputed by hand and cross-checked. The fully worked sample standard deviation of {2, 4, 4, 6, 9} is s = √7 ≈ 2.6458. All five-number summaries use the AP/TI-84 quartile convention (median excluded from halves when n is odd). Reviewed for statistical accuracy by Isaac, retired actuary.
