In Lesson 1 you learned to tell a categorical variable (eye color, sport, brand) from a quantitative variable (height, points scored, reaction time). That distinction wasn't busywork — it decides which graph you're allowed to draw.
Picture two phones on a table. One shows a survey: "What's your favorite streaming service?" The other shows a fitness app logging each user's resting heart rate. Both produced a pile of data. But you would never display them the same way. Counting how many people picked Netflix is a different job than showing the spread of heart rates from 48 to 92 beats per minute.
Here's today's idea to keep in the back of your mind: a graph isn't decoration; it's the first analysis you do. Before you compute a single average, a good display already tells you whether the data are symmetric or lopsided, whether there's a gap, and where the "typical" value lives. By the end of this lesson you'll be able to look at a histogram and describe what's going on in plain English, tied to context. That skill is tested on virtually every AP Statistics exam.
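Everything starts with the variable type: categorical data get frequency tables, bar charts, and pie charts, while quantitative data get dotplots, stemplots, and histograms.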
If you ever draw a histogram of "favorite sport," stop — that's a categorical variable wearing the wrong costume.
A frequency table lists each category and its count. Add a percent column and it's a relative-frequency table. Suppose we ask 60 students their primary way of getting to school:
| Method | Count | Percent |
|---|---|---|
| Car | 24 | 40% |
| Bus | 18 | 30% |
| Walk | 12 | 20% |
| Bike | 6 | 10% |
| Total | 60 | 100% |
(Check: 24 + 18 + 12 + 6 = 60, and 24/60 = 0.40, 18/60 = 0.30, 12/60 = 0.20, 6/60 = 0.10. The percents sum to 100%.)
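As a quick check on the percent column, here is a minimal Python sketch that recomputes each relative frequency from the counts. The variable names (`counts`, `percents`) are illustrative only and are not part of the lesson.

```python
# Counts from the school-travel frequency table.
counts = {"Car": 24, "Bus": 18, "Walk": 12, "Bike": 6}

total = sum(counts.values())

# Relative frequency = count / total, expressed here as a percent.
percents = {method: 100 * n / total for method, n in counts.items()}

for method, n in counts.items():
    print(f"{method:<6}{n:>3}{percents[method]:>6.0f}%")
print(f"{'Total':<6}{total:>3}{sum(percents.values()):>6.0f}%")
```

Dividing by the total rather than typing the percents by hand guarantees the column sums to 100%.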
A bar chart puts categories on one axis and count (or percent) on the other, with bars separated by gaps because the categories aren't numbers on a scale.
[GRAPH: Bar chart of "Method of getting to school" for 60 students.
X-axis: categories Car, Bus, Walk, Bike (separated bars with gaps).
Y-axis: "Count" from 0 to 25.
Bar heights: Car = 24, Bus = 18, Walk = 12, Bike = 6.
Bars do not touch.]
A pie chart shows each category as a slice of a whole. It works only when categories are parts of a single total. Why a bar chart usually wins: people compare lengths far more accurately than they compare angles or areas. With slices that are close in size, you can't tell which is bigger. With bars, your eye reads it off the axis instantly. Use a pie chart only for a quick "share of the whole" feel; reach for a bar chart whenever you need to compare categories — which is almost always.
A two-way table (preview of Unit 3) crosses two categorical variables. Here's school-travel method by grade:
| | Underclassmen | Upperclassmen | Total |
|---|---|---|---|
| Drives/Bikes | 8 | 22 | 30 |
| Bus/Walk | 22 | 8 | 30 |
| Total | 30 | 30 | 60 |
We'll mine two-way tables for conditional probability later. For now, just know they organize two categorical variables at once.
Dotplots. Each value is a dot above its spot on a number line; repeats stack. Great for small data sets. You can read every original value right off the plot.
Stemplots (stem-and-leaf plots). Split each number into a stem (all but the last digit) and a leaf (the last digit). Here are the points scored by a basketball player in her last 15 games:
12, 14, 18, 21, 23, 23, 25, 28, 31, 33, 34, 37, 41, 44, 52
Stems are the tens digit; leaves are the ones digit:
1 | 2 4 8
2 | 1 3 3 5 8
3 | 1 3 4 7
4 | 1 4
5 | 2
Key: 2 | 1 means 21 points
Always include a key so a reader knows 5 | 2 means 52, not 5.2. A stemplot keeps every individual value and shows the shape — turn it on its side and it's basically a histogram.
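The stem-and-leaf split described above can be sketched in a few lines of Python. This is one possible approach, assuming two-digit whole numbers so that the stem is the tens digit; the function name `stemplot` is hypothetical.

```python
def stemplot(values):
    """Return stemplot rows for two-digit whole numbers (stem = tens digit)."""
    leaves_by_stem = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 21 -> stem 2, leaf 1
        leaves_by_stem.setdefault(stem, []).append(leaf)

    rows = []
    # Include every stem from smallest to largest so empty stems stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


points = [12, 14, 18, 21, 23, 23, 25, 28, 31, 33, 34, 37, 41, 44, 52]
for row in stemplot(points):
    print(row)
print("Key: 2 | 1 means 21 points")
```

Sorting first keeps the leaves in increasing order within each row, which is what makes the plot readable as a sideways histogram.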
When data are crowded into too few stems, use split stems: give each tens-digit two rows, one for leaves 0–4 and one for 5–9. To compare two groups, use a back-to-back stemplot — shared stems in the middle, one group's leaves to the left (read right to left) and the other's to the right:
Group A Group B
8 4 2 | 1 | 5 7
8 5 3 | 2 | 0 4 6
4 | 3 | 1 2 8
Key: 2 | 4 means 24
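To see how the shared stems line up, here is a rough Python sketch that rebuilds the back-to-back plot. The two lists are the values read off the plot above; the function name `back_to_back` is an illustrative assumption.

```python
def back_to_back(left, right):
    """Return rows of a back-to-back stemplot (stem = tens digit)."""
    stems = range(min(left + right) // 10, max(left + right) // 10 + 1)
    rows = []
    for stem in stems:
        # Left leaves are written largest-first so they read outward from the stem.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem, " ".join(map(str, right_leaves))))
    # Right-align the left side so every stem sits in the same column.
    pad = max(len(left_text) for left_text, _, _ in rows)
    return [f"{l:>{pad}} | {s} | {r}".rstrip() for l, s, r in rows]


group_a = [12, 14, 18, 23, 25, 28, 34]
group_b = [15, 17, 20, 24, 26, 31, 32, 38]
for row in back_to_back(group_a, group_b):
    print(row)
print("Key: 2 | 4 means 24")
```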
Histograms. For larger data sets, group values into equal-width bins (intervals) and draw a bar whose height is the count (or percent) in that bin. Unlike a bar chart, histogram bars touch, because the horizontal axis is a continuous number line.
Take the resting heart rates (beats per minute) of 30 athletes:
48, 51, 52, 54, 55, 55, 57, 58, 59, 60,
61, 62, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 76, 78, 81, 85, 92
Group into bins of width 10 starting at 40. Counting carefully:
| Bin (bpm) | Count |
|---|---|
| 40–49 | 1 |
| 50–59 | 8 |
| 60–69 | 11 |
| 70–79 | 7 |
| 80–89 | 2 |
| 90–99 | 1 |
| Total | 30 |
(Check: 1 + 8 + 11 + 7 + 2 + 1 = 30. The "40–49" bin holds 48; "50–59" holds 51–59, which is 51,52,54,55,55,57,58,59 = 8 values; "60–69" holds 60,61,62,62,63,64,65,66,67,68,69 = 11 values; "70–79" holds 70,71,72,73,74,76,78 = 7; "80–89" holds 81,85 = 2; "90–99" holds 92 = 1.)
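The tally above can also be reproduced programmatically. The following is a minimal Python sketch, assuming whole-number data and bins that start at a multiple of the bin width (true here, since the bins start at 40 with width 10); `bin_label` is a hypothetical helper.

```python
from collections import Counter

heart_rates = [
    48, 51, 52, 54, 55, 55, 57, 58, 59, 60,
    61, 62, 62, 63, 64, 65, 66, 67, 68, 69,
    70, 71, 72, 73, 74, 76, 78, 81, 85, 92,
]


def bin_label(value, width=10):
    """Label the equal-width bin that holds a whole-number value."""
    low = (value // width) * width  # lower edge of the bin
    return f"{low}-{low + width - 1}"


tally = Counter(bin_label(bpm) for bpm in heart_rates)

# Print the bins in numerical order of their lower edge.
for label in sorted(tally, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} {tally[label]:>3}")
print(f"{'Total':>6} {sum(tally.values()):>3}")
```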
[GRAPH: Histogram of resting heart rate (bpm) for 30 athletes.
X-axis: "Resting Heart Rate (bpm)" with bins 40-49, 50-59, 60-69, 70-79, 80-89, 90-99.
Y-axis: "Count" from 0 to 12.
Bar heights: 1, 8, 11, 7, 2, 1. Bars touch (no gaps).
Distribution rises to a peak at 60-69, then tails off to the right toward 90-99.]
A relative-frequency histogram uses the same bins but plots proportion (count ÷ total) instead of count, so heights sum to 1 (or 100%). The shape is identical; only the vertical scale changes. Use it when comparing groups of different sizes.
Choosing bin width. Too wide and you blur the shape into one or two fat bars; too narrow and the histogram looks spiky and random. Aim for about 5–10 bins for a moderate data set, and use "nice" round widths (5, 10, 25…). Different widths can make the same data look different — that's why bin choice is itself an analytic decision.
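To make the effect of bin width concrete, this Python sketch re-bins the 30 heart rates three ways and prints a rough text histogram for each. The widths 30 and 2 are illustrative choices for "too wide" and "too narrow," not values from the lesson, and `histogram_counts` is a hypothetical helper.

```python
heart_rates = [
    48, 51, 52, 54, 55, 55, 57, 58, 59, 60,
    61, 62, 62, 63, 64, 65, 66, 67, 68, 69,
    70, 71, 72, 73, 74, 76, 78, 81, 85, 92,
]


def histogram_counts(values, start, stop, width):
    """Count values in equal-width bins covering start up to (not including) stop."""
    n_bins = (stop - start) // width
    counts = [0] * n_bins
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


for width in (30, 10, 2):
    counts = histogram_counts(heart_rates, 40, 100, width)
    print(f"width {width}: {len(counts)} bins, tallest bar = {max(counts)}")
    for i, n in enumerate(counts):
        low = 40 + i * width
        print(f"  {low:>2}-{low + width - 1:<3}{'#' * n}")
```

With width 30 the data collapse into two fat bars; with width 2 no bar is taller than 3 and the shape is hard to read; width 10 shows the single peak and the right tail.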
For any quantitative distribution, describe four things — Shape, Center, Spread, Unusual features — always in context (name the variable and its units).
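Shape. Is the distribution roughly symmetric, skewed right (longer tail toward high values), or skewed left (longer tail toward low values)? Also count the peaks: one peak is unimodal, two is bimodal.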
Center. Roughly, the "typical" value — where the middle of the data sits. (We compute mean and median precisely in Lesson 3; here, eyeball it.)
Spread. How much the values vary — from the smallest to the largest, or how tightly they cluster around the center.
Unusual features. Gaps (empty intervals), clusters (separated groups), and possible outliers (values that stand far apart from the rest). Note them; you'll get formal outlier rules in Lesson 3.
For the heart-rate histogram: The distribution of resting heart rates for these 30 athletes is unimodal and slightly skewed right, centered around the mid-60s bpm, with values spread from 48 to 92 bpm. The value at 92 bpm sits a bit apart and may be a high outlier. Notice every clause names the variable and its units — that's "in context," and it's where AP points live.
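The eyeballed numbers in that description can be sanity-checked with a short Python sketch. It uses the median as a stand-in for "center," which is an assumption here, since the lesson defers precise measures of center to Lesson 3.

```python
from statistics import median

heart_rates = [
    48, 51, 52, 54, 55, 55, 57, 58, 59, 60,
    61, 62, 62, 63, 64, 65, 66, 67, 68, 69,
    70, 71, 72, 73, 74, 76, 78, 81, 85, 92,
]

center = median(heart_rates)  # middle of the sorted values
low, high = min(heart_rates), max(heart_rates)

# How far the largest value sits from its nearest neighbor.
second_largest, largest = sorted(heart_rates)[-2:]
gap_to_max = largest - second_largest

print(f"Center: about {center} bpm")
print(f"Spread: {low} to {high} bpm")
print(f"Largest value is {gap_to_max} bpm above the next largest")
```

A median of 64.5 bpm agrees with "mid-60s," and the 7 bpm jump from 85 to 92 is what makes 92 worth flagging as a possible outlier.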
The calculator can build a histogram for you in three moves.
1. Enter the data into a list.
TI-84: STAT → 1:Edit
Type the values into L1, pressing ENTER after each.
(To clear an old list first: arrow up onto "L1", press CLEAR, then ENTER.)
2. Turn on a STAT PLOT and pick the histogram icon.
TI-84: 2nd → Y= (STAT PLOT) → 1:Plot1 → ENTER
Set: Plot1 = On
Type = the histogram icon (3rd choice, top row)
Xlist = L1
Freq = 1
3. Set the WINDOW so the bins match what you want, then graph.
TI-84: WINDOW
Xmin = 40 (start of first bin)
Xmax = 100 (just past the last value)
Xscl = 10 (bin WIDTH — this controls the bins)
Ymin = 0
Ymax = 12 (a little above the tallest bar)
Then press GRAPH.
The key setting is Xscl — it is your bin width. Change Xscl from 10 to 5 and the same data redraws with twice as many, narrower bins. Press TRACE and arrow left/right to read each bin's interval and count straight off the screen. (Tip: ZOOM → 9:ZoomStat auto-fits the window if you don't want to set it by hand, though it picks its own bin width.)
A teacher records quiz scores (out of 50) for her class:
2 | 3 7
3 | 1 5 5 8 9
4 | 0 2 4 6 8 8 9
5 | 0
Key: 3 | 1 means 31 points
(a) How many students took the quiz? (b) What was the highest score? (c) Describe the shape.
Strategy. Count leaves for the total; the key decodes each value; read shape off the stem lengths.
Solution.
(a) Leaves: 2 + 5 + 7 + 1 = 15 students.
(b) The bottom row 5 | 0 decodes to 50 — a perfect score.
(c) The longest rows are at the high end (40s), with a short tail trailing down toward the 20s. The longer tail points toward low values, so the distribution is skewed left.
Interpretation. Most students scored in the 40s, with a few trailing off toward lower scores — a left-skewed performance, which makes sense for an easy quiz where most people did well.
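Parts (a) and (b) amount to decoding the stemplot back into raw scores. A minimal Python sketch, with the rows typed in by hand as a dictionary (an illustrative representation, not a standard format):

```python
# Stem -> leaves, copied from the quiz-score stemplot.
stemplot_rows = {2: [3, 7], 3: [1, 5, 5, 8, 9], 4: [0, 2, 4, 6, 8, 8, 9], 5: [0]}

# The key says 3 | 1 means 31, so each value is 10 * stem + leaf.
scores = [10 * stem + leaf for stem, leaves in stemplot_rows.items() for leaf in leaves]

n_students = len(scores)  # (a) one leaf per student
highest = max(scores)     # (b) largest decoded value
row_lengths = {stem: len(leaves) for stem, leaves in stemplot_rows.items()}

print(f"Students: {n_students}, highest score: {highest}")
print(f"Leaves per stem: {row_lengths}")
```

The row lengths show most leaves sitting on the 40s stem with fewer on the 20s and 30s stems, which is the left skew described in part (c).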
Twenty students reported how many text messages they sent in one hour:
0, 1, 1, 2, 2, 2, 3, 3, 4, 4,
5, 5, 6, 7, 8, 9, 11, 14, 18, 27
Use bins of width 5 starting at 0. Build the frequency table and describe the distribution in context.
Strategy. Define bins [0–4], [5–9], [10–14], [15–19], [20–24], [25–29]; tally; then describe SCSU.
Solution.
| Texts | Count |
|---|---|
| 0–4 | 10 |
| 5–9 | 6 |
| 10–14 | 2 |
| 15–19 | 1 |
| 20–24 | 0 |
| 25–29 | 1 |
| Total | 20 |
(Check: 0,1,1,2,2,2,3,3,4,4 = 10 in the first bin; 5,5,6,7,8,9 = 6; 11,14 = 2; 18 = 1; none in 20–24; 27 = 1. Total 10+6+2+1+0+1 = 20.)
[GRAPH: Histogram of "Texts sent in one hour" for 20 students.
X-axis bins: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29. Y-axis "Count" 0 to 10.
Heights: 10, 6, 2, 1, 0, 1. Tall on the left, long thin tail to the right,
with a gap at 20-24 and an isolated bar at 25-29.]
Interpretation. The distribution of texts sent is unimodal and strongly skewed right, with a center around 4–5 texts and most students sending fewer than 10. There's a gap between 18 and 27, and the value of 27 texts is a possible high outlier: one student texting far more than everyone else.
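Spotting the gap can be automated once the bins are tallied. The sketch below is one way to do it in Python, treating a "gap" as an empty bin with data on both sides; that working definition is an assumption for illustration.

```python
texts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 11, 14, 18, 27]
width = 5

# One counter per bin: 0-4, 5-9, ..., up to the bin holding the maximum.
counts = [0] * (max(texts) // width + 1)
for t in texts:
    counts[t // width] += 1

# A gap is an empty bin that has data somewhere on both sides of it.
gaps = [
    (i * width, i * width + width - 1)
    for i, n in enumerate(counts)
    if n == 0 and any(counts[:i]) and any(counts[i + 1:])
]

print(f"Counts per bin: {counts}")
print(f"Empty bins between data: {gaps}")
```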
Two classes took the same 40-point test. Display them back-to-back and compare.
Class A: 22, 25, 28, 31, 33, 36, 38, 38
Class B: 18, 21, 24, 27, 29, 33, 35, 39
Strategy. Use shared tens-digit stems; Class A leaves on the left (read right to left), Class B on the right.
Solution.
Class A Class B
| 1 | 8
8 5 2 | 2 | 1 4 7 9
8 8 6 3 1 | 3 | 3 5 9
Key: 2 | 5 means 25 points
Interpretation. Both distributions are roughly symmetric and unimodal. Class A is centered a bit higher (in the low 30s) than Class B (in the upper 20s). Class B is more spread out (18 up to 39, a range of 21) than Class A (22 to 38, a range of 16). In context: Class A scored both higher and more consistently on this test.
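The comparison of center and spread can be backed up numerically. This Python sketch uses the median for center and the range for spread; the helper name `summarize` is hypothetical.

```python
from statistics import median

class_a = [22, 25, 28, 31, 33, 36, 38, 38]
class_b = [18, 21, 24, 27, 29, 33, 35, 39]


def summarize(scores):
    """Return a simple center (median) and spread (range) for one group."""
    return {"median": median(scores), "range": max(scores) - min(scores)}


summary_a = summarize(class_a)
summary_b = summarize(class_b)

print(f"Class A: median {summary_a['median']}, range {summary_a['range']}")
print(f"Class B: median {summary_b['median']}, range {summary_b['range']}")
```

Class A's median of 32 points against Class B's 28, and ranges of 16 against 21, match the comparison read off the stemplot.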
A histogram of the prices of 50 used cars at a dealership is strongly skewed right, with most cars priced between \$8,000 and \$15,000 and a few luxury models above \$40,000. A student claims, "Since it's skewed right, most of the cars are expensive."
Is the student correct? Explain in context.
Strategy. Connect skew direction to where the bulk of the data sits versus where the tail sits.
Solution. The student is wrong. In a right-skewed distribution, the long tail points toward the high values, but the bulk of the data piles up on the low end. Here, most cars cluster between \$8,000 and \$15,000 — the inexpensive end. Only a handful of luxury cars stretch the tail out past \$40,000.
Interpretation. "Skewed right" describes the shape (a long upper tail), not where most values live. For these used cars, the typical car is inexpensive, and a few pricey outliers pull the tail rightward. This is exactly why median often beats mean as a center for skewed data — a preview of Lesson 3.
1. Naming skew by the peak instead of the tail. Students see the tall bars on the left and call it "skewed left." Wrong — skew is named for the longer tail, not the peak. Peak on the left + tail stretching right = skewed right. Repeat the mantra: the tail tells the tale.
2. Confusing a bar chart with a histogram. Bar charts display categorical data with gaps between bars; histograms display quantitative data with bars that touch along a number line. If the horizontal axis has category names, it's a bar chart; if it has numerical intervals, it's a histogram. Drawing a histogram for categorical data (or leaving gaps in a histogram) loses points.
3. Describing without context. "It's skewed right and centered at 6" earns little. Name the variable and units: "The number of texts sent is skewed right, centered around 4–5 texts." On the AP exam, context-free descriptions are routinely marked down.
4. Forgetting the key on a stemplot. Without a key, 5 | 2 is ambiguous — is it 52 or 5.2? Always write Key: 5 | 2 means 52.
5. Forgetting to mention unusual features. A complete description includes shape, center, spread, and any gaps, clusters, or possible outliers. Skipping a visible gap or outlier is an incomplete answer.
Use the stemplot of daily high temperatures (°F) for 11 days:
6 | 2 5 8
7 | 1 4 4 6 9
8 | 0 3 5
Key: 7 | 1 means 71°F
How many days had a high temperature of at least 80°F?
A. 2
B. 3
C. 4
D. 5
Consider this histogram of the number of pets owned by 40 families:
[GRAPH: Histogram. X-axis "Number of pets" bins 0-1, 2-3, 4-5, 6-7, 8-9.
Y-axis "Count". Bar heights: 22, 11, 4, 2, 1. Tall on left, tail to the right.]
The shape of this distribution is best described as:
A. Skewed left
B. Symmetric
C. Skewed right
D. Bimodal
Interpretation items
A fitness app records the number of steps walked by 25 users in one day. The histogram is unimodal and skewed right, with most users between 4,000 and 8,000 steps and one user near 22,000 steps.
Write a complete description of this distribution in context (shape, center, spread, and any unusual features).
A teacher displays final exam scores for two sections, A and B, in a back-to-back stemplot. Section A's scores are tightly clustered in the 80s; Section B's scores spread from the 60s to the 90s and are roughly symmetric. Compare the two distributions in context (mention center and spread).
Twenty data points representing the ages (years) of cars in a parking lot are:
1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10, 12, 14, 17, 20
Using bins of width 5 starting at 0, make a frequency table, state the number of bins you used, and describe the shape of the distribution in context.
1. C — Bar chart. Favorite platform is categorical, so it needs a bar chart (or pie chart). A (histogram), B (stemplot), and D (dotplot) all require quantitative data and can't display categories.
2. B — Skewed left. Skew is named for the longer tail. The tail stretches toward low values, so it is skewed left. A reverses the tail rule (the classic trap). C and D don't match a distribution with a clear tail.
3. B — Bar chart bars have gaps; histogram bars touch. Bar charts show categorical data (separated bars); histogram bars touch because the axis is a continuous number line. A, C, and D are not real defining differences.
4. B — 3 days. The "8" stem holds leaves 0, 3, 5 → temperatures 80, 83, 85, i.e. 3 days at or above 80°F. A miscounts; C and D over-count.
5. D — 85°F. The bottom row 8 | 0 3 5 decodes to 80, 83, 85; the highest is 85. A and C are other values in the plot; B is the lowest of the 80s.
6. B — People judge bar lengths more accurately than pie slice angles. This is the core reason bar charts usually win. A is false (pie charts can show percents), C and D are false claims.
7. C — Most households cluster at lower incomes, with a few high incomes stretching the tail. Right-skew = bulk on the low end, long tail toward high values. A confuses tail with bulk; B reverses the tail direction; D contradicts "skewed."
8. B — The proportion (or percent) in each bin instead of the count. Only the vertical scale changes; the shape is identical. A describes categories (not a histogram axis); C and D are not what the height represents.
9. C — Skewed right. Counts: 22, 11, 4, 2, 1 — tall on the left with a tail trailing toward higher pet counts. The longer tail points right → skewed right. A reverses the rule; B and D don't fit the shape.
10. C — 7 families. "4 or more pets" covers bins 4–5, 6–7, and 8–9: 4 + 2 + 1 = 7. (Check: 22 + 11 = 33 families own 0–3 pets, so 40 − 33 = 7 own 4 or more.) A counts only the 4–5 bin; B is a miscount; D (17) matches no combination of bins and is a pure distractor.
11. C — Number of minutes spent on homework. This is quantitative, so it gets a histogram. A, B, and D are all categorical (bar chart).
12. B — Compare the distributions of a quantitative variable for two groups. That's exactly the job of a back-to-back stemplot. A and C describe categorical tools; D is beyond what a stemplot does.
13. Full-credit description: The distribution of daily steps for these 25 users is unimodal and skewed right. The center is around 6,000 steps, with most users between 4,000 and 8,000 steps and values reaching up to about 22,000 steps. The user near 22,000 steps is a possible high outlier, sitting well apart from the rest. (Points for: shape with correct skew direction, center in context, spread in context, and identifying the unusual high value, all naming the variable and units.)
14. Full-credit comparison: Section A's final exam scores are tightly clustered in the 80s, so they have a smaller spread than Section B's, which range from the 60s through the 90s. Section A's center (in the 80s) is somewhat higher than Section B's, whose roughly symmetric scores center near the middle of that range (around 80). (Points for: comparing center and spread using explicit comparative language, such as "higher" and "more spread out," in context. A description of only one section earns partial credit.)
15. Bins of width 5 starting at 0 → intervals 0–4, 5–9, 10–14, 15–19, 20–24:
| Age (years) | Count |
|---|---|
| 0–4 | 9 |
| 5–9 | 6 |
| 10–14 | 3 |
| 15–19 | 1 |
| 20–24 | 1 |
| Total | 20 |
(Check: 1,1,2,2,2,3,3,4,4 = 9; 5,5,6,7,8,9 = 6; 10,12,14 = 3; 17 = 1; 20 = 1. Total 9+6+3+1+1 = 20. The value 20 falls in the 20–24 bin, giving 5 bins used; if a student stops at 15–19 and places 20 there they'd report 4 bins — accept a clearly stated bin scheme as long as 20 is handled consistently.) Description: The distribution of car ages is unimodal and skewed right, centered around 4–5 years, with most cars under 10 years old and a tail of older cars (a 17-year-old and a 20-year-old) stretching toward higher ages. The 20-year-old car is a possible high outlier.
---
StatsIQ · Lesson 2 of 30 · Unit 1 — Exploring One-Variable Data & Collecting Data. For exam-preparation purposes only; not affiliated with or endorsed by the College Board. AP is a registered trademark of the College Board. Content pending statistical-accuracy review (Isaac).