You flip a fair coin 3 times and count the number of heads. Before you flip, you can't say exactly how many heads you'll get — it could be 0, 1, 2, or 3. But you can describe how the chances are spread across those values: getting 1 or 2 heads is far more likely than getting 0 or 3.
That count of heads is a random variable: a numerical outcome of a random process. We don't know the value in advance, but we know the probabilities.
Here's the question that drives this whole lesson: if you repeated those 3 flips thousands of times, what would the average number of heads come out to? And how much would individual results typically bounce around that average?
Those two ideas — the long-run average and the typical spread — are the expected value and the standard deviation of a random variable. By the end of this lesson you'll compute both by hand, and read them straight off your calculator. This is the engine behind binomial and normal distributions in the lessons ahead, so let's get it right.
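To make the opening example concrete, here is a minimal Python sketch. Python is our own choice, since the lesson itself uses only hand work and a TI-84. It lists all 2³ = 8 equally likely head/tail sequences for three fair flips, tallies X, the number of heads, and reports P(x) as an exact fraction. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely head/tail sequences, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# X = number of heads in a sequence; count how many sequences give each x.
counts = {}
for seq in outcomes:
    x = seq.count("H")
    counts[x] = counts.get(x, 0) + 1

# Outcomes are equally likely, so P(x) = (sequences with x heads) / 8.
dist = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
for x, p in dist.items():
    print(f"P(X = {x}) = {p}")
```

This prints 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3. So P(1 or 2 heads) = 6/8 = 0.75 and P(0 or 3 heads) = 0.25, which is why the middle values are much more likely.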
A random variable is a variable whose value is a numerical outcome of a random phenomenon. We write random variables with capital letters (X, Y) and their specific values with lowercase letters (x).
This lesson is about discrete random variables. Lesson 13 handles continuous ones.
A probability distribution of a discrete random variable lists every possible value and its probability. We usually show it as a table:
| x | x₁ | x₂ | x₃ | ... |
|--------|------|------|------|-----|
| P(x) | p₁ | p₂ | p₃ | ... |
For this to be a valid probability distribution, two rules must hold:
- 0 ≤ P(x) ≤ 1 for each value.
- Σ P(x) = 1.

If either rule fails, it is not a legitimate distribution. Always check the sum first; it's the most common error.
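The two rules translate directly into a check. This is a sketch only. The function name `is_valid_distribution` is hypothetical, and the small tolerance `tol` is our own assumption. It absorbs floating-point round-off (for example, `0.1 + 0.2 == 0.3` is `False` in Python), which hand calculations never need.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Check both rules for a discrete probability distribution.

    probs: the probabilities P(x), one per value of x.
    tol:   assumed tolerance for floating-point round-off in the sum.
    """
    # Rule 1: each probability is between 0 and 1.
    each_in_range = all(0 <= p <= 1 for p in probs)
    # Rule 2: the probabilities add up to 1 (within round-off).
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

print(is_valid_distribution([0.10, 0.40, 0.30, 0.20]))  # True  (sum 1.00)
print(is_valid_distribution([0.25, 0.40, 0.20, 0.10]))  # False (sum 0.95)
```

The tolerance applies only to the sum. A single probability outside [0, 1] always fails, even if the total happens to be 1.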
The expected value, also called the mean of the random variable, is the long-run average value of X over many, many repetitions. The symbol is μ_X (or E(X)), and the formula is:
μ_X = E(X) = Σ x·P(x)
You multiply each value by its probability and add up all those products. This is a weighted average: values with higher probability pull the mean toward them.
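In code, the weighted average is a single sum of products. `expected_value` is a hypothetical helper name. The data are the pizza-shop probabilities used in this lesson's worked example.

```python
def expected_value(xs, probs):
    """mu_X = E(X) = sum of x * P(x), a probability-weighted average."""
    return sum(x * p for x, p in zip(xs, probs))

mu = expected_value([0, 1, 2, 3], [0.10, 0.40, 0.30, 0.20])
print(round(mu, 4))   # 1.6
```

The rounding only tidies the display. Decimal probabilities such as 0.10 are stored in binary, so raw sums can be off in the last digit.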
Long-run interpretation (this is the part the AP exam cares about): μ_X is not the value you expect on any single trial — in fact E(X) need not even be a possible value of X. It is the value the average outcome approaches if you repeated the random process a very large number of times. "Expected value" means "expected on average over the long run."
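To see the long-run interpretation in action, the sketch below simulates many customers from the pizza-shop distribution with Python's `random.Random.choices` and averages their orders. The seed and sample sizes are arbitrary choices for illustration. As the number of simulated customers grows, the average should settle near μ_X = 1.6, while short runs can wander noticeably.

```python
import random

VALUES = [0, 1, 2, 3]              # pizzas per order
PROBS = [0.10, 0.40, 0.30, 0.20]   # P(x) from the pizza-shop table

def long_run_average(n_customers, seed=2024):
    """Simulate n independent orders and return their average."""
    rng = random.Random(seed)      # fixed seed: the same run every time
    orders = rng.choices(VALUES, weights=PROBS, k=n_customers)
    return sum(orders) / n_customers

for n in (10, 1_000, 100_000):
    print(n, round(long_run_average(n), 3))
```

No single simulated order is ever 1.6. Only the running average gets close to it.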
The variance measures how far the values of X typically fall from the mean, weighted by probability. The symbol is σ²_X:
σ²_X = Σ (x − μ_X)²·P(x)
For each value: subtract the mean, square the deviation, then weight it by the probability, and add everything up. The squaring is what keeps positive and negative deviations from canceling — never skip it.
The standard deviation is the square root of the variance, which brings us back to the original units:
σ_X = √(σ²_X)
It answers: "By how much does a typical outcome of X differ from the mean μ_X?"
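The variance and standard deviation formulas can be sketched the same way. The helper names are hypothetical. Note the order of operations: deviation, square, weight by P(x), add.

```python
import math

def variance(xs, probs):
    """sigma^2_X = sum of (x - mu)^2 * P(x)."""
    mu = sum(x * p for x, p in zip(xs, probs))      # weighted mean first
    # Square each deviation from the mean, weight it by P(x), then add.
    return sum((x - mu) ** 2 * p for x, p in zip(xs, probs))

def std_dev(xs, probs):
    """sigma_X = square root of the variance, back in the units of X."""
    return math.sqrt(variance(xs, probs))

xs, ps = [0, 1, 2, 3], [0.10, 0.40, 0.30, 0.20]   # pizza-shop distribution
print(round(variance(xs, ps), 4))   # 0.84
print(round(std_dev(xs, ps), 4))    # 0.9165
```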
Shortcut (handy for checking): Variance also equals
σ²_X = E(X²) − μ²_X = Σ x²·P(x) − μ²_X. We'll use this as a verification tool.
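A quick way to convince yourself that the shortcut agrees with the definition is to compute both side by side. This sketch uses the pizza-shop and quiz distributions from this lesson, and the function names are our own.

```python
def variance_by_definition(xs, probs):
    """sigma^2_X = sum of (x - mu)^2 * P(x)."""
    mu = sum(x * p for x, p in zip(xs, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(xs, probs))

def variance_by_shortcut(xs, probs):
    """sigma^2_X = E(X^2) - mu^2."""
    mu = sum(x * p for x, p in zip(xs, probs))
    e_x2 = sum(x ** 2 * p for x, p in zip(xs, probs))
    return e_x2 - mu ** 2

pizza = ([0, 1, 2, 3], [0.10, 0.40, 0.30, 0.20])
quiz = ([0, 1, 2, 3], [0.10, 0.20, 0.40, 0.30])
for xs, ps in (pizza, quiz):
    print(round(variance_by_definition(xs, ps), 4),
          round(variance_by_shortcut(xs, ps), 4))
# 0.84 0.84
# 0.89 0.89
```

The shortcut is handy by hand, but it subtracts two numbers that can be close together. With large values and floating-point arithmetic, the definition form is the numerically safer of the two.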
A pizza shop tracks how many pizzas X a single online customer orders. From thousands of past orders:
| x (pizzas) | 0 | 1 | 2 | 3 |
|------------|------|------|------|------|
| P(x) | 0.10 | 0.40 | 0.30 | 0.20 |
Step 0 — Valid? Sum = 0.10 + 0.40 + 0.30 + 0.20 = 1.00, and each probability is between 0 and 1. ✓ Valid distribution.
Step 1 — Expected value μ_X = Σ x·P(x):
```
0 × 0.10 = 0.00
1 × 0.40 = 0.40
2 × 0.30 = 0.60
3 × 0.20 = 0.60
-----------------
μ_X = 1.60 pizzas
```
So μ_X = 1.60 pizzas. Interpretation: over many, many customers, orders average about 1.6 pizzas each — even though no single customer can order 1.6 pizzas.
Step 2 — Variance σ²_X = Σ (x − μ_X)²·P(x), with μ_X = 1.60:
```
x | (x − 1.6) | (x − 1.6)² | × P(x) | product
--|-----------|------------|--------|--------
0 |   −1.6    |    2.56    | × 0.10 | 0.256
1 |   −0.6    |    0.36    | × 0.40 | 0.144
2 |    0.4    |    0.16    | × 0.30 | 0.048
3 |    1.4    |    1.96    | × 0.20 | 0.392
                                     --------
σ²_X = sum = 0.840
```
So σ²_X = 0.840.
Check with the shortcut: E(X²) = 0²(0.10) + 1²(0.40) + 2²(0.30) + 3²(0.20) = 0 + 0.40 + 1.20 + 1.80 = 3.40.
Then σ²_X = 3.40 − (1.60)² = 3.40 − 2.56 = 0.840. ✓ Matches.
Step 3 — Standard deviation: σ_X = √0.840 = 0.9165… ≈ 0.917 pizzas.
Final answer: μ_X = 1.60 pizzas and σ_X ≈ 0.917 pizzas. Interpretation of σ: the number of pizzas a typical customer orders differs from the mean of 1.6 by about 0.92 pizzas.
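Steps 1 to 3 can be redone in exact arithmetic with Python's `fractions.Fraction`, which avoids any decimal round-off. This is a sketch for checking the hand work, not a required method. The square root in Step 3 is necessarily a float.

```python
import math
from fractions import Fraction

xs = [0, 1, 2, 3]
# 0.10, 0.40, 0.30, 0.20 written as exact fractions.
ps = [Fraction(10, 100), Fraction(40, 100), Fraction(30, 100), Fraction(20, 100)]

mu = sum(x * p for x, p in zip(xs, ps))                  # Step 1
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))     # Step 2
e_x2 = sum(x ** 2 * p for x, p in zip(xs, ps))           # shortcut check
sigma = math.sqrt(var)                                   # Step 3

print(mu, var, e_x2 - mu ** 2)   # 8/5 21/25 21/25
print(round(sigma, 3))           # 0.917
```

Here 8/5 = 1.60 and 21/25 = 0.840, matching the hand calculation exactly.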
This is a slick way to get μ_X and σ_X without grinding out the table:
TI-84:
```
STAT → EDIT
  L1: 0 1 2 3           (the values of x)
  L2: .10 .40 .30 .20   (the probabilities)
STAT → CALC → 1-Var Stats
  List: L1   FreqList: L2 → Calculate
Output:
  x̄  = 1.6          ← this is μ_X (E(X))
  σx = .9165151…    ← this is σ_X
```
The calculator treats the probabilities as "weights," so x̄ is exactly Σ x·P(x) and σx is exactly √(Σ (x − μ)²·P(x)). Important: read σx (population SD), not Sx (sample SD). On the AP exam this trick gets you μ and σ in seconds — but you must still interpret the numbers in context, because that's where the points live.
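Here is what "probabilities as weights" means, written as the textbook weighted-statistics formulas. This is a sketch of the math and is not a claim about how TI firmware is implemented. The function name is hypothetical.

```python
import math

def weighted_one_var_stats(values, freqs):
    """Weighted mean and population SD for a value list plus a weight list."""
    n = sum(freqs)                        # probabilities as weights: n = 1
    xbar = sum(f * x for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - xbar) ** 2 for x, f in zip(values, freqs))
    sigma_x = math.sqrt(ss / n)           # population SD: divide by n
    return xbar, sigma_x

xbar, sigma_x = weighted_one_var_stats([0, 1, 2, 3], [0.10, 0.40, 0.30, 0.20])
print(round(xbar, 4), round(sigma_x, 7))   # 1.6 0.9165151
```

A sample SD divides the same sum of squares by n − 1. When the weights are probabilities, n = 1, so that would mean dividing by 0. This is one way to remember that Sx does not describe a probability distribution and σx does.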
Linear transformation (optional): If you scale and shift a single random variable, say Y = aX + b, then μ_Y = a·μ_X + b and σ_Y = |a|·σ_X. Adding b shifts the mean but does not change the spread. For the pizza example, if each pizza costs $12 plus a flat $3 delivery fee, revenue Y = 12X + 3 has mean μ_Y = 12(1.6) + 3 = $22.20 and SD σ_Y = 12(0.917) ≈ $11.00. Note: this covers only ONE random variable; combining two different random variables is outside this course.
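One way to check the rules is to transform every value of X directly, keep the same probabilities, and recompute the mean and SD from scratch. This sketch assumes the $12-per-pizza, $3-fee revenue example above, and `mean_sd` is a hypothetical helper.

```python
import math

def mean_sd(values, probs):
    """Mean and SD of a discrete random variable."""
    mu = sum(v * p for v, p in zip(values, probs))
    sd = math.sqrt(sum((v - mu) ** 2 * p for v, p in zip(values, probs)))
    return mu, sd

xs, ps = [0, 1, 2, 3], [0.10, 0.40, 0.30, 0.20]
a, b = 12, 3                      # $12 per pizza, $3 flat delivery fee
ys = [a * x + b for x in xs]      # revenue for each possible order size

mu_x, sd_x = mean_sd(xs, ps)
mu_y, sd_y = mean_sd(ys, ps)      # same probabilities, transformed values

print(round(mu_y, 2), round(a * mu_x + b, 2))    # 22.2 22.2
print(round(sd_y, 2), round(abs(a) * sd_x, 2))   # 11.0 11.0
```

Both routes agree, and the +3 never shows up in the SD.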
Problem: A student claims the following describes the number of pets X owned by classmates. Is it a valid probability distribution?
| x | 0 | 1 | 2 | 3 |
|------|------|------|------|------|
| P(x) | 0.25 | 0.40 | 0.20 | 0.10 |
Strategy: Check both rules — each probability in [0, 1], and the sum equals 1.
Solution: Each value is between 0 and 1. ✓ Sum = 0.25 + 0.40 + 0.20 + 0.10 = 0.95 ≠ 1. ✗
Interpretation: This is not valid — the probabilities sum to 0.95, not 1. Something is missing (the leftover 0.05 of probability is unaccounted for).
Problem: A quiz has scores X (number correct out of 3) distributed as below. Find E(X) and interpret it.
| x | 0 | 1 | 2 | 3 |
|------|------|------|------|------|
| P(x) | 0.10 | 0.20 | 0.40 | 0.30 |
Strategy: Confirm it's valid, then apply E(X) = Σ x·P(x).
Solution: Sum = 0.10 + 0.20 + 0.40 + 0.30 = 1.00 ✓.
```
0 × 0.10 = 0.00
1 × 0.20 = 0.20
2 × 0.40 = 0.80
3 × 0.30 = 0.90
-----------------
E(X) = 1.90 questions correct
```
Interpretation: If many students took this quiz, they would average 1.9 correct answers in the long run. No individual gets 1.9, but that's the long-run mean.
Problem: Using the quiz distribution from Example 2 (μ_X = 1.90), find σ_X and interpret it.
Strategy: Use σ²_X = Σ (x − μ_X)²·P(x), then take the square root. Verify with the shortcut.
Solution:
```
x | (x − 1.9)² | × P(x) | product
--|------------|--------|--------
0 |    3.61    | × 0.10 | 0.361
1 |    0.81    | × 0.20 | 0.162
2 |    0.01    | × 0.40 | 0.004
3 |    1.21    | × 0.30 | 0.363
                          --------
σ²_X = 0.890
```
σ_X = √0.890 = 0.9434… ≈ 0.943.
Check: E(X²) = 0 + 1(0.20) + 4(0.40) + 9(0.30) = 0.20 + 1.60 + 2.70 = 4.50; σ²_X = 4.50 − 1.9² = 4.50 − 3.61 = 0.890 ✓.
Interpretation: A typical student's score differs from the mean of 1.9 by about 0.94 questions.
Problem: A carnival game costs $2 to play. You draw one ball from a bag; with probability 0.20 you win a $5 prize, otherwise you win nothing. Let X be your net gain (prize minus cost). Find E(X) and explain whether the game is fair.
Strategy: Build the distribution of net gain, then compute E(X). A game is "fair" when E(X) = 0 (you break even in the long run).
Solution: If you win, net gain = $5 − $2 = +$3 (prob 0.20). If you lose, net gain = $0 − $2 = −$2 (prob 0.80).
| x ($) | +3 | −2 |
|-------|------|------|
| P(x) | 0.20 | 0.80 |
E(X) = (3)(0.20) + (−2)(0.80) = 0.60 − 1.60 = −1.00
Interpretation: E(X) = −$1.00. The game is not fair: on average a player loses $1 per play over the long run. That $1 average loss per play is exactly how the carnival makes money. To make it fair, the prize or the price would need adjusting so that E(X) = 0.
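The net-gain calculation generalizes to any single-prize game. This sketch uses a hypothetical helper `expected_net_gain`. It relies on the algebra (prize − cost)·p + (−cost)·(1 − p) = p·prize − cost, so E(X) = 0 exactly when prize = cost / p. For this carnival game, that gives two fair alternatives: a $10 prize at a $2 price, or a $1 price for the $5 prize.

```python
def expected_net_gain(prize, cost, p_win):
    """E(X) for one play: net gain is (prize - cost) with prob p_win,
    and -cost with prob 1 - p_win."""
    return (prize - cost) * p_win + (-cost) * (1 - p_win)

print(round(expected_net_gain(prize=5, cost=2, p_win=0.20), 2))   # -1.0

# E(X) = p_win * prize - cost, so E(X) = 0 when prize = cost / p_win.
fair_prize = 2 / 0.20   # a $10 prize would make the $2 game fair
fair_cost = 5 * 0.20    # a $1 price would make the $5-prize game fair
print(round(fair_prize, 2), round(fair_cost, 2))   # 10.0 1.0
```

The same helper covers the raffle in Practice Problem 10: `expected_net_gain(300, 1, 1/500)` is −0.40.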
1. Forgetting to square the deviations. Some students compute Σ (x − μ)·P(x) for variance. But Σ (x − μ)·P(x) always equals 0 (positive and negative deviations cancel). You must square: σ²_X = Σ (x − μ_X)²·P(x).
2. Forgetting to weight by probability. Computing the plain average of the x-values — or the plain SD of 0, 1, 2, 3 — ignores that some outcomes are far more likely than others. Every term in both μ_X and σ²_X must be multiplied by P(x). The mean is a probability-weighted average, not a simple average of the listed values.
3. Taking Sx instead of σx on the calculator. After 1-Var Stats L1,L2, the SD of the random variable is σx, the population standard deviation. Sx (sample SD) is the wrong one and will cost you. Read σx.
4. Probabilities that don't sum to 1. If Σ P(x) ≠ 1, the distribution is invalid — fix or flag it before computing anything. A missing or extra value is a classic AP trap.
5. Misreading expected value as a guaranteed outcome. E(X) is a long-run average, not a prediction for one trial. Saying "you will get 1.6 pizzas" is wrong; "over many orders the average is 1.6 pizzas" is right.
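Mistakes 1 and 2 are easy to demonstrate numerically with the pizza-shop distribution. This sketch uses Python's standard `statistics` module for the "plain" (unweighted) versions, which treat the four listed values as equally likely data points. That is exactly the error to avoid.

```python
import statistics

xs, ps = [0, 1, 2, 3], [0.10, 0.40, 0.30, 0.20]   # pizza-shop distribution
mu = sum(x * p for x, p in zip(xs, ps))           # correct mean, 1.6

# Mistake 1: unsquared deviations cancel, so the sum is 0
# (up to floating-point round-off) for any distribution.
unsquared = sum((x - mu) * p for x, p in zip(xs, ps))

# Mistake 2: ignoring P(x) and averaging the listed values.
plain_mean = statistics.mean(xs)     # 1.5 instead of 1.6
plain_sd = statistics.pstdev(xs)     # about 1.118 instead of about 0.917

print(abs(unsquared) < 1e-12, plain_mean, round(plain_sd, 3))   # True 1.5 1.118
```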
1 (MC). Which of the following is a continuous random variable?
A) The number of text messages a student sends in a day
B) The exact time (in minutes) a student spends on homework tonight
C) The number of red cards drawn from a deck
D) The number of siblings a student has
2 (MC). For a valid discrete probability distribution, which must be true?
A) Every probability equals 1/n where n is the number of values
B) The probabilities sum to 1 and each is between 0 and 1
C) The mean must be one of the listed values of x
D) The standard deviation must be less than 1
3 (MC). A random variable X has P(1) = 0.3, P(2) = 0.5, P(4) = 0.2. What is E(X)?
A) 2.0
B) 2.1
C) 2.3
D) 2.5
4 (MC). For the distribution in Problem 3, E(X²) equals which value?
A) 4.41
B) 5.10
C) 5.50
D) 6.30
5 (MC). Using Problems 3 and 4, the variance σ²_X equals:
A) 0.69
B) 0.89
C) 1.09
D) 1.29
6 (MC). The mean of a random variable X is 4 and its standard deviation is 2. If Y = 3X + 1, what are the mean and SD of Y?
A) Mean 13, SD 6
B) Mean 13, SD 7
C) Mean 12, SD 6
D) Mean 13, SD 2
7 (MC). A distribution lists P(0) = 0.5, P(1) = 0.3, P(2) = 0.15, P(3) = 0.10. This distribution is:
A) Valid, with mean 0.85
B) Valid, with mean 0.80
C) Invalid, because probabilities sum to 1.05
D) Invalid, because a probability exceeds 1
8 (MC). "The expected value of a random variable is −$1.50." The best interpretation is:
A) You will lose exactly $1.50 every time
B) On average, over many trials, you lose about $1.50 per trial
C) The most likely outcome is a loss of $1.50
D) The variable can never be positive
9 (Short answer, in context). The number of muffins X bought per customer at a coffee shop has the distribution below.
| x | 0 | 1 | 2 | 3 |
|------|------|------|------|------|
| P(x) | 0.40 | 0.35 | 0.20 | 0.05 |
(a) Verify it's a valid distribution. (b) Find μ_X and interpret it. (c) Find σ_X.
10 (Short answer, in context). A $1 raffle ticket has a 1-in-500 chance of winning a $300 prize (and otherwise wins nothing). Let X be your net gain. (a) Write the distribution of X. (b) Find E(X). (c) Is buying a ticket a fair bet? Explain in context.
11 (Short answer). Explain in one sentence why Σ (x − μ_X)·P(x) is not a useful measure of spread, and what we use instead.
12 (Short answer). A random variable has μ_X = 10 and σ_X = 3. A teacher rescales scores with W = 2X + 5. Find μ_W and σ_W.
A regional transportation survey records the number of vehicles X owned by a randomly selected household. Based on the survey, the probability distribution of X is partially given below. The probability that a household owns 2 vehicles was not printed.
| x (vehicles) | 0 | 1 | 2 | 3 | 4 |
|--------------|------|------|------|------|------|
| P(x) | 0.10 | 0.35 | ? | 0.12 | 0.03 |
(a) Determine the missing probability P(2), and explain how you know your value makes this a valid probability distribution. (2 points)
(b) Compute the expected number of vehicles per household, μ_X = E(X). Show your work, and interpret this value in context. (4 points)
(c) Compute the standard deviation σ_X of the number of vehicles per household. Show your work, and interpret this value in context. (4 points)
(a) The probabilities of a valid distribution must sum to 1:
P(2) = 1 − (0.10 + 0.35 + 0.12 + 0.03) = 1 − 0.60 = 0.40.
With P(2) = 0.40, every probability is between 0 and 1, and the total is 0.10 + 0.35 + 0.40 + 0.12 + 0.03 = 1.00, so this is a valid discrete probability distribution.
Completed table:
| x | 0 | 1 | 2 | 3 | 4 |
|------|------|------|------|------|------|
| P(x) | 0.10 | 0.35 | 0.40 | 0.12 | 0.03 |
(b) μ_X = Σ x·P(x):
```
0 × 0.10 = 0.00
1 × 0.35 = 0.35
2 × 0.40 = 0.80
3 × 0.12 = 0.36
4 × 0.03 = 0.12
-----------------
μ_X = 1.63 vehicles
```
Interpretation: Over many randomly selected households, the average number of vehicles owned is about 1.63 vehicles per household in the long run. (No single household owns 1.63 vehicles; this is a long-run average.)
(c) σ²_X = Σ (x − μ_X)²·P(x) with μ_X = 1.63:
```
x | (x − 1.63)² | × P(x) | product
--|-------------|--------|----------
0 |   2.6569    | × 0.10 | 0.265690
1 |   0.3969    | × 0.35 | 0.138915
2 |   0.1369    | × 0.40 | 0.054760
3 |   1.8769    | × 0.12 | 0.225228
4 |   5.6169    | × 0.03 | 0.168507
                           ----------
σ²_X = 0.853100
```
σ_X = √0.8531 = 0.9236… ≈ 0.924 vehicles.
Calculator confirmation: L1 = {0,1,2,3,4}, L2 = {.10,.35,.40,.12,.03}, 1-Var Stats L1,L2 gives x̄ = 1.63 and σx ≈ 0.9236.
Interpretation: The number of vehicles owned by a typical household differs from the mean of 1.63 by about 0.92 vehicles.
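All three parts can be checked in a few lines. This is a minimal sketch of the same calculations, not part of the scored response. The dictionary `known` holds the printed probabilities, and the names are our own.

```python
import math

xs = [0, 1, 2, 3, 4]
known = {0: 0.10, 1: 0.35, 3: 0.12, 4: 0.03}   # P(2) was not printed

# (a) A valid distribution sums to 1, so P(2) is the leftover probability.
p2 = 1 - sum(known.values())
ps = [known[x] if x in known else p2 for x in xs]

# (b) Expected value: probability-weighted average of x.
mu = sum(x * p for x, p in zip(xs, ps))

# (c) Variance from squared, weighted deviations; SD is its square root.
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
sigma = math.sqrt(var)

print(round(p2, 2), round(mu, 2), round(var, 4), round(sigma, 4))
# 0.4 1.63 0.8531 0.9236
```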
Part (a): 2 points
- P(2) = 0.40 (from setting the sum equal to 1).

Part (b): 4 points
- μ_X = Σ x·P(x) (formula or correct products shown).
- μ_X = 1.63.

Part (c): 4 points
- σ²_X = Σ (x − μ_X)²·P(x) using the mean from (b) (deviations squared and weighted).
- σ²_X ≈ 0.853.
- σ_X ≈ 0.92 (accept 0.92–0.924).

Where students lose points:
- An incorrect P(2) → loses both (a) points and corrupts (b) and (c).
- Not squaring the deviations, or not weighting them by P(x), in part (c) → loses setup and value points.
- Using Sx (sample SD) from the calculator instead of σx → wrong value, loses the (c) value point.

1. B. Time is measured on a continuous scale (any value in an interval), so it is continuous. A (counts of texts), C (counts of cards), and D (counts of siblings) are all countable → discrete.
2. B. The two defining rules: 0 ≤ P(x) ≤ 1 and Σ P(x) = 1. A describes a uniform distribution only (not required). C is false — E(X) need not be a listed value (e.g., 1.6 pizzas). D is irrelevant to validity.
3. B. E(X) = 1(0.3) + 2(0.5) + 4(0.2) = 0.3 + 1.0 + 0.8 = 2.1.
- Distractor C (2.3) is the unweighted average (1 + 2 + 4)/3 ≈ 2.33, which ignores the probabilities. A (2.0) and D (2.5) come from arithmetic slips or rounding.
4. C. E(X²) = 1²(0.3) + 2²(0.5) + 4²(0.2) = 0.3 + 2.0 + 3.2 = 5.5.
- Distractor A (4.41) is μ² (mixing up E(X²) with [E(X)]²). B and D are arithmetic slips on the squared terms.
5. C. σ²_X = E(X²) − μ² = 5.5 − (2.1)² = 5.5 − 4.41 = 1.09.
- Distractor A (0.69) comes from plugging the wrong E(X²) = 5.10 (choice B in Problem 4) into the shortcut: 5.10 − 4.41 = 0.69. B (0.89) and D (1.29) come from other slips in E(X²) or μ². (Direct check: (1−2.1)²(0.3) + (2−2.1)²(0.5) + (4−2.1)²(0.2) = 1.21(0.3) + 0.01(0.5) + 3.61(0.2) = 0.363 + 0.005 + 0.722 = 1.09 ✓.)
6. A. For Y = 3X + 1: μ_Y = 3(4) + 1 = 13; σ_Y = |3|·σ_X = 3(2) = 6. Mean 13, SD 6.
- B (SD 7): wrongly adds the +1 to the SD. C (mean 12): forgets the +1 in the mean. D (SD 2): forgets to scale the SD by 3.
7. C. Sum = 0.5 + 0.3 + 0.15 + 0.10 = 1.05 ≠ 1 → invalid.
- A and B assume validity (wrong). D is false — no single probability exceeds 1; the problem is the sum.
8. B. Expected value is a long-run average per trial. A overstates it as certain; C confuses it with the mode/most likely value; D is unrelated.
9. (a) Sum = 0.40 + 0.35 + 0.20 + 0.05 = 1.00 ✓ and each P(x) ∈ [0,1] → valid.
(b) μ_X = 0(0.40) + 1(0.35) + 2(0.20) + 3(0.05) = 0 + 0.35 + 0.40 + 0.15 = 0.90 muffins. Interpretation: customers average about 0.9 muffins each over the long run.
(c) σ²_X = Σ (x − 0.9)²P(x):
```
(0−0.9)² = 0.81 × 0.40 = 0.324
(1−0.9)² = 0.01 × 0.35 = 0.0035
(2−0.9)² = 1.21 × 0.20 = 0.242
(3−0.9)² = 4.41 × 0.05 = 0.2205
                         --------
                σ²_X   = 0.790
```
σ_X = √0.790 = 0.8888… ≈ 0.889 muffins. (Check: E(X²) = 0 + 0.35 + 0.80 + 0.45 = 1.60; 1.60 − 0.81 = 0.79 ✓.)
10. (a) Net gain: win = $300 − $1 = +$299 with prob 1/500 = 0.002; lose = −$1 with prob 499/500 = 0.998.
| x ($) | +299 | −1 |
|-------|-------|-------|
| P(x) | 0.002 | 0.998 |
(b) E(X) = 299(0.002) + (−1)(0.998) = 0.598 − 0.998 = −0.40, i.e. −$0.40.
(c) Not fair (and not favorable): on average a buyer loses $0.40 per ticket over the long run. A fair bet would have E(X) = 0.
11. Because positive and negative deviations cancel, Σ (x − μ_X)·P(x) always equals 0 for any distribution, so it carries no information about spread. We square the deviations first — using σ²_X = Σ (x − μ_X)²·P(x) — to prevent the cancellation.
12. For W = 2X + 5: μ_W = 2(10) + 5 = 25; σ_W = |2|·σ_X = 2(3) = 6. (The +5 shifts the mean but does not affect the SD.)
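The arithmetic in Problems 3 to 5 can be verified exactly with Python's `fractions.Fraction`. This sketch checks the key, and the variable names are illustrative. Storing 0.3, 0.5, and 0.2 as fractions keeps decimal round-off out of the comparison.

```python
from fractions import Fraction

# Problems 3-5: P(1) = 0.3, P(2) = 0.5, P(4) = 0.2, as exact fractions.
dist = {1: Fraction(3, 10), 2: Fraction(5, 10), 4: Fraction(2, 10)}
assert sum(dist.values()) == 1                       # valid distribution

e_x = sum(x * p for x, p in dist.items())            # Problem 3: E(X)
e_x2 = sum(x ** 2 * p for x, p in dist.items())      # Problem 4: E(X^2)
var_shortcut = e_x2 - e_x ** 2                       # Problem 5 via shortcut
var_direct = sum((x - e_x) ** 2 * p for x, p in dist.items())

print(float(e_x), float(e_x2), float(var_shortcut), var_shortcut == var_direct)
# 2.1 5.5 1.09 True
```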
---
StatsIQ · Lesson 11 of 30 · Unit 2: Probability, Random Variables, and Probability Distributions · Phase 2
This lesson is independent study material aligned to the new (2026–27) AP Statistics Course and Exam Description. "AP" and "AP Statistics" are trademarks of the College Board, which was not involved in the production of, and does not endorse, this product.
Accuracy review: All expected values, variances, and standard deviations were independently recomputed by hand and cross-checked with the E(X²) − μ² shortcut and the TI-84 1-Var Stats L1,L2 method. Reviewed for statistical accuracy by Isaac (retired actuary).