Your school's activities office has a spreadsheet of 2,400 club-membership records: student name, grade, club, hours volunteered. The principal asks: "Which seniors volunteered more than 20 hours, sorted by hours?"
Doing this by hand means scanning 2,400 rows, copying qualifying ones, then ordering them — twenty error-prone minutes. A program (or spreadsheet feature) does it in milliseconds: filter grade = 12, filter hours > 20, sort by hours descending. Three operations, composable like LEGO bricks, each simple alone and powerful chained.
This lesson is about those bricks. The exam will hand you a small table and a described chain of operations; your job is to execute the chain exactly — no skipped steps, no assumptions.
Data-processing questions present data as a table: each row is a record (one student, one song, one purchase); each column is a field (name, grade, price). Programs process such tables with a small set of operations:
grade = 12, price < 10). Output: a smaller table, same columns, original order preserved among survivors.These operations are the "extracting information" machinery from Lesson 5 made concrete.
Table: clubRecords
| Name | Grade | Club | Hours |
|---------|-------|----------|-------|
| Aisha | 12 | Robotics | 34 |
| Ben | 11 | Chess | 22 |
| Carmen | 12 | Robotics | 18 |
| Diego | 12 | Art | 25 |
| Elena | 10 | Chess | 40 |
Filter Grade = 12: keep Aisha, Carmen, Diego. Filter that result by Hours > 20: keep Aisha (34) and Diego (25) — Carmen's 18 fails. Two rows survive.
Discipline points: check every row against the condition; respect strict vs. inclusive comparisons (> 20 excludes a row with exactly 20; ≥ 20 includes it — Lesson 2's boundary obsession again); multiple conditions joined by AND require all to hold, OR requires at least one.
Sort the filtered result by Hours descending: Aisha (34), then Diego (25). Sorting doesn't add or remove rows — if your sorted table has a different row count than before, you dropped something.
Ties: if two rows share the sort value, the question will either not care or specify a tiebreaker. Don't invent one.
Filter-then-sort vs. sort-then-filter: for keeping purposes the final set is the same; but questions asking "which record is FIRST after these operations" depend on the exact sequence — execute in the stated order, literally.
"How many juniors?" = filter + count. "Average hours of Robotics members?" = filter Club = Robotics → mean of Hours → (34 + 18) / 2 = 26. The classic error: computing the aggregate over the whole table instead of the filtered rows. Filter first, then aggregate what remains.
Two datasets can be combined when they share a common field. The activities table has student names and clubs; the main-office table has names and email addresses. Shared field: name. Combining unlocks questions neither table answers alone ("email every Robotics member").
CED-level claims: combining data sources is a named way to gain insight; the shared identifier is what makes the join possible; and combined datasets can reveal more about individuals than either source alone (a privacy concern that returns in Lesson 23 — innocuous datasets can combine into invasive ones).
The CED's point isn't that filtering is clever — it's that the size of real datasets makes programs the only viable tool. A human can filter 5 rows; only a program filters 5 billion. Bonus claims: programs apply the condition identically to every row (no fatigue errors), and the same program re-runs instantly when data updates. When a question asks why an organization processes data with software, the answer is scale + consistency + repeatability.
Problem: Using clubRecords above, how many rows does the filter Club = "Chess" keep?
Solution: Check all five rows: Ben ✓, Elena ✓, others ✗. 2 rows.
Interpretation: Yes, it's this mechanical. Full credit goes to those who check every row rather than stopping at the first match.
Problem: Starting from clubRecords: filter Hours ≥ 22, then sort by Name ascending. List the resulting Name column, in order.
Strategy: Execute in order. Filter first — carefully at the boundary — then alphabetize survivors.
Solution: Filter Hours ≥ 22: Aisha (34) ✓, Ben (22) ✓ (inclusive — 22 stays), Carmen (18) ✗, Diego (25) ✓, Elena (40) ✓. Sort by Name ascending: Aisha, Ben, Diego, Elena.
Interpretation: The planted trap was Ben: ≥ 22 keeps exactly-22. Had the filter been > 22, Ben drops. One symbol, different answer — read the comparison operator like it's radioactive.
Problem: What is the average Hours among students in grade 12?
Solution: Filter Grade = 12 → Aisha 34, Carmen 18, Diego 25. Mean = (34 + 18 + 25) / 3 = 77 / 3 ≈ 25.67.
Interpretation: The tempting error: averaging all five rows ((34+22+18+25+40)/5 = 27.8) — a listed distractor whenever this question appears. Filter first. Aggregate second.
Problem: A library has two tables. Table 1: cardNumber, name, gradeLevel. Table 2: cardNumber, bookTitle, dueDate. Which question can be answered ONLY by combining both tables?
(A) How many books are due this week? (B) How many students are in grade 10? (C) Which grade level has the most overdue books? (D) What is the most-borrowed book title?
Solution: (C). Grade level lives only in Table 1; overdue status (dueDate) lives only in Table 2; connecting them requires matching rows via the shared cardNumber. (A) and (D) need only Table 2; (B) needs only Table 1.
Interpretation: The combine-tables question always works this way: find the answer choice whose two required fields live in different tables. Name the shared field to confirm the join is possible.
> 20 vs ≥ 20 on a row with exactly 20. The exam plants a boundary row nearly every time. Circle the operator, check that row explicitly.Problems 1–7 use this table of a music app's playHistory:
| Song | Artist | Genre | Plays | Minutes |
|---|---|---|---|---|
| Vega | Lumen | Pop | 41 | 3 |
| Slate | Korrid | Rock | 18 | 4 |
| Ember | Lumen | Pop | 30 | 3 |
| Quill | Fenwick | Jazz | 18 | 5 |
| Aurora | Korrid | Rock | 55 | 4 |
| Drift | Fenwick | Jazz | 12 | 6 |
Genre = "Rock" keep?1. (B). Slate and Aurora. Mechanical scan of all six rows.
Plays > 18, which songs remain?2. (A). Plays > 18 (strict): Vega 41 ✓, Slate 18 ✗ (boundary row — strict excludes it), Ember 30 ✓, Quill 18 ✗, Aurora 55 ✓, Drift 12 ✗. (B) treats > as ≥ — the planted trap.
3. (A). Descending Plays: Aurora 55, Vega 41, Ember 30, Slate 18 / Quill 18, Drift 12. Second = Vega. (B) is first, not second — read the position asked.
Artist = "Lumen", then compute total Plays of the remaining rows:4. (C). Lumen rows: Vega 41 + Ember 30 = 71. (B) totals the whole table (aggregate-before-filter error); (A) takes one row only.
5. (A). Goal = Jazz only (filter) in play order (sort descending). (B) filters by artist, not genre — Fenwick happens to be Jazz in this table, but the operation doesn't express the goal and would break if Fenwick released a pop song. Express the condition you mean. (C) keeps non-jazz; (D) sorts by the wrong column.
Plays ≥ 30?6. (B). Plays ≥ 30: Vega (3 min), Ember (3), Aurora (4). Mean = (3+3+4)/3 = 10/3 ≈ 3.33. (A) averages Minutes over all six rows (≈ 4.17) — filter first! (D) is the sum, not the mean.
Artist, Country, DebutYear. Which question requires combining it with playHistory?7. (B). "Plays" lives in playHistory; "Country" lives in the artist table; the join key is Artist. (A) and (D) need only the second table; (C) needs only the first.
8. (A) and (B). Scale and consistency — the CED's stated reasons. (C) false: cleaning is still on you. (D) false: processing can't repair collection bias (Lesson 5).
Grade ≥ 11 AND Hours > 30. A student in grade 11 with exactly 30 hours is:9. (B). AND requires both. Grade 11 ✓ (≥ 11 inclusive), Hours exactly 30 fails the strict > 30 → excluded. This question is Lessons 2 + 6 shaking hands: boundaries and logic.
10. (B). Ascending = smallest first; sorting preserves row count. (A) invents merging; (C) is descending; (D) denies the sort.
11. (A). Shared ID joins the tables; the combined profile exceeds either source — the CED's stated insight and privacy warning (returns in Lesson 23).
12. (Model answer.) Filter Genre = "Pop", then sort by Song ascending. Result: Ember, Vega. (Sort-then-filter also earns credit if the final list is correct — but say the operations explicitly.)
Answer letter distribution check: B, A, A, C, A, B, B, A+B, B, B, A, — singles: A×4, B×5, C×1, D×0 + multi (A,B). Cumulative through L6: D remains underweight; L7's key is engineered D-heavy (flagged for the final sweep).
12 (short response). Using playHistory: describe a chain of two operations that produces the Pop songs ordered alphabetically, and state the resulting Song column.
Filtering, aggregating, and processing a list of data is exactly the shape of a strong Create PT program — and of the list-traversal algorithms coming in Lesson 12. The PT rubric wants a list that manages complexity plus a procedure implementing an algorithm with selection and iteration. "Loop through my list, keep/count/total the entries matching a condition" is the canonical way to satisfy all of it at once.
Start keeping a PT idea list now, in this shape: what data (list) → what question (filter/aggregate) → what output. Examples: quiz scores → how many above the class average → display the count; pantry items → which expire this week → display the list. By Lesson 14 you'll wrap one of these in a procedure, and your PT skeleton will exist.
1. (B). Slate and Aurora. Mechanical scan of all six rows.
2. (A). Plays > 18 (strict): Vega 41 ✓, Slate 18 ✗ (boundary row — strict excludes it), Ember 30 ✓, Quill 18 ✗, Aurora 55 ✓, Drift 12 ✗. (B) treats > as ≥ — the planted trap.
3. (A). Descending Plays: Aurora 55, Vega 41, Ember 30, Slate 18 / Quill 18, Drift 12. Second = Vega. (B) is first, not second — read the position asked.
4. (C). Lumen rows: Vega 41 + Ember 30 = 71. (B) totals the whole table (aggregate-before-filter error); (A) takes one row only.
5. (A). Goal = Jazz only (filter) in play order (sort descending). (B) filters by artist, not genre — Fenwick happens to be Jazz in this table, but the operation doesn't express the goal and would break if Fenwick released a pop song. Express the condition you mean. (C) keeps non-jazz; (D) sorts by the wrong column.
6. (B). Plays ≥ 30: Vega (3 min), Ember (3), Aurora (4). Mean = (3+3+4)/3 = 10/3 ≈ 3.33. (A) averages Minutes over all six rows (≈ 4.17) — filter first! (D) is the sum, not the mean.
7. (B). "Plays" lives in playHistory; "Country" lives in the artist table; the join key is Artist. (A) and (D) need only the second table; (C) needs only the first.
8. (A) and (B). Scale and consistency — the CED's stated reasons. (C) false: cleaning is still on you. (D) false: processing can't repair collection bias (Lesson 5).
9. (B). AND requires both. Grade 11 ✓ (≥ 11 inclusive), Hours exactly 30 fails the strict > 30 → excluded. This question is Lessons 2 + 6 shaking hands: boundaries and logic.
10. (B). Ascending = smallest first; sorting preserves row count. (A) invents merging; (C) is descending; (D) denies the sort.
11. (A). Shared ID joins the tables; the combined profile exceeds either source — the CED's stated insight and privacy warning (returns in Lesson 23).
12. (Model answer.) Filter Genre = "Pop", then sort by Song ascending. Result: Ember, Vega. (Sort-then-filter also earns credit if the final list is correct — but say the operations explicitly.)
Answer letter distribution check: B, A, A, C, A, B, B, A+B, B, B, A, — singles: A×4, B×5, C×1, D×0 + multi (A,B). Cumulative through L6: D remains underweight; L7's key is engineered D-heavy (flagged for the final sweep).
Exam tip: For table questions, work with your pencil ON the table: strike out filtered rows, number the sort order in the margin. Never execute two operations in your head at once — the questions are engineered so that shortcut-takers pick the distractor that skips a step.