Our Science
Built on real College Board data, not guesswork.
Every question in FinishStrong is traceable to a specific College Board testing point. Every answer is verified by multiple AI models. Every skill is tracked with probabilistic mastery estimation.
13,187 CB Questions
37 Test Administrations
95 Skills Tracked
8 Score Curves
The Engine
Item Response Theory (IRT 3PL)
Item Response Theory is the statistical framework that College Board uses to score the SAT. It treats every question as having measurable psychometric properties, not just "easy" or "hard." The three-parameter logistic model (3PL) captures three dimensions of every question:
Difficulty (b): Where on the ability scale a question separates students who know the material from those who don't. Calibrated against real student populations.
Discrimination (a): How sharply a question distinguishes between students of different ability levels. High discrimination means the question truly tests understanding.
Guessing (c): The probability of a correct answer from pure guessing. For 4-choice multiple choice, c = 0.25. Free response questions have c = 0.
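The three parameters combine in the standard 3PL formula, P(correct) = c + (1 - c) / (1 + e^(-a(theta - b))), where theta is the student's ability. A minimal Python sketch follows; the function name and the sample values are illustrative assumptions, not FinishStrong's production code.

```python
import math

def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the standard 3PL model.

    theta: student ability, a: discrimination, b: difficulty, c: guessing floor.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the logistic term is 0.5,
# so the probability sits halfway between the guessing floor and 1.
print(p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.25))  # 0.625
```

A larger a makes the curve steeper around b, which is why high-discrimination questions separate nearby ability levels more cleanly.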
How we use it: FinishStrong selects questions where your probability of answering correctly is approximately 70%. This isn't arbitrary. Vygotsky's Zone of Proximal Development (1978) established that learners grow fastest when challenged just beyond their current ability. Bjork's research on desirable difficulties (1994) confirmed that retrieval attempts with a moderate failure rate produce stronger, more durable learning than easy repetition.
At 70% success probability, students are challenged enough to engage deeply but succeed often enough to build confidence. Questions that are too easy (90%+) don't create learning. Questions that are too hard (below 40%) create frustration and learned helplessness. The 70% target is the sweet spot where the brain works hardest to encode new knowledge.
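One simple way to hit a 70% target is to score every candidate question with the 3PL formula and pick the one whose predicted success probability is closest to 0.70. The sketch below assumes that approach; the Item class, pick_item function, and the item parameters are hypothetical, and the real selection logic may weigh other factors.

```python
import math
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    a: float  # discrimination
    b: float  # difficulty
    c: float  # guessing floor

def p_correct(theta: float, item: Item) -> float:
    """3PL probability that a student with ability theta answers correctly."""
    return item.c + (1.0 - item.c) / (1.0 + math.exp(-item.a * (theta - item.b)))

def pick_item(theta: float, items: list, target: float = 0.70) -> Item:
    """Return the item whose predicted success probability is nearest the target."""
    return min(items, key=lambda it: abs(p_correct(theta, it) - target))

# Hypothetical item bank: same discrimination and guessing, different difficulty.
bank = [
    Item("q1", a=1.0, b=-2.0, c=0.25),  # too easy for theta = 0 (about 91%)
    Item("q2", a=1.0, b=-0.5, c=0.25),  # about 72% for theta = 0
    Item("q3", a=1.0, b=1.0, c=0.25),   # too hard for theta = 0 (about 45%)
]
print(pick_item(theta=0.0, items=bank).item_id)  # q2
```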
The IRT model is recalibrated nightly against incoming response data, ensuring that question parameters stay accurate as more students use the system. This is the same approach College Board uses to equate scores across test administrations.
Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Mastery Model
Bayesian Knowledge Tracing (BKT)
How do you know whether a student actually understands a concept, or just got lucky? Bayesian Knowledge Tracing answers this question using probability theory. After every single response, BKT updates a probabilistic model of what the student knows, accounting for four factors:
Prior knowledge, P(K): The probability the student already knew the skill before this response. Starts low for new skills, increases with correct answers.
Learning, P(T): The probability the student learned the skill from this interaction. Different students learn at different rates, and BKT captures this.
Slip, P(S): The probability a student who knows the material gets it wrong anyway through careless errors, misreading, or time pressure. Accounts for realistic test conditions.
Guess, P(G): The probability a student who doesn't know the material gets it right by chance. Multiple choice has higher guess rates than free response.
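These four factors plug into the standard BKT update from Corbett & Anderson: first apply Bayes' rule to the observed response, then allow for learning during the interaction. The sketch below shows that textbook update; the function name and the parameter values in the example are assumptions for illustration.

```python
def bkt_update(p_known: float, correct: bool,
               p_learn: float, p_slip: float, p_guess: float) -> float:
    """One standard BKT step: Bayesian posterior, then the learning transition."""
    if correct:
        # Correct answers come from knowing (and not slipping) or from guessing.
        evidence = p_known * (1.0 - p_slip) + (1.0 - p_known) * p_guess
        posterior = p_known * (1.0 - p_slip) / evidence
    else:
        # Wrong answers come from slipping or from not knowing (and not guessing).
        evidence = p_known * p_slip + (1.0 - p_known) * (1.0 - p_guess)
        posterior = p_known * p_slip / evidence
    # The student may also have learned the skill during this interaction.
    return posterior + (1.0 - posterior) * p_learn

# Hypothetical parameters: prior 0.30, learn 0.10, slip 0.10, guess 0.25.
after_right = bkt_update(0.30, True, p_learn=0.10, p_slip=0.10, p_guess=0.25)
after_wrong = bkt_update(0.30, False, p_learn=0.10, p_slip=0.10, p_guess=0.25)
print(round(after_right, 3), round(after_wrong, 3))  # 0.646 0.149
```

Because the guess rate is part of the denominator, a correct answer on a 4-choice question moves the estimate less than the same answer would on a free-response question.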
Mastery progression: BKT produces a continuous probability (0 to 1) of mastery for each of 95 SAT skills. FinishStrong maps these to four levels that students can see and track:
Unstarted: P(K) = 0
Familiar: P(K) > 0.3
Proficient: P(K) > 0.6
Mastered: P(K) > 0.85
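The mapping from probability to level can be sketched as a simple threshold check. The thresholds come from the list above; how values between 0 and 0.3 are labeled is not stated, so this sketch assumes they stay "Unstarted". The function name is hypothetical.

```python
def mastery_level(p_known: float) -> str:
    """Map a BKT mastery probability to one of the four visible levels."""
    if p_known > 0.85:
        return "Mastered"
    if p_known > 0.6:
        return "Proficient"
    if p_known > 0.3:
        return "Familiar"
    # Assumption: anything at or below 0.3 is shown as Unstarted.
    return "Unstarted"

print([mastery_level(p) for p in (0.0, 0.45, 0.7, 0.9)])
# ['Unstarted', 'Familiar', 'Proficient', 'Mastered']
```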
Unlike simple percentage scores ("you got 7/10 right"), BKT accounts for the difficulty of questions attempted, the possibility of lucky guesses, and the probability of careless errors. A student who gets 3 hard questions right has a higher P(K) than one who gets 5 easy questions right. This is a more honest and useful representation of what a student actually knows.
BKT also drives the prerequisite graph: a student can't unlock "Systems of Equations" until they've demonstrated proficiency in "Linear Equations in One Variable." This prevents the frustration of encountering material you don't have the foundation for.
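A prerequisite gate like this can be sketched as a lookup over the skill graph: a skill unlocks only when every prerequisite is past the Proficient threshold of 0.6. The dictionary layout and function name below are assumptions for illustration; only the two skill names come from the example above.

```python
# Hypothetical slice of the prerequisite graph: skill -> list of prerequisites.
PREREQS = {
    "Linear Equations in One Variable": [],
    "Systems of Equations": ["Linear Equations in One Variable"],
}

def is_unlocked(skill: str, mastery: dict, prereqs: dict,
                threshold: float = 0.6) -> bool:
    """A skill is unlocked when every prerequisite has P(K) above the threshold."""
    return all(mastery.get(p, 0.0) > threshold for p in prereqs.get(skill, []))

mastery = {"Linear Equations in One Variable": 0.45}
print(is_unlocked("Systems of Equations", mastery, PREREQS))  # False

mastery["Linear Equations in One Variable"] = 0.72
print(is_unlocked("Systems of Equations", mastery, PREREQS))  # True
```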
Corbett, A.T. & Anderson, J.R. (1995). Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.
Memory Science
Spaced Repetition (SM-2)
Hermann Ebbinghaus described the forgetting curve in 1885: in his experiments with memorized syllable lists, roughly 70% of the material was lost within 48 hours without review. But a single review at the right moment can reset the curve, and each subsequent review extends the interval before forgetting occurs. This is the foundation of spaced repetition.
FinishStrong uses the SM-2 algorithm (originally developed by Piotr Wozniak for SuperMemo) to schedule skill reviews. After you demonstrate competence in a skill, SM-2 determines when you'll start to forget it and schedules a review just before that happens.
HOW INTERVALS GROW
Our modification: Standard SM-2 uses a quality grade from 0 to 5 based on recall difficulty. FinishStrong maps your confidence rating plus your correctness to this quality grade. If you said "certain" and got it right, that's a 5, and the interval grows maximally. If you said "certain" but got it wrong, that's a 0, and the interval resets completely. This integration of metacognitive data into the scheduling algorithm is unique to FinishStrong.
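To make the scheduling concrete, here is a minimal sketch of an SM-2 style step driven by a confidence-plus-correctness grade. Only the two "certain" mappings (5 and 0) come from the description above; every other entry in the QUALITY table is a hypothetical placeholder. The interval rule (1 day, 6 days, then previous interval times the ease factor) and the ease formula follow the published SM-2 description, with this sketch leaving the ease unchanged on a failed review.

```python
from dataclasses import dataclass

@dataclass
class ReviewState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, never below 1.3

# (confidence, correct) -> SM-2 quality grade.
# Only the two "certain" rows are from the text; the rest are assumptions.
QUALITY = {
    ("certain", True): 5, ("likely", True): 4,
    ("maybe", True): 3, ("guessing", True): 3,
    ("guessing", False): 2, ("maybe", False): 2,
    ("likely", False): 1, ("certain", False): 0,
}

def sm2_step(state: ReviewState, quality: int) -> ReviewState:
    """Apply one SM-2 style review and return the new scheduling state."""
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor.
        return ReviewState(repetitions=0, interval_days=1, ease=state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.ease)
    penalty = 5 - quality
    ease = max(1.3, state.ease + 0.1 - penalty * (0.08 + penalty * 0.02))
    return ReviewState(state.repetitions + 1, interval, ease)

state = ReviewState()
intervals = []
for _ in range(3):  # three "certain and correct" reviews in a row
    state = sm2_step(state, QUALITY[("certain", True)])
    intervals.append(state.interval_days)
print(intervals)  # [1, 6, 16]

state = sm2_step(state, QUALITY[("certain", False)])
print(state.interval_days)  # 1
```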
Research consistently shows that spaced practice produces 200% or greater retention compared to massed practice (cramming). Rohrer & Taylor (2007) demonstrated this specifically in mathematics: students who practiced with spacing outperformed those who massed their practice by over 3x on delayed tests.
The practical impact: a student who practices 5 minutes daily for 60 days retains dramatically more than one who crams for 5 hours the week before the test. Spaced repetition is why FinishStrong is designed as a daily habit, not a weekend marathon.
Pimsleur, P. (1967). A Memory Schedule. Modern Language Journal, 51(2), 73-75.
Data Foundation
Real College Board Data Pipeline
We don't generate random questions and hope they're relevant. FinishStrong is built on a foundation of real College Board data that ensures every question maps to what the SAT actually tests.
Reference Questions
From 37 SAT administrations spanning 2018-2025. Every question tagged to a specific CB testing point, domain, section, and difficulty level.
Performance Benchmarks
Population-level accuracy rates for each skill from real test administrations. We know what percentage of students typically answer correctly for every skill at every difficulty level.
Equating Curves
Score conversion curves from actual SAT administrations. These map raw performance to scaled scores the same way College Board does, enabling accurate score prediction.
Atomic Skills
The SAT doesn't test 'math' — it tests 95 specific sub-skills organized into 8 domains. Each skill has prerequisites forming a directed acyclic graph that mirrors how mathematical and verbal knowledge builds.
COLLEGE BOARD STANDARD
MEDIUM · Algebra: Linear equations in 1 variable
Students create, solve, or interpret linear equations in one variable
Tested on March 2023, October 2022, May 2022 SATs
Quality Gate
Dual-Model Verification
Every question in FinishStrong goes through a multi-stage quality pipeline before any student sees it. This isn't a nice-to-have — an incorrect answer on a prep question can actively harm a student's learning by creating false associations. We treat question correctness as a zero-tolerance requirement.
Stage 1 — Dual-model verification: Every question is independently solved by OpenAI (GPT-4o) and Google (Gemini). Both models must agree on the correct answer. If they disagree, the question is flagged for manual review. To date, 956 questions have been verified at 100% confidence through this process.
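The agreement check in Stage 1 can be sketched as follows. The two solver arguments stand in for calls to the two models and are hypothetical callables, not real API clients. Treating "both agree with each other but not with the answer key" as a manual-review case is an assumption of this sketch.

```python
from typing import Callable

Solver = Callable[[str], str]  # takes question text, returns an answer label

def verify_question(question: str, answer_key: str,
                    solver_a: Solver, solver_b: Solver) -> str:
    """Return 'verified' only if both solvers independently match the key."""
    a = solver_a(question).strip().upper()
    b = solver_b(question).strip().upper()
    key = answer_key.strip().upper()
    if a == b == key:
        return "verified"
    # Any disagreement (between solvers or with the key) goes to a human.
    return "manual_review"

# Stub solvers for illustration only.
print(verify_question("2x + 3 = 11. What is x?", "B",
                      lambda q: "B", lambda q: "b "))  # verified
print(verify_question("2x + 3 = 11. What is x?", "B",
                      lambda q: "B", lambda q: "C"))   # manual_review
```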
Stage 2 — AI Student Panel: Every verified question is then evaluated by 5 AI student personas, each representing a different learner profile. This catches issues that correct-answer verification misses: confusing wording, culturally biased assumptions, ambiguous phrasing, and difficulty miscalibration.
Maya
16, strong reader, math anxiety, first-gen college student
Jake
17, math whiz, rushes through reading passages, retaking SAT
Priya
15, high achiever, makes careless errors on questions she considers easy
Devon
17, ADHD, brilliant but inconsistent, needs high engagement to focus
Aaliyah
16, ESL learner, strong grammar intuition, weak academic vocabulary
Each persona rates the question on clarity, engagement, difficulty calibration, and real-world relevance. Questions must score an average of 7/10 or higher to enter the active corpus. Questions below the threshold are rejected and regenerated.
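The acceptance rule can be sketched as an average over all persona ratings compared against the 7/10 threshold. How the four criteria are weighted is not stated, so this sketch assumes a plain mean across every persona and criterion; the function names and sample scores are hypothetical.

```python
from statistics import mean

def panel_average(ratings: dict) -> float:
    """Mean of every criterion score from every persona (assumed equal weights)."""
    return mean(score for scores in ratings.values() for score in scores)

def passes_panel(ratings: dict, threshold: float = 7.0) -> bool:
    """A question enters the active corpus at an average of 7/10 or higher."""
    return panel_average(ratings) >= threshold

# Hypothetical scores: clarity, engagement, difficulty calibration, relevance.
ratings = {
    "Maya": [8, 7, 8, 9],
    "Jake": [9, 6, 8, 7],
    "Priya": [8, 8, 7, 8],
    "Devon": [7, 9, 8, 8],
    "Aaliyah": [6, 7, 8, 8],
}
print(panel_average(ratings), passes_panel(ratings))  # 7.7 True
```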
AVERAGE PANEL RATING: 8.0/10
Questions scoring below 7/10 are rejected and regenerated
Test-Taking Science
Strategy Coaching on Every Question
Millman, Bishop & Ebel (1965) established that test-wiseness is a "distinct cognitive skill, separate from content mastery." A test-wise student performs better on the same content than an equally knowledgeable but less strategic peer. FinishStrong teaches test-taking technique alongside every single question — not as an add-on, but as a core part of how each question coaches.
Research shows strategy contributes 30-50% of SAT score improvement depending on starting level, with the biggest impact in the 1000-1400 range. We don't just teach what the SAT tests — we teach how to take the SAT.
Elimination (7 strategies): Process of elimination, extreme eliminators, half-right rejectors, scope checking
Time Management (5 strategies): Three-pass system, flag and move, time allocation, module urgency
Reading (6 strategies): Question-first, bookends, evidence grounding, hypothesis testing
Math Solving (7 strategies): Desmos graphing, backsolving, pick numbers, estimation, verification
Grammar (3 strategies): Error type identification, period test, parallel structure check
Metacognition (3 strategies): 3-second verify, check what they asked, confidence calibration
The dual-model pipeline: Every strategy tip is generated by one AI model (GPT-4o) and independently verified by a different model (Gemini). Tips that are generic, inaccurate, or inapplicable are rejected. Only tips that pass both models reach students. This same peer-review principle applies to all 1,681+ questions in the corpus.
EXAMPLE COACH TIP
"Use 'Pick Numbers': pick a safe value like x = 1, calculate the original expression, then check which answer choice gives the same value. No algebra needed."
Strategy: Pick Numbers · Skill: Rational Expressions · Impact: Saves 60-90 seconds per question
The 31-strategy taxonomy covers every question type on the digital SAT. Each strategy has research citations, applicability rules, estimated time savings, and hours-to-automaticity targets. Students don't just learn strategies — they practice them on real questions until the techniques become automatic.
Millman, J., Bishop, C.H. & Ebel, R. (1965). An Analysis of Test-Wiseness. Educational and Psychological Measurement, 25(3), 707-726.
The Differentiator
Confidence Calibration
Before answering every question, FinishStrong asks: "How confident are you?" This isn't decorative. It's training the most undervalued skill in test preparation: metacognition — knowing what you know.
The Education Endowment Foundation (EEF) found that metacognitive strategies produce an average of +8 months of academic progress per year — one of the highest-impact interventions in all of education. Despite this, no major SAT prep platform trains metacognition systematically. FinishStrong does.
How the bet mechanic works: Before each question, you choose a confidence level: "guessing," "maybe," "likely," or "certain." Higher confidence stakes more XP: get it right and you earn a multiplier; get it wrong and you lose more than if you'd been honest about uncertainty. This creates a genuine incentive to calibrate accurately rather than default to "certain" every time.
WHAT CALIBRATION REVEALS
Over time, the calibration dashboard tracks how well your confidence predicts your accuracy. A well-calibrated student knows exactly which questions to spend more time on during the real SAT and which to answer quickly. This is the difference between a student who "knows the material" and one who maximizes their score.
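At its core, a calibration view compares stated confidence with observed accuracy. A minimal sketch, assuming each response is logged as a (confidence level, correct) pair; the function name and the sample log are hypothetical.

```python
from collections import defaultdict

def calibration_table(responses: list) -> dict:
    """Observed accuracy for each stated confidence level."""
    totals = defaultdict(lambda: [0, 0])  # level -> [correct count, attempts]
    for confidence, correct in responses:
        totals[confidence][0] += int(correct)
        totals[confidence][1] += 1
    return {level: hits / n for level, (hits, n) in totals.items()}

# Hypothetical response log.
log = [
    ("certain", True), ("certain", True), ("certain", False), ("certain", True),
    ("guessing", False), ("guessing", True), ("guessing", False), ("guessing", False),
]
print(calibration_table(log))  # {'certain': 0.75, 'guessing': 0.25}
```

A well-calibrated student's accuracy rises steadily from "guessing" to "certain". A student who is only 75% accurate when "certain" is overconfident on that level and would benefit from a quick verification pass before moving on.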
No other SAT prep app teaches this skill.
Education Endowment Foundation (2021). Metacognition and Self-Regulated Learning: Guidance Report. London: EEF.
References
Research Citations
See the science in action.
Try a session. Watch the IRT engine select questions at your level. See BKT update your mastery in real time. Feel the difference.