Our Science

Built on real College Board data,
not guesswork.

Every question in FinishStrong is traceable to a specific College Board testing point. Every answer is verified by multiple AI models. Every skill is tracked with probabilistic mastery estimation.

13,187 CB Questions

37 Test Administrations

95 Skills Tracked

Score Curves

The Engine

Item Response Theory (IRT 3PL)

Item Response Theory is the statistical framework that College Board uses to score the SAT. It treats every question as having measurable psychometric properties, not just "easy" or "hard." The three-parameter logistic model (3PL) captures three dimensions of every question:

b: Difficulty

Where on the ability scale a question separates students who know the material from those who don't. Calibrated against real student populations.

a: Discrimination

How sharply a question distinguishes between students of different ability levels. High discrimination = questions that truly test understanding.

c: Guessing

The probability of a correct answer from pure guessing. For 4-choice multiple choice, c = 0.25. Free response questions have c = 0.
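The three parameters combine into one response curve. The sketch below is the textbook 3PL formula, not FinishStrong's production code; the function name `p_correct` and the ability variable `theta` are illustrative choices.

```python
import math

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Textbook 3PL: probability that a student of ability theta answers correctly.

    a = discrimination, b = difficulty, c = guessing floor.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A student whose ability equals the item's difficulty sits halfway
# between the guessing floor and certainty: 0.25 + 0.75 / 2.
print(p_correct(theta=0.0, a=1.0, b=0.0, c=0.25))  # 0.625
```

Note that when ability equals difficulty the probability is (1 + c) / 2, not 0.5, because the guessing floor lifts the whole curve.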

How we use it: FinishStrong selects questions where your probability of answering correctly is approximately 70%. This isn't arbitrary. Vygotsky's Zone of Proximal Development (1978) established that learners grow fastest when challenged just beyond their current ability. Bjork's research on desirable difficulties (1994) confirmed that retrieval attempts with a moderate failure rate produce stronger, more durable learning than easy repetition.

At 70% success probability, students are challenged enough to engage deeply but succeed often enough to build confidence. Questions that are too easy (90%+) don't create learning. Questions that are too hard (below 40%) create frustration and learned helplessness. The 70% target is the sweet spot where the brain works hardest to encode new knowledge.
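One simple way to act on a 70% target is to score every candidate question with the 3PL curve and pick the one whose predicted success rate is closest to 0.70. This is a minimal sketch under that assumption; the item parameters and the `pick_item` helper are hypothetical, and a real selector would also weigh skill coverage and exposure.

```python
import math

def p_correct(theta, a, b, c):
    """Textbook 3PL response probability."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def pick_item(theta, items, target=0.70):
    """Return the item whose predicted success probability is nearest the target."""
    return min(
        items,
        key=lambda it: abs(p_correct(theta, it["a"], it["b"], it["c"]) - target),
    )

# Hypothetical item bank (parameters invented for illustration).
items = [
    {"id": "easy", "a": 1.2, "b": -1.5, "c": 0.25},
    {"id": "medium", "a": 1.2, "b": 0.0, "c": 0.25},
    {"id": "hard", "a": 1.2, "b": 1.5, "c": 0.25},
]

print(pick_item(0.0, items)["id"])  # medium
print(pick_item(1.5, items)["id"])  # hard
```

As the ability estimate rises, the same rule automatically moves the student toward harder items.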

The IRT model is recalibrated nightly against incoming response data, so question parameters stay accurate as more students use the system. Large-scale testing programs rely on the same family of IRT methods to keep scores comparable across test administrations.

Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Mastery Model

Bayesian Knowledge Tracing (BKT)

How do you know whether a student actually understands a concept, or just got lucky? Bayesian Knowledge Tracing answers this question using probability theory. After every single response, BKT updates a probabilistic model of what the student knows, accounting for four factors:

P(K): Prior Knowledge

The probability the student already knew the skill before this response. Starts low for new skills, increases with correct answers.

P(T): Learning Rate

The probability the student learned the skill from this interaction. Different students learn at different rates — BKT captures this.

P(S): Slip Rate

The probability a student who knows the material gets it wrong anyway — careless errors, misreading, time pressure. Accounts for realistic test conditions.

P(G): Guess Rate

The probability a student who doesn't know the material gets it right by chance. Multiple choice has higher guess rates than free response.
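The four factors above combine in the standard BKT update from Corbett & Anderson (1995): a Bayesian posterior given the observed response, followed by a learning step. The sketch below shows that textbook update; the parameter values in the example are made up for illustration and are not FinishStrong's fitted values.

```python
def bkt_update(p_known, correct, p_learn, p_slip, p_guess):
    """One step of standard Bayesian Knowledge Tracing.

    p_known: P(K) before the response
    p_learn: P(T), p_slip: P(S), p_guess: P(G)
    Returns P(K) after observing the response and applying the learning step.
    """
    if correct:
        evidence_known = p_known * (1.0 - p_slip)
        evidence_unknown = (1.0 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_unknown = (1.0 - p_known) * (1.0 - p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # The student may also have learned the skill during this interaction.
    return posterior + (1.0 - posterior) * p_learn

# Illustrative parameters only.
after_right = bkt_update(0.3, True, p_learn=0.1, p_slip=0.1, p_guess=0.25)
after_wrong = bkt_update(0.3, False, p_learn=0.1, p_slip=0.1, p_guess=0.25)
print(round(after_right, 3), round(after_wrong, 3))  # 0.646 0.149
```

A correct answer raises the estimate sharply but not to certainty, because the guess rate leaves room for luck; a wrong answer lowers it but not to zero, because of the slip rate.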

Mastery progression: BKT produces a continuous probability (0 to 1) of mastery for each of 95 SAT skills. FinishStrong maps these to four levels that students can see and track:

Unstarted: P(K) = 0

Familiar: P(K) > 0.3

Proficient: P(K) > 0.6

Mastered: P(K) > 0.85
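The thresholds above translate directly into a lookup. This is a small sketch; the function name is hypothetical, and reporting values between 0 and 0.3 as "Unstarted" is an assumption, since the levels listed here do not say how that range is labeled.

```python
def mastery_level(p_known: float) -> str:
    """Map a BKT mastery probability to the four displayed levels."""
    if p_known > 0.85:
        return "Mastered"
    if p_known > 0.6:
        return "Proficient"
    if p_known > 0.3:
        return "Familiar"
    # Assumption: anything at or below 0.3 is still shown as Unstarted.
    return "Unstarted"

for p in (0.0, 0.45, 0.7, 0.9):
    print(p, mastery_level(p))
```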

Unlike simple percentage scores ("you got 7/10 right"), BKT accounts for the difficulty of questions attempted, the possibility of lucky guesses, and the probability of careless errors. A student who gets 3 hard questions right has a higher P(K) than one who gets 5 easy questions right. This is a more honest and useful representation of what a student actually knows.

BKT also drives the prerequisite graph: a student can't unlock "Systems of Equations" until they've demonstrated proficiency in "Linear Equations in One Variable." This prevents the frustration of encountering material you don't have the foundation for.
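A prerequisite gate of this kind can be sketched as a dictionary of skill dependencies plus a threshold check. The two skill names come from the example above; the data structure, the helper names, and the use of the 0.6 "Proficient" cutoff as the unlock threshold are assumptions for illustration.

```python
# Hypothetical slice of the prerequisite graph: skill -> required skills.
PREREQS = {
    "Linear Equations in One Variable": [],
    "Systems of Equations": ["Linear Equations in One Variable"],
}
PROFICIENT = 0.6  # assumed unlock threshold, matching the Proficient level

def is_unlocked(skill, p_known, prereqs=PREREQS):
    """A skill unlocks once every prerequisite exceeds the proficiency threshold."""
    return all(p_known.get(req, 0.0) > PROFICIENT for req in prereqs.get(skill, []))

def unlocked_skills(p_known, prereqs=PREREQS):
    """List all skills the student may currently practice."""
    return [skill for skill in prereqs if is_unlocked(skill, p_known, prereqs)]

print(unlocked_skills({}))
print(unlocked_skills({"Linear Equations in One Variable": 0.7}))
```

A brand-new student sees only skills with no prerequisites; once the estimate for linear equations passes the threshold, systems of equations becomes available.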

Corbett, A.T. & Anderson, J.R. (1995). Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.

Memory Science

Spaced Repetition (SM-2)

Hermann Ebbinghaus described the forgetting curve in 1885: without review, most newly learned material is forgotten within a couple of days, with the steepest loss in the first hours. But a single review at the right moment can reset the curve, and each subsequent review extends the interval before forgetting occurs. This is the foundation of spaced repetition.

FinishStrong uses the SM-2 algorithm (originally developed by Piotr Wozniak for SuperMemo) to schedule skill reviews. After you demonstrate competence in a skill, SM-2 determines when you'll start to forget it and schedules a review just before that happens.

HOW INTERVALS GROW

1st review: 1 day

2nd review: 3 days

3rd review: 7 days

4th review: 14 days

5th review: 30 days

6th review: 60+ days

Our modification: Standard SM-2 uses a quality grade from 0-5 based on recall difficulty. FinishStrong maps your confidence rating plus your correctness to this quality grade. If you said "certain" and got it right, that's a 5, and the interval grows maximally. If you said "certain" but got it wrong, that's a 0, and the interval resets completely. This integration of metacognitive data into the scheduling algorithm is unique to FinishStrong.
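The sketch below pairs a confidence-to-quality table with one common formulation of the published SM-2 update. Only the two "certain" entries come from the description above; every other grade in `QUALITY` is an assumed placeholder. Published SM-2 uses intervals of 1 day, 6 days, then the previous interval times the ease factor, so it will not reproduce the schedule listed above exactly; that schedule presumably reflects FinishStrong's own tuning.

```python
from dataclasses import dataclass

QUALITY = {
    ("certain", True): 5, ("certain", False): 0,    # stated in the text
    ("likely", True): 4, ("likely", False): 1,      # assumed
    ("maybe", True): 4, ("maybe", False): 2,        # assumed
    ("guessing", True): 3, ("guessing", False): 2,  # assumed
}

@dataclass
class Card:
    reps: int = 0           # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, never below 1.3

def sm2_review(card: Card, confidence: str, correct: bool) -> Card:
    q = QUALITY[(confidence, correct)]
    if q < 3:
        # Lapse: restart the repetition sequence, keep the ease factor.
        return Card(reps=0, interval_days=1, ease=card.ease)
    if card.reps == 0:
        interval = 1
    elif card.reps == 1:
        interval = 6
    else:
        interval = round(card.interval_days * card.ease)
    ease = max(1.3, card.ease + 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    return Card(reps=card.reps + 1, interval_days=interval, ease=ease)

card = Card()
for _ in range(3):
    card = sm2_review(card, "certain", True)
    print(card.interval_days)  # 1, then 6, then 16
```

A confident wrong answer after those three reviews sends the skill back to a 1-day interval, which is the "resets completely" behavior described above.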

Research consistently shows that spaced practice produces substantially better long-term retention than massed practice (cramming). Rohrer & Taylor (2007) demonstrated this in mathematics: students whose practice was spaced or shuffled across sessions clearly outperformed students who massed their practice when tested after a delay.

The practical impact: a student who practices 5 minutes daily for 60 days retains dramatically more than one who crams for 5 hours the week before the test. Spaced repetition is why FinishStrong is designed as a daily habit, not a weekend marathon.

Pimsleur, P. (1967). A Memory Schedule. Modern Language Journal, 51(2), 73-75.

Data Foundation

Real College Board Data Pipeline

We don't generate random questions and hope they're relevant. FinishStrong is built on a foundation of real College Board data that ensures every question maps to what the SAT actually tests.

13,187

Reference Questions

From 37 SAT administrations spanning 2018-2025. Every question tagged to a specific CB testing point, domain, section, and difficulty level.

Performance Benchmarks

Population-level accuracy rates for each skill from real test administrations. We know what percentage of students typically answer correctly for every skill at every difficulty level.

Equating Curves

Score conversion curves from actual SAT administrations. These map raw performance to scaled scores the same way College Board does, enabling accurate score prediction.

Atomic Skills

The SAT doesn't test 'math' — it tests 95 specific sub-skills organized into 8 domains. Each skill has prerequisites forming a directed acyclic graph that mirrors how mathematical and verbal knowledge builds.

COLLEGE BOARD STANDARD

MEDIUM

Algebra: Linear equations in 1 variable

Students create, solve, or interpret linear equations in one variable

Appears ~16 per test · 63% answer correctly

Tested on March 2023, October 2022, May 2022 SATs

Quality Gate

Dual-Model Verification

Every question in FinishStrong goes through a multi-stage quality pipeline before any student sees it. This isn't a nice-to-have — an incorrect answer on a prep question can actively harm a student's learning by creating false associations. We treat question correctness as a zero-tolerance requirement.

Stage 1 — Dual-model verification: Every question is independently solved by OpenAI (GPT-4o) and Google (Gemini). Both models must agree on the correct answer. If they disagree, the question is flagged for manual review. To date, 956 questions have been verified at 100% confidence through this process.
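The agreement rule in Stage 1 can be sketched without any real API calls by treating each model as a function from question text to an answer letter. The `verify` helper and the stub solvers below are hypothetical stand-ins; an actual pipeline would wrap the vendors' client libraries, which are not shown here.

```python
from typing import Callable, Dict

Solver = Callable[[str], str]

def verify(question: str, solvers: Dict[str, Solver]) -> dict:
    """Ask every solver independently; accept only on unanimous agreement."""
    answers = {name: solve(question).strip().upper() for name, solve in solvers.items()}
    agreed = len(set(answers.values())) == 1
    return {"status": "verified" if agreed else "manual_review", "answers": answers}

# Stub solvers standing in for two independent models.
agreeing = {"model_a": lambda q: "B", "model_b": lambda q: " b "}
disagreeing = {"model_a": lambda q: "B", "model_b": lambda q: "C"}

print(verify("If 2x + 3 = 11, what is x?", agreeing)["status"])     # verified
print(verify("If 2x + 3 = 11, what is x?", disagreeing)["status"])  # manual_review
```

Normalizing answers before comparing them keeps formatting differences from triggering a false disagreement.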

Stage 2 — AI Student Panel: Every verified question is then evaluated by 5 AI student personas, each representing a different learner profile. This catches issues that correct-answer verification misses: confusing wording, culturally biased assumptions, ambiguous phrasing, and difficulty miscalibration.

Maya

16, strong reader, math anxiety, first-gen college student

Jake

17, math whiz, rushes through reading passages, retaking SAT

Priya

15, high achiever, makes careless errors on questions she considers easy

Devon

17, ADHD, brilliant but inconsistent, needs high engagement to focus

Aaliyah

16, ESL learner, strong grammar intuition, weak academic vocabulary

Each persona rates the question on clarity, engagement, difficulty calibration, and real-world relevance. Questions must score an average of 7/10 or higher to enter the active corpus. Questions below the threshold are rejected and regenerated.
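The acceptance rule can be sketched as an average over personas and criteria compared against the 7/10 threshold. How the scores are actually aggregated is not specified above, so averaging each persona's four criteria and then averaging across personas is an assumption, and the ratings shown are invented for illustration.

```python
from statistics import mean

CRITERIA = ("clarity", "engagement", "difficulty", "relevance")

def panel_average(ratings: dict) -> float:
    """Average each persona's criteria, then average across personas."""
    return mean(mean(scores[c] for c in CRITERIA) for scores in ratings.values())

def panel_decision(ratings: dict, threshold: float = 7.0) -> str:
    return "accept" if panel_average(ratings) >= threshold else "regenerate"

# Invented ratings for two of the five personas, for brevity.
ratings = {
    "Maya": {"clarity": 8, "engagement": 7, "difficulty": 8, "relevance": 9},
    "Jake": {"clarity": 9, "engagement": 8, "difficulty": 7, "relevance": 8},
}
print(panel_average(ratings), panel_decision(ratings))  # 8.0 accept
```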

AVERAGE PANEL RATING

8.0/10

Questions scoring below 7/10 are rejected and regenerated

Test-Taking Science

Strategy Coaching on Every Question

Millman, Bishop & Ebel (1965) established that test-wiseness is a "distinct cognitive skill, separate from content mastery." A test-wise student performs better on the same content than an equally knowledgeable but less strategic peer. FinishStrong teaches test-taking technique alongside every single question — not as an add-on, but as a core part of how each question coaches.

Research shows strategy contributes 30-50% of SAT score improvement depending on starting level, with the biggest impact in the 1000-1400 range. We don't just teach what the SAT tests — we teach how to take the SAT.

Elimination (7 strategies): Process of elimination, extreme eliminators, half-right rejectors, scope checking

Time Management (5 strategies): Three-pass system, flag and move, time allocation, module urgency

Reading (6 strategies): Question-first, bookends, evidence grounding, hypothesis testing

Math Solving (7 strategies): Desmos graphing, backsolving, pick numbers, estimation, verification

Grammar (3 strategies): Error type identification, period test, parallel structure check

Metacognition (3 strategies): 3-second verify, check what they asked, confidence calibration

The dual-model pipeline: Every strategy tip is generated by one AI model (GPT-4o) and independently verified by a different model (Gemini). Tips that are generic, inaccurate, or inapplicable are rejected. Only tips that pass both models reach students. This same peer-review principle applies to all 1,681+ questions in the corpus.

🎯

EXAMPLE COACH TIP

"Use 'Pick Numbers': pick a safe value like x = 1, calculate the original expression, then check which answer choice gives the same value. No algebra needed."

Strategy: Pick Numbers · Skill: Rational Expressions · Impact: Saves 60-90 seconds per question

The 31-strategy taxonomy covers every question type on the digital SAT. Each strategy has research citations, applicability rules, estimated time savings, and hours-to-automaticity targets. Students don't just learn strategies — they practice them on real questions until the techniques become automatic.

Millman, J., Bishop, C.H. & Ebel, R. (1965). An Analysis of Test-Wiseness. Educational and Psychological Measurement, 25(3), 707-726.

The Differentiator

Confidence Calibration

Before answering every question, FinishStrong asks: "How confident are you?" This isn't decorative. It's training the most undervalued skill in test preparation: metacognition — knowing what you know.

The Education Endowment Foundation (EEF) found that metacognitive strategies produce an average of +8 months of additional academic progress over the course of a year, making them one of the highest-impact interventions in all of education. Despite this, no major SAT prep platform trains metacognition systematically. FinishStrong does.

How the bet mechanic works: Before each question, you choose a confidence level: "guessing," "maybe," "likely," or "certain." Higher confidence stakes more XP: get it right and you earn a multiplier; get it wrong and you lose more than if you'd been honest about uncertainty. This creates a genuine incentive to calibrate accurately rather than default to "certain" every time.
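To see why a stake schedule can reward honest confidence, compare the expected XP of each bet at different true probabilities of being right. The gain and loss values below are invented for illustration and are not FinishStrong's actual XP numbers; they are chosen so that each confidence level is the best bet over a different probability range.

```python
# Hypothetical (xp_if_right, xp_lost_if_wrong) per confidence level.
STAKES = {
    "guessing": (10, 0),
    "maybe": (14, 4),
    "likely": (18, 10),
    "certain": (22, 22),
}

def expected_xp(confidence: str, p_right: float) -> float:
    gain, loss = STAKES[confidence]
    return p_right * gain - (1.0 - p_right) * loss

def best_bet(p_right: float) -> str:
    """Confidence level with the highest expected XP at this true accuracy."""
    return max(STAKES, key=lambda level: expected_xp(level, p_right))

for p in (0.3, 0.55, 0.7, 0.9):
    print(p, best_bet(p))
# 0.3 guessing / 0.55 maybe / 0.7 likely / 0.9 certain
```

With this schedule, always choosing "certain" loses XP on average unless the student really is right more than about three times in four.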

WHAT CALIBRATION REVEALS

Overconfident: Says 'certain,' answers wrong → study this skill harder

Underconfident: Says 'guessing,' answers right → you know more than you think

Well-calibrated: Confidence matches accuracy → metacognitive mastery

Over time, the calibration dashboard tracks how well your confidence predicts your accuracy. A well-calibrated student knows exactly which questions to spend more time on during the real SAT and which to answer quickly. This is the difference between a student who "knows the material" and one who maximizes their score.
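One way to compute such a calibration view is to compare observed accuracy at each confidence level with the accuracy that level nominally implies. The nominal accuracies, the tolerance, and the function names below are assumptions made for this sketch, not the dashboard's actual definitions.

```python
from collections import defaultdict

# Assumed accuracy each confidence label should correspond to.
NOMINAL = {"guessing": 0.25, "maybe": 0.50, "likely": 0.75, "certain": 0.95}

def calibration_gaps(responses):
    """responses: iterable of (confidence, correct). Returns accuracy minus nominal."""
    tally = defaultdict(lambda: [0, 0])  # level -> [right, seen]
    for confidence, correct in responses:
        tally[confidence][0] += int(correct)
        tally[confidence][1] += 1
    return {level: right / seen - NOMINAL[level] for level, (right, seen) in tally.items()}

def label(gap: float, tolerance: float = 0.10) -> str:
    if gap < -tolerance:
        return "overconfident"
    if gap > tolerance:
        return "underconfident"
    return "well-calibrated"

responses = [
    ("certain", True), ("certain", False),
    ("guessing", True), ("guessing", True),
    ("likely", True), ("likely", True), ("likely", True), ("likely", False),
]
for level, gap in calibration_gaps(responses).items():
    print(level, label(gap))
```

In this toy history the student is overconfident when "certain", underconfident when "guessing", and well-calibrated when "likely".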

No other SAT prep app teaches this skill.

Education Endowment Foundation (2021). Metacognition and Self-Regulated Learning: Guidance Report. London: EEF.

References

Research Citations

Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Corbett, A.T. & Anderson, J.R. (1995). Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.

Pimsleur, P. (1967). A Memory Schedule. Modern Language Journal, 51(2), 73-75.

Vygotsky, L.S. (1978). Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.

Bjork, R.A. (1994). Memory and Metamemory Considerations in the Training of Human Beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about Knowing. Cambridge, MA: MIT Press.

Rohrer, D. & Taylor, K. (2007). The Shuffling of Mathematics Problems Improves Learning. Instructional Science, 35(6), 481-498.

Education Endowment Foundation (2021). Metacognition and Self-Regulated Learning: Guidance Report. London: EEF.

Ebbinghaus, H. (1885). Memory: A Contribution to Experimental Psychology. Leipzig: Duncker & Humblot. (Translated 1913)

Millman, J., Bishop, C.H. & Ebel, R. (1965). An Analysis of Test-Wiseness. Educational and Psychological Measurement, 25(3), 707-726.

Flavell, J.H. (1979). Metacognition and Cognitive Monitoring. American Psychologist, 34(10), 906-911.

See the science in action.

Try a session. Watch the IRT engine select questions at your level. See BKT update your mastery in real time. Feel the difference.
