Can you tell a real target from a fake-out when they look almost identical? And when you're genuinely unsure, do you gamble or hold? Those feel like one question — "how good is my read?" — but they're two, and mixing them up is why raw accuracy is a blunt score. Your reaction time test keeps things clean with one cue and one response; the moment a task adds decoys, you need a sharper way to score it.
Accuracy hides two different skills
Signal detection theory splits performance into two independent numbers (the classic reference is Green & Swets, 1966; the practical formulas used here follow Stanislaw & Todorov, 1999):
- Sensitivity (d′) — how far apart "real target" and "nothing there" sit in your head. Bigger d′ means you separate them more cleanly, even when they look alike.
- Criterion (c) — how trigger-happy you are when the evidence is ambiguous.
Two players with identical accuracy can have very different d′ and criterion: one sees clearly but gambles, the other is fuzzy but disciplined. One number can't tell them apart. Two can.
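A worked example can make that claim concrete. The sketch below assumes the usual equal-variance formulas described by Stanislaw & Todorov (1999), d′ = z(H) − z(F) and c = −(z(H) + z(F)) / 2, where H is the hit rate and F the false-alarm rate. The two players, their counts, and the `summarize` helper are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # converts a rate into a standard-normal z-score


def summarize(hits, false_alarms, n_targets, n_decoys):
    """Return (accuracy, d_prime, criterion) for one player's counts."""
    hit_rate = hits / n_targets
    fa_rate = false_alarms / n_decoys
    correct = hits + (n_decoys - false_alarms)  # hits plus correct rejections
    accuracy = correct / (n_targets + n_decoys)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative means leaning "yes"
    return accuracy, d_prime, criterion


# Hypothetical players, each facing 100 targets and 100 decoys.
gambler = summarize(hits=95, false_alarms=35, n_targets=100, n_decoys=100)
holdout = summarize(hits=70, false_alarms=10, n_targets=100, n_decoys=100)

for name, (acc, dp, c) in (("gambler", gambler), ("holdout", holdout)):
    print(f"{name}: accuracy={acc:.2f}  d'={dp:.2f}  c={c:+.2f}")
```

Both players score 80% accuracy, yet the gambler comes out near d′ = 2.03 with c around −0.63 (clear sight, leaning "yes"), while the holdout lands near d′ = 1.81 with c around +0.38 (fuzzier, leaning "no").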
What d′ measures
d′ is built from two rates: your hits (correct "yes, that's a target") and your false alarms (wrong "yes" on a decoy). In plain terms, d′ = z(hit rate) − z(false-alarm rate), where z converts a rate into a standard-normal z-score: the further your hits outrun your false alarms, the bigger the gap between real and fake in your perception. It works on any task with present-versus-absent trials, which is exactly the shape of a go/no-go game. PulsarMS's reaction test applies the same logic through its false-start record (see false starts and anticipation for how a too-early "response" gets caught), and a discrimination game just makes it the whole point.
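As a minimal sketch of that formula, the function below computes d′ from raw trial counts using Python's standard-library inverse normal CDF as z. The function name and the example counts are assumptions for illustration, not PulsarMS's implementation. Rates of exactly 0 or 1 have no finite z-score, so this version refuses them rather than guessing at a correction.

```python
from statistics import NormalDist


def d_prime(hits, n_targets, false_alarms, n_decoys):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate)."""
    hit_rate = hits / n_targets
    fa_rate = false_alarms / n_decoys
    for rate in (hit_rate, fa_rate):
        if not 0.0 < rate < 1.0:
            # z(0) and z(1) are infinite; correct the counts before calling.
            raise ValueError("rates of exactly 0 or 1 need a small-sample correction")
    z = NormalDist().inv_cdf  # rate -> standard-normal z-score
    return z(hit_rate) - z(fa_rate)


# Hypothetical round: 45 of 50 targets hit, 10 of 50 decoys falsely hit.
print(round(d_prime(45, 50, 10, 50), 2))  # hit rate 0.90, false-alarm rate 0.20
```

With those counts the result is about 2.12. Raising the hit rate or lowering the false-alarm rate both push it higher.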
Criterion: your itchy-trigger dial
Criterion is descriptive, not a score — there's no "correct" bias. But it's an honest mirror. A big lean toward "yes" makes you a gambler: lots of hits, and lots of false alarms on the decoys. A lean toward "no" makes you a holdout: few false alarms, but more real targets missed. Seeing your own bias is half the value — most people are surprised which way they tilt under pace.
That is also why d′ is hard to cheat. Mash everything and you're fast, but your false alarms climb to match your hits and d′ collapses toward zero. Freeze on everything and you never false-alarm, but you miss every target, so your hits fall to match your false alarms and d′ collapses toward zero again. What flips between those two strategies is your criterion, not your d′. Only genuinely telling targets and decoys apart moves d′. For the broader map of simple versus choice tasks this sits inside, read simple vs choice reaction time.
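The cheating argument can be checked with a few lines. This sketch assumes three hypothetical strategies described directly by their hit and false-alarm rates. The masher and the freezer are given an occasional lapse (99% and 1% "yes" responses) so their rates stay off 0 and 1, which is an assumption made to keep the z-scores finite.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # rate -> standard-normal z-score


def d_prime_and_criterion(hit_rate, fa_rate):
    """Return (d_prime, criterion); a negative criterion leans "yes"."""
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical (hit rate, false-alarm rate) pairs.
strategies = {
    "masher": (0.99, 0.99),         # says "yes" to almost everything
    "freezer": (0.01, 0.01),        # says "yes" to almost nothing
    "discriminator": (0.90, 0.10),  # responds to what is actually shown
}

results = {name: d_prime_and_criterion(h, f) for name, (h, f) in strategies.items()}
for name, (dp, c) in results.items():
    print(f"{name}: d'={dp:.2f}  c={c:.2f}")
```

When the response ignores the stimulus, the hit rate equals the false-alarm rate and d′ is zero whichever way the player leans. Only the criterion separates the masher (strongly negative) from the freezer (strongly positive).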
d′ measures discrimination on this specific task — it is not an impulse-control score, a perception IQ, or anything clinical. And the timing underneath still carries the usual browser-observed ± confidence band and same-setup framing: your display and input latency are inside the number.
The honest limits
d′ only means something with enough of both signal and noise trials. Thin data gives a wobbly number, so a good implementation gates the readout on trial count, widens the ± when it's sparse, and applies a standard small-sample correction so a perfect or empty run (a hit or false-alarm rate of exactly 1 or 0, whose z-score is infinite) doesn't blow the math up to infinity. Read it as "trending up as my reads get sharper," not as a precise, portable score. Like every metric in the training range, it puts more of your hardware into the measurement than the simple test does, which is the honest cost of a richer game, and we never claim to see past the browser to photons, nerves, or muscles.
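One way those three safeguards could fit together is sketched below. It assumes the log-linear correction discussed by Stanislaw & Todorov (1999), which adds 0.5 to the hit and false-alarm counts and 1 to each trial count; the source does not say which correction PulsarMS applies. The `MIN_TRIALS_PER_KIND` threshold and the function name are hypothetical, and the ± here is a delta-method standard error for d′ only, not the timing band.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
MIN_TRIALS_PER_KIND = 20  # hypothetical gate, not a PulsarMS setting


def gated_d_prime(hits, n_targets, false_alarms, n_decoys):
    """Return (d_prime, standard_error), or None when the data is too thin."""
    if min(n_targets, n_decoys) < MIN_TRIALS_PER_KIND:
        return None
    # Log-linear correction: keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_decoys + 1)
    z_hit = STD_NORMAL.inv_cdf(hit_rate)
    z_fa = STD_NORMAL.inv_cdf(fa_rate)
    # Delta-method variance of each z-score: p(1 - p) / (n * pdf(z)^2).
    variance = (
        hit_rate * (1 - hit_rate) / (n_targets * STD_NORMAL.pdf(z_hit) ** 2)
        + fa_rate * (1 - fa_rate) / (n_decoys * STD_NORMAL.pdf(z_fa) ** 2)
    )
    return z_hit - z_fa, sqrt(variance)


print(gated_d_prime(5, 5, 0, 5))         # too few trials, so no readout
print(gated_d_prime(30, 30, 0, 30))      # perfect run stays finite
print(gated_d_prime(16, 20, 4, 20))      # sparse data: wider standard error
print(gated_d_prime(160, 200, 40, 200))  # same pattern, more trials: narrower
```

The correction pulls extreme rates slightly toward the middle, so a flawless short run reports a large but finite d′ instead of infinity, and the standard error shrinks as trials accumulate.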
Want to watch your own signal-versus-noise separation? The discrimination games in the Arena compute d′ from your rounds. For the speed side of those same decision trials, see the drift-diffusion model.
Sources & context
Signal detection theory originates with Green & Swets (1966); the hit-rate/false-alarm formulas and the small-sample correction used here follow Stanislaw & Todorov (1999). Those are controlled-lab methods, not measurements of you on your hardware, so treat them as the model behind the metric, not a fixed number for your setup. For how PulsarMS timestamps what the browser can actually observe, and why every score ships with a ± band, read how we measure.