
Reaction Time Score Interpretation: Median, Best, SD, and Error Bars

4 min read · PulsarMS Team · data, benchmark, measurement

A reaction time score is only useful if you know how to read it. The headline number is usually a median, mean, or best trial. Those three numbers can tell very different stories.

PulsarMS emphasizes the visual median, the audio average over 10 cues, and the confidence band because they are harder to fake and easier to compare across sessions.

Median reaction time

The median is the middle clean trial. If you run five valid trials and sort them from fastest to slowest, the third result is the median. This is a good baseline because a single lucky fast click or a single distracted slow click cannot dominate it.
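As a minimal sketch of the five-trial case described above, the snippet below sorts a set of made-up trial times and picks the third value. The numbers are illustrative only, not PulsarMS data, and the variable names are assumptions.

```python
from statistics import median

# Hypothetical clean trials in milliseconds (illustrative values only).
trials_ms = [231, 198, 214, 187, 305]

ordered = sorted(trials_ms)           # fastest to slowest
middle = ordered[len(ordered) // 2]   # third of five values

# The standard library gives the same answer for an odd number of trials.
assert middle == median(trials_ms)
print(middle)  # 214
```

Neither the lucky 187 ms click nor the distracted 305 ms click is the value that gets reported.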

Use the median when you ask:

  • Am I improving over weeks?
  • Did this monitor or mouse change my baseline?
  • Am I faster today than yesterday?
  • Is my audio setup delaying me?

Best trial

The best trial is emotionally satisfying and statistically weak. It can show your ceiling, but it is often helped by luck or anticipation. If your best trial is much faster than every other trial, do not treat it as your real reaction time.

A best trial below 100 ms is exceptional in a simple reaction test; for most people it means predicting the stimulus rather than reacting to it. PulsarMS doesn't auto-discard it. Instead, the session's false-start record against randomized waits decides: a fast trial inside a clean session counts, while a fast trial surrounded by false starts is treated as anticipation. Only responses below the ~50 ms physical limit are excluded outright.
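The rule above could be sketched roughly as follows. This is not PulsarMS's actual code: the function name, the labels, and the `max_false_starts` cutoff for what counts as a "clean session" are assumptions made for illustration. Only the 50 ms and 100 ms thresholds come from the text.

```python
PHYSICAL_LIMIT_MS = 50       # from the text: below this, excluded outright
EXCEPTIONAL_BELOW_MS = 100   # from the text: below this, likely anticipation

def classify_fast_trial(rt_ms: float, false_starts: int,
                        max_false_starts: int = 0) -> str:
    """Label one trial. max_false_starts is a hypothetical cutoff for a
    'clean session'; the article does not state the real value."""
    if rt_ms < PHYSICAL_LIMIT_MS:
        return "excluded"
    if rt_ms < EXCEPTIONAL_BELOW_MS:
        # Fast but physically possible: the false-start record decides.
        return "counts" if false_starts <= max_false_starts else "anticipation"
    return "counts"

print(classify_fast_trial(42, false_starts=0))   # excluded
print(classify_fast_trial(95, false_starts=0))   # counts
print(classify_fast_trial(95, false_starts=3))   # anticipation
```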

Mean reaction time

The mean is the average. It can be useful with many trials, but it is sensitive to slow outliers. Reaction time distributions often have a long slow tail because a moment of distraction can add a large delay. That is why short visual sessions are usually better summarized by the median; PulsarMS uses the audio mean only after 10 randomized cues and pairs it with regularity points.
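To see how one slow outlier pulls the mean but leaves the median alone, here is a small sketch with made-up trial times. The values are illustrative only.

```python
from statistics import mean, median

# Hypothetical five-trial sessions in milliseconds (illustrative values).
clean = [200, 205, 210, 215, 220]
with_lapse = [200, 205, 210, 215, 520]  # last trial: one distracted response

clean_mean, clean_median = mean(clean), median(clean)
lapse_mean, lapse_median = mean(with_lapse), median(with_lapse)

# One lapse moves the mean by 60 ms; the median does not move at all.
print(lapse_mean - clean_mean)      # mean shifts from 210 to 270
print(lapse_median - clean_median)  # median stays at 210
```

With more trials the same lapse would shift the mean less, which fits the text's point that the mean is more useful in longer sessions.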

Standard deviation and spread

Spread tells you how consistent you are. A player with a 210 ms median and a tight spread may be more dependable than a player with a 190 ms median and wild swings. In competitive situations, consistency matters because you need the response on demand, not once every ten tries.
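A rough way to put numbers on that comparison is the sample standard deviation. The two players below are invented to match the 210 ms and 190 ms medians in the example; their individual trial times are assumptions.

```python
from statistics import median, stdev

# Hypothetical trials in milliseconds (illustrative values only).
steady_player = [208, 210, 212, 209, 211]   # median 210, tight spread
swingy_player = [150, 190, 260, 170, 240]   # median 190, wild swings

for name, trials in [("steady", steady_player), ("swingy", swingy_player)]:
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(name, median(trials), round(stdev(trials), 1))

# steady: median 210, SD about 1.6 ms
# swingy: median 190, SD about 46.6 ms
```

The second player has the faster median, but any single response could land anywhere from 150 ms to 260 ms.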

High spread can come from:

  • fatigue,
  • distraction,
  • anticipation and correction,
  • inconsistent mouse grip,
  • touchscreen input,
  • audio output delay,
  • low refresh rate,
  • too few trials.

Confidence band

The confidence band is PulsarMS's way of showing measurement uncertainty. In a visual test, refresh rate is a major part of that uncertainty. In an audio test, output latency and calibration matter.

If your median improves by 3 ms but your confidence band is larger than that, do not celebrate yet. Repeat the test. Look for a change that holds across sessions and is larger than normal variation.
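The comparison above can be sketched as a one-line check. The function name is hypothetical, and it assumes the confidence band is expressed as a single width in milliseconds; PulsarMS may define and report its band differently.

```python
def improvement_exceeds_band(old_median_ms: float, new_median_ms: float,
                             band_ms: float) -> bool:
    """True if the drop in median is larger than the band width.
    band_ms is an assumed single-number band, not PulsarMS's definition."""
    improvement = old_median_ms - new_median_ms
    return improvement > band_ms

# A 3 ms gain inside a 4 ms band: repeat the test before celebrating.
print(improvement_exceeds_band(215, 212, band_ms=4))   # False

# A 12 ms gain against the same band is worth a second look on another day.
print(improvement_exceeds_band(215, 203, band_ms=4))   # True
```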

Percentiles and leaderboards

Percentiles are useful only when the comparison pool is similar. A desktop mouse score on a 600 Hz monitor should not be casually compared against a phone touchscreen score. Hardware, input method, age, fatigue, and practice all shape the leaderboard.

Use leaderboards for motivation, not diagnosis. Use your own repeated visual medians and audio 10-cue averages for real progress.

How to decide if an improvement is real

Use this checklist:

  1. Same device and browser?
  2. Same input method?
  3. Same test modality?
  4. Median improved, not just best trial?
  5. False starts stayed low?
  6. Spread did not get worse?
  7. Improvement is larger than the confidence band?
  8. Change repeated on another day?

If the answer is yes to most of those, the improvement is more likely real.
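If you log sessions yourself, the checklist can be tallied in a few lines. The answers below are hypothetical, and reading "most" as "more than half" is an assumption, not a PulsarMS rule.

```python
# Hypothetical answers to the eight checklist questions (True means "yes").
checklist = {
    "same device and browser": True,
    "same input method": True,
    "same test modality": True,
    "median improved, not just best trial": True,
    "false starts stayed low": True,
    "spread did not get worse": False,
    "improvement larger than confidence band": True,
    "change repeated on another day": False,
}

yes_count = sum(checklist.values())

# Assumption: "most" means strictly more than half of the questions.
likely_real = yes_count > len(checklist) / 2

print(f"{yes_count}/{len(checklist)} yes, likely real: {likely_real}")
```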

Start with the reaction time test hub, then use the modality-specific guides.

The goal is not to chase the lowest possible screenshot. The goal is to build a repeatable baseline you can trust.

Sources & context

For background on simple reaction-time variability and the factors behind it, see the PMC review of factors influencing simple reaction time. PulsarMS's interpretation rules are designed for practical browser testing, not medical diagnosis. The confidence band those rules lean on is defined in the "how we measure" page.