
Understanding Speech—Moment by Moment

By Irmak Ergin, Jill Kries, and Laura Gwilliams

Understanding what someone is saying usually feels easy and automatic. But we have all had the experience of misunderstanding someone, or of failing to understand them at all. Research on speech comprehension typically relies on “post hoc” measures, that is, measures collected after comprehension has already happened, such as multiple-choice questions, rating scales, or summaries. These methods can capture overall understanding, but they miss the dynamic changes in comprehension as speech unfolds.

In our new paper, “Measuring naturalistic speech comprehension in real time,” we introduce a method designed to address this gap. We built a custom slider device that participants can use while listening to continuous, naturalistic speech, allowing them to report how well they understand what they are hearing in real time. The slider synchronizes with experimental software and provides millisecond-level readout, making it possible to generate a continuous behavioral trace of comprehension rather than a single score at the end.
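To make the idea of a continuous behavioral trace concrete, here is a minimal Python sketch of how timestamped slider readings could be turned into an evenly spaced time series. The log format, the function name `resample_trace`, and the example values are assumptions made for illustration; they are not the data format or processing pipeline described in the paper. The sketch uses sample-and-hold, meaning each time point takes the most recent slider position.

```python
from bisect import bisect_right

def resample_trace(samples, step_ms, end_ms):
    """Turn timestamped slider readings into an evenly spaced trace.

    samples: list of (time_ms, position) pairs sorted by time
             (a hypothetical log format, not the paper's).
    Returns one position per grid point 0, step_ms, 2*step_ms, ..., end_ms.
    """
    if not samples:
        raise ValueError("need at least one slider reading")
    times = [t for t, _ in samples]
    trace = []
    for t in range(0, end_ms + 1, step_ms):
        # Index of the last reading at or before time t.
        i = bisect_right(times, t) - 1
        # Before the first reading, fall back to the first recorded position.
        trace.append(samples[max(i, 0)][1])
    return trace

# Made-up readings: (time in ms, slider position between 0 and 1).
log = [(0, 0.8), (1200, 0.5), (2600, 0.2), (4100, 0.7)]
trace = resample_trace(log, step_ms=1000, end_ms=5000)
print(trace)
```

A regular grid like this is convenient because it can share a time axis with the audio or with other recordings. Linear interpolation would be a reasonable alternative if the slider is sampled at a fixed rate rather than logged on change.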

To test whether this new measure works as intended, we evaluated it across three experiments. We asked the question: Does this continuous measure track comprehension at least as well as established post hoc methods, and perhaps better? Overall, the answer was yes. Across the study, slider responses captured fluctuations in understanding driven by manipulations such as speech rate (how fast the speech is) and information load (how surprising the content is), and the method was validated against existing measures.
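As a rough illustration of how a continuous trace can be related to a manipulation such as speech rate, the Python sketch below averages an evenly sampled slider trace within labelled stretches of the stimulus. The function name `mean_by_condition`, the segment labels, and the numbers are hypothetical; this is one simple way to summarize such data under those assumptions, not the analysis reported in the paper.

```python
from collections import defaultdict

def mean_by_condition(trace, step_ms, segments):
    """Average an evenly sampled slider trace within labelled segments.

    trace: slider positions sampled every step_ms, starting at 0 ms.
    segments: (onset_ms, offset_ms, label) tuples; labels are hypothetical.
    A sample at time t belongs to a segment if onset_ms <= t < offset_ms.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for onset, offset, label in segments:
        # Ceiling division gives the first sample index at or after a time.
        first = -(-onset // step_ms)
        last = min(len(trace), -(-offset // step_ms))
        for value in trace[first:last]:
            sums[label] += value
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}

# Made-up trace with one reading per second, and two made-up conditions.
trace = [0.9, 0.8, 0.9, 0.4, 0.3, 0.2]
segments = [(0, 3000, "normal rate"), (3000, 6000, "fast rate")]
means = mean_by_condition(trace, 1000, segments)
print(means)
```

A single post hoc score would collapse both stretches into one number. Keeping the trace makes it possible to see where in the stimulus reported understanding dropped.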

At the same time, the paper highlights why standard post hoc measures are often limited. They rely heavily on memory, meaning they reflect not only comprehension, but also how much information a listener can retain afterward. Multiple-choice questions can be shaped by guessing or by the wording of the questions themselves. Summary-based measures introduce yet another complication, because they depend on both comprehension and the ability to reconstruct or retell what was heard.

One of the most promising aspects of the new method is its potential for cognitive neuroscience. The study shows that using the slider does not disrupt comprehension, making it well suited for co-registration with neuroimaging methods such as electroencephalography (EEG) and magnetoencephalography (MEG), which let us examine the time-resolved neural dynamics of speech comprehension. This opens up an exciting new opportunity for language research. For the first time, it becomes possible to align the time course of the input speech signal with the listener’s changing experience of comprehension, and in turn to link those behavioral dynamics to neural activity. Instead of asking only whether a listener understood something overall, we can begin to ask when comprehension succeeds, when it breaks down, and how those fluctuations relate to ongoing brain responses.

Although we apply the method here to speech comprehension, the broader idea extends well beyond language. If cognitive processes unfold over time, our measurements should too. This measure can enable researchers to study dynamic cognition in ways that static end-of-task measures simply cannot.

Read the paper!