Look who's talking: A comparison of automated and human-generated speaker tags in naturalistic day-long recordings.

The LENA system has revolutionized research on language acquisition. It provides both a wearable device that collects day-long recordings of children's environments and a set of automated outputs that process, identify, and classify speech using proprietary algorithms. This output includes information about input sources (e.g., adult male, electronics). While this system has been tested across a variety of settings, here we delve deeper into validating the accuracy and reliability of LENA's automated diarization, i.e., tags of who is talking. Specifically, we compare LENA's output with a gold-standard set of manually generated talker tags from a dataset of 88 day-long recordings, taken from 44 infants at 6 and 7 months, which includes 57,983 utterances. We compare accuracy across a range of classifications from the original LENA Technical Report, alongside a set of analyses examining classification accuracy by utterance type (e.g., declarative, singing). Consistent with previous validations, we find high overall agreement between the human and LENA-generated speaker tags, particularly for adult speech, with poorer performance identifying child, overlap, noise, and electronic speech (accuracy range across all measures: 0-92%). We discuss several clear benefits of using this automated system, alongside potential caveats based on the error patterns we observe, and conclude with implications for research using LENA-generated speaker tags.

Bulgarelli, Federica, and Elika Bergelson (2019). Look who's talking: A comparison of automated and human-generated speaker tags in naturalistic day-long recordings. Behavior Research Methods. 10.3758/s13428-019-01265-7 Retrieved from https://hdl.handle.net/10161/19711.

Elika Bergelson

Associate Research Professor of Psychology and Neuroscience

Dr. Bergelson's lab has moved to Harvard Psychology; she retains an unremunerated research appointment at Duke through mid-2024 for logistical reasons. She formerly accepted PhD applicants through the Developmental and Cog/CogNeuro areas of P&N and the CNAP program.

In my research, I try to understand the interplay of processes during language acquisition.
In particular, I am interested in how word learning relates to other aspects of learning language (e.g., speech sound acquisition, grammar/morphology learning), and to social/cognitive development more broadly (e.g., joint attention processes), in the first few years of life.

I pursue these questions using three main approaches: in-lab measures of early comprehension and production (eye-tracking, looking-time, and EEG studies, the last in collaboration with the Woldorff lab), and at-home measures of infants' linguistic and social environment (as in the SEEDLingS project).

More recently, the lab has been branching out to look at a wider range of human populations, and at infants who are blind or deaf/hard of hearing.

