Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

dc.contributor.author

Räsänen, O

dc.contributor.author

Seshadri, S

dc.contributor.author

Karadayi, J

dc.contributor.author

Riebling, E

dc.contributor.author

Bunce, J

dc.contributor.author

Cristia, A

dc.contributor.author

Metze, F

dc.contributor.author

Casillas, M

dc.contributor.author

Rosemberg, C

dc.contributor.author

Bergelson, E

dc.contributor.author

Soderstrom, M

dc.date.accessioned

2020-01-01T18:45:10Z

dc.date.available

2020-01-01T18:45:10Z

dc.date.issued

2019-10-01

dc.date.updated

2020-01-01T18:45:05Z

dc.description.abstract

© 2019 The Authors. Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is nearly trivial for high-quality signals in high-resource languages, daylong recordings are substantially more challenging due to the unconstrained acoustic environments and the presence of near- and far-field speech. Moreover, many use cases of interest involve languages for which reliable ASR systems or even well-defined lexicons are not available. A good WCE system should also perform similarly for low- and high-resource languages in order to enable unbiased comparisons across different cultures and environments. Unfortunately, the current state-of-the-art solution, the LENA system, is based on proprietary software and has only been optimized for American English, limiting its applicability. In this paper, we build on existing work on WCE and present the steps we have taken towards a freely available system for WCE that can be adapted to different languages or dialects with a limited amount of orthographically transcribed speech data. Our system is based on language-independent syllabification of speech, followed by a language-dependent mapping from syllable counts (and a number of other acoustic features) to the corresponding word count estimates. We evaluate our system on samples from daylong infant recordings from six different corpora consisting of several languages and socioeconomic environments, all manually annotated with the same protocol to allow direct comparison. We compare a number of alternative techniques for the two key components in our system: speech activity detection and automatic syllabification of speech.
As a result, we show that our system can reach relatively consistent WCE accuracy across multiple corpora and languages (with some limitations). In addition, the system outperforms LENA on three of the four corpora consisting of different varieties of English. We also demonstrate how an automatic neural network-based syllabifier, when trained on multiple languages, generalizes well to novel languages beyond the training data, outperforming two previously proposed unsupervised syllabifiers as a feature extractor for WCE.
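
The language-dependent mapping step described in the abstract can be illustrated with a minimal sketch. It assumes a single feature (the syllable count per segment) and an ordinary least-squares line; the abstract does not state the form of the mapping and mentions additional acoustic features that are omitted here. The function names and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fit_linear_mapping(syllable_counts, word_counts):
    """Fit word_count ~ slope * syllable_count + intercept by least squares.

    Assumption: one feature and a linear model, chosen for illustration only.
    """
    if len(syllable_counts) != len(word_counts) or len(syllable_counts) < 2:
        raise ValueError("need two or more paired observations")
    x_mean = mean(syllable_counts)
    y_mean = mean(word_counts)
    # Sum of squared deviations of the syllable counts.
    sxx = sum((x - x_mean) ** 2 for x in syllable_counts)
    if sxx == 0:
        raise ValueError("syllable counts must not all be identical")
    # Sum of cross-deviations between syllable counts and word counts.
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(syllable_counts, word_counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def estimate_word_count(syllable_count, slope, intercept):
    """Map a syllable count to a word count estimate, clipped at zero."""
    return max(0.0, slope * syllable_count + intercept)


# Hypothetical adaptation data: automatic syllable counts for four segments
# paired with word counts from orthographic transcripts of the same segments.
syllables = [10, 20, 30, 40]
words = [7, 13, 19, 25]

slope, intercept = fit_linear_mapping(syllables, words)
estimate = estimate_word_count(50, slope, intercept)
```

In this sketch only the fitted slope and intercept depend on the language, which mirrors the abstract's split between a language-independent syllabifier and a mapping adapted from a limited amount of transcribed speech.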

dc.identifier.issn

0167-6393

dc.identifier.issn

1872-7182

dc.identifier.uri

https://hdl.handle.net/10161/19710

dc.language

en

dc.publisher

Elsevier BV

dc.relation.ispartof

Speech Communication

dc.relation.isversionof

10.1016/j.specom.2019.08.005

dc.subject

Science & Technology

dc.subject

Technology

dc.subject

Acoustics

dc.subject

Computer Science, Interdisciplinary Applications

dc.subject

Computer Science

dc.subject

Language acquisition

dc.subject

Word count estimation

dc.subject

Automatic syllabification

dc.subject

Daylong recordings

dc.subject

Noise robustness

dc.subject

SYSTEM

dc.subject

SEGMENTATION

dc.subject

RELIABILITY

dc.subject

LENA(TM)

dc.title

Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

dc.type

Journal article

duke.contributor.orcid

Bergelson, E|0000-0003-2742-4797

pubs.begin-page

63

pubs.end-page

80

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

Duke

pubs.organisational-group

Psychology and Neuroscience

pubs.organisational-group

Linguistics

pubs.organisational-group

Duke Institute for Brain Sciences

pubs.organisational-group

University Institutes and Centers

pubs.organisational-group

Institutes and Provost's Academic Units

pubs.organisational-group

Surgery, Head and Neck Surgery and Communication Sciences

pubs.organisational-group

Surgery

pubs.organisational-group

Clinical Science Departments

pubs.organisational-group

School of Medicine

pubs.publication-status

Published

pubs.volume

113

Files

Original bundle

Name:
rasanen_etal_2019.pdf
Size:
2.7 MB
Format:
Adobe Portable Document Format
Description:
Published version