Data

Databases used in the Interspeech Computational Paralinguistics Challenge (ComParE) series are usually owned by individual donors. End User License Agreements (EULAs) are typically granted for participation in the Challenge only. Usage of the databases outside of the Challenges always has to be negotiated with the data owners, not with the organisers of the Challenge. We aim to provide contact information per database; however, this requires the consent of the data owners, which we are currently collecting.

Below, a description of the 2019 data is given. All of these corpora provide realistic data recorded under challenging acoustic conditions. They feature further rich annotation, such as speaker meta-data, transcripts, and segmentation, and are partitioned into training, development, and test sets, observing subject independence. As in previous years, benchmark results of the most popular approaches will be provided using open-source toolkits, including deep learning and Bag-of-Audio-Words baselines.
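
To illustrate what "observing subject independence" means in practice, the following is a minimal sketch of a speaker-disjoint partition using scikit-learn's GroupShuffleSplit. The clip names and speaker IDs are placeholders; the official Challenge partitions are fixed and distributed with the data.

    from sklearn.model_selection import GroupShuffleSplit

    # Hypothetical metadata: one entry per audio clip, plus its speaker ID.
    clips    = [f"clip_{i:04d}.wav" for i in range(100)]
    speakers = [f"spk{i % 10:02d}" for i in range(100)]  # 10 speakers

    # Hold out roughly 20% of *speakers* for the test set; splitting on
    # groups guarantees that no speaker crosses partition boundaries.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    train_dev_idx, test_idx = next(gss.split(clips, groups=speakers))

    # Applying the same procedure to train_dev_idx then yields a
    # speaker-disjoint development set.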

For the Styrian Dialect Sub-Challenge, the collection of the STYRIALECTS (Styrian Dialects) dataset was guided by Professor Ralf Vollmann, University of Graz, Austria, and Florian Pokorny between 2010 and 2016 for the purpose of cross-discipline research on aspects of the current dialectal continuum of Bavarian German in the southern parts of Austria. Essentially, the set comprises audio recordings of speakers representative of different dialect areas of Styria, the south-eastern province of Austria, whose provincial capital is Graz. All recordings were conducted in an interview setting usually consisting of three parts, namely (i) a questionnaire compiled with regard to expected dialectal features, (ii) a picture naming task, and (iii) a short free conversation about language attitudes towards standard language and dialect. For the purpose of computer-based analyses, including automatic dialect classification, the recordings of 63 speakers (27 males, 36 females) were automatically segmented by means of a speaker diarisation algorithm. Subsequently, all segments were manually revised, yielding more than 10,000 clips across 4 different Styrian dialect areas.
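
The description does not name the diarisation algorithm used for STYRIALECTS. As a rough stand-in only, the following sketch splits an interview recording at pauses with librosa; pause-based splitting is a much simpler proxy for full speaker diarisation, and the file name is hypothetical.

    import librosa

    # "interview.wav" is a hypothetical file name.
    y, sr = librosa.load("interview.wav", sr=16000, mono=True)

    # Treat regions quieter than 30 dB below peak as segment boundaries,
    # a crude substitute for the (unspecified) diarisation step.
    intervals = librosa.effects.split(y, top_db=30)
    segments = [y[start:end] for start, end in intervals]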

For the Continuous Sleepiness Sub-Challenge, the SLEEP corpus (Duesseldorf Sleepy Language Corpus) was created at the Institute of Psychophysiology, Duesseldorf, and the Institute of Safety Technology, University of Wuppertal, Germany. The corpus consists of approximately 82 hours of speech from 965 participants (526 male, 439 female), amounting to a total of roughly 15,000 audio recordings. The mean age of the participants was 27.8 ± 11.5 years, with a range of 12–75 years. The recordings were made in quiet rooms with a microphone/headset/hardware setup; the tasks were presented on a computer in front of the participants. Audio files were recorded at 44.1 kHz and down-sampled to 16 kHz, with a quantisation of 16 bit. The speech material consists of different reading passages and speaking tasks. Furthermore, spontaneous narrative speech was elicited by asking subjects to briefly comment on, e. g., their last weekend or the best present they ever got, or to describe a picture. A session for one subject lasted between 15 minutes and 1 hour. Each participant had to report their sleepiness on the well-established Karolinska Sleepiness Scale (KSS). Additionally, two raters applied post-hoc observer KSS ratings.
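
As a minimal sketch of the down-sampling step described above (44.1 kHz source material converted to 16 kHz, 16 bit), the following uses librosa and soundfile; the file names are hypothetical.

    import librosa
    import soundfile as sf

    # librosa resamples to the requested rate on load.
    y, sr = librosa.load("session_recording.wav", sr=16000)

    # Write back with 16-bit PCM quantisation.
    sf.write("session_recording_16k.wav", y, 16000, subtype="PCM_16")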

For the Baby Sound Sub-Challenge, Meg Cychosz and colleagues are providing the BSS (Baby Speech Sounds) dataset, which contains vocalisations from 26 healthy infants (2–33 months) who were exposed to a range of languages: English, Spanish, Tsimane, and Quechua. The children were recorded using the Language Environment ANalysis (LENA) Digital Language Processor, a lightweight audio recording device. The children wore the recorder for extended periods of between 6 and 16 hours, inside a clothing pocket specially designed for the device. The child vocalisations in the dataset have been partitioned into 400 ms chunks, which were then annotated by citizen-science annotators on the iHEARu-PLAY platform.
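
A minimal sketch of partitioning a recording into non-overlapping 400 ms chunks, as used for the BSS annotation units, is given below; the input and output file names are hypothetical.

    import soundfile as sf

    data, sr = sf.read("vocalisation.wav")
    chunk_len = int(0.4 * sr)  # 400 ms in samples

    # Non-overlapping chunks; the final chunk may be shorter than 400 ms.
    chunks = [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]
    for n, chunk in enumerate(chunks):
        sf.write(f"vocalisation_chunk{n:03d}.wav", chunk, sr)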

For the Orca Activity Sub-Challenge, the DLFD (DeepAL Fieldwork Data) corpus was collected from a 15-metre research trimaran in 2017 and 2018 in Northern British Columbia. A custom-made, high-sensitivity, low-noise towed array was deployed, which has a flat frequency response within ±2.5 dB between 10 Hz and 80 kHz. Underwater sounds were digitised with a sound acquisition device (MOTU 24AI) sampling at 96 kHz, recorded with PAMGuard, and stored on hard drives as multichannel wav files (4 hydrophones in 2017; 8 hydrophones in a towed array in 2018). The total amount of audio data collected in 2017 and 2018 comprises 157 hours (1 channel). The annotations cover approximately 5.66 hours in total, of which approximately 1.40 hours are pure orca annotations, distributed over 3,197 audio clips (1 channel).
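
Since the recordings are multichannel wav files while the annotated clips are single-channel, the following is a minimal sketch of extracting one hydrophone channel with soundfile; the file names are hypothetical.

    import soundfile as sf

    # Multichannel towed-array recording: data has shape (frames, channels),
    # with 4 channels (2017) or 8 channels (2018) at 96 kHz.
    data, sr = sf.read("towed_array_2018.wav")
    channel0 = data[:, 0]  # a single hydrophone channel

    # Write one channel out, matching the 1-channel clips described above.
    sf.write("towed_array_2018_ch0.wav", channel0, sr)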