Development of Speech Corpus for Improving Phonetic Search Engine Performance in Zero Resource Nyishi Language

Authors

  • Likha Ganu, Biri Arun

Keywords:

Speech Corpus, Phonetic Engine, Nyishi, Transcription, Speech Recognition.

Abstract

Speech translation technology is advancing toward natural communication across languages, yet most low-resource languages have no speech data at all, and creating speech corpora is extremely difficult and time-consuming. This paper outlines our ongoing effort to construct a speech corpus for one of the zero-resource languages of North-East India, focusing on Nyishi, a language of the Tibeto-Burman family. Using lab-based and crowd-sourced recordings made with handheld audio recorders, the Nyishi speech corpus database currently contains more than 2,200 utterances from 34 native speakers who vary in regional dialect, age, and gender. The corpus covers four distinct speech modes: spontaneous conversation, fluent speech, read speech, and storytelling narratives. The audio recordings were carefully transcribed in International Phonetic Alphabet (IPA) symbols and annotated for tone, pitch, pacing, syllabification, and break marking. Statistical analysis of the phoneme distributions provides new insights into the phonetic composition of Nyishi. The findings show the central role of the vowel /a/ (34.1% of instances), the frequent use of front-vowel-based diphthongs such as /ai/ (15.5% of instances), and triphthongs containing the sequence /ia/ (30.2% of instances). Because speaker metadata is encoded directly in the file-naming convention, this structured corpus supports diverse research, from acoustic phonetics, tone studies, and morphological analysis to the development of speech recognition and synthesis systems. Building usable speech recognition and synthesis datasets for endangered languages such as Nyishi supports preservation efforts and enables language revitalization applications. This paper details the methodology used to collect the speech samples and presents descriptive statistics of the speech corpus.
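The abstract reports phoneme shares such as /a/ (34.1%), /ai/ (15.5%), and /ia/-containing triphthongs (30.2%), but it does not say how these percentages were computed. The following is a minimal Python sketch of one way to derive such within-category shares, not the authors' method. It assumes that transcriptions are space-separated IPA phone tokens with tone marks already stripped, that diphthongs and triphthongs are written as single multi-vowel tokens, and that each percentage is taken within its own category (monophthong, diphthong, triphthong). The vowel set `VOWEL_CHARS` and the function names `vowel_class`, `vowel_distribution`, and `share_containing` are hypothetical and for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical vowel symbol set for illustration only; NOT the paper's inventory.
VOWEL_CHARS = set("aeiouɨə")

# Assumption: a token made only of vowel symbols is classified by its length.
CLASS_BY_LENGTH = {1: "monophthong", 2: "diphthong", 3: "triphthong"}


def vowel_class(token):
    """Return 'monophthong', 'diphthong', or 'triphthong' for a vowel token, else None."""
    if token and all(ch in VOWEL_CHARS for ch in token):
        return CLASS_BY_LENGTH.get(len(token))
    return None


def vowel_distribution(transcriptions):
    """Percentage of each vowel token within its own class.

    Assumes each transcription is a string of space-separated phone tokens.
    """
    counts = defaultdict(Counter)
    for line in transcriptions:
        for token in line.split():
            cls = vowel_class(token)
            if cls is not None:
                counts[cls][token] += 1
    result = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        # Express each token's count as a share of all tokens in the same class.
        result[cls] = {tok: 100.0 * n / total for tok, n in counter.most_common()}
    return result


def share_containing(class_dist, sequence):
    """Total percentage of tokens in one class that contain a given vowel sequence."""
    return sum(p for tok, p in class_dist.items() if sequence in tok)


if __name__ == "__main__":
    # Made-up toy transcriptions, not corpus data.
    sample = ["n a m ai", "t a ɲ i", "h iau", "a p o"]
    dist = vowel_distribution(sample)
    print(dist)
    print(share_containing(dist.get("triphthong", {}), "ia"))
```

Applied to real data, the resulting figures would only be comparable to those reported in the abstract if the tokenization, vowel inventory, and choice of denominator matched the authors' conventions.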

Published

03.07.2024

How to Cite

Likha Ganu. (2024). Development of Speech Corpus for Improving Phonetic Search Engine Performance in Zero Resource Nyishi Language. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 1347 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6379

Issue

Vol. 12 No. 4 (2024)

Section

Research Article