Languages are widely acknowledged as central to all forms of social, economic and cultural life. This year, International Mother Language Day on 21 February 2009 once again draws the international community’s attention to the foundations of linguistic diversity and multilingualism. South Africa’s constitution guarantees equal status to 11 official languages to cater for the country’s diverse peoples and their cultures: Afrikaans, English, IsiNdebele, IsiXhosa, IsiZulu, Sepedi, Sesotho, Setswana, SiSwati, Tshivenda and Xitsonga.
Daniel van Niekerk
Master’s student Daniel van Niekerk is part of the Meraka Institute’s human language technologies (HLT) research group. HLT makes it easier for a wide range of people to interact with machines. The HLT group, led by Professor Gerhard van Huyssteen, studies how these technologies can be applied, adapted and developed to benefit the people of southern Africa and to support language diversity in an affordable and equitable fashion. A secondary but equally important aim is to address information empowerment by making information readily accessible.
Getting to grips with text-to-speech systems
Van Niekerk’s dissertation, titled ‘Automatic speech segmentation with limited data’, is close to submission. He is a computer engineering graduate registered at the North-West University, working under the supervision of CSIR Fellow, Professor Etienne Barnard.
The dissertation deals with text-to-speech (TTS) systems, the ‘magic’ technology that makes it possible to render text as synthesised speech. Van Niekerk cites examples from the everyday world around us: “Assistive devices such as screen readers for the blind make use of TTS systems to convert text to spoken words. TTS is also used in limited everyday domains to render speech from written numbers and other short texts.”
He explains the significance of this research: “Building TTS systems requires the development of relevant speech corpora. In South Africa, we do not have a lot of speech data, and certainly not equitably distributed among the 11 official languages.” Speech corpora are databases of speech audio files and text transcriptions, annotated to make the basic structural constituents of the language usable.
An example of an annotated audio file, showing how the word ‘isimo’, which consists of five phonemes (sounds), is recognised and broken into segments (bottom two rows). The top two rows depict the time-domain signal and the corresponding spectrogram. ‘Isimo’ can be translated as ‘situation’.
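An annotation of this kind can be thought of as a list of labelled time intervals, one per phoneme. The following is a minimal Python sketch of that idea, not the format used by the HLT group: the `Segment` class, the `is_contiguous` check, the one-symbol-per-letter labels and all boundary times are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One labelled stretch of audio (all times in seconds)."""
    label: str    # phoneme symbol (illustrative labels only)
    start: float  # where the segment begins
    end: float    # where the segment ends

    @property
    def duration(self) -> float:
        return self.end - self.start


def is_contiguous(segments, total, tol=1e-9):
    """Return True if the segments cover [0, total] with no gaps or overlaps."""
    if not segments:
        return False
    if abs(segments[0].start) > tol or abs(segments[-1].end - total) > tol:
        return False
    return all(abs(a.end - b.start) <= tol
               for a, b in zip(segments, segments[1:]))


# Hypothetical boundaries for the five phonemes of 'isimo';
# real boundaries would come from manual or automatic segmentation.
isimo = [
    Segment("i", 0.00, 0.08),
    Segment("s", 0.08, 0.20),
    Segment("i", 0.20, 0.29),
    Segment("m", 0.29, 0.37),
    Segment("o", 0.37, 0.50),
]
```

Automatic segmentation, in these terms, is the task of estimating the `start` and `end` values from the audio and its transcription rather than having a person mark them by hand.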
“My work of automatic speech segmentation is particularly interesting in South Africa’s multilingual context,” Van Niekerk points out. Manual speech segmentation is costly in terms of time and skills. Automatic segmentation of speech in any or all South African languages into phonetic segments is therefore a novel and potentially powerful technology that provides the building blocks for ‘new’ and diverse speech systems.
Van Niekerk’s research in TTS has immediate application to the Lwazi project underway at the HLT group. This speech-driven, telephone-based system funded by the Department of Arts and Culture aims to deliver government information to all South Africans in their own language. “Providing information verbally in the mother language of the user is very important,” he emphasises. “It gives people confidence and makes every language relevant. It can also potentially overcome the challenge of providing information to areas of the population where illiteracy is high. Another area of application is education, where resources and penetration of people trained in all official languages are limited.”
His research focuses on three corpora – Afrikaans, IsiZulu and Setswana – and he plans to test his results in the Lwazi multilingual application. TTS systems for natural languages can work across the board, he contends, as long as the constituents of the input language text are well understood.
The HLT research group is unique in its approach to tackling a broad array of languages. Working with linguists has assisted the TTS subgroup to ensure that the language data it uses are well researched and understood. “Professor van Huyssteen is a respected senior linguist and has provided very useful input and expertise relevant to our work since he joined last year,” Van Niekerk confirms. The group relies on cooperation with CtexT of North-West University for speech recordings and other research tools, such as pronunciation dictionaries.
Van Niekerk enjoys the excitement of a table tennis game at work, where the HLT group matches creative juices with loads of energy.
Finding the future through HLT
He is enthusiastic about the application of TTS systems. “By developing a multilingual TTS framework, we can change components of the system in order to support diverse languages.” TTS systems keep the context of the text in mind as they break it down into sentences, phrases, words and phonemes. The tests for accuracy and acceptability are intelligibility (tested by human listeners) and the quality of the voice.
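The breakdown of text into smaller units can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Meraka framework: the function names and the one-entry `LEXICON` are hypothetical, the phrase level is omitted, and words missing from the lexicon fall back to one symbol per letter, which no real system would rely on.

```python
import re

# Hypothetical toy pronunciation dictionary; a real system would use a
# full, language-specific dictionary.
LEXICON = {"isimo": ["i", "s", "i", "m", "o"]}


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def split_words(sentence):
    """Lower-case the sentence and keep runs of letters only."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def to_phonemes(word):
    """Look the word up; fall back to one symbol per letter (assumption)."""
    return LEXICON.get(word, list(word))


def analyse(text):
    """Return, per sentence, a list of (word, phonemes) pairs."""
    return [[(w, to_phonemes(w)) for w in split_words(s)]
            for s in split_sentences(text)]
```

Supporting another language in a design like this would mean swapping the language-specific parts, here the lexicon and the word-splitting rule, while the surrounding pipeline stays the same.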
Van Niekerk sees a great future for research in the field of TTS: “It’s a good thing for all languages and improvements in natural language processing inspire more analysis and knowledge.” He is keen to participate with his group in the international Blizzard Challenge in 2010. This competition provides researchers and speech labs with the same set of original recordings (i.e. five hours of a male speaker), and then challenges teams to create the ‘best’ (i.e. most natural and intelligible) TTS voice.
A PhD in this field holds strong attraction for him: “Lots of interesting engineering problems are begging to be solved at the Meraka Institute.” He enjoys the empowerment of using open source software as a research tool.
Biffy Van Rooyen, email: BVRooyen@csir.co.za