Personalised services and cross-platform voice user interfaces are becoming an industry norm. Such interfaces enable spoken interaction with computers, using speech recognition to understand commands and text-to-speech (TTS) to provide audible feedback. However, the plethora of voice computing offerings entering the global market does not cater sufficiently for locally used official and other African languages. The voice computing research group focuses on localising voice user interfaces and voice computing technologies to enable access to information, to make these technologies available for use in third-party applications and solutions, and to support personalised services.
The research group hosts the Speech Node of the South African Centre for Digital Language Resources (SADiLaR), a research infrastructure that forms part of the South African Research Infrastructure Roadmap. The group develops text and speech datasets crucial to local language technology development efforts and disseminates these through SADiLaR.
The Voice Computing Research Group applies its voice computing capabilities to design, develop and implement modern software architectures and voice computing platforms. These promote the development of local content for the networked world and enable the digital transformation of key economic sectors, such as publishing and telecommunications, and social sectors, such as education and health. The group researches and develops demonstrable voice computing solutions, and transfers tools, technology products and processes that support existing private and public enterprises, and the formation of new ones, to create new and competitive products and services.
The group's research areas include:
- Automatic speech recognition
- Controlled natural language processing
- Natural language processing (NLP) for TTS
- Voice user interface design and testing
- Usability evaluation
- Ebook augmentation using NLP and TTS
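One of the roles NLP plays in a TTS front-end is text normalisation: rewriting non-lexical input (digits, abbreviations, symbols) into pronounceable words before synthesis. The toy sketch below illustrates only digit expansion in English; real front-ends also handle dates, currency, abbreviations and language-specific rules, and nothing here reflects Qfrency's actual implementation.

```python
import re

# Spoken forms for single digits (illustrative English-only example).
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def expand_digits(text: str) -> str:
    """Replace each digit with its spoken English form."""
    return re.sub(r"\d", lambda m: " " + ONES[int(m.group())] + " ", text).strip()

def normalise(text: str) -> str:
    """A minimal normalisation pass: expand digits, then tidy whitespace."""
    return re.sub(r"\s+", " ", expand_digits(text))
```

For example, `normalise("Room 42")` yields `"Room four two"`, which a synthesiser can pronounce directly.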
Qfrency TTS is a software product that converts digital text into synthetic speech, a technology known as text-to-speech (TTS). The goal of the technology is to generate synthetic speech that is as close to human speech as possible, including articulation and prosody. The technology takes various forms, from on-device, end-user systems that require no Internet access to cloud-based systems. TTS is considered an assistive technology because it interfaces with various other technologies to provide speech feedback, for example in navigation software, book readers and screen readers, as well as in business-to-consumer applications such as call centres.
Augmented Ebooks aims to assist the publishing industry with digital transformation by dramatically opening up the market to new types of consumers. It provides unprecedented access and incentives for many types of non-readers to become readers, especially in the wake of Covid-19. Augmented Ebooks is packaged as three sub-products: the Converter, the Augmenter and the Reader. The Converter allows the publishing industry to convert ebooks from other file formats (DOCX and PDF) to EPUB 3. The Augmenter uses Qfrency TTS to add human-narrated or synthesised audio to an input EPUB 3 ebook and synchronises the audio with the text, producing an augmented EPUB 3 ebook as output. The Reader renders the synchronised content to end-users, with audio playback, navigation and text highlighting.
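In EPUB 3, text–audio synchronisation of the kind the Augmenter produces is expressed as a media overlay: a SMIL document whose `par` elements pair a text fragment with a clip of an audio file. The sketch below generates such a document; the file names, fragment ids and timings are hypothetical, and this is an illustration of the EPUB 3 media overlay format rather than the Augmenter's actual output.

```python
def media_overlay(text_href: str, audio_href: str, clips) -> str:
    """Build a minimal EPUB 3 media overlay (SMIL) document.

    `clips` is a sequence of (fragment_id, clip_begin_s, clip_end_s) tuples
    pairing an id in the XHTML text with a time range in the audio file.
    """
    pars = "\n".join(
        f'    <par id="par{i}">\n'
        f'      <text src="{text_href}#{frag_id}"/>\n'
        f'      <audio src="{audio_href}" clipBegin="{begin}s" clipEnd="{end}s"/>\n'
        f'    </par>'
        for i, (frag_id, begin, end) in enumerate(clips, start=1)
    )
    return (
        '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">\n'
        '  <body>\n'
        f'{pars}\n'
        '  </body>\n'
        '</smil>'
    )

# Hypothetical chapter with two sentences synchronised to one audio file.
smil = media_overlay(
    "chapter1.xhtml", "chapter1.mp3",
    [("sent1", 0.0, 2.5), ("sent2", 2.5, 6.1)],
)
```

A reading system that supports media overlays uses each `par` to play the audio clip while highlighting the referenced text fragment, which is exactly the playback-with-highlighting behaviour the Reader provides.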