Although the initial market segment is the print-disabled community, we believe that this proof of concept will in future also help to expand the publishing market beyond the traditional reader base to those we term multimodal, low-literate and multilingual users, by leveraging the audio-based solutions currently aimed at print-disabled users. We define the multimodal user as a person who is able to read and has the time to do so, but whose visual sense is otherwise occupied, e.g. a child walking to school, an adult driving to work, or someone exercising at the gym. The low-literate user may want to read, but has not learnt the skill to do so. The multilingual user would like to read, whether literate or not, but no material is available in a language he or she understands.

The CSIR has developed a Proof-of-Concept (PoC) EPUB 3 conversion system (for Afrikaans, English, Sesotho and isiZulu) capable of producing EPUB 3 documents containing synchronized text and audio. It does so in one of two ways: either by taking existing, human-narrated audio and aligning it with the document text using automatic speech recognition (ASR), or by synthesizing audio from a text-only document using text-to-speech (TTS), in which case the alignment is obtained implicitly during synthesis.
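
As an illustration of what "synchronized text and audio" can look like inside an EPUB 3 document, the sketch below writes a SMIL file of the kind used by EPUB 3 Media Overlays. It is a minimal sketch under stated assumptions, not the CSIR system: the text above does not say which mechanism the PoC uses, and the function name build_media_overlay, the file names and the timestamps are all hypothetical. The input is simply a list of (fragment id, start, end) alignments, which could in principle come from either the ASR alignment route or the TTS route.

```python
import xml.etree.ElementTree as ET

# Namespace used by SMIL documents in EPUB 3 Media Overlays.
SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_media_overlay(text_doc, audio_file, alignments):
    """Return a SMIL document (as a string) pairing text fragments with audio clips.

    Hypothetical helper for illustration only.
    text_doc   -- name of the XHTML content document, e.g. "chapter1.xhtml"
    audio_file -- name of the narration audio file
    alignments -- list of (fragment_id, start_seconds, end_seconds) tuples
    """
    # Serialize SMIL elements without a prefix (as the default namespace).
    ET.register_namespace("", SMIL_NS)

    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")

    for index, (fragment_id, start, end) in enumerate(alignments, start=1):
        if end <= start:
            raise ValueError(f"clip for {fragment_id!r} has non-positive duration")
        # Each <par> plays one text fragment together with one audio clip.
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{index}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text", {"src": f"{text_doc}#{fragment_id}"})
        ET.SubElement(
            par,
            f"{{{SMIL_NS}}}audio",
            {
                "src": audio_file,
                "clipBegin": f"{start:.3f}s",
                "clipEnd": f"{end:.3f}s",
            },
        )

    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    # Made-up alignment data: two sentences in one chapter.
    example = [("s1", 0.0, 2.5), ("s2", 2.5, 5.125)]
    print(build_media_overlay("chapter1.xhtml", "audio/chapter1.mp3", example))
```

In this sketch the only difference between the two routes would be where the alignment tuples come from: with human narration they would be estimated by an aligner, whereas with synthesized audio the clip boundaries are already known when the audio is generated.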

Improving audio-based solutions, particularly for the print-disabled community.