Researchers from North-West University’s Multilingual Speech Technologies (MuST) and a young South African company called INTSYST have developed a prototype speech transcription platform that will have applications for both the public and private sectors.
MuST is a research niche area of NWU’s Vanderbijlpark Campus and is a focused, project-oriented group of scientists who develop speech technologies for multilingual environments.
Professor Marelie Davel, Director of MuST, notes that many private and public entities are continually generating large volumes of audio material. Together with researchers Dr Charl van Heerden and Prof Etienne Barnard, also from MuST, she developed a proposal for a web-based speech transcription platform designed specifically to address the needs of transcribers working with large audio resources in the South African languages.
Available to everyone
In 2015, MuST initiated a project to develop an open source (available to everyone) web-based platform that will enable users with varying degrees of sophistication to easily and quickly convert speech to text in the same South African language. Funded by the Department of Arts and Culture (DAC), the three-year project was initially set up to accommodate three languages during the prototype phase: English, isiZulu and Setswana. The system was designed to be scalable so that more languages could be added in the future.
The project is currently managed by MuST senior researcher Dr Neil Kleynhans, with major design and technical inputs provided by colleagues Dr Daniel van Niekerk and Dr Charl van Heerden.
The project was designed so that the platform can act as a virtual transcription assistant, supporting a human transcriber in the typically labour-intensive process of converting audio to text. It can also be used in a fully automated mode to produce approximate transcriptions, the accuracy of which depends on the amount of task-specific customisation that is performed using the platform.
In addition to its primary features, which allow users to upload an audio file and receive a draft transcription, the platform provides supporting features such as spell checking, text editing, audio diarisation (higher-level audio labelling, such as speech/silence detection and annotating different speakers), audio-text alignment and an audio playback system.
Another important feature is that the platform provides project management facilities that support the flow of the transcription and editing process.
Users on board
To further focus the speech transcription platform development effort, MuST approached various potential end users. These included South African transcribers of lectures, radio broadcasts and even parliamentary debates.
The latter provides a particularly interesting case study. Currently, all debates of the National Assembly and sittings of the National Council of Provinces and the Extended Public Committee are transcribed and the transcripts shared with the public. The transcription task, however, is often time-consuming and repetitive. Transcription of an audio recording to a written record goes through many iterations before a final version is released. It also requires several reporters (individuals who transcribe and edit) and a project manager who divides the audio recording up and allots portions of it to the various reporters. In the final step, a collator puts all the portions together to form a complete record, which can be released after it is edited and checked for accuracy.
The parliamentary reporting unit kindly volunteered to assist with a first round of platform testing, which was conducted in 2015. The reporters provided valuable feedback and confirmed that the platform would be a very useful tool for them. They also provided feedback on additional services that the platform could provide.
In the second year of the project, the researchers adapted the prototype to add features that will make the platform more effective for users. A further round of user testing is currently underway.
Positive reception
The project has been positively received by the DAC.
Ms Ulrike Janke, director of the Human Language Technologies (HLT) unit at the DAC, says: “We are delighted to be associated with this ambitious project and in particular the incorporation of African languages. The DAC is responsible for elevating the status and advancing the use of the previously marginalised official languages and this project certainly has the potential to contribute towards achieving this.”
The project team will demonstrate the improved web-based platform and will once again evaluate it with various volunteers. Neil says the value of working with end users during the development phases should not be underestimated. “By working closely with the reporters, we will be able to deliver a system that provides flexibility and accessibility and will hopefully prove to be a valued tool in the transcription process,” he says.
The successful completion of the first phase of the project suggests that there are other applications for the web-based platform. For example, the same platform could be customised for teaching environments or call centres. Marelie and her team believe that their platform can play an important role as a virtual assistance service that will speed up the time-consuming, exacting and repetitive transcription process.
Ms Ulrike Janke (DAC), with members of the MuST team: Ms Anina Lambrechts, Dr Charl van Heerden, Prof Marelie Davel, Dr Neil Kleynhans and Dr Daniel van Niekerk.
The MuST team in the gallery at Parliament.