For many years, the North-West University (NWU) has been at the heart of technological advances that support the use and development of South Africa’s 11 official languages. Now, as host of a national research infrastructure, the South African Centre for Digital Language Resources (SADiLaR), the NWU will be able to take its commitment to multilingualism a major step forward.
A multi-partner initiative, SADiLaR is a national research infrastructure and the only humanities programme of its kind in South Africa. It is a network of linked nodes made up of South African universities and agencies such as Unisa, the University of Pretoria, the Council for Scientific and Industrial Research, the Inter-institutional Centre for Language Development and Assessment (ICELDA) and the NWU’s own CTexT® unit.
It came about as a result of the work of various researchers under the guidance of Prof Justus Roux, its former director, who prepared the project as part of the Department of Science and Technology’s (DST) new South African Research Infrastructure Roadmap (SARIR).
The centre is the first of its kind in Africa and strengthens existing links with similar entities globally, especially with the Common Language Resources and Technology Infrastructure (CLARIN) in Europe.
“South Africa is 20 years behind the rest of the world when it comes to the development of Digital Humanities (DH), but SADiLaR ought to enable researchers to develop skills to do DH-related research on par with what is produced internationally,” says Prof Attie de Lange, current director of SADiLaR.
The advancement of technology
SADiLaR provides a platform to access and reuse linguistic data, while also equipping researchers with the technologies and software to simplify linguistic analysis as much as possible.
The main distribution channel will be a repository that allows interested parties to access any of the data sets that SADiLaR makes available.
“The repository will also link to larger international infrastructures and language distribution agencies such as the European Language Resources Association (ELRA) and CLARIN in Europe, and the Linguistic Data Consortium (LDC) in the USA,” says Dr Roald Eiselen, SADiLaR’s technical manager.
SADiLaR will also make several research-enabling technologies available, such as:
- metadata and data processing infrastructures that are specifically linked to particular projects;
- more general language data analytics platforms made available online; and
- automatic language analysis modules that support the development of more complex language technologies.
Although a substantial number of open-source technologies are being reused and adapted to the South African context, several of the technologies and services that SADiLaR is developing are new and will be distributed for further use by language communities both in Africa and around the world.
The NWU, along with all other South African universities, looks forward to reaping the benefits of the ground-breaking innovations that SADiLaR will generate.
A look back at SADiLaR’s history
In 2008, a Ministerial Advisory Committee made recommendations to support human language technology (HLT) development at a national level.
SADiLaR resulted in part from these recommendations and became a tangible vehicle for developing and supporting a multilingual country.
Ten years on, SADiLaR is in its first year of incubation. It is creating an environment for the creation, management and distribution of digital language resources, and offers freely available software for research and development around South Africa’s 11 official languages.