ISC Publications

Publisher

Proceedings of The 12th Language Resources and Evaluation Conference (LREC)

Authors (2)

Paul McNamee

Brian M. Thompson

2020

Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

Abstract

Research in machine translation (MT) is developing at a rapid pace. However, most work in the community has focused on languages where large amounts of digital resources are available. In this study, we benchmark state of the art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili. These languages are of social importance and serve as test-beds for developing technologies that perform reasonably well despite the low-resource constraint. Our findings suggest that statistical machine translation (SMT) and neural machine translation (NMT) can perform similarly in low-resource scenarios, but neural systems require more careful tuning to match performance. We also investigate how to exploit additional data, such as bilingual text harvested from the web, or user dictionaries; we find that NMT can significantly improve in performance with the use of these additional data. Finally, we survey the landscape of machine translation resources for the languages of Africa and provide some suggestions for promising future research directions.

Citation

@inproceedings{duh-etal-2020-benchmarking,
    title = "Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages",
    author = "Duh, Kevin and McNamee, Paul and Post, Matt and Thompson, Brian",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.325",
    pages = "2667--2675",
    abstract = "Research in machine translation (MT) is developing at a rapid pace. However, most work in the community has focused on languages where large amounts of digital resources are available. In this study, we benchmark state of the art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili. These languages are of social importance and serve as test-beds for developing technologies that perform reasonably well despite the low-resource constraint. Our findings suggest that statistical machine translation (SMT) and neural machine translation (NMT) can perform similarly in low-resource scenarios, but neural systems require more careful tuning to match performance. We also investigate how to exploit additional data, such as bilingual text harvested from the web, or user dictionaries; we find that NMT can significantly improve in performance with the use of these additional data. Finally, we survey the landscape of machine translation resources for the languages of Africa and provide some suggestions for promising future research directions.",
    language = "English",
    ISBN = "979-10-95546-34-4",
}

ISC

Bart Paulhamus, Chief
Bart.Paulhamus@jhuapl.edu
240-228-8514

Doh Youn Hong, Operations Manager
Doh.Hong@jhuapl.edu
240-592-2560

Intelligent Systems Center
7701 Montpelier Road
Laurel, MD 20723
