Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

5 Citations (Scopus)

Abstract

Question Answering (QA) systems return concise answers or answer lists from natural language text, given a context document. Curating QA datasets to advance the development of robust QA models takes considerable resources. QA datasets have surged for languages such as English, but not for low-resource languages like Amharic; indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark research interest in open-domain Amharic QA. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading comprehension settings, respectively.
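The abstract reports F-scores but does not say how they are computed. As a rough illustration only, the sketch below assumes a SQuAD-style token-overlap F1 between a predicted answer span and a reference answer, with plain whitespace tokenization. The function name `token_f1` and the example strings are hypothetical and are not taken from the paper or from Amh-QuAD.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: whitespace tokenization with no further normalization.
    The paper's actual scoring script may differ.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()

    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: the prediction carries one extra token.
    score = token_f1("አዲስ አበባ ከተማ", "አዲስ አበባ")
    print(round(score, 2))
```

Under this assumption, a dataset-level score would be the mean of the per-question values (usually taking the maximum over the available reference answers), scaled to 0-100.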

Original language: English
Title of host publication: The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages: 9
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2024
Pages: 124-132
ISBN (Print): 9782493814401
ISBN (Electronic): 978-2-493814-40-1
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages - RAIL 2024 - Lingotto Conference Centre, Torino (Italy), Torino, Italy
Duration: 25.05.2024 to 25.05.2024
Conference number: 5
https://bit.ly/rail2024

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

Research areas and keywords

  • Amh-QuAD
  • Amharic Question Answering Dataset
  • Amharic Reading Comprehension
  • Low Resource Question Answering
  • Informatics
