ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

  • Mikhail Salnikov*
  • Andrey Sakhovskiy
  • Irina Nikishina
  • Aida Usmanova
  • Angelie Kraft
  • Cedric Möller
  • Debayan Banerjee
  • Junbo Huang
  • Longquan Jiang
  • Rana Abdullah
  • Xi Yan
  • Elena Tutubalina
  • Ricardo Usbeck
  • Alexander Panchenko

*Corresponding author for this work

Publication: Contributions to edited volumes › Papers in conference proceedings › Research › Peer-reviewed

Abstract

In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions together with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), thereby standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). To build it, we use the Mintaka dataset for both training and testing, and we additionally create a manually constructed question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation, in which the KG subgraphs and candidate answers are given, remains challenging. Our analysis shows that LLMs cannot correctly process and use graph data structures without detailed prompt engineering or model tuning. This limitation motivates the dataset as a training ground for developing methods that enable LLMs to work more effectively with graph data.
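The abstract does not spell out how the subgraphs are built. One natural reading of the dataset's name is that each subgraph is the union of shortest paths in the KG between the entities mentioned in a question and a candidate answer. The following Python sketch illustrates that reading on a toy, undirected adjacency-list graph. It is an assumption, not the authors' construction procedure: the function names `shortest_path` and `shortest_path_subgraph`, the graph representation, and the placeholder node labels are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]  # hypothetical: undirected adjacency lists, no edge labels


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Return one shortest path from source to target using breadth-first search, or None."""
    if source == target:
        return [source]
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue  # already discovered at an equal or shorter distance
            parents[neighbour] = node
            if neighbour == target:
                # Walk the parent pointers back to the source, then reverse.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # target unreachable from source


def shortest_path_subgraph(
    graph: Graph, question_entities: Iterable[str], candidate: str
) -> Set[Tuple[str, str]]:
    """Union of one shortest path per question entity to the candidate, as undirected edges."""
    edges: Set[Tuple[str, str]] = set()
    for entity in question_entities:
        path = shortest_path(graph, entity, candidate)
        if path is None:
            continue  # this entity is not connected to the candidate
        for a, b in zip(path, path[1:]):
            edges.add(tuple(sorted((a, b))))  # normalise edge orientation
    return edges


# Toy graph with placeholder labels (not real Wikidata IDs).
toy_graph: Graph = {
    "Q_a": ["X"],
    "Q_b": ["X"],
    "X": ["Q_a", "Q_b", "Cand"],
    "Cand": ["X"],
}
print(sorted(shortest_path_subgraph(toy_graph, ["Q_a", "Q_b"], "Cand")))
# prints [('Cand', 'X'), ('Q_a', 'X'), ('Q_b', 'X')]
```

In this toy graph, both question entities reach the candidate through the intermediate node `X`, so the resulting subgraph has three edges. A real Wikidata subgraph would also carry edge direction and property labels, which this sketch deliberately omits.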

Original language: English
Host publication title: Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I
Editors: Ryutaro Ichise
Number of pages: 16
Publisher: Springer Science and Business Media Deutschland
Publication date: 2026
Pages: 95–110
ISBN (print): 978-3-031-97140-2
ISBN (electronic): 978-3-031-97141-9
Publication status: Published, 2026
Event: 30th International Conference on Natural Language and Information Systems, NLDB 2025, Kanazawa, Japan
Duration: 4 July 2025 to 6 July 2025
Conference number: 30

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

Subject areas and keywords

  • Computer Science

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
