
Learning from partially annotated sequences

Publication: Contributions to collected works > Articles in conference proceedings > Research > Peer-reviewed

16 citations (Scopus)

Abstract

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground-truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could for instance be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
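The idea of learning from partially annotated sequences can be illustrated with a small sketch. The code below is not the authors' published algorithm; it is a minimal structured perceptron written under stated assumptions: labels and observations are small integers, the model has only emission and transition scores, a pseudo-reference is obtained by Viterbi decoding constrained to the annotated tokens, and the loss-augmented prediction adds a Hamming-loss bonus of 1 on annotated tokens only. The names `viterbi`, `perceptron_update`, and `train` are hypothetical.

```python
NEG_INF = float("-inf")


def viterbi(obs, emit, trans, fixed=None, ref=None):
    """Return the best label path for a non-empty observation sequence.

    fixed[t] (if not None) forces the label at position t.
    ref[t] (if not None) adds a loss bonus of 1 to every label != ref[t].
    """
    n_labels = len(trans)

    def local(t, y):
        # Labels that contradict an annotation are forbidden.
        if fixed is not None and fixed[t] is not None and fixed[t] != y:
            return NEG_INF
        bonus = 1.0 if ref is not None and ref[t] is not None and ref[t] != y else 0.0
        return emit[y][obs[t]] + bonus

    score = [local(0, y) for y in range(n_labels)]
    back = []
    for t in range(1, len(obs)):
        new_score, ptr = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + trans[p][y])
            new_score.append(score[best_prev] + trans[best_prev][y] + local(t, y))
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Follow back-pointers from the best final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


def perceptron_update(emit, trans, obs, path, step):
    """Add `step` to every emission and transition feature used by `path`."""
    for t, y in enumerate(path):
        emit[y][obs[t]] += step
        if t > 0:
            trans[path[t - 1]][y] += step


def train(data, n_labels, n_obs, epochs=5):
    """data: list of (obs, partial) pairs; partial[t] is a label or None."""
    emit = [[0.0] * n_obs for _ in range(n_labels)]
    trans = [[0.0] * n_labels for _ in range(n_labels)]
    for _ in range(epochs):
        for obs, partial in data:
            # Pseudo-reference: best path consistent with the annotated tokens.
            target = viterbi(obs, emit, trans, fixed=partial)
            # Loss-augmented prediction (loss counted on annotated tokens only).
            pred = viterbi(obs, emit, trans, ref=partial)
            if pred != target:
                perceptron_update(emit, trans, obs, target, +1.0)
                perceptron_update(emit, trans, obs, pred, -1.0)
    return emit, trans
```

A call such as `train([([0, 1], [0, 1]), ([1, 0, 1], [None, 0, None])], n_labels=2, n_obs=2)` mixes a fully labeled sequence with one in which only the middle token is annotated; the unannotated positions are filled in by the constrained decoder during training.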

Original language: English
Title: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: PART 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 to 09.09.2011
http://www.ecmlpkdd2011.org/
https://www.ecmlpkdd2011.org/

Subject areas and keywords

  • Computer science
  • Business informatics
