
A preliminary study on similarity-preserving digital book identifiers

  • Klemo Vladimir
  • Marin Silic
  • Nenad Romic
  • Goran Delac
  • Sinisa Srbljic

    Publication: Contribution to collected volume, Paper in conference proceedings, Research, Peer-reviewed

    Abstract

    Due to the proliferation of digital publishing, e-book catalogs are abundant but noisy and unstructured. Tools for the digital librarian rely on ISBNs, metadata embedded in digital files (for which there is no accepted standard), and cryptographic hash functions to identify coderivative or near-duplicate content. However, the unreliability of metadata and the sensitivity of hashing to even the smallest changes prevent efficient detection of coderivative or similar digital books. The study focuses on books with many versions that differ by a certain amount of OCR errors and contain a number of sentence-length variations. Similar books are identified using small fingerprints that can be easily shared and compared. We created synthetic datasets to evaluate fingerprinting accuracy and report standard precision and recall measurements.
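
The contrast the abstract draws between cryptographic hashes and similarity-preserving fingerprints can be illustrated with a minimal sketch. The abstract does not name the fingerprinting method used in the study, so the MinHash-over-word-shingles scheme below is an assumption chosen only for illustration, and the function names, parameters (`k`, `num_hashes`), and sample texts are all hypothetical.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_fingerprint(text, num_hashes=64, k=3):
    """Illustrative MinHash (an assumed technique, not the paper's):
    for each seed, keep the smallest seeded hash over all shingles."""
    shingle_set = shingles(text, k)
    fingerprint = []
    for seed in range(num_hashes):
        fingerprint.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{s}".encode("utf-8")).digest()[:8], "big")
            for s in shingle_set
        ))
    return fingerprint


def similarity(fp_a, fp_b):
    """Fraction of matching positions; estimates Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def sha256_hex(text):
    """Cryptographic hash of the text, for comparison with the fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Made-up sample texts: an original, a copy with one simulated OCR error,
# and an unrelated passage.
original = ("it was the best of times it was the worst of times "
            "it was the age of wisdom it was the age of foolishness "
            "it was the epoch of belief it was the epoch of incredulity")
noisy = original.replace("worst", "w0rst")
unrelated = ("call me ishmael some years ago never mind how long precisely "
             "having little or no money in my purse")

fp_original = minhash_fingerprint(original)
print("SHA-256 digests equal:", sha256_hex(original) == sha256_hex(noisy))
print("similarity, original vs noisy:",
      similarity(fp_original, minhash_fingerprint(noisy)))
print("similarity, original vs unrelated:",
      similarity(fp_original, minhash_fingerprint(unrelated)))
```

In this sketch a single changed character yields a different SHA-256 digest, while most fingerprint positions are expected to survive the change. Each fingerprint is a fixed-length list of integers regardless of book length, which is one way to read "small fingerprints that can be easily shared and compared".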
    Original language: English
    Title: Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities : LaTeCH 2015
    Editors: Kalliopi A. Zervanou, Marieke van Erp, Beatrice Alex
    Number of pages: 6
    Place of publication: Beijing
    Publisher: Association for Computational Linguistics (ACL)
    Publication date: 2015
    Pages: 78-83
    ISBN (electronic): 978-1-941643-63-1
    Publication status: Published - 2015
    Event: 9th Socio-Economic Sciences and Humanities Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities - SIGHUM 2015 - Beijing, China
    Duration: 26.07.2015 to 30.07.2015
    Conference number: 9
    https://aclanthology.info/volumes/proceedings-of-the-9th-sighum-workshop-on-language-technology-for-cultural-heritage-social-sciences-and-humanities-latech
    https://sighum.wordpress.com/events/latech-2015/

    Bibliographical note

    Publisher Copyright:
    © 2015 Proceedings of the Annual Meeting of the Association for Computational Linguistics.

    Subject areas and keywords

    • Digital media
