Building a DDC Annotated Corpus from OAI Metadata

Authors

  • Mathias Lösch
  • Ulli Waltinger
  • Wolfram Horstmann
  • Alexander Mehler

DOI:

https://doi.org/10.2390/biecoll-OR2010-79

Keywords:

OR2010, Posters Sessions, Dewey Decimal Classification, OAI metadata, corpus construction, Library and information sciences, DDC: 020

Abstract

A frequently overlooked benefit of open access publications is that they are an easily accessible and cost-effective data source for research disciplines such as text mining, natural language processing, and computational linguistics. In those fields, linguistic data is usually managed in the form of corpora, i.e., machine-readable bodies of text that represent a particular variety of language.

Published

2010-12-31