Building a DDC Annotated Corpus from OAI Metadata

  • Mathias Lösch
  • Ulli Waltinger
  • Wolfram Horstmann
  • Alexander Mehler
Keywords: OR2010, Posters Sessions, Dewey Decimal Classification, OAI metadata, corpus construction, Library and information sciences, DDC: 020


A frequently overlooked benefit of open access publications is that they are an easily accessible and cost-effective data source for research disciplines such as text mining, natural language processing, and computational linguistics. In those fields, linguistic data is usually managed in the form of corpora, i.e. machine-readable bodies of text that represent a particular variety of language.