Universität Bielefeld Electronic Collections


Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC

Mehler, Alexander ; Waltinger, Ulli

9th International Bielefeld Conference "Upgrading the eLibrary: enhanced information services driven by technology and economics"
Bielefeld, February 3-5, 2009

Abstract:
Purpose: We present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. The goal is to reduce the effort of document processing in digital libraries. Further, we perform feature selection and extension by means of social ontologies and related web-based lexical resources. This is done to provide reliable topic-related classifications while circumventing the problem of data sparseness. Finally, we evaluate our model by means of two language-specific corpora. This paper bridges digital libraries on the one hand and computational linguistics on the other. The aim is to make computational linguistic methods accessible for providing thematic classifications in digital libraries based on closed topic models such as the DDC.
Design/methodology/approach: text classification, text-technology, computational linguistics, computational semantics, social semantics.
Findings: We show that SVM-based classifiers perform best by exploring certain selections of OAI document metadata.
Research limitations/implications: The findings show that SVM-based DDC classifiers need to be developed further by using larger training sets, possibly for more than two languages, in order to achieve better F-measure values.
Practical implications: We show that DDC classifications which primarily explore OAI metadata are within reach.
Originality/value: We provide algorithmic and formal-mathematical information on how to build DDC classifiers for digital libraries.
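
The abstract describes deriving document snippets from OAI metadata as minimal document representations. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the metadata arrives as an `oai_dc` (unqualified Dublin Core) record, the format every OAI-PMH repository is required to support. The function name `build_snippet`, the field selection in `SNIPPET_FIELDS`, and the sample record are all illustrative assumptions; the abstract only states that certain selections of metadata work best, without naming them here.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical field selection, chosen for illustration only.
SNIPPET_FIELDS = ("title", "subject", "description")

# Illustrative record assembled from this page's own metadata (shortened).
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Enhancing document modeling by means of open topic models</dc:title>
  <dc:creator>Mehler, Alexander</dc:creator>
  <dc:creator>Waltinger, Ulli</dc:creator>
  <dc:subject>document snippets</dc:subject>
  <dc:subject>DDC</dc:subject>
  <dc:description>We present a topic classification model.</dc:description>
</oai_dc:dc>
"""


def build_snippet(record_xml, fields=SNIPPET_FIELDS):
    """Concatenate the text of the selected Dublin Core fields.

    Fields are visited in the order given by `fields`; repeated elements
    (for example several dc:subject entries) are kept in document order.
    """
    root = ET.fromstring(record_xml)
    parts = []
    for field in fields:
        for element in root.findall(f"dc:{field}", NS):
            text = (element.text or "").strip()
            if text:  # skip empty elements
                parts.append(text)
    return " ".join(parts)


snippet = build_snippet(SAMPLE_RECORD)
print(snippet)
```

A snippet built this way could then be tokenized and passed to any text classifier, such as the SVM-based classifiers mentioned in the findings. Changing `fields` is the simplest way to compare different metadata selections.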


Keywords: closed topic models, open topic models, document modeling, document snippets, DDC
Institution: Faculty of Linguistics and Literature
Institution: University Library (UB)
DDC classification: Library and information sciences

Suggested Citation:
Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC. 9th International Bielefeld Conference "Upgrading the eLibrary: enhanced information services driven by technology and economics", February 3-5, 2009, Bielefeld, Germany, eds. M. Höppner, W. Horstmann and S. Rahmsdorf

Online-Journal: Library Hi Tech, 27(2009) 4
URL: http://biecoll.ub.uni-bielefeld.de/volltexte/2010/5001



 Questions or comments: publikationsdienste.ub@uni-bielefeld.de
 Latest update: 15 Feb 2011