Universität Bielefeld Electronic Collections




Evaluating the effect of unbalanced data in biomedical document classification

Laza, Rosalía ; Pavón, Reyes ; Reboiro-Jato, Miguel ; Fdez-Riverola, Florentino

Journal of Integrative Bioinformatics - JIB (ISSN 1613-4516)



Abstract:
Document classification has become an active research field, partly because of the increasing availability of biomedical information in digital form, which needs to be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. In this domain, imbalanced data is a well-known problem in many practical applications of knowledge discovery, and its effects on the performance of standard classifiers are considerable. In this paper, we investigate the application of a Bayesian Network (BN) model to the triage of documents, which are represented by the association of different MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels. Moreover, we perform an extensive experimental evaluation to investigate whether the classification of Medline documents using a BN classifier poses additional challenges when dealing with class-imbalanced prediction. The evaluation involves two methods: under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
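
The two balancing strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the "relevant"/"irrelevant" labels, and the misclassification costs are hypothetical choices made only for illustration. Under-sampling is shown as random removal of majority-class documents, and cost-sensitive learning as a decision rule that picks the label with the lowest expected misclassification cost given a classifier's estimated probability.

```python
import random
from collections import Counter


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Every class is randomly reduced to the size of the smallest class
    (random under-sampling). `seed` only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(ix) for ix in by_class.values())
    kept = []
    for ix in by_class.values():
        kept.extend(rng.sample(ix, n_min))
    return sorted(kept)


def cost_sensitive_label(p_relevant, cost_fn=5.0, cost_fp=1.0):
    """Choose the label with the lower expected misclassification cost.

    p_relevant: estimated probability that a document is relevant.
    cost_fn:    assumed cost of missing a relevant document.
    cost_fp:    assumed cost of flagging an irrelevant document.
    """
    cost_if_predict_relevant = (1.0 - p_relevant) * cost_fp
    cost_if_predict_irrelevant = p_relevant * cost_fn
    if cost_if_predict_relevant <= cost_if_predict_irrelevant:
        return "relevant"
    return "irrelevant"


if __name__ == "__main__":
    # Toy imbalanced collection: 3 relevant vs. 9 irrelevant documents.
    labels = ["relevant"] * 3 + ["irrelevant"] * 9
    kept = undersample(labels)
    print(Counter(labels[i] for i in kept))
    print(cost_sensitive_label(0.2))
```

With equal costs the decision rule reduces to the usual 0.5 probability threshold; raising the assumed cost of a missed relevant document lowers the threshold, which favours the minority class without discarding any training data.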


Institution: Faculty of Technology, Research Groups in Informatics
DDC classification: Data processing, computer science, computer systems

Suggested Citation:
Laza, Rosalía ; Pavón, Reyes ; Reboiro-Jato, Miguel ; Fdez-Riverola, Florentino (2011) Evaluating the effect of unbalanced data in biomedical document classification. Journal of Integrative Bioinformatics - JIB (ISSN 1613-4516), 8(3), 2011

Online-Journal: http://journal.imbio.de/article.php?aid=177
URL: http://biecoll.ub.uni-bielefeld.de/volltexte/2011/5193



 Latest update: 15 Feb 2011