Universität Bielefeld Electronic Collections



Evaluating the effect of unbalanced data in biomedical document classification

Laza, Rosalía ; Pavón, Reyes ; Reboiro-Jato, Miguel ; Fdez-Riverola, Florentino

Journal of Integrative Bioinformatics - JIB (ISSN 1613-4516)



Abstract:
Document classification has become an interesting research field, partly because of the increasing availability of biomedical information in digital form, which must be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. In this domain, imbalanced data is a well-known problem in many practical applications of knowledge discovery, and it has a marked effect on the performance of standard classifiers. In this paper, we investigate the application of a Bayesian Network (BN) model to the triage of documents, which are represented by the association of different MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels. Moreover, we perform an extensive experimental evaluation to investigate whether classifying Medline documents with a BN classifier poses additional challenges under class imbalance. The evaluation involves two methods: under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
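
The two balancing strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`undersample`, `cost_sensitive_label`), the toy MeSH term sets, and the cost values are assumptions chosen for illustration, and the probability passed to `cost_sensitive_label` stands in for the output of any probabilistic classifier, such as a BN. Under-sampling is shown here as random removal of majority-class documents, and cost-sensitive learning as a minimum-expected-cost decision rule; the paper may use other variants of both.

```python
import random
from collections import Counter


def undersample(docs, labels, seed=0):
    """Randomly drop documents from larger classes until every class
    has as many documents as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for doc in rng.sample(group, target):
            balanced.append((doc, label))
    rng.shuffle(balanced)
    return balanced


def cost_sensitive_label(p_relevant, cost_fn=5.0, cost_fp=1.0):
    """Pick the label with the lower expected misclassification cost.

    p_relevant: estimated probability that the document is relevant.
    cost_fn: assumed cost of missing a relevant document (false negative).
    cost_fp: assumed cost of flagging an irrelevant one (false positive).
    """
    cost_if_predict_irrelevant = p_relevant * cost_fn
    cost_if_predict_relevant = (1.0 - p_relevant) * cost_fp
    if cost_if_predict_relevant < cost_if_predict_irrelevant:
        return "relevant"
    return "irrelevant"


# Toy corpus (hypothetical): each document is a set of MeSH terms.
docs = [{"Humans", "Neoplasms"}, {"Humans", "Genomics"}] + [{"Animals"}] * 8
labels = ["relevant"] * 2 + ["irrelevant"] * 8

balanced = undersample(docs, labels)
class_counts = Counter(label for _, label in balanced)
```

With these assumed costs, a document is labelled relevant whenever its estimated probability exceeds `cost_fp / (cost_fp + cost_fn)`, about 0.17 here, instead of the usual 0.5. This moves the decision boundary toward the minority class without discarding any training documents, whereas under-sampling changes the training set itself.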


Participating institution: Faculty of Technology (Technische Fakultät), Computer Science research groups
DDC subject group: Data processing, computer science

Suggested citation:
Laza, Rosalía ; Pavón, Reyes ; Reboiro-Jato, Miguel ; Fdez-Riverola, Florentino (2011). Evaluating the effect of unbalanced data in biomedical document classification. Journal of Integrative Bioinformatics - JIB (ISSN 1613-4516), 8(3).

Online journal: http://journal.imbio.de/article.php?aid=177
URL: http://biecoll.ub.uni-bielefeld.de/volltexte/2011/5193



Last modified: 15.2.2011