Dimensionality Reduction of very large document collections by Semantic Mapping

  • Renato Fernandes Corrêa
  • Teresa Bernarda Ludermir
Keywords: Document Clustering, Dimensionality Reduction, Semantic Mapping, DDC: 004 (Data processing, computer science, computer systems)


This paper describes improvements to Semantic Mapping, a feature extraction method useful for reducing the dimensionality of vectors that represent documents in large text collections. The method may be viewed as a specialization of the Random Mapping method proposed in the WEBSOM project. Semantic Mapping, Random Mapping and Principal Component Analysis (PCA) are applied to the categorization of document collections using Self-Organizing Maps (SOM). Semantic Mapping generated document representations as good as those of PCA and much better than those of Random Mapping.
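
The relationship between the two mappings can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Random Mapping multiplies the document-term matrix by a sparse random 0/1 matrix (one common variant), and that the specialization replaces the random column choice with a fixed group label per term. The abstract does not say how such groups are obtained, so the `term_groups` array, the function names and the toy matrix `X` are all hypothetical.

```python
import numpy as np

def random_mapping(X, k, ones_per_term=1, seed=0):
    """Project X (n_docs x n_terms) onto k dimensions.

    Assumption: each term is sent to `ones_per_term` randomly chosen
    output dimensions through a sparse 0/1 projection matrix.
    """
    rng = np.random.default_rng(seed)
    n_terms = X.shape[1]
    R = np.zeros((n_terms, k))
    for t in range(n_terms):
        cols = rng.choice(k, size=ones_per_term, replace=False)
        R[t, cols] = 1.0
    return X @ R

def grouped_mapping(X, term_groups, k):
    """Same projection form, but each term's output dimension is fixed
    by a group label instead of being drawn at random.

    Assumption: `term_groups[t]` is an integer in [0, k) supplied by
    some earlier term-grouping step that is not shown here.
    """
    n_terms = X.shape[1]
    P = np.zeros((n_terms, k))
    P[np.arange(n_terms), term_groups] = 1.0
    return X @ P

# Toy data: 2 documents, 4 terms (hypothetical term frequencies).
X = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 3.0, 0.0, 1.0]])
term_groups = np.array([0, 1, 0, 1])  # terms 0 and 2 share a group

Z_random = random_mapping(X, k=2)
Z_grouped = grouped_mapping(X, term_groups, k=2)
```

In this sketch both functions produce reduced vectors of the same size. The only difference is whether related terms are guaranteed to accumulate in the same output dimension, which is one plausible reading of "specialization" in the abstract.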