Cross-Modal Learning of Visual Categories using Different Levels of Supervision


  • Mario Fritz
  • Geert-Jan M. Kruijff
  • Bernt Schiele



object categorization, cross-modal learning, incremental and interactive learning, DDC: 004 (Data processing, computer science, computer systems)


Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive because they can draw on far larger amounts of unlabeled training data. This paper proposes a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome the inherent limitations of purely unsupervised training. The method uses a unified, scale-invariant object representation that allows labeled and unlabeled information to be handled in a coherent way. One potential setting is to learn object category models from many unlabeled observations and a few dialogue interactions that may be ambiguous or even erroneous. First experiments demonstrate that the system can learn meaningful generalizations across objects from only a few dialogue interactions.






The 5th International Conference on Computer Vision Systems