International Conference on Computer Vision Systems : Proceedings
Open Journal Systems
Feed: https://biecoll.ub.uni-bielefeld.de/index.php/icvs/issue/feed
Affiliation: Faculty of Technology, Research Groups in Informatics
All contributions published 2007-12-31.

3D Modeling of Objects by Using Resilient Neural Network
Erkan Besdok
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/167

Camera Calibration (CC) is a fundamental issue for Shape-Capture, Robotic-Vision and 3D Reconstruction in Photogrammetry and Computer Vision. The purpose of CC is the determination of the intrinsic parameters of cameras for the metric evaluation of images. Classical CC methods consist of taking images of objects with known geometry, extracting the features of the objects from the images, and minimizing their 3D backprojection errors. In this paper, a novel implicit CC model (CC-RN) based on Resilient Neural Networks is introduced. The CC-RN is particularly useful for 3D reconstruction in applications that do not require explicit computation of the physical camera parameters or expert knowledge. The CC-RN supports intelligent photogrammetry (photogrammetron). In order to evaluate the success of the proposed implicit CC model, the 3D reconstruction performance of the CC-RN has been compared with two different well-known implementations of the Direct Linear Transformation (DLT). Extensive simulation results show that the CC-RN achieves better performance than the well-known DLTs in the 3D backprojection of the scene.

A New Method and Toolbox for Easily Calibrating Omnidirectional Cameras
Davide Scaramuzza, Roland Siegwart
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/177

In this paper, we focus on the calibration of central omnidirectional cameras, both dioptric and catadioptric. We describe our novel camera model and algorithm and provide a practical Matlab Toolbox which implements the proposed method. Our method relies on the use of a planar grid that is shown by the user at different unknown positions and orientations. The user is only asked to click on the corner points of the images of this grid. Calibration is then performed quickly and automatically. In contrast with previous approaches, we do not use any specific model of the omnidirectional sensor. Instead, we assume that the imaging function can be described by a polynomial approximation whose coefficients are estimated by solving a linear least-squares minimization problem followed by a non-linear refinement. The performance of the approach is shown through several calibration experiments on both simulated and real data. The proposed algorithm is implemented as a Matlab Toolbox which allows any inexpert user to easily calibrate their own camera. The toolbox is completely open source and is freely downloadable from the authors' web page.
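The calibration model described above reduces, at its core, to fitting polynomial coefficients by linear least squares and then refining them non-linearly. The sketch below illustrates only that two-stage idea; it is not the toolbox's actual code, and the inputs `rho` (point distances from the distortion centre) and `z` (the corresponding imaging-function values derived from the grid geometry) are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_imaging_polynomial(rho, z, degree=4):
    """Stage 1: linear least-squares fit of z = sum_k c_k * rho**k."""
    A = np.vander(rho, degree + 1, increasing=True)  # columns 1, rho, rho^2, ...
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def refine_imaging_polynomial(coeffs, rho, z):
    """Stage 2: non-linear refinement of the linear estimate."""
    residual = lambda c: np.polyval(c[::-1], rho) - z
    return least_squares(residual, coeffs).x
```

In the full method the refinement would also have to handle the unknown grid poses; the sketch keeps only the polynomial part to stay short.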
Automatic Analysis of Lens Distortions in Image Registration
Birgit Möller, Stefan Posch
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/194

Geometric image registration by estimating homographies is an important processing step in a wide variety of computer vision applications. The 2D registration of two images does not require an explicit reconstruction of intrinsic or extrinsic camera parameters. However, correcting images for non-linear lens distortions is highly recommended. Unfortunately, standard calibration techniques are sometimes difficult to apply, and reliable estimates of lens distortions can only rarely be obtained. In this paper we present a new technique for automatically detecting and categorising lens distortions in pairs of images by analysing registration results. The approach is based on a new metric for registration quality assessment and employs a PCA-based statistical model for classifying distortion effects. In doing so, the overall importance of lens calibration and image correction can be checked, and a measure of the efficiency of the corresponding correction steps is given.
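For readers unfamiliar with the non-linear lens distortions analysed above, the sketch below shows the common even-order polynomial radial model. The coefficients and centre are illustrative, and the paper's own contribution (the registration-quality metric and the PCA classifier) is not reproduced here.

```python
import numpy as np

def radially_distort(points, k1, k2, centre):
    """Apply the common polynomial radial model
    x_d = c + (x_u - c) * (1 + k1*r^2 + k2*r^4).
    `points` is an (N, 2) array of undistorted positions;
    k1, k2 and `centre` are illustrative parameters."""
    d = points - centre
    r2 = np.sum(d ** 2, axis=1, keepdims=True)
    return centre + d * (1.0 + k1 * r2 + k2 * r2 ** 2)
```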
Bowling for Calibration: An Undemanding Camera Calibration Procedure Using a Sphere
Pietro Cerri, Oscar Gerelli, Dario Lodi Rizzini
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/198

Camera calibration is a critical problem in computer vision. This paper presents a new method for computing extrinsic parameters: images of a ball rolling on a flat plane in front of the camera are used to compute the roll and pitch angles. The calibration is achieved by an iterative Inverse Perspective Mapping (IPM) process that uses an estimate of a ball gradient invariant as its stop condition. The method is quick, as easy to use as throwing a ball, and particularly well suited to quickly calibrating vision systems in unfriendly environments where a calibration grid is not available. The correctness of the algorithm is demonstrated and its accuracy is measured using both computer-generated and real images.

Easy-to-use calibration of multiple-camera setups
Ferenc Kahlesz, Cornelius Lilge, Reinhard Klein
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/209

Calibration of the pinhole camera model has a well-established theory, especially in the presence of a known calibration object. Unfortunately, in wide-baseline multi-camera setups it is hard to create a calibration object that is visible by all the cameras simultaneously. As a result, conventional calibration methods do not scale well. Using well-known algorithms, we developed a streamlined calibration method which is able to calibrate multi-camera setups with the help of only a planar calibration object. The object does not have to be observed at the same time by all the cameras involved in the calibration. Our algorithm breaks the calibration down into four consecutive steps: feature extraction, distortion correction, intrinsic and finally extrinsic calibration. We have also made an implementation of the presented method available from our website.

Radiometric alignment and vignetting calibration
Pablo d'Angelo
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/239

This paper describes a method to photometrically align registered and overlapping images which have been subject to vignetting (radial light falloff), exposure variations, white balance variation and a nonlinear camera response. Applications include the estimation of vignetting and camera response; vignetting and exposure compensation for image mosaicing; and the creation of high dynamic range mosaics. Compared to previous work, white balance changes can be compensated for, and a computationally efficient algorithm is presented. The method is evaluated with synthetic and real images and is shown to produce better results than comparable methods.
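As a minimal illustration of vignetting as radial light falloff, the sketch below divides an image by a low-order polynomial falloff model. This is a generic correction under assumed, illustrative coefficients, not the paper's estimation procedure, which recovers such parameters jointly with exposure, white balance and camera response from image overlaps.

```python
import numpy as np

def correct_vignetting(img, a, b, c):
    """Divide an image by the radial falloff model
    V(r) = 1 + a*r^2 + b*r^4 + c*r^6, with r the normalised
    distance from the image centre. Coefficients are illustrative."""
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r2 = ((x - cx) ** 2 + (y - cy) ** 2) / (cx ** 2 + cy ** 2)
    falloff = 1.0 + a * r2 + b * r2 ** 2 + c * r2 ** 3
    return img / falloff[..., None] if img.ndim == 3 else img / falloff
```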
Real-time scattering compensation for time-of-flight camera
James Mure-Dubois, Heinz Hügli
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/240

3D images from time-of-flight cameras may suffer from false depth readings caused by light scattering. In order to reduce such scattering artifacts, a scattering compensation procedure is proposed. First, scattering is analysed and expressed as a linear transform of a complex image. Then, a simple scattering model is formulated. Assuming a space-invariant point spread function as a model for the scattering leads to a solution in the form of a deconvolution scheme, whose computational feasibility and practical applicability are further discussed in this paper.
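The space-invariant PSF assumption is what makes the compensation a frequency-domain deconvolution. Below is a minimal, regularised inverse-filter sketch of that idea for a complex-valued ToF image; the PSF and the regularisation constant are assumptions, and the paper's own scattering model is not reproduced.

```python
import numpy as np

def compensate_scattering(measured, psf, eps=1e-3):
    """Deconvolve a complex ToF image under a space-invariant
    scattering PSF with a regularised (Wiener-style) inverse filter.
    `measured` and `psf` are 2-D arrays of the same shape; `eps`
    is an illustrative regularisation constant."""
    M = np.fft.fft2(measured)
    H = np.fft.fft2(np.fft.ifftshift(psf))
    W = np.conj(H) / (np.abs(H) ** 2 + eps)  # regularised inverse of H
    return np.fft.ifft2(M * W)
```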
Robust Camera Calibration and Evaluation Procedure Based on Images Rectification and 3D Reconstruction
Rachid Guerchouche, François Coldefy
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/242

This paper presents a robust camera calibration algorithm based on contour matching of a known pattern object. The method does not require a tedious selection of particular pattern points. We introduce two versions of our algorithm, depending on whether a single calibration image or several are available. We propose an evaluation procedure which can be applied to all calibration methods for stereo systems with an unlimited number of cameras. We apply this evaluation framework to three camera calibration techniques: our proposed robust algorithm, the modified Zhang algorithm implemented by J. Bouguet, and the Faugeras-Toscani method. Experiments show that our proposed robust approach gives very good results in comparison with the two other methods. The proposed evaluation procedure provides a simple and interactive tool to evaluate any camera calibration method.

3D Scene Segmentation and Object Tracking in Multiocular Image Sequences
Joachim Schmidt, Christian Wöhler, Lars Krüger, Tobias Gövert, Christoph Hermes
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/168

In this contribution we describe a vision-based system for the 3D detection and tracking of moving persons and objects in complex scenes. A 3D point cloud of the scene is extracted by a combined stereo technique consisting of a correlation-based block-matching approach and a spacetime stereo approach based on spatio-temporally local intensity modelling. Hence, the result of the stereo analysis is a 3D point cloud attributed with motion information. For localising persons and objects in the scene, the point cloud is segmented into meaningful clusters by applying a hierarchical clustering algorithm, using velocity information as an additional discrimination criterion. Initial object hypotheses are obtained by partitioning the observed scene with cylinders, including the tracking results of the previous frame. Multidimensional unconstrained nonlinear minimisation is then applied to refine the position, velocity and size of the initial cylinder in the scene, such that neighbouring clusters with similar velocity vectors are grouped to form a compact object. A particle filter is applied to select hypotheses which generate consistent trajectories. The described system is evaluated on a tabletop sequence and several real-world sequences acquired in an industrial production environment, using manually obtained ground truth data. We find that even in the presence of moving objects closely neighbouring the person, all objects are detected and tracked in a robust and stable manner. The average tracking accuracy is of the order of several percent of the distance to the scene.

A Biomimetic Vision Architecture
Bruce A. Draper
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/169

The goal of biomimetic vision is to build artificial vision systems that are analogous to the human visual system. This paper presents a software architecture for biomimetic vision in which every major component is clearly defined in terms of its function and interface, and where every component has an analog in the regional functional anatomy of the human brain. We also present an end-to-end vision system implemented within this framework that learns to recognize objects without human supervision.

A Comparison of Classifiers for Prescreening of Honeybee Brood Cells
Uwe Knauer, Fred Zautke, Kaspar Bienefeld, Beate Meffert
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/170

We report on an image classification task originating from the video observation of beehives. Biologists desire automatic support to identify so-called hygienic bees. For this it is important to know which brood cells are in a stage of initial opening. To find these cells, a prescreening process is necessary which classifies three types of cells. To solve this decision problem, a number of classification techniques are evaluated. ROC analysis for the given problem shows that the SVM classifier with an RBF kernel outperforms linear discriminant analysis, decision trees, boosted classifiers, and other kernel functions.
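The classifier comparison methodology of the last paper can be mimicked in a few lines with scikit-learn (assuming that package). The sketch below uses synthetic stand-in data and a binary labelling, whereas the paper evaluates three cell types on real image features, so it only illustrates the ROC-based comparison, not the reported result.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the actual features would come from brood-cell images.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

classifiers = {
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SVM (linear kernel)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "boosted stumps": AdaBoostClassifier(n_estimators=100),
}
for name, clf in classifiers.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```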
A Comprehensive System for 3D Modeling from Range Images Acquired from a 3D ToF Sensor
Agnes Swadzba, Bing Liu, Jochen Penne, Oliver Jesorsky, Ralf Kompe
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/171

Developing a system which generates a 3D representation of a whole scene is a difficult task. Several new technologies for 3D time-of-flight (ToF) imaging have been developed in recent years which overcome various limitations of other 3D imaging systems, such as laser/radar/sonar scanners, structured light and stereo rigs. However, only limited work has been published on computer vision applications based on such ToF sensors. We present in this paper a new, complete system for 3D modeling from a sequence of range images acquired during an arbitrary flight of a 3D ToF sensor. First, comprehensive preprocessing steps are performed to improve the quality of the range images. An initial estimate of the transformation between two 3D point clouds, computed from two consecutive range images respectively, is obtained through feature extraction and tracking based on the three kinds of images delivered by the 3D sensor. During the initial estimation, a RANSAC sampling algorithm is used to filter out outlier correspondences. Finally, the transformation is further optimized by registering the two 3D point clouds using a robust variant of the Iterative Closest Point (ICP) algorithm, the so-called Picky ICP. Extensive experimental results are provided in the paper and show the efficiency and robustness of the proposed system.
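The registration core of such a pipeline is the ICP loop. The following is a minimal point-to-point ICP sketch in NumPy/SciPy; the quantile-based trimming is only a crude stand-in for the outlier handling that distinguishes Picky ICP, and all parameters are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t with R @ P_i + t ~ Q_i."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def icp(source, target, iters=30, trim=0.8):
    """Basic ICP between two (N, 3) point clouds."""
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iters):
        dist, idx = tree.query(src)
        keep = dist <= np.quantile(dist, trim)   # reject worst pairs
        R, t = best_rigid_transform(src[keep], target[idx[keep]])
        src = src @ R.T + t
    return src
```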
A Constrained Alternating Optimization Framework for Feature Matching
Xiabi Liu, Yunde Jia
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/172

This paper proposes a constrained alternating optimization framework to tackle the feature matching problem with partial matching and multiple matching. We model the difference between paired features as the result of a transformation followed by an uncertainty distribution. Based on this modeling, transformation estimation and feature matching are performed alternately, starting from an initial matching: the transformation is updated according to the matching, and the matching is updated according to the transformation and the uncertainty distribution. A pruning operation is further presented to reduce the search space of the initial matching. Within the proposed framework, we develop a B-spline curve feature matching algorithm for hand-gesture based text input, and a line feature matching algorithm which is tested in three applications: model-based recognition, image registration, and stereo matching. The experimental results for the two algorithms are reported.

A flexible multi-server platform for distributed video information processing
Yao Wang, Linmi Tao, Qiang Liu, Yanjun Zhao, Guangyou Xu
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/173

The complexity of recent computer vision systems calls for an infrastructure for distributed processing. This paper presents a platform acting as a framework for video information processing applications to plug into and contribute to the "intelligence" of the system. The platform is composed of a set of servers which collaborate with each other to complete tasks like video capture, transmission, buffering and synchronization. A user-side library is also provided to keep the application-platform interface simple. We show the usefulness of the platform with a motion detection application in our On-The-Spot Archiving system. As the platform is flexible, its use is not limited to one or two systems.

A Layered Active Memory Architecture for Cognitive Vision Systems
Ilias Kolonias, William Christmas, Josef Kittler
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/174

Recognising actions and objects from video material has attracted growing research attention and given rise to important applications. However, injecting cognitive capabilities into computer vision systems requires an architecture more elaborate than the traditional signal-processing paradigm for information processing. Inspired by biological cognitive systems, we present a memory architecture enabling cognitive processes (such as selecting the processes required for scene understanding, layered storage of data for context discovery, and forgetting redundant data) to take place within a computer vision system. This architecture has been tested by automatically inferring the score of a tennis match, and experimental results show a significant improvement in the overall vision system performance, demonstrating that managing visual data in a manner more akin to that of the human brain is a key factor in improving the efficiency of computer vision systems.

A Motion Calculation System Based on Background Motion Modeling
Tieqi Chen, Yi Lu Murphey, Grant Gerhart, Robert Karlsen
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/175

Motion calculation is often a necessary preprocessing step for moving object detection and tracking. It is a challenging task when the images are taken in outdoor scenes with cameras mounted on a moving vehicle. In this paper we present an accurate and efficient motion calculation system. Accuracy is achieved by estimating background motion and eliminating those pixels whose motion is similar to the background motion, and by calculating motion vectors using an affine image transformation with a Newton-Raphson style search method at subpixel resolution. Efficiency is achieved by concentrating on regions of interest through a coarse-to-fine process.
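A rough sketch of the background-motion idea, using OpenCV (assuming the opencv-python package): estimate a dominant affine motion from tracked features and flag points that deviate from it. Note that the paper's own estimator uses an affine transformation with a Newton-Raphson style subpixel search rather than the RANSAC fit used here.

```python
import cv2
import numpy as np

def moving_point_candidates(prev_gray, next_gray, resid_thresh=2.0):
    """Estimate the dominant (background) motion between two frames
    and return tracked points that move differently from it."""
    pts = cv2.goodFeaturesToTrack(prev_gray, 400, 0.01, 8)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1
    p0, p1 = pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
    M, _ = cv2.estimateAffinePartial2D(p0, p1)   # robust background motion
    pred = p0 @ M[:, :2].T + M[:, 2]             # where background points should go
    resid = np.linalg.norm(p1 - pred, axis=1)
    return p0[resid > resid_thresh]              # candidate moving-object points
```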
A Multi-Cue-Based Human Body Tracking System
Yihua Xu, Xiao Deng, Yunde Jia
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/176

This paper presents a real-time vision-based system for tracking the human upper body with both color images and depth maps. We combine color-histogram-based particle filtering and the mean shift algorithm to track the face and hands, and estimate the other body parts from human kinematics. A multi-cue approach that integrates depth information with the color-based method is introduced to handle the rapid and complex motions of human hands. Real-time depth is recovered with a simple hardware configuration, which makes our system easy to deploy in many real-world applications such as digital entertainment. The system runs at 20 fps for images of 320x240 pixels on a 2.8 GHz PC.

A Reactive Vision System: Active-Dynamic Saliency
Andrew Dankers, Nick Barnes, Alex Zelinsky
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/178

We develop an architecture for the reactive visual analysis of dynamic scenes. We specify a minimal set of system features based upon biological observations, and implement these features on a processing network built around an active stereo vision mechanism. Active rectification and mosaicing allow static stereo algorithms to operate on the active platform. Foveal zero-disparity operations permit attended object extraction and ensure coordinated stereo fixation upon visual surfaces. Active-dynamic inhibition of return and task-dependent biasing result in a flexible, preemptive and retrospective system that responds to unique visual stimuli and is capable of top-down modulation of attention towards regions and cues relevant to tasks.

A Real-time Algorithm for Finger Detection in a Camera Based Finger-Friendly Interactive Board System
Ye Zhou, Gerald Morrison
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/179

This paper proposes an approach to finger detection for a type of camera-based interactive board. In this approach, instead of using a global search, finger finding is confined to a stripe that is the projection of the edge of the board onto the image plane with respect to a camera. The region where a finger intersects the stripe is first detected and segmented from the background. A region growing algorithm is then applied to segment the whole finger. This approach can detect multiple targets and can be implemented efficiently, processing 30 or more 640x120 images per second even on a cheap DSP.

A Real-time Visual Attention System Using Integral Images
Simone Frintrop, Maria Klodt, Erich Rome
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/180

Systems which simulate human visual attention are suited to quickly finding regions of interest in images and are an interesting preprocessing method for a variety of applications. However, the scale-invariant computation of features in several feature dimensions is still too time-consuming to be applied to video streams at frame rate, which is necessary for many practical applications. As a consequence, current implementations of attention systems often compromise between the accuracy and the speed of computing a focus of attention in order to reduce computation time. In this paper, we present a method for achieving fast, real-time-capable system performance with high accuracy. The method involves smart feature computation techniques based on integral images. An experimental validation of the speed gain of the attention system VOCUS is also provided. The real-time capability of the optimized VOCUS system has already been demonstrated in robotic applications.
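Integral images are the key trick named in the last abstract: once a summed-area table is built, any axis-aligned box sum costs four array lookups regardless of its size, which is what makes scale-invariant feature maps cheap. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended, so that
    ii[r, c] holds the sum of img[:r, :c]."""
    ii = np.cumsum(np.cumsum(img, axis=0, dtype=np.int64), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] in O(1), independent of box size."""
    return (ii[bottom, right] - ii[top, right]
            - ii[bottom, left] + ii[top, left])
```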
A System for Continuous Learning of Visual Concepts
Danijel Skocaj, Gregor Berginc, Barry Ridge, Ales Stimec, Matjaz Jogan, Ondrej Vanek, Ales Leonardis, Manuela Hutter, Nick Hawes
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/181

We present an artificial cognitive system for learning visual concepts. It comprises vision, communication and manipulation subsystems, which provide visual input, enable verbal and non-verbal communication with a tutor, and allow interaction with a given scene. The main goal is to learn associations between automatically extracted visual features and words that describe the scene in an open-ended, continuous manner. In particular, we address the problem of cross-modal learning of visual properties and spatial relations. We introduce and analyse several learning modes requiring different levels of tutor supervision.

A vision based motion interface for mobile phones
Mark Barnard, Jari Hannuksela, Pekka Sangi, Janne Heikkilä
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/183

In this paper we present an interface system for the control of mobile devices based on motion, using existing camera technology. In this system the user can control the phone's functions by performing a series of motions with the camera, where each command is defined by a unique series of these motions. A sequence of motion features is produced using the phone's camera; these characterise the translational motion of the phone. The sequences of motion features are classified using _Hidden Markov Models_ (HMMs). In order to improve the robustness of the system, the results of this classification are then filtered using a likelihood ratio and the entropy of the sequence to reject possibly incorrect sequences. When tested on 570 previously unseen motion sequences, the system incorrectly classified only 5 sequences.
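A sketch of the HMM classification-with-rejection scheme, assuming the hmmlearn package; the state count, the rejection margin and the Gaussian observation model are illustrative choices, and the entropy-based filter mentioned in the abstract is omitted.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # assumes the hmmlearn package

def train_models(sequences_per_class, n_states=5):
    """One HMM per motion command; each class's training sequences are
    concatenated, with their lengths passed separately."""
    models = {}
    for label, seqs in sequences_per_class.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        models[label] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def classify(models, seq, reject_margin=5.0):
    """Pick the class with the highest log-likelihood; reject when the
    best and second-best scores are too close (a simple likelihood-ratio
    test standing in for the paper's filtering)."""
    scores = sorted(((m.score(seq), c) for c, m in models.items()), reverse=True)
    (best, label), (second, _) = scores[0], scores[1]
    return label if best - second > reject_margin else None
```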
Active Vision-based Localization For Robots In A Home-Tour Scenario
Falk Schubert, Thorsten Spexard, Marc Hanheide, Sven Wachsmuth
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/184

Self-localization is a crucial task for mobile robots. It is not only a requirement for autonomous navigation but also provides contextual information to support human-robot interaction (HRI). In this paper we present an active vision-based localization method for integration into a complex robot system working in human interaction scenarios (e.g. a _home-tour_) in a real-world apartment. The holistic features used are robust to illumination and structural changes in the scene. The system uses only a single pan-tilt camera, shared between different vision applications running in parallel to reduce the number of sensors. Additional information from other modalities (like laser scanners) can be used, profiting from integration into an existing system. The camera view can be actively adapted, and the evaluation showed that different rooms can be discerned.

Adaptive Image Sampling and Windows Classification for On-board Pedestrian Detection
David Gerónimo, Angel D. Sappa, Antonio López, Daniel Ponsa
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/185

On-board pedestrian detection is at the frontier of the state of the art, since it implies processing outdoor scenarios from a mobile platform and searching for aspect-changing objects in cluttered urban environments. The most promising approaches involve classifiers based on feature selection and machine learning. However, they use a large number of features, which compromises real-time performance. Thus, methods for running the classifiers on only a few image windows must be provided. In this paper we contribute to both aspects, proposing a camera pose estimation method for adaptive sparse image sampling, as well as a classifier for pedestrian detection based on Haar wavelets and edge orientation histograms as features and AdaBoost as the learning machine. Both proposals are compared with relevant approaches in the literature, showing comparable results while reducing processing time by a factor of four for the sampling task and by a factor of ten for classification.

An Adaptive Vision System for Tracking Soccer Players from Various Camera Settings
Suat Gedikli, Jan Bandouch, Nico von Hoyningen-Huene, Bernhard Kirchlechner, Michael Beetz
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/186

In this paper we present Aspogamo, a vision system capable of estimating the motion trajectories of soccer players taped on video. The system performs well in a multitude of application scenarios because of its adaptivity to various camera setups, such as single or multiple camera settings, static or dynamic ones. Furthermore, Aspogamo can directly process image streams taken from TV broadcasts and extract all valuable information despite scene interruptions and cuts between different cameras. The system achieves a high level of robustness through the use of model-based vision algorithms for camera estimation and player recognition, and a probabilistic multi-player tracking framework capable of dealing with the occlusion situations typical in team sports. The continuous interplay between these submodules adds to both the reliability and the efficiency of the overall system.

An Un-awarely Collected Real World Face Database: The LabName-Door Face Database
Hazim Kemal Ekenel, Rainer Stiefelhagen
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/188

In this paper we present a new face database that has been collected under real-world conditions and without the collaboration of the individuals whose images are being captured. The images in the database are recorded with a zoom camera monitoring the door of the laboratory. The capture software developed processes each frame and, whenever it detects a face as well as the eyes, saves the frame to the database. Face recognition software accompanies the image acquisition system to help label the identities of the individuals in the database. Recordings were made for six months, and in this way tens of thousands of pictures of more than 100 individuals were collected. From this set, approximately 33000 images of 30 people have been made available to the public. To give an idea of the difficulty of face recognition under such a scenario, well-known face recognition algorithms are tested on the collected database.
Architecture and Tracking Algorithms for a Distributed Mobile Industrial AR System
Reinhard Koch, Jan-Friso Evers-Senne, Ingo Schiller, Harald Wuest, Didier Stricker
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/189

In Augmented Reality applications, a 3D object is registered with a camera and visual augmentations of the object are rendered into the user's field of view with a head-mounted display. For correct rendering, the 3D pose of the user's view with respect to the 3D object must be registered and tracked in real time, which is a computationally intensive task. This contribution describes a distributed system that tracks the 3D camera pose and renders images on a lightweight mobile frontend user-interface system. The frontend system is connected by WLAN to a backend server that takes over the computational burden of real-time tracking. We describe the system architecture and the tracking algorithms of our system.

Assisting persons with dementia during handwashing using a partially observable Markov decision process
Jesse Hoey, Axel von Bertoldi, Pascal Poupart, Alex Mihailidis
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/190

This paper presents a real-time system to assist a person with dementia in washing their hands. Assistance is given in the form of verbal and/or visual prompts, or through the enlistment of a human caregiver's help. The system uses only video inputs, and combines a Bayesian sequential estimation framework for tracking hands and towel with a decision-theoretic framework for computing policies of action, specifically a partially observable Markov decision process (POMDP). A key element of the system is its ability to estimate and adapt to user states, such as awareness, responsiveness and overall dementia level. We demonstrate the system in a set of simulation experiments, and we show examples of real-time interactions with actors.
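At the heart of any discrete POMDP controller of this kind is the Bayesian belief update over the hidden states. A minimal sketch of that update follows; the model arrays are illustrative placeholders, not the paper's handwashing model.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update for a discrete POMDP:
    b'(s') is proportional to O[a][s', o] * sum_s T[a][s, s'] * b[s].
    b is the current belief over states, a the action taken, o the
    observation received; T[a] and O[a] are the transition and
    observation matrices for action a (illustrative shapes)."""
    bp = O[a][:, o] * (b @ T[a])
    return bp / bp.sum()
```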
Attention and Visual Search: Active Robotic Vision Systems that Search
John K. Tsotsos, Ksenia Shubina
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/191

Visual attention is a multi-faceted phenomenon, playing different roles in different situations and for different processing mechanisms. Regardless, attention is a mechanism that optimizes the search processes inherent in vision. This perspective leads to a sound theoretical foundation for studies of attention in both machines and the brain. The development of this foundation and the many ways in which attentional processes manifest themselves will be overviewed. One particular example of a practical robotic vision system that employs some of these attentional processes will be described. A difficult problem for robotic vision systems is visual search for a given target in an arbitrary 3D space. A solution to this problem will be described that optimizes the probability of finding the target given a fixed cost limit, in terms of the total number of actions the robot requires to find its visual target. A robotic realization will be shown.

Automatic Calibration of an Urban Video Surveillance System through the Observation of Zebra Crossings
Alberto Broggi, Alessandra Fascioli, Rean Isabella Fedriga, Stefano Ghidoni
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/195

In this paper, a method for the automatic calibration of a camera stereo pair through the observation of zebra crossing signs is described. It is based on the well-known consideration that it is possible to obtain information about lens distortion and camera orientation by observing how a known pattern appears in the image. Moreover, a major advantage of this system is that it does not require any ad-hoc calibration pattern, because it exploits zebra crossing signs, a pattern usually present in images used for monitoring pedestrians crossing a road. To achieve this goal, well-known techniques for removing lens distortion and perspective effects are combined with new methods for locating calibration points on the available pattern and, finally, for evaluating the camera position.

Autonomic Computer Vision Systems
James L. Crowley, Daniela Hall, Remi Emonet
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/196

Most computer vision systems perform well under controlled laboratory conditions, but require lengthy set-up and "tuning" by experts when installed under new operating conditions. Unfortunately, for most real applications of computer vision, the operating conditions change frequently. These changes degrade system performance and can even cause complete system failure, requiring intervention by a trained engineer. The requirement for installation and frequent maintenance by highly trained experts seriously inhibits the commercial application of computer vision systems. In this talk we discuss ways in which autonomic computing can reduce the cost of installation and configuration, as well as enhance reliability, for practical computer vision systems. We begin by reviewing the origins of autonomic computing. We then describe the design of a computer vision system as a software component within a layered service architecture. We describe techniques for the regulation of internal parameters, error detection and recovery, self-description, and self-configuration for vision systems. These methods are illustrated with results from the IST projects FAME, CAVIAR and CHIL.
Binarized Eigenphases for Limited Memory Face Recognition Applications
Naser G. Zaeri, Farzin Mokhtarian, Abdallah Cherri
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/197

Most of the algorithms proposed for face recognition involve a considerable amount of calculation and hence cannot be used on devices with limited memory. In this paper, we propose a novel solution to the efficient face recognition problem for systems that use low-memory devices. The new technique applies principal component analysis to the binarized phase spectrum of the Fourier transform of the covariance matrix constructed from the MPEG-7 Fourier Feature Descriptor vectors of the images. The binarization step applied to the phases adds many interesting advantages to the system. It is shown that the proposed technique maximizes the recognition rate while achieving substantial savings in computational time, when compared to other known systems.

Bridging the Web and Perceptual User Interfaces
Ingo Lütkebohle, Sven Wachsmuth, Franz Kummert
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/199

We present a system that bridges the perceptual user interface paradigm and web applications, and thus allows us to control a web application through hand gestures. It exemplifies a general interaction architecture that enables multi-modal interaction for arbitrary, unchanged web applications and thus makes a large number of real-world applications available for multi-modal interaction. In addition, we demonstrate how knowledge about the user interface provides a powerful constraint for pattern analysis. First evaluation results for the approach are given with respect to an image-viewing web application.

Central Catadioptric Camera Calibration using Planar Objects
Cheng-I Chen, Yong-Sheng Chen
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/200

Central catadioptric cameras combine lenses with mirrors to enlarge the field of view while keeping a single effective viewpoint. In this paper we propose a novel method for calibrating the intrinsic parameters of central catadioptric cameras using a planar object. Based on the viewing sphere model, we can warp a portion of the catadioptric image to an image captured by a virtual perspective camera with given intrinsic and extrinsic parameters. We show that placing the planar object several times around the catadioptric camera is equivalent to placing the same object at different poses relative to a static virtual perspective camera. Therefore, the homography method can be applied to calculate the relative poses of the planar object as well as the projection error of feature points on the planar object. By minimizing the projection error, we can obtain optimized intrinsic parameters of the central catadioptric camera. Besides its simplicity, experiments with simulated and real image data clearly demonstrate the high robustness and accuracy of the proposed calibration method.
Classifier training based on synthetically generated samples
Hélène Hoessler, Christian Wöhler, Frank Lindner, Ulrich Kreßel
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/201

In most image classification systems, the amount and quality of the training samples used to represent the different pattern classes are important factors governing recognition performance. Hence, it is usually necessary to acquire a representative set of training samples through data acquisition in real-world environments. Such procedures may require considerable effort and, furthermore, often generate a training set which is unbalanced with respect to the number of available samples per class. In this contribution we consider classification tasks for which each real-world training sample is derived from an ideal class representative which undergoes a geometric and photometric transformation. This transformation depends on system-specific influencing quantities of the image formation process, such as illumination, the characteristics of the sensor and optical system, or camera motion. The parameters of the transformation model are learned from object classes for which a large number of real-world samples are available. For each individual real-world sample, a set of model parameters is derived by correspondingly fitting the transformed ideal sample to the observed sample. The obtained probability distribution of model parameters is used to generate synthetic sample sets for all regarded pattern classes. This training approach is applied to a vehicle-based vision system for traffic sign recognition. Our experimental evaluation on a large set of real-world test data demonstrates that the classification rates obtained with classifiers trained on synthetic samples are comparable to those obtained with real-world training data.
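The generation step of such an approach can be sketched as sampling transformation parameters and applying them to an ideal template, e.g. with OpenCV. In the paper the parameter distributions are learned from real samples; the Gaussian ranges below are merely illustrative.

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)

def synthesize_sample(ideal):
    """Draw one synthetic training sample from an ideal class
    representative (e.g. a traffic-sign template) by applying a random
    geometric and photometric transformation."""
    h, w = ideal.shape[:2]
    angle = rng.normal(0.0, 5.0)             # rotation in degrees
    scale = rng.normal(1.0, 0.05)
    tx, ty = rng.normal(0.0, 2.0, size=2)    # translation in pixels
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += (tx, ty)
    geo = cv2.warpAffine(ideal, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
    gain, bias = rng.normal(1.0, 0.15), rng.normal(0.0, 10.0)
    noisy = gain * geo.astype(np.float32) + bias + rng.normal(0, 3, geo.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```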
Cross-Modal Learning of Visual Categories using Different Levels of Supervision
Mario Fritz, Geert-Jan M. Kruijff, Bernt Schiele
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/205

Today's object categorization methods use either supervised or unsupervised training methods. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive due to their potential to use far larger amounts of unlabeled training data. This paper proposes a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome the inherent limitations of purely unsupervised training. The method uses a unified and scale-invariant object representation that allows labeled as well as unlabeled information to be handled in a coherent way. One of the potential settings is to learn object category models from many unlabeled observations and a few dialogue interactions that can be ambiguous or even erroneous. First experiments demonstrate the ability of the system to learn meaningful generalizations across objects from just a few dialogue interactions.

Data fusion and eigenface based tracking dedicated to a Tour-Guide Robot
Thierry Germa, Ludovic Brèthes, Frédéric Lerasle, Thierry Simon
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/206

This article presents a key scenario of human-robot interaction for our tour-guide robot. From this scenario, three visual modalities the robot deals with have been outlined, namely the "search for visitors" attending the exhibition, "proximal interaction" through the robot interface, and the "guidance mission". The paper focuses on the last two, which involve face recognition and visual data fusion in a particle filtering framework. Evaluations on key sequences in a human-centred environment show the tracker's robustness to background clutter, sporadic occlusions and groups of persons. The tracker is able to cope with target loss by detecting the target and re-initializing automatically, thanks to the face recognition outcome. Moreover, the multi-cue association proved to be more robust to clutter than any of the cues individually.
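The particle filtering framework referred to above reduces to a predict-weight-resample loop. A generic bootstrap-filter sketch follows; the random-walk motion model and the `likelihood` callback (the place where a system like this would fuse its visual cues and face-recognition output) are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood, motion_std=5.0):
    """One predict-weight-resample cycle of a bootstrap particle filter.
    `particles` is an (N, d) array of target-state hypotheses (e.g. face
    position); `likelihood` maps states to observation likelihoods."""
    # Predict: random-walk motion model (an illustrative choice).
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight each hypothesis by its measurement likelihood.
    weights = weights * likelihood(particles)
    weights /= weights.sum()
    # Resample to concentrate particles on likely states.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```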
Digitalisation of Warped Documents Supported by 3D-Surface Reconstruction
Erik Lilienblum, Bernd Michaelis
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/207

The high-quality digitalisation of warped documents is still a big problem for most scanner technologies. The presented work is a contribution towards developing a new technique for handling this problem. The basic principle of the proposed method is a special kind of light sectioning that works with comparatively broad stripe lighting and one additional matrix camera. We reconstruct the 3D surface of the document by simply capturing an image sequence of the stripe lighting of a common book scanner during the scanning process. Based on a surface model, we transform the warped document into a plane. The result is a two-dimensional output that is a nearly distortion-free digital copy of the original warped document.

EyeScreen: A Vision-Based Desktop Interaction System
Yihua Xu, Jingjun Lv, Shanqing Li, Yunde Jia
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/211

EyeScreen provides a natural HCI interface using vision-based hand tracking and gesture recognition techniques. Multi-view video images captured from two cameras facing a computer screen are used to track and recognize finger and hand motions. Finger tracking is achieved by skin color detection and particle filtering, and is greatly enhanced by the proposed screen background subtraction method, which removes the screen images in advance. Finger clicks on the screen can also be detected from the multi-view information. Gesture recognition based on binocular vision is presented to improve the recognition rate. The experimental results show that EyeScreen is able to perform natural and robust interaction in a desktop environment.

Face Alignment by 2.5D Active Appearance Model Optimized by Simplex
Abdul Sattar, Yasser Aidarous, Sylvain Le Gallou, Renaud Seguier
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/212

In this paper we propose an efficient algorithm for aligning faces in real time, based on a 2.5D Active Appearance Model (AAM). The main objective is a robust, fast and memory-efficient application suitable for embedded systems, so that they can align the pose rapidly while using less memory. The classical AAM is a memory-intensive algorithm; consequently, transferring this stored memory to an embedded system makes it time-consuming as well. Our 2.5D AAM is generated by taking 3D landmarks from the frontal and profile views and 2D texture only from the frontal view of the face image. Moreover, we propose the Nelder-Mead simplex technique for the face search. It does not require much memory, making it suitable for embedded systems by eliminating excess memory and access-time requirements. We illustrate the 2.5D AAM optimized by the simplex for pose estimation and test it on three databases: M2VTS, synthetic images and webcam images. The results validate our combination of the simplex and the AAM in 2.5D.
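The appeal of the Nelder-Mead simplex here is that it is derivative-free and needs only cost-function evaluations, e.g. via SciPy. The `model` object and its `residual` method below are hypothetical stand-ins for an AAM implementation; only the choice of optimizer reflects the paper.

```python
import numpy as np
from scipy.optimize import minimize

def align_face(image, model, p0):
    """Fit appearance-model parameters by direct search. `model` is a
    hypothetical object whose residual(image, p) returns the pixel
    error between the image and the model instance synthesised for p."""
    cost = lambda p: np.sum(model.residual(image, p) ** 2)
    res = minimize(cost, p0, method="Nelder-Mead",
                   options={"xatol": 1e-3, "fatol": 1e-3})
    return res.x
```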
Fast Outdoor Robot Localization Using Integral Invariants
Christian Weiss, Andreas Masselli, Hashem Tamimi, Andreas Zell
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/213

Global integral invariant features have been shown to be useful for robot localization in indoor environments. In this paper, we present a method that uses integral invariants for outdoor environments. To make the integral invariant features more distinctive for outdoor images, we first split the image into a grid of subimages. Then we calculate integral invariants for each grid cell individually and concatenate the results to get the feature vector for the image. Additionally, we combine this method with a particle filter to improve the localization results. We compare our approach to a Scale Invariant Feature Transform (SIFT)-based approach on images of two outdoor areas and under different illumination conditions. The results show that the SIFT approach is more exact, but the grid integral invariant approach is faster and allows localization in significantly less than one second.

First steps towards an intentional vision system
Julian Eggert, Sven Rebhan, Edgar Koerner
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/214

Contrary to many standard vision systems, which proceed in a cascaded feedforward manner, imposing a fixed order on the sequence of visual operations, such as detection preceding segmentation and classification, we develop here the idea of a vision system that flexibly controls the order and accessibility of visual processes during operation. Vision is hereby understood as the dynamic process of adapting visual parameters and modules as a function of underlying goals or intentions. This perspective requires a specific architectural organization, since vision then becomes a continuous balance between sensory stimulation and internally generated information. In this paper we present the concept and the necessary main ingredients, and show first steps towards the implementation of a real-time intentional vision system.

Free Space Estimation for Autonomous Navigation
Nicolas Soquet, Mathias Perrollaz, Raphaël Labayrade, Didier Aubert
https://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/215

One of the issues in autonomous navigation is free space estimation. This paper presents an original framework and a method for the extraction of such an area using a stereovision system. The _v-disparity_ algorithm is extended to provide a reliable and precise road profile on all types of roads. The free space is estimated by classifying the pixels of the disparity map. This classification is performed using the road profile and the _u-disparity_ image. Each stage of the algorithm is presented and experimental results are shown.
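The _v-disparity_ representation named above is simple to construct: for each image row, histogram the disparities occurring in it; a planar road then appears as a slanted line that can be extracted (e.g. by a Hough transform) to recover the road profile. A minimal sketch, with the disparity range and the line-extraction step left as assumptions:

```python
import numpy as np

def v_disparity(disp, d_max=64):
    """Accumulate, for every image row v, a histogram of the
    disparities occurring in that row of the disparity map."""
    h = disp.shape[0]
    vdisp = np.zeros((h, d_max), dtype=np.int32)
    for v in range(h):
        row = disp[v]
        valid = (row >= 0) & (row < d_max)
        vdisp[v] = np.bincount(row[valid].astype(int), minlength=d_max)[:d_max]
    return vdisp
```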
While new algorithms and techniques have paved the way for important developments, the majority of vision systems are still designed and integrated in a way that is very primitive by modern software engineering standards. This paper describes the architecture of an active vision system that has been conceived to ease the concurrent utilization of the system by several visual tasks. We describe in detail the functional architecture of the system and provide several solutions to the problem of sharing the visual attention of the system when several visual tasks need to be interleaved. The system's design hides this complexity from client processes, which can be designed as if they were exclusive users of the visual system. Some preliminary results on a real robotic platform are also provided.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/220GPAPF: A Combined Approach for 3D Body Part Tracking2019-06-05T13:14:22+00:00Leonid Raskinojs.ub@uni-bielefeld.deMichael Rudzskyojs.ub@uni-bielefeld.deEhud Rivlinojs.ub@uni-bielefeld.deIn this paper we present a combined approach for body part tracking in 3D using multiple cameras, called GPAPF. This approach combines annealed particle filtering, which has been shown to be an effective tracker for body parts, with a Gaussian Process Dynamical Model, which is used to reduce the dimensionality of the problem. That reduction improves the tracker's performance and increases its stability and its ability to recover after losing the target. We also compare the GPAPF tracker with the annealed particle filter and show that our tracker has better performance, even for low frame rate sequences.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/221How to formulate image processing applications?2019-06-05T13:14:21+00:00Arnaud Renoufojs.ub@uni-bielefeld.deRégis Clouardojs.ub@uni-bielefeld.deMarinette Revenuojs.ub@uni-bielefeld.deThis paper presents a system dedicated to the formulation of image processing applications for inexperienced users. We propose models and their formalization through ontologies that identify and organize the necessary and sufficient information to design such applications. We also describe the interaction mechanisms we developed to help the user give a good formulation of the considered image processing problem.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/222Implementation of an Affine-Covariant Feature Detector in Field-Programmable Gate Arrays2019-06-05T13:14:20+00:00Cristina Cabaniojs.ub@uni-bielefeld.deW. James MacLeanojs.ub@uni-bielefeld.deThis article describes an FPGA-based implementation of the Harris-Affine feature detector introduced by Mikolajczyk and Schmid. The system is implemented on the Transmogrifier-4, a prototyping platform that includes four Altera Stratix S80 FPGAs and NTSC/VGA video interfaces.
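As background for the detector named above: Harris-Affine builds on the standard Harris corner measure R = det(M) - k (trace M)^2, computed from a smoothed second-moment matrix M. The following minimal NumPy sketch illustrates only that underlying measure, not the authors' FPGA pipeline; the smoothing scale sigma and the constant k are illustrative defaults.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.5, k=0.04):
    """Per-pixel Harris corner response R = det(M) - k * trace(M)^2.
    sigma and k are illustrative defaults, not values from the paper."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)                 # image gradients (rows, cols)
    Sxx = gaussian_filter(Ix * Ix, sigma)     # smoothed second-moment entries
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det_m = Sxx * Syy - Sxy ** 2
    trace_m = Sxx + Syy
    return det_m - k * trace_m ** 2           # large positive R marks corners
```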
The system achieves a speed 90-9000 times that of an equivalent software implementation, allowing it to process standard video (640 x 480 pixels) at 30 frames per second.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/224Individual Animal Identification using Visual Biometrics on Deformable Coat Patterns2019-06-05T13:14:18+00:00Tilo Burghardtojs.ub@uni-bielefeld.deNeill Campbellojs.ub@uni-bielefeld.deIn this paper we propose and evaluate an approach to the so far unsolved problem of robust _individual_ identification of patterned animals based on video filmed in widely unconstrained, natural habitats. Experimental results are presented for a prototype system trained on African penguins operating in a real-world animal colony of thousands. The system exploits the individuality of Turing-like camouflage patterns as identity cues since, for a wide range of species, these contain highly unique and compact distributions of phase singularities. The key problem solved in this paper is a distortion-robust detection and individual comparison of non-linearly _deforming_ animals. We address the problem using a coarse-to-fine methodology that task-specifically extends and combines vision techniques in a three-stage approach: 1) Using a recently suggested integration of multiple instances of appearance detectors (Viola-Jones) and sparse feature trackers (Lucas-Kanade), a coarse, robust real-time detection of animals in appropriate poses is achieved. 2) An estimate of the 3D-deformed pose is derived by a fast, guided search on a precalculated pose-configuration model, which we refer to as a _Feature Prediction Tree_ and which is learned off-line based on an animated, deformable 3D species model. The estimate is refined using bundle adjustment posing a polygonal model into the scene. Subsequently, a back-projection of the visible animal surface yields a normalised 2D texture map. 3) An extended variant of _Shape Context_ descriptors (built from filter-extracted phase singularities of characteristic texture areas) is employed as the biometric template. Finally, a distortion-robust identification is achieved by solving associated bipartite graph matching tasks for pairs of these descriptors.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/225Integrating Behavior-based Prediction for Tracking Vehicles in Traffic Videos2019-06-05T13:14:17+00:00Ales Fexaojs.ub@uni-bielefeld.deHans-Hellmut Nagelojs.ub@uni-bielefeld.deRoad vehicles usually remain within marked lanes. Such a hypothesis reflects a longer temporal perspective than the frequently used assumption that a vehicle continues with the currently estimated speed and direction. We study the first, more general, hypothesis in particular to track road vehicles through extended periods of occlusion _without_, however, relying on 3D models of occluding foreground bodies. A potential onset of occlusion is detected by a fuzzy conjunction of large, "facet-specific" color changes and a low ratio of the number of pixels with a prediction-compatible Optical-Flow (OF) vector relative to the total number of pixels within a facet of the 3D-polyhedral vehicle model.
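As a rough illustration of the occlusion cue just described, the sketch below combines, for one model facet, a colour-change score with the fraction of pixels whose optical flow is compatible with the prediction, using a fuzzy conjunction. The membership functions, the tolerance, and the choice of the product t-norm are assumptions made for the sketch, not values from the paper.

```python
import numpy as np

def occlusion_onset_score(color_change, flow_vectors, predicted_flow, tol=1.0):
    """Fuzzy conjunction of 'large colour change' and 'low ratio of
    prediction-compatible optical-flow pixels' for one model facet.
    Memberships and tolerance are illustrative assumptions."""
    # Fraction of facet pixels whose flow agrees with the predicted flow
    err = np.linalg.norm(flow_vectors - predicted_flow, axis=-1)
    compatible_ratio = float(np.mean(err < tol))
    # Fuzzy memberships, clamped to [0, 1]
    mu_color_change = float(np.clip(color_change, 0.0, 1.0))
    mu_low_ratio = 1.0 - compatible_ratio
    # Conjunction via the product t-norm; min() is another common choice
    return mu_color_change * mu_low_ratio
```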
Experimental results for the entire approach are presented.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/226Integrating Face-ID into an Interactive Person-ID Learning System2019-06-05T13:14:16+00:00Stephan Könnojs.ub@uni-bielefeld.deHartwig Holzapfelojs.ub@uni-bielefeld.deHazim Kemal Ekenelojs.ub@uni-bielefeld.deAlex Waibelojs.ub@uni-bielefeld.deAcquiring knowledge about persons is a key functionality for humanoid robots. Envisioning a robot that can provide personalized services, the system needs to detect, recognize and memorize information about specific persons. To reach this goal we present an approach for extensible person identification based on visual processing, as one component of an interactive system able to acquire information about persons interactively. This paper describes an approach for face-ID recognition and identification over image sequences and its integration into the interactive system. We compare sequence-based hypotheses against single-image hypotheses and against a standard approach, and show improvements in both cases. We furthermore explore the usage of confidence scores to allow other system components to estimate the accuracy of face-ID hypotheses.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/227Intelligent modification for the daltonization process of digitized paintings2019-06-05T13:14:15+00:00Christos-Nikolaos Anagnostopoulosojs.ub@uni-bielefeld.deGeorge Tsekourasojs.ub@uni-bielefeld.deIoannis Anagnostopoulosojs.ub@uni-bielefeld.deChristos Kalloniatisojs.ub@uni-bielefeld.deDaltonization is a procedure for adapting colors in an image or a sequence of images to improve color perception by a color-deficient viewer. In this paper an intelligent/enhanced daltonization method for individuals suffering from protanopia is proposed. The algorithm implements logical image masking in order to modify the colors that are confused and to preserve those colors that are perceived correctly. The proposed method iteratively modifies the daltonization parameters after the initial conditions are provided. The distinctive characteristic of the proposed approach is that, when it is combined with a color-checking module, optimum daltonization parameters are effectively identified. Examples are provided in detail, as well as screenshots of the algorithm applied to digitized paintings/artworks.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/228Learning Responses to Visual Stimuli: A Generic Approach2019-06-05T13:14:14+00:00Liam Ellisojs.ub@uni-bielefeld.deRichard Bowdenojs.ub@uni-bielefeld.deA general framework for learning to respond appropriately to visual stimuli is presented. By hierarchically clustering percept-action exemplars in the action space, contextually important features and relationships in the perceptual input space are identified and associated with response models of varying generality. Searching the hierarchy for a set of best matching percept models yields a set of action models with likelihoods.
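To make the hierarchy search concrete, here is a minimal sketch of descending such a percept-action hierarchy and collecting action models weighted by percept similarity; the tree layout and the Gaussian similarity kernel are stand-ins invented for the sketch, not the paper's matching criterion.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Node:
    prototype: np.ndarray              # percept prototype of this cluster
    action_model: object               # response model at this generality
    children: list = field(default_factory=list)

def match_hierarchy(node, percept, results=None):
    """Collect (action model, likelihood) pairs over the whole hierarchy,
    scoring each node with a hypothetical Gaussian similarity kernel."""
    if results is None:
        results = []
    likelihood = float(np.exp(-np.sum((percept - node.prototype) ** 2)))
    results.append((node.action_model, likelihood))
    for child in node.children:
        match_hierarchy(child, percept, results)
    return sorted(results, key=lambda r: -r[1])   # best matches first
```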
By posing the problem as one of cost surface optimisation in a probabilistic framework, a particle-filter-inspired forward exploration algorithm is employed to select actions from multiple hypotheses that move the system toward a goal state and to escape from local minima. The system is quantitatively and qualitatively evaluated in both a simulated shape sorter puzzle and a real-world autonomous navigation domain.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/230Machine Perception using a Blackboard Architecture2019-06-05T13:14:13+00:00Tim P Guhlojs.ub@uni-bielefeld.deMurray P. Shanahanojs.ub@uni-bielefeld.deHere we present ongoing research in the application of symbolic reasoning to perception in general and vision in particular. Perception is treated as the combination of the possibly contradictory outputs of many specialized processes which communicate via a blackboard data structure. It is demonstrated that our design allows for bottom-up, horizontal and top-down information flow. Significant progress towards the analysis of unstructured scenes has been made. The principles involved have been explored experimentally and preliminary results are presented.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/231Maximum-Likelihood Stereo Correspondence using Field Programmable Gate Arrays2019-06-05T13:14:12+00:00Siraj Sabihuddinojs.ub@uni-bielefeld.deW. James MacLeanojs.ub@uni-bielefeld.deEstimation of depth within an imaged scene can be formulated as a stereo correspondence problem. Typical software approaches tend to be too slow for real-time performance on high frame rate (>= 30 fps) stereo acquisition systems. Hardware implementations of these same algorithms allow for parallelization, providing a marked improvement in performance. This paper explores one such hardware implementation of a maximum-likelihood stereo correspondence algorithm on a Field Programmable Gate Array (FPGA). The proposed "FastTrack" hardware implementation is a first-stage prototype that demonstrates results comparable to an equivalent software implementation, with the advantage of high-speed (eventually up to 200 fps) stereo depth estimation.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/232Monitoring surrounding areas of truck-trailer combinations2019-06-05T13:14:11+00:00Tobias Ehlgenojs.ub@uni-bielefeld.deTomas Pajdlaojs.ub@uni-bielefeld.deDrivers of trucks and buses are not able to survey the surrounding area of their vehicles. In this paper a system is presented that provides a bird's-eye view of the surrounding area of a truck-trailer combination to the driver. This view enables the driver to maneuver the vehicle easily in complicated environments. The system consists of four omnidirectional cameras mounted on the truck and trailer. The omnidirectional images are combined in such a way that a bird's-eye view image is generated.
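The per-camera step behind such a bird's-eye view is essentially a homography warp onto the ground plane. Below is a minimal nearest-neighbour sketch, assuming the inverse ground-plane homography H_inv is already known from extrinsic calibration; it is not the paper's omnidirectional unwarping, which additionally handles the mirror geometry.

```python
import numpy as np

def warp_to_birds_eye(image, H_inv, out_shape):
    """Inverse-warp an image onto the ground plane, given the inverse
    ground-plane homography H_inv (assumed known from calibration)."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ pts                            # back-project output pixels
    u = np.round(src[0] / src[2]).astype(int)    # source column
    v = np.round(src[1] / src[2]).astype(int)    # source row
    valid = (u >= 0) & (u < image.shape[1]) & (v >= 0) & (v < image.shape[0])
    out = np.zeros((h_out, w_out) + image.shape[2:], dtype=image.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = image[v[valid], u[valid]]
    return out
```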
A sensor measures the angle between truck and trailer, so that the bird's-eye view image is constructed according to this angle.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/233Navigation of Nonholonomic Mobile Robot Using Visual Potential Field2019-06-05T11:13:14+00:00Naoya Ohnishiojs.ub@uni-bielefeld.deAtsushi Imiyaojs.ub@uni-bielefeld.deIn this paper, we develop an algorithm for the navigation of a nonholonomic mobile robot using the visual potential. The robot is equipped with a camera system which dynamically captures the environment. The visual potential is computed from an image sequence and the optical flow computed from successive images captured by the camera mounted on the robot. Our robot selects a local pathway using the visual potential computed from its vision system, without any knowledge of the robot workspace. We present experimental results of obstacle avoidance in a real environment.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/234On-line Learning-based Object Tracking Using Boosted Features2019-06-05T11:13:13+00:00Bogdan Kwolekojs.ub@uni-bielefeld.deThe most informative and hard-to-classify examples are close to the decision boundary between the object of interest and the background. Gentle AdaBoost built on regression stumps focuses on hard examples that provide the most new information during object tracking. They contribute to better learning of the classifier while tracking the object. The tracker is compared to a recently proposed algorithm that uses on-line appearance models. The performance of the algorithm is demonstrated on freely available test sequences. The resulting algorithm runs in real-time.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/235Online Learning of Objects and Faces in an Integrated Biologically Motivated Architecture2019-06-05T11:13:12+00:00Heiko Wersingojs.ub@uni-bielefeld.deStephan Kirsteinojs.ub@uni-bielefeld.deMichael Goettingojs.ub@uni-bielefeld.deHolger Brandlojs.ub@uni-bielefeld.deMark Dunnojs.ub@uni-bielefeld.deInna Mikhailovaojs.ub@uni-bielefeld.deChristian Goerickojs.ub@uni-bielefeld.deJochen Steilojs.ub@uni-bielefeld.deHelge Ritterojs.ub@uni-bielefeld.deEdgar Koernerojs.ub@uni-bielefeld.deWe present a biologically motivated integrated vision system that is capable of online learning of several objects and faces in a unified representation. The training is unconstrained in the sense that arbitrary objects can be freely presented in front of a stereo camera system and labeled by speech input. We combine biological principles such as appearance-based representation in topographical feature detection hierarchies and context-driven transfer between different levels of object memory. The learning is driven by interactively sharing attention between user and system.
It is fully online and avoids an artificial separation of the interaction into training and test phases.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/241Registering Conventional Images with Low Resolution Panoramic Images2019-06-05T11:13:05+00:00Fadi Dornaikaojs.ub@uni-bielefeld.deThis paper addresses the problem of registering high-resolution, small field-of-view images with low-resolution panoramic images provided by a panoramic catadioptric video sensor. Such systems may find application in surveillance and telepresence systems that require a large field of view and high resolution at selected locations. Although image registration has been studied in more conventional applications, the problem of registering panoramic and conventional video has not previously been addressed, and this problem presents unique challenges due to (i) the extreme differences in resolution between the sensors (more than a 16:1 linear resolution ratio in our application), and (ii) the resolution inhomogeneity of panoramic images. The main contributions of this paper are as follows. First, we introduce our foveated panoramic sensor design. Second, we describe an automatic and near real-time registration between the two image streams. This registration is based on minimizing the intensity discrepancy, allowing the direct recovery of both the geometric and the photometric transforms. Registration examples using the developed methods are presented.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/243Robust Registration of Long Sport Video Sequences2019-06-05T11:13:02+00:00Guojun Liuojs.ub@uni-bielefeld.deXianglong Tangojs.ub@uni-bielefeld.deDa Sunojs.ub@uni-bielefeld.deJianhua Huangojs.ub@uni-bielefeld.deAutomatic registration plays an important role in a sport analysis system, yet the automation and accuracy of registration for a long video sequence remain an open problem in many practical applications. We propose a novel method to cope with it: (1) Reference frames are introduced as intermediaries for computing the homography that maps each frame of the imagery to the globally consistent model of the rink, which reduces the accumulated error of successive registration and makes the system more automatic. (2) A more distinctive invariant point feature (SIFT) is used to provide reliable and robust matching across a large range of affine distortion and illumination change, which improves the computational precision of the homography. Experimental results show that the proposed algorithm is very efficient and effective on video recorded live by the authors at the World Short Track Speed Skating Championships.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/245Simultaneously Reconstructing Transparent and Opaque Surfaces from Texture Images2019-06-05T11:13:00+00:00Mohamad Ivan Fananyojs.ub@uni-bielefeld.deItsuo Kumazawaojs.ub@uni-bielefeld.deThis paper addresses the problem of reconstructing non-overlapping transparent and opaque surfaces from multiple view images. The reconstruction is attained through progressive refinement of an initial 3D shape by minimizing the error between the images of the object and the initial 3D shape.
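In outline, such progressive refinement is an analysis-by-synthesis loop. The schematic sketch below uses a hypothetical render() function and a numeric-gradient step; the paper instead derives analytic relations between pixel values and vertex positions, so this is only the generic shape of the computation.

```python
import numpy as np

def refine_shape(vertices, images, render, lr=1e-3, iters=100, eps=1e-4):
    """Schematic analysis-by-synthesis refinement: perturb each vertex
    coordinate, re-render, and step downhill on the total image error.
    `render(vertices, view_index)` is a hypothetical renderer."""
    def error(v):
        return sum(np.sum((render(v, i) - img) ** 2)
                   for i, img in enumerate(images))
    for _ in range(iters):
        base = error(vertices)
        grad = np.zeros_like(vertices)
        for idx in np.ndindex(vertices.shape):    # numeric gradient
            perturbed = vertices.copy()
            perturbed[idx] += eps
            grad[idx] = (error(perturbed) - base) / eps
        vertices = vertices - lr * grad           # gradient descent step
    return vertices
```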
The challenge is to simultaneously reconstruct both the transparent and opaque surfaces given only a limited number of images. Any refinement method can theoretically be applied if an analytic relation between the pixel values in the training images and the vertex positions of the initial 3D shape is known. This paper investigates such analytic relations for reconstructing opaque and transparent surfaces. The analytic relation for opaque surfaces follows a diffuse reflection model, whereas that for transparent surfaces follows a ray-tracing model. Both relations, however, can be unified into a texture-mapping model for reconstructing both kinds of surfaces. To improve the reconstruction results, several strategies, including regularization, hierarchical learning, and simulated annealing, are investigated.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/247Stochastic Attentional Selection and Shift on the Visual Attention Pyramid2019-06-05T11:12:58+00:00Masayasu Atsumiojs.ub@uni-bielefeld.deThis paper proposes a computational model of visual attention which performs stochastic attentional selection and shift on a visual attention pyramid that is computed for each image frame of a video sequence. In this model, the visual attention pyramid is generated according to a rareness criterion using intensity contrast, saturation contrast, hue contrast, orientation and motion energy on a Gaussian resolution pyramid. On this attention pyramid, stochastic attentional selection and shift are performed through mechanisms of dynamic maintenance of IOR (Inhibition Of Return), bottom-up spatial attention and adaptive competitive filtering of attention. Experimental results show that this model achieves stochastic visual pop-out for artificial pop-out targets, as well as stochastic attentional selection and shift, especially whole-part attention shifts and motion-following attention, in daily scenes.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/248Subunit Boundary Detection for Sign Language Recognition Using Spatio-temporal Modelling2019-06-05T11:12:56+00:00Junwei Hanojs.ub@uni-bielefeld.deGeorge Awadojs.ub@uni-bielefeld.deAlistair Sutherlandojs.ub@uni-bielefeld.deThe use of subunits offers a feasible way to recognize sign language with a large vocabulary. The initial step is to partition signs into elementary units. In this paper, we first define a subunit as one continuous hand action in time and space, which comprises a series of interrelated consecutive frames. Then, we propose a solution to detect subunit boundaries according to spatio-temporal features using a three-stage hierarchy: in the first stage, we apply a hand segmentation and tracking algorithm to capture motion speeds and trajectories; in the second stage, the obtained speed and trajectory information are combined to locate subunit boundaries; finally, temporal clustering by dynamic time warping (DTW) is adopted to merge similar segments and refine the results. The presented work does not need prior knowledge of the types of signs and is robust to signer behaviour variation. Moreover, it can provide a basis for high-level sign language understanding.
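As a simplified sketch of the second stage, candidate subunit boundaries can be placed at frames where the hand's speed reaches a local minimum below a threshold; the threshold is an assumption made for illustration, and the DTW-based merging of the third stage is only noted in a comment.

```python
import numpy as np

def subunit_boundaries(trajectory, speed_thresh=0.5):
    """Mark candidate subunit boundaries where hand speed falls to a
    local minimum below a threshold (an illustrative parameter).
    The paper further merges similar segments via DTW clustering."""
    traj = np.asarray(trajectory, dtype=float)            # (T, 2) positions
    speed = np.linalg.norm(np.diff(traj, axis=0), axis=1) # per-frame speed
    boundaries = []
    for t in range(1, len(speed) - 1):
        local_min = speed[t] <= speed[t - 1] and speed[t] <= speed[t + 1]
        if local_min and speed[t] < speed_thresh:
            boundaries.append(t + 1)                      # frame index
    return boundaries
```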
Experiments on many real-world signing videos show the effectiveness of the proposed approach.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/249Supervised Pixel-Based Texture Classification with Gabor Wavelet Filters2019-06-05T11:12:55+00:00Jaime Christian Melendezojs.ub@uni-bielefeld.deMiguel Angel Garciaojs.ub@uni-bielefeld.deDomenec Puigojs.ub@uni-bielefeld.deThis paper proposes an efficient technique for pixel-based texture classification based on multichannel Gabor wavelet filters. The proposed technique is general enough to be applicable to other texture feature extraction methods that also characterize the texture around image pixels through feature vectors. During the training stage, a clustering technique is applied in order to compute a suitable set of prototypes that model every given texture pattern. Multisize evaluation windows are also utilized to improve the accuracy of the classifier near boundaries between regions of different texture. Experimental results with Brodatz compositions show the benefits of the proposed scheme in contrast with alternative approaches in terms of efficiency, memory and classification rates.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/250SVM-based Transfer of Visual Knowledge Across Robotic Platforms2019-06-05T11:12:54+00:00Jie Luoojs.ub@uni-bielefeld.deAndrzej Pronobisojs.ub@uni-bielefeld.deBarbara Caputoojs.ub@uni-bielefeld.deThis paper presents an SVM-based algorithm for the transfer of knowledge across robot platforms aiming to perform the same task. Our method efficiently exploits the transferred knowledge while incrementally updating the internal representation as new information becomes available. The algorithm is adaptive and tends to privilege new data when building the SV solution. This prevents the old knowledge from nesting into the model and eventually becoming a source of misleading information. We tested our approach in the domain of vision-based place recognition. Extensive experiments show that using transferred knowledge clearly pays off in terms of performance and stability of the solution.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/251Task and Context aware Performance Evaluation of Computer Vision Algorithms2019-06-05T11:12:53+00:00Wolfgang Ponweiserojs.ub@uni-bielefeld.deMarkus Vinczeojs.ub@uni-bielefeld.deDeveloping a robust computer vision algorithm is very difficult because of the enormous variation of visual conditions. A systems technology solution to this challenge is the automatic selection and configuration of different existing algorithms according to the task and context of arbitrary applications. This paper presents a first attempt to generate the required mapping from the task/context to the optimal algorithm and algorithm configuration. This mapping is based on an extensive performance evaluation. To practically handle the exhaustive search for optimal solutions, a new optimization challenge, the Multiple-Multi Objective Optimization (M-MOP), is formulated, and a corresponding solution based on genetic algorithms is developed and evaluated.
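For orientation, a toy single-objective genetic-algorithm loop of the general kind such a solution builds on is sketched below; the parameter encoding and the fitness function are placeholders, and the actual M-MOP setting optimises multiple objectives over multiple tasks rather than a single score.

```python
import random

def genetic_search(evaluate, n_params, pop_size=20, generations=50):
    """Toy GA over algorithm configurations encoded as vectors in [0, 1].
    `evaluate` is a placeholder fitness function; the paper's M-MOP
    problem is multi-objective, which this sketch does not capture."""
    pop = [[random.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]             # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_params)       # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(n_params)            # point mutation
            child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=evaluate)
```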
The results show the robustness of the approach and guide further development towards automatic vision system generation.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/252Toward robust foveated wide field of view people detection2019-06-05T11:12:53+00:00Zoran Zivkovicojs.ub@uni-bielefeld.deBen Kröseojs.ub@uni-bielefeld.deWe present a foveated vision system for rapid and robust person detection. The system consists of an omnidirectional camera for people detection in a wide field of view and a pan-tilt camera that can focus on a particular location. Combining the information from both cameras leads to more reliable people detection. The people detection is based on fast human body part detectors and a probabilistic model of the spatial arrangement of the parts. The model can be extended to multiple cameras. The representation is robust to partial occlusions, part detector false alarms and missed detections of body parts. We also show how to use the fact that the persons walk on a known ground plane to increase the efficiency and reliability of the detection. The detection does not rely on a static background, and the system is suitable for mobile platforms.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/253Towards a Human-like Vision System for Resource-Constrained Intelligent Cars2019-06-05T11:12:52+00:00Thomas Michalkeojs.ub@uni-bielefeld.deAlexander Gepperthojs.ub@uni-bielefeld.deMartin Schneiderojs.ub@uni-bielefeld.deJannik Fritschojs.ub@uni-bielefeld.deChristian Goerickojs.ub@uni-bielefeld.deResearch on computer vision systems for driver assistance has resulted in a variety of approaches that mainly perform reactive tasks such as lane keeping. However, for a full understanding of generic traffic situations, integrated and more flexible approaches are needed. We present a system inspired by the human visual system. By combining task-dependent tunable visual saliency, an object recognizer and a tracker, it provides warnings in dangerous situations.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/254Tracking Nuclear Material at Low Frame Rate and Numerous False Detections2019-06-05T11:12:51+00:00Paolo Lombardiojs.ub@uni-bielefeld.deCristina Versinoojs.ub@uni-bielefeld.deIn Nuclear Safeguards, surveillance cameras monitor the correct processing of nuclear material. Nuclear inspectors are faced with tens of thousands of images to review, of which less than 1% is significant. Beyond the reduction achieved by the standard two-frame differencing filter, we further limit the image set to review by tracking on the distribution of image time-stamps. Traditional visual tracking cannot be applied, owing to the low frame rate and the need for compatibility with the standard change detection algorithm. Our algorithm is based on an HMM of the nuclear process, and handles multiple flasks and observations available only when the flasks are moved.
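For reference, the core of tracking with such a model is the discrete-HMM forward (filtering) update sketched below; the transition and emission matrices are generic placeholders, not the paper's nuclear-process model, whose states would correspond to processing stages.

```python
import numpy as np

def hmm_forward(belief, A, B, obs):
    """One forward step of a discrete HMM: predict with the transition
    matrix A (A[i, j] = P(j | i)), then condition on the observation
    via the emission matrix B (B[j, o] = P(o | j)). Placeholder matrices."""
    predicted = A.T @ belief              # state prediction
    updated = predicted * B[:, obs]       # weight by observation likelihood
    return updated / updated.sum()        # renormalise to a distribution
```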
The model makes use of descriptive statistics of the durations of processing stages to refine the HMM predictions.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/255View Independent Face Detection Based on Combination of Local and Global Kernels2019-06-05T11:12:50+00:00Kazuhiro Hottaojs.ub@uni-bielefeld.deIn this paper, local and global kernels are combined to use detailed and rough similarities simultaneously. In recent years, many recognition methods based on local features have been proposed. However, combining only local matching is not sufficient; a global viewpoint is also necessary to improve the generalization ability. Local feature matching measures detailed similarity, and global feature matching measures rough similarity. Therefore, the error patterns of local and global features differ. If they are combined well, the generalization ability is improved. In the proposed method, the local kernels and the global kernel are combined by summation, and the combined kernel is used in an SVM. The proposed method is applied to a view-independent face detection task. We confirm that the false positive rate is reduced by combining local and global kernels. The effectiveness of the proposed method is demonstrated by comparison with using only the global kernel or only the local kernels.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/256Visual Person Searches for Retail Loss Detection : Application and Evaluation2019-06-05T11:12:49+00:00Andrew W Seniorojs.ub@uni-bielefeld.deL. Brownojs.ub@uni-bielefeld.deC.-F. Shuojs.ub@uni-bielefeld.deY.-L. Tianojs.ub@uni-bielefeld.deM. Luojs.ub@uni-bielefeld.deY. Zhaiojs.ub@uni-bielefeld.deA. Hampapurojs.ub@uni-bielefeld.deWe describe a novel computer-vision based system for facilitating the search for people across multiple non-overlapping cameras. The system has been applied in a retail environment, most specifically for returns fraud prevention. The system detects and tracks people in multiple cameras and enables rapid cross-camera association of tracks. The system has been tested in a real store environment and we present results with a breakdown of error types.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/257Visual Quality Control in Heat Shrink Tubing2019-06-05T11:12:21+00:00Alexander Barthojs.ub@uni-bielefeld.deRainer Herpersojs.ub@uni-bielefeld.deMarkus Greßnichojs.ub@uni-bielefeld.deIn this contribution a machine vision inspection system is presented which is designed as a length-measuring sensor. It is developed to be applied to a range of heat shrink tubes, varying in length, diameter and color. The challenges of this task were the precision and accuracy demands as well as the real-time applicability of the entire approach, since it is to be deployed in regular industrial line production. In production, heat shrink tubes are cut to specific sizes from a continuous tube. A multi-measurement strategy has been developed, which measures each individual tube segment several times with sub-pixel accuracy while it is in the visual field.
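The multi-measurement idea amounts to fusing repeated per-frame length estimates of the same segment, with the standard error shrinking roughly as one over the square root of the number of views. A minimal sketch follows; the pass/fail tolerance is an illustrative parameter, not a value from the paper.

```python
import numpy as np

def fuse_measurements(lengths_mm, nominal_mm, tol_mm=0.1):
    """Fuse repeated length measurements of one tube segment taken while
    it crosses the visual field. tol_mm is an illustrative tolerance."""
    lengths = np.asarray(lengths_mm, dtype=float)
    mean = lengths.mean()
    # Standard error shrinks ~ 1/sqrt(n) for independent measurements
    std_err = lengths.std(ddof=1) / np.sqrt(len(lengths))
    passed = abs(mean - nominal_mm) <= tol_mm
    return mean, std_err, passed
```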
The developed approach allows for contact-free and fully automatic control of 100% of the produced heat shrink tubes according to the given requirements, with a measuring precision of 0.1 mm. Depending on the color, length and diameter of the tubes considered, a true positive rate of 99.99% to 100% has been reached at a true negative rate of > 99.7%.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/182A Three-Level Computational Attention Model2019-06-05T13:16:05+00:00Matei Mancasojs.ub@uni-bielefeld.deBernard Gosselinojs.ub@uni-bielefeld.deBenoît Macqojs.ub@uni-bielefeld.deThis article deals with a biologically-motivated three-level computational attention model architecture based on rarity and the information theory framework. It mainly focuses on a low-level step, which aims at quickly highlighting important areas, and a middle-level step, which analyses the behaviour of the detected areas. Their application to both still images and videos provides results to be used by the third, high-level step.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/187An Attention Based Method For Motion Detection And Estimation2019-06-05T13:15:38+00:00Shijie Zhangojs.ub@uni-bielefeld.deFred Stentifordojs.ub@uni-bielefeld.deThe demand for automated motion detection and object tracking systems has promoted considerable research activity in the field of computer vision. A novel approach to motion detection and estimation based on visual attention is proposed in this paper. Two different thresholding techniques are applied, and comparisons are made with Black's motion estimation technique based on a measure of the overall derived tracking angle. The method is illustrated on various video data, and the results show that the new method can extract both motion and shape information.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/192Attention Based Auto Image Cropping2019-06-05T13:15:32+00:00Fred Stentifordojs.ub@uni-bielefeld.deMany images contain salient regions that are surrounded by too much uninteresting background material and are not as enlightening as a sensibly cropped version. The choice of the best picture window, both at capture time and during subsequent processing, is normally subjective and a wholly manual task. This paper proposes a method of automatically cropping visual material based upon a new measure of visual attention that reflects the informativeness of the image.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/193Attentional Robot Localization and Mapping2019-06-05T13:15:31+00:00Simone Frintropojs.ub@uni-bielefeld.dePatric Jensfeltojs.ub@uni-bielefeld.deHenrik I. Christensenojs.ub@uni-bielefeld.deIn this paper, we introduce an application of visual attention in the field of robotics: attentional visual SLAM (Simultaneous Localization and Mapping). A biologically motivated attention system finds regions of interest which serve as visual landmarks for the robot. The regions are tracked and matched over consecutive frames to build stable landmarks and to estimate the 3D position of the landmarks in the environment.
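Matching regions across frames, and later against the landmark database for loop closing, can be sketched as nearest-neighbour descriptor matching with a distance-ratio test; the descriptor arrays and the 0.8 ratio threshold are common heuristics assumed here, not details taken from the paper.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with a distance-ratio test.
    desc_a: (N, D), desc_b: (M, D) region descriptors, M >= 2.
    The 0.8 ratio is a common heuristic, assumed for this sketch."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]        # best and second-best match
        if dists[j] < ratio * dists[k]:     # accept only distinctive matches
            matches.append((i, int(j)))
    return matches
```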
Furthermore, matching of current landmarks to database entries enables loop closing and global localization. Additionally, the system is equipped with active camera control, which supports the system with tracking, re-detection and exploration behaviours.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/202Computational Attention for Defect Localisation2019-06-05T13:15:21+00:00Matei Mancasojs.ub@uni-bielefeld.deDevrim Unayojs.ub@uni-bielefeld.deBernard Gosselinojs.ub@uni-bielefeld.deBenoît Macqojs.ub@uni-bielefeld.deThis article deals with a biologically-motivated three-level computational attention model architecture based on rarity and the information theory framework. It mainly focuses on a low-level step and its application to pre-attentive defect localisation for apple quality grading and tumour localisation in medical images.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/203Computational Attention for Event Detection2019-06-05T13:15:20+00:00Matei Mancasojs.ub@uni-bielefeld.deLaurent Couvreurojs.ub@uni-bielefeld.deBernard Gosselinojs.ub@uni-bielefeld.deBenoît Macqojs.ub@uni-bielefeld.deThis article deals with a biologically-motivated three-level computational attention model architecture based on rarity and the information theory framework. It mainly focuses on the low-level and medium-level steps and their application to pre-attentive detection of tumours in CT scans and unusual events in audio recordings.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/204Control of Attention by Nonconscious Information: Do Intentions Play a Role?2019-06-05T13:15:19+00:00Ingrid Scharlauojs.ub@uni-bielefeld.deThe present study explores the deployment of attention towards nonconscious information. It is both theoretically and empirically likely that the deployment of attention can be controlled by information which is not consciously registered (attentional priming), similar to the control of sensorimotor responses by nonconscious information (response priming). However, not much is known about the functional basis of attentional priming. The present experiment explores whether and how strongly intentions (current action plans) determine whether attention is allocated towards invisible information (so-called direct parameter specification). The results demonstrate that intention-mediated control is possible, but it seems to break down easily; that is, it provides a weak and non-robust type of control.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/208Dynamic Visual Attention: competitive versus motion priority scheme2019-06-05T13:14:34+00:00Alexandre Burojs.ub@uni-bielefeld.dePascal Wurtzojs.ub@uni-bielefeld.deRené M. Müriojs.ub@uni-bielefeld.deHeinz Hügliojs.ub@uni-bielefeld.deDefined as the attentive process in the presence of visual sequences, dynamic visual attention responds to both static and motion features.
For a computer model, a straightforward way to integrate these features is to combine all features in a competitive scheme: the saliency map contains a contribution from each feature, static and motion. Another way of integration is to combine the features in a motion priority scheme: in the presence of motion, the saliency map is computed as the motion map, and in the absence of motion, as the static map. In this paper, four models are considered: two models based on a competitive scheme and two models based on a motion priority scheme. The models are evaluated experimentally by comparing them with respect to the eye movement patterns of human subjects while viewing a set of video sequences. Qualitative and quantitative evaluations, performed in the context of simple synthetic video sequences, show the highest performance for the motion priority scheme, compared to the competitive scheme.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/237Pop-out and IOR in Static Scenes with Region Based Visual Attention2019-06-05T11:13:09+00:00Muhammad Zaheer Azizojs.ub@uni-bielefeld.deBärbel Mertschingojs.ub@uni-bielefeld.deThis paper proposes a novel approach to constructing the saliency map by combining region-based maps of distinct features. The multiplication-style feature fusion process in natural visual attention is modelled as a weighted average of the features under the influence of external top-down and internal bottom-up inhibitions. The recently discovered aspect of feature-based inhibition is also included in the IOR procedure, along with the commonly implemented spatial and feature-map based inhibitions. Results obtained from the proposed method are compatible with the well-known attention models, but with the advantages of faster computation, direct usability of the focus of attention in machine vision, and broader coverage of visually prominent objects.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/238Presentation Agents That Adapt to Users' Visual Interest and Follow Their Preferences2019-06-05T11:13:08+00:00Arjen Hoekstraojs.ub@uni-bielefeld.deHelmut Prendingerojs.ub@uni-bielefeld.deNikolaus Beeojs.ub@uni-bielefeld.deDirk Heylenojs.ub@uni-bielefeld.deMitsuru Ishizukaojs.ub@uni-bielefeld.deThis research proposes an interactive presentation system that employs eye gaze as an intuitive and unobtrusive input modality. Eye movements are an excellent clue to users' attention, visual interest and preference. By analyzing and interpreting eye behavior in real-time, our system can adapt to the current (visual) interest state of the user, and thus provide a more personalized and 'attentive' experience of the presentation. The system implements a virtual presentation room, where research content is presented by a team of two highly realistic 3D agents in a dynamic and interactive way. A small preliminary study was conducted to investigate users' gaze behavior with a non-interactive version of the system.
A demo video based on our system was awarded best application of life-like agents at the GALA event in 2006.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/244Salient Visual Features to Help Close the Loop in 6D SLAM2019-06-05T11:13:01+00:00Lars Kunzeojs.ub@uni-bielefeld.deKai Lingemannojs.ub@uni-bielefeld.deAndreas Nüchterojs.ub@uni-bielefeld.deJoachim Hertzbergojs.ub@uni-bielefeld.deOne fundamental problem in mobile robotics research is _Simultaneous Localization and Mapping_ (SLAM): A mobile robot has to localize itself in an unknown environment, and at the same time generate a map of the surrounding area. A fundamental part of SLAM algorithms is loop closing: The robot detects whether it has reached an area that has been visited before, and uses this information to improve the pose estimate in the next step. In this work, visual camera features are used to assist closing the loop in an existing 6 degree-of-freedom SLAM (6D SLAM) architecture. For our robotics application we propose and evaluate several detection methods, including salient region detection and maximally stable extremal region detection. The detected regions are encoded using SIFT descriptors and stored in a database. Loops are detected by matching the images' descriptors. A comparison of the different feature detection methods shows that the combination of salient and maximally stable extremal regions suggested by Newman and Ho performs moderately.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/210Exploratory Learning Structure in Artificial Cognitive Systems2019-06-05T13:14:32+00:00Michael Felsbergojs.ub@uni-bielefeld.deJohan Wiklundojs.ub@uni-bielefeld.deErik Jonssonojs.ub@uni-bielefeld.deAnders Moeojs.ub@uni-bielefeld.deGösta Granlundojs.ub@uni-bielefeld.deOne major goal of the COSPAL project is to develop an artificial cognitive system architecture with the capability of exploratory learning. Exploratory learning is a strategy that allows generalization to be applied on a conceptual level, resulting in an extension of competences. Whereas classical learning methods aim at the best possible generalization, i.e., concluding from a number of samples of a problem class to the problem class itself, exploration aims at applying acquired competences to a new problem class. Incremental or online learning is an inherent requirement for performing exploratory learning. Exploratory learning requires new theoretical tools and new algorithms. In the COSPAL project, we mainly investigate reinforcement-type learning methods for exploratory learning, and in this paper we focus on their algorithmic aspects. Learning is performed in terms of four nested loops, where the outermost loop reflects the user-reinforcement-feedback loop, the intermediate two loops switch between different solution modes at the symbolic and sub-symbolic levels respectively, and the innermost loop executes the acquired competences in terms of perception-action cycles. We present a system diagram which explains this process in more detail. We discuss the learning strategy in terms of learning scenarios provided by the user. This interaction between user ('teacher') and system is a major difference from many existing systems, where the system designer places his world model into the system.
We believe that this is the key to extendable, robust system behavior and successful interaction between humans and artificial cognitive systems. We furthermore address the issue of bootstrapping the system and, in particular, the visual recognition module. We give some more in-depth details about our recognition method and how feedback from higher levels is implemented. The described system is, however, work in progress, and no final results are available yet. The preliminary results achieved so far clearly point towards a successful proof of the architecture concept.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/219Gaze shift reflex in a humanoid active vision system2019-06-05T13:14:23+00:00Ansgar R. Koeneojs.ub@uni-bielefeld.deJan Morénojs.ub@uni-bielefeld.deVlad Trifaojs.ub@uni-bielefeld.deGordon Chengojs.ub@uni-bielefeld.deFull awareness of sensory surroundings requires active attentional and behavioural exploration. In visual animals, visual, auditory and tactile stimuli elicit gaze shifts (head and eye movements) aimed at optimising the visual perception of stimuli. Such gaze shifts can either be top-down attention driven (e.g. visual search) or they can be reflex movements triggered by unexpected changes in the surroundings. Here we present a model active vision system with a focus on multi-sensory integration and the generation of desired gaze shift commands. Our model is based on recent data from studies of the primate superior colliculus and is developed as part of the sensory-motor control of the humanoid robot CB.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/223Implicit Modeling of Object Topology with Guidance from Temporal View Attention2019-06-05T13:14:19+00:00Peter Michael Goebelojs.ub@uni-bielefeld.deMarkus Vinczeojs.ub@uni-bielefeld.deObject recognition has developed into the most common approach for detecting arbitrary objects based on their appearance. However, viewpoint dependency, occlusions, algorithmic constraints, and noise are hindrances to proper object detection from a single view. As blob-based segmentation cannot support learning and understanding of the object under consideration, contour-based approaches are more promising. As a consequence of the aforementioned obstacles, objects are often only partly segmented, with more or less drop-outs in the contour, which yields poor recognition performance. Since recognition of the "yet unknown" by the mammalian brain is supported by curiosity and experimental willingness, unknown objects are observed from at least a number of different viewpoints. These different views are considered by cognitive processes, yielding an implicit view of the object under observation. It is the objective of this paper to present an approach based on findings from biological studies and cognitive science, which enables the cognitive investigation of natural scenes and their further cognitive understanding. In another paper, we proposed the architecture and a simulation of the first five bottom layers, implementing the striate visual cortex as the first level of cognitive modeling of behaviors. In this work we focus on the aggregation layer, which forms object prototypes from geon recipes.
The proposed implementation is exemplified again with the Necker cube.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/229Lifting Wavelet Based Cognitive Vision System2019-06-05T13:14:13+00:00Yuka Higashijimaojs.ub@uni-bielefeld.deShigeru Takanoojs.ub@uni-bielefeld.deKoichi Niijimaojs.ub@uni-bielefeld.deThis paper presents a cognitive vision system based on the learning of lifting wavelets. The learning process consists of four steps: 1. Extract training and query object images automatically from adjacent video frames using our proposed cosine-maximization method; 2. Compute autocorrelation vectors from the extracted training images, and their discriminant vectors by linear discriminant analysis; 3. Map the autocorrelation vectors onto the discriminant vector space to obtain feature vectors; 4. Learn lifting parameters in the feature vectors using the idea of discriminant analysis. The recognition of a query object is performed by measuring the cosine distance between its feature vector and the feature vectors of the training object images. Our experimental results on vehicle type recognition show that the proposed system performs better than discriminant analysis of the original images.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/236Open-Ended Inference of Relational Representations in the COSPAL Perception-Action Architecture2019-06-05T11:13:11+00:00David Windridgeojs.ub@uni-bielefeld.deJosef Kittlerojs.ub@uni-bielefeld.deThe COSPAL architecture for autonomous artificial cognition utilises incremental perception-action learning in order to generate hierarchically-grounded abstract representations of an agent's environment on the basis of its action capabilities. We here give an overview of the top-level relational module of this architecture. The first stage of the process involves the application of ILP to attempted action outcomes in order to determine the set of generalised rule protocols governing actions within the agent's environment (initially defined via an a priori low-level representation). In the second stage, imposing certain constraints on legitimate first-order logic induction permits a compact reparameterisation of the percept space such that novel perceptual capabilities are always correlated with novel action capabilities. We thereby define a meaningful empirical criterion for perceptual inference. Novel perceptual capabilities are of a higher abstract order than the a priori environment representation, allowing more sophisticated exploratory action to be taken. Gathering of further exploratory data for rule induction hence takes place in an iterative cycle. Application of this mechanism within a simulated shape-sorter puzzle environment indicates that this approach significantly accelerates learning of the correct environment model.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedingshttps://biecoll.ub.uni-bielefeld.de/index.php/icvs/article/view/246Spatio-Temporal Reasoning for Reliable Facial Expression Interpretation2019-06-05T11:12:59+00:00Javier Orozcoojs.ub@uni-bielefeld.deF.
Alejandro Garcíaojs.ub@uni-bielefeld.deJosep Lluis Arcosojs.ub@uni-bielefeld.deJordi Gonzàlezojs.ub@uni-bielefeld.deThe challenge of understanding human behaviours and emotions has received contributions from image analysis and pattern recognition techniques. The most popular facial expression classifiers deal with eyebrows and lips while ignoring eyelid motion. According to psychologists, eye motion is relevant for trust and deceit analysis, as well as for dichotomizing similar facial expressions. Unlike previous approaches, we include eyelid motion by constructing an appearance-based tracker (ABT). Subsequently, a Case-Based Reasoning (CBR) approach is applied by training a case base with seven facial actions. We classify new facial expressions with respect to previous solutions, first assessing the confidence of the proposed solutions. The proposed system therefore yields efficient classification rates comparable to the best previous facial expression classifiers. The ABT and CBR combination provides trustworthy solutions by evaluating the confidence of the solution quality for eyebrows, mouth and eyes. Consequently, this method is robust and accurate for facial motion coding and for confident classification. The training is progressive: the quality of the solution increases with respect to previous solutions, and no re-training process is needed.2007-12-31T00:00:00+00:00Copyright (c) 2023 International Conference on Computer Vision Systems : Proceedings