We are interested in studying and developing vision-based systems that can navigate and interact with complex, uncontrolled, and dynamic environments. A short summary of the most relevant projects we have developed in this direction is given below.


Computational Photography
Beyond pixels
[Image: a camera]
Computational photography is an emerging field that combines computing, digital sensor design, and controlled illumination to enable novel imaging applications. One can enhance the dynamic range of the sensor; digitally vary focus, resolution, and depth of field; and analyze reflectance and lighting. These are only some of the applications made possible by computational photography that we are interested in exploring.

Our light field camera and the super-resolution project

Supervised Classification
Transformation Invariance
[Image: stop signs]
The classification of objects in images requires invariance to changes in their scale, orientation, and location, and in their photometry due to varying illumination and noise. One way to address such a classification problem is to incorporate invariance to these changes at run time. This approach, however, is limited by the discretization of the class of transformations and by the number of independent transformations that require invariance. Instead, we incorporate invariance during training. We avoid enlarging the training set with so-called virtual samples, i.e., samples obtained by morphing the original data set with a finite set of transformations, and instead incorporate invariance in the classification error function used for training. Given a sample, one can compute in analytic form the error of a classifier against a distribution over a set of transformations. This is, for example, the approach used in Vicinal Risk Minimization, and we formulate it within the boosting framework.
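To illustrate the idea (a minimal sketch, not the method developed in this project), consider a linear classifier and transformations modeled as additive Gaussian perturbations of the sample: the expected exponential loss used in boosting can then be computed in closed form, so no virtual samples need to be generated. All quantities below are made up for the example.

```python
import numpy as np

# Sketch: for f(x) = <w, x> and perturbations eps ~ N(0, Sigma),
#   E[exp(-y <w, x + eps>)] = exp(-y <w, x> + 0.5 * w^T Sigma w),
# i.e., the loss averaged over the transformation distribution is analytic.

rng = np.random.default_rng(0)
d = 5
w = rng.normal(size=d)          # linear classifier weights (hypothetical)
x = rng.normal(size=d)          # one training sample
y = 1.0                         # its label in {-1, +1}
Sigma = 0.1 * np.eye(d)         # covariance of the transformation "jitter"

# Closed-form expected loss over the transformation distribution.
analytic = np.exp(-y * w @ x + 0.5 * w @ Sigma @ w)

# Monte Carlo estimate with explicit virtual samples, for comparison.
eps = rng.multivariate_normal(np.zeros(d), Sigma, size=100000)
virtual = np.mean(np.exp(-y * (x + eps) @ w))

print(f"analytic: {analytic:.4f}  virtual samples: {virtual:.4f}")
```

The two numbers agree up to Monte Carlo error, which is the point: the analytic form replaces an explicit enumeration of morphed copies of the training data.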

Real-Time Structure from Motion and Virtual Object Insertion
Robustness to outliers
[Image: a virtual object]
Interacting with a complex, unknown, dynamic environment requires continuously updated knowledge of its shape and motion, as well as robustness to unmodeled events. We propose several algorithms aimed at inferring shape, motion, and appearance causally and incrementally. Once motion and shape are accurately inferred, one can insert computer graphics objects or animations into the real scene in real time.
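As a minimal sketch of the insertion step only (standard pinhole projection, not the project's estimation pipeline), once a camera pose (R, t) and intrinsics K are available for the current frame, a virtual object given as 3D vertices can be overlaid by projecting those vertices into the image. The values of K, R, and t below are hypothetical.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])        # hypothetical intrinsics
R = np.eye(3)                              # hypothetical rotation (world -> camera)
t = np.array([0.0, 0.0, 5.0])              # hypothetical translation

def project(points_world):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    points_cam = points_world @ R.T + t            # rigid transform into the camera frame
    pixels_h = points_cam @ K.T                    # apply the intrinsics
    return pixels_h[:, :2] / pixels_h[:, 2:3]      # perspective division

# A unit cube centered at the world origin stands in for the virtual object.
cube = np.array([[x, y, z] for x in (-0.5, 0.5)
                           for y in (-0.5, 0.5)
                           for z in (-0.5, 0.5)])
print(project(cube))   # image locations where the cube's corners would be drawn
```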

3D Estimation and Image Restoration
Exploiting defocus and motion blur
[Image: defocused images]
Images contain several cues that allow us to infer spatial properties of the scene being depicted. For example, texture, shading, shadows, and silhouettes are all pictorial cues that, when coupled with suitable prior assumptions, allow one to infer spatial properties of the scene. In addition to pictorial cues, which are present in a single image, one can exploit cues that make use of multiple images. One such cue is defocus, where images are captured while changing the focus setting of the camera: objects at different depths then appear blurred in different ways. Similarly, objects moving with different velocities produce differently motion-blurred images, and such images can therefore be exploited to solve the inverse problem of inferring shape and motion.
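The geometry behind the defocus cue can be sketched with the standard thin-lens model (the model, not the project's estimator): a point at depth z imaged by a lens of focal length f and aperture diameter D, with the sensor at distance v from the lens, produces a blur circle of diameter b(z) = D * v * |1/f - 1/v - 1/z|, so two images taken with different focus settings constrain z. The numbers below are illustrative.

```python
import numpy as np

def blur_diameter(z, f=0.050, D=0.010, v=0.052):
    """Blur-circle diameter (meters) for scene depth z (meters), thin-lens model."""
    return D * v * np.abs(1.0 / f - 1.0 / v - 1.0 / z)

depths = np.array([0.5, 1.0, 1.3, 2.0, 5.0])
print(blur_diameter(depths, v=0.052))   # first focus setting (in focus at ~1.3 m)
print(blur_diameter(depths, v=0.051))   # second focus setting (in focus farther away)
```

Comparing the two blur profiles at each pixel is what makes depth recoverable from a defocused image pair.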

Shape, Reflectance and Illumination Estimation
Non-Lambertian surfaces
[Image: example of a BRDF]
Measuring the 3D surface of a scene is challenging because images vary greatly with the material and geometry of the surfaces, the illumination conditions of the scene, and the geometry of the sensor. The main challenge is establishing correspondence between regions in the images that are projections of the same object. One way to simplify this problem is to assume that all objects are Lambertian, i.e., that their appearance does not change with the vantage point. The immediate consequence of this assumption is that correspondence can be established by direct comparison of images, thus bypassing the estimation of the light sources and the reflectance of the surfaces. However, a large number of materials in nature are non-Lambertian: for instance, polished metal, plastic, porcelain, marble, skin, and varnished surfaces. We propose to exploit occlusion boundaries to recover the illumination of the scene and the reflectance of the objects, as they provide strong geometric cues that are robust to changes in appearance due to camera motion.
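A minimal sketch of why the Lambertian assumption simplifies correspondence (illustrative shading models only, not the project's estimator): a Lambertian surface point has the same intensity from every vantage point, whereas a specular, Phong-like term changes as the camera moves, so direct image comparison breaks down for non-Lambertian materials. The normal, light, and material parameters below are made up.

```python
import numpy as np

n = np.array([0.0, 0.0, 1.0])                 # surface normal
l = np.array([0.0, 0.6, 0.8])                 # light direction (unit norm)
albedo, k_s, shininess = 0.7, 0.5, 32         # hypothetical material parameters

def shade(view_dir):
    diffuse = albedo * max(0.0, n @ l)        # Lambertian term: no view dependence
    r = 2.0 * (n @ l) * n - l                 # mirror reflection of the light direction
    specular = k_s * max(0.0, r @ view_dir) ** shininess
    return diffuse, diffuse + specular

for v in ([0.0, -0.6, 0.8], [0.0, 0.0, 1.0], [0.0, 0.6, 0.8]):
    d, full = shade(np.array(v))
    print(f"view {v}: lambertian {d:.3f}  with specular {full:.3f}")
```

The Lambertian value stays constant across the three vantage points while the specular value varies strongly, which is exactly what invalidates direct photometric matching on shiny materials.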

Segmentation of Dynamic Textures
[Image: example of segmentation]
Natural events, such as smoke, flames, or water waves, exhibit motion patterns whose complexity can hardly be captured by tracking every pixel. Instead, one can capture the stochastic properties of such events, for example by employing so-called dynamic textures. Furthermore, video sequences may simultaneously depict multiple natural events that are "stochastically homogeneous" only within a certain unknown region. Finding such regions is the problem of dynamic texture segmentation.
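A common formulation of a dynamic texture, sketched below (the generic model, not necessarily the segmentation method of this project), represents the frames y_t through a low-dimensional state x_t evolving as a linear dynamical system, x_{t+1} = A x_t + v_t and y_t = C x_t + w_t; the matrices (C, A) can be fit from a frame matrix with an SVD followed by least squares.

```python
import numpy as np

def fit_dynamic_texture(Y, n_states=10):
    """Y: (n_pixels, n_frames) matrix of vectorized, mean-subtracted frames."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                        # observation matrix (spatial basis)
    X = np.diag(S[:n_states]) @ Vt[:n_states]  # state trajectory, one column per frame
    # Least-squares fit of the state transition: X[:, 1:] ~= A @ X[:, :-1]
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return C, A, X

# Toy usage with random data standing in for a video of, e.g., water waves.
rng = np.random.default_rng(0)
Y = rng.normal(size=(64 * 64, 120))            # 120 frames of 64x64 pixels
Y -= Y.mean(axis=1, keepdims=True)
C, A, X = fit_dynamic_texture(Y, n_states=10)
print(C.shape, A.shape, X.shape)               # (4096, 10) (10, 10) (10, 120)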

Retinal Imaging: Monocular or Stereo Fundus Camera?
[Image: a monocular fundus camera]
Glaucoma progression causes atrophy of the optic nerve fibers and changes in the three-dimensional (3D) shape of the optic disc. The most common clinical routine for glaucoma patients is the examination of the optic nerve head. To this purpose one can employ two-dimensional (2D) or 3D imaging modalities. 2D imaging can be provided by monocular fundus cameras, while 3D imaging can be provided, for example, by stereo fundus cameras, Heidelberg scanning laser tomography, scanning laser polarimetry, and optical coherence tomography. We consider only monocular and stereo fundus cameras, which are the most accessible 3D imaging systems. Furthermore, since 2D imaging is based on 2D visual cues, which may be prone to misjudgment, we focus our analysis on 3D imaging only. We show that traditional monocular systems do not provide depth information of the retina and hence cannot be used in place of stereo systems. We also experimentally determine the maximum accuracy achievable by a stereo system and show its suitability for assessing glaucoma.
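The geometric basis of the comparison can be sketched with standard stereo triangulation (the textbook relation, not the calibration or accuracy analysis of this work): for a rectified stereo pair with focal length f (in pixels) and baseline B, a retinal point seen with horizontal disparity d lies at depth Z = f * B / d, whereas a single monocular image provides no such constraint on Z. The focal length and baseline below are hypothetical.

```python
def depth_from_disparity(d_pixels, f_pixels=2000.0, baseline_mm=3.0):
    """Depth in millimeters from disparity in pixels; parameter values are hypothetical."""
    return f_pixels * baseline_mm / d_pixels

for d in (100.0, 200.0, 400.0):
    print(f"disparity {d:>5.1f} px -> depth {depth_from_disparity(d):.1f} mm")
```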