The Fourth International Workshop on Cooperative Distributed Vision

Presentations by Invited Speakers

Multi-camera Tracking and Visualization for Surveillance and Sports

Robert T. Collins (The Robotics Institute, CMU, USA)

We present two systems developed at the Robotics Institute of Carnegie Mellon University (CMU) that use multiple active cameras to track and display objects moving through an outdoor scene. The Video Surveillance and Monitoring (VSAM) system uses a network of smart sensors to perform campus surveillance. Using a cost-based scheduling approach, multiple sensors are automatically tasked to cooperatively track objects over long distances and through occlusion. Multi-sensor fusion algorithms use viewpoint-independent descriptors to combine these object hypotheses into coherent 2D and 3D graphical visualizations of the dynamic scene. The second system presented is designed for sports broadcasting. Multiple cameras surrounding a stadium form a master-slave servo system that allows a single cameraman to track a player simultaneously from many viewpoints. Playing back frames from one time step across a sequence of cameras gives the appearance of moving around the action while it is frozen in time. This heightens the viewer's ability to perceive the 3D spatial relationships between players, the ball, and field markers.
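As a rough illustration of cost-based sensor tasking, the sketch below assigns each sensor to at most one target by repeatedly taking the cheapest remaining sensor-target pair. The abstract does not give the VSAM cost function or scheduling rule, so everything here is an assumption made for illustration: the distance-based `tracking_cost`, the `range` field, the one-sensor-per-target restriction, and the greedy strategy.

```python
from math import hypot

def tracking_cost(sensor, target):
    """Hypothetical cost: ground distance to the target, infinite if out of range."""
    d = hypot(sensor["x"] - target["x"], sensor["y"] - target["y"])
    return d if d <= sensor["range"] else float("inf")

def assign_sensors(sensors, targets):
    """Greedy tasking: cheapest pair first, one target per sensor (a simplification)."""
    pairs = sorted(
        (tracking_cost(s, t), s["name"], t["name"])
        for s in sensors
        for t in targets
    )
    assignment = {}   # sensor name -> target name
    covered = set()   # targets that already have a sensor
    for cost, s_name, t_name in pairs:
        if cost == float("inf"):
            break     # every remaining pair is out of range
        if s_name in assignment or t_name in covered:
            continue
        assignment[s_name] = t_name
        covered.add(t_name)
    return assignment

sensors = [
    {"name": "cam_a", "x": 0, "y": 0, "range": 50},
    {"name": "cam_b", "x": 100, "y": 0, "range": 50},
]
targets = [
    {"name": "person", "x": 10, "y": 0},
    {"name": "car", "x": 90, "y": 10},
]
print(assign_sensors(sensors, targets))
```

A real scheduler would likely also weigh target priority, occlusion, and handoff between sensors, and could task several sensors to the same object; the greedy rule is used here only because it is short.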

An Object Recognition System Using Local Image Features of Intermediate Complexity

David G. Lowe (Computer Science Department, University of British Columbia, Canada)

This talk will describe a system for 3D object recognition that uses a large number of local image features of intermediate complexity. Each feature is invariant to location, scale, and image orientation, and partially invariant to illumination and 3D viewpoint. Recent research in neuroscience has shown that object recognition in primates also depends on intermediate-complexity features with similar invariance properties. The features are used as input to a nearest-neighbour learning approach that identifies candidate object matches. Final verification is achieved by finding a low-residual least-squares solution for the unknown model parameters. Unlike most other approaches to object recognition, the recognition requires no prior segmentation and is unaffected by background clutter. Experimental results show that robust object recognition can be achieved under a broad range of real-world imaging conditions with a computation time of about one second.
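The nearest-neighbour step can be sketched as follows: each image feature votes for the object that owns its closest stored model feature, and objects are ranked by vote count. The abstract does not specify the descriptor, the distance measure, or any threshold, so the tuple descriptors, Euclidean distance, and `max_distance` cutoff below are assumptions chosen for illustration. The least-squares verification step is not shown.

```python
from collections import Counter
from math import dist

def nearest_model_feature(descriptor, model_features):
    """Return the (object_label, descriptor) entry closest in Euclidean distance."""
    return min(model_features, key=lambda entry: dist(descriptor, entry[1]))

def candidate_objects(image_descriptors, model_features, max_distance):
    """Vote for the object owning each feature's nearest neighbour.

    Features whose nearest neighbour is farther than max_distance
    (a hypothetical cutoff) are treated as clutter and cast no vote.
    """
    votes = Counter()
    for descriptor in image_descriptors:
        label, model_descriptor = nearest_model_feature(descriptor, model_features)
        if dist(descriptor, model_descriptor) <= max_distance:
            votes[label] += 1
    return votes.most_common()

# Toy 2D descriptors; real descriptors would have many more dimensions.
model_features = [
    ("mug", (0.0, 0.0)),
    ("mug", (1.0, 0.0)),
    ("phone", (5.0, 5.0)),
]
image_descriptors = [(0.1, 0.0), (0.9, 0.1), (5.2, 4.9), (20.0, 20.0)]
print(candidate_objects(image_descriptors, model_features, max_distance=0.5))
```

Because each feature is matched independently, no segmentation of the image is needed, and background features that match nothing well simply do not vote.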

Image and Video-Based Modeling and Rendering

Richard Szeliski (Vision-Based Modeling Group, Microsoft Research, USA)

Obtaining photo-realistic geometric and photometric models is an important component of image-based rendering systems that use real-world imagery as their input. Applications of such systems include novel view generation and the mixing of live imagery with synthetic computer graphics. In this talk, I review a number of image-based representations (and their associated reconstruction algorithms) we have developed in the last few years. I begin by reviewing some recent approaches to the classic problem of recovering a depth map from two or more images. I then describe some of our newer representations and reconstruction algorithms, including volumetric representations, layered plane-plus-parallax representations (including the recovery of transparent and reflected layers), and multiple depth maps. Each of these techniques has its own strengths and weaknesses, which I will address. I will also present our work in video-based rendering, in which we synthesize novel video from short sample clips by discovering their (quasi-repetitive) temporal structure.