The Fourth International Workshop on
Cooperative Distributed Vision
Presentations by Invited Speakers
Robert T. Collins
(The Robotics Institute, CMU, USA)
We present two systems developed at the Robotics Institute of
Carnegie Mellon University (CMU) that use multiple active cameras
to track and display objects moving through an outdoor scene.
The Video Surveillance and Monitoring
(VSAM) system uses a network of smart sensors to perform
campus surveillance.
Using a cost-based scheduling approach, multiple sensors are
automatically tasked to cooperatively track objects over long distances
and through occlusion.
Multi-sensor fusion algorithms use viewpoint-independent descriptors
to combine the per-sensor object hypotheses into coherent
2D and 3D graphical visualizations of the dynamic scene.
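As a rough illustration of the cost-based tasking idea, the following Python sketch assigns each tracked object to the sensor that can observe it most cheaply; the cost terms (distance, pan slew, current load), their weights, and the data layout are assumptions made for the sketch, not the actual VSAM design:

    import math

    # Hypothetical sketch: each detected object is tasked to the sensor with
    # the lowest combined cost of distance, required camera slew, and the
    # number of objects that sensor already tracks.

    def tasking_cost(sensor, obj):
        dx, dy = obj["x"] - sensor["x"], obj["y"] - sensor["y"]
        distance = math.hypot(dx, dy)
        bearing = math.atan2(dy, dx)
        # wrap the pan difference into [-pi, pi] before taking its magnitude
        slew = abs((bearing - sensor["pan"] + math.pi) % (2 * math.pi) - math.pi)
        return distance + 50.0 * slew + 100.0 * len(sensor["assigned"])

    def assign_sensors(sensors, objects):
        """Greedy assignment: task the cheapest sensor for each object in turn."""
        for obj in objects:
            best = min(sensors, key=lambda s: tasking_cost(s, obj))
            best["assigned"].append(obj["id"])
        return {s["name"]: s["assigned"] for s in sensors}

    sensors = [{"name": "cam_A", "x": 0.0, "y": 0.0, "pan": 0.0, "assigned": []},
               {"name": "cam_B", "x": 200.0, "y": 50.0, "pan": 3.1, "assigned": []}]
    objects = [{"id": 1, "x": 30.0, "y": 10.0}, {"id": 2, "x": 180.0, "y": 45.0}]
    print(assign_sensors(sensors, objects))   # e.g. {'cam_A': [1], 'cam_B': [2]}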
The second system presented is designed for sports broadcasting.
Multiple cameras surrounding a stadium form a master-slave servo system
that allows a single cameraman to track a player simultaneously
from many viewpoints.
Playing back frames from one time step across a sequence of cameras
gives the appearance of moving around the action while it
is frozen in time. This heightens the viewer's ability to perceive
the 3D spatial relationships between players, the ball, and field
markers.
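A hedged sketch of the slaving geometry and the frozen-time playback follows; a shared stadium coordinate frame, known camera positions, and per-camera frame buffers are assumptions for the sketch, not the broadcast system's actual code:

    import math

    # Hedged sketch: the master camera fixes a 3D target point (the tracked
    # player), and each slave camera computes the pan/tilt needed to aim at
    # that same point in a shared stadium coordinate frame.

    def pan_tilt_to_target(cam_pos, target):
        dx, dy, dz = (t - c for t, c in zip(target, cam_pos))
        pan = math.atan2(dy, dx)                   # rotation about the vertical axis
        tilt = math.atan2(dz, math.hypot(dx, dy))  # elevation toward the target
        return pan, tilt

    # Frozen-time playback: take the same time step t from every camera's
    # frame buffer and play those frames in camera order to "orbit" the
    # frozen action.
    def frozen_time_sequence(frame_buffers, t):
        return [frames[t] for frames in frame_buffers]

    target = (52.0, 30.0, 1.2)  # hypothetical player position in metres
    for pos in [(0.0, -10.0, 8.0), (105.0, 70.0, 8.0)]:
        pan, tilt = pan_tilt_to_target(pos, target)
        print(f"camera at {pos}: pan={math.degrees(pan):.1f} deg, "
              f"tilt={math.degrees(tilt):.1f} deg")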
David G. Lowe
(Computer Science Department, University of British Columbia, Canada)
This talk will describe a system for 3D object recognition that uses a
large number of local image features of intermediate complexity. Each
feature is invariant to location, scale, and image orientation, and
partially invariant to illumination and 3D viewpoint. Recent research
in neuroscience has shown that object recognition in primates also
depends on intermediate-complexity features with similar invariance
properties. The features are used as input to a nearest-neighbour
learning approach that identifies candidate object matches. Final
verification is achieved by finding a low-residual least-squares
solution for the unknown model parameters. Unlike most other
approaches to object recognition, this method requires no prior
segmentation and is unaffected by background clutter. Experimental
results show that robust object recognition can be achieved under a
broad range of real-world imaging conditions with a computation
time of about one second.
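To make the match-then-verify pipeline concrete, here is a minimal Python sketch; feature extraction is omitted, and the descriptor arrays, ratio threshold, and a 2D affine model standing in for the full model-parameter solution are assumptions made for illustration:

    import numpy as np

    # Hedged sketch: each feature is assumed to be a descriptor vector plus
    # an (x, y) location. All names and thresholds here are illustrative.

    def nearest_neighbour_matches(model_desc, image_desc, ratio=0.8):
        """Match each model descriptor to its nearest image descriptor,
        keeping only matches whose nearest neighbour is clearly closer
        than the second nearest."""
        matches = []
        for i, d in enumerate(model_desc):
            dists = np.linalg.norm(image_desc - d, axis=1)
            first, second = np.argsort(dists)[:2]
            if dists[first] < ratio * dists[second]:
                matches.append((i, first))
        return matches

    def fit_affine(model_pts, image_pts):
        """Least-squares 2D affine transform from matched model points to
        image points; a low residual indicates a verified hypothesis."""
        A = np.zeros((2 * len(model_pts), 6))
        b = np.asarray(image_pts, dtype=float).reshape(-1)
        for k, (x, y) in enumerate(model_pts):
            A[2 * k, 0:3] = (x, y, 1.0)
            A[2 * k + 1, 3:6] = (x, y, 1.0)
        params, residual, _, _ = np.linalg.lstsq(A, b, rcond=None)
        return params.reshape(2, 3), residual

In this sketch, accidental clutter matches are rejected because they rarely agree on a single low-residual solution, while correct matches reinforce one another.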
Richard Szeliski
(Vision-Based Modeling Group, Microsoft Research, USA)
Obtaining photo-realistic geometric and photometric models is an important
component of image-based rendering systems that use real-world imagery as
their input. Applications of such systems include novel view generation
and the mixing of live imagery with synthetic computer graphics. In this
talk, I review a number of image-based representations (and their
associated reconstruction algorithms) we have developed in the last few
years. I begin by reviewing some recent approaches to the classic problem
of recovering a depth map from two or more images. I then describe some of
our newer representations and reconstruction algorithms, including
volumetric representations, layered plane-plus-parallax representations
(including the recovery of transparent and reflected layers), and multiple
depth maps. Each of these techniques has its own strengths and weaknesses,
which I will address. I will also present our work in video-based
rendering, in which we synthesize novel video from short sample clips by
discovering their (quasi-repetitive) temporal structure.
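As one concrete instance of the first problem mentioned, here is a hedged sketch of depth-map recovery for a rectified stereo pair; the window size, disparity range, and sum-of-absolute-differences cost are generic assumptions, not the specific algorithms covered in the talk:

    import numpy as np

    # Hedged sketch of classic two-view depth recovery: brute-force SAD
    # block matching along rectified scanlines.

    def disparity_map(left, right, max_disp=32, win=5):
        """Per-pixel disparity (inversely proportional to depth) for a
        rectified grayscale stereo pair."""
        left = left.astype(np.float32)     # avoid uint8 wraparound
        right = right.astype(np.float32)
        h, w = left.shape
        half = win // 2
        disp = np.zeros((h, w), dtype=np.float32)
        for y in range(half, h - half):
            for x in range(half + max_disp, w - half):
                patch = left[y - half:y + half + 1, x - half:x + half + 1]
                costs = [np.abs(patch - right[y - half:y + half + 1,
                                              x - d - half:x - d + half + 1]).sum()
                         for d in range(max_disp)]
                disp[y, x] = np.argmin(costs)
        return disp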