Third International Workshop on
Cooperative Distributed Vision
Presentations by Invited Speakers
Aaron F. Bobick*,
Stephen Intille** and Yuri Ivanov**
(* Georgia Institute of Technology, USA,
** M.I.T. Media Laboratory, USA)
Many distributed vision applications concern the recognition of
multi-agent action. Unlike the activity of single agents over short
time scales, multi-agent action is typically loosely defined by
statistical tendencies, requires certain causal interactions, and
evolves over extended periods of time. Recognition techniques
designed to observe single-agent action (such as hidden Markov models)
are unlikely to succeed in these situations. Here we present two
approaches to the statistical recognition of multi-agent action. The
first is based upon stochastic parsing of parallel event streams.
This method is useful where there are a priori definitions of actions
involving a small number of agents, but where the detection of
individual elements is uncertain. The fundamental idea is to divide
the recognition task into the two levels of statistical detection of
underlying features and structural parsing of those detections. The
second approach relies on the uncertain integration of confirming
evidence of large scale coordinated activity, such as a team executing
a particular football play. In presenting and comparing these
two techniques, we will attempt to characterize multi-agent action
recognition problems in terms of being structural or statistical, and
in terms of their spatial-temporal rigidity.
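
The two-level idea behind the first approach (statistical detection of
low-level events, then structural parsing of those detections) can be
sketched in a few lines. This is only an illustration under stated
assumptions, not the authors' system: the grammar, the event names
(enter, stop, exit), and the function best_parse_probability are all
hypothetical, and the sketch parses a single event stream rather than
parallel streams. Each time step carries a likelihood for every
candidate event, and a Viterbi-style CYK parser finds the probability
of the best parse.

```python
from collections import defaultdict

# Hypothetical stochastic grammar in Chomsky normal form.
# (parent, left, right) -> rule probability
BINARY_RULES = {
    ("VISIT", "ARRIVE", "REST"): 1.0,
    ("REST", "PAUSE", "LEAVE"): 1.0,
}
# (nonterminal, terminal event) -> rule probability
LEXICAL_RULES = {
    ("ARRIVE", "enter"): 1.0,
    ("PAUSE", "stop"): 1.0,
    ("LEAVE", "exit"): 1.0,
}


def best_parse_probability(detections, start="VISIT"):
    """Return the probability of the best parse of an uncertain event stream.

    detections[i] maps each candidate event at step i to the likelihood
    reported by the low-level detector (level one). The CYK table then
    combines those likelihoods with the grammar (level two).
    """
    n = len(detections)
    best = defaultdict(float)  # (i, j, nonterminal) -> best probability

    # Level one feeds level two: weight lexical rules by detector output.
    for i, likelihoods in enumerate(detections):
        for (nonterminal, event), p_rule in LEXICAL_RULES.items():
            score = p_rule * likelihoods.get(event, 0.0)
            if score > best[(i, i + 1, nonterminal)]:
                best[(i, i + 1, nonterminal)] = score

    # Combine adjacent spans bottom-up, keeping the best score per span.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p_rule in BINARY_RULES.items():
                    score = p_rule * best[(i, k, left)] * best[(k, j, right)]
                    if score > best[(i, j, parent)]:
                        best[(i, j, parent)] = score

    return best[(0, n, start)]


# Three time steps, each with uncertain detections.
stream = [
    {"enter": 0.8, "exit": 0.2},
    {"stop": 0.6, "enter": 0.4},
    {"exit": 0.9, "stop": 0.1},
]
print(best_parse_probability(stream))
```

A stream whose events arrive in an order the grammar does not allow
receives probability zero, which is how the structural level rejects
detections that are individually plausible but jointly inconsistent.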
Larry S. Davis, Eugene Borovikov, Ross Cutler, and Thanarat Horprasert
(University of Maryland, USA)
We describe research being conducted in the University of Maryland
Keck Laboratory for the Analysis of Visual Motion. The Keck
Laboratory is a multi-perspective computer vision laboratory
containing sixty-four digital, progressive-scan cameras (forty-eight
monochromatic and sixteen single-CCD color) configured into sixteen
groups of four cameras. Each group of four is a quadranocular stereo
rig consisting of three monochromatic cameras and one color camera. The
cameras are attached to a network of sixteen PCs used for both
data collection and real time video analysis.
We first describe the architecture of the system in detail, and then
present two applications:
Real time multi-perspective tracking of body parts for motion capture.
We have developed a real time 3D motion capture system that integrates
images from a large number of color cameras to both detect and track
human body parts in 3D. A preliminary version of this system
(developed in collaboration with ATR Media Integration &
Communications Research Laboratories and the M.I.T. Media Laboratory)
was demonstrated at SIGGRAPH 98. That version, based on the W4
system for visual surveillance developed in our laboratory, detected
people by background modeling and background subtraction, found body
parts in each image via shape analysis of foreground regions,
triangulated those body parts using robust triangulation procedures,
and then smoothed the 3D body part trajectories and predicted the
locations of those parts in each view using a lightweight version of
the dynamical models developed by Chris Wren and Sandy Pentland of the
M.I.T. Media Laboratory.
Finally, that version animated the captured motion using graphical
models developed with ATR Media Integration & Communications Research
Laboratories. We describe improved versions of the background modeling
and tracking components of that system.
Real-time volume intersection.
Models of human shape can also be constructed using volume intersection
methods.
Here, we use the same background modeling and subtraction methods as
in our motion capture system, but then utilize parallel and
distributed algorithms for constructing an oct-tree representation of
the volume of the person being observed. Details of this algorithm
will be described.
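
The volume intersection idea can be illustrated with a small serial
sketch. It is not the Keck Laboratory algorithm, which is parallel and
distributed and uses calibrated cameras. The sketch assumes three
orthographic views aligned with the coordinate axes, binary silhouettes
(as would come from background subtraction) on a grid whose side is a
power of two, and hypothetical function names. An octree node is
discarded if its footprint is entirely background in any view, kept
whole if its footprint is entirely foreground in every view, and
subdivided otherwise.

```python
def footprint_count(silhouette, u0, v0, size):
    """Count foreground pixels in the size x size square at (u0, v0)."""
    return sum(
        1
        for u in range(u0, u0 + size)
        for v in range(v0, v0 + size)
        if silhouette[u][v]
    )


def carve(sil_xy, sil_xz, sil_yz, x0, y0, z0, size, leaves):
    """Classify one octree node as empty, full, or mixed, and recurse."""
    counts = (
        footprint_count(sil_xy, x0, y0, size),  # view along the z axis
        footprint_count(sil_xz, x0, z0, size),  # view along the y axis
        footprint_count(sil_yz, y0, z0, size),  # view along the x axis
    )
    area = size * size
    if any(c == 0 for c in counts):
        return  # empty: some view sees only background behind this node
    if all(c == area for c in counts):
        leaves.append((x0, y0, z0, size))  # full: keep as one leaf
        return
    # Mixed node: split into eight children. A node of size 1 is never
    # mixed, so the recursion always terminates.
    half = size // 2
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                carve(sil_xy, sil_xz, sil_yz,
                      x0 + dx, y0 + dy, z0 + dz, half, leaves)


def visual_hull(sil_xy, sil_xz, sil_yz):
    """Return the full octree leaves as (x0, y0, z0, size) tuples."""
    leaves = []
    carve(sil_xy, sil_xz, sil_yz, 0, 0, 0, len(sil_xy), leaves)
    return leaves


def square_silhouette(n, lo, hi):
    """Toy n x n silhouette with a foreground square on [lo, hi)."""
    return [[lo <= u < hi and lo <= v < hi for v in range(n)]
            for u in range(n)]


sil = square_silhouette(8, 2, 6)
leaves = visual_hull(sil, sil, sil)
print(len(leaves), sum(size ** 3 for *_, size in leaves))
```

Because whole nodes are accepted or rejected at once, large uniform
regions are resolved without visiting every voxel, which is the main
reason to prefer an octree over a dense voxel grid here.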
Simone Santini and Ramesh Jain (PRAJA Inc., USA)
Presence Technology (PT) is targeted to the needs of people who want to
be part of a remote, live environment. Presence systems blend component
technologies like computer vision, signal understanding, heterogeneous
sensor fusion, live-media delivery, telepresence, databases, and
multimedia information systems into a novel set of functions that
enables the user to perceive, move around, enquire about, and interact
with the remote, live environment through her reception and control
devices. PT creates the opportunity to perform different tasks: watch an
event, tour and explore a location, meet and communicate with others,
monitor the environment for a potential situation, perform a query on the
perceived objects and events, and recreate past observations.
Technically, the framework offers computer-mediated access to
multi-sensory information in an environment, integrates the sensory
information into a situation model of the environment, and delivers, at
the user's request, the relevant part of the assimilated information
through a multimodal interface.
PT is an extension of the Multiple Perspective Interactive Video project
at the Visual Computing Laboratory, University of California, San Diego.
In this paper we will present results from the PRAJA Presence system,
implemented to bring an early version of PT to different applications.
We will present a demo of this system to explain its different
technical components.
Monique Thonnat and Nathanael Rota (INRIA Sophia Antipolis, France)
This paper presents recent work on behavior analysis.
More precisely, the objective is to
infer a high-level description of human behavior from images.
The main motivation of this research activity is to automatically
generate alarms towards human operators when interesting scenarios
have been recognized by the system.
First, after a presentation of the state of the art in this domain,
we introduce the general scheme of our approach which is based on
the use of predefined scenarios and a priori contextual information.
Second, we detail the current low-level image processing techniques
used for mobile object detection. Third, we describe
the role of a priori contextual information and different ways
of representing this information. Then we address the problem of
high level description of mobile object behavior using generic observable
events and application dependent scenarios.
Finally, results obtained on different visual surveillance
applications in the European Esprit projects Passwords and AVS-PV
are shown and discussed.
This paper concludes with future work on enhancing the robustness
of such image understanding systems and improving their capabilities to be