Abstract
We propose an object detection system that uses the locations
of tracked low-level feature points as input, and produces
a set of independent coherent motion regions as output.
As an object moves, tracked feature points on it span
a coherent 3D region in the space-time volume defined by
the video. In the case of multi-object motion, many possible
coherent motion regions can be constructed around the
set of all feature point tracks. Our approach is to identify
all possible coherent motion regions, and extract the subset
that maximizes an overall likelihood function while assigning
each point track to at most one motion region. We
solve the problem of finding the best set of coherent motion
regions with a simple greedy algorithm, and show that
our approach produces semantically correct detections and
counts of similar objects moving through crowded scenes.
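
As an illustration only, the selection step described above can be sketched
as a greedy pass over scored candidates. This is a minimal sketch under
stated assumptions, not the system's actual implementation: the abstract does
not specify the likelihood function or the exact greedy rule, so the
`Region` tuple, the per-region score (treated here as a hypothetical
log-likelihood gain), and the function name `greedy_select` are all
illustrative. The sketch visits candidates in order of decreasing score and
keeps one only if none of its point tracks has already been claimed, which
enforces the constraint that each track belongs to at most one region.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a candidate region is the set of point-track
# ids it covers plus an assumed log-likelihood gain for selecting it.
Region = Tuple[FrozenSet[int], float]


def greedy_select(candidates: List[Region]) -> List[Region]:
    """Greedily pick disjoint candidate regions in order of decreasing score.

    Assumption: the overall objective is the sum of per-region scores, so a
    region with a non-positive score can never improve the total.
    """
    chosen: List[Region] = []
    assigned: Set[int] = set()  # track ids already claimed by a chosen region
    for tracks, score in sorted(candidates, key=lambda r: r[1], reverse=True):
        if score <= 0:
            break  # remaining candidates cannot raise the summed score
        if assigned.isdisjoint(tracks):
            chosen.append((tracks, score))
            assigned |= tracks
    return chosen


# Toy input: track 2 and track 3 are each contested by two candidates.
candidates = [
    (frozenset({0, 1, 2}), 5.0),
    (frozenset({2, 3}), 4.0),
    (frozenset({3, 4}), 3.0),
    (frozenset({5}), -1.0),
]
selected = greedy_select(candidates)
total_score = sum(score for _, score in selected)
```

In this toy input the second candidate is rejected because it shares track 2
with the first, and the last is rejected for its negative score, so the count
of detected objects is simply `len(selected)`. A greedy pass of this kind is
not guaranteed to find the globally optimal subset; it trades optimality for
simplicity.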