Extended Subwindow Search and Pictorial Structures
In computer vision, the pictorial structure model represents an object in an image by parts that are arranged in a deformable configuration. Each part describes an object's local photometric appearance, and the configuration encodes the global geometric layout. This model has been very successful in recent object recognition systems.
We extend the pictorial structure model in three respects. First, for models that contain only a single part, we develop new methods (regularized subwindow search, nested window search, and twisted window search) that handle richer priors and more flexible shapes. Second, we develop the notion of a weak pictorial structure, as opposed to a strong one, to characterize a loose geometric layout in a rotationally invariant way. Third, we develop nested models that encode topological inclusion relations between parts in order to represent richer patterns.
We show that all the extended models can be matched to images efficiently by using dynamic programming and variants of the generalized distance transform, which computes the lower envelope of transformed cones on a dense image grid. This transform turns out to be important for a wide variety of computer vision tasks and often accelerates the computation at hand by an order of magnitude. We demonstrate improvements in quality, in speed, and sometimes in both, on object matching, saliency measurement, online and offline tracking, and object localization and recognition.
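To make the phrase "lower envelope of cones" concrete, the following is a minimal sketch of the textbook one-dimensional distance transform under an absolute-value (cone-shaped) cost. It is not the dissertation's implementation and does not cover the transformed-cone variants mentioned above; the function names, the `slope` parameter, and the sample costs are assumptions chosen purely for illustration.

```python
def l1_distance_transform(f, slope=1.0):
    """Return d with d[p] = min over q of f[q] + slope * |p - q|.

    Each grid point q contributes a cone with apex height f[q]; the
    result is the lower envelope of all cones, sampled on the grid.
    Two linear sweeps suffice, so the cost is O(n) instead of O(n^2).
    """
    d = list(f)
    n = len(d)
    # Forward sweep: account for cones whose apex lies to the left of p.
    for p in range(1, n):
        d[p] = min(d[p], d[p - 1] + slope)
    # Backward sweep: account for cones whose apex lies to the right of p.
    for p in range(n - 2, -1, -1):
        d[p] = min(d[p], d[p + 1] + slope)
    return d


def brute_force_transform(f, slope=1.0):
    """Quadratic-time reference that evaluates the definition directly."""
    n = len(f)
    return [min(f[q] + slope * abs(p - q) for q in range(n)) for p in range(n)]


# Hypothetical per-location part costs along one image row.
costs = [5.0, 0.0, 7.0, 9.0, 1.0, 8.0]
envelope = l1_distance_transform(costs)
```

For a city-block deformation cost on a two-dimensional grid, the same two sweeps can be applied first along every row and then along every column, because the cost separates across the two axes.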
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations