Visual Recognition Models for Scenes of Partial Object Occlusion
Date
2025
Abstract
Visual recognition models, including deep neural networks such as convolutional neural networks and Vision Transformers, achieve near-human classification accuracy on general image recognition datasets. However, this accuracy has been shown to drop sharply when images contain partial occlusion, that is, when part of a classifiable object is hidden from the camera's view. We create the Image Recognition Under Occlusion (IRUO) dataset and benchmark, a large-scale image benchmark. IRUO comprises tens of thousands of images with labeled levels of real partial occlusion, comparisons of modern general-purpose deep learning models against models that incorporate methods to address partial occlusion, and a human study with 20 participants to estimate the maximum achievable accuracy on occluded images. We find that Vision Transformer-based models outperform convolutional models on occluded images, and that existing methods for improving occlusion robustness have severe limitations on a large and diverse dataset such as IRUO. However, Vision Transformer models remain limited relative to humans when classifying images containing diffuse occlusion, that is, occlusion that is sparse and spatially discontinuous. We create efficient prepended embeddings designed to filter out diffuse occlusion. These embeddings achieve better classification accuracy than base Vision Transformer models on images containing diffuse occlusion, while keeping the added computational overhead relative to these models minimal.
Citation
Kassaw, Kaleb (2025). Visual Recognition Models for Scenes of Partial Object Occlusion. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32731.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.