ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS

Loading...
Thumbnail Image

Date

2024

Journal Title

Journal ISSN

Volume Title

Repository Usage Stats

4
views
63
downloads

Abstract

Deep learning has demonstrated remarkable success in developing vision intelligence across a variety of application domains, including autonomous driving, facial recognition, medical image analysis, \etc.However, developing such vision systems poses significant challenges, particularly in relation to ensuring efficiency, interpretability, and fairness. Efficiency requires a model to leverage the least possible computational resources while preserving performance relative to more computationally-demanding alternatives, which is essential for the practical deployment of large-scale models in real-time applications. Interpretability demands a model to align with the domain-specific knowledge of the task it addresses while having the capability for case-based reasoning. This characteristic is especially crucial in high-stakes areas such as healthcare, criminal justice, and financial investment. Fairness ensures that computer vision models do not perpetuate or exacerbate societal biases in downstream applications such as web image search, text-guided image generation, \etc. In this dissertation, I will discuss the contributions that I have made in advancing vision intelligence regarding to efficiency, interpretability and fairness in computer vision models.

The first part of this dissertation will focus on how to design computer vision models to efficiently process very large images.We propose a novel CNN architecture termed { \em Zoom-In Network} that leverages a hierarchical attention sampling mechanisms to select important regions of images to process. Such approach without processing the entire image yields outstanding memory efficiency while maintaining classification accuracy on various tiny object image classification datasets.

The second part of this dissertation will discuss how to build post-hoc interpretation method for deep learning models to obtain insights reasoned from the predictions.We propose a novel image and text insight-generation framework based on attributions from deep neural nets. We test our approach on an industrial dataset and demonstrate our method outperforms competing methods.

Finally, we study fairness in large vision-language models.More specifically, we examined gender and racial bias in text-based image retrieval for neutral text queries. In an attempt to address bias in the test-time phase, we proposed post-hoc bias mitigation to actively balance the demographic group in the image search results. Experiments on multiple datasets show that our method can significantly reduce bias while maintaining satisfactory retrieval accuracy at the same time.

My research in enhancing vision intelligence via developments in efficiency, interpretability, and fairness, has undergone rigorous validation using publicly available benchmarks and has been recognized at leading peer-reviewed machine learning conferences.This dissertation has sparked interest within the AI community, emphasizing the importance of improving computer vision models through these three critical dimensions, namely, efficiency, interpretability and fairness.

Description

Provenance

Citation

Citation

Kong, Fanjie (2024). ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30822.

Collections


Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.