ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS

dc.contributor.advisor

Henao, Ricardo RH

dc.contributor.advisor

Li, Hai HL

dc.contributor.author

Kong, Fanjie

dc.date.accessioned

2024-06-06T13:44:22Z

dc.date.available

2024-06-06T13:44:22Z

dc.date.issued

2024

dc.department

Electrical and Computer Engineering

dc.description.abstract

Deep learning has demonstrated remarkable success in developing vision intelligence across a variety of application domains, including autonomous driving, facial recognition, and medical image analysis. However, developing such vision systems poses significant challenges, particularly in ensuring efficiency, interpretability, and fairness. Efficiency requires a model to use as few computational resources as possible while preserving performance relative to more computationally demanding alternatives, which is essential for the practical deployment of large-scale models in real-time applications. Interpretability requires a model to align with the domain-specific knowledge of the task it addresses while being capable of case-based reasoning. This characteristic is especially crucial in high-stakes areas such as healthcare, criminal justice, and financial investment. Fairness ensures that computer vision models do not perpetuate or exacerbate societal biases in downstream applications such as web image search and text-guided image generation. In this dissertation, I discuss the contributions that I have made to advancing vision intelligence with regard to efficiency, interpretability, and fairness in computer vision models.

The first part of this dissertation focuses on how to design computer vision models that efficiently process very large images. We propose a novel CNN architecture, termed Zoom-In Network, that leverages a hierarchical attention sampling mechanism to select the important regions of an image to process. Because this approach does not process the entire image, it yields outstanding memory efficiency while maintaining classification accuracy on various tiny-object image classification datasets.

The second part of this dissertation discusses how to build post-hoc interpretation methods for deep learning models in order to obtain insights from their predictions. We propose a novel image and text insight-generation framework based on attributions from deep neural networks. We test our approach on an industrial dataset and demonstrate that our method outperforms competing methods.

Finally, we study fairness in large vision-language models. More specifically, we examine gender and racial bias in text-based image retrieval for neutral text queries. To address bias at test time, we propose a post-hoc bias mitigation method that actively balances demographic groups in the image search results. Experiments on multiple datasets show that our method can significantly reduce bias while maintaining satisfactory retrieval accuracy.

My research on enhancing vision intelligence through developments in efficiency, interpretability, and fairness has undergone rigorous validation on publicly available benchmarks and has been recognized at leading peer-reviewed machine learning conferences. This dissertation has sparked interest within the AI community, emphasizing the importance of improving computer vision models along these three critical dimensions: efficiency, interpretability, and fairness.

dc.identifier.uri

https://hdl.handle.net/10161/30822

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Computer engineering

dc.subject

Artificial intelligence

dc.subject

Computer science

dc.subject

Artificial Intelligence

dc.subject

Computer Vision

dc.subject

Fairness

dc.subject

Machine Learning

dc.title

ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS

dc.type

Dissertation

Files

Original bundle

Name:
Kong_duke_0066D_17772.pdf
Size:
4.88 MB
Format:
Adobe Portable Document Format
