Browsing by Author "Li, Hai HL"
Item Open Access ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS (2024) Kong, Fanjie
Deep learning has demonstrated remarkable success in developing vision intelligence across a variety of application domains, including autonomous driving, facial recognition, and medical image analysis. However, developing such vision systems poses significant challenges, particularly in ensuring efficiency, interpretability, and fairness. Efficiency requires a model to use as few computational resources as possible while preserving performance relative to more computationally demanding alternatives, which is essential for the practical deployment of large-scale models in real-time applications. Interpretability requires a model to align with the domain-specific knowledge of the task it addresses while supporting case-based reasoning. This characteristic is especially crucial in high-stakes areas such as healthcare, criminal justice, and financial investment. Fairness ensures that computer vision models do not perpetuate or exacerbate societal biases in downstream applications such as web image search and text-guided image generation. In this dissertation, I discuss the contributions that I have made to advancing vision intelligence with regard to efficiency, interpretability, and fairness in computer vision models.
The first part of this dissertation focuses on how to design computer vision models that efficiently process very large images. We propose a novel CNN architecture, termed Zoom-In Network, that leverages a hierarchical attention sampling mechanism to select important regions of an image to process. Because this approach avoids processing the entire image, it yields outstanding memory efficiency while maintaining classification accuracy on various tiny-object image classification datasets.
The second part of this dissertation discusses how to build a post-hoc interpretation method for deep learning models to obtain insights from their predictions. We propose a novel image and text insight-generation framework based on attributions from deep neural networks. We test our approach on an industrial dataset and demonstrate that our method outperforms competing methods.
Finally, we study fairness in large vision-language models. More specifically, we examine gender and racial bias in text-based image retrieval for neutral text queries. To address bias at test time, we propose a post-hoc bias mitigation method that actively balances the demographic groups in the image search results. Experiments on multiple datasets show that our method can significantly reduce bias while maintaining satisfactory retrieval accuracy.
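To make the idea of balancing demographic groups in search results concrete, the following is a minimal sketch and not the dissertation's method. It assumes each retrieved image already has a relevance score and a group label (both hypothetical inputs here), and it simply takes turns across groups, best-scored first. The function name `balanced_top_k` is an illustrative assumption.

```python
from collections import defaultdict


def balanced_top_k(scores, groups, k):
    """Return indices of up to k items, alternating across groups by score.

    scores: relevance score per candidate (higher is better).
    groups: hypothetical demographic label per candidate.
    """
    # Bucket candidate indices by group, each bucket ordered best-first.
    buckets = defaultdict(list)
    for idx in sorted(range(len(scores)), key=lambda i: -scores[i]):
        buckets[groups[idx]].append(idx)

    # Visit groups in order of their best-scoring candidate.
    order = sorted(buckets, key=lambda g: -scores[buckets[g][0]])

    picked = []
    while len(picked) < k and any(buckets.values()):
        for g in order:
            if buckets[g] and len(picked) < k:
                picked.append(buckets[g].pop(0))
    return picked


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.2]
    groups = ["a", "a", "a", "b", "b"]
    print(balanced_top_k(scores, groups, 4))
```

A plain top-4 by score would return three items from group "a" and one from group "b"; the round-robin selection returns two from each, at the cost of including one lower-scored item. This is the accuracy and balance trade-off that the abstract refers to.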
My research on enhancing vision intelligence through developments in efficiency, interpretability, and fairness has undergone rigorous validation using publicly available benchmarks and has been recognized at leading peer-reviewed machine learning conferences. This dissertation has sparked interest within the AI community, emphasizing the importance of improving computer vision models along these three critical dimensions: efficiency, interpretability, and fairness.
Item Open Access Power-efficient Spiking Neuromorphic Designs using CMOS and Emerging Devices (2024) Li, Ziru
Artificial intelligence (AI) algorithms play critical roles in a variety of application scenarios in our daily lives. State-of-the-art large-scale AI models, widely adopted in different domains, have grown to tens of billions of parameters. Dedicated AI hardware tailored for data-intensive and computation-intensive AI algorithms consumes tremendous power because of the transmission of model parameters and the massive computation involved. The solutions to boosting the power efficiency of AI hardware are two-fold. On the one hand, continuous research effort has been devoted to finding more efficient computing paradigms for neural networks. For instance, the bio-inspired neuromorphic computing paradigm stems from the study of natural neural systems. Neuromorphic spiking neural networks (SNNs) emulate the human brain, which transmits information efficiently through spike events. On the other hand, hardware designers have been seeking architecture- and circuit-level solutions to reduce memory access and computation costs. The processing-in-memory (PIM) paradigm, one of the promising solutions, eliminates the power and latency of data transmission by performing data operations directly within the memory.
In this dissertation, I introduce my research on power-efficient neuromorphic designs. These designs harness spike-based data processing and the in-memory computing paradigm. With the help of architecture-level techniques and dedicated circuits built with CMOS and emerging memory devices, the proposed designs achieve significant improvements in power efficiency and performance.
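As an illustration of the spike-based processing mentioned above, here is a minimal sketch of a leaky integrate-and-fire neuron, a common textbook neuron model for SNNs. It is not taken from the dissertation; the function name `lif_spikes` and the `leak` and `threshold` values are assumptions chosen for the example.

```python
def lif_spikes(currents, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    currents: input current at each time step.
    leak: fraction of membrane potential retained per step (assumed value).
    threshold: potential at which the neuron fires (assumed value).
    Returns a list of 0/1 spike events, one per time step.
    """
    v = 0.0  # membrane potential
    spikes = []
    for i in currents:
        v = leak * v + i  # decay the old potential, then integrate the input
        if v >= threshold:
            spikes.append(1)  # emit a spike event
            v = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(lif_spikes([0.5, 0.5, 0.5, 0.0, 1.2]))
```

The neuron produces output only on the steps where it fires, so downstream work is triggered by sparse events rather than by every input value. That sparsity is the general property that spike-based hardware aims to exploit.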