Inter-pixel Modeling Framework for Deep Learning-based Image Understanding
Abstract
Image understanding, a mid- to high-level task in computer vision, aims to enable machines to interpret visual information from images or videos. Common image understanding tasks include image classification, object detection, image segmentation, and object tracking. In recent years, the performance of image understanding algorithms has been significantly advanced by deep learning. However, most researchers focus on improving network performance by studying internal modules and structures, while paying less attention to the problem-modeling side of deep learning. In this thesis, based on three classic and important image understanding tasks (image and medical image segmentation, salient object detection, and camouflaged object detection), we primarily explore inter-pixel relationship modeling in deep learning from the network modeling perspective, while also considering internal structural design, to efficiently and effectively improve the performance of current deep learning methods for medical and natural image understanding. In Section 1, we review the current progress of deep learning-based image understanding and the designs of existing inter-pixel modeling methods. In Section 2, we develop a deep connectivity modeling method for general image and medical segmentation. Based on this method, in Section 2.2, we propose a novel framework, BiconNet, that incorporates connectivity modeling into existing segmentation networks for different applications. In Section 2.3, we apply BiconNet to salient object detection and conduct extensive experiments on public datasets to demonstrate its effectiveness. In Section 2.4, we further extend BiconNet to medical segmentation by incorporating it into a network for esophageal OCT segmentation.
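As a hypothetical illustration of the connectivity modeling idea (a sketch of the general label-conversion concept, not the dissertation's exact construction), a binary segmentation mask can be re-expressed as an 8-channel connectivity map, where each channel marks whether a foreground pixel is connected to one of its eight neighbors:

```python
import numpy as np

def mask_to_connectivity(mask):
    """Sketch of connectivity modeling: convert a binary mask into an
    8-channel map, where channel k is 1 at a pixel iff both that pixel
    and its k-th of the 8 neighbors are foreground. The actual BiconNet
    construction may differ in details."""
    h, w = mask.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    padded = np.pad(mask, 1)  # zero border: no connectivity outside the image
    conn = np.zeros((8, h, w), dtype=mask.dtype)
    for k, (dy, dx) in enumerate(offsets):
        # shifted[y, x] == mask[y + dy, x + dx] (zero outside the image)
        shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        conn[k] = mask * shifted
    return conn

mask = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
conn = mask_to_connectivity(mask)
# The original mask can be recovered, e.g. as the max over channels, for
# any foreground pixel that has at least one connected neighbor.
```

Training a network to predict such a map supervises pairwise pixel relationships rather than isolated per-pixel labels, which is the inter-pixel modeling perspective pursued throughout the thesis.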
Through comprehensive experiments, we demonstrate the potential of our model in detecting Barrett's esophagus. In Section 3, we extend the idea of connectivity modeling to directional connectivity modeling and propose a novel deep learning framework, DconnNet, for general medical image segmentation. We conduct experiments on several important medical tasks and datasets, including retinal OCT fluid segmentation (Section 3.3), skin lesion segmentation (Section 3.4), and vessel segmentation (Section 3.5). In all tasks, DconnNet outperforms existing medical segmentation networks. In Section 4, we develop a new framework, SDCTrans, for microbial keratitis (MK) biomarker segmentation on slit-lamp photography. SDCTrans incorporates directional connectivity modeling into a transformer-based backbone and contains a self-knowledge distillation module that further improves the directional feature flow. We also collect and publish a large-scale, finely annotated MK dataset. Through comprehensive experiments, we demonstrate the superiority of SDCTrans over current state-of-the-art models, and show that it matches, if not outperforms, expert human graders in MK biomarker identification and visual acuity outcome estimation. In Section 5, we propose a novel inter-pixel response-aware loss function, Spatial Coherence Loss (SCLoss), which incorporates the mutual response between adjacent pixels into widely used single-response loss functions for salient and camouflaged object detection. Through comprehensive experiments, we demonstrate that replacing popular loss functions with SCLoss improves the performance of current state-of-the-art salient and camouflaged object detection models.
We also demonstrate that combining SCLoss with other loss functions can further improve performance, yielding state-of-the-art results across applications. In conclusion, this dissertation provides a deep learning framework for modeling inter-pixel relationships for general image understanding. The proposed methods can easily be incorporated into existing deep learning works or function as standalone networks. In either case, we demonstrate their superiority over existing methods in medical segmentation, salient object detection, and camouflaged object detection. Furthermore, through comprehensive studies on medical applications, we demonstrate the framework's potential for detecting and monitoring crucial diseases such as diabetic macular edema, Barrett's esophagus, and microbial keratitis, supporting timely and precise medical decisions and ultimately improving patient outcomes.
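To make the mutual-response idea concrete, the following is a minimal sketch of a spatial-coherence-style loss: per-pixel binary cross-entropy (the usual single-response term) augmented with terms comparing the joint response of adjacent pixel pairs in the prediction against the same pairs in the ground truth. The exact SCLoss formulation is given in Section 5; the weighting and pairing scheme here are illustrative assumptions only.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy (the usual single-response loss)."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def spatial_coherence_loss(pred, target):
    """Illustrative sketch only: single-response BCE plus mutual-response
    terms on horizontally and vertically adjacent pixel pairs, where the
    mutual response of a pair is taken as the product of its two
    activations. The published SCLoss formulation differs in details."""
    single = bce(pred, target).mean()
    # Mutual response of horizontal neighbors.
    mutual_h = bce(pred[:, 1:] * pred[:, :-1],
                   target[:, 1:] * target[:, :-1]).mean()
    # Mutual response of vertical neighbors.
    mutual_v = bce(pred[1:, :] * pred[:-1, :],
                   target[1:, :] * target[:-1, :]).mean()
    return single + 0.5 * (mutual_h + mutual_v)

target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=float)
coherent = np.array([[0.1, 0.1, 0.9, 0.9],
                     [0.1, 0.1, 0.9, 0.9]])
noisy = np.array([[0.1, 0.9, 0.1, 0.9],
                  [0.9, 0.1, 0.9, 0.1]])
# The spatially coherent prediction incurs a lower loss than the noisy one.
print(spatial_coherence_loss(coherent, target) < spatial_coherence_loss(noisy, target))
```

Because the mutual-response terms penalize neighboring pixels that disagree where the ground truth agrees, such a loss pushes predictions toward spatially coherent object regions, which is the behavior the dissertation exploits for salient and camouflaged object detection.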
Citation
Yang, Ziyun (2024). Inter-pixel Modeling Framework for Deep Learning-based Image Understanding. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32589.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.