From Labeled to Unlabeled Data: Understand Deep Visual Representations under Causal Lens


Deep vision models have been highly successful in computer vision applications such as image classification, segmentation, and object detection. These models encode visual data into low-dimensional representations, which are then used in downstream tasks. Typically, the most accurate models are fine-tuned using fully labeled data, but this approach may not generalize well to different applications. Self-supervised learning has emerged as a potential solution to this issue: the deep vision encoder is pretrained with unlabeled data to learn more generalized representations. However, the underlying mechanism governing the generalization and specificity of these representations is not yet well understood. Causality is an important concept in visual representation learning because it can help improve the generalization of models by providing a deeper understanding of the underlying relationships between features and objects in the visual world.

Through the works presented in this dissertation, we provide a causal interpretation of the mechanism underlying deep vision models' ability to learn representations in both labeled and unlabeled environments, and we improve the generalization and specificity of the extracted representations through the interpreted causal factors. Specifically, we tackle the problem from four aspects: causally interpreting supervised deep vision models; supervised learning with underlabeled data; self-supervised learning with unlabeled data; and causally understanding unsupervised visual representation learning.

Firstly, we interpret the prediction of a deep vision model by identifying causal pixels in the input images via 'inversing' the model weights. Secondly, we recognise the challenges of learning an accurate object detection model when labels are missing from the dataset, and we address this underlabeled-data issue by adapting a positive-unlabeled learning approach instead of the positive-negative approach. Thirdly, we focus on improving both the generalization and the specificity of unsupervised representations based on prior causal relations. Finally, we enhance the stability of the unsupervised representations during inference by intervening on data variables under a well-constructed causal framework.

We establish a causal relationship between deep vision models and their input/output for different applications with (partially) labeled data, and strengthen generalized representations through extensive analytical understanding of unsupervised representation learning under various hypothesized causal frameworks.





Yang, Yuewei (2023). From Labeled to Unlabeled Data: Understand Deep Visual Representations under Causal Lens. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.