Adaptive, Context-Aware Assessment and Guidance for Augmented Reality

Date

2025

Abstract

Augmented Reality (AR) has shown great promise across a range of applications, including gaming, education, and healthcare. To deliver immersive user experiences, AR systems must accurately interpret the surrounding environment to properly position and render virtual content. This necessitates adaptive scene understanding techniques, such as object detection (OD), to improve the human-perceived visual coherence of AR scenes. However, assessing and enhancing visual coherence typically relies on labor-intensive human user studies, which slow the AR development cycle. Therefore, achieving immersive AR experiences requires not only adaptive scene understanding but also scalable, human-perceived assessment and content adjustment mechanisms.

To this end, this dissertation focuses on (1) improving AR scene understanding through adaptive guided data acquisition and test-time domain adaptation under changing conditions; and (2) accelerating the AR development cycle by enabling scalable assessment of human-perceived visual coherence and building adaptive, context-aware, self-improving AR systems.

We start with our work on exploiting guided data acquisition for AR scene understanding. We design a real-time data importance estimation method and integrate it into BiGuide, a bi-level image data acquisition system we create for OD tasks. BiGuide assesses the informativeness and diversity of captured images and dynamically guides users in collecting important data via image-level and object instance-level guidance. We prototype BiGuide in an edge-based architecture using commodity smartphones as mobile clients, and evaluate its performance via an IRB-approved study with 20 users. Our evaluation demonstrates that OD models trained on the data collected by BiGuide outperform models trained on the data collected by two baseline systems, achieving detection accuracy improvements of up to 33.07% and 14.57%, respectively. Over 85% of the users found BiGuide fast, helpful, and easy to understand and follow.
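
As a rough illustration of the kind of logic such guidance relies on, the Python sketch below scores an incoming frame for informativeness (model uncertainty) and diversity (coverage of object classes) and maps low scores to image-level or instance-level guidance. The scoring rules, thresholds, and data structures are hypothetical placeholders for illustration, not BiGuide's actual implementation.

# Hypothetical sketch (not the actual BiGuide implementation): score an incoming
# frame for informativeness and diversity, then issue image-level or
# instance-level guidance when either score falls below a threshold.
from dataclasses import dataclass

@dataclass
class Frame:
    entropy: float      # mean predictive entropy of the OD model on this frame
    class_counts: dict  # detected object instances per class in this frame

def informativeness(frame: Frame) -> float:
    # Higher predictive entropy -> the model is more uncertain -> more informative.
    return frame.entropy

def diversity(frame: Frame, collected_counts: dict) -> float:
    # Penalize classes that are already well represented in the collected set.
    scores = []
    for cls, n_new in frame.class_counts.items():
        n_seen = collected_counts.get(cls, 0)
        scores.append(n_new / (1 + n_seen))
    return sum(scores) / max(len(scores), 1)

def guide(frame: Frame, collected_counts: dict,
          info_thresh: float = 0.5, div_thresh: float = 0.3) -> str:
    if informativeness(frame) < info_thresh:
        return "image-level: capture from a new viewpoint or distance"
    if diversity(frame, collected_counts) < div_thresh:
        return "instance-level: include under-represented object classes"
    return "keep: frame is informative and diverse"

# Example: one lamp and one plant are still rare in the collected set, so the frame is kept.
collected = {"chair": 12, "lamp": 1}
frame = Frame(entropy=0.7, class_counts={"lamp": 2, "plant": 1})
print(guide(frame, collected))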

We then describe our efforts on performing adaptive and robust scene understanding in changing environments. We introduce the test-time adaptation coordinator (TACo) for OD, which (1) reduces computational overhead via a lightweight pseudo-label refinement method that determines when adaptation is necessary; and (2) enhances model robustness to domain shifts using a dynamic domain bank guided by our proposed "good-help-bad" strategy. Across three public datasets, TACo reduces computation by up to 60% while maintaining or improving detection performance compared to baselines. We further extend our analysis to 3D OD under domain shift using visual-inertial simultaneous localization and mapping (VI-SLAM) data, exploring how scene and object characteristics impact performance. To address privacy concerns in edge–cloud AR deployments, we also propose Infoscissors, a practical defense mechanism for collaborative inference that effectively limits information leakage.
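
The sketch below illustrates the general idea of gating test-time adaptation on pseudo-label quality, in the spirit of the "adapt only when necessary" design described above; the refinement rule, thresholds, and detection format are invented for illustration and do not reproduce TACo's method.

# Hypothetical sketch (not TACo itself): gate test-time adaptation on the quality
# of refined pseudo-labels, so the detector is only updated when a domain shift
# is likely and the pseudo-labels are trustworthy enough to learn from.
def refine_pseudo_labels(detections, score_thresh=0.6):
    # Keep only confident boxes; a real refinement step could also use
    # class balancing or temporal consistency across frames.
    return [d for d in detections if d["score"] >= score_thresh]

def should_adapt(detections, refined, min_keep_ratio=0.3, low_conf_ratio=0.5):
    if not detections:
        return False
    keep_ratio = len(refined) / len(detections)
    low_conf = sum(d["score"] < 0.5 for d in detections) / len(detections)
    # Adapt only if many detections are low-confidence (suggesting domain shift)
    # but enough confident pseudo-labels survive refinement to supervise the update.
    return low_conf >= low_conf_ratio and keep_ratio >= min_keep_ratio

detections = [{"score": 0.9, "label": "chair"},
              {"score": 0.3, "label": "table"},
              {"score": 0.4, "label": "lamp"}]
refined = refine_pseudo_labels(detections)
print(should_adapt(detections, refined))  # True: many low-confidence boxes, but usable pseudo-labels remain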

Finally, we present our work on assessing and enhancing human-perceived visual coherence in AR scenes with vision-language models (VLMs). To support this effort, we curate two new datasets, DiverseAR+ and RateAR, comprising AR images and videos captured under diverse visual and environmental conditions. We evaluate four commercial VLMs on their ability to perceive and describe AR scenes from a human-centric perspective using only 2D stimuli, achieving a True Positive Rate (TPR) of up to 94% for perception and 86% for description. We then assess five commercial VLMs' capabilities in evaluating key visual coherence factors (placement, scale, and shadow), achieving strong alignment with human ratings: Spearman's rank-order correlation coefficients of up to 0.89, Pearson linear correlation coefficients of up to 0.84, and Kendall's rank correlation coefficients of up to 0.78. Building on these insights, we develop a self-improving AR content adjustment system and conduct a user study with 21 participants, collecting pre- and post-adjustment ratings. Over 90% of the users found this system helpful in improving placement and size coherence or in assisting with virtual content setup.
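
For reference, the three agreement measures reported above can be computed with standard statistics routines; the snippet below uses made-up ratings purely to show the calculation, not data from the study.

# Illustrative sketch: quantifying agreement between VLM-predicted coherence
# ratings and human ratings with the three correlation measures reported above.
# The ratings are hypothetical placeholders.
from scipy.stats import spearmanr, pearsonr, kendalltau

human_ratings = [5, 4, 2, 3, 5, 1, 4, 2]  # e.g., placement coherence on a 1-5 scale
vlm_ratings   = [5, 4, 3, 3, 4, 1, 4, 2]  # hypothetical VLM scores for the same scenes

rho, _ = spearmanr(human_ratings, vlm_ratings)
r, _ = pearsonr(human_ratings, vlm_ratings)
tau, _ = kendalltau(human_ratings, vlm_ratings)

print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}, Kendall tau = {tau:.2f}")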

Subjects

Electrical engineering, Augmented Reality, Domain Adaptation, Machine Learning, Object Detection, Quality Assessment, Vision Language Model

Citation

Duan, Lin (2025). Adaptive, Context-Aware Assessment and Guidance for Augmented Reality. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/34116.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.