Human-in-the-Loop Robot Planning with Non-Contextual Bandit Feedback

In this paper, we consider robot navigation problems in environments populated by humans. The goal is to determine collision-free and dynamically feasible trajectories that also maximize human satisfaction, by ensuring that robots are available to assist humans with their work as needed and that they avoid actions that cause discomfort. In practice, human satisfaction is subjective and hard to describe mathematically. As a result, the planning problem we consider may lack important contextual information. To address this challenge, we propose a semi-supervised Bayesian Optimization (BO) method that designs globally optimal robot trajectories using bandit human feedback, in the form of complaints or satisfaction ratings, that expresses how desirable a trajectory is. Since trajectory planning is typically a high-dimensional optimization problem over the waypoints to be decided, BO may require prohibitively many queries for human feedback to return a good solution. To reduce this cost, we use an autoencoder to map the high-dimensional space into a low-dimensional latent space, which we update using human feedback. Moreover, we improve the exploration efficiency of BO by biasing the search for new trajectories towards dynamically feasible and collision-free trajectories obtained using off-the-shelf motion planners. We demonstrate the efficiency of our proposed trajectory planning method in a scenario with humans who have diverse and unknown demands.
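The abstract does not give implementation details, so the following Python/NumPy sketch is only a rough illustration of the general idea of Bayesian Optimization in a latent space with a search biased toward planner-generated trajectories. Everything concrete in it is an assumption, not the thesis's method: the latent space is 2-D, the autoencoder is replaced by pre-computed latent codes (`planner_seeds`), the human is simulated by a hypothetical quadratic `simulated_rating`, the surrogate is a plain Gaussian process with GP-UCB acquisition, and `propose_candidates` is a hypothetical biasing scheme that perturbs the latent codes of feasible trajectories. In a real system a decoder would map each chosen latent code back to waypoints before the trajectory is executed and rated.

```python
import numpy as np

rng = np.random.default_rng(0)


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel (unit prior variance) between rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """Exact GP regression: posterior mean and std at x_query."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def propose_candidates(seeds, n=200, bias=0.8, spread=0.3, bound=2.0):
    """Hypothetical biasing: most candidates perturb latent codes of
    planner-generated (feasible, collision-free) trajectories; the rest
    are uniform samples that keep some global exploration."""
    n_near = int(bias * n)
    idx = rng.integers(len(seeds), size=n_near)
    near = seeds[idx] + spread * rng.standard_normal((n_near, seeds.shape[1]))
    far = rng.uniform(-bound, bound, size=(n - n_near, seeds.shape[1]))
    return np.vstack([near, far])


def latent_bo(seeds, rate, n_queries=15, beta=2.0):
    """GP-UCB over the latent space; each call to `rate` stands in for one human query."""
    x = seeds.copy()
    y = np.array([rate(z) for z in x])
    for _ in range(n_queries):
        cand = propose_candidates(seeds)
        y_std = (y - y.mean()) / (y.std() + 1e-9)  # standardize ratings for the GP
        mean, std = gp_posterior(x, y_std, cand)
        z_next = cand[np.argmax(mean + beta * std)]  # UCB acquisition
        x = np.vstack([x, z_next])
        y = np.append(y, rate(z_next))
    best = np.argmax(y)
    return x[best], y[best], y


# Hypothetical stand-ins: latent codes of planner trajectories (as if produced
# by a trained encoder) and a simulated human whose preference is hidden.
planner_seeds = np.array([[0.5, -0.2], [-1.0, 1.0], [1.2, 0.4]])
hidden_pref = np.array([0.8, -0.5])


def simulated_rating(z):
    return -np.sum((z - hidden_pref) ** 2) + 0.01 * rng.standard_normal()


z_best, r_best, ratings = latent_bo(planner_seeds, simulated_rating)
print("best latent code:", z_best, "rating:", r_best)
```

Biasing the candidate pool, rather than the acquisition function itself, is just one simple way to steer queries toward feasible trajectories while the uniform samples preserve some global exploration; the thesis may combine these ingredients differently.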






Zhou, Yijie (2020). Human-in-the-Loop Robot Planning with Non-Contextual Bandit Feedback. Master's thesis, Duke University. Retrieved from


Duke's student scholarship is made available to the public under a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.