# Browsing by Author "Zavlanos, Michael M"


Item Open Access A NEW ZEROTH-ORDER ORACLE FOR DISTRIBUTED AND NON-STATIONARY LEARNING (2021) Zhang, Yan

Zeroth-Order (ZO) methods have been applied to solve black-box or simulation-based optimization problems. These problems arise in many important applications nowadays, e.g., generating adversarial attacks on machine learning systems, or learning to control systems with complicated physics or humans in the loop. In these settings, the objective function to be optimized does not have an explicit mathematical form, so its gradient cannot be obtained, which invalidates all gradient-based optimization approaches. ZO methods instead approximate the gradient using objective function values. Many existing ZO methods adopt a two-point feedback scheme to approximate the unknown gradient, due to its low estimation variance and fast convergence speed. Specifically, a two-point ZO method estimates the gradient at the current iterate of the algorithm by querying the objective function value at two distinct points near the current iterate. Such a scheme becomes infeasible or difficult to implement when the objective function is time-varying, or when multiple agents collaboratively optimize a global objective function that depends on all agents' decisions, because the objective value can then be queried only once, at a single decision point. However, conventional ZO methods based on one-point feedback suffer from large variance in the gradient estimate, which slows down convergence.

In this dissertation, we propose a novel one-point ZO method based on residual feedback. Specifically, the residual-feedback scheme estimates the gradient using the residual between the values of the objective function at two consecutive iterates of the algorithm. When optimizing a deterministic Lipschitz function, we show that the query complexity of ZO with the proposed one-point residual feedback matches that of ZO with existing two-point schemes. Moreover, the query complexity of the proposed algorithm can be improved when the objective function has a Lipschitz gradient. Then, for stochastic bandit optimization problems, we show that ZO with one-point residual feedback achieves the same convergence rate as the two-point scheme with uncontrollable data samples.
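The residual-feedback estimator described above admits a compact sketch. The toy implementation below is our own illustration, not the dissertation's code: the quadratic test function, smoothing radius `delta`, and stepsize `eta` are example choices. The key property it demonstrates is that only one new function query is needed per iteration; the previous query is reused as the baseline.

```python
import numpy as np

def residual_feedback_zo(f, x0, delta=0.1, eta=0.005, iters=3000, seed=0):
    """Minimize f with a one-point zeroth-order residual-feedback scheme.

    Gradient estimate (sketch of the scheme described above):
        g_t = (d / delta) * (f(x_t + delta*u_t) - f(x_{t-1} + delta*u_{t-1})) * u_t
    where u_t is drawn uniformly from the unit sphere. Only one new
    function query is made per iteration; the previous one is reused.
    """
    rng = np.random.default_rng(seed)
    d = x0.size
    x = x0.copy()
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    prev_value = f(x + delta * u)      # the single query at iteration 0
    best = f(x)
    for _ in range(iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        value = f(x + delta * u)       # the only new query this iteration
        g = (d / delta) * (value - prev_value) * u
        x -= eta * g
        prev_value = value
        best = min(best, f(x))
    return x, best

# Toy example: minimize ||x||^2 starting from [1, 1].
x_final, best = residual_feedback_zo(lambda z: float(z @ z), np.array([1.0, 1.0]))
```

Because consecutive perturbed iterates are close, the residual is small, which is exactly the variance-reduction effect the dissertation exploits.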

Next, we apply the proposed one-point residual-feedback gradient estimator to solve online optimization problems, where the objective function varies over time. In the online setting, since each objective function can be evaluated only once, at a single decision point, existing two-point ZO methods are not feasible and only one-point ZO methods can be used. We develop regret bounds for ZO with the proposed one-point residual-feedback scheme for both convex and nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems, and for both Lipschitz and smooth objective functions, we show that using residual feedback produces gradient estimates with much smaller variance than conventional one-point feedback methods. As a result, our regret bounds are much tighter than existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. Additionally, our regret bounds rely on weaker assumptions than those used in conventional one-point feedback methods.

The proposed residual-feedback scheme is next decentralized to conduct distributed policy optimization in multi-agent reinforcement learning (MARL) problems. Existing MARL algorithms often assume that every agent can observe the states and actions of all other agents in the network. This can be impractical in large-scale problems, where sharing state and action information with multi-hop neighbors may incur significant communication overhead. The advantage of the proposed zeroth-order policy optimization method is that it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards that depend on partial state and action information only and can be obtained using consensus. Specifically, local ZO policy gradients relying on one-point residual feedback significantly reduce the variance of the local policy gradient estimates compared to conventional one-point policy gradient estimates, improving, in this way, the learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of the globally optimal policy whose size depends on the number of consensus steps used to calculate the local estimates of the global accumulated rewards.

Another challenge in distributed ZO optimization problems is that the agents may conduct local updates in an asynchronous fashion when they do not have access to a global clock. To deal with this challenge, we propose an asynchronous zeroth-order distributed optimization method that relies on the proposed one-point residual-feedback gradient estimator. We show that this estimator is unbiased under asynchronous updating and theoretically analyze its convergence. We demonstrate the effectiveness of all proposed algorithms via extensive numerical experiments.

Item Embargo Adaptive Planning in Changing Policies and Environments (2023) Sivakumar, Kavinayan Pillaiar

Being able to adapt to different tasks is a staple of learning, as agents aim to generalize across different situations. Specifically, it is important for agents to adapt to the policies of other agents around them. In swarm settings, multi-agent sports settings, or other team-based environments, agents learning from one another can save time and reduce errors in performance. As a result, traditional transfer reinforcement learning proposes ways to decrease the time it takes for an agent to learn from an expert agent. However, the problem of transferring knowledge across agents that operate in different action spaces, and are therefore heterogeneous, poses new challenges. Mainly, it is difficult to translate between heterogeneous agents whose action spaces are not guaranteed to intersect.

We propose a transfer reinforcement learning algorithm between heterogeneous agents based on a subgoal trajectory mapping algorithm. We learn a mapping between expert and learner trajectories that are expressed through subgoals. We do so by training a recurrent neural network on trajectories in a training set. Then, given a new task, we input the expert's trajectory of subgoals into the trained model to predict the optimal trajectory of subgoals for the learner agent. We show that the learner agent is able to learn an optimal policy faster with this predicted trajectory of subgoals.

It is equally important for agents to adapt to the intentions of agents around them. To this end, we propose an inverse reinforcement learning algorithm to estimate the reward function of an agent as it updates its policy over time. Previous work in this field assumes the reward function is approximated by a set of linear feature functions. Choosing an expressive enough set of feature functions can be challenging, and failure to do so can skew the learned reward function. Instead, we propose an algorithm that estimates the policy parameters of the agent as it learns, bundling adjacent trajectories together in a new form of behavior cloning we call bundle behavior cloning. Our complexity analysis shows that bundle behavior cloning attains a tighter bound on the difference between the distribution of the cloned policy and that of the true policy than the corresponding bound for standard behavior cloning. We show experiments in which our method achieves the same overall reward using the estimated reward function as the policy learned from the initial trajectories, and we also test the feasibility of bundle behavior cloning with different neural network structures and empirically examine the effect of the bundle choice on performance.

Finally, because agents must adapt to environments that are prone to change due to damage or detection, we propose the design of a robotic sensing agent to detect damage. In such dangerous environments, it may be unsafe for human operators to take measurements manually. The structural health monitoring literature proposes sequential sensing algorithms that optimize the number of locations at which measurements must be taken before sources of damage can be localized. Accordingly, the robotic sensing agent we designed is mobile, semi-autonomous, and precise in measuring a location on the model structure we built. We detail the components of our robotic sensing agent and show measurement data taken by the agent at two locations on the structure, displaying little to no noise in the measurements.

Item Open Access Decentralized State Estimation using Robotic Sensor Networks (2016) Freundlich, Charles

This dissertation proposes three control algorithms for active sensing with one or several autonomous robots.

The algorithms all rely on models of the information content of the sensor measurement with respect to the relative poses between sensors and subjects.

The approaches each predict how new information may impact the uncertainty in the subjects, controlling sensors to new locations or trajectories from which these uncertainties will be minimized.

The first algorithm deals with the Next-Best-View (NBV) problem for a single robot, where the goal is to control a mobile camera so that the next image of a set of possibly mobile targets will be as informative as possible.

The NBV controller is designed for a rig that hosts two cameras in a fronto-parallel arrangement, commonly known as stereo vision.

Assuming that the objects, landmarks, or targets being estimated are visible by both cameras in the rig and that these observations are corrupted by zero-mean Gaussian errors, the control algorithm moves the rig through pose space in order to reduce the expected Kalman-filtered uncertainty in the next location point-estimate.

This is done by differentiating the KF output error covariance matrix with respect to the sensor pose, which results in a nonlinear control problem.

The controller is decomposed so that first the robot computes the NBV in coordinates relative to the body-frame of the stereo rig, and then it moves in pose space to realize this view.

When an image is acquired, a switching signal changes the goal of pose control, giving rise to a stable hybrid system.
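The core idea of differentiating a Kalman-filter posterior covariance with respect to the sensor pose can be sketched in a few lines. The toy model below is our own simplification, not the dissertation's stereo-rig formulation: a direct position measurement (H = I) whose noise grows with sensor-target range, with the sensor descending a numerical gradient of the posterior trace.

```python
import numpy as np

def posterior_trace(s, target, P0, sigma0=0.1):
    """Trace of the Kalman posterior covariance for a position measurement
    whose noise grows with the sensor-target range (illustrative model, H = I)."""
    r2 = float(np.sum((s - target) ** 2))
    R_inv = np.eye(2) / (sigma0 + r2)          # information gained per measurement
    P = np.linalg.inv(np.linalg.inv(P0) + R_inv)
    return float(np.trace(P))

def next_best_view(s0, target, P0, step=0.5, iters=200, h=1e-5):
    """Descend the numerical gradient of trace(P) with respect to sensor pose."""
    s = s0.astype(float).copy()
    for _ in range(iters):
        g = np.zeros(2)
        for i in range(2):
            e = np.zeros(2)
            e[i] = h
            g[i] = (posterior_trace(s + e, target, P0) -
                    posterior_trace(s - e, target, P0)) / (2 * h)
        s -= step * g                          # move toward more informative poses
    return s

P0 = np.eye(2)                                 # prior uncertainty in the target
target = np.array([5.0, 0.0])                  # current target point-estimate
s = next_best_view(np.array([0.0, 0.0]), P0=P0, target=target)
```

In this toy model the most informative pose is at the target estimate itself; the real controller instead works in the stereo rig's pose space with its actual measurement Jacobians.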

Experiments on a real robot localizing targets in a laboratory setting are presented.

The second algorithm addresses the problem of estimating a finite set of hidden state vectors using a mobile robotic sensor network.

For every hidden state that needs to be estimated, a local Dynamic Program (DP) in the joint state-space of robot positions and state uncertainties determines robot paths and associated sequences of state observations that collectively minimize the estimation uncertainty.

The method divides the collection of hidden states into clusters based on a prior belief of their geographic locations and, for each cluster, defines a second DP that determines how far along the local optimal trajectories the robot should travel before transitioning to estimating the next hidden state within the cluster.

Finally, a distributed assignment algorithm dynamically allocates controllers to the robot team from the set of optimal control policies at every cluster.

Assuming Gaussian priors on the hidden state vectors, the distributed state estimation method scales gracefully to large teams of mobile robots and large numbers of hidden vectors; we provide extensive simulations and real-world experiments using stereoscopic vision sensors to illustrate the approach.

The third chapter addresses the problem of controlling a network of mobile sensors so that a set of hidden states is estimated up to a user-specified accuracy. The sensors take measurements and fuse them online using an Information Consensus Filter (ICF). At the same time, the local estimates guide the sensors to their next best configuration. This leads to an LMI-constrained optimization problem that we solve by means of a new distributed random approximate projections method. The new method is robust to the state disagreement errors that exist among the robots as the ICF fuses the collected measurements. Assuming that the noise corrupting the measurements is zero-mean and Gaussian and that the robots are self-localized in the environment, the integrated system converges to the next best positions from where new observations will be taken. This process is repeated, with the robots taking a sequence of observations, until the hidden states are estimated up to the desired user-specified accuracy. We present simulations of sparse landmark localization, in which the robotic team achieves the desired estimation tolerances while exhibiting interesting emergent behavior.
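The fusion step of an ICF-style scheme can be sketched with plain average consensus on information pairs. The example below is a simplified static illustration, with our own choices of weight matrix, measurement models, and noise levels: each robot repeatedly averages its information matrix and vector with its neighbors', and the consensus limit recovers the centralized fused estimate.

```python
import numpy as np

def icf_fuse(Y_list, y_list, W, rounds=50):
    """Average consensus on information matrices/vectors (ICF-style sketch).

    W is a doubly stochastic weight matrix over the robot network. Each
    round, every robot replaces its information pair with a weighted average
    of its neighbors'. The consensus limit recovers the centralized estimate
    because (mean Y)^-1 (mean y) equals (sum Y)^-1 (sum y)."""
    Y = [Yi.copy() for Yi in Y_list]
    y = [yi.copy() for yi in y_list]
    n = len(Y)
    for _ in range(rounds):
        Y = [sum(W[i, j] * Y[j] for j in range(n)) for i in range(n)]
        y = [sum(W[i, j] * y[j] for j in range(n)) for i in range(n)]
    # Each robot's local estimate after consensus:
    return [np.linalg.solve(Y[i], y[i]) for i in range(n)]

# Three robots on a path graph, each with a noisy position measurement of a
# hidden 2-D state; Metropolis weights for the path 1-2-3.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
H = np.eye(2)
R = [0.5 * np.eye(2), 1.0 * np.eye(2), 2.0 * np.eye(2)]
z = [np.array([1.1, 1.9]), np.array([0.9, 2.1]), np.array([1.0, 2.0])]
Y_list = [H.T @ np.linalg.inv(Ri) @ H for Ri in R]
y_list = [H.T @ np.linalg.inv(Ri) @ zi for Ri, zi in zip(R, z)]
estimates = icf_fuse(Y_list, y_list, W)
central = np.linalg.solve(sum(Y_list), sum(y_list))
```

In the dissertation the consensus runs online, interleaved with motion, which is what introduces the disagreement errors the projection method must tolerate.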

Experiments with the first two algorithms are also presented.

Item Open Access Deep Reinforcement Learning with Temporal Logic Specifications (2018) Gao, Qitong

In this thesis, we propose a model-free reinforcement learning method to synthesize control policies for mobile robots modeled by Markov Decision Processes (MDPs) with unknown transition probabilities that satisfy Linear Temporal Logic (LTL) specifications. The key idea is to employ Deep Q-Learning techniques that rely on Neural Networks (NNs) to approximate the state-action values of the MDP, and to design a reward function that depends on the accepting condition of the Deterministic Rabin Automaton (DRA) that captures the LTL specification. Unlike relevant works, our method does not require learning the transition probabilities of the MDP, constructing a product MDP, or computing Accepting Maximal End Components (AMECs). This significantly reduces the computational cost and also renders our method applicable to planning problems where AMECs do not exist. In this case, the resulting control policies minimize the frequency with which the system enters bad states of the DRA that violate the task specifications. To the best of our knowledge, this is the first model-free deep reinforcement learning algorithm that can synthesize policies maximizing the probability of satisfying an LTL specification even if AMECs do not exist. We validate our method through numerical experiments.
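The DRA-based reward idea can be sketched as follows. This is a hypothetical, simplified shaping (the magnitudes and exact rule are our own illustration, not necessarily the thesis's): reward transitions into a Rabin pair's "good" set, penalize its "bad" set, and use the result as the per-step reward inside a standard deep Q-learning loop.

```python
def dra_reward(next_dra_state, rabin_pairs, r_good=10.0, r_bad=-10.0):
    """Illustrative reward based on a DRA's Rabin acceptance condition.

    rabin_pairs is a list of (bad_set, good_set) pairs. A Rabin pair is
    satisfied when bad states are visited finitely often and good states
    infinitely often, so we penalize entering bad states and reward
    entering good states; all other transitions get zero reward."""
    for bad, good in rabin_pairs:
        if next_dra_state in bad:
            return r_bad
        if next_dra_state in good:
            return r_good
    return 0.0

# One Rabin pair: q3 must be visited finitely often, q2 infinitely often.
pairs = [({"q3"}, {"q2"})]
rewards = [dra_reward(q, pairs) for q in ("q0", "q2", "q3")]
```

The agent then maximizes this reward over the joint system/DRA state without ever constructing an explicit product MDP.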

Item Open Access Distributed Intermittent Connectivity Control of Mobile Robot Networks (2018) Kantaros, Yiannis

Wireless communication is known to play a pivotal role in enabling teams of robots to successfully accomplish global coordinated tasks. In fact, network connectivity is an underlying assumption in every distributed control and optimization algorithm. For this reason, in recent years there has been growing research in designing controllers that ensure point-to-point or end-to-end network connectivity for all time. Nevertheless, all these methods severely restrict the robots from accomplishing their tasks, as motion planning is always subject to connectivity constraints on the network. A much preferred solution is instead to enable robots to communicate intermittently and operate in disconnect mode the rest of the time, giving rise to an intermittently connected communication network. While in disconnect mode, the robots can accomplish their tasks free of communication constraints. The goal of this dissertation is to design a distributed intermittent connectivity framework that (i) ensures that the communication network is connected over time, infinitely often, (ii) is flexible enough to account for arbitrary dynamic tasks, and (iii) can be applied to large-scale networks.

The great challenge in developing intermittent connectivity protocols for networks of mobile robots is to decide (i) which robots talk to which, (ii) where, and (iii) when, so that the communication network is connected over time infinitely often. To address these challenges, we decompose the network into small groups of robots, called teams, so that every robot belongs to at least one team and every two teams are connected by a path, i.e., a sequence of teams in which consecutive teams have non-empty intersections, so that information can propagate through the network. First, given such fixed teams, we design infinite sequences of communication events for all robots, called communication schedules, independent of the tasks assigned to the robots, that determine when every team should communicate. The designed communication schedules ensure that all teams communicate infinitely often, i.e., that the communication network is connected over time infinitely often. Between communication events, the robots can move in the workspace free of communication constraints to accomplish their assigned tasks. Theoretical guarantees and numerical experiments corroborate the proposed framework. This is the first distributed intermittent connectivity framework that can be applied to large-scale networks and is flexible enough to account for arbitrary dynamic robot tasks.
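A minimal sketch of the schedule idea, under strong simplifying assumptions of our own (a fixed round-robin meeting order and instantaneous meetings): every team meets once per cycle, so each team communicates infinitely often as the cycles continue, and because consecutive teams overlap, information injected at any robot reaches the whole network.

```python
def round_robin_schedule(teams, cycles):
    """One communication event per epoch, cycling through the teams in order,
    so every team communicates infinitely often as the cycles continue."""
    for _ in range(cycles):
        for team in teams:
            yield team

def propagate(teams, source=0):
    """Simulate information spread under the schedule: whenever a meeting
    includes an informed robot, every member of that team becomes informed."""
    informed = {source}
    # len(teams) cycles suffice here: information crosses at least one
    # team-overlap per cycle when the team graph is connected.
    for team in round_robin_schedule(teams, cycles=len(teams)):
        if informed & set(team):
            informed |= set(team)
    return informed

# Teams whose pairwise overlaps form a connected sequence, so information
# originating at robot 0 can reach the entire 5-robot network.
teams = [(0, 1), (1, 2), (2, 3), (3, 4)]
reached = propagate(teams)
```

The dissertation's schedules are distributed and conflict-free rather than a global round-robin, but the connectivity-over-time argument has the same flavor.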

Next, given user-specified fixed teams, we integrate the respective communication schedules with task planning. Specifically, we consider high-level complex tasks captured by temporal logic formulas, state-estimation tasks, and time-critical dynamic tasks. The proposed distributed integrated path planning and intermittent connectivity frameworks determine both where and when every team should communicate so that the assigned task is accomplished, the communication network is connected over time infinitely often, and a user-specified metric, such as total traveled distance or consumed energy, is minimized. We show that employing the proposed intermittent connectivity framework for such tasks results in significant performance gains compared to the existing solutions in the literature that maintain connectivity for all time. Theoretical guarantees, numerical and experimental studies support the proposed distributed control algorithms.

Finally, we propose a fully autonomous intermittent connectivity framework that can handle arbitrary dynamic tasks and also allows the robots to locally and online update the structure of the teams and the communication schedules, effectively allowing them to decide who they should talk to, so that they can better accomplish newly assigned tasks. The structure of the teams, the associated communication locations, and the time instants when communication within teams will occur are integrated online with task planning, giving rise to paths, i.e., sequences of waypoints, that ensure that the assigned task is accomplished, the communication network is connected over time infinitely often, and a user-specified metric is minimized. This is the first fully autonomous, distributed, and online intermittent connectivity framework that can handle arbitrary dynamic tasks and also controls the topology of the intermittently connected robot network to better accomplish these tasks. At the same time, the proposed framework scales well with the size of the robot network. Theoretical guarantees and numerical experiments corroborate the proposed distributed control scheme.

Item Open Access Distributed Optimization Algorithms for Networked Systems (2015) Chatzipanagiotis, Nikolaos

Distributed optimization methods allow us to decompose an optimization problem into smaller, more manageable subproblems that are solved in parallel. For this reason, they are widely used to solve large-scale problems arising in areas as diverse as wireless communications, optimal control, machine learning, artificial intelligence, computational biology, finance, and statistics, to name a few. Moreover, distributed algorithms avoid the cost and fragility associated with centralized coordination, and provide better privacy for the autonomous decision makers. These are desirable properties, especially in applications involving networked robotics, communication or sensor networks, and power distribution systems.

In this thesis we propose the Accelerated Distributed Augmented Lagrangians (ADAL) algorithm, a novel decomposition method for convex optimization problems with certain separability structure. The method is based on the augmented Lagrangian framework and addresses problems that involve multiple agents optimizing a separable convex objective function subject to convex local constraints and linear coupling constraints. We establish the convergence of ADAL and also show that it has a worst-case O(1/k) convergence rate, where k denotes the number of iterations.
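To make the iteration concrete, here is a toy two-agent instance of this kind of augmented Lagrangian decomposition. It is our own illustration, not the thesis's implementation: the problem, penalty `rho`, and closed-form local minimizers are example choices, with the damped stepsize tau = 1/2 because at most two agents couple in the constraint.

```python
def adal_two_agents(rho=1.0, iters=400):
    """ADAL-style sketch for  min  x1^2 + (x2 - 3)^2   s.t.  x1 - x2 = 0.

    Each agent minimizes its local augmented Lagrangian using the other
    agent's previous iterate, then both take a damped primal step with
    tau = 1/2, and the dual variable ascends on the coupling residual.
    The true solution is x1 = x2 = 1.5 with multiplier lam = -3.
    """
    x1 = x2 = lam = 0.0
    tau = 0.5
    for _ in range(iters):
        # Local minimizers of the agents' augmented Lagrangians (closed form
        # for these quadratics):
        x1_hat = (rho * x2 - lam) / (2.0 + rho)
        x2_hat = (6.0 + lam + rho * x1) / (2.0 + rho)
        # Damped primal update, then dual ascent on the constraint residual.
        x1 += tau * (x1_hat - x1)
        x2 += tau * (x2_hat - x2)
        lam += rho * tau * (x1 - x2)
    return x1, x2, lam

x1, x2, lam = adal_two_agents()
```

Each agent only needs its own objective, its own column of the coupling constraint, and its neighbor's last iterate, which is what makes the scheme distributable.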

Moreover, we show that ADAL converges to a local minimum of the problem for cases with non-convex objective functions. This is the first published work that formally establishes the convergence of a distributed augmented Lagrangian method for non-convex optimization problems. An alternative way to select the stepsizes used in the algorithm is also discussed. These two contributions are independent from each other, meaning that convergence of the non-convex ADAL method can still be shown using the stepsizes from the convex case, and, similarly, convergence of the convex ADAL method can be shown using the stepsizes proposed in the non-convex proof.

Furthermore, we consider cases where the distributed algorithm needs to operate in the presence of uncertainty and noise, and show that the generated sequences of primal and dual variables converge to their respective optimal sets almost surely. In particular, we are concerned with scenarios where: i) the local computation steps are inexact or are performed in the presence of uncertainty, and ii) the message exchanges between agents are corrupted by noise. In this case, the proposed scheme can be classified as a distributed stochastic approximation method. Compared to existing literature in this area, our work is the first that utilizes the augmented Lagrangian framework. Moreover, the method allows us to solve a richer class of problems as compared to existing methods on distributed stochastic approximation that consider only consensus constraints.

Extensive numerical experiments have been carried out in an effort to validate the novelty and effectiveness of the proposed method in all the areas of the aforementioned theoretical contributions. We examine problems in convex, non-convex, and stochastic settings where uncertainties and noise affect the execution of the algorithm. For the convex cases, we present applications of ADAL to certain popular network optimization problems, as well as to a two-stage stochastic optimization problem. The simulation results suggest that the proposed method outperforms the state-of-the-art distributed augmented Lagrangian methods that are known in the literature. For the non-convex cases, we perform simulations on certain simple non-convex problems to establish that ADAL indeed converges to non-trivial local solutions of the problems; in comparison, the straightforward implementation of the other distributed augmented Lagrangian methods on the same problems does not lead to convergence. For the stochastic setting, we present simulation results of ADAL applied to network optimization problems and examine the effect that noise and uncertainties have on the convergence behavior of the method.

As an extended and more involved application, we also consider the problem of relay cooperative beamforming in wireless communications systems. Specifically, we study the scenario of a multi-cluster network, in which each cluster contains multiple single-antenna source-destination pairs that communicate simultaneously over the same channel. The communications are supported by cooperating amplify-and-forward relays, which perform beamforming. Since the emerging problem is non-convex, we propose an approximate convex reformulation. Based on ADAL, we also discuss two different ways to obtain a distributed solution that allows for autonomous computation of the optimal beamforming decisions by each cluster, while taking into account intra- and inter-cluster interference effects.

Our goal in this thesis is to advance the state-of-the-art in distributed optimization by proposing methods that combine fast convergence, wide applicability, ease of implementation, and low computational complexity, and that are robust with respect to delays, uncertainty in the problem parameters, noise corruption in the message exchanges, and inexact computations.

Item Open Access Formal Verification of Stochastic ReLU Neural Network Control System (2020) Sun, Shiqi

In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with a robot navigation example and present comparative results with commonly employed verification schemes.

Item Open Access Human-in-the-Loop Robot Planning with Non-Contextual Bandit Feedback (2020) Zhou, Yijie

In this paper, we consider robot navigation problems in environments populated by humans. The goal is to determine collision-free and dynamically feasible trajectories that also maximize human satisfaction, by ensuring that robots are available to assist humans with their work as needed and avoid actions that cause discomfort. In practice, human satisfaction is subjective and hard to describe mathematically. As a result, the planning problem we consider in this paper may lack important contextual information. To address this challenge, we propose a semi-supervised Bayesian Optimization (BO) method to design globally optimal robot trajectories using bandit human feedback, in the form of complaints or satisfaction ratings, that expresses how desirable a trajectory is. Since trajectory planning is typically a high-dimensional optimization problem in the space of waypoints that need to be decided, BO may require prohibitively many queries for human feedback to return a good solution. To this end, we use an autoencoder to reduce the high-dimensional space into a low-dimensional latent space, which we update using human feedback. Moreover, we improve the exploration efficiency of BO by biasing the search for new trajectories towards dynamically feasible and collision-free trajectories obtained using off-the-shelf motion planners. We demonstrate the efficiency of our proposed trajectory planning method in a scenario with humans that have diversified and unknown demands.

Item Open Access MODEL-BASED LEARNING AND CONTROL OF ADVECTION-DIFFUSION TRANSPORT USING MOBILE ROBOTS (2019) Khodayi-mehr, Reza

Mathematical models that describe different processes and phenomena are of paramount importance in many robotics applications. Nevertheless, the use of high-fidelity models, particularly Partial Differential Equations (PDEs), has been hindered for many years by the lack of adequate computational resources onboard mobile robots. One such problem of interest to roboticists, which can hugely benefit from more descriptive models, is Chemical Plume Tracing (CPT). In the CPT problem, one or multiple mobile robots are equipped with chemical concentration and flow sensors and attempt to localize chemical sources in an environment of interest. This problem has important applications ranging from environmental monitoring and protection to search and rescue missions. The transport of a chemical in a fluid medium is mathematically modeled by the Advection-Diffusion (AD) PDE. Despite its versatility, rigorous derivation, and powerful descriptive nature, the AD-PDE has seldom been used in its general form for the solution of the CPT problem due to its high computational cost. Instead, simplified scenarios that render closed-form solutions for the AD-PDE, or various heuristics, are often used in the robotics literature.

Using the AD-PDE to model the transport phenomenon enables generalization of the CPT problem to estimating other properties of the sources, e.g., their intensity, in addition to their locations. We refer to this problem as Source Identification (SI), which we define as the problem of estimating the properties of the sources using concentration measurements generated under the action of those sources. We can also go one step further and consider the problem of controlling a set of sources, carried by a team of mobile robots, to generate and maintain desired concentration levels in select regions of the environment, with the objective of cloaking those regions from external environmental conditions; we refer to this as the AD-PDE control problem, which has important applications in search and rescue missions.

Both the SI and the AD-PDE control problems can be formulated as PDE-constrained optimization problems. Solving such optimization problems onboard mobile robots is challenging for the following reasons: (i) the computational cost of solving the AD-PDE using traditional numerical discretization schemes, e.g., the Finite Element (FE) method, is prohibitively high; (ii) obtaining accurate knowledge of the environment and the Boundary and Initial Conditions (BICs) required to solve the AD-PDE is difficult and prone to error; and (iii) obtaining accurate estimates of the velocity and diffusivity fields is challenging, since for typical transport media like air the flow is turbulent even at very small velocities. In addition, we need to plan the actions of the mobile robots, e.g., measurement collection for SI or release rates for the AD-PDE control problem, to ensure that they accomplish their tasks optimally. This can be done by formulating a planning problem that is often solved online to take into account the latest information available to the robots. Solving this planning problem is itself a challenging task that has been the subject of heavy research in the robotics literature, because (i) the objective is often nonlinear, (ii) planning is preferably done for more than the immediate action to avoid myopic, suboptimal plans, and (iii) the environment the robots operate in is often non-convex and cluttered with obstacles.
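As a toy illustration of such a PDE-constrained source identification problem, the sketch below (our own 1-D steady-state example, not the dissertation's solver) discretizes the AD-PDE with finite differences and recovers a point source's location and intensity from noiseless concentration measurements by grid search plus least squares, exploiting the fact that the concentration is linear in the source intensity.

```python
import numpy as np

def solve_ad_1d(src_idx, n=101, D=0.01, v=0.5):
    """Steady 1-D advection-diffusion  -D u'' + v u' = s  on [0, 1] with
    u = 0 at both ends and a unit point source at grid index src_idx
    (central finite differences; cell Peclet number < 1 for these values)."""
    h = 1.0 / (n - 1)
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        A[i, i - 1] = -D / h**2 - v / (2 * h)
        A[i, i]     =  2 * D / h**2
        A[i, i + 1] = -D / h**2 + v / (2 * h)
    A[0, 0] = A[-1, -1] = 1.0        # Dirichlet boundary rows
    b = np.zeros(n)
    b[src_idx] = 1.0 / h             # unit-mass point source
    return np.linalg.solve(A, b)

def identify_source(measure_idx, z, n=101):
    """Grid search over candidate source locations; the optimal intensity
    for each candidate follows in closed form by least squares."""
    best = None
    for j in range(1, n - 1):
        u = solve_ad_1d(j, n)[measure_idx]
        a = float(u @ z) / float(u @ u)            # best intensity for this j
        resid = float(np.sum((a * u - z) ** 2))
        if best is None or resid < best[0]:
            best = (resid, j, a)
    return best[1], best[2]

# Synthetic experiment: true source at index 40 with intensity 2.0,
# concentration measured at every interior grid point.
n = 101
measure_idx = np.arange(1, n - 1)
z = 2.0 * solve_ad_1d(40, n)[measure_idx]
j_hat, a_hat = identify_source(measure_idx, z, n)
```

The dissertation works with the full 2-D/3-D AD-PDE, noisy measurements, and nonlinear source parameterizations, but the structure (forward PDE solve inside an outer estimation loop) is the same.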

To address the computational challenges that arise from the use of numerical schemes, we propose using multiple mobile robots that decompose the high-dimensional optimization variables among themselves, or using nonlinear representations of the sources. In addition, we utilize Model Order Reduction (MOR) approaches that facilitate the evaluation of the AD-PDE at the expense of accuracy. To alleviate this loss of accuracy, we also propose a novel MOR method using Neural Networks that can straightforwardly replace the traditional MOR methods in our formulations. To deal with uncertainty in the PDE input data, i.e., the geometry of the environment, the BICs, and the velocity and diffusivity fields, we formulate a stochastic version of the SI problem that provides posterior probabilities over all possible values of these uncertain parameters. Finally, to obtain the velocity and corresponding diffusivity fields required for the solution of the AD-PDE, we rely on Bayesian inference to incorporate empirical measurements, collected and analyzed by mobile robots, into the numerical solutions obtained from computational fluid dynamics models.

In order to demonstrate the applicability of our proposed model-based approaches, we have devised and constructed an experimental setup and designed a mobile robot equipped with concentration and flow sensors. To the best of our knowledge, this dissertation is the first work to use the AD-PDE, in its general form, to solve realistic problems onboard mobile robots. Note that although here we focus on the AD-PDE and particularly chemical propagation, many other transport phenomena including heat and acoustic transport can be considered and the same principles apply. Our results are a proof of concept that we hope will convince many roboticists to use more general mathematical models in their solutions.

Item Open Access Passive Acoustic Localization and Tracking with Mobile Robots (2021) Calkins, William Lucas. Acoustic sensing has received a lot of attention in the underwater domain, where it is usually the only form of sensing available. As the computational capabilities of robotic platforms have steadily increased, robots can now make decisions and navigate autonomously, without human intervention. This dissertation proposes and demonstrates acoustic sensing onboard mobile robotic platforms for passive bearing-only tracking of surface vessels in the water and for detecting nearby obstacles in aerial systems.

First, we consider the problem of target tracking with a bearing-only sensor in the presence of merged measurements. Assuming the number of targets in the domain is known, we incorporate a merged measurement model into a nonlinear joint probabilistic data association filter (JPDAF). We demonstrate the ability to track multiple targets through merging events. Furthermore, we propose a novel planning algorithm that incorporates the merged measurement model into the planning process. The result is a planned trajectory biased away from regions where targets will merge in the measurement space, as merging leads to higher uncertainty in the target state estimates. We present experimental results with unmanned ground vehicles equipped with camera sensors acting as a surrogate for a bearing-only passive sonar sensor.

Next, we consider the problem of bearing-only tracking of multiple targets using a port-starboard ambiguous sensor, the type of sensor used onboard our Autonomous Underwater Vehicles (AUVs). We address the problem of resolving the ambiguity using a likelihood ratio detection and tracking (LRDT) method. The LRDT serves as a front-end detector that initializes tracks and hands them off to a tracking algorithm. We show that, as long as the ambiguity is resolved, the JPDAF algorithm can track targets even with ambiguous measurements. We run our detector-tracker system on a dataset collected in Boston Harbor in August 2018, show its effective functioning, and discuss improvements that were in progress at the time of writing this dissertation.

We also explore acoustic sensing in aerial vehicles using the self-generated noise caused by the vehicle's normal operation. We first propose an algorithm to actively control the distance between a motor-propeller system (MPS) and a large obstacle using data from a single microphone. By first recording and storing the free-field response of the MPS, we show that subtracting the power spectrum of the free-field response from the power spectrum recorded when a wall is present reveals a broadband interference pattern. The dominant oscillating frequency of this interference pattern is linearly related to the distance from the microphone to the wall. By performing a fast Fourier transform on the difference between the spectra, we show that we can extract this distance and actively control it in real time. We present a test rig demonstrating the algorithm experimentally.
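The spectral-subtraction idea can be sketched as follows. All specifics here (sampling rate, FFT size, modeling the wall as a single delayed reflection so the lag maps to distance through the speed of sound) are illustrative assumptions, not the dissertation's calibrated setup:

```python
import numpy as np

C = 343.0  # nominal speed of sound in air, m/s (assumed)

def estimate_wall_distance(free_field, with_wall, fs, nfft=4096):
    """Estimate the microphone-to-wall distance from the broadband
    interference pattern revealed by subtracting the free-field power
    spectrum from the near-wall power spectrum. A reflection at
    distance d imposes an oscillation of period c / (2 d) on the
    spectrum, so an FFT of the spectrum difference peaks at lag 2 d / c."""
    P_free = np.abs(np.fft.rfft(free_field, nfft)) ** 2
    P_wall = np.abs(np.fft.rfft(with_wall, nfft)) ** 2
    diff = P_wall - P_free                      # interference pattern

    df = fs / nfft                              # spectral bin spacing, Hz
    cep = np.abs(np.fft.rfft(diff - diff.mean()))
    lags = np.fft.rfftfreq(len(diff), d=df)     # lags in seconds
    k = np.argmax(cep[1:]) + 1                  # skip the DC bin
    return C * lags[k] / 2.0                    # distance in meters
```

In practice the free-field spectrum would be averaged over many frames to suppress the variance of the noise periodogram before the subtraction.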

Finally, we offer an improvement to the aerial acoustic sensing system by adding a second microphone. We develop a novel cross-correlation processing algorithm that extracts the distance from the microphones to the wall without relying on the free-field response of the MPS. We demonstrate the algorithm experimentally by controlling the altitude of a blimp-like vehicle using only the self-generated noise and two microphones placed on the bottom of the vehicle.
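One way such a cross-correlation scheme can work, sketched under simplifying assumptions (a single wall reflection, with the reflection geometry folded away), is to locate the strongest off-zero peak of the cross-correlation between the two microphone channels; this sketch is a hypothetical simplification, not the dissertation's processing chain:

```python
import numpy as np

def echo_delay(mic1, mic2, fs, min_lag=16):
    """Two-microphone sketch: a wall reflection produces a secondary
    cross-correlation peak offset from the direct-path peak by the
    extra travel time of the reflected path. `min_lag` masks out the
    direct-path peak clustered around zero lag."""
    xc = np.correlate(mic1, mic2, mode="full")
    lags = np.arange(-len(mic2) + 1, len(mic1))
    xc[np.abs(lags) < min_lag] = 0.0            # suppress direct path
    k = np.argmax(np.abs(xc))
    return abs(lags[k]) / fs                    # extra delay in seconds
```

Converting this delay to an altitude requires the reflection geometry, which the sketch leaves out.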

Item Open Access Scalable Control Synthesis for Multi-Robot Systems under Temporal Logic Specifications (2020) Luo, Xusheng. The study of high-level complex robotic tasks captured by temporal logics, e.g., Linear Temporal Logic (LTL), which extend traditional point-to-point navigation with temporal goals, has gained significant research interest over the last decade. This dissertation proposes and evaluates scalable path planning and control synthesis methods for robotic systems under temporal logic specifications. Scalability is measured by the number of robots, the size of the environment, and the complexity of the temporal logic specifications.

First, we consider optimal control synthesis for tasks specified by temporal logic specifications. Given the same discrete workspace, rather than solving each new formula from scratch, we propose a method that exploits experience from previously solved, similar LTL tasks. The key idea is to appropriately decompose complex LTL tasks into simpler subtasks and to define sets of skills, or plans, needed to solve these subtasks. These skills can be stored in a library of reusable skills and used to quickly synthesize plans for new, previously unseen tasks, while expanding the library with new skills. We present numerical experiments showing that our approach generally outperforms methods that plan from scratch in terms of the time needed to generate feasible plans. We also show that our proposed algorithm is probabilistically complete and asymptotically optimal.
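The reuse mechanism can be pictured as a cache keyed by a canonical form of each subtask. The interface below is a hypothetical simplification (real subtasks would be LTL sub-formulas with their automaton context, not plain proposition tuples):

```python
class SkillLibrary:
    """Toy sketch of a reusable-skill library for decomposed LTL subtasks."""

    def __init__(self, plan_from_scratch):
        self._skills = {}                 # canonical subtask -> stored plan
        self._plan = plan_from_scratch    # fallback planner (assumed given)

    def solve(self, subtask):
        key = tuple(sorted(subtask))      # hypothetical canonicalization
        if key not in self._skills:       # unseen subtask: plan and store it
            self._skills[key] = self._plan(subtask)
        return self._skills[key]          # seen before: reuse the stored plan
```

Each call either reuses a stored skill or plans once and grows the library, matching the expand-as-you-go behavior described above.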

Next, we consider the problem of optimally allocating tasks, expressed as a global LTL specification, to teams of heterogeneous mobile robots. The robots are classified into different types that capture their different capabilities, and each task may require robots of multiple types. The specific robots assigned to each task are immaterial, as long as they are of the desired type. Given a discrete workspace, our goal is to design paths, i.e., sequences of discrete states, for the robots so that the LTL specification is satisfied. To obtain a scalable solution to this complex assignment problem, we propose a hierarchical approach that first allocates specific robots to tasks using the information about tasks provided by the Nondeterministic Büchi Automaton (NBA) that captures the LTL specification, and then designs low-level executable plans for the robots that respect the high-level assignment. We provide theoretical results showing completeness and soundness of our method, and present numerical simulations demonstrating that it can generate robot paths with lower cost, considerably faster than existing methods.

The majority of existing LTL planning methods rely on the construction of a discrete product automaton that combines a discrete abstraction of robot mobility with the NBA corresponding to the LTL specification. However, constructing expressive discrete abstractions makes the synthesis problem computationally intractable. Finally, we propose a new sampling-based LTL planning algorithm that does not require any discrete abstraction of robot mobility. Instead, it incrementally builds trees that explore the product state-space until a maximum number of iterations is reached or a feasible plan is found. To accelerate the construction of feasible plans, we introduce bias into the sampling process, guided by transitions in the Büchi automaton that belong to the shortest path to the accepting states. We show that our planning algorithm, with and without bias, is probabilistically complete and asymptotically optimal. We conclude with numerical experiments showing that our method outperforms relevant temporal logic planning methods.
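The sampling bias can be grounded in a shortest-path computation over the Büchi automaton. A minimal sketch, assuming the automaton is given as a successor map with transition labels dropped, computes each state's hop distance to the accepting set via a backward BFS; samples whose Büchi component decreases this distance can then be favored:

```python
from collections import deque

def distance_to_accepting(successors, accepting):
    """Hop distance from each Büchi state to the accepting set, computed
    by breadth-first search on the reversed transition graph. States
    missing from the returned map cannot reach acceptance at all and
    should never be favored by the biased sampler."""
    reverse = {}
    for s, succs in successors.items():
        for t in succs:
            reverse.setdefault(t, set()).add(s)
    dist = {s: 0 for s in accepting}
    frontier = deque(accepting)
    while frontier:
        t = frontier.popleft()
        for s in reverse.get(t, ()):
            if s not in dist:
                dist[s] = dist[t] + 1
                frontier.append(s)
    return dist
```

A biased sampler would then weight candidate extensions by how much they shrink this distance, while keeping some probability mass on all transitions so completeness is preserved.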

Item Open Access Sensor Planning for Bayesian Nonparametric Target Modeling (2016) Wei, Hongchuan. Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied to target kinematics modeling in applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As these successful applications show, Bayesian nonparametric models can adapt their complexity to the data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori, or that the target kinematics can be measured by the sensor at any time throughout the task. Little work has been done on controlling a sensor with a bounded field of view to obtain the measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present a systematic sensor planning approach to learning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced first; it is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions, and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixtures of mobile targets, such as pedestrian motion patterns.

Novel information theoretic functions are developed for these Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback-Leibler (KL) divergence is developed as the expectation, with respect to the future measurements, of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models. This approach is then extended to develop a new information value function for estimating target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed showing that the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed; they are proved to be unbiased, with the variance of the resulting approximation error decreasing linearly as the number of samples increases. The computational complexity of optimizing the novel information theoretic functions under sensor dynamics constraints is studied and proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
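As a generic illustration of the estimator properties claimed here (unbiasedness and variance shrinking like 1/n), rather than the GP-specific information functions themselves, consider a sample-mean Monte Carlo estimate of the KL divergence between two univariate Gaussians, for which a closed form exists to check against:

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    """Closed-form KL( N(mu0, s0^2) || N(mu1, s1^2) ), used as ground truth."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1) ** 2) / (2 * s1**2) - 0.5

def kl_monte_carlo(mu0, s0, mu1, s1, n, rng):
    """Unbiased sample-mean estimator E_{x~p}[log p(x) - log q(x)]; the
    variance of the estimate decreases linearly (as 1/n) in the number
    of samples n, mirroring the guarantee stated above."""
    x = rng.normal(mu0, s0, n)
    log_p = -0.5 * ((x - mu0) / s0) ** 2 - np.log(s0)
    log_q = -0.5 * ((x - mu1) / s1) ** 2 - np.log(s1)
    return float(np.mean(log_p - log_q))
```

The boundedness theorem plays the role that the closed form plays here: it provides the control needed to prove the sample mean concentrates.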

Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with little or no prior knowledge.
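For the discrete-control case, the greedy scheme can be sketched generically. Here `info_value` stands in for one of the information theoretic functions above and is an assumed input, as is the discrete control set:

```python
def greedy_plan(controls, info_value, horizon):
    """Greedy sensor planning over a discrete control space: at each
    step, commit to the single control that maximizes the information
    value of the next measurement given the controls chosen so far."""
    plan = []
    for _ in range(horizon):
        best = max(controls, key=lambda u: info_value(plan, u))
        plan.append(best)
    return plan
```

Greedy planning trades the NP-hard joint optimization for a sequence of one-step maximizations; the cumulative lower bound and the lexicographic variant described above refine this idea for constrained sensor dynamics.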

Item Open Access Transfer learning in continuous RL under unobservable contextual information (2020) Liu, Chenyu. In this paper, we consider a transfer Reinforcement Learning (RL) problem in continuous state and action spaces, under unobserved contextual information. The context here can represent a specific unique mental view of the world that an expert agent has formed through past interactions with this world. We assume that this context is not accessible to a learner agent who can only observe the expert data and does not know how they were generated. Then, our goal is to use the context-aware continuous expert data to learn an optimal context-unaware policy for the learner using only a few new data samples. To date, such problems are typically solved using imitation learning, which assumes that both the expert and learner agents have access to the same information. However, if the learner does not know the expert context, using the expert data alone will result in a biased learner policy and will require many new data samples to improve. To address this challenge, we formulate the learning problem that the learner agent solves as a causal bound-constrained Multi-Armed-Bandit (MAB) problem. The arms of this MAB correspond to a set of basis policy functions that can be initialized in an unsupervised way using the expert data and that represent the different expert behaviors affected by the unobserved context. The MAB constraints, in turn, correspond to causal bounds on the accumulated rewards of these basis policy functions, which we also compute from the expert data. The solution to this MAB allows the learner agent to select the best basis policy and improve it online, and the use of causal bounds reduces the exploration variance and therefore improves the learning rate.
We provide numerical experiments on an autonomous driving example showing that our proposed transfer RL method improves the learner's policy faster than imitation learning methods and enjoys much lower variance during training.