Browsing by Author "Ferrari, Silvia"
Results Per Page
Sort Options
Item Open Access A Cell Decomposition Approach to Robotic Trajectory Planning via Disjunctive Programming(2012) Swingler, AshleighThis thesis develops a novel solution method for the problem of collision-free, optimal control of a robotic vehicle in an obstacle populated environment. The technique presented combines the well established approximate cell decomposition methodology with disjunctive programming in order to address both geometric and kinematic trajectory concerns. In this work, an algorithm for determining the shortest distance, collision-free path of a robot with unicycle kinematics is developed. In addition, the research defines a technique to discretize nonholonomic vehicle kinematics into a set of mixed integer linear constraints. Results obtained using the Tomlab/CPLEX mixed integer quadratic programming software exhibit that the method developed provides a powerful initial step in reconciling geometric path planning methods with optimal control techniques.
Item Open Access A Distributed Optimal Control Approach for Multi-agent Trajectory Optimization(2013) Foderaro, GregThis dissertation presents a novel distributed optimal control (DOC) problem formulation that is applicable to multiscale dynamical systems comprised of numerous interacting systems, or agents, that together give rise to coherent macroscopic behaviors, or coarse dynamics, that can be modeled by partial differential equations (PDEs) on larger spatial and time scales. The DOC methodology seeks to obtain optimal agent state and control trajectories by representing the system's performance as an integral cost function of the macroscopic state, which is optimized subject to the agents' dynamics. The macroscopic state is identified as a time-varying probability density function to which the states of the individual agents can be mapped via a restriction operator. Optimality conditions for the DOC problem are derived analytically, and the optimal trajectories of the macroscopic state and control are computed using direct and indirect optimization algorithms. Feedback microscopic control laws are then derived from the optimal macroscopic description using a potential function approach.
The DOC approach is demonstrated numerically through benchmark multi-agent trajectory optimization problems, where large systems of agents were given the objectives of traveling to goal state distributions, avoiding obstacles, maintaining formations, and minimizing energy consumption through control. Comparisons are provided between the direct and indirect optimization techniques, as well as existing methods from the literature, and a computational complexity analysis is presented. The methodology is also applied to a track coverage optimization problem for the control of distributed networks of mobile omnidirectional sensors, where the sensors move to maximize the probability of track detection of a known distribution of mobile targets traversing a region of interest (ROI). Through extensive simulations, DOC is shown to outperform several existing sensor deployment and control strategies. Furthermore, the computation required by the DOC algorithm is proven to be far reduced compared to that of classical, direct optimal control algorithms.
Item Open Access A Q-Learning Approach to Minefield Characterization from Unmanned Aerial Vehicles(2012) Daugherty, Stephen GreysonThe treasure hunt problem to determine how a computational agent can maximize its ability to detect and/or classify multiple targets located in a region of interest (ROI) populated with multiple obstacles. One particular instance of this problem involves optimizing the performance of a sensor mounted on an unmanned aerial vehicle (UAV) flying over a littoral region in order to detect mines buried underground.
Buried objects (including non-metallic ones) have an effect on the thermal conductivity and heat retention of the soil in which they reside. Because of this, objects that are not very deep below the surface often create measurable thermal anomalies on the surface soil. Because of this, infrared (IR) sensors have the potential to find mines and minelike objects (referred to in this thesis as clutters).
As the sensor flies over the ROI, sensor data is obtained. The sensor receives the data as pixellated infrared light signatures. Using this, ground temperature measurements are recorded and used to generate a two-dimensional thermal profile of the field of view (FOV) and map that profile onto the geography of the ROI.
The input stream of thermal data is then passed to an image processor that estimates the size and shape of the detected target. Then a Bayesian Network (BN) trained from a database of known mines and clutters is used to provide the posterior probability that the evidence obtained by the IR sensor for each detected target was the result of a mine or a clutter. The output is a confidence level (CL), and each target is classified as a mine or a clutter according to the most likely explanation (MLE) for the sensor evidence. Though the sensor may produce incomplete, noisy data, inferences from the BN attenuate the problem.
Since sensor performance depends on altitude and environmental conditions, the value of the IR information can be further improved by choosing the flight path intelligently. This thesis assumes that the UAV is flying through an environmentally homogeneous ROI and addresses the question of how the optimal altitude can be determined for any given multi-dimensional environmental state.
In general, high altitudes result in poor resolution, whereas low altitudes result in very limited FOVs. The problem of weighing these tradeoffs can be addressed by creating a scoring function that is directly dependent on a comparison between sensor outputs and ground truth. The scoring function provides a flexible framework through which multiple mission objectives can be addressed by assigning different weights to correct detections, correct non-detections, false detections, and false non-detections.
The scoring function provides a metric of sensor performance that can be used as feedback to optimize the sensor altitude as a function of the environmental conditions. In turn, the scoring function can be empirically evaluated over a number of different altitudes and then converted to empirical Q scores that also weigh future rewards against immediate ones. These values can be used to train a neural network (NN). The NN filters the data and interpolates between discrete Q-values to provide information about the optimal sensor altitude.
The research described in this thesis can be used to determine the optimal control policy for an aircraft in two different situations. The global maximum of the Q-function can be used to determine the altitude at which a UAV should cruise over an ROI for which the environmental conditions are known a priori. Alternatively, the local maxima of the Q-function can be used to determine the altitude to which a UAV should move if the environmental variables change during flight.
This thesis includes the results of computer simulations of a sensor flying over an ROI. The ROI is populated with targets whose characteristics are based on actual mines and minelike objects. The IR sensor itself is modeled by using a BN to create a stochastic simulation of the sensor performance. The results demonstrate how Q-learning can be applied to signals from a UAV-mounted IR sensor whose data stream is preprocessed by a BN classifier in order to determine an optimal flight policy for a given set of environmental conditions.
Item Open Access An Information-driven Approach for Sensor Path Planning(2011) Lu, WenjieThis thesis addresses the problem of information-driven sensor path planning for the purpose of target detection, measurement, and classification using non-holonomic mobile sensor agents (MSAs). Each MSA is equipped with two types of sensors. One is the measuring sensor with small FOV, while the other is the detecting sensor with large FOV. The measuring sensor could be ground penetrating radar (GPR), while the detecting sensor can be infrared radar (IR). The classification of a target can be reduced to the problem of estimating one or more random variables associated with this target from partial or imperfect measurements from sensorscite{stengel}, and can be represented by a probability mass function (PMF). Previous work shows the performance of MSAs can be greatly improved by planning their motion and control laws based on their sensing objectives. Because of the stochastic nature of sensing objective, the expected measurement benefit of a target, i.e, the information value, is defined as the expected entropy reduction of its classification PMF before the next measurement is taken of this target. The information value of targets is combined with other robot motion planning methods to address the sensor planning problem.
By definition, the entropy reduction can be represented by conditional mutual information of PMF given a measurement. MSAs are deployed in an obstacle-populated environment, and must avoid collisions with obstacles, as well as, in some cases, targets.
This thesis first presents a modified rapidly-exploring random trees (RRTs) approach with a novel milestone sampling method. The sampling function for RRTs takes into account the information value of targets, and sensor measurements of obstacle locations, as well as MSAs' configurations (e.g., position and orientation) and velocities to generate new milestones for expanding the trees online. By considering the information value, the sample function favors expansions toward targets with higher expected measurement benefit. After sampling, the MSAs navigate to the selected milestones based on the critic introduced later and take measurements of targets within their FOVs. Then, the thesis introduces an information potential method (IPM) approach which combined information values of targets with the potential functions. Targets with high information value have larger influence distance and tend to have high probability to be measured by the MSAs. Additionally, this information potential field is utilized to generate the milestones in a local probabilistic roadmap method to help MSAs escape their local minima.
The proposed two methods are applied to a landmine classification problem. It is assumed that geometries and locations of partial obstacles and targets are available as prior information, as well as previous measurements on targets concerning their classification. The experiments show that paths of MSAs using the modified RRTs and IPM take advantages of the information value by favoring targets with high information value. Furthermore, the results show that the IPM
outperforms other approaches such as the modified RRTs with information value and classical potential field method that does not take target information value into account.
Item Open Access An Integrated Online Path Planning and Control Approach for Robotic Sensor Networks(2010) Zhang, GuoxianThis dissertation addresses an information-driven sensor path planning problem which has various applications such as robot cleaning, environment monitoring, and manufacturing. Information-driven sensor path planning is concerned with planning the measurements of a sensor or a sensor network in order to support sensing objectives, such as target detection, classification and localization, based on prior information. When the sensor's field-of-view or visibility region is bounded, the sensor's position and orientation determine what targets can be measured at any given time. Therefore, the sensor path must be planned in concert with the measurement sequence. When sensors are installed on robotic platforms and are deployed in an obstacle-populated environment, the sensor path must also avoid collisions between the platform and the obstacles or other robotic sensors. Addressing this sensor path planning problem, this dissertation first presents a general and systematic approach for deriving information value functions that represent the expected utility of sensor decisions in a canonical sensor planning problem. The resulting information functions and search strategies are compared through extensive numerical simulations involving direct-search, alert-confirm, task-driven, and log-likelihood-ratio search strategies, and the maximum a-posteriori, maximum-likelihood,
and Neyman-Pearson decision rules. After that a novel off-line information roadmap method is developed to navigate single robotic sensor in which obstacles, targets, sensor's platform and field of view are represented as closed and bounded subsets of an Euclidean workspace. The information roadmap is sampled from a normalized information theoretic metric that favors samples with a high value of information in configuration space. Finally, when multiple robotic sensors are deployed in the workspace, and information of the workspace such as geometry, location, and prior measurements on targets and obstacles can become available online, another novel sensor path planning method, named information potential method, is proposed to take into account the new information obtained over time. Targets with high information value tend to have high probability to be measured by the robotic sensor network. A hybrid control system is utilized to coordinate and control each robotic sensor in the network to detect and measure obstacles and targets in the workspace. The potential function is also utilized to generate the milestones in a local probabilistic roadmap method to help robotic sensors escape their local minima.
The proposed methods are applied to a landmine classification problem to plan the path of a robotic sensor network in which each robot is equipped with a ground-penetrating radar. Other sensors, such as infrared sensors on unmanned aerial vehicles (UAVs), are utilized a priori for target detection and cursory classification. In the off-line sensor path planning applications for a single robotic sensor, experiments show that paths obtained from the information roadmap exhibit a classification efficiency significantly higher than that of existing robot motion strategies. Also, the information roadmap can be used to deploy non-overpass capable robots that must avoid targets as well as obstacles. Then in the multiple online robotic sensor network path planing applications, experiments show that path obtained from the information potential method takes advantages of the online information and coordination among robotic sensors, and the results show that the information potential method outperforms other strategies such as rapidly-exploring random trees and classical potential field methods that does not take target information value into account.
Item Open Access An Integrated Online Path Planning and Control Approach for Robotic Sensor Networks(2010) Zhang, GuoxianThis dissertation addresses an information-driven sensor path planning problem which has various applications such as robot cleaning, environment monitoring, and manufacturing. Information-driven sensor path planning is concerned with planning the measurements of a sensor or a sensor network in order to support sensing objectives, such as target detection, classification and localization, based on prior information. When the sensor's field-of-view or visibility region is bounded, the sensor's position and orientation determine what targets can be measured at any given time. Therefore, the sensor path must be planned in concert with the measurement sequence. When sensors are installed on robotic platforms and are deployed in an obstacle-populated environment, the sensor path must also avoid collisions between the platform and the obstacles or other robotic sensors. Addressing this sensor path planning problem, this dissertation first presents a general and systematic approach for deriving information value functions that represent the expected utility of sensor decisions in a canonical sensor planning problem. The resulting information functions and search strategies are compared through extensive numerical simulations involving direct-search, alert-confirm, task-driven, and log-likelihood-ratio search strategies, and the maximum a-posteriori, maximum-likelihood,
and Neyman-Pearson decision rules. After that a novel off-line information roadmap method is developed to navigate single robotic sensor in which obstacles, targets, sensor's platform and field of view are represented as closed and bounded subsets of an Euclidean workspace. The information roadmap is sampled from a normalized information theoretic metric that favors samples with a high value of information in configuration space. Finally, when multiple robotic sensors are deployed in the workspace, and information of the workspace such as geometry, location, and prior measurements on targets and obstacles can become available online, another novel sensor path planning method, named information potential method, is proposed to take into account the new information obtained over time. Targets with high information value tend to have high probability to be measured by the robotic sensor network. A hybrid control system is utilized to coordinate and control each robotic sensor in the network to detect and measure obstacles and targets in the workspace. The potential function is also utilized to generate the milestones in a local probabilistic roadmap method to help robotic sensors escape their local minima.
The proposed methods are applied to a landmine classification problem to plan the path of a robotic sensor network in which each robot is equipped with a ground-penetrating radar. Other sensors, such as infrared sensors on unmanned aerial vehicles (UAVs), are utilized a priori for target detection and cursory classification. In the off-line sensor path planning applications for a single robotic sensor, experiments show that paths obtained from the information roadmap exhibit a classification efficiency significantly higher than that of existing robot motion strategies. Also, the information roadmap can be used to deploy non-overpass capable robots that must avoid targets as well as obstacles. Then in the multiple online robotic sensor network path planing applications, experiments show that path obtained from the information potential method takes advantages of the online information and coordination among robotic sensors, and the results show that the information potential method outperforms other strategies such as rapidly-exploring random trees and classical potential field methods that does not take target information value into account.
Item Open Access Autonomous Sensor Path Planning and Control for Active Information Gathering(2014) Lu, WenjieSensor path planning and control refer to the problems of determining the trajectory and feedback control law that best support sensing objectives, such as monitoring, detection, classification, and tracking. Many autonomous systems developed, for example, to conduct environmental monitoring, search-and-rescue operations, demining, or surveillance, consist of a mobile vehicle instrumented with a suite of proprioceptive and exteroceptive sensors characterized by a bounded field-of-view (FOV) and a performance that is highly dependent on target and environmental conditions and, thus, on the vehicle position and orientation relative to the target and the environment. As a result, the sensor performance can be significantly improved by planning the vehicle motion and attitude in concert with the measurement sequence. This dissertation develops a general and systematic approach for deriving information-driven path planning and control methods that maximize the expected utility of the sensor measurements subject to the vehicle kinodynamic constraints.
The approach is used to develop three path planning and control methods: the information potential method (IP) for integrated path planning and control, the optimized coverage planning based on the Dirichlet process-Gaussian process (DP-GP) expected Kullback-Leibler (KL) divergence, and the optimized visibility planning for simultaneous target tracking and localization. The IP method is demonstrated on a benchmark problem, referred to as treasure hunt, in which an active vision sensor is mounted on a mobile unicycle platform and is deployed to classify stationary targets characterized by discrete random variables, in an obstacle-populated environment. In the IP method, an artificial potential function is generated from the expected conditional mutual information of the targets and is used to design a closed-loop switched controller. The information potential is also used to construct an information roadmap for escaping local minima. Theoretical analysis shows that the closed-loop robotic system is asymptotically stable and that an escaping path can be found when the robotic sensor is trapped in a local minimum. Numerical simulation results show that this method outperforms rapidly-exploring random trees and classical potential methods. The optimized coverage planning method maximizes the DP-GP expected KL divergence approximated by Monte Carlo integration in order to optimize the information value of a vision sensor deployed to track and model multiple moving targets. The variance of the KL approximation error is proven to decrease linearly with the inverse of the number of samples. This approach is demonstrated through a camera-intruder problem, in which the camera pan, tilt, and zoom variables are controlled to model multiple moving targets with unknown kinematics by nonparametric DP-GP mixture models. Numerical simulations as well as physical experiments show that the optimized coverage planning approach outperforms other applicable algorithms, such as methods based on mutual information, rule-based systems, and randomized planning. The third approach developed in this dissertation, referred to as optimized visibility motion planning, uses the output of an extended Kalman filter (EKF) algorithm to optimize the simultaneous tracking and localization performance of a robot equipped with proprioceptive and exteroceptive sensors, that is deployed to track a moving target in a global positioning system (GPS) denied environment.
Because active sensors with multiple modes can be modeled as a switched hierarchical system, the sensor path planning problem can be viewed as a hybrid optimal control problem involving both discrete and continuous state and control variables. For example, several authors have shown that a sensor with multiple modalities is a switched hybrid system that can be modeled by a hierarchical control architecture with components of mission planning, trajectory planning, and robot control. Then, the sensor performance can be represented by two Lagrangian functions, one function of the discrete state and control variables, and one function of the continuous state and control variables. Because information value functions are typically nonlinear, this dissertation also presents an adaptive dynamic programming approach for the model-free control of nonlinear switched systems (hybrid ADP), which is capable of learning the optimal continuous and discrete controllers online. The hybrid ADP approach is based on new recursive relationships derived in this dissertation and is proven to converge to the solution of the hybrid optimal control problem. Simulation results show that the hybrid ADP approach is capable of converging to the optimal controllers by minimizing the cost-to-go online based on a fully observable state vector.
Item Open Access Control and Optimization of Track Coverage in Underwater Sensor Networks(2007-12-14) Baumgartner, Kelli A. CrewsSensor network coverage refers to the quality of service provided by a sensor network surveilling a region of interest. So far, coverage problems have been formulated to address area coverage or to maintain line-of-sight visibility in the presence of obstacles (i.e., art-gallery problems). Although very useful in many sensor applications, none of the existing formulations address coverage as it pertains to target tracking by means of multiple sensors, nor do they provide a closed-form function that can be applied to the problem of allocating sensors for the surveilling objective of maximizing target detection while minimizing false alarms. This dissertation presents a new coverage formulation addressing the quality of service of sensor networks that cooperatively detect targets traversing a region of interest, and is readily applicable to the current sensor network coverage formulations. The problem of track coverage consists of finding the positions of n sensors such that the amount of tracks detected by at least k sensors is optimized. This dissertation studies the geometric properties of the network, addressing a deterministic track-coverage formulation and binary sensor models. It is shown that the tracks detected by a network of heterogeneous omnidirectional sensors are the geometric transversals of non-translates families of disks. A novel methodology based on cones and convex analysis is presented for representing and measuring sets of transversals as closed-form functions of the sensors positions and ranges. As a result, the problem of optimally deploying a sensor network with the aforementioned objectives can be formulated as an optimization problem subject to mission dynamics and constraints. The sensor placement problem, in which the sensors are placed such that track coverage is maximized for a fixed sensor network, is formulated as a nonlinear program and solved using sequential quadratic programming. The sensor deployment, involving a dynamic sensor network installed on non-maneuverable sonobuoys deployed in the ocean, is formulated as an optimization problem subject to inverse dynamics. Both a finite measure of the cumulative coverage provided by a sensor network over a fixed period of time and the oceanic-induced current velocity field are accounted for in order to optimize the dynamic sensor network configuration. It is shown that a state-space representation of the motions of the individual sensors subject to the current vector field can be derived from sonobuoys oceanic drift models and obtained from CODAR measurements. Also considered in the sensor model are the position-dependent acoustic ranges of the sensors due to the effects from heterogenous environmental conditions, such as ocean bathymetry, surface temporal variability, and bottom properties. A solution is presented for the initial deployment scheme of the non-maneuverable sonobuoys subject to the ocean's current, such that sufficient track coverage is maintained over the entire mission. As sensor networks are subject to random disturbances due to unforseen heterogenous environmental conditions propagated throughout the sensors trajectories, the optimal initial positions solution is evaluated for robustness through Monte Carlo simulations. Finally, the problem of controlling a network of maneuverable underwater vehicles, each equipped with an onboard acoustic sensor is formulated using optimal control theory. Consequently, a new optimal control problem is presented that integrates sensor objectives, such as track coverage, with cooperative path planning of a mobile sensor network subject to time-varying environmental dynamics.Item Open Access Indirect Training Algorithms for Spiking Neural Networks Controlled Virtual Insect Navigation(2015) Zhang, XuEven though Articial Neural Networks have been shown capable of solving many problems such as pattern recognition, classication, function approximation, clinics, robotics, they suers intrinsic limitations, mainly for processing large amounts of data or for fast adaptation to a changing environment. Several characteristics, such as iterative learning algorithms or articially designed neuron model and network architecture, are strongly restrictive compared with biological processing in natural neural networks. Spiking neural networks as the newest generation of neural models can overcome the weaknesses of ANNs. Because of the biologically realistic properties, the electrophysiological recordings of neural circuits can be compared to the outputs of the corresponding spiking neural network simulated on the computer, determining the plausibility of the starting hypothesis. Comparing with ANN, it is known that any function that can be computed by a sigmoidal neural network can also be computed by a small network of spiking neurons. In addition, for processing a large amount of data, SNNs can transmit and receive a large amount of data through the timing of the spikes and remarkably decrease the interactions load between neurons. This makes possible for very ecient parallel implementations.
Many training algorithms have been proposed for SNN training mainly based on the direct update of the synaptic plasticities or weights. However, the weights can not be changed directly and, instead, can be changed by the interactions of pre- and postsynaptic neural activities in many potential applications of adaptive spiking neural networks, including neuroprosthetic devices and CMOS/memristor nanoscale neuromorphic chips. The eciency of the bio-inspired, neuromorphic processing exposes the shortcomings of digital computing. After trained, the simulated neuromorphic model can be applied to speaker recognition, looming detection and temporal pattern matching. The properties of the neuromorphic chip enable it to solve the same problem while using fewer energies comparing with other hardware. The neuromorphic chips need applicable training methods that do not require direct manipulations of the connection strength.
Nowadays, thanks to fast improvements in hardware for neural stimulation and recording technologies, neurons in vivo and vitro can be controlled to re precisely in milliseconds. These improvements enable the study on the link between synaptic level and functional-level plasticity in the brain. However, existing training methods rely on learning rules for manipulating synaptic weights and on detailed knowledge of the network connectivity and synaptic strengths. New training algorithms that do not require the knowledge of the synaptic weights or connections are needed while they cannot require direct manipulations of the synaptic strength.
This thesis presents indirect training methods to train spiking neural networks,
which can both modeling neuromorphic chips and biological neural networks in vivo, via extra stimulus without the knowledge of synaptic strengths and connections. The algorithms are based on the spike timing-dependent plasticity rule by controlling input spike trains. One of the algorithms minimizes the error between the synaptic weight and the optimal weight, by stimulating the input neuron with an adaptive pulse training determined by the gradient of the error function. Another algorithm uses numerical gradient of the output error with respect to the weight change to control the training stimulus, which are injected to the neural network for controlling a virtual insect for navigating and nding target in an unknown terrain. Finally, the newest algorithm uses indirect perturbation of the temporal dierences between the extra stimulus in order to train a large spiking neural network. The trained spiking neural network can control both a unicycle modeled virtual insect and a virtual insect
moving in a tripod gait. The results show that these indirect training algorithms can train SNNs for solving control problems. In the thesis, the trained insect can and its target while avoiding obstacles in an unknown terrain. Future studies will focus on improving the insect's movement to using more complex locomotion model. The training algorithms will also be applied to biological neural networks and CMOS memristors. The trained neural networks will also be used for controlling flying microrobots.
Item Open Access Information-driven Sensor Path Planning and the Treasure Hunt Problem(2008-04-25) Cai, ChenghuiThis dissertation presents a basic information-driven sensor management problem, referred to as treasure hunt, that is relevant to mobile-sensor applications such as mine hunting, monitoring, and surveillance. The objective is to classify/infer one or multiple fixed targets or treasures located in an obstacle-populated workspace by planning the path and a sequence of measurements of a robotic sensor installed on a mobile platform associated with the treasures distributed in the sensor workspace. The workspace is represented by a connectivity graph, where each node represents a possible sensor deployment, and the arcs represent possible sensor movements. A methodology is developed for planning the sensing strategy of a robotic sensor deployed. The sensing strategy includes the robotic sensor's path, because it determines which targets are measurable given a bounded field of view. Existing path planning techniques are not directly applicable to robots whose primary objective is to gather sensor measurements. Thus, in this dissertation, a novel approximate cell-decomposition approach is developed in which obstacles, targets, the sensor's platform and field of view are represented as closed and bounded subsets of an Euclidean workspace. The approach constructs a connectivity graph with observation cells that is pruned and transformed into a decision tree, from which an optimal sensing strategy can be computed. It is shown that an additive incremental-entropy function can be used to efficiently compute the expected information value of the measurement sequence over time. The methodology is applied to a robotic landmine classification problem and the board game of CLUE$^{\circledR}$. In the landmine detection application, the optimal strategy of a robotic ground-penetrating radar is computed based on prior remote measurements and environmental information. Extensive numerical experiments show that this methodology outperforms shortest-path, complete-coverage, random, and grid search strategies, and is applicable to non-overpass capable platforms that must avoid targets as well as obstacles. The board game of CLUE$^{\circledR}$ is shown to be an excellent benchmark example of treasure hunt problem. The test results show that a player implementing the strategies developed in this dissertation outperforms players implementing Bayesian networks only, Q-learning, or constraint satisfaction, as well as human players.Item Open Access Non-parametric approximate linear programming for MDPs(2012) Pazis, JasonOne of the most difficult tasks in value function based methods for learning in Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. This thesis presents a novel Non-Parametric approach to Approximate Linear Programming (NP-ALP), which requires nothing more than a smoothness assumption on the value function. NP-ALP can make use of real-world, noisy sampled transitions rather than requiring samples from the full Bellman equation, while providing the first known max-norm, finite sample performance guarantees for ALP under mild assumptions. Additionally NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.Item Open Access Probabilistic inferential decision-making under time pressure in rhesus macaques (Macaca mulatta)(Journal of Comparative Psychology) Toader, Andrew; Rao, Hrishikesh; Ryoo, Minyoung; Bohlen, Martin; Cruger, Jessi; Oh-Descher, Hanna; Ferrari, Silvia; Egner, Tobias; Beck, Jeffrey; Sommer, MarcDecisions often involve the consideration of multiple cues, each of which may inform selection on the basis of learned probabilities. Our ability to use probabilistic inference for decisions is bounded by uncertainty and constraints such as time pressure. Previous work showed that when humans choose between visual objects in a multiple-cue, probabilistic task, they cope with time pressure by discounting the least informative cues, an example of satisficing or “good enough” decision-making. We tested two rhesus macaques (Macaca mulatta) on a similar task to assess their capacity for probabilistic inference and satisficing in comparison with humans. On each trial, a monkey viewed two compound stimuli consisting of four cue dimensions. Each dimension (e.g., color) had two possible states (e.g., red or blue) with different probabilistic weights. Selecting the stimulus with highest total weight yielded higher odds of receiving reward. Both monkeys learned the assigned weights at high accuracy. Under time pressure, both monkeys were less accurate as a result of decreased use of cue information. One monkey adopted the same satisficing strategy used by humans, ignoring the least informative cue dimension. Both monkeys, however, exhibited a strategy not reported for humans, a “group-the-best” strategy in which the top two cues were used similarly despite their different assigned weights. The results validate macaques as an animal model of probabilistic decision-making, establishing their capacity to discriminate between objects using at least four visual dimensions simultaneously. The time pressure data suggest caution, however, in using macaques as models of human satisficing.Item Open Access Sensor Planning for Bayesian Nonparametric Target Modeling(2016) Wei, HongchuanBayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied for target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexities adaptively from data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done for controlling the sensor with bounded field of view to obtain measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to leaning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced at first, which is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixture of mobile targets, such as pedestrian motion patterns.
Novel information theoretic functions are developed for these introduced Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback Leibler divergence is developed as the expectation of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models with respect to the future measurements. Then, this approach is extended to develop a new information value function that can be used to estimate target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed that shows the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed, which are proved to be unbiased with the variance of the resultant approximation error decreasing linearly as the number of samples increases. Computational complexities for optimizing the novel information theoretic functions under sensor dynamics constraints are studied, and are proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with
little or no prior knowledge
Item Open Access Solving Partial Differential Equations Using Artificial Neural Networks(2013) Rudd, KeithThis thesis presents a method for solving partial differential equations (PDEs) using articial neural networks. The method uses a constrained backpropagation (CPROP) approach for preserving prior knowledge during incremental training for solving nonlinear elliptic and parabolic PDEs adaptively, in non-stationary environments. Compared to previous methods that use penalty functions or Lagrange multipliers,
CPROP reduces the dimensionality of the optimization problem by using direct elimination, while satisfying the equality constraints associated with the boundary and initial conditions exactly, at every iteration of the algorithm. The effectiveness of this method is demonstrated through several examples, including nonlinear elliptic
and parabolic PDEs with changing parameters and non-homogeneous terms. The computational complexity analysis shows that CPROP compares favorably to existing methods of solution, and that it leads to considerable computational savings when subject to non-stationary environments.
The CPROP based approach is extended to a constrained integration (CINT) method for solving initial boundary value partial differential equations (PDEs). The CINT method combines classical Galerkin methods with CPROP in order to constrain the ANN to approximately satisfy the boundary condition at each stage of integration. The advantage of the CINT method is that it is readily applicable to PDEs in irregular domains and requires no special modification for domains with complex geometries. Furthermore, the CINT method provides a semi-analytical solution that is infinitely differentiable. The CINT method is demonstrated on two hyperbolic and one parabolic initial boundary value problems (IBVPs). These IBVPs are widely used and have known analytical solutions. When compared with Matlab's nite element (FE) method, the CINT method is shown to achieve significant improvements both in terms of computational time and accuracy. The CINT method is applied to a distributed optimal control (DOC) problem of computing optimal state and control trajectories for a multiscale dynamical system comprised of many interacting dynamical systems, or agents. A generalized reduced gradient (GRG) approach is presented in which the agent dynamics are described by a small system of stochastic dierential equations (SDEs). A set of optimality conditions is derived using calculus of variations, and used to compute the optimal macroscopic state and microscopic control laws. An indirect GRG approach is used to solve the optimality conditions numerically for large systems of agents. By assuming a parametric control law obtained from the superposition of linear basis functions, the agent control laws can be determined via set-point regulation, such
that the macroscopic behavior of the agents is optimized over time, based on multiple, interactive navigation objectives.
Lastly, the CINT method is used to identify optimal root profiles in water limited ecosystems. Knowledge of root depths and distributions is vital in order to accurately model and predict hydrological ecosystem dynamics. Therefore, there is interest in accurately predicting distributions for various vegetation types, soils, and climates. Numerical experiments were were performed that identify root profiles that maximize transpiration over a 10 year period across a transect of the Kalahari. Storm types were varied to show the dependence of the optimal profile on storm frequency and intensity. It is shown that more deeply distributed roots are optimal for regions where
storms are more intense and less frequent, and shallower roots are advantageous in regions where storms are less intense and more frequent.