A Q-Learning Approach to Minefield Characterization from Unmanned Aerial Vehicles
The treasure hunt problem is to determine how a computational agent can maximize its ability to detect and/or classify multiple targets located in a region of interest (ROI) populated with multiple obstacles. One particular instance of this problem involves optimizing the performance of a sensor mounted on an unmanned aerial vehicle (UAV) flying over a littoral region in order to detect mines buried underground.
Buried objects (including non-metallic ones) affect the thermal conductivity and heat retention of the soil in which they reside. As a result, objects that are not very deep below the surface often create measurable thermal anomalies on the surface soil, and infrared (IR) sensors therefore have the potential to find mines and minelike objects (referred to in this thesis as clutters).
As the sensor flies over the ROI, it receives data as pixelated infrared light signatures. From these signatures, ground temperature measurements are recorded and used to generate a two-dimensional thermal profile of the field of view (FOV), which is then mapped onto the geography of the ROI.
The input stream of thermal data is then passed to an image processor that estimates the size and shape of each detected target. A Bayesian Network (BN) trained from a database of known mines and clutters then provides the posterior probability that the evidence obtained by the IR sensor for each detected target was the result of a mine or a clutter. The output is a confidence level (CL), and each target is classified as a mine or a clutter according to the most likely explanation (MLE) for the sensor evidence. Although the sensor may produce incomplete or noisy data, inference with the BN mitigates this problem.
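The classification step can be illustrated with a minimal sketch of Bayes' rule for the two-class case. This is not the BN described in the thesis: the function names, the single lumped likelihood per class, and the choice to report the larger of the two posteriors as the confidence level are all assumptions made for illustration.

```python
def posterior_mine(prior_mine, lik_mine, lik_clutter):
    """Return P(mine | evidence) for a two-class problem via Bayes' rule.

    prior_mine  -- assumed prior probability that a target is a mine
    lik_mine    -- assumed P(evidence | mine)
    lik_clutter -- assumed P(evidence | clutter)
    """
    joint_mine = prior_mine * lik_mine
    joint_clutter = (1.0 - prior_mine) * lik_clutter
    total = joint_mine + joint_clutter
    if total == 0.0:
        raise ValueError("evidence has zero probability under both classes")
    return joint_mine / total


def classify(prior_mine, lik_mine, lik_clutter):
    """Pick the more probable class and report its posterior.

    Treating the winning posterior as the confidence level is an
    assumption of this sketch, not a definition taken from the thesis.
    """
    p_mine = posterior_mine(prior_mine, lik_mine, lik_clutter)
    if p_mine >= 0.5:
        return "mine", p_mine
    return "clutter", 1.0 - p_mine


if __name__ == "__main__":
    # Hypothetical numbers: equal priors, evidence four times likelier for a mine.
    print(classify(0.5, 0.8, 0.2))
```

A full BN would factor the likelihood over several evidence variables (for example size and shape estimates), but the final normalization step has the same form.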
Since sensor performance depends on altitude and environmental conditions, the value of the IR information can be further improved by choosing the flight path intelligently. This thesis assumes that the UAV is flying through an environmentally homogeneous ROI and addresses the question of how the optimal altitude can be determined for any given multi-dimensional environmental state.
In general, high altitudes result in poor resolution, whereas low altitudes result in very limited FOVs. The problem of weighing these tradeoffs can be addressed by creating a scoring function that is directly dependent on a comparison between sensor outputs and ground truth. The scoring function provides a flexible framework through which multiple mission objectives can be addressed by assigning different weights to correct detections, correct non-detections, false detections, and false non-detections.
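A weighted scoring function of this kind might look like the following sketch. The weight values, the `Weights` class, and the boolean encoding (True means "mine") are hypothetical; the thesis does not specify them here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights for the four possible outcomes."""
    correct_detection: float = 1.0       # mine reported as mine
    correct_non_detection: float = 0.5   # clutter reported as clutter
    false_detection: float = -0.5        # clutter reported as mine
    false_non_detection: float = -2.0    # mine reported as clutter


def score(predicted, truth, weights=Weights()):
    """Sum the outcome weights over all targets.

    predicted, truth -- equal-length sequences of booleans, True = mine.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    total = 0.0
    for p, t in zip(predicted, truth):
        if p and t:
            total += weights.correct_detection
        elif not p and not t:
            total += weights.correct_non_detection
        elif p and not t:
            total += weights.false_detection
        else:
            total += weights.false_non_detection
    return total


if __name__ == "__main__":
    # One of each outcome: 1.0 + 0.5 - 0.5 - 2.0
    print(score([True, False, True, False], [True, False, False, True]))
```

Changing the weights changes the mission emphasis: a large penalty on false non-detections, as in the defaults assumed above, favors policies that rarely miss a mine even at the cost of more false alarms.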
The scoring function provides a metric of sensor performance that can be used as feedback to optimize the sensor altitude as a function of the environmental conditions. In turn, the scoring function can be empirically evaluated over a number of different altitudes and then converted to empirical Q scores that also weigh future rewards against immediate ones. These values can be used to train a neural network (NN). The NN filters the data and interpolates between discrete Q-values to provide information about the optimal sensor altitude.
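How a Q score weighs future rewards against immediate ones can be sketched with a discounted return and the standard tabular Q-learning update. The discount factor, learning rate, state labels, and altitude values below are illustrative assumptions, and the thesis may compute its empirical Q scores differently.

```python
from collections import defaultdict


def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a reward sequence."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update to the table q in place.

    q       -- mapping from (state, action) to a Q estimate
    actions -- actions available in next_state (here: candidate altitudes)
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    # Hypothetical scores from three successive passes over the ROI.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))

    q = defaultdict(float)
    altitudes = [100, 300]  # hypothetical altitude choices
    print(q_update(q, "env_a", 300, 1.0, "env_b", altitudes, alpha=0.5))
```

With a discount factor near zero the Q score reduces to the immediate score; values near one give later scores almost equal weight.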
The research described in this thesis can be used to determine the optimal control policy for an aircraft in two different situations. The global maximum of the Q-function can be used to determine the altitude at which a UAV should cruise over an ROI for which the environmental conditions are known a priori. Alternatively, the local maxima of the Q-function can be used to determine the altitude to which a UAV should move if the environmental variables change during flight.
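The two uses of the Q-function can be sketched on a discrete altitude grid. The grid, the Q-values, and the interpretation of "moving to a local maximum" as hill climbing from the current altitude are assumptions made for illustration only.

```python
def global_max_altitude(altitudes, q_values):
    """Return the altitude with the largest Q-value (cruise altitude)."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return altitudes[best]


def local_max_altitude(altitudes, q_values, current_index):
    """Hill-climb from the current altitude to a neighboring local maximum.

    At each step, move to the adjacent grid point with the higher Q-value;
    stop when neither neighbor improves on the current point.
    """
    i = current_index
    while True:
        best = i
        for j in (i - 1, i + 1):
            if 0 <= j < len(q_values) and q_values[j] > q_values[best]:
                best = j
        if best == i:
            return altitudes[i]
        i = best


if __name__ == "__main__":
    # Hypothetical altitude grid (arbitrary units) and Q-values.
    altitudes = [100, 200, 300, 400, 500]
    q_values = [0.2, 0.6, 0.3, 0.9, 0.4]
    print(global_max_altitude(altitudes, q_values))
    print(local_max_altitude(altitudes, q_values, current_index=0))
```

In this example the global search selects a different altitude than the local search started from the lowest grid point, which illustrates why the two situations can lead to different policies.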
This thesis includes the results of computer simulations of a sensor flying over an ROI. The ROI is populated with targets whose characteristics are based on actual mines and minelike objects. The IR sensor itself is modeled by using a BN to create a stochastic simulation of the sensor performance. The results demonstrate how Q-learning can be applied to signals from a UAV-mounted IR sensor whose data stream is preprocessed by a BN classifier in order to determine an optimal flight policy for a given set of environmental conditions.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations