Browsing by Author "Collins, Leslie M"
Item Open Access
Adaptive Brain-Computer Interface Systems For Communication in People with Severe Neuromuscular Disabilities (2016)
Mainsah, Boyla O
Brain-computer interfaces (BCIs) have the potential to restore communication or control abilities in individuals with severe neuromuscular limitations, such as those with amyotrophic lateral sclerosis (ALS). The role of a BCI is to extract and decode relevant information that conveys a user's intent directly from brain electrophysiological signals and translate this information into executable commands to control external devices. However, the BCI decision-making process is error-prone due to noisy electrophysiological data, representing the classic problem of efficiently transmitting and receiving information via a noisy communication channel.
This research focuses on P300-based BCIs, which rely predominantly on event-related potentials (ERPs) that are elicited as a function of a user's uncertainty regarding stimulus events, in either an acoustic or a visual oddball recognition task. The P300-based BCI system enables users to communicate messages from a set of choices by selecting a target character or icon that conveys a desired intent or action. P300-based BCIs have been widely researched as a communication alternative, especially for individuals with ALS, who represent a target BCI user population. For the P300-based BCI, repeated data measurements are required to enhance the low signal-to-noise ratio of the elicited ERPs embedded in electroencephalography (EEG) data, in order to improve the accuracy of the target character estimation process. As a result, BCIs are relatively slow compared to other commercial assistive communication devices, and this limits BCI adoption by their target user population. The goal of this research is to develop algorithms that take into account the physical limitations of the target BCI population to improve the efficiency of ERP-based spellers for real-world communication.
In this work, it is hypothesised that building adaptive capabilities into the BCI framework can potentially give the BCI system the flexibility to improve performance by adjusting system parameters in response to changing user inputs. The research in this work addresses three potential areas for improvement within the P300 speller framework: information optimisation, target character estimation and error correction. The visual interface and its operation control the method by which the ERPs are elicited through the presentation of stimulus events. The parameters of the stimulus presentation paradigm can be modified to modulate and enhance the elicited ERPs. A new stimulus presentation paradigm is developed in order to maximise the information content that is presented to the user by tuning stimulus paradigm parameters to positively affect performance. Internally, the BCI system determines the amount of data to collect and the method by which these data are processed to estimate the user's target character. Algorithms that exploit language information are developed to enhance the target character estimation process and to correct erroneous BCI selections. In addition, a new model-based method to predict BCI performance is developed, an approach which is independent of stimulus presentation paradigm and accounts for dynamic data collection. The studies presented in this work provide evidence that the proposed methods for incorporating adaptive strategies in the three areas have the potential to significantly improve BCI communication rates, and the proposed method for predicting BCI performance provides a reliable means to pre-assess BCI performance without extensive online testing.
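The target character estimation with dynamic data collection described above can be sketched as a sequential Bayesian update: a language-model prior over characters is refined with each stimulus flash, and data collection stops once one character is sufficiently probable. Everything here (the simple flash likelihood model, the threshold value) is an illustrative assumption, not the dissertation's actual algorithm.

```python
import numpy as np

def spell_character(prior, flash_groups, scores, threshold=0.9):
    """Toy Bayesian target-character estimation with dynamic stopping.

    prior        : language-model probabilities over the character set
    flash_groups : boolean masks of which characters flashed
    scores       : classifier outputs in (0, 1), one per flash
    """
    log_post = np.log(prior)
    post = np.asarray(prior, dtype=float)
    for group, score in zip(flash_groups, scores):
        # toy likelihood: flashed characters take the classifier score,
        # unflashed characters take its complement
        likelihood = np.where(group, score, 1.0 - score)
        log_post += np.log(likelihood)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        if post.max() >= threshold:  # confident enough: stop collecting data
            break
    return int(post.argmax()), float(post.max())
```

Because evidence accumulates per flash, easy selections terminate early while ambiguous ones trigger more data collection, which is the core idea behind adaptive data collection in the P300 speller.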
Item Open Access
Automatic Identification of Training & Testing Data for Buried Threat Detection using Ground Penetrating Radar (2017)
Reichman, Daniel
Ground penetrating radar (GPR) is one of the most popular and successful sensing modalities that has been investigated for landmine and subsurface threat detection. The radar is attached to the front of a vehicle and collects measurements along the path of travel. At each spatial location queried, a time series of measurements is collected, and the measured data are often visualized as images within which the signals corresponding to buried threats exhibit a characteristic appearance. This appearance is typically hyperbolic and has been leveraged to develop several automated detection methods. Many of the detection methods applied to this task are supervised, and therefore require labeled examples of threat and non-threat data for training. Labeled examples are typically obtained by collecting data over deliberately buried threats at known spatial locations. However, uncertainty exists regarding the temporal locations in depth at which the buried threat signal exists in the imagery. This uncertainty is an impediment to obtaining labeled examples of buried threats to provide to the supervised learning model. The focus of this dissertation is on overcoming the problem of identifying training data for supervised learning models for GPR buried threat detection.
The ultimate goal is to apply the lessons learned to improve the performance of buried threat detectors. Therefore, a particular focus of this dissertation is to understand the implications of particular data selection strategies and to develop principled general strategies for selecting the best approaches. This is done by identifying three factors that are typically considered in the literature with regard to this problem. Experiments are conducted to understand the impact of these factors on detection performance. The outcomes of these experiments provide several insights about the data that can help guide the future development of automated buried threat detectors.
The first set of experiments suggests that a substantial number of threat signatures are neither hyperbolic nor regular in their appearance. These insights motivated the development of a novel buried threat detector that improves over the state-of-the-art benchmark algorithms on a large collection of data. In addition, this newly developed algorithm exhibits improved robustness over those algorithms. The second set of experiments suggests that automating the selection of data corresponding to the buried threats is possible and can replace manually designed methods for this task.
Item Open Access
Bayesian Techniques for Adaptive Acoustic Surveillance (2010)
Morton, Kenneth D
Automated acoustic sensing systems are required to detect, classify and localize acoustic signals in real-time. Despite the fact that humans can perform acoustic sensing tasks with ease in a variety of situations, the performance of current automated acoustic sensing algorithms is limited by seemingly benign changes in environmental or operating conditions. In this work, a framework for acoustic surveillance that is capable of accounting for changing environmental and operational conditions is developed and analyzed. The algorithms employed in this work utilize non-stationary and nonparametric Bayesian inference techniques to allow the resulting framework to adapt to varying background signals and to characterize new signals of interest when additional information is available. The performance of each of the two stages of the framework is compared to existing techniques, and superior performance of the proposed methodology is demonstrated. The algorithms developed operate on time-domain acoustic signals in a nonparametric manner, enabling them to operate on other types of time-series data without application-specific tuning. This is demonstrated as the developed models are successfully applied, without alteration, to landmine signatures from ground penetrating radar data. The nonparametric statistical models developed in this work for the characterization of acoustic signals may ultimately be useful not only in acoustic surveillance but also in other areas of acoustic sensing.
Item Open Access
Classification and Characterization of Heart Sounds to Identify Heart Abnormalities (2019)
LaPorte, Emma
The main function of the human heart is to act as a pump, facilitating the delivery of oxygenated blood to the many cells within the body. Heart failure (HF) is the medical condition in which the heart cannot adequately pump blood to the body, often resulting from other conditions such as coronary artery disease, previous heart attacks, high blood pressure, diabetes, or abnormal heart valves. HF afflicts approximately 6.5 million adults in the US alone [1] and often manifests as fatigue, shortness of breath, increased heart rate, confusion, and more, resulting in a lower quality of life for those afflicted. At the earliest stage of HF, an adequate treatment plan can be relatively manageable, including healthy lifestyle changes such as eating better and exercising more. However, the symptoms (and the heart) worsen over time if left untreated, requiring more extreme treatment such as surgical intervention and/or a heart transplant [2]. Given the magnitude of this condition, there is potential for large impact both in (1) automating (and thus expediting) the diagnosis of HF and (2) improving HF treatment options and care. These topics are explored in this work.
An early diagnosis of HF is beneficial because HF left untreated results in an increasingly severe condition, requiring more extreme treatment and care. Typically, HF is first diagnosed by a physician during auscultation, the act of listening to sounds from the heart through a stethoscope [3]. Physicians are therefore trained to listen to heart sounds and identify them as normal or abnormal. Heart sounds are the acoustic result of the internal pumping mechanism of the heart. When the heart is functioning normally, the resulting acoustic spectrum represents normal heart sounds, which a physician identifies as normal; when the heart is functioning abnormally, the resulting acoustic spectrum differs from that of normal heart sounds, and a physician identifies it as abnormal [3]–[5].
One goal of this work is to automate the auscultation process by developing a machine learning algorithm to identify heart sounds as normal or abnormal. An algorithm is developed for this work that extracts features from a digital stethoscope recording and classifies the recording as normal or abnormal. An extensive feature extraction and selection analysis is performed, ultimately resulting in a classification algorithm with an accuracy score of 0.85. This accuracy score is comparable to current high-performing heart sound classification algorithms [6].
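The extract-features-then-classify pattern described above can be sketched with a toy feature vector and a nearest-centroid decision rule. The specific features (time-domain statistics and a spectral centroid) and the classifier are illustrative assumptions, not the feature set or algorithm selected in the dissertation.

```python
import numpy as np

def extract_features(recording, fs=2000):
    """Toy feature vector for a heart-sound recording: amplitude
    statistics plus a spectral centroid (illustrative features only)."""
    x = recording - recording.mean()
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = (freqs * spec).sum() / spec.sum()  # "center of mass" of the spectrum
    return np.array([x.std(), np.abs(x).mean(), centroid])

def nearest_centroid_predict(features, centroids):
    """Label a recording 0 (normal) or 1 (abnormal) by its nearest class centroid."""
    dists = np.linalg.norm(centroids - features, axis=1)
    return int(dists.argmin())
```

In practice the dissertation's pipeline performs extensive feature selection and uses a trained classifier; this sketch only shows the overall shape of the approach.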
The purpose of the first portion of this work is to automate the HF diagnosis process, allowing for more frequent diagnoses and at an earlier stage of HF. For an individual already diagnosed with HF, there is potential to improve current treatment and care. Specifically, if the HF is extreme, an individual may require a surgically implanted medical device called a Left Ventricular Assist Device (LVAD). The purpose of an LVAD is to assist the heart in pumping blood when the heart cannot adequately do so on its own. Although life-saving, LVADs have a high complication rate. These complications are difficult to identify prior to a catastrophic outcome. Therefore, there is a need to monitor LVAD patients to identify these complications. Current methods of monitoring individuals and their LVADs are invasive or require an in-person hospital visit. Acoustical monitoring has the potential to non-invasively remotely monitor LVAD patients to identify abnormalities at an earlier stage. However, this is made difficult because the LVAD pump noise obscures the acoustic spectrum of the native heart sounds.
The second portion of this work focuses on this specific case of HF, in which an individual's treatment plan includes an LVAD. A signal processing pipeline is proposed to extract the heart sounds in the presence of the LVAD pump noise. The pipeline includes downsampling, filtering, and a heart sound segmentation algorithm to identify the states of the cardiac cycle: S1, S2, systole, and diastole. These states are validated using two individuals' digital stethoscope recordings by comparing the labeled states to the characteristics expected of heart sounds. Both subjects' labeled states closely paralleled the expectations of heart sounds, validating the signal processing pipeline developed for this work.
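The front end of such a pipeline (downsampling followed by filtering, ahead of segmentation) can be sketched as below. The sample rates and band edges are illustrative assumptions, not the parameters used in the dissertation.

```python
import numpy as np
from scipy import signal

def preprocess(recording, fs_in=8000, fs_out=2000, band=(25, 400)):
    """Sketch of the pipeline front end: anti-aliased downsampling of a
    stethoscope recording, then zero-phase band-pass filtering to the
    range where S1/S2 energy is typically concentrated (cutoffs are
    illustrative)."""
    x = signal.resample_poly(recording, fs_out, fs_in)   # downsample with anti-aliasing
    sos = signal.butter(4, band, btype="bandpass", fs=fs_out, output="sos")
    return signal.sosfiltfilt(sos, x)                    # zero-phase: no group delay
```

The filtered output would then feed a segmentation stage that labels S1, S2, systole, and diastole.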
This exploratory analysis can be furthered with the ongoing data collection process. With enough data, the goal is to extract clinically relevant information from the underlying heart sounds to assess cardiac function and identify LVAD dysfunction prior to a catastrophic outcome. Ultimately, this non-invasive, remote model will allow for earlier diagnosis of LVAD complications.
In total, this work serves two main purposes: the first is developing a machine learning algorithm that automates the HF diagnosis process; the second is extracting heart sounds in the presence of LVAD noise. Both of these topics further the goal of earlier diagnosis and therefore better outcomes for those afflicted with HF.
Item Open Access
Data Driven Style Transfer for Remote Sensing Applications (2022)
Stump, Evan
Recent recognition models for remote sensing data (e.g., infrared cameras) are based upon machine learning models such as deep neural networks (DNNs) and typically require large quantities of labeled training data. However, many applications in remote sensing suffer from limited quantities of training data. To address this problem, we explore style transfer methods to leverage preexisting large and diverse datasets in more data-abundant sensing modalities (e.g., color imagery) so that they can be used to train recognition models for data-scarce target tasks. We first explore the potential efficacy of style transfer in the context of buried threat detection using ground penetrating radar (GPR) data. Based upon this work, we found that simple pre-processing of downward-looking GPR makes it suitable for training machine learning models that are effective at recognizing threats in hand-held GPR. We then explore cross-modal style transfer (CMST) for color-to-infrared stylization. We evaluate six contemporary CMST methods on four publicly available IR datasets, the first comparison of its kind. Our analysis reveals that existing data-driven methods are either too simplistic or introduce significant artifacts into the imagery. To overcome these limitations, we propose meta-learning style transfer (MLST), which learns a stylization by composing and tuning well-behaved analytic functions. We find that MLST leads to more complex stylizations without introducing significant image artifacts and achieves the best overall performance on our benchmark datasets.
Item Open Access
Deep Learning for Applications in Inverse Modeling, Legislator Analysis, and Computer Vision for Security (2023)
Spell, Gregory Paul
To judiciously use machine learning – particularly deep learning – requires identifying how to extract features from data and effectively leveraging those features to make predictions. This dissertation concerns deep learning methods for three applications: inverse modeling, legislator analysis, and computer vision for security. To address inverse problems, we present a new method, the Mixture Manifold Network, which uses multiple neural backward models in a forward-backward architecture. We experimentally demonstrate that the Mixture Manifold Network performs better than computationally fast generative model baselines, with performance approaching that of computationally slow iterative methods. For legislator modeling, we seek to learn representations that capture legislator attitudes that may not be contained in their voting records. We present a model that instead considers their tweeting behavior, and we use reactions to former President Donald Trump on Twitter as an illustrative example. For computer vision, we address two security-related applications using deep convolutional feature extractors. In the first, we leverage domain adaptation with deep object detection to detect threatening items – such as guns, knives, and blunt objects – in X-ray scans of air passenger luggage. In the second, we apply an occlusion-robust classifier to infrared imagery. For each application above, we describe the datasets for the problem, how the presented methods extract features from those data, and how efficacious predictions are produced by each of our proposed models.
Item Open Access
Deep Learning for the modeling and design of artificial electromagnetic materials (2023)
Ren, Simiao
Artificial electromagnetic materials (AEMs) are materials that exhibit unusual electromagnetic properties. With sub-wavelength, periodic structures, AEMs can achieve incredible abilities to manipulate light, like the cloaking effect of the "invisibility cloak" in the Harry Potter movies. Beyond the cinematic application of invisibility, AEMs have important applications ranging from high-efficiency solar panels to next-generation communications systems. The major goal of this thesis is to develop deep learning tools to design materials that have increasingly customized interactions with electromagnetic waves, thus enabling more useful technologies. In turn, this necessitates the modeling and design of increasingly complicated materials. Modeling these materials is difficult because (i) the physics of advanced materials is intrinsically more complicated, with no simple analytical form, (ii) the manufacture of such nano-structures is prohibitively expensive, and (iii) computational electromagnetic simulation software is too slow to iterate through trial and error. Recently, advances in deep learning have brought new perspectives to this problem. In this thesis, we explore deep learning for the modeling and design of advanced photonic materials. In particular, we explore and make important contributions to two fundamental areas: inverse design and active learning. In inverse design, we develop an accurate method, "Neural-adjoint," and show its dominance not only on simple inverse problems but also on contemporary AEM design problems. We further analyze and benchmark eight state-of-the-art deep inverse approaches on AEM inverse design and discover that the one-to-manyness of the problem is an important factor in such problems.
Then, motivated by the immediate drawback that all deep inverse models require a large set of labeled data, we investigate the benefit of active learning in the setting of AEM design and scientific computing in general. By setting the problem close to a real application where the pool size is unknown, we find that the majority of deep regression pool-based active learning methods in our benchmark lack robustness and do not consistently outperform even random sampling.
Item Open Access
Evaluating Human Performance in Virtual Reality Based on Psychophysiological Signal Analysis (2018)
Clements, Jillian
Physiological signals measured from the body, such as brain activity and motor behavior, can be used to infer different physiological states or processes in humans. Signal processing and machine learning often play a fundamental role in this assessment, providing unique approaches to analyzing and interpreting physiological data for a variety of applications, such as medical diagnosis and human-computer interaction. In this work, these approaches were utilized and adapted for two separate applications: brain-computer interfaces (BCIs) and the assessment of visual-motor skill in virtual reality (VR).
The goal of BCI technology is to allow people with severe motor impairments to control a device without the need for voluntary muscle control. Conventional BCIs operate by converting electrophysiological signals measured from the brain into meaningful control commands, eliminating the need for physical interaction with the system. However, despite encouraging improvements over the last decade, BCI use remains primarily confined to research laboratories. One of the biggest obstacles limiting their daily in-home use is the significant amount of time and expertise that is often required to set up the biosensors (electrodes) for recording brain activity. The most common modality for brain recording is electroencephalography (EEG), which typically employs gel-based "wet" electrodes for recording signals with high signal-to-noise ratios (SNRs). However, while wet electrodes record higher-quality signals than dry electrodes, they often hinder frequent use because of the complex and time-consuming process of applying the electrodes to the scalp. Therefore, in this research, a signal processing solution was implemented to help mitigate noise in a dry electrode system to facilitate a more practical BCI device for everyday use in people with severe motor impairments. This solution utilized a Bayesian algorithm that automatically determined the amount of EEG data to collect online based on the quality of incoming data. The hypothesis for this research was that the algorithm would detect the need for additional data collection in low SNR scenarios, such as those in dry electrode systems, and collect sufficient data to improve BCI performance. In addition to this solution, two anomaly detection techniques were implemented to characterize the differences between the wet and dry electrode recordings to determine whether any additional types of signal processing would further improve BCI performance with dry electrodes.
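A minimal stand-in for the kind of anomaly screening mentioned above is an amplitude-based outlier check over EEG epochs: epochs whose peak-to-peak amplitude is extreme relative to the session are flagged as noisy. The statistic and the threshold here are illustrative assumptions, not the dissertation's actual anomaly detection techniques.

```python
import numpy as np

def flag_noisy_epochs(epochs, z_thresh=3.0):
    """Flag EEG epochs (rows of a 2-D epochs-by-samples array) whose
    peak-to-peak amplitude is a z-score outlier within the session."""
    ptp = epochs.max(axis=1) - epochs.min(axis=1)      # per-epoch peak-to-peak
    z = (ptp - ptp.mean()) / ptp.std()
    return z > z_thresh                                 # boolean mask of noisy epochs
```

Flagged epochs could be excluded or trigger additional data collection, in the spirit of the quality-driven Bayesian stopping rule the abstract describes.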
Taken as a whole, this research demonstrated the impact of noise in dry electrode recordings on BCI performance and showed the potential of a signal processing approach for noise mitigation. However, further signal processing efforts are likely necessary for full mitigation and adoption of dry electrodes for use in the home.
The second study presented in this work focused on signal processing and machine learning techniques for assessing visual-motor skill during a simulated marksmanship task in immersive VR. Immersive VR systems offer flexible control of an interactive environment, along with precise position and orientation tracking of realistic movements. These systems can also be used in conjunction with brain monitoring techniques, such as EEG, to record neural signals as individuals perform complex motor tasks. In this study, these elements were fused to investigate the psychophysiological mechanisms underlying visual-motor skill during a multi-day simulated marksmanship training regimen. On each of three days, twenty participants performed a task in which they were instructed to shoot simulated clay pigeons launched from behind a trap house, using a mock firearm controller. Through practice of this protocol, participants significantly improved their shot accuracy and precision. Furthermore, systematic changes accompanying these improvements in performance were observed in the variables extracted from the EEG and kinematic signals. Using a machine learning approach, two predictive classification models were developed to automatically determine the combinations of EEG and kinematic variables that best differentiated successful (target hit) from unsuccessful (target miss) trials, and high-performing participants (top fourth) from low-performing participants (bottom fourth). Finally, in order to capture the more complex patterns of human motion in the spatiotemporal domain, time-series methods for motion trajectory prediction were developed that utilized the raw tracking data to estimate the future motion of the firearm controller.
The objective of this approach was to predict whether the controller’s virtually projected ray would intersect with the target before the trigger was pulled to shoot, with the eventual goal of alerting participants in real-time when shooting may be suboptimal.
Overall, the findings from this research project point towards a comprehensive psychophysiological signal processing approach that can be used to characterize and predict human performance in VR, which has the potential to revolutionize the design of current simulation-based training programs for realistic visual-motor tasks.
Item Open Access
Exploiting Multi-Look Information for Landmine Detection in Forward Looking Infrared Video (2013)
Malof, Jordan
Forward Looking Infrared (FLIR) cameras have recently been studied as a sensing modality for use in landmine detection systems. FLIR-based detection systems benefit from larger standoff distances and faster rates of advance than other sensing modalities, but they also present significant challenges for detection algorithm design. FLIR video typically yields multiple looks at each object in the scene, each from a different camera perspective. As a result, each object in the scene appears in multiple video frames, each time with a different shape and size. This raises questions about how best to utilize such information. Evidence in the literature suggests such multi-look information can be exploited to improve detection performance but, to date, there has been no controlled investigation of multi-look information in detection. Any results are further confounded because no precise definition exists for what constitutes multi-look information. This thesis addresses these problems by developing a precise mathematical definition of "a look" and a way to quantify the multi-look content of video data. Controlled experiments are conducted to assess the impact of multi-look information on FLIR detection using several popular detection algorithms. Based on these results, two novel video processing techniques are presented, the plan-view framework and the FLRX algorithm, to better exploit multi-look information. The results show that multi-look information can have a positive or negative impact on detection performance depending on how it is used. The results also show that the novel algorithms presented here are effective techniques for analyzing video and exploiting multi-look information to improve detection performance.
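A simple way to see how the use of multi-look information changes outcomes is to compare fusion rules for the per-look detector confidences of a single object. These baselines (mean, max, noisy-OR) are common illustrative choices, not the thesis's plan-view or FLRX algorithms.

```python
import numpy as np

def fuse_looks(scores, method="mean"):
    """Fuse per-look detector confidences for one object into a single
    declaration score (illustrative baseline fusion rules)."""
    scores = np.asarray(scores, dtype=float)
    if method == "mean":
        return scores.mean()                  # average evidence across looks
    if method == "max":
        return scores.max()                   # trust the single best look
    if method == "noisy_or":
        return 1.0 - np.prod(1.0 - scores)    # detect if any look fires, assuming independence
    raise ValueError(method)
```

The three rules can rank the same object very differently, which mirrors the thesis's finding that multi-look information can help or hurt depending on how it is used.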
Item Embargo
Harnessing Recent Online Data to Improve Brain-Computer Interface Operation (2024)
Chen, Xinlin
Brain-computer interfaces (BCIs) have the potential to restore or replace lost neural output after injury or disease. BCIs operate by processing and interpreting brain signals such as (non-invasive) electroencephalography (EEG) data to decode user intent into commands to control external devices. An important consideration in deploying BCIs is that brain activity is constantly evolving, making long-term accurate interpretation of brain activity a challenge.
This work focuses on the P300 speller, a BCI that can be used to restore communication abilities for individuals whose motor abilities have been severely compromised, such as people with amyotrophic lateral sclerosis (ALS). The P300 speller enables users to select their target character from a set of choices using their brain activity. In order to spell a character, the BCI presents a stream of stimuli to the user, anticipating that event-related potentials (ERPs) will be elicited within EEG data in response to the presentation of a rare stimulus, specifically one containing the target character. The characteristics of ERPs will change over time according to factors such as a user’s level of fatigue. The P300 speller uses a machine learning classifier to interpret the user’s brain signals; this classifier is conventionally trained in a supervised manner with EEG data during a calibration phase, with its parameters remaining static as it is applied to real-time EEG data during online operation. As this BCI is used to meet long-term communication needs, it is important to maintain the performance of the P300 speller over longitudinal use. Since the statistical properties of brain signals change over time, the current and most recently available brain signals are most suitable for understanding a user’s current cognitive state. This work focuses on developing methods to enhance the use of real-time data during online BCI operation in order to improve BCI performance.
In this work, it is hypothesized that further incorporating recent online EEG data and BCI decision-making into the P300 speller framework can enhance BCI performance by providing more up-to-date information regarding the user's intent. This research addresses methods of improving three areas of the P300 speller framework: stimulus selection, classifier learning, and representation learning. During a spelling trial, i.e., the series of stimuli used to select a user's target character, the presentation of P300 speller stimuli directly influences how ERPs are elicited. The stimulus selection process can be optimized to improve the information content in presented stimuli based on the EEG data and stimulus presentation history from the trial. A new stimulus presentation paradigm is developed with this optimization goal, one that also designs stimulus characteristics to mitigate psychophysical effects that can negatively influence ERP elicitation. Next, as changing brain activity makes the most recent brain signals more appropriate for interpreting a user's intent than older brain signals from offline calibration, the BCI classifier learning strategy can be modified from the conventional static approach to an adaptive approach that utilizes real-time EEG data. A semi-supervised learning strategy that relies on EEG data and ground truth labels from BCI calibration, as well as online EEG data and classifier-predicted labels, is explored. This strategy incorporates a language model in order to improve the quality of the classifier-predicted labels used for continual online learning. Finally, attention-based models afford a method to perform contextual representation learning with EEG data. The inputs to an attention-based model can be enhanced for the BCI use case with the goal of learning more robust data representations.
As the stimulus presentation sequence is directly connected to the ERP elicitation patterns in a trial, providing this BCI decision-making history can improve the data context available to the model. The studies presented within this work provide evidence that these proposed methods for further integrating real-time EEG data and BCI decision-making history into the spelling framework have the potential to significantly improve BCI communication rates, including over long-term use of the BCI.
Item Open Access
Hierarchical Bayesian Learning Approaches for Different Labeling Cases (2015)
Manandhar, Achut
The goal of a machine learning problem is to learn useful patterns from observations so that appropriate inferences can be made from new observations as they become available. Based on whether labels are available for training data, the vast majority of machine learning approaches can be broadly categorized as supervised or unsupervised. In the context of supervised learning, when observations are available as labeled feature vectors, the learning process is a well-understood problem. However, for many applications, standard supervised learning becomes complicated because the labels for observations are unavailable as labeled feature vectors. For example, in a ground penetrating radar (GPR) based landmine detection problem, the alarm locations are only known in 2D coordinates on the earth's surface but are unknown for individual target depths. Typically, in order to apply computer vision techniques to the GPR data, it is convenient to represent the GPR data as a 2D image. Since a large portion of the image does not contain useful information pertaining to the target, the image is typically further subdivided into subimages along depth. The subimages at a particular alarm location can be considered a set of observations, where the label is only available for the entire set but unavailable for individual observations along depth. In the absence of individual observation labels, for the purposes of training standard supervised learning approaches, observations both above and below the target are labeled as targets despite substantial differences in their characteristics. As a result, the label uncertainty with depth complicates parameter inference in standard supervised learning approaches, potentially degrading their performance.
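The bag construction described above (a set of depth subimages sharing one alarm-level label) can be sketched directly. The window size and stride are illustrative assumptions.

```python
import numpy as np

def make_bag(bscan, depth_window=32, stride=16):
    """Slice a 2-D GPR image (depth x scan position) into overlapping
    depth subimages; the resulting set forms one multiple-instance
    'bag' whose label is known only at the alarm level."""
    n_depth = bscan.shape[0]
    return [bscan[i:i + depth_window]
            for i in range(0, n_depth - depth_window + 1, stride)]
```

Each subimage is an instance with no individual label; only the bag as a whole is marked target or non-target, which is exactly the labeling case the MIL framework below is designed for.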
In this work, we develop learning algorithms for three such specific scenarios, where: (1) labels are only available for sets of independent and identically distributed (i.i.d.) observations, (2) labels are only available for sets of sequential observations, and (3) continuous correlated multiple labels are available for spatio-temporal observations. For each of these scenarios, we propose a modification of a traditional learning approach to improve its predictive accuracy. The first two algorithms are based on a set-based framework called multiple instance learning (MIL), whereas the third algorithm is based on a structured output-associative regression (SOAR) framework. The MIL approaches are motivated by the landmine detection problem using GPR data, where the training data is typically available as labeled sets of observations or sets of sequences. The SOAR learning approach is instead motivated by the multi-dimensional human emotion label prediction problem using audio-visual data, where the training data is available in the form of multiple continuous correlated labels representing complex human emotions. In both of these applications, the unavailability of the training data as labeled feature vectors motivates developing new learning approaches that are more appropriate for modeling the data.
A large majority of the existing MIL approaches require computationally expensive parameter optimization, do not generalize well to time-series data, and are incapable of online learning. To overcome these limitations, for sets of observations, this work develops a nonparametric Bayesian approach to learning in MIL scenarios based on Dirichlet process mixture models. The nonparametric nature of the model and the use of non-informative priors remove the need for cross-validation based optimization, while variational Bayesian inference allows for rapid parameter learning. The resulting approach is highly generalizable and also capable of online learning. For sets of sequences, this work integrates hidden Markov models (HMMs) into an MIL framework and develops a new approach called the multiple instance hidden Markov model. The model parameters are inferred using variational Bayes, making the model tractable and computationally efficient; this approach, too, is highly generalizable and capable of online learning. Similarly, most of the existing approaches developed for modeling multiple continuous correlated emotion labels do not model the spatio-temporal correlation among the emotion labels. The few approaches that do model the correlation fail to predict the multiple emotion labels simultaneously, resulting in latency during testing and potentially compromising the effectiveness of implementing the approach in real-time scenarios. This work integrates the output-associative relevance vector machine (OARVM) approach with the multivariate relevance vector machine (MVRVM) approach to simultaneously predict multiple emotion labels. The resulting approach performs competitively with the existing approaches while reducing the prediction time during testing, and the sparse Bayesian inference allows for rapid parameter learning.
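The set-level labeling structure at the heart of MIL can be illustrated with the standard noisy-OR combining rule, a common MIL assumption (this is a generic sketch for illustration, not the variational Bayesian inference developed in this work): a bag of observations is positive if at least one of its instances is positive.

```python
import numpy as np

def bag_probability(instance_probs):
    """Noisy-OR MIL rule: a bag (set) is positive if at least one
    of its instances is positive, so
    P(bag = 1) = 1 - prod_i (1 - p_i)."""
    p = np.asarray(instance_probs, dtype=float)
    return 1.0 - np.prod(1.0 - p)

# A bag with one confident instance is labeled positive even though
# most instances look negative -- mirroring the GPR depth-bin setting,
# where only a few depth bins actually contain the target response.
print(bag_probability([0.05, 0.02, 0.9, 0.1]))  # about 0.916
```

Under this rule, the label uncertainty along depth is absorbed by the combining function rather than being forced onto each instance, which is the key difference from naively labeling every subimage as a target.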
Experimental results on several synthetic datasets, benchmark datasets, GPR-based landmine detection datasets, and human emotion recognition datasets show that our proposed approaches perform comparably to or better than the existing approaches.
Item Open Access Image Processing Methods Applied to Landmine Detection in Ground Penetrating Radar (2013) Sakaguchi, Rayn Terin Tatsuma
Recent advances in statistically based ground penetrating radar (GPR) landmine detection have utilized 2-D slices of data to recognize the hyperbolic shapes caused by a sub-surface landmine. The objective in this research is to identify these shapes using methodology found in the field of image processing. Three different recognition methods were considered: (1) instance matching, which aims to recognize occurrences of a specific object; (2) object detection, which aims to find objects belonging to a class of objects; and (3) category recognition, which aims to categorize entire images based upon the contents of each image. This research consists of the adaptation and evaluation of these methods applied to GPR landmine detection. The results from this work illustrate the additional information that these methods provide to the GPR detection system. In addition, this work shows promise for the application of additional methods from the image processing and computer vision fields.
Item Open Access Information-Based Sensor Management for Static Target Detection Using Real and Simulated Data (2009) Kolba, Mark Philip
In the modern sensing environment, large numbers of sensor tasking decisions must be made using an increasingly diverse and powerful suite of sensors in order to best fulfill mission objectives in the presence of situationally-varying resource constraints. Sensor management algorithms allow the automation of some or all of the sensor tasking process, meaning that sensor management approaches can either assist or replace a human operator as well as ensure the safety of the operator by removing that operator from a dangerous operational environment. Sensor managers also provide improved system performance over unmanaged sensing approaches through the intelligent control of the available sensors. In particular, information-theoretic sensor management approaches have shown promise for providing robust and effective sensor manager performance.
This work develops information-theoretic sensor managers for a general static target detection problem. Two types of sensor managers are developed. The first considers a set of discrete objects, such as anomalies identified by an anomaly detector or grid cells in a gridded region of interest. The second considers a continuous spatial region in which targets may be located at any point in continuous space. In both types of sensor managers, the sensor manager uses a Bayesian, probabilistic framework to model the environment and tasks the sensor suite to make new observations that maximize the expected information gain for the system. The sensor managers are compared to unmanaged sensing approaches using simulated data and using real data from landmine detection and unexploded ordnance (UXO) discrimination applications, and it is demonstrated that the sensor managers consistently outperform the unmanaged approaches, enabling targets to be detected more quickly using the sensor managers. The performance improvement represented by the rapid detection of targets is of crucial importance in many static target detection applications, resulting in higher rates of advance and reduced costs and resource consumption in both military and civilian applications.
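The core of such an information-theoretic sensor manager can be sketched for the gridded, discrete-object case with a binary sensor. This is a minimal illustration under assumed detection and false-alarm probabilities (`pd`, `pfa` are hypothetical values, not the dissertation's sensor models): the manager tasks the cell whose next observation maximizes the expected reduction in posterior entropy.

```python
import numpy as np

def entropy(p):
    """Binary entropy in bits, clipped for numerical safety."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def expected_info_gain(p, pd=0.9, pfa=0.1):
    """Expected entropy reduction of each cell's target posterior
    after one binary observation with detection probability pd and
    false-alarm probability pfa (Bayesian preposterior analysis)."""
    p_alarm = pd * p + pfa * (1 - p)            # P(alarm)
    post_alarm = pd * p / p_alarm               # P(target | alarm)
    post_quiet = (1 - pd) * p / (1 - p_alarm)   # P(target | no alarm)
    exp_post_entropy = (p_alarm * entropy(post_alarm)
                        + (1 - p_alarm) * entropy(post_quiet))
    return entropy(p) - exp_post_entropy

# Three grid cells with prior target probabilities; the manager
# tasks the sensor to the cell with the largest expected gain.
priors = np.array([0.5, 0.9, 0.05])
gains = expected_info_gain(priors)
print(int(np.argmax(gains)))   # the most uncertain cell (p = 0.5) is tasked
```

The expected gain here is exactly the mutual information between the target state and the observation, so it is always non-negative and is largest where the posterior is most uncertain relative to what the sensor can resolve.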
Item Open Access Investigating the Perceptual Effects of Multi-rate Stimulation in Cochlear Implants and the Development of a Tuned Multi-rate Sound Processing Strategy (2009) Stohl, Joshua Simeon
It is well established that cochlear implants (CIs) are able to provide many users with excellent speech recognition ability in quiet conditions; however, the ability to correctly identify speech in noisy conditions or appreciate music is generally poor for implant users relative to normal-hearing listeners. This discrepancy has been hypothesized to be in part a function of the relative decrease in spectral information available to implant users (Rubinstein and Turner, 2003; Wilson et al., 2004). One method that has been proposed for increasing the amount of spectral information available to CI users is to include time-varying stimulation rate in addition to changes in the place of stimulation. However, previous implementations of multi-rate strategies have failed to result in an improvement in speech recognition over the clinically available, fixed-rate strategies (Fearn, 2001; Nobbe, 2004). It has been hypothesized that this lack of success was due to a failure to consider the underlying perceptual responses to multi-rate stimulation.
In this work, psychophysical experiments were implemented with the goal of achieving a better understanding of the interaction of place and rate of stimulation, and of the effects of duration and context on CI listeners' ability to detect changes in stimulation rate. Results from those experiments were then used to "tune" a subject-specific multi-rate sound processing strategy for implant users, with the aim of improving speech recognition performance.
In an acute study under quiet conditions, speech recognition performance with a tuned multi-rate implementation was better than performance with a clinically available, fixed-rate strategy, although the difference was not statistically significant. These results suggest that utilizing time-varying pulse rates in a subject-specific implementation of a multi-rate algorithm may offer improvements in speech recognition over clinically available strategies. A longitudinal study was also performed to investigate the potential benefit of training on speech recognition. General improvements in speech recognition ability were observed as a function of time; however, final scores with the tuned multi-rate algorithm never surpassed performance with the fixed-rate algorithm under noisy conditions.
The ability to improve upon speech recognition scores for quiet conditions with respect to the fixed-rate algorithm suggests that using time-varying stimulation rates potentially provides additional, usable information to listeners. However, performance with the fixed-rate algorithm proved to be more robust to noise, even after three weeks of training. This lack of robustness to noise may be in part a result of the frequency estimation technique used in the multi-rate strategy, and thus more sophisticated techniques for real-time frequency estimation should be explored in the future.
Item Open Access Learning and Exploiting Camera Geometry for Computer Vision (2016) Wang, Patrick
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
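The coordinate side of this idea can be sketched generically: inverse perspective mapping is an application of a homography, and the same 3x3 transform usually applied to image pixels can instead be applied to the coordinates of feature cells. The homography values below are hypothetical; the HOG-specific feature adaptation in this work is more involved than mapping coordinates alone.

```python
import numpy as np

def apply_homography(H, points):
    """Map 2-D points through homography H (3x3) using homogeneous
    coordinates -- the warp normally applied to image pixels, here
    applied to the coordinates of feature cells instead."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]   # divide out the scale

# Hypothetical ground-plane rectification: the perspective term
# H[2, 0] makes distant columns converge, as for a camera looking
# down a road; the inverse of such an H undoes that warping.
H = np.array([[1.0,   0.0, 0.0],
              [0.0,   1.0, 0.0],
              [0.001, 0.0, 1.0]])
cells = np.array([[0.0, 0.0], [100.0, 0.0]])
print(apply_homography(H, cells))
```

Operating on cell coordinates rather than pixels is what avoids re-warping and re-computing features for every pose, which is where the reported speedup over image-warping methods comes from.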
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Item Open Access Mitigating Cochlear Implant Stimulus Pulses Dominated by Reverberant Distortions (2022) Shahidi, Lidea Katherine
Cochlear implant (CI) users can experience considerable degradations in speech intelligibility in reverberant environments (Kressner et al., 2018). Reverberation occurs when sound reflects off of surfaces in an enclosure, with reverberant reflections arriving at the listener at the same time as the direct speech signal. CI speech processing transforms the amplitude of frequency envelopes into discrete stimulus pulses across electrode channels. When the speech envelope is degraded by the simultaneous arrival of reverberant reflections, speech intelligibility is degraded. Removing reverberant artifacts from either the reverberant speech signal (Hazrati et al., 2013b, 2013a; Hazrati and Loizou, 2013; Kokkinakis et al., 2011) or the reverberant stimulus pattern (Desmond, 2014; Desmond et al., 2014) can restore speech envelope structure and significantly recover speech intelligibility for CI users in reverberation. The mitigation strategies applied to the reverberant speech signal used a measure of local signal distortion to indicate speech-dominant portions of the reverberant signal. The measure of signal distortion required a priori knowledge of the clean speech signal or the reverberant room. A few studies proposed statistical signal processing techniques to estimate the measure of distortion (Hazrati et al., 2013b, 2013a; Hazrati and Loizou, 2013). Reverberant speech mitigated by these strategies resulted in improvements in speech intelligibility for CI users, indicating that a measure of reverberant distortion can identify portions of the reverberant signal that are detrimental to intelligibility for CI users. Unfortunately, mitigation prior to CI processing presents limitations for real-time feasibility, imposing a delay on stimulation at the electrode array.
For CI users to receive benefit from a mitigation algorithm, the delay imposed by reverberation mitigation must be minimized to maintain audio-visual synchrony (Hay-McCutcheon et al., 2009). Further, since CI users are likely to encounter a number of reverberant scenarios in their daily lives, mitigation of reverberant stimuli must provide intelligibility improvements in a range of listening scenarios. The approaches in (Hazrati et al., 2013b, 2013a; Hazrati and Loizou, 2013) relied on heuristic condition-specific tuning of parameters, yielding algorithm performance that may not be robust to changes in reverberant conditions. An alternative approach to mitigating the effects of reverberation on intelligibility for CI users leveraged a machine learning algorithm to identify reverberant artifacts due to temporal masking within the CI stimulus (Desmond, 2014). The machine learning artifact detection algorithm harnessed a data-driven approach to capture the variability present in the training data and provide artifact detection in a variety of reverberant environments. The approach in (Desmond, 2014) led to improved intelligibility in CI users and demonstrated good generalization to reverberant conditions not contained in the training set. Although temporal masking-based mitigation can remove reverberant artifacts between phonemes, the reverberant distortions within phonemes imposed in highly reverberant listening scenarios can be responsible for significant degradations in intelligibility for CI users (Desmond et al., 2014; Kokkinakis and Loizou, 2011a). Given that a measure of reverberant distortion can identify reverberant speech components that are harmful to intelligibility for CI users both within phonemes and after phoneme termination, distortion-based mitigation strategies are likely to yield intelligibility improvements in a variety of reverberant listening scenarios when compared to temporal masking-based mitigation.
We combine the objective of the distortion-based criterion with the data-driven approach of machine learning to enable improved intelligibility for CI users in a range of realistic reverberant listening scenarios. To minimize the delay imposed by the artifact detection algorithm, only causal information is used, and removal of reverberant artifacts is implemented at the pulse level. To ensure that the distortion-based mitigation strategy yields the greatest intelligibility benefits, we conduct an intelligibility study to determine the tolerance of intelligibility to the amount of reverberant distortion present in each stimulus pulse: the amount of reverberant distortion permitted in the mitigated signal is varied, and a balance between reverberant artifact removal and speech signal retention is identified. To assess the efficacy of reverberant artifact removal, the intelligibility of the mitigated stimulus pattern must be assessed. Although online intelligibility testing is the gold standard for intelligibility assessment, it can be costly and time-intensive, prohibiting the rapid development of reverberant artifact detection algorithms. This work extends previous analyses of offline intelligibility measures (Falk et al., 2015; Santos et al., 2013) to predict the performance of mitigated reverberant CI stimuli, by including the most widely adopted offline measures used in the literature and introducing new speech intelligibility data for validation. The correlation of offline measure scores to online intelligibility results is reported, indicating which offline measures leverage speech features relevant to the intelligibility of mitigated reverberant CI stimuli, and adjustments to offline measure evaluation are proposed to improve measure performance when applied to reverberant CI-processed signals.
After validating the performance of offline intelligibility measures, we investigate the use of various machine learning models with different parametric complexities for identifying CI stimulus pulses dominated by reverberant distortion. To minimize the delay imposed by the algorithms, only causal frames of information are used. To test the algorithm’s robustness to unseen reverberant conditions, the models are tested in reverberant environments not encountered during training. We test the speech intelligibility of the CI stimulus after mitigation by each artifact detection algorithm with normal hearing subjects using a simulation of CI perception. We find that artifact detection performance improves significantly when models are trained on speech material spoken in a reverberant environment that closely resembles the target reverberant environment. Additionally, we demonstrate a trade-off between the computational complexity of the model and the ability of the model to generalize to speech spoken in unseen reverberant environments. As it would be infeasible to train one machine learning model to mitigate the effects of reverberation for every room a CI user might encounter, we investigate characteristics of reverberant rooms that may indicate similarities in artifact addition across different reverberant environments. We demonstrate the feasibility of applying specialized artifact detection models, termed room-matching models, to improve artifact detection performance in a range of realistic reverberant listening scenarios. To determine similarities in reverberant listening scenarios, room-matching models leverage measures of acoustic reverberation, estimated for the target listening scenario. 
We find that the reverberation time and clarity index measures reflect trends in reverberant artifact addition, echoing the findings that these acoustic measures indicate trends in intelligibility for CI users (Badajoz-Davila et al., 2020; Kokkinakis et al., 2011; Kressner et al., 2018). Overall, the findings from this research project indicate a framework for identifying reverberant artifacts in cochlear implant stimuli, which has the potential to improve intelligibility outcomes for cochlear implant users in a variety of real-world reverberant listening conditions.
Item Open Access Noisefield Estimation, Array Calibration and Buried Threat Detection in the Presence of Diffuse Noise (2019) Bjornstad, Joel Nils
A pervasive issue across signal processing and decision making is that signals of interest are corrupted by noise. This work specifically considers scenarios where the primary noise source is external to an array of receivers and is diffuse. Spatially diffuse noise is considered in three scenarios: noisefield estimation, array calibration using diffuse noise as a source of opportunity, and detection of buried threats using Ground Penetrating Radar (GPR).
Modeling the ocean acoustic noise field is impractical, as the noise seen by a receiver depends on the positions of distant shipping (a major contributing source of low frequency noise) as well as the temperature, pressure, salinity and bathymetry of the ocean. Measuring the noise field using a standard towed array is also not practical due to the inability of a line array to distinguish signals arriving at different elevations, as well as the presence of the well-known left/right ambiguity. A method to estimate the noise field by fusing data from a traditional towed array and two small-aperture planar arrays is developed. The resulting noise field estimates can be used to produce synthetic covariance matrices that perform on par with measured covariance matrices when used in a Matched Subspace Detector.
For a phased array to function effectively, the positions of the array elements must be well calibrated. Previous efforts in the literature have primarily focused on the use of discrete sources for calibration. The approach taken here instead uses spatially oversampled, overlapping sub-arrays: the geometry of each individual sub-array is determined from maximum likelihood estimates of the inter-element distances followed by multidimensional scaling, and the overlapping sub-arrays are then combined into a single array. The algorithm developed in this work performs well in simulation. Limitations in the experimental setup preclude drawing firm conclusions based on an in-air test of the algorithm.
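The multidimensional scaling step can be sketched with classical MDS, which recovers element coordinates (up to rotation and translation) from a matrix of pairwise distances via double centering. The square sub-array and 1 m spacing below are assumed values for illustration, not the experimental geometry.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover point coordinates (up to rotation/translation) from a
    matrix of pairwise Euclidean distances via double centering."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gram matrix of centered points
    w, V = np.linalg.eigh(B)                  # eigh returns ascending order
    idx = np.argsort(w)[::-1][:dim]           # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Hypothetical square sub-array with 1 m spacing: pairwise distances
# in, sub-array geometry out.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
rec = classical_mds(D)
D_rec = np.linalg.norm(rec[:, None, :] - rec[None, :, :], axis=-1)
print(np.allclose(D, D_rec))   # True: pairwise distances are preserved
```

In practice the inputs would be the maximum likelihood distance estimates rather than exact distances, and the per-sub-array solutions would then be registered against one another using the shared (overlapping) elements.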
Ground penetrating radar (GPR) is one of the most successful methods for detecting landmines and other buried threats. GPR images, however, are very noisy, as the propagation path through soil is quite complex, which makes classifying GPR images as threats or non-threats a challenging problem. Successful buried threat classification algorithms rely on a handcrafted feature descriptor paired with a machine learning classifier. In this work, the state-of-the-art Spatial Edge Descriptor (SED) feature was implemented as a neural network. This implementation allows the feature and the classifier to be trained simultaneously and expanded with minimal intervention from a designer. Impediments to training this novel network were identified, and a modified network was proposed that surpasses the performance of the baseline SED algorithm.
These cases demonstrate the practicality of mitigating or using diffuse background noise to achieve desired engineering results.
Item Open Access Nonparametric Bayesian Context Learning for Buried Threat Detection (2012) Ratto, Christopher Ralph
This dissertation addresses the problem of detecting buried explosive threats (i.e., landmines and improvised explosive devices) with ground-penetrating radar (GPR) and hyperspectral imaging (HSI) across widely-varying environmental conditions. Automated detection of buried objects with GPR and HSI is particularly difficult due to the sensitivity of sensor phenomenology to variations in local environmental conditions. Past approaches have attempted to mitigate the effects of ambient factors by designing statistical detection and classification algorithms to be invariant to such conditions. These methods have generally taken the approach of extracting features that exploit the physics of a particular sensor to provide a low-dimensional representation of the raw data for discriminating targets from non-targets. A statistical classification rule is then usually applied to the features. However, it may be difficult for feature extraction techniques to adapt to the highly nonlinear effects of near-surface environmental conditions on sensor phenomenology, as well as to re-train the classifier for use under new conditions. Furthermore, the search for an invariant set of features ignores the possibility that one approach may yield the best performance under one set of terrain conditions (e.g., dry), while another might be better for another set of conditions (e.g., wet).
An alternative approach to improving detection performance is to consider exploiting differences in sensor behavior across environments rather than mitigating them, and treat changes in the background data as a possible source of supplemental information for the task of classifying targets and non-targets. This approach is referred to as context-dependent learning.
Although past researchers have proposed context-based approaches to detection and decision fusion, the definition of context used in this work differs from those used in the past. In this work, context is motivated by the physical state of the world from which an observation is made, and not from properties of the observation itself. The proposed context-dependent learning technique therefore utilized additional features that characterize soil properties from the sensor background, and a variety of nonparametric models were proposed for clustering these features into individual contexts. The number of contexts was assumed to be unknown a priori, and was learned via Bayesian inference using Dirichlet process priors.
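The "unknown number of contexts" assumption can be illustrated with the Chinese restaurant process, the sequential view of a Dirichlet process prior (a toy sketch of the prior only, not the full Bayesian inference over soil-property features used in this work): each new observation joins an existing cluster with probability proportional to its size, or starts a new cluster with probability proportional to a concentration parameter alpha.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster (context) assignments from a Chinese restaurant
    process: observation i joins existing cluster k with probability
    proportional to its current size, or a new cluster with
    probability proportional to alpha."""
    rng = random.Random(seed)
    counts = []     # occupancy of each cluster so far
    labels = []
    for i in range(n):
        weights = counts + [alpha]      # existing clusters, then "new"
        r = rng.uniform(0, i + alpha)   # total weight is i + alpha
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)            # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels

labels = crp_assignments(200, alpha=2.0)
print(len(set(labels)))   # number of contexts grows roughly like alpha * log(n)
```

The number of occupied clusters is never fixed in advance, which is precisely the property that lets the model infer how many environmental contexts the data supports.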
The learned contextual information was then exploited by an ensemble of classifiers trained for classifying targets in each of the learned contexts. For GPR applications, the classifiers were trained to perform algorithm fusion; for HSI applications, they were trained to perform band selection. The detection performance of all proposed methods was evaluated on data from U.S. government test sites and compared to several algorithms from the recent literature, several of which have been deployed in fielded systems. Experimental results illustrate the potential for context-dependent learning to improve detection performance of GPR and HSI across varying environments.
Item Open Access Physically Motivated Feature Development for Machine Learning Applications (2017) Czarnek, Nicholas
Feature development forms a cornerstone of many machine learning applications. In this work, we develop features, motivated by physical or physiological knowledge, for several applications: energy disaggregation, brain cancer prognosis, and landmine detection with seismo-acoustic vibrometry (SAVi) sensors. For event-based energy disaggregation, or the automated process of extracting component specific energy data from a building's aggregate power signal, we develop low dimensional features that capture transient information near changes in energy signals. These features reflect the circuit composition of devices comprising the aggregate signal and enable classifiers to discriminate between devices. To develop image based biomarkers, which may help clinicians develop treatment strategies for patients with glioblastoma brain tumors, we exploit physiological evidence that certain genes are both predictive of patient survival and correlated with tumor shape. We develop features that summarize tumor shapes and therefore serve as surrogates for the genetic content of tumors, allowing survival prediction. Our final analysis and the main focus of this document is related to landmine detection using SAVi sensors. We exploit knowledge of both landmine shapes and the interactions between acoustic excitation and the ground's vibration response to develop features that are indicative of target presence. Our analysis, which employs these novel features, is the first evaluation of a large dataset recorded under realistic conditions and provides evidence for the utility of SAVi systems for use in combat arenas.
Item Open Access Probabilistic Methods for Improving the Performance of the P300 Speller Brain Computer Interface (2018) Kalika, Dmitry
Many augmentative and alternative communication (AAC) devices have been developed to aid individuals who have some form of severe neuromuscular disorder, such as amyotrophic lateral sclerosis (ALS), to communicate with the outside world. Eye-trackers are used as a primary communication device for people with ALS until they lose the ability to use their eyes. As an alternative to eye-trackers, the P300 speller brain-computer interface (BCI) is a non-invasive mode of communication that utilizes electroencephalography (EEG) data.
The P300 speller relies on eliciting and detecting event related potentials (ERPs) that occur in the EEG data when a rare or unexpected visual or auditory stimulus is presented to the user; only visual stimuli are used in this work. The P300 speller displays characters or symbols in a grid on a computer screen and presents (i.e., flashes) a subset of the characters simultaneously; these presentations are known as the stimuli. The presentation of the target (i.e., desired) character should elicit an ERP. After many repetitions of presentations, the speller attempts to estimate the desired character from the EEG data. This thesis presents two primary methods to improve the P300 speller: a novel data-driven adaptive stimulus selection paradigm based on maximizing the expected discrimination gain (EDG) metric, and fusion of an eye-gaze data stream to develop a hybrid P300 and eye-gaze speller.
Many pseudo-random stimulus presentation paradigms (i.e., patterns) have been developed to improve the accuracy and decrease the time required to communicate via the P300 speller. Few data-driven, adaptive stimulus presentation paradigms have been developed, however, and those that exist are computationally expensive and thus have limited flexibility in the groups of characters that can be presented simultaneously. In this thesis, a novel data-driven, adaptive stimulus selection approach based on maximizing the expected discrimination gain (EDG) is introduced. Various restrictions are placed on the characters that can be presented, based on system and physiological constraints. Simulations show that even with these restrictions, the proposed adaptive paradigm yields higher accuracy and a decrease in the time required to spell compared to the most commonly used row/column random paradigm. Online results show that the proposed paradigm decreases the time required to spell, although with a slight decrease in speller accuracy.
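The intuition behind adaptive stimulus selection can be sketched with a simplified stand-in for the EDG criterion (the actual EDG computation uses the classifier score distributions; the toy posterior below is hypothetical): a flash is most informative when, under the current character posterior, the target is roughly equally likely to be inside or outside the flashed group.

```python
import numpy as np
from itertools import combinations

def select_flash_group(posterior, group_size):
    """Pick the subset of characters to flash whose total posterior
    probability is closest to 0.5 -- a simple surrogate for expected
    discrimination gain, since a near-even split makes the binary
    'target flashed / not flashed' outcome maximally informative."""
    n = len(posterior)
    best, best_gap = None, float('inf')
    for group in combinations(range(n), group_size):
        gap = abs(sum(posterior[i] for i in group) - 0.5)
        if gap < best_gap:
            best, best_gap = group, gap
    return best

# Toy 6-character posterior after a few flashes: the best pair to
# flash combines the leading candidate with a mid-probability one.
posterior = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
print(select_flash_group(posterior, group_size=2))   # (0, 3): sums to 0.50
```

The exhaustive search over subsets is what makes naive data-driven selection expensive, which is why practical paradigms restrict the candidate groups using system and physiological constraints, as described above.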
In addition to setting restrictions based on physiological effects, this thesis presents work on explicitly modeling refractory effects, probabilistically, on a subject-specific basis. Refractory effects occur when the time between target stimulus presentations is not sufficiently long, resulting in decreased SNRs of the elicited ERPs. By modeling the refractory effects, the adaptive stimulus selection paradigm can automatically choose characters to present that minimize refractory effects, without having to explicitly set ad-hoc restrictions. Offline simulations showed that modeling refractory effects explicitly has the potential to further increase accuracy and decrease the time required to spell.
Beyond improving the independent P300 speller, there has been recent interest in developing a hybrid (or “fused”) BCI system. In this thesis, a probabilistic hybrid P300 and eye-tracker is developed and its effectiveness is explored. The hybrid speller collects both eye-tracking and EEG data in parallel, and the user spells the characters in the same way that they would spell them using the traditional P300 speller. Both online and offline experiments are performed to analyze the hybrid speller. Online results showed that for the fifteen non-disabled participants, the hybrid speller improved accuracy and reduced the time required to spell a character. Offline simulations showed that the system is more robust to eye-gaze abnormalities than a stand-alone eye-gaze system.