Browsing by Subject "Artificial intelligence"
Item Open Access
A Cell Decomposition Approach to Robotic Trajectory Planning via Disjunctive Programming (2012) Swingler, Ashleigh
This thesis develops a novel solution method for the problem of collision-free, optimal control of a robotic vehicle in an obstacle-populated environment. The technique presented combines the well-established approximate cell decomposition methodology with disjunctive programming in order to address both geometric and kinematic trajectory concerns. In this work, an algorithm for determining the shortest-distance, collision-free path of a robot with unicycle kinematics is developed. In addition, the research defines a technique to discretize nonholonomic vehicle kinematics into a set of mixed integer linear constraints. Results obtained using the Tomlab/CPLEX mixed integer quadratic programming software demonstrate that the method developed provides a powerful initial step in reconciling geometric path planning methods with optimal control techniques.
Item Open Access
A Comparison of the Attitudes of Human Resource (HR) Executives and HR Practitioners on the Use of Artificial Intelligence (AI)-Enabled Tools in Recruiting (2022) Boyd, Kristi Shevkun
As part of the technological growth in HR, companies are developing and adopting AI-enabled solutions for recruitment of qualified talent for a job opening. AI-enabled recruiting tools provide a variety of potential benefits to an organization: from improving overall efficiency and lowering hiring costs, to automating repetitive tasks and removing human biases. AI-enabled tools in recruiting also introduce concerns about dehumanization of the hiring process, increased discrimination, and accidental exclusion of qualified candidates. These benefits and concerns are discussed at the HR executive level in industry and in academic contexts; however, data on the perspectives of HR practitioners are much more limited. Studies show that only 32 percent of companies include individual practitioners in talent acquisition technology discussions. HR practitioners leverage AI-enabled tools in hiring and, therefore, should be aware of and able to mitigate the potential risks of these tools. Lack of consideration of the perspectives of HR practitioners on the benefits and risks of AI-enabled tools increases the possibility of ethical concerns and legal liability for individual companies (Nankervis, 2021). HR executives need to take into consideration the perspectives of the HR practitioners who work with AI-enabled tools, as this awareness is likely to help businesses successfully realize their talent management goals. This paper is based on the hypothesis that the perspectives of HR practitioners on the use of AI-enabled tools in hiring differ from the perspectives of HR executives and need to be addressed to ensure that organizations can successfully and ethically implement AI-enabled tools. Robinson (2019) states that "examination of the practitioners' perspective [is] a valuable part of AI technology adoption, if organizations hope to have employees support and embrace the accompanying changes." This paper contributes to the examination of practitioners' perspectives by identifying an information gap that may influence the attitudes of individual HR practitioners toward the use of AI-enabled recruiting tools. The paper provides additional insights into the attitudes of individual HR practitioners in the United States (U.S.) through a new small-sample survey. The survey findings highlight the different attitudes that individual HR practitioners have towards the use of AI-enabled recruiting tools, especially when compared with those of HR executives. This survey is an initial step toward more robust research and lays the foundation for follow-up research topics. Finally, the paper provides recommendations that can help organizations ethically implement AI-enabled tools by ensuring the attitudes of individual HR practitioners are taken into consideration.
Item Open Access
A Dilemma for Criminal Justice Under Social Injustice (2019) Ariturk, Deniz
A moral dilemma confronts criminal justice in unjust states. If the state punishes marginalized citizens whose crimes are connected to conditions of systemic injustice the state has failed to alleviate, it perpetuates a further injustice to those citizens. If the state does not punish, it perpetuates an injustice to victims of crime whose protection is the duty of the criminal justice system. Thus, no reaction to crime by the unjust state appears to avoid perpetuating further injustice. Tommie Shelby proposes a new solution to this old dilemma, suggesting that certain theoretical and practical qualifications can save the unjust state from perpetuating injustice. He argues that punishment can be just even as society remains unjust if it is: (a) administered through a fair criminal justice apparatus; (b) only directed at mala in se crimes; and (c) not expressive of moral judgment. In the first part of this thesis, I explore Shelby's solution to show that certain aspects of his framework are superior to alternative ones, but that it nonetheless fails to resolve the dilemma. In Part 2, I use a novel technological reform that promises to make criminal justice fairer, the AI risk assessment, as a case study to show why even punishment that meets Shelby's criteria will continue to perpetuate injustice as long as it operates under systemic social injustice. Punishment can only be just if society is.
Item Open Access
A NEW ZEROTH-ORDER ORACLE FOR DISTRIBUTED AND NON-STATIONARY LEARNING (2021) Zhang, Yan
Zeroth-Order (ZO) methods have been applied to solve black-box or simulation-based optimization problems. These problems arise in many important applications nowadays, e.g., generating adversarial attacks on machine learning systems and learning to control systems with complicated physics or humans in the loop. In these problem settings, the objective function to optimize does not have an explicit mathematical form and therefore its gradient cannot be obtained. This invalidates all gradient-based optimization approaches. On the other hand, ZO methods approximate the gradient by using the objective function values. Many existing ZO methods adopt the two-point feedback scheme to approximate the unknown gradient due to its low estimation variance and fast convergence speed. Specifically, the two-point ZO method estimates the gradient at the current iterate of the algorithm by querying the objective function value twice at two distinct neighboring points around the current iterate. Such a scheme becomes infeasible or difficult to implement when the objective function is time-varying, or when multiple agents collaboratively optimize a global objective function that depends on all agents' decisions, because the value of the objective function can be queried only once at a single decision point. However, the conventional ZO method based on one-point feedback is subject to large variance in the gradient estimation and therefore slows down the convergence.
In this dissertation, we propose a novel one-point ZO method based on the residual feedback. Specifically, the residual feedback scheme estimates the gradient using the residual between the values of the objective function at two consecutive iterates of the algorithm. When optimizing a deterministic Lipschitz function, we show that the query complexity of ZO with the proposed one-point residual feedback matches that of ZO with the existing two-point schemes. Moreover, the query complexity of the proposed algorithm can be improved when the objective function has Lipschitz gradient. Then, for stochastic bandit optimization problems, we show that ZO with one-point residual feedback achieves the same convergence rate as that of two-point scheme with uncontrollable data samples.
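To make the residual-feedback idea concrete, the sketch below shows one plausible implementation of a one-point ZO update that reuses the previous iterate's query value; the unit-sphere perturbation, smoothing radius, and step size are illustrative assumptions rather than the dissertation's exact settings.

```python
import numpy as np

def residual_feedback_step(f, x, prev_value, delta=0.01, lr=0.1, rng=np.random.default_rng()):
    """One-point residual-feedback ZO update (sketch).

    Only one new query of f is made per iterate; the gradient is estimated from the
    residual between this query and the query made at the previous iterate.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                                # random direction on the unit sphere
    value = f(x + delta * u)                              # the single function query at this iterate
    grad_est = (d / delta) * (value - prev_value) * u     # residual-feedback gradient estimate
    return x - lr * grad_est, value

# Usage: carry the previous query value across iterations (initialized with one extra query).
f = lambda z: np.sum(z ** 2)
x, prev = np.ones(5), f(np.ones(5))
for _ in range(200):
    x, prev = residual_feedback_step(f, x, prev)
```

A two-point scheme would instead query f twice per iterate; the point of the residual scheme is that it keeps the single-query restriction while reusing information already paid for.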
Next, we apply the proposed one-point residual-feedback gradient estimator to solve online optimization problems, where the objective function varies over time. In the online setting, since each objective function can only be evaluated once at a single decision point, existing two-point ZO methods are not feasible and only one-point ZO methods can be used. We develop regret bounds for ZO with the proposed one-point residual feedback scheme for both convex and nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems and for both Lipschitz and smooth objective functions, we show that using residual feedback can produce gradient estimates with much smaller variance compared to conventional one-point feedback methods. As a result, our regret bounds are much tighter compared to existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. Additionally, our regret bounds rely on weaker assumptions than those used in conventional one-point feedback methods.
The proposed residual-feedback scheme is next decentralized to conduct distributed policy optimization in multi-agent reinforcement learning (MARL) problems. Existing MARL algorithms often assume that every agent can observe the states and actions of all the other agents in the network. This can be impractical in large-scale problems, where sharing the state and action information with multi-hop neighbors may incur significant communication overhead. The advantage of the proposed zeroth-order policy optimization method is that it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards that depend on partial state and action information only and can be obtained using consensus. Specifically, the local ZO policy gradients relying on one-point residual feedback significantly reduce the variance of the local policy gradient estimates compared to conventional one-point policy gradient estimates, improving, in this way, the learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of the global optimal policy that depends on the number of consensus steps used to calculate the local estimates of the global accumulated rewards.
Another challenge in distributed ZO optimization problems is that the agents may conduct local updates in an asynchronous fashion when they do not have access to a global clock. To deal with this challenge, we propose an asynchronous zeroth-order distributed optimization method that relies on the proposed one-point residual feedback gradient estimator. We show that this estimator is unbiased under asynchronous updating, and theoretically analyze its convergence. We demonstrate the effectiveness of all proposed algorithms via extensive numerical experiments.
Item Embargo
A Radiomics-Embedded Vision Transformer for Breast Cancer Ultrasound Image Classification Efficiency Improvement (2024) Zhu, Haiming
Purpose: To develop a radiomics-embedded vision transformer (RE-ViT) model by incorporating radiomics features into its architecture, seeking to improve the model's efficiency in medical image recognition towards enhanced breast ultrasound image diagnostic accuracy.
Materials and Methods: Following the classic ViT design, the input image was first resampled into multiple 16×16 grid image patches. For each patch, 56-dimensional habitat radiomics features, including intensity-based, Gray Level Co-Occurrence Matrix (GLCOM)-based, and Gray Level Run-Length Matrix (GLRLM)-based features, were extracted. These features were designed to encode local-regional intensity and texture information comprehensively. The extracted features underwent a linear projection to a higher-dimensional space, integrating them with ViT's standard image embedding process. This integration involved an element-wise addition of the radiomics embedding with ViT's projection-based and positional embeddings. The resultant combined embeddings were then processed through a Transformer encoder and a Multilayer Perceptron (MLP) head block, adhering to the original ViT architecture. The proposed RE-ViT model was studied using a public BUSI breast ultrasound dataset of 399 patients with benign, malignant, and normal tissue classification. The comparison study includes: (1) RE-ViT versus classic ViT training from scratch, (2) pre-trained RE-ViT versus pre-trained ViT (based on ImageNet-21k), (3) RE-ViT versus VGG-16 CNN model. The model performance was evaluated based on accuracy, ROC AUC, sensitivity, and specificity with 10-fold Monte-Carlo cross validation.
Results: The RE-ViT model significantly outperformed the classic ViT model, demonstrating superior overall performance with accuracy = 0.718±0.043, ROC AUC = 0.848±0.033, sensitivity = 0.718±0.059, and specificity = 0.859±0.048. In contrast, the classic ViT model achieved accuracy = 0.473±0.050, ROC AUC = 0.644±0.062, sensitivity = 0.473±0.101, and specificity = 0.737±0.065. Pre-trained versions of RE-ViT also showed enhanced performance (accuracy = 0.864±0.031, ROC AUC = 0.950±0.021, sensitivity = 0.864±0.074, specificity = 0.932±0.036) compared to pre-trained ViT (accuracy = 0.675±0.111, ROC AUC = 0.872±0.086, sensitivity = 0.675±0.129, specificity = 0.838±0.096). Additionally, RE-ViT surpassed VGG-16 CNN results (accuracy = 0.553±0.079, ROC AUC = 0.748±0.080, sensitivity = 0.553±0.112, specificity = 0.777±0.089).
Conclusion: The proposed radiomics-embedded ViT was successfully developed for ultrasound-based breast tissue classification. Current results underscore the potential of our approach to advance other transformer-based medical image diagnosis tasks.
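A minimal sketch of the embedding fusion described in this abstract is given below; the embedding width, patch count, and module names are illustrative assumptions, and only the element-wise addition of radiomics, patch, and positional embeddings is meant to mirror the text.

```python
import torch
import torch.nn as nn

class RadiomicsEmbeddedPatchEmbedding(nn.Module):
    """Sketch: fuse 56-dim per-patch radiomics features with standard ViT embeddings."""

    def __init__(self, num_patches, patch_dim, radiomics_dim=56, embed_dim=768):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, embed_dim)          # ViT's projection-based embedding
        self.radiomics_proj = nn.Linear(radiomics_dim, embed_dim)  # radiomics embedding
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, patches, radiomics):
        # patches: (B, num_patches, patch_dim); radiomics: (B, num_patches, radiomics_dim)
        # Element-wise addition of the three embeddings, as described for RE-ViT.
        return self.patch_proj(patches) + self.radiomics_proj(radiomics) + self.pos_embed

# Example shapes for 16x16 patches of a 224x224 single-channel image (assumed sizes).
embed = RadiomicsEmbeddedPatchEmbedding(num_patches=196, patch_dim=256)
tokens = embed(torch.randn(2, 196, 256), torch.randn(2, 196, 56))
print(tokens.shape)  # torch.Size([2, 196, 768])
```

The fused tokens would then pass through an unmodified Transformer encoder and MLP head, which is why the approach leaves the rest of the ViT architecture untouched.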
Item Open Access
Accelerated Multi-Criterial Optimization in Radiation Therapy using Voxel-Wise Dose Prediction (2020) Jensen, Patrick James
In external beam radiation therapy (EBRT) for cancer patients, it is highly desirable to completely eradicate the cancerous cells for the purpose of improving the patient's quality of life and increasing the patient's likelihood of survival. However, there can be significant side effects when large regions of healthy cells are irradiated during EBRT, particularly for organs-at-risk (OARs). Due to the juxtaposition of the cancerous and non-cancerous tissue, trade-offs need to be made between target coverage and OAR sparing during treatment planning. For this reason, the treatment planning process can be posed as a multi-criterial optimization (MCO) problem, which has previously been studied extensively with several exact solutions existing specifically for radiation therapy. Typical MCO implementations for EBRT involve creating, optimizing, and calculating many treatment plans to infer the set of feasible best radiation doses, or the Pareto surface. However, each optimization and calculation can take 10-30 minutes per plan. As a result, generating enough plans to attain an accurate representation of the Pareto surface can be very time-consuming, particularly in higher dimensions with many possible trade-offs.
The purpose of this study is to streamline the MCO workflow by using a machine-learning model to quickly predict the Pareto surface plan doses, rather than exactly computing them. The primary focus of this study is the development and analysis of the dose prediction model. The secondary focus of this study is to develop new metrics for analyzing the similarity between different Pareto surface interpolations. The tertiary focus of this study is to investigate the feasibility of deliberately irradiating the epidural space in spine stereotactic radiosurgery (SRS), as well as estimate its potential effect on preventing tumor recurrence.
For the primary focus of this study, the model’s architecture proceeds as follows. The model begins by creating an initial dose distribution via an inverse fit of inter-slice and intra-slice PTV distance maps on a voxel-wise basis. The model proceeds by extracting three sets of transverse patches from all structure maps and the initialized dose map at each voxel. The model then uses the patch vectors as inputs for a neural network which updates and refines the dose initialization to achieve a final dose prediction. The primary motivation behind our model is to use our understanding of the general shape of dose distributions to remove much of the nonlinearity of the dose prediction problem, decreasing the difficulty of subsequent network predictions. Our model is able to take the optimization priorities into account during dose prediction and infer feasible dose distributions across a range of optimization priority combinations, allowing for indirect Pareto surface inference.
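The voxel-wise refinement step described above can be pictured as a small network that maps each voxel's patch vector to a correction of the distance-map-based dose initialization; the layer sizes, residual formulation, and input dimensions in this sketch are assumptions for illustration, not the study's exact architecture.

```python
import torch
import torch.nn as nn

class DoseRefinementNet(nn.Module):
    """Sketch: refine an initial dose estimate voxel-by-voxel from patch feature vectors."""

    def __init__(self, patch_vector_dim=192, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(patch_vector_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, patch_vectors, initial_dose):
        # patch_vectors: (num_voxels, patch_vector_dim), built from structure maps, the
        # initialized dose map, and optimization priorities at each voxel.
        # initial_dose: (num_voxels, 1), from the inverse fit of PTV distance maps.
        return initial_dose + self.mlp(patch_vectors)  # the network predicts a voxel-wise correction

# Example: refine 10,000 voxels at once (shapes are illustrative).
net = DoseRefinementNet()
refined_dose = net(torch.randn(10000, 192), torch.rand(10000, 1))
```

The point of starting from a physics-motivated initialization is exactly as stated in the text: the network only has to learn the remaining, much less nonlinear, correction.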
The model’s performance was analyzed on conventional prostate volumetric modulated arc therapy (VMAT), pancreas stereotactic body radiation therapy (SBRT), and spine stereotactic radiosurgery (SRS) with epidural space irradiation. For each of these treatment paradigms, the Pareto surfaces of many patients were thoroughly sampled to train and test the model. On all of these cases, our model achieved good performance in terms of speed and accuracy. Overfitting was shown to be minimal in all cases, and dose distribution slices and dose-volume histograms (DVHs) were shown for comparison, confirming the proficiency of our model. This model is relatively fast (0.05-0.20 seconds per plan), and it is capable of sampling the entire Pareto surface much faster than commercial dose optimization and calculation engines.
While these results were generally promising, the model achieved lower error on the prostate VMAT treatment plans compared to the pancreas SBRT and spine SRS treatment plans. This is likely due to the existence of heavier beam streaks in the stereotactic treatment plans which are generated by a sharper control of the delivered dose distribution. However, the Pareto surface errors were similar across all three cases, so these dose distribution errors did not propagate to the Pareto objective space.
The secondary focus of this study is the development and analysis of Pareto surface similarity metrics. The dose prediction model can be used to rapidly estimate many Pareto-optimal plans for quick Pareto surface inference. This could allow for a potentially significant increase in the speed at which Pareto surfaces are inferred to provide treatment planning assistance and acceleration. However, previous investigations into Pareto surface analysis typically do not compare a ground-truth Pareto surface with a Pareto surface prediction. Therefore, there is a need to develop a Pareto surface metric in order to evaluate the ability of the model to generate accurate Pareto surfaces in addition to accurate dose distributions.
To address these needs, we developed four Pareto surface similarity metrics, emphasizing the ability to represent distances between the interpolations rather than the sampled points. The most straightforward metric is the root-mean-square error (RMSE) evaluated between matched, sampled points on the Pareto surfaces, augmented by intra-simplex upsampling of the barycentric dimensions of each simplex. The second metric is the Hausdorff distance, which evaluates the maximum closest distance between the sets of sampled points. The third metric is the average projected distance (APD), which evaluates the displacements between the sampled points and evaluates their projections along the mean displacement. The fourth metric is the average nearest-point distance (ANPD), which numerically integrates point-to-simplex distances over the upsampled simplices of the Pareto surfaces. These metrics are compared by their convergence rates as a function of intra-simplex upsampling, the calculation times required to achieve convergence, and their qualitative meaningfulness in representing the underlying interpolated surfaces. For testing, several simplex pairs were constructed abstractly, and Pareto surfaces were constructed using inverse optimization and our dose prediction model applied to conventional prostate VMAT, pancreas SBRT, and spine SRS with epidural irradiation.
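For concreteness, the two simplest of the four metrics can be sketched as below on sets of sampled Pareto points; the array shapes and the symmetric form of the Hausdorff distance are assumptions consistent with common usage, not necessarily the exact implementations used in the study.

```python
import numpy as np
from scipy.spatial.distance import cdist

def matched_rmse(points_a, points_b):
    """RMSE between matched, sampled Pareto points; arrays of shape (n_points, n_objectives)."""
    return np.sqrt(np.mean(np.sum((points_a - points_b) ** 2, axis=1)))

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance: the largest closest-point distance between the two sets."""
    d = cdist(points_a, points_b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Example on two small surfaces sampled at matching priority settings.
truth = np.random.rand(50, 3)
pred = truth + 0.01 * np.random.randn(50, 3)
print(matched_rmse(truth, pred), hausdorff_distance(truth, pred))
```

The APD and ANPD described above extend these ideas by working with displacements and point-to-simplex distances over the interpolated surfaces rather than only the sampled points.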
For the abstract simplex pairs, convergence within 1% was typically achieved at approximately 50 and 100 samples per barycentric dimension for the ANPD and the RMSE, respectively. The RMSE and the ANPD required approximately 50 milliseconds and 3 seconds, respectively, to calculate at these sampling rates, while the APD and HD required much less than 1 millisecond. Additionally, the APD values closely resembled the ANPD limits, while the RMSE limits and the HD differed more. The ANPD is likely more meaningful than the RMSE and APD, as the ANPD's point-to-simplex distance functions more closely represent the dissimilarity between the underlying interpolated surfaces rather than the sampling points on the surfaces. However, in situations requiring high-speed evaluations, the APD may be more desirable due to its speed, lack of subjective specification of intra-simplex upsampling rates, and similarity to the ANPD limits.
The tertiary focus of this study is the analysis of the feasibility of epidural space irradiation in spine SRS. The epidural space is a frequent site of cancer recurrence after spine SRS. This may be due to microscopic disease in the epidural space, which is under-dosed to obey strict spinal cord dose constraints. We hypothesized that the epidural space could be purposefully irradiated to prescription dose levels, potentially reducing the risk of recurrence in the epidural space without increasing toxicity. To address this, we sought to analyze the feasibility of irradiating the epidural space in spine SRS. Analyzing the data associated with this study is synergistic with our MCO acceleration study, since the range of trade-offs between epidural space irradiation and spinal cord sparing represents an MCO problem which our dose prediction model may quickly solve.
Spine SRS clinical treatment plans with associated spinal PTV (PTVspine) and spinal cord contours, and prior delivered dose distributions were identified retrospectively. An epidural space PTV (PTVepidural) was contoured to avoid the spinal cord and focus on regions near the PTVspine. Clinical plan constraints included PTVspine constraints (D95% = 1800 cGy, D5% < 1950 cGy) and spinal cord constraints (Dmax < 1300 cGy, D10% < 1000 cGy). Prior clinical plan doses were mapped onto the new PTVepidural contour for analysis. Plans were copied and revised to additionally target the PTVepidural, optimizing PTVepidural D95% after meeting clinical plan constraints. Tumor control probabilities (TCPs) were estimated for the PTVepidural using a radiobiological linear-quadratic model of cell survival for both clinical and revised plans. Clinical and revised plans were compared according to their PTVepidural DVH distributions, D95% distributions, and TCPs.
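As a rough illustration of the TCP estimation step, the sketch below combines linear-quadratic cell survival with a Poisson TCP model over a voxel-wise dose array; the radiosensitivity parameters, clonogen density, and voxel volume are placeholder assumptions, not the values used in this study.

```python
import numpy as np

def lq_surviving_fraction(dose_gy, alpha=0.35, beta=0.035, n_fractions=1):
    """Linear-quadratic surviving fraction per voxel (alpha/beta values assumed)."""
    d = dose_gy / n_fractions
    return np.exp(-n_fractions * (alpha * d + beta * d ** 2))

def poisson_tcp(dose_gy, clonogens_per_cc=1e7, voxel_volume_cc=0.001, **lq_kwargs):
    """Poisson tumor control probability from voxel-wise surviving clonogens (sketch)."""
    surviving = clonogens_per_cc * voxel_volume_cc * lq_surviving_fraction(dose_gy, **lq_kwargs)
    return float(np.exp(-surviving.sum()))

# Example: compare an under-dosed epidural PTV with a re-optimized one (doses illustrative).
print(poisson_tcp(np.full(2000, 11.0)), poisson_tcp(np.full(2000, 16.8)))
```

The comparison of clinical and revised plans then amounts to evaluating this kind of model on the mapped and re-optimized PTVepidural dose distributions.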
Seventeen SSRS plans were identified and included in this study. Revised plan DVHs demonstrated higher doses to the epidural low-dose regions, with D95% improving from 10.96 Gy ± 1.76 Gy to 16.84 Gy ± 0.87 Gy (p < 10^-5). Our TCP modeling set the clinical plan TCP average to 85%, while revised plan TCPs were all greater than 99.99%. Therefore, irradiating the epidural space in spine SRS is likely feasible, and purposefully targeting the epidural space in SSRS should increase control in the epidural space without significantly increasing the risk of spinal cord toxicity.
Item Open Access
Adapting a Kidney Exchange Algorithm to Incorporate Human Values (2017-05-04) Freedman, Rachel
Artificial morality is moral behavior exhibited by automated or artificially intelligent agents. A primary goal of the field of artificial morality is to design artificial agents that act in accordance with human values. One domain in which computer systems make such value decisions is that of kidney exchange. In a kidney exchange, incompatible patient-donor pairs exchange kidneys with other incompatible pairs instead of waiting for cadaver kidney transplants. This allows these patients to be removed from the waiting list and to receive live transplants, which typically have better outcomes. When the matching of these pairs is automated, algorithms must decide which patients to prioritize. In this paper, I develop a procedure to align these prioritization decisions with human values. Many previous attempts to impose human values on artificial agents have relied on the "top-down approach" of defining a coherent framework of rules for the agent to follow. Instead, I develop my value function by gathering survey participant responses to relevant moral dilemmas, using machine learning to approximate the value system that these responses are based on, and then encoding these into the algorithm. This "bottom-up approach" is thought to produce more accurate, robust, and generalizable moral systems. My method of gathering, analyzing, and incorporating public opinion can be easily generalized to other domains. Its success here therefore suggests that it holds promise for the future development of artificial morality.
Item Embargo
Adaptive Planning in Changing Policies and Environments (2023) Sivakumar, Kavinayan Pillaiar
Being able to adapt to different tasks is a staple of learning, as agents aim to generalize across different situations. Specifically, it is important for agents to adapt to the policies of other agents around them. In swarm settings, multi-agent sports settings, or other team-based environments, agents learning from one another can save time and reduce errors in performance. As a result, traditional transfer reinforcement learning proposes ways to decrease the time it takes for an agent to learn from an expert agent. However, the problem of transferring knowledge across agents that operate in different action spaces and are therefore heterogeneous poses new challenges. Mainly, it is difficult to translate between heterogeneous agents whose action spaces are not guaranteed to intersect.
We propose a transfer reinforcement learning algorithm between heterogeneous agents based on a subgoal trajectory mapping algorithm. We learn a mapping between expert and learner trajectories that are expressed through subgoals. We do so by training a recurrent neural network on trajectories in a training set. Then, given a new task, we input the expert's trajectory of subgoals into the trained model to predict the optimal trajectory of subgoals for the learner agent. We show that the learner agent is able to learn an optimal policy faster with this predicted trajectory of subgoals.
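A minimal sketch of the subgoal trajectory mapping described above is shown below; the GRU architecture, hidden size, and the assumption of equal-length expert and learner subgoal sequences are illustrative choices rather than the thesis's exact design.

```python
import torch
import torch.nn as nn

class SubgoalMapper(nn.Module):
    """Sketch: map an expert's subgoal trajectory to a learner's subgoal trajectory."""

    def __init__(self, expert_dim, learner_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(expert_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, learner_dim)

    def forward(self, expert_subgoals):
        # expert_subgoals: (batch, T, expert_dim) -> predicted learner subgoals: (batch, T, learner_dim)
        hidden_states, _ = self.rnn(expert_subgoals)
        return self.head(hidden_states)

# Training would regress predicted learner subgoals onto demonstrated ones, e.g., with MSE.
mapper = SubgoalMapper(expert_dim=4, learner_dim=3)
predicted = mapper(torch.randn(8, 10, 4))              # 8 training tasks, 10 subgoals each (assumed)
loss = nn.functional.mse_loss(predicted, torch.randn(8, 10, 3))
```

Because the mapping operates on subgoals rather than raw actions, it sidesteps the mismatch between the heterogeneous agents' action spaces noted above.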
It is equally important for agents to adapt to the intentions of agents around them. To this end, we propose an inverse reinforcement learning algorithm to estimate the reward function of an agent as it updates its policy over time. Previous work in this field assumes the reward function is approximated by a set of linear feature functions. Choosing an expressive enough set of feature functions can be challenging, and failure to do so can skew the learned reward function. Instead, we propose an algorithm to estimate the policy parameters of the agent as it learns, bundling adjacent trajectories together in a new form of behavior cloning we call bundle behavior cloning. Our complexity analysis shows that using bundle behavior cloning, we can attain a tighter bound on the difference between the distribution of the cloned policy and that of the true policy than the same bound achieved in standard behavior cloning. We show experiments in which our method achieves the same overall reward using the estimated reward function as that learned from the initial trajectories, and we test the feasibility of bundle behavior cloning with different neural network structures and empirically examine the effect of the bundle choice on performance.
Finally, due to the need for agents to adapt to environments that are prone to change due to damage or detection, we propose the design of a robotic sensing agent to detect damage. In such dangerous environments, it may be unsafe for human operators to manually take measurements. Current literature in structural health monitoring proposes sequential sensing algorithms to minimize the number of locations at which measurements must be taken before sources of damage are located. As a result, the robotic sensing agent we designed is mobile, semi-autonomous, and precise in measuring a location on the model structure we built. We detail the components of our robotic sensing agent, and we show measurement data taken by our agent at two locations on the structure displaying little to no noise in the measurements.
Item Open Access
Advancing the Design and Utility of Adversarial Machine Learning Methods (2021) Inkawhich, Nathan Albert
While significant progress has been made to craft Deep Neural Networks (DNNs) with super-human recognition performance, their reliability and robustness in challenging operating conditions is still a major concern. In this work, we study multiple facets of the DNN robustness problem by pursuing two main threads of research. The key methodological linkage throughout our investigations is the consistent design/development/utilization/deployment of Adversarial Machine Learning techniques, which have remarkable abilities to both degrade and enhance model performance. Our ultimate goal is to help construct the safer and more reliable models of the future.
In the first thread of research, we take the perspective of an adversary who wishes to find novel and increasingly potent ways to fool current DNN models. Our approach is centered around the development of a feature space attack, and the construction of novel adversarial threat models that work to reduce required knowledge assumptions. Interestingly, we find that a transfer-based blackbox adversary can be significantly more powerful than previously believed, and can reliably cause targeted misclassifications with imperceptible noises. Further, we find that the attacker does not necessarily require access to the target model's training distribution to create transferable attacks, which is a more practically concerning scenario due to the reduction of required attacker knowledge.
Along the second thread of research, we take the perspective of a DNN model designer whose job is to create systems capable of robust operation in "open-world" environments, where both known and unknown target types may be encountered. Our approach is to establish a classifier + out-of-distribution (OOD) detector system co-design that is centered around an adversarial training procedure and an outlier exposure-based learning objective. Through various experiments, we find that our systems can achieve high accuracy in extended operating conditions, while reliably detecting and rejecting fine-grained OOD target types. We also develop a method for efficiently improving OOD detection by learning from the deployment environment. Overall, by exposing novel vulnerabilities of current DNNs while also improving the reliability of existing models to known vulnerabilities, our work makes significant progress towards creating the next generation of more trustworthy models.
Item Open Access
ADVANCING VISION INTELLIGENCE THROUGH THE DEVELOPMENT OF EFFICIENCY, INTERPRETABILITY AND FAIRNESS IN DEEP LEARNING MODELS (2024) Kong, Fanjie
Deep learning has demonstrated remarkable success in developing vision intelligence across a variety of application domains, including autonomous driving, facial recognition, medical image analysis, etc. However, developing such vision systems poses significant challenges, particularly in relation to ensuring efficiency, interpretability, and fairness. Efficiency requires a model to leverage the least possible computational resources while preserving performance relative to more computationally-demanding alternatives, which is essential for the practical deployment of large-scale models in real-time applications. Interpretability demands a model to align with the domain-specific knowledge of the task it addresses while having the capability for case-based reasoning. This characteristic is especially crucial in high-stakes areas such as healthcare, criminal justice, and financial investment. Fairness ensures that computer vision models do not perpetuate or exacerbate societal biases in downstream applications such as web image search, text-guided image generation, etc. In this dissertation, I will discuss the contributions that I have made in advancing vision intelligence with regard to efficiency, interpretability, and fairness in computer vision models.
The first part of this dissertation will focus on how to design computer vision models to efficiently process very large images. We propose a novel CNN architecture termed Zoom-In Network that leverages a hierarchical attention sampling mechanism to select important regions of images to process. By not processing the entire image, such an approach yields outstanding memory efficiency while maintaining classification accuracy on various tiny object image classification datasets.
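The core idea of attention-based region selection for very large images can be sketched roughly as below; the single-level top-k selection, patch size, and stand-in attention function are illustrative assumptions and omit the hierarchical, gradient-friendly sampling used by attention-sampling architectures such as the one described above.

```python
import torch

def select_top_k_patches(image, attention_fn, patch_size=256, k=16):
    """Sketch: score coarse patches of a large image and keep only the top-k for processing."""
    c, h, w = image.shape
    patches = (image
               .unfold(1, patch_size, patch_size)
               .unfold(2, patch_size, patch_size)           # (C, nH, nW, p, p)
               .permute(1, 2, 0, 3, 4)
               .reshape(-1, c, patch_size, patch_size))      # (num_patches, C, p, p)
    scores = attention_fn(patches)                           # (num_patches,) attention scores
    top_idx = torch.topk(scores, k).indices
    return patches[top_idx]                                  # only these patches are processed further

# Example with a cheap stand-in attention function (mean intensity per patch).
big_image = torch.rand(3, 4096, 4096)
kept = select_top_k_patches(big_image, lambda p: p.mean(dim=(1, 2, 3)))
print(kept.shape)  # torch.Size([16, 3, 256, 256])
```

The memory savings come from the fact that only the selected patches, rather than the full-resolution image, ever enter the downstream CNN.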
The second part of this dissertation will discuss how to build a post-hoc interpretation method for deep learning models to obtain insights reasoned from their predictions. We propose a novel image and text insight-generation framework based on attributions from deep neural nets. We test our approach on an industrial dataset and demonstrate that our method outperforms competing methods.
Finally, we study fairness in large vision-language models. More specifically, we examine gender and racial bias in text-based image retrieval for neutral text queries. In an attempt to address bias in the test-time phase, we propose post-hoc bias mitigation to actively balance the demographic groups in the image search results. Experiments on multiple datasets show that our method can significantly reduce bias while maintaining satisfactory retrieval accuracy.
My research in enhancing vision intelligence via developments in efficiency, interpretability, and fairness has undergone rigorous validation using publicly available benchmarks and has been recognized at leading peer-reviewed machine learning conferences. This dissertation has sparked interest within the AI community, emphasizing the importance of improving computer vision models along these three critical dimensions: efficiency, interpretability, and fairness.
Item Open Access
AN APPLICATION OF GRAPH DIFFUSION FOR GESTURE CLASSIFICATION (2020) Voisin, Perry Samuel
Reliable and widely available robotic prostheses have long been a dream of science fiction writers and researchers alike. The problem of sufficiently generalizable gesture recognition algorithms and technology remains a barrier to these ambitions despite numerous advances in computer science, engineering, and machine learning. Often the failure of a particular algorithm to generalize to the population at large is due to superficial characteristics of subjects in the training data set. These superficial characteristics are captured and integrated into the signal intended to capture the gesture being performed. This work applies methods developed in computer vision and graph theory to the problem of identifying pertinent features in a set of time series modalities.
Item Open Access
Android Linguistics: How Machines Do Things With Words (2021) Donahue, Evan
The field of artificial intelligence (AI) was founded on the conviction that in order to make computers more advanced, it was necessary to build them to be more human. Adopting the human form as the blueprint for computer systems allowed AI researchers to imagine and construct computer systems capable of feats otherwise unimaginable for machines. As the institutions and professional boundaries of the field have evolved over the past 70 years, they have at times obscured the figure of the human at the heart of AI work. However, in moments of heightened optimism, when researchers permit themselves to speculate on the fantastic futures AI technologies will one day enable, it is inevitably to this figure that the field returns, forever striving to resolve that originary question of just what is the nature of this human intelligence the field has so long pursued?
In this dissertation, I trace the emergence of the figure of the human at the center of AI work. I argue that the human at the center of the imaginary of AI is rooted in a deeper impulse: that of envisioning not machines that think, but machines that speak. It is language that most fundamentally defines the original ambition of AI work, and it is the inability to conceptualize language apart from the human that draws the field inevitably back to this figure. With language properly at the center of its project, AI becomes a study not of the physical world but of the narrative universe, not of the biological human being but of literary character, not of machinic intelligence but of machinic personhood.
Drawing on the history of AI's entanglements with language, I argue for a reconceptualization of the project of AI around a vision of language not as an encoding of solitary thought but as a collection of shifting social practices that allow human and non-human intelligences to navigate their shared worlds despite their irreducibly alien cognitive realities. Such a reorientation, I contend, makes room for a broader vision of AI work that joins critical and technical practices in the shared project of grappling with the question of what it means to be human.
Item Open Access
Anthropomorphic Attachments in U.S. Literature, Robotics, and Artificial Intelligence (2010) Rhee, Jennifer
"Anthropomorphic Attachments" undertakes an examination of the human as a highly nebulous, fluid, multiple, and often contradictory concept, one that cannot be approached directly or in isolation, but only in its constitutive relationality with the world. Rather than trying to find a way outside of the dualism between human and not-human, I take up the concept of anthropomorphization as a way to hypersaturate the question of the human. Within this hypersaturated field of inquiry, I focus on the specific anthropomorphic relationalities between human and humanoid technology. Focusing primarily on contemporary U.S. technologies and cultural forms, my dissertation looks at artificial intelligence and robotics in conversation with their cultural imaginaries in contemporary literature, science fiction, film, performance art, and video games, and in conversation with contemporary philosophies of the human, the posthuman, and technology. In reading these discourses as shaping, informing, and amplifying each other and the multiple conceptions of the human they articulate, "Anthropomorphic Attachments" attends to these multiple humans and the multiple morphologies by which anthropomorphic relationalities imagine and inscribe both humanoid technologies and the human itself.
Item Open Access
Application of Stochastic Processes in Nonparametric Bayes (2014) Wang, Yingjian
This thesis presents theoretical studies of some stochastic processes and their applications in Bayesian nonparametric methods. The stochastic processes discussed in the thesis are mainly the ones with independent increments - the Levy processes. We develop new representations for the Levy measures of two representative examples of the Levy processes, the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions, with the truncation and posterior analyses provided. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two, as these are of increasing importance in machine learning.
Next a new Levy process is proposed for an uncountable collection of covariate-dependent feature-learning measures; the process is called the kernel beta process. Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample ("customer"), and latent covariates learned for each feature ("dish"). The dependencies among the data are represented with the covariate-parameterized kernel function. The beta process is recovered as a limiting case of the kernel beta process. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.
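One schematic way to write the covariate dependence described above, assuming a kernel-weighted atomic representation (the symbols K, psi_k, x_k^*, pi_k, and omega_k are illustrative and not necessarily the thesis's notation), is:

```latex
% Schematic: a covariate-dependent feature-learning measure, where a kernel K localizes
% the weight of each atom \omega_k around a latent covariate x_k^* with width \psi_k.
B_x = \sum_{k=1}^{\infty} K\!\left(x, x_k^{*}; \psi_k\right)\, \pi_k \,\delta_{\omega_k},
\qquad \pi_k \in (0, 1)
```

In this schematic, letting the kernel equal one for all covariates reduces the weights to the underlying beta-process weights, which matches the limiting-case behavior mentioned above.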
Last is a non-Levy process example of the multiplicative gamma process applied in the low-rank representation of tensors. The multiplicative gamma process is applied along the super-diagonal of tensors in the rank decomposition, with its shrinkage property nonparametrically learning the rank from the multiway data. This model is constructed to be conjugate for the continuous multiway data case. For the non-conjugate binary multiway data, the Polya-Gamma auxiliary variable is sampled to elicit closed-form Gibbs sampling updates. This rank decomposition of tensors driven by the multiplicative gamma process yields state-of-the-art performance on various synthetic and benchmark real-world datasets, with desirable model scalability.
Item Open Access
Artificial Intelligence for added value in the creation, implementation, and evaluation of national export strategies (2022-04-22) Rodríguez, Eugenia
A National Export Strategy (NES) is an action plan that sets priorities, allocates resources, and specifies actions to strengthen an economy's international trade capabilities, seeking to enhance its economic growth and development. In recent times there has been an increase in the number of national initiative documents concerning strategic trade and development, with many developing countries facing challenges to ensure their trade dynamics effectively and efficiently contribute to their long-term sustainable development. In this context, technology can be a helpful tool in the NES process. Based on a literature review, use cases, and expert interviews, this report aims to inform the International Trade Centre (ITC) of key ways in which artificial intelligence (AI) can add value to the creation, implementation, and evaluation of a NES. It also identifies important considerations, challenges, and limitations regarding AI adoption in the NES process, and provides conclusions and high-level recommendations.
Item Open Access
Artificial Intelligence Powered Direct Prediction of Linear Accelerator Machine Parameters: Towards a New Paradigm for Patient Specific Pre-Treatment QA (2021) Lay, Lam My
Purpose: Traditional pre-treatment patient-specific QA is known for its high workload for physicists, ineffectiveness at identifying clinically relevant dosimetric uncertainties of treatment plans, and incompatibility with online adaptive radiotherapy. Our purpose is to develop a trajectory-file-based PSQA procedure that allows for a virtual pre-treatment QA that can effectively evaluate the performance and robustness of a treatment plan via a DVH-based analysis and can be carried out with online adaptive radiotherapy. For this purpose, we have developed a machine learning model that can predict the discrepancy in machine parameters between delivery and treatment plan on a Varian TrueBeam linear accelerator.
Methods: Trajectory log files and DICOM-RT plan files of 30 IMRT plans and 75 VMAT plans from four Varian TrueBeam linear accelerators were collected for analysis. The discrepancy in machine parameters is divided into “conversion error” (from converting DICOM-RT to deliverable machine trajectory) and “delivery error” (difference in machine parameters recorded in trajectory files). Correlation matrices were obtained to determine the linear correlation between actual discrepancy and mechanical parameters, such as MLC velocity, MLC acceleration, control point, dose rate, gravity vector, gantry velocity, and gantry acceleration. Multiple regression algorithms were used to develop machine learning models to predict the total discrepancy in machine parameters and its components based on mechanical parameters. The fully trained models were validated with an independent validation dataset and treatment plans constructed with varying degrees of complexity approaching the limitations of the linear accelerator.
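As an illustration of this kind of model, the sketch below fits a generic regressor to per-control-point mechanical features to predict an MLC positional discrepancy; the feature set, the random forest choice, and the placeholder data are assumptions for demonstration, not the specific algorithms or data described here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder design matrix: one row per MLC leaf/control point with mechanical parameters
# (e.g., leaf velocity, leaf acceleration, dose rate, gravity vector, gantry velocity,
# gantry acceleration, control point index).
X = rng.random((5000, 7))
y = 0.1 * X[:, 0] + 0.02 * X[:, 1] + 0.01 * rng.standard_normal(5000)  # synthetic discrepancy (mm)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))
```

In the actual workflow, separate models of this kind would be trained for the conversion error, the delivery error, and their combination, then validated on independent plans.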
Results: For both IMRT and VMAT, the RMS of the conversion error (0.1528 mm) was 4 times greater than the RMS of the delivery error (0.0367 mm). A high correlation existed between MLC velocity and both components of the discrepancy for IMRT (R^2 ∈ [0.61, 0.75]) and VMAT (R^2 ∈ [0.75, 0.85]). Final models trained on data from all linear accelerators can predict MLC delivery errors, conversion errors, and combined errors with a high degree of accuracy and correlation between predicted and actual errors for IMRT (R^2 = 0.99, 0.86, 0.98) and VMAT (R^2 = 0.84, 0.86, 0.87).
Conclusion: We developed an AI model that can predict the total MLC discrepancy on a Varian TrueBeam linear accelerator with high accuracy using mechanical parameters from trajectory log files and DICOM-RT plans. The software tool from our previous study has been updated to incorporate the discrepancy in planned position into the predictions of total delivery error. We have released the tool for public use to enable researchers to simulate a treatment delivery without a physical delivery. The tool also shows promise in clinical scenarios by allowing for a virtual pre-treatment QA that can be carried out with online adaptive radiotherapy, thereby increasing the effectiveness of pre-treatment patient-specific QA.
Item Open Access
Assisting Unsupervised Optical Flow Estimation with External Information (2023) Yuan, Shuai
Optical flow estimation is a long-standing problem in computer vision with broad applications in autonomous driving, robotics, etc. Due to the scarcity of ground-truth labels, the unsupervised estimation of optical flow is especially important. However, it is a poorly constrained problem and presents challenges in the presence of occlusions, motion boundaries, non-Lambertian surfaces, lack of texture, and illumination changes. Therefore, we explore using external information, namely partial labels, semantics, and stereo views, to assist unsupervised optical flow estimation.
Supervised training of optical flow predictors generally yields better accuracy than unsupervised training. However, the improved performance comes at an often high annotation cost. Semi-supervised training trades off accuracy against annotation cost. We use a simple yet effective semi-supervised training method to show that even a small fraction of labels can improve flow accuracy by a significant margin over unsupervised training. In addition, we propose active learning methods based on simple heuristics to further reduce the number of labels required to achieve the same target accuracy. Our experiments on both synthetic and real optical flow datasets show that our semi-supervised networks generally need around 50% of the labels to achieve close to full-label accuracy, and only around 20% with active learning on Sintel. We also analyze and show insights on the factors that may influence active learning performance. Code is available at https://github.com/duke-vision/optical-flow-active-learning-release.
Unsupervised optical flow estimation is especially hard near occlusions and motion boundaries and in low-texture regions. We show that additional information such as semantics and domain knowledge can help better constrain this problem. We introduce SemARFlow, an unsupervised optical flow network designed for autonomous driving data that takes estimated semantic segmentation masks as additional inputs. This additional information is injected into the encoder and into a learned upsampler that refines the flow output. In addition, a simple yet effective semantic augmentation module provides self-supervision when learning flow and its boundaries for vehicles, poles, and sky. Together, these injections of semantic information improve the KITTI-2015 optical flow test error rate from 11.80% to 8.38%. We also show visible improvements around object boundaries as well as a greater ability to generalize across datasets. Code is available at https://github.com/duke-vision/semantic-unsup-flow-release.
Both optical flow and stereo disparities are image matches and can therefore benefit from joint training. Depth and 3D motion provide geometric rather than photometric information and can further improve optical flow. Accordingly, we design a first network that estimates flow and disparity jointly and is trained without supervision. A second network, trained with optical flow from the first as pseudo-labels, takes disparities from the first network, estimates 3D rigid motion at every pixel, and reconstructs optical flow again. A final stage fuses the outputs from the two networks. In contrast with previous methods that only consider camera motion, our method also estimates the rigid motions of dynamic objects, which are of key interest in applications.
This leads to better optical flow, with visibly more detailed occlusions and object boundaries. Our unsupervised pipeline achieves a 7.36% optical flow error on the KITTI-2015 benchmark and outperforms the previous state of the art (9.38%) by a wide margin. It also achieves slightly better or comparable stereo depth results. Code will be made available.
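One simple way to picture the semantic-input idea described in this entry (feeding estimated segmentation masks to the flow network) is sketched below; the concatenation-based conditioning, channel counts, and class count are illustrative assumptions and are much simpler than SemARFlow's actual encoder and upsampler injections.

```python
import torch
import torch.nn as nn

class SemanticsConditionedEncoder(nn.Module):
    """Sketch: condition a flow encoder on semantic segmentation via channel concatenation."""

    def __init__(self, num_classes=19, base_channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + num_classes, base_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, image, seg_one_hot):
        # image: (B, 3, H, W); seg_one_hot: (B, num_classes, H, W), e.g., one-hot masks from an
        # off-the-shelf segmentation network. Semantics become extra input channels for flow features.
        return self.conv(torch.cat([image, seg_one_hot], dim=1))

encoder = SemanticsConditionedEncoder()
features = encoder(torch.rand(1, 3, 256, 832), torch.rand(1, 19, 256, 832))
print(features.shape)  # torch.Size([1, 32, 128, 416])
```

The appeal of conditioning on semantics, as the abstract notes, is strongest near object boundaries and low-texture regions, where photometric cues alone under-constrain the flow.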
Item Open Access
Automatic Identification of Training & Testing Data for Buried Threat Detection using Ground Penetrating Radar (2017) Reichman, Daniel
Ground penetrating radar (GPR) is one of the most popular and successful sensing modalities that has been investigated for landmine and subsurface threat detection. The radar is attached to the front of a vehicle and collects measurements along the path of travel. At each spatial location queried, a time-series of measurements is collected, and the measured data are often visualized as images within which the signals corresponding to buried threats exhibit a characteristic appearance. This appearance is typically hyperbolic and has been leveraged to develop several automated detection methods. Many of the detection methods applied to this task are supervised, and therefore require labeled examples of threat and non-threat data for training. Labeled examples are typically obtained by collecting data over deliberately buried threats at known spatial locations. However, uncertainty exists with regard to the temporal locations in depth at which the buried threat signal exists in the imagery. This uncertainty is an impediment to obtaining labeled examples of buried threats to provide to the supervised learning model. The focus of this dissertation is on overcoming the problem of identifying training data for supervised learning models for GPR buried threat detection.
The ultimate goal is to be able to apply the lessons learned in order to improve the performance of buried threat detectors. Therefore, a particular focus of this dissertation is to understand the implications of particular data selection strategies, and to develop principled general strategies for selecting the best approaches. This is done by identifying three factors that are typically considered in the literature with regard to this problem. Experiments are conducted to understand the impact of these factors on detection performance. The outcome of these experiments provided several insights about the data that can help guide the future development of automated buried threat detectors.
The first set of experiments suggests that a substantial number of threat signatures are neither hyperbolic nor regular in their appearance. These insights motivated the development of a novel buried threat detector that improves over the state-of-the-art benchmark algorithms on a large collection of data. In addition, this newly developed algorithm exhibits improved characteristics of robustness over those algorithms. The second set of experiments suggests that automating the selection of data corresponding to the buried threats is possible and can be used to replace manually designed methods for this task.
Item Open Access
Bayesian Learning with Dependency Structures via Latent Factors, Mixtures, and Copulas (2016) Han, Shaobo
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
Item Open Access
Deep Automatic Threat Recognition: Considerations for Airport X-Ray Baggage Screening (2020) Liang, Kevin J
Deep learning has made significant progress in recent years, contributing to major advancements in many fields. One such field is automatic threat recognition, where methods based on neural networks have surpassed more traditional machine learning methods. In particular, we evaluate the performance of convolutional object detection models within the context of X-ray baggage screening at airport checkpoints. To do so, we collected a large dataset of scans containing threats from a diverse set of classes, and then trained and compared a number of models. Many currently deployed X-ray scanners contain multiple X-ray emitter-detector pairs arranged to give multiple views of the scanned object, and we find that combining predictions from these improves overall performance. We select the best-performing models fitting our design criteria and integrate them into the X-ray scanning machines, resulting in functional prototypes capable of simulating live screening deployment.
We also explore a number of subfields of deep learning with potential to improve these deep automatic threat recognition algorithms. For example, as data collection efforts are scaled up and the number of threat categories is expanded, the likelihood of missing annotations will also increase, especially if this new data is collected from real airport traffic. Such a setting is actually common in object detection datasets, and we show that a positive-unlabeled learning assumption better fits the characteristics of the data. Additionally, real-world data distributions tend to drift over time or evolve cyclically with the seasons. Baggage scan images also tend to be sensitive, meaning storing data may represent a security or privacy risk. As a result, a continual learning setting may be more appropriate for these kinds of data, which we examine in the context of generative adversarial networks. Finally, the sensitivity of security applications makes understanding models especially important. We thus spend some time examining how certain popular neural networks emerge from assumptions made starting from kernel methods. Through these works, we find that deep learning methods show considerable promise to improve existing automatic threat recognition systems.
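To give a sense of what a positive-unlabeled training objective looks like, the sketch below implements the commonly used non-negative PU risk estimator with a logistic loss; this is a standard formulation from the PU-learning literature rather than necessarily the exact objective used in this work, and the class prior is an assumed input.

```python
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_pos, scores_unl, class_prior):
    """Non-negative positive-unlabeled risk with a logistic loss (sketch).

    scores_pos: classifier scores for labeled positive (annotated threat) examples.
    scores_unl: classifier scores for unlabeled examples (a mix of threats and non-threats).
    class_prior: assumed probability that an unlabeled example is actually positive.
    """
    risk_pos = class_prior * F.softplus(-scores_pos).mean()        # positives treated as positive
    risk_neg_unl = F.softplus(scores_unl).mean()                   # unlabeled treated as negative
    risk_neg_pos = class_prior * F.softplus(scores_pos).mean()     # correction for hidden positives
    return risk_pos + torch.clamp(risk_neg_unl - risk_neg_pos, min=0.0)

# Example with random scores; in practice these would come from the detector's logits.
loss = nn_pu_risk(torch.randn(64), torch.randn(256), class_prior=0.3)
```

The clamp keeps the estimated negative risk from going below zero, which is what makes this variant stable when annotations are sparse relative to the true number of threats.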