Browsing by Subject "Computer vision"
Item Open Access
3D dynamic in vivo imaging of joint motion: application to measurement of anterior cruciate ligament function (2019). Englander, Zoë Alexandra
More than 400,000 anterior cruciate ligament (ACL) injuries occur annually in the United States, 70% of which are non-contact. A severe consequence of ACL injury is an increased risk of early-onset osteoarthritis (OA). Importantly, the increased risk of OA persists even if the ACL is surgically reconstructed. Thus, due to the long-term physical consequences and high financial burden of treatment, injury prevention and improved reconstruction techniques are critical. However, the causes of non-contact ACL injuries remain unclear, which has hindered efforts to develop effective training programs targeted at preventing these injuries. Improved understanding of the knee motions that increase the risk of ACL injury can inform more effective injury prevention strategies. Furthermore, there is presently limited in vivo data describing the function of the ACL under dynamic loading conditions. Understanding how the ACL functions to stabilize the knee joint under physiologic loading conditions can inform design criteria for grafts used in ACL reconstruction. Grafts that more accurately mimic the native function of the ACL may help prevent these severe long-term degenerative changes in the knee joint after injury.
To this end, measurements of in vivo ACL function during knee motion are critical to understanding how non-contact ACL injuries occur and how the ACL stabilizes the joint during activities of daily living. Specifically, identifying the knee motions that increase ACL length and strain can elucidate the mechanisms of non-contact ACL injury, as a taut ligament is more likely to fail. Furthermore, measuring ACL elongation patterns during dynamic activity can inform the design criteria for grafts used in reconstructive surgery. Obtaining such measurements requires 3D imaging techniques that can measure dynamic in vivo ACL elongation and strain at high temporal and spatial resolution.
Thus, in this dissertation a method of measuring knee motion and ACL function during dynamic activity in vivo using high-speed biplanar radiography in combination with magnetic resonance (MR) imaging was developed. In this technique, 3D surface models of the knee joint are created from MR images and registered to high-speed biplanar radiographs of knee motion. The use of MR imaging to model the joint allows for visualization of bone and soft tissue anatomy, in particular the attachment site footprints of the ligaments. By registering the bone models to biplanar radiographs using software developed in this dissertation, the relative positions of the bones and associated ligament attachment site footprints at the time of radiographic imaging can be reproduced. Thus, measurements of knee kinematics and ligament function during dynamic activity can be obtained at high spatial and temporal resolution.
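As a rough illustration of the final measurement step described above (and not the dissertation's own software), the sketch below applies hypothetical rigid-body transforms recovered from radiograph registration to MR-derived attachment-site centroids, then reports ligament length and strain relative to an assumed reference length.

```python
import numpy as np

def transform_points(R, t, points):
    """Apply a rigid-body transform (3x3 rotation R, translation t) to 3D points."""
    return points @ R.T + t

def ligament_length(femoral_attachment, tibial_attachment):
    """Straight-line distance between attachment-site centroids (a simple length proxy)."""
    return np.linalg.norm(femoral_attachment - tibial_attachment)

# Hypothetical example: ACL attachment footprint centroids in each bone's MR frame (mm)
femoral_centroid_mr = np.array([10.0, -3.0, 25.0])
tibial_centroid_mr = np.array([5.0, 2.0, -8.0])

# Hypothetical rigid transforms of each bone at one radiographic frame (from 3D-to-2D registration)
R_femur, t_femur = np.eye(3), np.array([0.0, 0.0, 0.0])
R_tibia, t_tibia = np.eye(3), np.array([1.5, -0.5, 2.0])

femoral_centroid = transform_points(R_femur, t_femur, femoral_centroid_mr)
tibial_centroid = transform_points(R_tibia, t_tibia, tibial_centroid_mr)

length = ligament_length(femoral_centroid, tibial_centroid)
reference_length = 32.0  # assumed length in a reference (unloaded) position, mm
strain = (length - reference_length) / reference_length
print(f"ACL length: {length:.1f} mm, strain: {strain:.3f}")
```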
We have applied the techniques developed in this dissertation to obtain novel dynamic in vivo measurements of the mechanical function of the knee joint. Specifically, the physiologic elongation and strain behaviors of the ACL during gait and single-legged jumping were measured. Additionally, the dynamic function of the patellar tendon during single-legged jumping was measured. The findings of this dissertation have helped to elucidate the knee kinematics that increase ACL injury vulnerability by identifying the dynamic motions that result in elongation and strain in the ACL. Furthermore, the findings of this dissertation have provided critical data to inform design criteria for grafts used in reconstructive surgery such that reconstructive techniques better mimic the physiologic function of the ACL.
The methodologies described in this dissertation can be applied to study the mechanical behavior of other joints such as the spine, and other soft tissues, such as articular cartilage, under various loading conditions. Therefore, these methods may have a significant impact on the field of biomechanics as a whole, and may have applicability to a number of musculoskeletal applications.
Item Open Access
3D Object Representations for Robot Perception (2019). Burchfiel, Benjamin Clark Malloy
Reasoning about 3D objects is one of the most critical perception problems robots face; outside of navigation, most interactions between a robot and its environment are object-centric. Object-centric robot perception has long relied on maintaining an explicit database of 3D object models with the assumption that encountered objects will be exact copies of entries in the database; however, as robots move into unstructured environments such as human homes, the variation of encountered objects increases and maintaining an explicit object database becomes infeasible. This thesis introduces a general-purpose 3D object representation that allows the joint estimation of a previously unencountered object's class, pose, and 3D shape---crucial foundational tasks for general robot perception.
We present the first method capable of performing all three of these tasks simultaneously, Bayesian Eigenobjects (BEOs), and show that it outperforms competing approaches which estimate only object shape and class given a known object pose. BEOs use an approximate Bayesian version of Principal Component Analysis to learn an explicit low-dimensional subspace containing the 3D shapes of objects of interest, which allows for efficient shape inference at high object resolutions. We then extend BEOs to produce Hybrid Bayesian Eigenobjects (HBEOs), a fusion of linear subspace methods with modern convolutional network approaches, enabling realtime inference from a single depth image. Because HBEOs use a convolutional network to project partially observed objects onto the learned subspace, they allow the learned subspace to be larger and more expressive without impacting the inductive power of the model. Experimentally, we show that HBEOs offer significantly improved performance on all tasks compared to their BEO predecessors. Finally, we leverage the explicit 3D shape estimate produced by BEOs to further extend the state of the art in category-level pose estimation by fusing probabilistic pose predictions with a silhouette-based reconstruction prior. We also illustrate the advantages of combining probabilistic pose estimation and shape verification via an ablation study, and show that both portions of the system contribute to its performance. Taken together, these methods comprise a significant step towards creating a general-purpose 3D perceptual foundation for robotics systems, upon which problem-specific systems may be built.
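A toy sketch of the linear-subspace idea behind this family of methods (illustrative placeholder data, not the thesis implementation): learn a low-dimensional basis over voxelized training shapes with PCA, then complete a partially observed shape by a least-squares projection onto that basis using only the observed voxels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: N voxelized shapes flattened to D-dimensional vectors (random placeholders here)
N, D, K = 200, 1000, 10
train_shapes = rng.random((N, D))

# Learn a K-dimensional linear subspace (mean + principal components) via PCA
mean = train_shapes.mean(axis=0)
centered = train_shapes - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
W = Vt[:K].T                        # D x K basis of the shape subspace

# Partial observation: only a subset of voxels is visible (e.g., from one depth view)
observed_idx = rng.choice(D, size=D // 4, replace=False)
true_shape = train_shapes[0]
observed_values = true_shape[observed_idx]

# Least-squares projection onto the subspace using only the observed voxels
coeffs, *_ = np.linalg.lstsq(W[observed_idx], observed_values - mean[observed_idx], rcond=None)
completed = mean + W @ coeffs       # full-resolution shape estimate

print("mean reconstruction error over all voxels:", np.abs(completed - true_shape).mean())
```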
Item Embargo
3D Tissue Modelling: Laser-based Multi-modal Surface Reconstruction, Crater Shape Prediction and Pathological Mapping in Robotic Surgery (2023). Ma, Guangshen
In surgical robotics, fully automated tumor removal is an important topic, and it includes three main tasks: tissue classification for cancer diagnosis, pathological mapping for tumor localization, and tissue resection using a laser scalpel. Generating a three-dimensional (3D) pathological tissue model with fully non-contact sensors can provide invaluable information to assist surgeons in decision-making and enable the use of surgical robots for efficient tissue manipulation. To collect comprehensive information about a biological tissue target, robotic laser systems with complementary sensors (e.g., an optical coherence tomography (OCT) sensor and stereovision) can play important roles in providing non-contact laser scalpels (i.e., a cutting laser scalpel) for tissue removal, applying photonics-based sensors for pathological tissue classification (i.e., laser-based endogenous fluorescence), and aligning multi-sensing information to generate a 3D pathological map. However, there are three main challenges in integrating multiple laser-based sensors into a robotic laser system: 1) modelling the laser beam transmission in 3D free space to achieve accurate laser-tissue manipulation under geometric constraints, 2) studying the complex physics of laser-tissue interaction for tissue differentiation and 3D shape modelling to ensure safe tissue removal, and 3) integrating information from multiple sensing devices under sensor noise and uncertainties from system calibration.
Targeting these three research problems separately, a computational framework is proposed to provide kinematics and calibration algorithms to control and direct the 3D laser beam through a system with multiple rotary mirrors (to transmit the laser beam in free space) and laser-based sensor inputs. This framework can serve as a base platform for optics-based robotic system design and for solving motion planning problems related to laser-based robot systems. Simulation experiments have verified the feasibility of the proposed framework, and actual experiments have been conducted with an existing robotic laser system on phantom and ex-vivo biological tissues.
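To make the free-space beam-steering idea concrete, here is a minimal geometric sketch (hypothetical mirror geometry, not the dissertation's framework): a ray is propagated through a sequence of planar mirrors using the standard reflection formula r' = r - 2(r·n)n.

```python
import numpy as np

def reflect(direction, normal):
    """Reflect a unit direction vector about a unit mirror normal."""
    return direction - 2.0 * np.dot(direction, normal) * normal

def intersect_plane(origin, direction, mirror_point, normal):
    """Intersection of a ray with a mirror plane (assumes the ray is not parallel to the plane)."""
    t = np.dot(mirror_point - origin, normal) / np.dot(direction, normal)
    return origin + t * direction

# Hypothetical two-mirror galvo-style setup: each mirror defined by a point and a unit normal
mirrors = [
    (np.array([0.0, 0.0, 0.1]), np.array([0.0, 0.70710678, -0.70710678])),
    (np.array([0.0, 0.05, 0.1]), np.array([0.70710678, -0.70710678, 0.0])),
]

origin = np.array([0.0, 0.0, 0.0])
direction = np.array([0.0, 0.0, 1.0])  # beam initially travels along +z

for point, normal in mirrors:
    origin = intersect_plane(origin, direction, point, normal)
    direction = reflect(direction, normal)

print("beam exits from", origin, "travelling along", direction)
```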
To study the complex physics of laser-tissue interaction, a 3D data-driven method is developed to model the geometric relation between the laser energy distribution, laser incident angles, and the tissue deformation resulting from photoablation. The results of the phantom studies demonstrate the feasibility of applying the trained model for laser crater shape prediction during surgical planning.
Finally, a research platform, referred to as "TumorMapping", is developed to collect multimodal sensing information from complementary sensors to build a 3D pathological map of a mouse tumor surface. This robot system includes a sensor module attached to a 6-DOF robot arm end-effector, based on laser-induced fluorescence spectroscopy for tissue classification and a fiber-coupled cutting laser for tissue resection. A benchtop sensor platform is built with an OCT sensor and a stereovision system with a two-lens camera to collect tissue information in a non-contact manner. The robot-sensor and complementary-sensor sub-systems are integrated into a unified platform for 3D pathological map reconstruction.
In summary, the research contributions include important advancements in laser-based sensor fusion for surgical decision-making, enabling new capabilities for 3D pathological mapping combined with intelligent robot planning and control algorithms in robotic surgery.
Item Open Access
Appearance-based Gaze Estimation and Applications in Healthcare (2020). Chang, Zhuoqing
Gaze estimation, the ability to predict where a person is looking, has become an indispensable technology in healthcare research. Current tools for gaze estimation rely on specialized hardware and are typically used in well-controlled laboratory settings. Novel appearance-based methods directly estimate a person's gaze from the appearance of their eyes, making gaze estimation possible with ubiquitous, low-cost devices such as webcams and smartphones. This dissertation presents new methods for appearance-based gaze estimation and applies this technology to challenging problems in practical healthcare applications.
One limitation of appearance-based methods is the need to collect a large amount of training data to learn the highly variant eye appearance space. To address this fundamental issue, we develop a method to synthesize novel images of the eye using data from a low-cost RGB-D camera and show that this data augmentation technique can improve gaze estimation accuracy significantly. In addition, we explore the potential of utilizing visual saliency information as a means to transparently collect weakly-labelled gaze data at scale. We show that the collected data can be used to personalize a generic gaze estimation model to achieve better performance on an individual.
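As an illustrative sketch of the personalization step (a generic PyTorch pattern with a placeholder backbone, not the dissertation's model), a generic gaze regressor can be adapted to one user by freezing early layers and fine-tuning only the final regression head on a small amount of that user's weakly labelled data.

```python
import torch
import torch.nn as nn

# Placeholder generic gaze model: small CNN backbone plus a 2D regression head (gaze x, y)
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(32, 2)
model = nn.Sequential(backbone, head)

# Personalization: freeze the generic backbone, fine-tune only the head on per-user samples
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Hypothetical user-specific calibration data: eye-region crops and on-screen gaze targets
user_images = torch.randn(16, 3, 64, 64)
user_gaze = torch.rand(16, 2)

model.train()
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(user_images), user_gaze)
    loss.backward()
    optimizer.step()
print("personalized fit loss:", loss.item())
```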
In healthcare applications, the possibility of replacing specialized hardware with ubiquitous devices when performing eye-gaze analysis is a major asset that appearance-based methods bring to the table. In the first application, we assess the risk of autism in toddlers by analyzing videos of them watching a set of expert-curated stimuli on a mobile device. We show that appearance-based methods can be used to estimate their gaze position on the device screen and that differences between the autistic and typically-developing populations are significant. In the second application, we attempt to detect oculomotor abnormalities in people with cerebellar ataxia using video recorded from a mobile phone. By tracking the iris movement of participants while they watch a short video stimulus, we show that we are able to achieve high sensitivity and specificity in differentiating people with smooth-pursuit oculomotor abnormalities from those without.
Item Open Access
Automatic Behavioral Analysis from Faces and Applications to Risk Marker Quantification for Autism (2018). Hashemi, Jordan
This dissertation presents novel methods for behavioral analysis with a focus on early risk marker identification for autism. We present current contributions including a method for pose-invariant facial expression recognition, a self-contained mobile application for behavioral analysis, and a framework to calibrate a trained deep model with data synthesis and augmentation. First we focus on pose-invariant facial expression recognition. It is known that 3D features have higher discrimination power than 2D features; however, 3D features are usually not readily available at testing time. For pose-invariant facial expression recognition, we utilize multi-modal features at training and exploit the cross-modal relationship at testing. We extend our pose-invariant facial expression recognition method and present other methods to characterize a multitude of risk behaviors related to risk marker identification for autism. In practice, identification of children with neurodevelopmental disorders requires low-specificity screening with questionnaires followed by time-consuming, in-person observational analysis by highly trained clinicians. To alleviate this time- and resource-expensive risk identification process, we develop a self-contained, closed-loop mobile application that records a child's face while he/she is watching specific, expertly curated movie stimuli and automatically analyzes the behavioral responses of the child. We validate our methods against those of expert human raters. Using the developed methods, we present findings on group differences for behavioral risk markers for autism and interactions between motivational framing context, facial affect, and memory outcome. Lastly, we present a framework that uses face synthesis to calibrate trained deep models to deployment scenarios that they have not been trained on. Face synthesis involves creating novel realizations of an image of a face and is an effective method that is predominantly employed only at training and in a blind manner (e.g., blindly synthesize as much as possible). We present a framework that optimally selects synthesis variations and employs them both during training and at testing, leading to more efficient training and better performance.
Item Open Access
Building a Better Business Intelligence Platform for EV Charging Developers (2024-04-25). Dreis, Andrew; Belcher, Harlan
While working with NextEra's Energy Mobility team we realized that there is inadequate data on commercial vehicle fleet locations in the US. This market intelligence blind spot slows down the sales process for EV charging infrastructure and associated services. Based on this discovery, we built a business plan for a software sales tool that could vastly expand the data pool available to companies who sell to vehicle fleets. This product rests on a novel computer vision approach to identifying vehicle fleets. The product, still in development, processes the type and number of vehicles in satellite imagery and then matches that data with business information, increasing sales efficiency. Further analysis revealed estimated market size, customers, competitors, and go-to-market strategy. Interviews with computer vision experts and industry players validated our findings and strategies.
Item Open Access
Deep Automatic Threat Recognition: Considerations for Airport X-Ray Baggage Screening (2020). Liang, Kevin J
Deep learning has made significant progress in recent years, contributing to major advancements in many fields. One such field is automatic threat recognition, where methods based on neural networks have surpassed more traditional machine learning methods. In particular, we evaluate the performance of convolutional object detection models within the context of X-ray baggage screening at airport checkpoints. To do so, we collected a large dataset of scans containing threats from a diverse set of classes, and then trained and compared a number of models. Many currently deployed X-ray scanners contain multiple X-ray emitter-detector pairs arranged to give multiple views of the scanned object, and we find that combining predictions from these views improves overall performance. We select the best-performing models fitting our design criteria and integrate them into the X-ray scanning machines, resulting in functional prototypes capable of simulating live screening deployment.
We also explore a number of subfields of deep learning with potential to improve these deep automatic threat recognition algorithms. For example, as data collection efforts are scaled up and the number of threat categories is expanded, the likelihood of missing annotations will also increase, especially if this new data is collected from real airport traffic. Such a setting is actually common in object detection datasets, and we show that a positive-unlabeled learning assumption better fits the characteristics of the data. Additionally, real-world data distributions tend to drift over time or evolve cyclically with the seasons. Baggage scan images also tend to be sensitive, meaning storing data may represent a security or privacy risk. As a result, a continual learning setting may be more appropriate for these kinds of data, which we examine in the context of generative adversarial networks. Finally, the sensitivity of security applications makes understanding models especially important. We thus spend some time examining how certain popular neural networks emerge from assumptions made starting from kernel methods. Through these works, we find that deep learning methods show considerable promise to improve existing automatic threat recognition systems.
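One common way to train under the positive-unlabeled assumption mentioned above is the non-negative PU risk estimator of Kiryo et al. (2017); the sketch below is a generic PyTorch version under assumed variable names and a sigmoid surrogate loss, not the specific models or data used in this dissertation.

```python
import torch

def nn_pu_risk(scores_pos, scores_unl, prior, loss=lambda z, y: torch.sigmoid(-y * z)):
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-)."""
    r_pos = loss(scores_pos, 1.0).mean()        # positives treated as positive
    r_pos_neg = loss(scores_pos, -1.0).mean()   # positives treated as negative
    r_unl_neg = loss(scores_unl, -1.0).mean()   # unlabeled treated as negative
    return prior * r_pos + torch.clamp(r_unl_neg - prior * r_pos_neg, min=0.0)

# Hypothetical usage with raw classifier scores for labeled-threat and unlabeled examples
scores_pos = torch.randn(64, requires_grad=True)    # scores on known (labeled) threats
scores_unl = torch.randn(512, requires_grad=True)   # scores on unlabeled examples (mostly benign)
risk = nn_pu_risk(scores_pos, scores_unl, prior=0.05)  # prior: assumed fraction of threats among unlabeled data
risk.backward()
print("nnPU risk:", risk.item())
```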
Item Open Access
Deep Generative Models for Vision, Languages and Graphs (2019). Wang, Wenlin
Deep generative models have achieved remarkable success in modeling various types of data, ranging from vision and language to graphs. They offer flexible and complementary representations for both labeled and unlabeled data. Moreover, they are naturally capable of generating realistic data. In this thesis, novel variations of generative models are proposed for various learning tasks, which can be categorized into three parts.
In the first part, generative models are designed to learn generalized representations for images under the Zero-Shot Learning (ZSL) setting. An attribute-conditioned variational autoencoder is introduced, representing each class as a latent-space distribution and enabling the learning of highly discriminative and robust feature representations. It endows the generative model with discriminative power by choosing the class that maximizes the variational lower bound. I further show that the model can be naturally generalized to the transductive and few-shot settings.
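A compressed sketch of this idea (illustrative dimensions and a single-sample lower-bound estimate, not the thesis implementation): score a test feature under an attribute-conditioned VAE once per candidate class and predict the class whose attribute vector gives the highest variational lower bound.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttrCVAE(nn.Module):
    def __init__(self, x_dim, a_dim, z_dim=32, h=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + a_dim, h), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + a_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def elbo(self, x, a):
        h = self.enc(torch.cat([x, a], -1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterized sample
        x_rec = self.dec(torch.cat([z, a], -1))
        rec = -F.mse_loss(x_rec, x, reduction="none").sum(-1)    # Gaussian log-likelihood up to a constant
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return rec - kl                                          # per-sample variational lower bound

def zsl_classify(model, x, class_attrs):
    # Score each class by the lower bound under its attribute vector; pick the best one
    scores = torch.stack([model.elbo(x, a.expand(x.size(0), -1)) for a in class_attrs], -1)
    return scores.argmax(-1)

# Hypothetical usage: 4 test features, 10 unseen classes described by 85-dim attributes
model = AttrCVAE(x_dim=512, a_dim=85)
x = torch.randn(4, 512)
class_attrs = torch.rand(10, 85)
print(zsl_classify(model, x, class_attrs))
```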
In the second part, generative models are proposed for controllable language generation. Specifically, two types of topic-enrolled language generation models are proposed. The first introduces a topic-compositional neural language model for controllable and interpretable language generation via a mixture-of-experts model design, while the second solves the problem via a VAE framework with a topic-conditioned GMM model design. Both models boost the performance of existing language generation systems with controllable properties.
In the third part, generative models are introduced for the broader category of graph data. First, a variational homophilic embedding (VHE) model is proposed. It is a fully generative model that learns network embeddings by modeling the textual semantic information with a variational autoencoder, while accounting for the graph structure through a homophilic prior design. Second, for heterogeneous multi-task learning, a novel graph-driven generative model is developed to unify the tasks in the same framework. It combines a graph convolutional network (GCN) with multiple VAEs, embedding the nodes of the graph in a uniform manner while specializing their organization and usage to different tasks.
Item Open Access
Development of Deep Learning Models for Deformable Image Registration (DIR) in the Head and Neck Region (2020). Amini, Ala
Deformable image registration (DIR) is the process of registering two or more images to a reference image by minimizing local differences across the entire image. DIR is conventionally performed using iterative optimization-based methods, which are time-consuming and require manual parameter tuning. Recent studies have shown that deep learning methods, most importantly convolutional neural networks (CNNs), can be employed to address the DIR problem. In this study, we propose two deep learning frameworks that perform the DIR task in an unsupervised manner for CT-to-CT deformable registration of the head and neck region. Given that head and neck cancer patients may undergo severe weight loss over the course of their radiation therapy treatment, DIR in this region is an important task. The first proposed deep learning framework contains two scales, both based on freeform deformation, trained by minimizing an intensity-based dissimilarity metric while encouraging smoothness of the deformation vector field (DVF). The two scales were first trained separately in a sequential manner, and then combined in a two-scale joint training framework for further optimization. We then developed a transfer learning technique to improve the DIR accuracy of the proposed networks by fine-tuning a pre-trained group-based model into a patient-specific model optimized for individual patients. We showed that by utilizing as few as two prior CT scans of a patient, the performance of the pre-trained model described above can be improved, yielding more accurate DIR results for individual patients. The second proposed deep learning framework, which also consists of two scales, is a hybrid DIR method using B-spline deformation modeling and deep learning. In its first scale, the deformations of control points are learned by deep learning and an initial DVF is estimated using B-spline interpolation to ensure smoothness of the initial estimate. The second-scale model of the second framework is the same as that in the first framework. In our study, the networks were trained and evaluated using the public TCIA HNSCC-3DCT dataset for the head and neck region. We showed that the DIR results of our proposed networks are comparable to those of conventional DIR methods while being several orders of magnitude faster (about 2 to 3 seconds), making them highly applicable to clinical use.
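The unsupervised training objective described above pairs an intensity dissimilarity term with a DVF smoothness penalty. Below is a generic 2D PyTorch sketch of that combination (MSE as the dissimilarity, gradient-based smoothness, illustrative names and weights rather than the dissertation's code).

```python
import torch
import torch.nn.functional as F

def warp(moving, dvf):
    """Warp a 2D moving image (N,1,H,W) with a dense displacement field (N,2,H,W), in pixels."""
    n, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs.float() + dvf[:, 0]) / (w - 1) * 2 - 1   # normalize to [-1, 1] for grid_sample
    grid_y = (ys.float() + dvf[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)
    return F.grid_sample(moving, grid, align_corners=True)

def smoothness(dvf):
    """Mean squared spatial gradient of the displacement field (encourages smooth DVFs)."""
    dx = dvf[:, :, :, 1:] - dvf[:, :, :, :-1]
    dy = dvf[:, :, 1:, :] - dvf[:, :, :-1, :]
    return (dx ** 2).mean() + (dy ** 2).mean()

def dir_loss(fixed, moving, dvf, lam=0.01):
    """Intensity dissimilarity (MSE here) plus weighted DVF smoothness."""
    return F.mse_loss(warp(moving, dvf), fixed) + lam * smoothness(dvf)

# Hypothetical usage with a 2D slice pair and a predicted displacement field
fixed, moving = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
dvf = torch.zeros(1, 2, 64, 64, requires_grad=True)
loss = dir_loss(fixed, moving, dvf)
loss.backward()
print("DIR loss:", loss.item())
```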
Item Open Access
Motion Boundary and Occlusion Reasoning for Video Analysis (2022). Kim, Hannah
With the increasing prevalence of video cameras, video motion analysis has become an important research area in vision. Motion in video is often represented in the form of dense optical flow fields, which specify the motion of each pixel from one frame to the next. While existing flow predictors achieve almost sub-pixel performance on existing benchmarks, they still suffer in three particular areas. The first is near motion boundaries, the curves across which the optical flow field is discontinuous. The second is in occlusion regions, sets of pixels in one frame without a corresponding pixel in the other; the optical flow is not defined for these occluded pixels. The third is in regions with large motion, which require high computational and memory costs. This dissertation examines these three challenges through motion boundary detection, occlusion detection, video interpolation, and occlusion-based adversarial attack detection for optical flow.
First, we propose a convolutional neural network named MONet to jointly detect motion boundaries and occlusion regions in video, both forward and backward in time. Since both motion boundaries and occlusion regions disrupt correspondences across frames, we first use a cost map of the Euclidean distances between each feature in one frame and its closest feature in the next. To reason in two time directions simultaneously, we directly warp the estimated occlusion region and motion boundary maps between the two frames, preserving features in occlusion regions. Because motion boundaries align with occlusion region boundaries, we utilize an attention mechanism and a gradient module to encourage the network to focus on the useful 2D spatial regions predicted by the other task. MONet achieves state-of-the-art results for both tasks on various benchmarks.
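A toy version of the cost-map idea (illustrative only, not the MONet implementation): for each feature vector in frame 1, compute the Euclidean distance to its closest feature in frame 2; large values flag pixels whose correspondence is disrupted by occlusion or a motion boundary.

```python
import torch

def feature_cost_map(feat1, feat2):
    """feat1, feat2: (C, H, W) feature maps. Returns an (H, W) map of distances from
    each frame-1 feature to its nearest neighbour among all frame-2 features."""
    c, h, w = feat1.shape
    f1 = feat1.reshape(c, -1).T          # (H*W, C)
    f2 = feat2.reshape(c, -1).T          # (H*W, C)
    dists = torch.cdist(f1, f2)          # pairwise Euclidean distances
    return dists.min(dim=1).values.reshape(h, w)

# Hypothetical usage with small random feature maps from two consecutive frames
feat1, feat2 = torch.randn(16, 32, 32), torch.randn(16, 32, 32)
cost = feature_cost_map(feat1, feat2)
print(cost.shape, cost.mean().item())
```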
Next, we consider the video interpolation task, which aims to interpolate an intermediate frame given the two consecutive image frames around it. We first present a novel visual transformer module, named Cross Similarity (CS), to globally aggregate input image features with appearances similar to those of the interpolated frame. These aggregated features are then used to refine the interpolated prediction. To account for occlusions in the aggregated CS features, we propose an Image Attention (IA) module that allows the network to favor CS features from one frame over those of the other. Additionally, we augment our training dataset with an occluder patch that moves across frames to improve the network's robustness to occlusions and large motion. We supervise our IA module so that the network is encouraged to down-weight the features that are occluded by these patches. Because existing methods yield overly smooth predictions, especially near motion boundaries, we use an additional training loss based on image gradients to yield sharper predictions.
We finally observe the effect of patch-based adversarial attacks on flow networks, which cause occlusions and motion boundaries in the inputs, and present the first method to detect and localize these attacks without any fine-tuning or prior knowledge of the attacks. In particular, we detect the occlusion patch attacks via iterative optimization on the activations from the inner layers of any pre-trained optical flow network, identifying the subset of anomalous activations.
Item Open Access
Physical Designs in Artificial Neural Imaging (2022). Huang, Qian
Artificial neural networks fundamentally shift the paradigm of computational imaging. Powerful neural processing is not only taking the place of conventional algorithms but also embracing radical, physically plausible forward models that better sample the high-dimensional light field. Physical designs of sampling in turn tailor simulation and neural algorithms for optimal inverse estimation. Sampling, simulation, and neural algorithms, as three essential components, compose a novel imaging paradigm -- artificial neural imaging -- in which they interact and improve one another in an upward spiral.
Here we present three concrete examples of artificial neural imaging and the important roles physical designs play. In all-in-focus imaging, autofocus, sampling, and fusion algorithms are redefined to optimize the image quality of a camera with limited depth of field. Image-based neural autofocus acts 5-10x faster than traditional algorithms. Focus control based on rules or reinforcement learning dynamically estimates the environment and optimizes the focus trajectory. Along with the neural fusion algorithm, the pipeline outperforms traditional focal stacking approaches in static and dynamic scenes. In scatter ptychography, we show that imaging the secondary scatter reflected by a remote target under coherent illumination can create a synthetic aperture on the scatterer. Reconstructing the object with phase retrieval algorithms can drastically exceed the resolution of directly viewing the target; in a lab experiment we demonstrate a 32x resolution improvement relative to direct imaging using error-reduction and plug-and-play algorithms. In array camera imaging, we demonstrate heterogeneous multi-aperture designs that have better sampling structures and physics-aware transformers for feature-based data fusion. The proposed transformer incorporates the physical information of the camera array as its receptive fields, demonstrating superior image compositing on array cameras with diverse resolutions, focal lengths, focal planes, color spaces, and exposures. We also demonstrate a scalable data synthesis pipeline built on computer graphics software that empowers the transformers.
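To illustrate the error-reduction style of phase retrieval mentioned above (a textbook Gerchberg-Saxton/Fienup-type variant on synthetic data, not the paper's pipeline), the sketch below alternates between enforcing the measured Fourier magnitude and a known object-domain support constraint.

```python
import numpy as np

def error_reduction(fourier_magnitude, support, n_iters=200, seed=0):
    """Recover a non-negative image from its Fourier magnitude and a binary support mask."""
    rng = np.random.default_rng(seed)
    estimate = rng.random(fourier_magnitude.shape) * support
    for _ in range(n_iters):
        spectrum = np.fft.fft2(estimate)
        # Enforce the measured Fourier magnitude, keep the current phase
        spectrum = fourier_magnitude * np.exp(1j * np.angle(spectrum))
        estimate = np.real(np.fft.ifft2(spectrum))
        # Enforce object-domain constraints: support and non-negativity
        estimate = np.clip(estimate, 0, None) * support
    return estimate

# Hypothetical example: a small synthetic object, its Fourier magnitude, and a loose support mask
truth = np.zeros((64, 64))
truth[24:40, 20:44] = 1.0
magnitude = np.abs(np.fft.fft2(truth))
support = np.zeros_like(truth)
support[16:48, 12:52] = 1.0

recovered = error_reduction(magnitude, support)
print("mean residual:", np.abs(recovered - truth).mean())
```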
The examples above justify artificial neural imaging and the physical designs interwoven with it. We expect better designs in sampling, simulation, and neural algorithms, and eventually better estimation of the light field.
Item Open Access
Practical Architectures for Fused Visual and Inertial Mobile Sensing (2015). Jain, Puneet
Crowdsourced live video streaming from users is on the rise. Several factors such as social networks, streaming applications, smartphones with high-quality cameras, and ubiquitous wireless connectivity are contributing to this phenomenon. Unlike isolated professional videos, live streams emerge at an unprecedented scale; they are poorly captured, unorganized, and lack user context. To utilize the full potential of this medium and enable new services on top of it, the open challenges must be addressed immediately. Smartphones are resource constrained -- battery power is limited, bandwidth is scarce, and on-board computing power and storage are insufficient to meet real-time demand. Therefore, mobile cloud computing is cited as an obvious alternative, where the cloud does the heavy lifting for the smartphone. But cloud resources are not cheap, and real-time processing demands more than what the cloud can deliver.
This dissertation argues that throwing cloud resources at these problems and blindly offloading computation, while seemingly necessary, may not be sufficient. Opportunities need to be identified to streamline large-scale problems by leveraging in-device capabilities, thereby making them amenable to a given cloud infrastructure. One of the key opportunities, we find, is the cross-correlation between different streams of information available in the cloud. We observe that inference on a single information stream may often be difficult, but when viewed in conjunction with other information dimensions, the same problem often becomes tractable.
Item Open Access
Realtime Image Processing for Resource Constrained Devices (2018). Streiffer, Christopher
With the proliferation of embedded sensors within smartphone and Internet-of-Things devices, applications have programmatic access to more data processing than ever before. At the same time, advances in computer vision and deep learning have fostered methodology for performing complex, yet powerful, operations on spatial and temporal data. Capitalizing on this union, applications are capable of providing advanced functionality to their users through features such as augmented reality and image classification. However, the devices responsible for running these libraries often lack sufficient hardware to replicate the parallelization and straight-line speed of high-end servers. For image processing applications, this means that realtime performance is difficult without compromising functionality.
To detail this emerging paradigm, this work presents and examines two image processing applications which offer advanced functionality. The first, DarNet, utilizes the TensorFlow library to perform distracted-driving classification based on image data using a Convolutional Neural Network (CNN). The second, PrivateEye, uses the OpenCV library to provide a camera-based access-control privacy framework for Android users. While this advanced processing allows for enhanced functionality, the computationally expensive operations impose limitations on the realtime performance of these applications due to the lack of sufficient hardware.
This work posits that realtime image processing applications running on resource constrained devices require the external use of edge servers. To this end, this work presents ePrivateEye, an extension to PrivateEye which offloads code to an edge server. The results of this work show that offloading video-frame analysis to the edge at a metro-scale distance allows ePrivateEye to analyze more frames than PrivateEye's local processing over the same period, and to achieve realtime performance of 30 fps with perfect precision and negligible impact on energy efficiency.
Item Open Access
Sampling Strategies and Neural Processing for Array Cameras (2023). Hu, Minghao
Artificial intelligence (AI) reshapes computational imaging systems. Deep neural networks (DNNs) not only show superior reconstruction performance over conventional algorithms handling the same sampling systems; these new reconstruction algorithms also call for new sampling strategies. In this dissertation, we study how DNN reconstruction algorithms and sampling strategies can be jointly designed to boost system performance.
First, two DNNs for sensor fusion tasks, based on convolutional neural networks (CNNs) and transformers, are proposed. They are able to fuse frames with different resolutions, wave bands, or temporal windows. The number of frames can also vary, showing great flexibility and scalability. A reasonable computational load is achieved by a receptive field design that balances flexibility and complexity. Visually pleasing reconstruction results are achieved.
Then we demonstrate how DNN reconstruction algorithms favor certain sampling strategies over others, using the snapshot compressive imaging (SCI) task as an example. Using synthetic datasets, we compare quasi-random coded sampling and multi-aperture multi-scale manifold sampling under DNN reconstruction. The latter sampling strategy requires a much simpler physical setup, yet gives comparable, if not better, reconstructed image quality.
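For readers unfamiliar with SCI, the forward model being sampled is simple: a short video is modulated frame-by-frame by binary masks and summed into a single coded snapshot. The sketch below (illustrative names, synthetic data) implements that quasi-random coded measurement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic video block: T frames of H x W intensities
T, H, W = 8, 64, 64
video = rng.random((T, H, W))

# Quasi-random binary coding masks, one per frame (the sampling strategy being compared)
masks = (rng.random((T, H, W)) > 0.5).astype(float)

# Snapshot compressive measurement: per-frame modulation followed by temporal summation
snapshot = (masks * video).sum(axis=0)          # single H x W coded image

# A DNN reconstruction network would be trained to invert this mapping;
# a crude baseline is the mask-normalized average shown here.
baseline = snapshot / np.maximum(masks.sum(axis=0), 1e-6)
print(snapshot.shape, baseline.shape)
```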
Finally, we design and build a multifocal array camera suited to DNN reconstruction. With commercial off-the-shelf cameras and lenses, the array camera achieves a nearly 70-degree field of view (FoV), a 0.1 m - 17.1 m depth of field (DoF), and the ability to resolve objects with 2 mm granularity. One final output image contains about 33M RGB pixels.
Overall, we explore the joint design of DNN reconstruction algorithms and physical sampling. With our research, we hope to develop computational imaging systems that are more compact, more accurate, and cover a larger range.
Item Open Access
Single Image Super Resolution: Perceptual Quality & Test-time Optimization (2019). Chen, Lei
Image super resolution is defined as recovering a high-resolution image from a low-resolution input. It has a wide range of applications in modern digital image processing, producing better results in areas including satellite image processing, medical image processing, microscopy image processing, astronomical studies, and surveillance. However, image super resolution is an ill-posed problem, since many high-resolution images can correspond to the same low-resolution input, making it difficult to find the optimal solution.
In this work, various research directions in the area of single image super resolution are thoroughly studied. The achievements and limitations of each of the proposed methods, including computational efficiency and perceptual performance limits, are compared. The main contributions of this work include implementing a perceptual score predictor and integrating it into the objective function of the upsampler algorithm. Apart from that, a test-time optimization algorithm is proposed, aiming to further enhance the quality of the super-resolved image obtained from any upsampler. The proposed methods are implemented and tested using PyTorch. Results are compared on standard benchmark datasets including Set5, Set14, Urban100, and DIV2K.
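A minimal PyTorch sketch of the combined objective described above (placeholder networks and weights, not the dissertation's code): a pixel-wise fidelity term plus a term that rewards outputs the learned perceptual-score predictor rates highly.

```python
import torch
import torch.nn as nn

# Placeholder upsampler and perceptual-score predictor (stand-ins for the trained networks)
upsampler = nn.Sequential(nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
                          nn.Conv2d(3, 3, 3, padding=1))
score_predictor = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

def sr_loss(sr, hr, lam=0.1):
    """Pixel fidelity (L1) plus a perceptual term that is low when the predicted score is high."""
    fidelity = nn.functional.l1_loss(sr, hr)
    perceptual = -score_predictor(sr).mean()     # higher predicted perceptual score -> lower loss
    return fidelity + lam * perceptual

# Hypothetical usage with random low/high-resolution pairs
lr_img, hr_img = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 128, 128)
sr_img = upsampler(lr_img)
loss = sr_loss(sr_img, hr_img)
loss.backward()
print("combined SR loss:", loss.item())
```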
Results from the perceptual score predictor were evaluated on both the PSNR precision index and a perceptual index, which combines the Ma perceptual score and the NIQE score. With the new objective function, the upsampler is able to move along the precision-perception trade-off curve. The test-time optimization algorithm achieved slight improvements in both the precision and perception indices. Note that the proposed test-time optimization does not require training a new neural network and is thus computationally efficient.
Item Open Access
Statistical Modeling to Improve Buried Target Detection with a Forward-Looking Ground-Penetrating Radar (2017). Camilo, Joseph
Forward-looking ground-penetrating radar (FLGPR) has recently been investigated as a remote sensing modality for buried target detection (e.g., landmines and improvised explosive devices (IEDs)). In this context, raw FLGPR data is commonly beamformed into images, and then computerized algorithms are applied to automatically detect subsurface buried targets. Most existing algorithms are supervised, meaning they are trained to discriminate between labeled target and non-target imagery, usually based on features extracted from the radar imagery. This thesis is composed of two FLGPR research areas: an analysis of image features for classification, and the application of machine learning techniques to the formation process of radar imagery.
A large number of image features and classifiers have been proposed for detecting landmines in the FLGPR imagery, but it has been unclear which were the most effective. The primary goal of this component of my research is to provide a comprehensive comparison of detection performance using existing features on a large collection of FLGPR data. Fusion of the decisions resulting from processing each feature is also considered. These comparisons have not previously been performed, and a novel 2DFFT feature was also developed for the FLGPR application. Another contribution of my research in the image feature investigation was the analysis of two modern feature learning approaches from the object recognition literature: the bag-of-visual-words and the Fisher vector for FLGPR processing. The results indicate that most image classification algorithms perform similarly, though the newly designed 2DFFT-based feature consistently performs best for landmine detection with the FLGPR.
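A minimal sketch of a 2D-FFT style image feature (illustrative only; the exact 2DFFT feature in the thesis may differ): take the magnitude spectrum of an image chip around a candidate alarm, keep a low-frequency block, and flatten it into a feature vector for a downstream classifier.

```python
import numpy as np

def fft2d_feature(chip, keep=8):
    """Low-frequency 2D FFT magnitude feature extracted from an image chip."""
    spectrum = np.fft.fftshift(np.fft.fft2(chip))
    magnitude = np.log1p(np.abs(spectrum))            # log compression of the magnitude spectrum
    cy, cx = magnitude.shape[0] // 2, magnitude.shape[1] // 2
    block = magnitude[cy - keep:cy + keep, cx - keep:cx + keep]
    return block.ravel()

# Hypothetical usage on a chip extracted around a candidate alarm location
chip = np.random.default_rng(0).random((64, 64))
feature = fft2d_feature(chip)
print(feature.shape)   # (256,) for keep=8
```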
Based on the image feature results presented in this work, it appears that the current feature extractors leverage most of the information available in the radar images produced by the conventional beamforming process. The work presented in the second component of this thesis improves the beamforming process applied to the radar responses. By improving the radar images (i.e., increasing the signal-to-noise ratio, or SNR), each feature extractor and classification algorithm is shown to subsequently increase in performance. These new methods are designed to incorporate multiple uncertainties in the physical world that are currently ignored during conventional beamforming. The two approaches to improving the underlying FLGPR image are a learned weighting applied to the antenna responses and a strategy for selecting the image creation depth. Both of these new beamforming approaches yield additional improvements to the imagery, which are reflected in improved detection results.
Item Open Access
Using Synthetic Satellite Imagery from Virtual Worlds to Train Deep Learning Models for Object Recognition (2021). Huang, Bohao
Object segmentation in overhead imagery is a challenging problem in computer vision that has been extensively investigated. One challenge with this task is the tremendous visual variability that can be present in real-world overhead imagery due to variations in scene content (e.g., building designs, vegetation types), weather conditions (e.g., cloud cover), time of day (e.g., sun direction and intensity), and imaging hardware. Existing training datasets for object segmentation algorithms, however, capture only a small fraction of this variability, limiting the robustness of trained segmentation models. In particular, recent evidence in the literature suggests that trained models perform poorly on imagery collected in novel geographic locations (i.e., physical locations that are not present in the training data), or simply at a different time of day, limiting the widespread adoption of these approaches. In this work I make several contributions towards understanding and addressing these challenges. First, I build a deep learning framework, termed MRS, that streamlines the training and validation of deep learning models on large remote sensing datasets. Using MRS, I investigate how well modern deep learning models generalize to imagery collected over novel geographic locations, providing comprehensive experimental evidence that the generalization of modern models is indeed poor. Based upon these results, and to address this problem, I explore the use of synthetic overhead imagery for training deep learning models. Synthetic overhead imagery allows a designer to systematically add and vary many sources of real-world image variability that would be cost-prohibitive to collect and hand-label using real satellites. To accomplish this goal, I develop a new process for generating synthetic overhead imagery. This software package offers users simple controls over key properties of the synthetic imagery and generates imagery in a fully automatic fashion. Subsequently I use the new software package to create two datasets of synthetic overhead imagery, termed Synthinel-1 and Synthinel-2. I then demonstrate experimentally that augmenting real-world training imagery with Synthinel-1 or Synthinel-2 consistently yields more robust deep learning models, especially when the models are applied to novel geographic locations. Finally, I analyze the impact of different design choices for the synthetic imagery, and analyze potential reasons why the synthetic imagery is beneficial. Collectively, my work has elucidated a major limitation of modern deep learning models that has prevented their widespread adoption for practical applications. Upon elucidating these limitations, my work has also taken several major steps towards overcoming them and advancing the state of computer vision in overhead imagery.
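Operationally, the augmentation strategy evaluated above amounts to training on the union of real and synthetic tiles; a minimal PyTorch sketch with placeholder datasets (not the MRS framework or the Synthinel data pipeline) is shown below.

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Placeholder datasets of (image, mask) tiles: real-world tiles plus synthetic (Synthinel-style) tiles
real = TensorDataset(torch.rand(100, 3, 128, 128), torch.randint(0, 2, (100, 128, 128)))
synthetic = TensorDataset(torch.rand(100, 3, 128, 128), torch.randint(0, 2, (100, 128, 128)))

# Augment the real training set with synthetic imagery by concatenating the two datasets
train_loader = DataLoader(ConcatDataset([real, synthetic]), batch_size=8, shuffle=True)

for images, masks in train_loader:
    # a segmentation model would be trained here on each mixed real/synthetic batch
    print(images.shape, masks.shape)
    break
```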