Browsing by Subject "Deep learning"
Item Open Access: A 3-D Multiparametric Ultrasound Elasticity Imaging System for Targeted Prostate Biopsy Guidance (2023). Chan, Derek Yu Xuan. Prostate cancer is the most common cancer and second-leading cause of cancer death among men in the United States. Early and accurate diagnosis of prostate cancer remains challenging; following an abnormal rectal exam or elevated levels of prostate-specific antigen in serum, clinical guidelines recommend transrectal ultrasound-guided biopsy. However, lesions are often indistinguishable from noncancerous prostate tissue in conventional B-mode ultrasound images, which have a diagnostic sensitivity of about 30%, so the biopsy is not typically targeted to suspicious regions. Instead, the biopsy systematically samples 12 pre-specified regions of the gland. Systematic sampling often fails to detect cancer during the first biopsy, and while multiparametric MRI (mpMRI) techniques have been developed to guide a targeted biopsy, fused with live ultrasound, this approach remains susceptible to registration errors, and is expensive and less accessible.
The goal of this work is to leverage ultrasound elasticity imaging methods, including acoustic radiation force impulse (ARFI) imaging and shear wave elasticity imaging (SWEI), to develop and optimize a robust 3-D elasticity imaging system for ultrasound-guided prostate biopsies and to quantify its performance in prostate cancer detection. Toward that goal, this dissertation develops and evaluates advanced techniques for generating ARFI and SWEI images, and explores a deep learning framework for multiparametric ultrasound (mpUS) imaging, which combines data from different ultrasound-based modalities.
In Chapter 3, an algorithm is implemented that permits the simultaneous imaging of prostate cancer and zonal anatomy using both ARFI and SWEI. This combined sequence uses closely spaced push beams across the lateral field of view, which enables the collection of higher signal-to-noise ratio (SNR) shear wave data for reconstructing the SWEI volume than is typically acquired. Data from different push locations are combined using an estimated shear wave propagation time between push excitations to align arrival times, resulting in SWEI imaging of prostate cancer with high contrast-to-noise ratio (CNR), enhanced spatial resolution, and reduced reflection artifacts.
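For illustration only, the arrival-time alignment idea can be sketched as below, assuming a constant shear wave speed between push locations; the geometry, variable names, and averaging step are assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def combine_push_acquisitions(arrival_times, push_positions, shear_speed):
    """Align per-push arrival-time maps to a common reference push and average them.

    arrival_times: list of 2-D arrays (lateral x depth), one per push location, in seconds.
    push_positions: lateral coordinate of each push beam, in meters.
    shear_speed: assumed shear wave propagation speed between pushes, in m/s.
    """
    ref = push_positions[0]
    aligned = [t - (pos - ref) / shear_speed   # remove the estimated inter-push travel time
               for t, pos in zip(arrival_times, push_positions)]
    return np.mean(aligned, axis=0)
```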
In Chapter 4, a fully convolutional neural network (CNN) is used for ARFI displacement estimation in the prostate. A novel method for generating ultrasound training data is described, in which synthetic 3-D displacement volumes containing randomly seeded ellipsoids are used to displace scatterers, from which ultrasonic imaging is simulated. The trained network enables the visualization of in vivo prostate cancer and prostate anatomy, with accuracy and speed comparable to standard time-delay estimation approaches.
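A rough sketch of how randomly seeded ellipsoids might be used to build such a synthetic 3-D displacement volume is shown below; the volume size, ellipsoid counts, radii, and amplitudes are illustrative assumptions.

```python
import numpy as np

def random_ellipsoid_displacements(shape=(64, 64, 64), n_ellipsoids=5, seed=0):
    """Synthetic displacement volume built from randomly seeded ellipsoids."""
    rng = np.random.default_rng(seed)
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    vol = np.zeros(shape, dtype=np.float32)
    for _ in range(n_ellipsoids):
        center = rng.uniform(0, np.array(shape), size=3)   # ellipsoid center (voxels)
        radii = rng.uniform(4, 16, size=3)                 # semi-axes (voxels)
        amp = rng.uniform(0.5, 2.0)                        # displacement amplitude
        inside = (((zz - center[0]) / radii[0]) ** 2
                  + ((yy - center[1]) / radii[1]) ** 2
                  + ((xx - center[2]) / radii[2]) ** 2) <= 1.0
        vol[inside] += amp                                 # superimpose overlapping ellipsoids
    return vol
```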
Chapter 5 explores the application of deep learning for mpUS prostate cancer imaging by evaluating the use of a deep neural network (DNN) to generate an mpUS image volume from four ultrasound-based modalities for the detection of prostate cancer: ARFI, SWEI, quantitative ultrasound, and B-mode. The DNN, which was trained to maximize lesion CNR, outperforms the previous method of using a linear support vector machine to combine the input modalities, and generates mpUS image volumes that provide clear visualization of prostate cancer.
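As an illustration of the kind of objective described above, here is a minimal PyTorch sketch of a differentiable lesion contrast-to-noise-ratio loss; the tensor shapes and mask handling are assumptions rather than the dissertation's training code.

```python
import torch

def negative_lesion_cnr(pred, lesion_mask, background_mask, eps=1e-6):
    """Negative CNR between lesion and background voxels of a predicted mpUS volume.

    pred: network output, e.g. shape (batch, 1, depth, height, width).
    lesion_mask / background_mask: binary tensors of the same shape.
    Minimizing this loss maximizes lesion CNR.
    """
    lesion = pred[lesion_mask.bool()]
    background = pred[background_mask.bool()]
    cnr = (lesion.mean() - background.mean()).abs() / (background.std() + eps)
    return -cnr
```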
Chapter 6 presents the results of the first in vivo clinical trial that assesses the use of ARFI imaging for targeted prostate biopsy guidance in a single patient visit, comparing its performance with mpMRI-targeted biopsy and systematic sampling. The process of data acquisition, processing, and biopsy targeting is described. The study demonstrates the feasibility of using 3-D ARFI for guiding a targeted biopsy of the prostate, where it is most sensitive to higher-grade cancers. The findings also indicate the potential for using 2-D ARFI imaging to confirm target location during live B-mode imaging, which could improve existing ultrasonic fusion biopsy workflows.
Chapter 7 summarizes the research findings and considers potential directions for future research. By developing advanced ARFI and SWEI imaging techniques for imaging the prostate gland, and combining information from different ultrasound modalities, prostate cancer and zonal anatomy can be imaged with high contrast and resolution. The findings from this work suggest that ultrasound elasticity imaging holds great promise for facilitating image-guided targeted biopsies of clinically significant prostate cancer.
Item Open Access: A Comparative Study of Radiomics and Deep-Learning Approaches for Predicting Surgery Outcomes in Early-Stage Non-Small Cell Lung Cancer (NSCLC) (2022). Zhang, Haozhao. Purpose: To compare radiomics and deep-learning (DL) methods for predicting NSCLC surgical treatment failure. Methods: A cohort of 83 patients undergoing lobectomy or wedge resection for early-stage NSCLC at our institution was studied. There were 7 local failures and 16 non-local failures (regional and/or distant). Gross tumor volumes (GTV) were contoured on pre-surgery CT datasets after resampling to 1 mm3 isotropic resolution. For the radiomics analysis, 92 radiomics features were extracted from the GTV and z-score normalization was performed. The multivariate association between the extracted features and clinical endpoints was investigated using a random forest model following a 70%-30% training-test split. For the DL analysis, both 2D and 3D designs were implemented using two different deep neural networks as transfer learning problems: in the 2D design, 8×8 cm2 axial fields-of-view (FOVs) centered within the GTV were adopted for VGG-16 training; in the 3D design, 8×8×8 cm3 FOVs centered within the GTV were adopted for training the U-Net's encoder path. In both designs, data augmentation (rotation, translation, flip, noise) was included to overcome potential training convergence problems due to the imbalanced dataset, and the same 70%-30% training-test split was used. The performance of the 3 models (radiomics, 2D-DL, 3D-DL) was tested for predicting outcomes including local failure, non-local failure, and disease-free survival. Sensitivity/specificity/accuracy/ROC results were obtained from their 20 trained versions. Results: The radiomics models showed limited performance in all three outcome prediction tasks. The 2D-DL design showed significant improvement over the radiomics results in predicting local failure (ROC AUC = 0.546±0.056). The 3D-DL design achieved the best performance for all three outcomes (local failure ROC AUC = 0.768±0.051, non-local failure ROC AUC = 0.683±0.027, disease-free survival ROC AUC = 0.694±0.042), with statistically significant improvements over the radiomics/2D-DL results. Conclusions: The 3D-DL design outperformed the 2D-DL design in predicting clinical outcomes after surgery for early-stage NSCLC. By contrast, the classic radiomics approach did not achieve satisfactory results.
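For context, a hedged sketch of a 2D transfer-learning setup of the kind described above (an ImageNet-pretrained VGG-16 with a new single-logit head and rotation/translation/flip augmentation); the frozen layers, head size, and augmentation parameters are illustrative assumptions, not the thesis configuration.

```python
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pretrained VGG-16 with its final classifier layer replaced by a
# single logit for binary outcome prediction (e.g., local failure vs. none).
vgg = models.vgg16(weights="IMAGENET1K_V1")
for p in vgg.features.parameters():
    p.requires_grad = False                  # freeze the convolutional backbone
vgg.classifier[6] = nn.Linear(4096, 1)       # index 6 is VGG-16's last FC layer

# Augmentation loosely mirroring the rotation/translation/flip scheme above;
# additive noise would be attached as an extra custom transform.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.RandomHorizontalFlip(),
])
```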
Item Open Access: A Comprehensive Framework for Adaptive Optics Scanning Light Ophthalmoscope Image Analysis (2019). Cunefare, David. Diagnosis, prognosis, and treatment of many ocular and neurodegenerative diseases, including achromatopsia (ACHM), require the visualization of microscopic structures in the eye. The development of adaptive optics ophthalmic imaging systems has made high resolution visualization of ocular microstructures possible. These systems include the confocal and split detector adaptive optics scanning light ophthalmoscope (AOSLO), which can visualize human cone and rod photoreceptors in vivo. However, the avalanche of data generated by such imaging systems is often too large, costly, and time consuming to be evaluated manually, making automation necessary. The few currently available automated cone photoreceptor identification methods are unable to reliably identify rods and cones in low-quality images of diseased eyes, which are common in clinical practice.
This dissertation describes the development of automated methods for the analysis of AOSLO images, specifically focusing on cone and rod photoreceptors, which are the most commonly studied biomarkers using these systems. A traditional image processing approach, which requires little training data and takes advantage of intuitive image features, is presented for detecting cone photoreceptors in split detector AOSLO images. The focus is then shifted to deep learning using convolutional neural networks (CNNs), which have been shown in other image processing tasks to be more adaptable and produce better results than classical image processing approaches, at the cost of requiring more training data and acting as a “black box”. A CNN-based method for detecting cones is presented and validated against state-of-the-art cone detection methods for confocal and split detector images. The CNN-based method is then modified to take advantage of multimodal AOSLO information in order to detect cones in images of subjects with ACHM. Finally, a significantly faster CNN-based approach is developed for the classification and detection of cones and rods, and is validated on images from both healthy and pathological subjects. Additionally, several image processing and analysis works on optical coherence tomography images that were carried out during the completion of this dissertation are presented.
The completion of this dissertation led to fast and accurate image analysis tools for the quantification of biomarkers in AOSLO images pertinent to an array of retinal diseases, lessening the reliance on subjective and time-consuming manual analysis. For the first time, automatic methods have comparable accuracy to humans for quantifying photoreceptors in diseased eyes. This is an important step in the long-term goal to facilitate early diagnosis, accurate prognosis, and personalized treatment of ocular and neurodegenerative diseases through optimal visualization and quantification of microscopic structures in the eye.
Item Open Access: A Convolutional Neural Network for SPECT Image Reconstruction (2022). Guan, Zixu. Purpose: Single photon emission computed tomography (SPECT) is a functional nuclear medicine imaging technique commonly used in the clinic. However, it suffers from low resolution and high noise because of its physical design and photon scatter and attenuation. This research aims to develop a compact neural network that reconstructs SPECT images from projection data with better resolution and lower noise. Methods and Materials: We developed a MATLAB program to generate 2-D brain phantoms and generated a total of 20,000 phantoms with corresponding projection data. The projection data were processed with a Gaussian filter and Poisson noise to simulate realistic clinical conditions; 16,000 phantoms were used to train the neural network, 2,000 for validation, and the final 2,000 for testing. To further mimic clinical acquisition limits, five groups of projection data with decreasing angular coverage were used to train the network. Inspired by SPECTnet, we used a two-step training strategy: the full-size phantom images (128×128 pixels) were first compressed into a 256×1 vector and then decompressed back to full-size images by an autoencoder (AE) consisting of an encoder and a decoder. The compressed vectors generated by the encoder serve as targets for a second network, which maps projection data to the compressed representation; the predicted vectors are then reconstructed to full-size images by the decoder. Results: A total of 10,000 test images, divided into 5 groups with 360°, 180°, 150°, 120°, and 90° acquisitions, respectively, were reconstructed by the developed neural network, and the results were compared with those generated by the conventional FBP method. Compared with the FBP algorithm, the neural network provides reconstructed images with higher resolution and lower noise, even under limited-angle acquisitions, and it performed better than SPECTnet. Conclusions: The network successfully reconstructs activity images from projection data. Even for acquisitions covering less than 180°, the images reconstructed by the neural network match the quality of images reconstructed from full 360° projection data, and the network is more efficient than SPECTnet. Keywords: SPECT; SPECT image reconstruction; deep learning; convolutional neural network.
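A schematic PyTorch sketch of the two-step design described above (step 1: an autoencoder compressing 128×128 images to a 256-vector; step 2: a network mapping projections to that latent code, decoded back to an image). The layer sizes and projection dimensions are assumptions, since the abstract does not specify them.

```python
import torch.nn as nn

LATENT = 256
N_VIEWS, N_BINS = 120, 128                 # assumed projection (sinogram) geometry

# Step 1: autoencoder trained to compress/decompress 128x128 phantom images.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(128 * 128, 1024), nn.ReLU(),
                        nn.Linear(1024, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 1024), nn.ReLU(),
                        nn.Linear(1024, 128 * 128), nn.Unflatten(1, (1, 128, 128)))

# Step 2: maps projections to the frozen encoder's latent codes.
projection_net = nn.Sequential(nn.Flatten(), nn.Linear(N_VIEWS * N_BINS, 1024),
                               nn.ReLU(), nn.Linear(1024, LATENT))

def reconstruct(projections):
    """Inference path: projections -> predicted latent code -> decoded activity image."""
    return decoder(projection_net(projections))
```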
Item Open Access: A Deep-Learning Method of Automatic VMAT Planning via MLC Dynamic Sequence Prediction (AVP-DSP) Using 3D Dose Prediction: A Feasibility Study of Prostate Radiotherapy Application (2020). Ni, Yimin. Introduction: VMAT treatment planning requires a time-consuming DVH-based inverse optimization process, which impedes its application in time-sensitive situations. This work aims to develop a deep-learning based algorithm, Automatic VMAT Planning via MLC Dynamic Sequence Prediction (AVP-DSP), for rapid prostate VMAT treatment planning.
Methods: AVP-DSP utilizes a series of 2D projections of a patient's dose prediction and contour structures to generate a single 360° dynamic MLC sequence in a VMAT plan. The backbone of AVP-DSP is a novel U-net implementation which has a 4-resolution-step analysis path and a 4-resolution-step synthesis path. AVP-DSP was developed based on 131 previous prostate patients who received simultaneously-integrated-boost (SIB) treatment (58.8 Gy/70 Gy to PTV58.8/PTV70 in 28 fx). All patients were planned by a 360° single-arc VMAT technique using an in-house intelligent planning tool in a commercial treatment planning system (TPS). 120 plans were used in AVP-DSP training/validation, and 11 plans were used as independent tests. Key dosimetric metrics achieved by AVP-DSP were compared against the ones planned by the commercial TPS.
Results: After dose normalization (PTV70 V70Gy = 95%), all 11 AVP-DSP test plans met institutional clinical guidelines for dose distribution outside the PTV. Bladder (V70Gy = 6.8±3.6 cc, V40Gy = 19.4±9.2%) and rectum (V70Gy = 2.8±1.8 cc, V40Gy = 26.3±5.9%) results in AVP-DSP plans were comparable with the commercial TPS plan results (bladder V70Gy = 4.1±2.0 cc, V40Gy = 17.7±8.9%; rectum V70Gy = 1.4±0.7 cc, V40Gy = 24.0±5.0%). 3D max dose results in AVP-DSP plans (D1cc = 118.9±4.1%) were higher than the commercial TPS plan results (D1cc = 106.7±0.8%). On average, AVP-DSP used 30 seconds for plan generation, in contrast to the current clinical practice (>20 minutes).
Conclusion: Results suggest that AVP-DSP can generate a prostate VMAT plan with clinically acceptable dosimetric quality. With its high efficiency, AVP-DSP may hold great potential for real-time planning applications after further validation.
Item Open Access: A Deep-Learning-based Multi-segment VMAT Plan Generation Algorithm from Patient Anatomy for Prostate Simultaneous Integrated Boost (SIB) Cases (2021). Zhu, Qingyuan. Introduction: Several studies have realized fluence-map-prediction-based DL IMRT planning algorithms. However, DL-based VMAT planning remains unsolved. A main difficulty in DL-based VMAT planning is how to generate leaf sequences from the predicted radiation intensity maps: leaf sequences are required for a large number of control points and must meet the physical restrictions of the MLC. A previous study [1] reported a DL algorithm that generates 64-beam IMRT plans to approximate VMAT plans, taking certain dose distributions as input. As a step forward, another study [2] reported a DL algorithm that generates one-arc VMAT plans from patient anatomy, deriving the MLC leaf sequence from thresholded predicted intensity maps. Building on that work, we developed an algorithm to convert DL-predicted intensity maps into multi-segment VMAT plans to improve on the performance of one-arc plans.
Methods: Our deep learning model utilizes a series of 2D projections of a patient's dose prediction and contour structures to generate a multi-arc 360° dynamic MLC sequence in a VMAT plan. The backbone of this model is a novel U-net implementation which has a 4-resolution-step analysis path and a 4-resolution-step synthesis path. The pretrained DL model involved a total of 130 patients, with 120 patients in the training group and 11 patients in the testing group. These patients were prescribed 70 Gy/58.8 Gy to the primary/boost PTVs in 28 fractions in a simultaneous integrated boost (SIB) regimen. In this study, 7-8 arcs with the same collimator angle are used to simulate the predicted intensity maps: the predicted intensity maps are separated into 7-8 segments along the collimator angle, so the arcs can separately simulate the predicted intensity maps with independent weight factors. This separation also potentially allows the MLC leaves to simulate more of the dose gradient in the predicted intensity maps. Results: After dose normalization (PTV70 V70Gy = 95%), all 11 multi-segment test plans met institutional clinical guidelines for dose distribution outside the PTV. Bladder (V70Gy = 5.3±3.3 cc, V40Gy = 16.1±8.6%) and rectum (V70Gy = 4.5±2.3 cc, V40Gy = 33.4±8.1%) results in multi-segment plans were comparable with the commercial TPS plan results. 3D max dose results in AVP-DSP plans (D1cc = 112.6±1.9%) were higher than the commercial TPS plan results (D1cc = 106.7±0.8%). On average, AVP-DSP used 600 seconds for plan generation, in contrast to the current clinical practice (>20 minutes).
Conclusion: Results suggest that the multi-segment approach can generate a prostate VMAT plan with clinically acceptable dosimetric quality. The proposed multi-segment plan generation algorithm is capable of achieving higher modulation and lower maximum dose. With its high efficiency, the multi-segment approach may hold great potential for real-time planning applications after further validation.
Item Open Access: A Radiomics-Incorporated Deep Ensemble Learning Model for Multi-Parametric MRI-based Glioma Segmentation (2023). Yang, Chen. Purpose: To develop a deep ensemble learning model with a radiomics spatial encoding execution for improved glioma segmentation accuracy using multi-parametric MRI (mp-MRI). Materials/Methods: This radiomics-incorporated deep ensemble learning model was developed using 369 glioma patients with a 4-modality mp-MRI protocol: T1, contrast-enhanced T1 (T1-Ce), T2, and FLAIR. In each modality volume, a 3D sliding kernel was implemented across the brain to capture image heterogeneity: fifty-six radiomic features were extracted within the kernel, resulting in a 4th-order tensor. Each radiomic feature can then be encoded as a 3D image volume, namely a radiomic feature map (RFM). For each patient, all RFMs extracted from all 4 modalities were processed by Principal Component Analysis (PCA) for dimension reduction, and the first 4 principal components (PCs) were selected. Next, four deep neural networks following the U-net architecture were trained for segmentation of a region of interest (ROI): each network utilizes the mp-MRI and 1 of the 4 PCs as a 5-channel input for 2D execution. Last, the 4 softmax probability results given by the U-net ensemble were superimposed and binarized by Otsu's method as the segmentation result. Three deep ensemble models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT), respectively. Segmentation results given by the proposed ensemble were compared to the mp-MRI-only U-net results. Results: All 3 radiomics-incorporated deep learning ensemble models were successfully implemented: compared to the mp-MRI-only U-net results, the Dice coefficients of ET (0.777→0.817), TC (0.742→0.757), and WT (0.823→0.854) demonstrated improvements. Accuracy, sensitivity, and specificity results demonstrated the same patterns. Conclusion: The adopted radiomics spatial encoding execution enriches the image heterogeneity information, leading to the successful demonstration of the proposed neural network ensemble design, which offers a new tool for mp-MRI-based medical image segmentation.
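A condensed sketch of two pieces of the pipeline described above: PCA reduction of per-voxel radiomic feature vectors and Otsu-thresholded fusion of the four U-net probability maps. Array shapes and flattening conventions are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.filters import threshold_otsu

def radiomic_principal_components(rfms, n_components=4):
    """Project per-voxel radiomic feature vectors onto their first 4 PCs.

    rfms: (n_voxels, n_features) array flattened from all radiomic feature maps
    of all 4 MRI modalities for one patient; returns an (n_voxels, 4) array.
    """
    return PCA(n_components=n_components).fit_transform(rfms)

def fuse_ensemble(prob_maps):
    """Superimpose the 4 U-net softmax maps and binarize with Otsu's method."""
    fused = np.sum(prob_maps, axis=0)       # summed probability, (depth, height, width)
    return fused > threshold_otsu(fused)    # binary segmentation mask
```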
Item Open Access: Accelerating Brain DTI and GYN MRI Studies Using Neural Network (2021). Yan, Yuhao. There is an ongoing demand to accelerate the time-consuming MRI acquisition process. Many methods have been proposed to achieve this goal, including deep learning, which appears to be a more robust tool than conventional methods. While much work has been done to evaluate the performance of neural networks on standard anatomical MR images, little attention has been paid to accelerating other, less conventional MR image acquisitions. This work aims to evaluate the feasibility of neural networks for accelerating brain DTI and gynecological brachytherapy MRI. Three neural networks, U-net, Cascade-net and PD-net, were evaluated. Brain DTI data were acquired from the public RIDER NEURO MRI database, while cervix gynecological MRI data were acquired from Duke University Hospital clinical data. A 25% Cartesian undersampling strategy was applied to all the training and test data. Diffusion-weighted images and quantitative functional maps in brain DTI, and T1-spgr and T2 images in the GYN studies, were reconstructed. The performance of the neural networks was evaluated by quantitatively calculating the similarity between the reconstructed images and the reference images, using the metric Total Relative Error (TRE). Results showed that, with the architectures and parameters set in this work, all three neural networks could accelerate brain DTI and GYN T2 MR imaging. Generally, PD-net slightly outperformed Cascade-net, and both outperformed U-net with respect to image reconstruction performance. While this was also true for reconstruction of quantitative functional diffusion-weighted maps and GYN T1-spgr images, the overall performance of the three neural networks on these two tasks needs further improvement. In conclusion, PD-net is very promising for accelerating T2-weighted MR imaging. Future work can focus on adjusting the parameters and architectures of the neural networks to improve performance on accelerating GYN T1-spgr MR imaging, and on adopting more robust undersampling strategies, such as radial undersampling, to further improve the overall acceleration performance.
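A small sketch of the evaluation metric and undersampling scheme mentioned above; the exact TRE definition and line-selection pattern used in the thesis are not given in the abstract, so both are assumptions here.

```python
import numpy as np

def total_relative_error(recon, ref, eps=1e-12):
    """One common definition of Total Relative Error (TRE): summed absolute
    voxel-wise error normalized by the summed absolute reference signal."""
    return np.sum(np.abs(recon - ref)) / (np.sum(np.abs(ref)) + eps)

def cartesian_undersampling_mask(shape, fraction=0.25, seed=0):
    """Retain a random 25% of phase-encoding lines (rows) of k-space."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(shape, dtype=bool)
    keep = rng.choice(shape[0], size=int(fraction * shape[0]), replace=False)
    mask[keep, :] = True
    return mask
```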
Item Open Access: Accelerator Architectures for Deep Learning and Graph Processing (2020). Song, Linghao. Deep learning and graph processing are two big-data applications and they are widely applied in many domains. The training of deep learning models is essential for inference and has not yet been fully studied. With data forward, error backward, and gradient calculation, deep learning training is a more complicated process with higher computation and communication intensity. Distributing computations on multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. In this dissertation, I present AccPar, a principled and systematic method of determining the tensor partition for multiple heterogeneous accelerators for efficient training acceleration. Emerging resistive random access memory (ReRAM) is promising for processing in memory (PIM). For high-throughput training acceleration in ReRAM-based PIM accelerators, I present PipeLayer, an architecture for layer-wise pipelined parallelism. Graph processing is well known for poor locality and high memory bandwidth demand. In conventional architectures, graph processing incurs a significant amount of data movement and energy consumption. I present GraphR, the first ReRAM-based graph processing accelerator, which follows the principle of near-data processing and explores the opportunity of performing massive parallel analog operations with low hardware and energy cost. Sparse matrix-vector multiplication (SpMV), a subset of graph processing, is the key computation in iterative solvers for scientific computing. Efficiently accelerating floating-point processing in ReRAM remains a challenge. In this dissertation, I present ReFloat, a data format and a supporting accelerator architecture for low-cost floating-point processing in ReRAM for scientific computing.
Item Open Access: Advancing Deep-Generated Speech and Defending against Its Misuse (2023). Cai, Zexin. Deep learning has revolutionized speech generation, spanning synthesis areas such as text-to-speech and voice conversion, leading to diverse advancements. On the one hand, when trained on high-quality datasets, artificial voices now exhibit a level of synthesized quality that rivals human speech in naturalness. On the other, cutting-edge deep synthesis research is making strides in producing controllable systems, allowing for generating audio signals in arbitrary voice and speaking style.
Yet, despite their impressive synthesis capabilities, current speech generation systems still face challenges in controlling and manipulating speech attributes. Control over crucial attributes, such as speaker identity and language, essential for enhancing the functionality of a synthesis system, still needs to be improved. Specifically, systems capable of cloning a target speaker's voice in cross-lingual contexts or replicating unseen voices are still in their nascent stages. On the other hand, the heightened naturalness of synthesized speech has raised concerns, posing security threats to both humans and automated speech processing systems. The rise of accessible audio deepfakes, capable of spreading misinformation or bypassing biometric security, accentuates the complex interplay between advancing deep-synthesized speech and defending against its misuse.
Consequently, this dissertation delves into the dynamics of deep-generated speech, viewing it from two perspectives. Offensively, we aim to enhance synthesis systems to elevate their capabilities. On the defensive side, we introduce methodologies to counter emerging audio deepfake threats, offering solutions grounded in detection-based approaches and reliable synthesis system design.
Our research yields several noteworthy findings and conclusions. First, we present an improved voice cloning method incorporated with our novel feedback speaker consistency mechanism. Second, we demonstrate the feasibility of achieving cross-lingual multi-speaker speech synthesis with a limited amount of bilingual data, offering a synthesis method capable of producing diverse audio across various speakers and languages. Third, our proposed frame-level detection model for partially fake audio attacks proves effective in detecting tampered utterances and locating the modified regions within. Lastly, by employing an invertible synthesis system, we can trace back to the original speaker of a converted utterance. Despite these strides, each domain of our study still confronts challenges, further fueling our motivation for persistent research and refinement of the associated performance.
Item Open Access: Advancing the Design and Utility of Adversarial Machine Learning Methods (2021). Inkawhich, Nathan Albert. While significant progress has been made to craft Deep Neural Networks (DNNs) with super-human recognition performance, their reliability and robustness in challenging operating conditions is still a major concern. In this work, we study multiple facets of the DNN robustness problem by pursuing two main threads of research. The key methodological linkage throughout our investigations is the consistent design, development, utilization, and deployment of Adversarial Machine Learning techniques, which have remarkable abilities to both degrade and enhance model performance. Our ultimate goal is to help construct the safer and more reliable models of the future.
In the first thread of research, we take the perspective of an adversary who wishes to find novel and increasingly potent ways to fool current DNN models. Our approach is centered around the development of a feature space attack, and the construction of novel adversarial threat models that work to reduce required knowledge assumptions. Interestingly, we find that a transfer-based blackbox adversary can be significantly more powerful than previously believed, and can reliably cause targeted misclassifications with imperceptible noises. Further, we find that the attacker does not necessarily require access to the target model's training distribution to create transferable attacks, which is a more practically concerning scenario due to the reduction of required attacker knowledge.
Along the second thread of research, we take the perspective of a DNN model designer whose job is to create systems capable of robust operation in "open-world" environments, where both known and unknown target types may be encountered. Our approach is to establish a classifier + out-of-distribution (OOD) detector system co-design that is centered around an adversarial training procedure and an outlier exposure-based learning objective. Through various experiments, we find that our systems can achieve high accuracy in extended operating conditions, while reliably detecting and rejecting fine-grained OOD target types. We also develop a method for efficiently improving OOD detection by learning from the deployment environment. Overall, by exposing novel vulnerabilities of current DNNs while also improving the reliability of existing models to known vulnerabilities, our work makes significant progress towards creating the next-generation of more trustworthy models.
Item Open Access: Applications of Deep Learning, Machine Learning, and Remote Sensing to Improving Air Quality and Solar Energy Production (2021). Zheng, Tongshu. Exposure to higher PM2.5 can lead to increased risk of mortality; however, the spatial concentrations of PM2.5 are not well characterized, even in megacities, due to the sparseness of regulatory air quality monitoring (AQM) stations. This motivates novel low-cost methods to estimate ground-level PM2.5 at a fine spatial resolution so that PM2.5 exposure in epidemiological research can be better quantified and local PM2.5 hotspots at a community level can be automatically identified. Wireless low-cost particulate matter sensor networks (WLPMSNs) are among these novel low-cost methods that transform air quality monitoring by providing PM information at finer spatial and temporal resolutions. However, large-scale WLPMSN calibration and maintenance remain a challenge: the manual labor involved in initial calibration by collocation and routine recalibration is intensive; the transferability of the calibration models determined from initial collocation to new deployment sites is questionable, as calibration factors typically vary with the urban heterogeneity of operating conditions and aerosol optical properties; and low-cost sensors can drift or degrade over time. This work presents a simultaneous Gaussian Process regression (GPR) and simple linear regression pipeline to calibrate and monitor dense WLPMSNs on the fly by leveraging all available reference monitors across an area without resorting to pre-deployment collocation calibration. We evaluated our method for Delhi, where the PM2.5 measurements of all 22 regulatory reference and 10 low-cost nodes were available for 59 days from January 1, 2018 to March 31, 2018 (PM2.5 averaged 138 ± 31 μg m-3 among the 22 reference stations), using a leave-one-out cross-validation (CV) over the 22 reference nodes. We showed that our approach can achieve an overall 30 % prediction error (RMSE: 33 μg m-3) at a 24 h scale and is robust, as underscored by the small variability in the GPR model parameters and in the model-produced calibration factors for the low-cost nodes among the 22-fold CV. Of the 22 reference stations, high-quality predictions were observed for those stations whose PM2.5 means were close to the Delhi-wide mean (i.e., 138 ± 31 μg m-3), and relatively poor predictions for those nodes whose means differed substantially from the Delhi-wide mean (particularly on the lower end). We also observed washed-out local variability in PM2.5 across the 10 low-cost sites after calibration using our approach, which stands in marked contrast to the true wide variability across the reference sites. These observations revealed that our proposed technique (and more generally the geostatistical technique) requires high spatial homogeneity in the pollutant concentrations to be fully effective. We further demonstrated that our algorithm performance is insensitive to training window size, as the mean prediction error rate and the standard error of the mean (SEM) for the 22 reference stations remained consistent at ~30 % and ~3–4 % when an increment of 2 days' data was included in the model training.
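To make the calibration idea concrete, here is a minimal scikit-learn sketch of Gaussian Process regression with leave-one-out cross-validation over reference stations; the coordinates, PM2.5 values, and kernel are synthetic placeholders, not the study's data or model settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.uniform([76.8, 28.4], [77.4, 28.9], size=(22, 2))   # placeholder lon/lat of 22 stations
y = rng.normal(138.0, 31.0, size=22)                        # placeholder 24-h mean PM2.5 (ug/m3)

kernel = 1.0 * RBF(length_scale=0.1) + WhiteKernel(noise_level=1.0)
errors = []
for train, test in LeaveOneOut().split(X):
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(X[train], y[train])                              # fit on 21 stations
    errors.append(abs(gpr.predict(X[test])[0] - y[test][0]) / y[test][0])

print(f"mean leave-one-out relative error: {np.mean(errors):.2%}")
```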
The markedly low requirement of our algorithm for training data enables the models to always be nearly most updated in the field, thus realizing the algorithm’s full potential for dynamically surveilling large-scale WLPMSNs by detecting malfunctioning low-cost nodes and tracking the drift with little latency. Our algorithm presented similarly stable 26–34 % mean prediction errors and ~3–7 % SEMs over the sampling period when pre-trained on the current week’s data and predicting 1 week ahead, therefore suitable for online calibration. Simulations conducted using our algorithm suggest that in addition to dynamic calibration, the algorithm can also be adapted for automated monitoring of large-scale WLPMSNs. In these simulations, the algorithm was able to differentiate malfunctioning low-cost nodes (due to either hardware failure or under heavy influence of local sources) within a network by identifying aberrant model-generated calibration factors (i.e., slopes close to zero and intercepts close to the Delhi-wide mean of true PM2.5). The algorithm was also able to track the drift of low-cost nodes accurately within 4 % error for all the simulation scenarios. The simulation results showed that ~20 reference stations are optimum for our solution in Delhi and confirmed that low-cost nodes can extend the spatial precision of a network by decreasing the extent of pure interpolation among only reference stations. Our solution has substantial implications in reducing the amount of manual labor for the calibration and surveillance of extensive WLPMSNs, improving the spatial comprehensiveness of PM evaluation, and enhancing the accuracy of WLPMSNs. Satellite-based ground-level PM2.5 modeling is another such low-cost method. Satellite-retrieved aerosol products are in particular widely used to estimate the spatial distribution of ground-level PM2.5. However, these aerosol products can be subject to large uncertainties due to many approximations and assumptions made in multiple stages of their retrieval algorithms. Therefore, estimating ground-level PM2.5 directly from satellites (e.g., satellite images) by skipping the intermediate step of aerosol retrieval can potentially yield lower errors because it avoids retrieval error propagating into PM2.5 estimation and is desirable compared to current ground-level PM2.5 retrieval methods. Additionally, the spatial resolutions of estimated PM2.5 are usually constrained by those of the aerosol products and are currently largely at a comparatively coarse 1 km or greater resolution. Such coarse spatial resolutions are unable to support scientific studies that thrive on highly spatially-resolved PM2.5. These limitations have motivated us to devise a computer vision algorithm for estimating ground-level PM2.5 at a high spatiotemporal resolution by directly processing the global-coverage, daily, near real-time updated, 3 m/pixel resolution, three-band micro-satellite imagery of spatial coverages significantly smaller than 1 × 1 km (e.g., 200 × 200 m) available from Planet Labs. In this study, we employ a deep convolutional neural network (CNN) to process the imagery by extracting image features that characterize the day-to-day dynamic changes in the built environment and more importantly the image colors related to aerosol loading, and a random forest (RF) regressor to estimate PM2.5 based on the extracted image features along with meteorological conditions. We conducted the experiment on 35 AQM stations in Beijing over a period of ~3 years from 2017 to 2019. 
We trained our CNN-RF model on 10,400 available daily images of the AQM stations labeled with the corresponding ground-truth PM2.5 and evaluated the model performance on 2622 holdout images. Our model estimates ground-level PM2.5 accurately at a 200 m spatial resolution with a mean absolute error (MAE) as low as 10.1 μg m-3 (equivalent to 23.7% error) and Pearson and Spearman r scores up to 0.91 and 0.90, respectively. Our trained CNN from Beijing is then applied to Shanghai, a similar urban area. By quickly retraining only RF but not CNN on the new Shanghai imagery dataset, our model estimates Shanghai 10 AQM stations’ PM2.5 accurately with a MAE and both Pearson and Spearman r scores of 7.7 μg m-3 (18.6% error) and 0.85, respectively. The finest 200 m spatial resolution of ground-level PM2.5 estimates from our model in this study is higher than the vast majority of existing state-of-the-art satellite-based PM2.5 retrieval methods. And our 200 m model’s estimation performance is also at the high end of these state-of-the-art methods. Our results highlight the potential of augmenting existing spatial predictors of PM2.5 with high-resolution satellite imagery to enhance the spatial resolution of PM2.5 estimates for a wide range of applications, including pollutant emission hotspot determination, PM2.5 exposure assessment, and fusion of satellite remote sensing and low-cost air quality sensor network information. We later, however, found out that this CNN-RF sequential model, despite effectively capturing spatial variations, yields higher average PM2.5 prediction errors than its RF part alone using only meteorological conditions, most likely the result of CNN-RF sequential model being unable to fully use the information in satellite images in the presence of meteorological conditions. To break this bottleneck in PM2.5 prediction performance, we reformulated the previous CNN-RF sequential model into a RF-CNN joint model that adopts a residual learning ideology that forces the CNN part to most effectively exploit the information in satellite images that is only “orthogonal” to meteorology. The RF-CNN joint model achieved low normalized root mean square error for PM2.5 of within ~31% and normalized mean absolute error of within ~19% on the holdout samples in both Delhi and Beijing, better than the performances of both the CNN-RF sequential model and the RF part alone using only meteorological conditions. To date, few studies have used their simulated ambient PM2.5 to detect hotspots. Furthermore, even the hotspots studied in these very limited works are all “global” hotspots that have the absolute highest PM2.5 levels in the whole study region. Little is known about “local” hotspots that have the highest PM2.5 only relative to their neighbors at fine-scale community levels, even though the disparities in outdoor PM2.5 exposures and their associated risks of mortality between populations in local hotspots and coolspots within the same communities can be rather large. These limitations motivated us to concatenate a local contrast normalization (LCN) algorithm at the end of the RF-CNN joint model to automatically reveal local PM2.5 hotspots from the estimated PM2.5 maps. The RF-CNN-LCN pipeline reasonably predicts urban PM2.5 local hotspots and coolspots by capturing both the main intra-urban spatial trends in PM2.5 and the local variations in PM2.5 with urban landscape, with local hotspots relating to compact urban spatial structures while coolspots being open areas and green spaces. 
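As a toy sketch of the residual-learning idea behind the RF-CNN joint model described above: the random forest predicts PM2.5 from meteorology alone, and a CNN is trained on imagery to explain only the remaining residual. The data, network size, and training loop are placeholders, not the study's implementation.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

n = 64
met = np.random.rand(n, 6).astype("float32")            # placeholder meteorology features
imgs = torch.rand(n, 3, 64, 64)                         # placeholder 3-band image patches
pm25 = np.random.rand(n).astype("float32") * 200.0      # placeholder PM2.5 labels

# Stage 1: random forest trained on meteorology alone.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(met, pm25)
rf_pred = torch.tensor(rf.predict(met), dtype=torch.float32)
target = torch.tensor(pm25)

# Stage 2: a tiny CNN learns only the residual that meteorology cannot explain.
cnn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)
for _ in range(20):
    residual = cnn(imgs).squeeze(1)                     # imagery explains the residual
    loss = ((rf_pred + residual - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```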
Based on 20 sampled representative neighborhoods in Delhi, our pipeline revealed that on average a significant 9.2 ± 4.0 μg m-3 long-term PM2.5 exposure difference existed between the local hotspots and coolspots within the same community, with Indian Gandhi International Airport area having the steepest increase of 20.3 μg m-3 from the coolest spot (the residential area immediately outside the airport) to the hottest spot (airport runway). This work provides a possible means of automatically identifying local PM2.5 hotspots at 300 m in heavily polluted megacities. It highlights the potential existence of substantial health inequalities in long-term outdoor PM2.5 exposures within even the same local neighborhoods between local hotspots and coolspots. Apart from posing serious health risks, deposition of dust and anthropogenic particulate matter (PM) on solar photovoltaics (PVs), known as soiling, can diminish solar energy production appreciably. As of 2018, the global cumulative PV capacity crossed 500 GW, of which at least 3–4% was estimated to be lost due to soiling, equivalent to ~4–6 billion USD revenue losses. In the context of a projected ~16-fold increase of global solar capacity to 8.5 TW by 2050, soiling will play an increasingly more important part in estimating and forecasting the performance and economics of solar PV installations. However, reliable soiling information is currently lacking because the existing soiling monitoring systems are expensive. This work presents a low-cost remote sensing algorithm that estimates utility-scale solar farms’ daily solar energy loss due to PV soiling by directly processing the daily (near real-time updated), 3 m/pixel resolution, and global coverage micro-satellite surface reflectance (SR) analytic product from the commercial satellite company Planet. We demonstrate that our approach can estimate daily soiling loss for a solar farm in Pune, India over three years that on average caused ~5.4% reduction in solar energy production. We further estimated that around 437 MWh solar energy was lost in total over the 3 years, equivalent to ~11799 USD, at this solar farm. Our approach’s average soiling estimation matches perfectly with the ~5.3% soiling loss reported by a previous published model for this solar farm site. Compared to other state-of-the-art PV soiling modeling approaches, the proposed unsupervised approach has the benefit of estimating PV soiling at a precisely solar farm level (as in contrast to coarse regional modeling for only large spatial grids in which a solar farm resides) and at an unprecedently high temporal resolution (i.e., 1 day) without resorting to solar farms’ proprietary solar energy generation data or knowledge about the specific components of deposited PM or these species’ dry deposition flux and other physical properties. Our approach allows solar farm owners to keep close track of the intensity of soiling at their sites and perform panel cleaning operations more strategically rather than based on a fixed schedule.
Item Open Access: Cone Beam Computed Tomography Image Quality Augmentation using Novel Deep Learning Networks (2019). Zhao, Yao. Purpose: Cone beam computed tomography (CBCT) plays an important role in image guidance for interventional radiology and radiation therapy by providing 3D volumetric images of the patient. However, CBCT suffers from relatively low image quality with severe image artifacts due to the nature of the image acquisition and reconstruction process. This work investigated the feasibility of using deep learning networks to substantially augment the image quality of CBCT by learning a direct mapping from the original CBCT images to their corresponding ground truth CT images. The possibility of using deep learning for scatter correction in CBCT projections was also investigated.
Methods: Two deep learning networks, i.e. a symmetric residual convolutional neural network (SR-CNN) and a U-net convolutional network, were trained to use the input CBCT images to produce high-quality CBCT images that match the corresponding ground truth CT images. Both clinical and Monte Carlo simulated datasets were included for model training. In order to eliminate misalignments between CBCT and the corresponding CT, rigid registration was applied to the clinical database. Binary masks obtained with Otsu's auto-thresholding method were applied to the Monte Carlo simulated data to avoid the negative impact of non-anatomical structures on the images. After model training, a new set of CBCT images was fed into the trained network to obtain augmented CBCT images, and the performance was evaluated and compared both qualitatively and quantitatively. The augmented CBCT images were quantitatively compared to CT using the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM).
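As a pointer to how the quantitative comparison could be computed, a short sketch using scikit-image's PSNR and SSIM metrics; the data-range handling is an assumption, not a detail stated in the thesis.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_augmented_cbct(augmented, reference_ct):
    """PSNR and SSIM of an augmented CBCT volume against its registered CT."""
    data_range = float(reference_ct.max() - reference_ct.min())
    psnr = peak_signal_noise_ratio(reference_ct, augmented, data_range=data_range)
    ssim = structural_similarity(reference_ct, augmented, data_range=data_range)
    return psnr, ssim
```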
Regarding the use of deep learning for scatter correction in CBCT, the scatter signal for each projection was obtained by Monte Carlo simulation. A U-net model was trained to predict the scatter signals from the original CBCT projections. The predicted scatter components were then subtracted from the original CBCT projections to obtain scatter-corrected projections. CBCT images reconstructed from the scatter-corrected projections were quantitatively compared with those reconstructed from the original projections.
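A minimal sketch of the projection-domain correction step described above; clipping negative values to zero is an assumption rather than a detail stated in the thesis.

```python
import numpy as np

def scatter_correct(projections, scatter_model):
    """Subtract model-predicted scatter from raw CBCT projections.

    projections: stack of 2-D projections, shape (n_views, rows, cols).
    scatter_model: callable returning a scatter estimate of the same shape
    (here, the trained U-net).
    """
    scatter = scatter_model(projections)
    return np.clip(projections - scatter, a_min=0.0, a_max=None)
```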
Results: The augmented CBCT images produced by both the SR-CNN and U-net models showed substantial improvement in image quality. Compared to the original CBCT, the augmented CBCT images also achieved much higher PSNR and SSIM in quantitative evaluation. U-net demonstrated better performance than SR-CNN in quantitative evaluation and computational speed for CBCT image quality augmentation.
With the scatter correction in CBCT projections predicted by U-net, the scatter-corrected CBCT images demonstrated substantial improvement of the image contrast and anatomical details compared to the original CBCT images.
Conclusion: The proposed deep learning models can effectively augment CBCT image quality by correcting artifacts and reducing scatter. Given their relatively fast computational speeds and great performance, they can potentially become valuable tools to substantially enhance the quality of CBCT to improve its precision for target localization and adaptive radiotherapy.
Item Open Access: Data Driven Style Transfer for Remote Sensing Applications (2022). Stump, Evan. Recent recognition models for remote sensing data (e.g., infrared cameras) are based upon machine learning models such as deep neural networks (DNNs) and typically require large quantities of labeled training data. However, many applications in remote sensing suffer from limited quantities of training data. To address this problem, we explore style transfer methods to leverage preexisting large and diverse datasets in more data-abundant sensing modalities (e.g., color imagery) so that they can be used to train recognition models on data-scarce target tasks. We first explore the potential efficacy of style transfer in the context of Buried Threat Detection using ground penetrating radar data. Based upon this work we found that simple pre-processing of downward-looking GPR makes it suitable to train machine learning models that are effective at recognizing threats in hand-held GPR. We then explore cross modal style transfer (CMST) for color-to-infrared stylization. We evaluate six contemporary CMST methods on four publicly-available IR datasets, the first comparison of its kind. Our analysis reveals that existing data-driven methods are either too simplistic or introduce significant artifacts into the imagery. To overcome these limitations, we propose meta-learning style transfer (MLST), which learns a stylization by composing and tuning well-behaved analytic functions. We find that MLST leads to more complex stylizations without introducing significant image artifacts and achieves the best overall performance on our benchmark datasets.
Item Open Access: Deep Automatic Threat Recognition: Considerations for Airport X-Ray Baggage Screening (2020). Liang, Kevin J. Deep learning has made significant progress in recent years, contributing to major advancements in many fields. One such field is automatic threat recognition, where methods based on neural networks have surpassed more traditional machine learning methods. In particular, we evaluate the performance of convolutional object detection models within the context of X-ray baggage screening at airport checkpoints. To do so, we collected a large dataset of scans containing threats from a diverse set of classes, and then trained and compared a number of models. Many currently deployed X-ray scanners contain multiple X-ray emitter-detector pairs arranged to give multiple views of the scanned object, and we find that combining predictions from these improves overall performance. We select the best-performing models fitting our design criteria and integrate them into the X-ray scanning machines, resulting in functional prototypes capable of simulating live screening deployment.
We also explore a number of subfields of deep learning with potential to improve these deep automatic threat recognition algorithms. For example, as data collection efforts are scaled up and the number of threat categories is expanded, the likelihood of missing annotations will also increase, especially if this new data is collected from real airport traffic. Such a setting is actually common in object detection datasets, and we show that a positive-unlabeled learning assumption better fits the characteristics of the data. Additionally, real-world data distributions tend to drift over time or evolve cyclically with the seasons. Baggage scan images also tend to be sensitive, meaning storing data may represent a security or privacy risk. As a result, a continual learning setting may be more appropriate for these kinds of data, which we examine in the context of generative adversarial networks. Finally, the sensitivity of security applications makes understanding models especially important. We thus spend some time examining how certain popular neural networks emerge from assumptions made starting from kernel methods. Through these works, we find that deep learning methods show considerable promise to improve existing automatic threat recognition systems.
Item Open Access: Deep Generative Models for Image Representation Learning (2018). Pu, Yunchen. Recently there has been increasing interest in developing generative models of data, offering the promise of learning based on the often vast quantity of unlabeled data. With such learning, one typically seeks to build rich, hierarchical probabilistic models that are able to fit the distribution of complex real data and are also capable of realistic data synthesis. In this dissertation, novel models and learning algorithms are proposed for deep generative models.
This dissertation consists of three main parts.
The first part developed a deep generative model for joint analysis of images and associated labels or captions. The model is efficiently learned using a variational autoencoder. A multilayered (deep) convolutional dictionary representation is employed as a decoder of the latent image features. Stochastic unpooling is employed to link consecutive layers in the image model, yielding top-down image generation. A deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test time, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone. Excellent results are obtained on several benchmark datasets, including ImageNet, demonstrating that the proposed model achieves results that are highly competitive with similarly sized convolutional neural networks.
The second part developed a new method for learning variational autoencoders (VAEs), based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets.
The third part developed a new form of variational autoencoder, in which the joint distribution of data and codes is considered in two (symmetric) forms: (i) from observed data fed through the encoder to yield codes, and (ii) from latent codes drawn from a simple prior and propagated through the decoder to manifest data. Lower bounds are learned for the marginal log-likelihood fits to the observed data and latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence of the joint density functions from (i) and (ii), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.
Item Open Access: Deep Generative Models for Vision and Language Intelligence (2018). Gan, Zhe. Deep generative models have achieved tremendous success in recent years, with applications in various tasks involving vision and language intelligence. In this dissertation, I will mainly discuss the contributions that I have made in this field during my Ph.D. study. Specifically, the dissertation is divided into two parts.
In the first part, I will mainly focus on one specific kind of deep directed generative model, called the Sigmoid Belief Network (SBN). First, I will present a fully Bayesian algorithm for efficient learning and inference of the SBN. Second, since the original SBN can only be used for binary image modeling, I will also discuss its generalization to model sparse count-valued data for topic modeling, and sequential data for motion capture synthesis, music generation and dynamic topic modeling.
In the second part, I will mainly focus on visual captioning (i.e., image-to-text generation), and conditional image synthesis. Specifically, I will first present Semantic Compositional Network for visual captioning, and emphasize interpretability and controllability revealed in the learning algorithm, via a mixture-of-experts design, and the usage of detected semantic concepts. I will then present Triangle Generative Adversarial Network, which is a general framework that can be used for joint distribution matching and learning the bidirectional mappings between two different domains. We consider the joint modeling of image-label, image-image and image-attribute pairs, with applications in semi-supervised image classification, image-to-image translation and attribute-based image editing.
Item Open Access: Deep Generative Models for Vision, Languages and Graphs (2019). Wang, Wenlin. Deep generative models have achieved remarkable success in modeling various types of data, ranging from vision and languages to graphs. They offer flexible and complementary representations for both labeled and unlabeled data. Moreover, they are naturally capable of generating realistic data. In this thesis, novel variations of generative models have been proposed for various learning tasks, which can be categorized into three parts.
In the first part, generative models are designed to learn generalized representations for images under the Zero-Shot Learning (ZSL) setting. An attribute-conditioned variational autoencoder is introduced, representing each class as a latent-space distribution and enabling the learning of highly discriminative and robust feature representations. It endows the generative model with discriminative power by choosing the class that maximizes the variational lower bound. I further show that the model can be naturally generalized to the transductive and few-shot settings.
In the second part, generative models are proposed for controllable language generation. Specifically, two types of topic-enrolled language generation models have been proposed. The first introduces a topic-compositional neural language model for controllable and interpretable language generation via a mixture-of-experts model design. The second solves the problem via a VAE framework with a topic-conditioned GMM model design. Both models have boosted the performance of existing language generation systems with controllable properties.
In the third part, generative models are introduced for broader graph data. First, a variational homophilic embedding (VHE) model is proposed. It is a fully generative model that learns network embeddings by modeling the textual semantic information with a variational autoencoder, while accounting for the graph structure information through a homophilic prior design. Second, for heterogeneous multi-task learning, a novel graph-driven generative model is developed to unify the tasks within the same framework. It combines a graph convolutional network (GCN) with multiple VAEs, thus embedding the nodes of the graph in a uniform manner while specializing their organization and usage to different tasks.
Item Open Access: Deep image prior for undersampling high-speed photoacoustic microscopy (Photoacoustics, 2021-06). Vu, Tri; DiSpirito, Anthony; Li, Daiwei; Wang, Zixuan; Zhu, Xiaoyi; Chen, Maomao; Jiang, Laiming; Zhang, Dong; Luo, Jianwen; Zhang, Yu Shrike; Zhou, Qifa; Horstmeyer, Roarke; Yao, Junjie. Photoacoustic microscopy (PAM) is an emerging imaging method combining light and sound. However, limited by the laser's repetition rate, state-of-the-art high-speed PAM technology often sacrifices spatial sampling density (i.e., undersampling) for increased imaging speed over a large field-of-view. Deep learning (DL) methods have recently been used to improve sparsely sampled PAM images; however, these methods often require time-consuming pre-training and large training datasets with ground truth. Here, we propose the use of deep image prior (DIP) to improve the image quality of undersampled PAM images. Unlike other DL approaches, DIP requires neither pre-training nor fully sampled ground truth, enabling its flexible and fast implementation on various imaging targets. Our results have demonstrated substantial improvement in PAM images with as few as 1.4 % of the fully sampled pixels on high-speed PAM. Our approach outperforms interpolation, is competitive with a pre-trained supervised DL method, and is readily translated to other high-speed, undersampled imaging modalities.
Item Open Access: Deep Latent-Variable Models for Natural Language Understanding and Generation (2020). Shen, Dinghan. Deep latent-variable models have been widely adopted to model various types of data, due to their ability to: 1) infer rich high-level information from the input data (especially in a low-resource setting); 2) yield a generative network that can synthesize samples unseen during training. In this dissertation, I will present the contributions I have made to leverage the general framework of latent-variable models for various natural language processing problems, which is especially challenging given the discrete nature of text sequences. Specifically, the dissertation is divided into two parts.
In the first part, I will present two of my recent explorations on leveraging deep latent-variable models for natural language understanding. The goal here is to learn meaningful text representations that can be helpful for tasks such as sentence classification, natural language inference, question answering, etc. First, I will propose a variational autoencoder based on textual data to digest unlabeled information. To alleviate the observed posterior collapse issue, a specially designed deconvolutional decoder is employed as the generative network. The resulting sentence embeddings greatly boost downstream task performance. Then I will present a model to learn compressed/binary sentence embeddings, which is storage-efficient and applicable to on-device applications.
In the second part, I will introduce a multi-level Variational Autoencoder (VAE) to model long-form text sequences (with as many as 60 words). A multi-level generative network is leveraged to capture word-level and sentence-level coherence, respectively. Moreover, with a hierarchical design of the latent space, long-form and coherent texts can be more reliably produced (relative to baseline text VAE models). Semantically rich latent representations are also obtained in such an unsupervised manner. Human evaluation further demonstrates the superiority of the proposed method.