Browsing by Author "Lian, Wenzhao"
Results Per Page
Sort Options
Item Open Access Modeling Time Series and Sequences: Learning Representations and Making Predictions(2015) Lian, WenzhaoThe analysis of time series and sequences has been challenging in both statistics and machine learning community, because of their properties including high dimensionality, pattern dynamics, and irregular observations. In this thesis, novel methods are proposed to handle the difficulties mentioned above, thus enabling representation learning (dimension reduction and pattern extraction), and prediction making (classification and forecasting). This thesis consists of three main parts.
The first part analyzes multivariate time series, which is often non-stationary due to high levels of ambient noise and various interferences. We propose a nonlinear dimensionality reduction framework using diffusion maps on a learned statistical manifold, which gives rise to the construction of a low-dimensional representation of the high-dimensional non-stationary time series. We show that diffusion maps, with affinity kernels based on the Kullback-Leibler divergence between the local statistics of samples, allow for efficient approximation of pairwise geodesic distances. To construct the statistical manifold, we estimate time-evolving parametric distributions by designing a family of Bayesian generative models. The proposed framework can be applied to problems in which the time-evolving distributions (of temporally localized data), rather than the samples themselves, are driven by a low-dimensional underlying process. We provide efficient parameter estimation and dimensionality reduction methodology and apply it to two applications: music analysis and epileptic-seizure prediction.
The second part focuses on a time series classification task, where we want to leverage the temporal dynamic information in the classifier design. In many time series classification problems including fraud detection, a low false alarm rate is required; meanwhile, we enhance the positive detection rate. Therefore, we directly optimize the partial area under the curve (PAUC), which maximizes the accuracy in low false alarm rate regions. Latent variables are introduced to incorporate the temporal information, while maintaining a max-margin based method solvable. An optimization routine is proposed with its properties analyzed; the algorithm is designed as scalable to web-scale data. Simulation results demonstrate the effectiveness of optimizing the performance in the low false alarm rate regions.
The third part focuses on pattern extraction from correlated point process data, which consist of multiple correlated sequences observed at irregular times. The analysis of correlated point process data has wide applications, ranging from biomedical research to network analysis. We model such data as generated by a latent collection of continuous-time binary semi-Markov processes, corresponding to external events appearing and disappearing. A continuous-time modeling framework is more appropriate for multichannel point process data than a binning approach requiring time discretization, and we show connections between our model and recent ideas from the discrete-time literature. We describe an efficient MCMC algorithm for posterior inference, and apply our ideas to both synthetic data and a real-world biometrics application.
Item Open Access Multichannel electrophysiological spike sorting via joint dictionary learning and mixture modeling(IEEE Transactions on Biomedical Engineering, 2014-01-01) Carlson, David E; Vogelstein, Joshua T; Wu, Qisong; Lian, Wenzhao; Zhou, Mingyuan; Stoetzner, Colin R; Kipke, Daryl; Weber, Douglas; Dunson, David B; Carin, LawrenceWe propose a methodology for joint feature learning and clustering of multichannel extracellular electrophysiological data, across multiple recording periods for action potential detection and classification (sorting). Our methodology improves over the previous state of the art principally in four ways. First, via sharing information across channels, we can better distinguish between single-unit spikes and artifacts. Second, our proposed "focused mixture model" (FMM) deals with units appearing, disappearing, or reappearing over multiple recording days, an important consideration for any chronic experiment. Third, by jointly learning features and clusters, we improve performance over previous attempts that proceeded via a two-stage learning process. Fourth, by directly modeling spike rate, we improve the detection of sparsely firing neurons. Moreover, our Bayesian methodology seamlessly handles missing data. We present the state-of-the-art performance without requiring manually tuning hyperparameters, considering both a public dataset with partial ground truth and a new experimental dataset. © 2013 IEEE.Item Open Access Predictive Models for Point Processes(2015) Lian, WenzhaoPoint process data are commonly observed in fields like healthcare and social science. Designing predictive models for such event streams is an under-explored problem, due to often scarce training data. In this thesis, a multitask point process model via a hierarchical Gaussian Process (GP) is proposed, to leverage statistical strength across multiple point processes. Nonparametric learning functions implemented by a GP, which map from past events to future rates, allow analysis of flexible arrival patterns. To facilitate efficient inference, a sparse construction for this hierarchical model is proposed, and a variational Bayes method is derived for learning and inference. Experimental results are shown on both synthetic data and as well as real electronic health-records data.