Browsing by Subject "Time series"
Item Open Access An Analysis Comparing Mangrove Conditions under Different Management Scenarios in Southeast Asia (2017-04-27) Shi, Congjie
Mangroves in Phang Nga Bay, Thailand, and in Matang Mangrove Reserve, Malaysia, provide a variety of crucial ecosystem services. However, in recent decades they have been threatened by various natural and human-driven factors, such as tsunami damage and development. This project examines how the distribution, status, and health of these mangrove forests have changed over time. Selected Landsat 5 TM images from 2000 to 2010 were analyzed to classify land-use change, using both an object-oriented method based on feature extraction and supervised classification. According to the literature review (Gopal and Chauhan 2006; Giri et al. 2008), the expansion of urban development and agriculture is a concern in both Thailand and Malaysia. Supervised classification with a tasseled-cap transformation indicates that the Phang Nga Bay mangroves experienced a significant 6.3% decline from 2003 to 2010, and supervised classification indicates that the Matang mangroves declined by 3.95% from 2000 to 2010. Although these mangroves are declining more slowly than the reported national and global averages, the rate of decrease is still concerning compared to other Southeast Asian mangroves. We also used Google Earth Engine to examine overall characteristics, including EVI, NDVI, GPP, and NDWI, and to compare the patterns in the two study areas. There is no significant difference in EVI between the two study areas: the EVI value is 0.54 for the site in Thailand and 0.52 for the site in Malaysia. NDVI is higher for mangroves at the Thai site (0.61) than at the Malaysian site (0.42). Mangroves at the Malaysian site have higher GPP and NDWI: the mean GPP is 354 kg C/m² at the Malaysian site but only 217 kg C/m² at the Thai site.
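The vegetation and water indices compared above are simple band combinations of surface reflectance. The sketch below uses the standard published formulas; the function names and the example reflectances are hypothetical, and because the abstract does not say which NDWI variant was used, the NIR/SWIR form here is an assumption (a green/NIR variant is also common).

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index with the standard MODIS coefficients
    (gain G = 2.5, aerosol terms C1 = 6 and C2 = 7.5, canopy background L = 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def ndwi(nir: float, swir: float) -> float:
    """NIR/SWIR form of the Normalized Difference Water Index (assumed variant)."""
    return (nir - swir) / (nir + swir)


if __name__ == "__main__":
    # Hypothetical surface reflectances for a single vegetated pixel.
    nir, red, blue, swir = 0.40, 0.05, 0.03, 0.20
    print(f"NDVI = {ndvi(nir, red):.3f}")
    print(f"EVI  = {evi(nir, red, blue):.3f}")
    print(f"NDWI = {ndwi(nir, swir):.3f}")
```

In practice these would be applied per pixel to whole image bands (for example in Google Earth Engine) and then averaged over the mangrove mask.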
The trend in GPP can be fit with an ARIMA(1, 0, 1)(1, 0, 0)[46] model (seasonal period 46) for the Thai site and an ARIMA(2, 0, 1)(1, 0, 0)[46] model for the Malaysian site. The NDWI values are 0.149 and 0.137 for the Malaysian and Thai sites, respectively. The derived indices (tasseled cap, NDVI, and SAVI) were used to classify the mangrove areas into subclasses. An EO-1 Hyperion image from 2014 was examined to classify mangrove types in the Thai study area. Using the good-quality spectral bands, we were able to classify mangroves into edge, island, riverine, estuary, and inland types. A spectral library for the region, or field data, is necessary for more exact species classification. In terms of management, the local conservation departments and national park services in Thailand need to reach out more frequently to the local community and educate fishermen and hoteliers about the ecosystem services of mangroves. It may be worthwhile for Matang forest managers to test the mixed block method, with managed and natural mangrove patches, to sustain the biodiversity and ecological function of the mangrove forests.
Item Open Access Bayesian Computation for Variable Selection and Multivariate Forecasting in Dynamic Models (2020) Lavine, Isaac
Challenges arise in time series analysis due to the need for sequential forecasting and updating of model parameters as data are observed. This dissertation presents techniques for efficient Bayesian computation in multivariate time series analysis. Computational scalability is a core focus of this work and often rests on the decouple-recouple concept, in which multivariate models are decoupled into univariate models for efficient inference and then recoupled to produce joint forecasts. The first section of this dissertation develops novel methods for variable selection in which models are scored and weighted based on specific forecasting and decision goals.
In the time series setting, standard marginal likelihoods correspond to 1-step forecast densities, and considering alternate objectives is shown to improve long-term forecast accuracy. Scoring models based on forecast objectives can be computationally intensive, so the model space is reduced by evaluating univariate models separately along each dimension. This enables an efficient search over large, higher-dimensional model spaces. A second area of focus in this dissertation is product demand forecasting, driven by applied considerations in grocery store sales. A novel copula model is developed for multivariate forecasting with Dynamic Generalized Linear Models (DGLMs), with a variational Bayes strategy for inference in latent factor DGLMs. Three applied case studies demonstrate that these techniques increase computational efficiency by several orders of magnitude over comparable multivariate models, without any loss of forecast accuracy. An additional area of interest in product demand forecasting is the effect of holidays and special events. An error correction model is introduced for this context, demonstrating strong predictive performance across a variety of holidays and retail item categories. Finally, a new Python package for Bayesian DGLM analysis, PyBATS, provides a set of tools for user-friendly analysis of univariate and multivariate time series.
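The remark that standard marginal likelihoods correspond to 1-step forecast densities follows from the prediction decomposition p(y_1, ..., y_T) = p(y_1) p(y_2 | y_1) ... p(y_T | y_1, ..., y_{T-1}). As a minimal sketch, assuming a local-level dynamic linear model with known variances (a hypothetical example, not the dissertation's models or PyBATS code), the Kalman filter accumulates the log marginal likelihood one forecast at a time:

```python
import math


def local_level_log_marglik(y, m0, c0, w, v):
    """Log marginal likelihood of a local-level DLM, accumulated as the sum of
    1-step-ahead forecast log densities (the prediction decomposition).

    Model (all variances known):
        y_t  = mu_t + nu_t,        nu_t ~ N(0, v)
        mu_t = mu_{t-1} + omega_t, omega_t ~ N(0, w)
        mu_0 ~ N(m0, c0)
    """
    m, c = m0, c0
    total = 0.0
    for obs in y:
        # Prior variance of mu_t and 1-step forecast y_t | y_{1:t-1} ~ N(f, q).
        r = c + w
        f, q = m, r + v
        total += -0.5 * math.log(2.0 * math.pi * q) - 0.5 * (obs - f) ** 2 / q
        # Kalman update of the posterior mean and variance of mu_t.
        gain = r / q
        m = m + gain * (obs - f)
        c = r - gain * gain * q
    return total
```

Replacing the 1-step log density in this sum with a score for multi-step forecasts is the kind of alternate objective the abstract refers to; the sketch only shows the standard case.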
Item Open Access Causal Inference for Natural Language Data and Multivariate Time Series (2023) Tierney, Graham
The central theme of this dissertation is causal inference for complex data, with an emphasis on how, for certain estimation problems, collecting more data has limited benefit. The central application areas are natural language data and multivariate time series. For text, large language models are trained on predictive tasks that are not necessarily well suited for causal inference. Moreover, documents that vary in some treatment feature often also vary systematically in other, unknown ways that prohibit attributing causal effects to the feature of interest. Multivariate time series, even with high-quality contemporaneous predictors, still exhibit positive dependencies such that, even with many treated and control units, the amount of information available to estimate causal quantities is quite low.
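One textbook way to see why positive dependence limits information: if n units each have variance sigma^2 and a common pairwise correlation rho > 0, the variance of their mean is sigma^2 (1 + (n - 1) rho) / n, so the effective sample size n / (1 + (n - 1) rho) stays below 1 / rho no matter how many units are added. The equicorrelation setup is a simplifying assumption for illustration, not the model used in the dissertation.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Effective sample size for the mean of n equicorrelated observations.

    Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n, so the mean carries as much
    information as n / (1 + (n - 1) * rho) independent observations.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch assumes non-negative correlation 0 <= rho <= 1")
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # With rho = 0.1 the effective sample size creeps toward, but never reaches, 1 / rho = 10.
    for n in (10, 100, 1000, 10000):
        print(n, round(effective_sample_size(n, 0.1), 2))
```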
Chapter 2 builds a model for short text, as is typically found on social media platforms. Chapter 3 analyzes a randomized experiment that paired Democrats and Republicans to have a conversation about politics, then develops a sensitivity procedure to test for mediation effects attributable to the politeness of the conversation. Chapter 4 expands on the limitations of observational, model-based methods for causal inference with text and designs an experiment to validate how significant those limitations are. Chapter 5 covers experimentation with multivariate time series.
The general conclusion from these chapters is that causal inference always requires untestable assumptions. A researcher trying to make causal conclusions needs to understand the underlying structure of the problem they are studying to validate whether those assumptions hold. The work here shows how to still conduct causal analysis when commonly made assumptions are violated.
Item Open Access Computational Methods for Investigating Dendritic Cell Biology (2011) de Oliveira Sales, Ana Paula
The immune system is constantly faced with the daunting task of protecting the host from a large number of ever-evolving pathogens. In vertebrates, the immune response results from the interplay of two cellular systems: innate immunity and adaptive immunity. Over the past decades, dendritic cells have emerged as major players in modulating the immune response, serving as one of the primary links between these two branches of the immune system.
Dendritic cells are pathogen-sensing cells that alert the rest of the immune system to the presence of infection. The signals sent by dendritic cells result in the recruitment of the appropriate cell types and molecules required for effectively clearing the infection. A question of utmost importance for our understanding of the immune response, and for our ability to manipulate it in the development of vaccines and therapies, is: "How do dendritic cells translate the various cues they perceive from the environment into different signals that specifically activate the appropriate parts of the immune system, resulting in an immune response streamlined to clear the given pathogen?"
Here we have developed computational and statistical methods aimed at addressing specific aspects of this question. In particular, understanding how dendritic cells ultimately modulate the immune response requires an understanding of the subtleties of their maturation process in response to different environmental signals. Hence, the first part of this dissertation focuses on elucidating the changes in the transcriptional program of dendritic cells in response to the detection of two common pathogen-associated molecules, LPS and CpG. We have developed a method based on Langevin and Dirichlet processes to model and cluster temporal gene expression data, and have used it to identify, on a large scale, genes that present unique and common transcriptional behaviors in response to these two stimuli. We have also investigated a different, but related, aspect of dendritic cell modulation of the adaptive immune response. In the second part of this dissertation, we present a method to predict peptides that will bind to MHC molecules, a requirement for the activation of pathogen-specific T cells. Together, these studies contribute to the elucidation of important aspects of dendritic cell biology.
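The Dirichlet process part of that clustering method can be illustrated through its Chinese restaurant process representation, which lets the number of gene clusters grow with the data instead of being fixed in advance. The sketch below samples only a prior partition; the concentration parameter `alpha` and the function name are illustrative assumptions, and the Langevin dynamics used to model the expression profiles are not represented.

```python
from __future__ import annotations

import random


def sample_crp_partition(n: int, alpha: float, seed: int | None = None) -> list[int]:
    """Draw cluster labels for n items from a Chinese restaurant process prior.

    Item i joins existing cluster k with probability size_k / (i + alpha) and
    starts a new cluster with probability alpha / (i + alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    labels: list[int] = []
    sizes: list[int] = []
    for _ in range(n):
        # Unnormalized weights: existing cluster sizes, then alpha for a new cluster.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1
        labels.append(k)
    return labels
```

In a full DP mixture, these prior weights would be multiplied by each cluster's likelihood for the gene's time course inside a Gibbs sampler.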
Item Open Access Dynamic modeling and Bayesian predictive synthesis (2017) McAlinn, Kenichiro
This dissertation discusses model and forecast comparison, calibration, and combination from a foundational perspective. Over nearly five decades, the field of forecast combination has grown exponentially, propelled by its practicality and effectiveness in important real-world problems concerning forecasting, uncertainty, and decisions. A large body of theoretical and empirical research into new methods and their justifications has been produced. However, the field's foundations, the philosophical and theoretical underpinnings on which methods and strategies are built, have remained unexplored in recent literature. Bayesian predictive synthesis (BPS) defines a coherent theoretical basis for combining multiple forecast densities, whether from models, individuals, or other sources, and generalizes existing forecast pooling and Bayesian model mixing methods. Understanding the foundation that defines the combination of forecasts reveals multiple extensions, resulting in significant advances in the understanding and efficacy of the methods for decision making in multiple fields.
The extensions discussed in this dissertation are into the temporal domain. Many important decision problems are time series, including policy decisions in macroeconomics and investment decisions in finance, where decisions are sequentially updated over time. Time series extensions of BPS are implicit dynamic latent factor models, allowing adaptation to time-varying biases, mis-calibration, and dependencies among models or forecasters. Multiple studies using different data and different decision problems are presented, demonstrating the effectiveness of dynamic BPS, in terms of forecast accuracy and improved decision making, and highlighting the unique insight it provides.
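BPS is described as generalizing existing forecast pooling. For reference, the simplest such method, a linear opinion pool, mixes the agents' forecast densities with fixed weights. This sketch implements only that baseline; it is not BPS, which replaces the fixed mixture with a synthesis function that can adapt over time. The Gaussian agent forecasts, the class name, and the weights are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianForecast:
    mean: float
    sd: float


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def linear_pool_pdf(x, forecasts, weights):
    """Density of the linear opinion pool sum_j w_j * p_j(x)."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * normal_pdf(x, f.mean, f.sd) for f, w in zip(forecasts, weights))


def linear_pool_moments(forecasts, weights):
    """Mean and variance of the pooled (mixture) forecast."""
    mean = sum(w * f.mean for f, w in zip(forecasts, weights))
    second = sum(w * (f.sd ** 2 + f.mean ** 2) for f, w in zip(forecasts, weights))
    return mean, second - mean ** 2


if __name__ == "__main__":
    agents = [GaussianForecast(1.0, 0.5), GaussianForecast(2.0, 1.0)]
    print(linear_pool_moments(agents, [0.7, 0.3]))
```

The pooled variance exceeds the weighted average of the agents' variances whenever their means disagree, one reason fixed pooling can be poorly calibrated and a motivation for richer synthesis schemes.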
Item Open Access Dynamic Risk Frameworks for Commodity Portfolio Optimization (2013-04-17) Martorana, David; Piccolella, Christopher
Commodities are an important yet poorly understood asset class, and outsized losses and gains in commodities in recent years have garnered public attention. Partly in reaction to financial market crashes, the suite of risk management tools has expanded considerably since 1996, when J.P. Morgan published Value-at-Risk. In parallel, significant advances in financial forecasting have been made since Engle’s ARCH model. We compare these and other tools in extreme commodity market environments and observe that dollar-denomination effects, high volatility, and high correlation adversely affect their performance. Our results have implications for investors, commercial hedgers, and regulators tasked with reducing systemic risk.
Item Open Access Gaussian Process Kernels for Cross-Spectrum Analysis in Electrophysiological Time Series (2016) Ulrich, Kyle Richard
Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationships between multiple observation channels.
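For a single output with scalar inputs, the spectral mixture kernel models the spectral density as a mixture of Gaussians, which yields the stationary covariance k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q). A minimal sketch of that single-task kernel follows; it is not the CSM kernel developed in this work, which adds cross-channel phase terms, and the component values are hypothetical.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """Single-output spectral mixture covariance at lag tau (1-D inputs).

    Component q is a Gaussian in the frequency domain with mean frequency
    means[q], variance variances[q], and weight weights[q].
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * tau * mu)
        for w, mu, v in zip(weights, means, variances)
    )


def gram_matrix(times, weights, means, variances):
    """Covariance matrix K[i][j] = k(t_i - t_j) for a list of sample times."""
    return [
        [spectral_mixture_kernel(ti - tj, weights, means, variances) for tj in times]
        for ti in times
    ]


if __name__ == "__main__":
    # Hypothetical components: 8 Hz and 30 Hz peaks, times in seconds.
    w, mu, v = [1.0, 0.5], [8.0, 30.0], [4.0, 9.0]
    print(gram_matrix([0.0, 0.01, 0.02], w, mu, v))
```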
The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.
Item Open Access Modeling Time Series and Sequences: Learning Representations and Making Predictions (2015) Lian, Wenzhao
The analysis of time series and sequences has been challenging for both the statistics and machine learning communities because of properties such as high dimensionality, pattern dynamics, and irregular observations. In this thesis, novel methods are proposed to handle these difficulties, enabling representation learning (dimension reduction and pattern extraction) and prediction (classification and forecasting). This thesis consists of three main parts.
The first part analyzes multivariate time series, which are often non-stationary due to high levels of ambient noise and various interferences. We propose a nonlinear dimensionality reduction framework using diffusion maps on a learned statistical manifold, which yields a low-dimensional representation of the high-dimensional non-stationary time series. We show that diffusion maps, with affinity kernels based on the Kullback-Leibler divergence between the local statistics of samples, allow for efficient approximation of pairwise geodesic distances. To construct the statistical manifold, we estimate time-evolving parametric distributions by designing a family of Bayesian generative models. The proposed framework can be applied to problems in which the time-evolving distributions (of temporally localized data), rather than the samples themselves, are driven by a low-dimensional underlying process. We provide efficient parameter estimation and dimensionality reduction methodology and apply it to two applications: music analysis and epileptic-seizure prediction.
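The KL-based affinity construction can be sketched for the simplest case, in which each time window is summarized by a univariate Gaussian (mean, variance). The closed-form Gaussian KL divergence is symmetrized, turned into a kernel exp(-d / epsilon), and row-normalized into the Markov matrix whose leading eigenvectors give diffusion-map coordinates. The Gaussian window summaries, the symmetrization, and the bandwidth `epsilon` are assumptions of this sketch, and the eigendecomposition step is omitted to keep it dependency-free.

```python
import math


def gaussian_kl(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians with variances v1, v2."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def diffusion_markov_matrix(stats, epsilon):
    """Row-stochastic diffusion matrix from local (mean, variance) summaries.

    Affinity: exp(-(KL(p||q) + KL(q||p)) / epsilon); each row is then normalized
    into the transition distribution of a random walk over time windows.
    """
    n = len(stats)
    kernel = [[0.0] * n for _ in range(n)]
    for i, (mi, vi) in enumerate(stats):
        for j, (mj, vj) in enumerate(stats):
            d = gaussian_kl(mi, vi, mj, vj) + gaussian_kl(mj, vj, mi, vi)
            kernel[i][j] = math.exp(-d / epsilon)
    return [[k / sum(row) for k in row] for row in kernel]
```

Windows with similar local statistics receive high transition probability even if their raw samples differ, which is what lets the embedding track the underlying process rather than the noisy observations.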
The second part focuses on a time series classification task in which we want to leverage temporal dynamic information in the classifier design. In many time series classification problems, including fraud detection, a low false alarm rate is required while the positive detection rate should be as high as possible. Therefore, we directly optimize the partial area under the ROC curve (PAUC), which maximizes accuracy in low false alarm rate regions. Latent variables are introduced to incorporate the temporal information while keeping the max-margin formulation solvable. An optimization routine is proposed and its properties are analyzed; the algorithm is designed to scale to web-scale data. Simulation results demonstrate the effectiveness of optimizing performance in the low false alarm rate regions.
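The partial AUC objective can be made concrete through its empirical estimate: restricting false positive rates to [0, beta] means that only the highest-scoring beta-fraction of negatives counts. The sketch below evaluates that normalized empirical pAUC for given scores; it computes the metric only and is not the latent-variable max-margin optimization described above. Function and argument names are hypothetical.

```python
import math


def partial_auc(pos_scores, neg_scores, beta):
    """Normalized empirical partial AUC over false positive rates in [0, beta].

    Only the floor(beta * n_neg) highest-scoring negatives are used, i.e. the
    negatives that define the low-false-alarm region of the ROC curve.
    Ties count as half a correctly ordered pair.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    # Small epsilon guards against floating-point products such as 2.9999999.
    n_top = math.floor(beta * len(neg_scores) + 1e-9)
    if n_top == 0 or not pos_scores:
        raise ValueError("need at least one positive and floor(beta * n_neg) >= 1")
    top_negs = sorted(neg_scores, reverse=True)[:n_top]
    correct = 0.0
    for s_pos in pos_scores:
        for s_neg in top_negs:
            if s_pos > s_neg:
                correct += 1.0
            elif s_pos == s_neg:
                correct += 0.5
    return correct / (len(pos_scores) * n_top)
```

With beta = 1 this reduces to the ordinary Wilcoxon-Mann-Whitney estimate of the full AUC.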
The third part focuses on pattern extraction from correlated point process data, which consist of multiple correlated sequences observed at irregular times. The analysis of correlated point process data has wide applications, ranging from biomedical research to network analysis. We model such data as generated by a latent collection of continuous-time binary semi-Markov processes, corresponding to external events appearing and disappearing. A continuous-time modeling framework is more appropriate for multichannel point process data than a binning approach requiring time discretization, and we show connections between our model and recent ideas from the discrete-time literature. We describe an efficient MCMC algorithm for posterior inference, and apply our ideas to both synthetic data and a real-world biometrics application.
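A two-state (off/on) continuous-time semi-Markov process of the kind used above as a latent building block can be simulated by alternating states and drawing each sojourn time from a state-specific distribution; unlike in a continuous-time Markov chain, these holding times need not be exponential. The gamma holding-time distributions and parameter values below are illustrative assumptions, not those of the thesis model, and the MCMC inference is not shown.

```python
import random


def simulate_binary_semi_markov(horizon, hold_params, initial_state=0, seed=None):
    """Simulate a binary semi-Markov process on [0, horizon).

    hold_params[s] = (shape, scale) of the gamma sojourn-time distribution in
    state s. With two states, leaving a state always means switching to the other.
    Returns a list of (start_time, state) segments.
    """
    rng = random.Random(seed)
    t, state = 0.0, initial_state
    segments = []
    while t < horizon:
        segments.append((t, state))
        shape, scale = hold_params[state]
        t += rng.gammavariate(shape, scale)  # gamma sojourn with mean shape * scale
        state = 1 - state
    return segments


if __name__ == "__main__":
    # Hypothetical: "off" periods average 2.0 time units, "on" periods 0.5.
    for start, state in simulate_binary_semi_markov(10.0, {0: (2.0, 1.0), 1: (1.0, 0.5)}, seed=1):
        print(f"{start:6.2f}  state={state}")
```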
Item Open Access Predictive Models for Point Processes (2015) Lian, Wenzhao
Point process data are commonly observed in fields like healthcare and social science. Designing predictive models for such event streams is an under-explored problem, often because training data are scarce. In this thesis, a multitask point process model based on a hierarchical Gaussian process (GP) is proposed to leverage statistical strength across multiple point processes. Nonparametric learning functions implemented by a GP, which map past events to future rates, allow the analysis of flexible arrival patterns. To facilitate efficient inference, a sparse construction of this hierarchical model is proposed, and a variational Bayes method is derived for learning and inference. Experimental results are shown on both synthetic data and real electronic health record data.
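Models that map past events to future rates are typically fit through the point process log-likelihood log L = sum_i log lambda(t_i) - integral_0^T lambda(t) dt. The sketch below evaluates it for an arbitrary history-dependent intensity, using the trapezoidal rule for the integral; the intensity function is a placeholder for the GP-based mapping from past events to rates, which is not implemented here.

```python
import bisect
import math


def point_process_loglik(event_times, intensity, horizon, n_grid=10_000):
    """Log-likelihood of events on [0, horizon] under a history-dependent intensity.

    intensity(t, history) must return the rate at time t given the events
    strictly before t. The compensator integral uses the trapezoidal rule.
    """
    events = sorted(event_times)

    def history(t):
        return events[: bisect.bisect_left(events, t)]

    log_term = sum(math.log(intensity(t, history(t))) for t in events)
    step = horizon / n_grid
    values = [intensity(k * step, history(k * step)) for k in range(n_grid + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return log_term - integral


if __name__ == "__main__":
    # Hypothetical self-exciting intensity: baseline 0.5 plus decaying boosts after events.
    def hawkes(t, past):
        return 0.5 + sum(0.8 * math.exp(-2.0 * (t - s)) for s in past)

    print(point_process_loglik([0.4, 0.9, 1.1, 2.7], hawkes, horizon=3.0))
```

For a constant rate lambda and n events on [0, T], the function reduces to n log lambda - lambda T, a convenient sanity check.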
Item Open Access SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. (Genome Biol, 2016-05-23) Welch, Joshua D; Hartemink, Alexander J; Prins, Jan F
Single cell experiments provide an unprecedented opportunity to reconstruct a sequence of changes in a biological process from individual "snapshots" of cells. However, nonlinear gene expression changes, genes unrelated to the process, and the possibility of branching trajectories make this a challenging problem. We develop SLICER (Selective Locally Linear Inference of Cellular Expression Relationships) to address these challenges. SLICER can infer highly nonlinear trajectories, select genes without prior knowledge of the process, and automatically determine the location and number of branches and loops. SLICER recovers the ordering of points along simulated trajectories more accurately than existing methods. We demonstrate the effectiveness of SLICER on previously published data from mouse lung cells and neural stem cells.
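Ordering cells along a nonlinear trajectory is often done by building a k-nearest-neighbor graph over cells in a low-dimensional representation and ranking cells by shortest-path (geodesic) distance from a chosen start cell. The sketch below is a generic illustration of that geodesic ordering on 2-D points, not the SLICER implementation: gene selection, the embedding, and branch and loop detection are omitted, and the function names and choice of `k` are assumptions.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            adj[i][j] = d
            adj[j][i] = d  # symmetrize so the graph is undirected
    return adj


def geodesic_order(points, start, k=3):
    """Order cells by shortest-path distance from `start` on the kNN graph (Dijkstra)."""
    adj = knn_graph(points, k)
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    # Cells unreachable from the start (disconnected graph) are left out.
    return sorted(dist, key=dist.get)
```

On a curved trajectory, a cell near the far end can be close to the start in straight-line distance but far along the graph; following graph paths is what keeps the ordering faithful to the curve.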