Browsing by Subject "Time series analysis"
Item Open Access
Applications of Topological Data Analysis and Sliding Window Embeddings for Learning on Novel Features of Time-Varying Dynamical Systems (2017) Ghadyali, Hamza Mustafa

This work introduces geometric and topological data analysis (TDA) tools that can be used in conjunction with sliding window transformations, also known as delay-embeddings, for discovering structure in time series and dynamical systems in an unsupervised or supervised learning framework. For signals of unknown period, we introduce an intuitive topological method to discover the period, and we demonstrate its use on synthetic examples and real temperature data. Alternatively, for almost-periodic signals of known period, we introduce a metric called Geometric Complexity of an Almost Periodic signal (GCAP), based on a topological construction, which allows us to continuously measure the evolving variation of its periods. We apply this method to temperature data collected from over 200 weather stations in the United States and describe the novel patterns that we observe. Next, we show how geometric and TDA tools can be used in a supervised learning framework. Seizure detection using electroencephalogram (EEG) data is formulated as a binary classification problem. We define new collections of geometric and topological features of multi-channel data, which utilize the temporal and spatial context of EEG, and show how they result in better overall performance of seizure detection than the usual time-domain and frequency-domain features. Finally, we introduce a novel method to sonify persistence diagrams, and more generally any planar point cloud, using a modified version of the harmonic table. This auditory display can be useful for finding patterns that visual analysis alone may miss.
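The sliding window (delay) embedding underlying this work can be sketched in a few lines. This is a minimal illustration, not the author's implementation; the window dimension and delay are arbitrary choices here:

```python
import numpy as np

def sliding_window_embedding(x, dim, tau):
    """Embed a 1-D signal x into R^dim using delay tau:
    each point is (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.array([x[t:t + (dim - 1) * tau + 1:tau] for t in range(n)])

# A periodic signal traces a closed loop in the embedding space; persistent
# homology of this point cloud can then reveal the period topologically.
t = np.linspace(0, 4 * np.pi, 400)
cloud = sliding_window_embedding(np.sin(t), dim=2, tau=25)
print(cloud.shape)  # (375, 2)
```

In practice one would feed `cloud` to a persistent-homology library; a prominent loop (a long-lived 1-dimensional homology class) indicates periodicity at the embedded scale.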
Item Open Access
Damming Uncertainty: Creating Accurate and Resilient Models for Inflow Forecasting (2022-04-22) Culberson, Benjamin; Vanover, Abi; Xue, Keyang

On the border between Paraguay and southern Brazil lies the Itaipu Binacional Dam, the world's second largest hydroelectric dam. Both countries contributed to the construction of the dam, which began in 1971. The first two turbines began operating in 1984, and the last in 2007. When finally finished, the Itaipu dam possessed twenty turbines with a total of 14,000 MW of installed capacity. Itaipu Binacional holds two of these turbines in reserve in the event of a mechanical issue with one of the other eighteen; with those eighteen turbines, the dam can still produce up to 12,600 MW at any given moment. In a treaty signed in 1973, Brazil and Paraguay agreed to share the dam's power output equally (6,300 MW maximum each). This power allocation is enough to cover 85% of Paraguay's energy needs and 8% of Brazil's. Paraguay's share more than covers its energy needs; the terms of the treaty allow it to sell the surplus to Brazil at production cost. With the treaty between the two countries expiring in 2023, the negotiations will be intense. Brazil wants to reallocate the power generation from the dam to give itself a larger share of the Itaipu energy production, while Paraguay wants to keep the status quo of equal power distribution. Paraguay is also pushing to be able to sell its surplus electricity to third parties at market price. As both countries position themselves for the renegotiation of a future power-sharing agreement, accurate forecasts of future power outputs will become ever more critical. Itaipu Binacional has consistently improved its water inflow forecasting models over the past five decades, and with each improvement the dam has been able to produce an increasing amount of power.
The improvements to these models are so consequential that although the Parana River region, where Itaipu is located, has been under drought conditions for years, the dam currently produces more power than at any previous point in its lifetime. However, properly forecasting inflows into the dam remains challenging, and there is still room for improvement. The primary models Itaipu uses to predict these future inflows are deterministic, meaning they predict a single value rather than a range of values; in essence, they do not forecast with uncertainty. Furthermore, these models do not fully capture the non-linear relationship between incremental inflows and other factors that influence hydrological models, such as precipitation. To aid Itaipu Binacional with forecasting future power outputs and to give the engineers there a greater understanding of forecast uncertainty, we constructed an artificial neural network (ANN) to predict future water inflows into the Itaipu reservoir. This ANN uses repeated iteration to gradually learn the relationship between exogenous input variables that may influence the rate of incremental inflow into the Itaipu Dam and the incremental inflows themselves. This iterative process relies on trial and error to form these relationships; eventually the ANN model finds a connection between the inputs and the incremental inflows strong enough that it can accurately predict incremental inflows from the inputs alone. The final ANN model can outperform a standard autoregressive integrated moving average (ARIMA) time series forecast in many situations and can help the engineers at Itaipu Binacional more comprehensively understand inflow uncertainty. Even in the situations in which the ARIMA model more accurately forecasts incremental inflows, the ANN model still consistently provides more useful information to the user.
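The iterative trial-and-error training the abstract describes is, in essence, gradient descent on a feedforward network. The sketch below is a hypothetical stand-in, not the authors' model: the data is synthetic, and the three input features merely stand in for exogenous variables such as lagged precipitation and inflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 exogenous inputs -> next incremental inflow.
X = rng.normal(size=(200, 3))
y = np.tanh(X @ np.array([0.5, -1.0, 0.8])) + 0.05 * rng.normal(size=200)

# One hidden layer; weights refined by repeated iteration (gradient descent),
# the trial-and-error process the abstract describes.
W1 = rng.normal(scale=0.5, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=8);      b2 = 0.0
lr = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)             # hidden activations
    pred = h @ W2 + b2                   # predicted incremental inflow
    err = pred - y
    # Backpropagate mean-squared-error gradients through both layers.
    gW2 = h.T @ err / len(y); gb2 = err.mean()
    dh = np.outer(err, W2) * (1 - h**2)
    gW1 = X.T @ dh / len(y); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(mse)  # training error shrinks as the iterations accumulate
```

A probabilistic variant (e.g. predicting a mean and a spread, or training an ensemble) is what would let such a network express forecast uncertainty rather than a single deterministic value.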
The ARIMA model forecasts quite conservatively and fails to model the variability of the incremental inflows. Past data shows incremental inflows into Itaipu to be constantly increasing or decreasing, and never stagnant for long. In general, the final ANN model more accurately predicts this variability, while the ARIMA model generally forecasts a linear trend that does not align with past observed inflows. Thus, the ANN model, when combined with the reasonably accurate models currently used by Itaipu Binacional, provides much more insight than the ARIMA model. For the operators at the dam to optimize power production, they will need as much information as possible about future extreme inflows. For the purpose of providing this kind of information, the ANN model is significantly more useful than the ARIMA model. While the ANN model is unlikely to replace Itaipu Binacional's current deterministic hydrological models, its ability to assist in forecasting extreme incremental inflows into the dam means it can provide value to the engineers at Itaipu Binacional.

Item Open Access
Multitaper Wave-Shape F-Test for Detecting Non-Sinusoidal Oscillations (2023-04-25) Liu, Yijia

Many practical periodic signals are not sinusoidal and are contaminated by complicated noise. The traditional spectral approach is limited in this case due to the energy spreading caused by the non-sinusoidal oscillation. We systematically study the multitaper spectral estimate and generalize Thomson's F-statistic under the setup of physically dependent random processes to analyze periodic signals of this kind. The developed statistic is applied to estimate the walking activity from the actinogram signals.

Item Open Access
Programming DNA for molecular-scale temporal barcoding and enzymatic computation (2020) Shah, Shalin

DNA, the blueprint of life, is more than a carrier of genetic information.
It offers a highly programmable substrate that can be used for computing, nanorobotics, and advanced imaging techniques. In this work, we use the programmable nature of synthetic DNA to engineer two novel applications. In the first part, DNA is programmed to improve the multiplexing capabilities of a fluorescence microscope, while in the second part we design a novel DNA computing architecture that uses a strand-displacing polymerase enzyme. This thesis is a collection of two experimental papers, two theory papers, and one software paper. The general theme of this thesis is to exploit the programmable nature of DNA to develop new applications for the wider fields of molecular biology, nanoimaging, and computer engineering.
Optical multiplexing is defined as the ability to study, detect, or quantify multiple objects of interest simultaneously. There are several ways to improve optical multiplexing, namely, using orthogonal wavelengths, multiple mesoscale geometries, orthogonal nucleic acid probes, or a combination of these. Most traditional techniques employ either the geometry or the color of single molecules to uniquely identify (or barcode) different species of interest. However, these techniques require complex sample preparation and a multicolor hardware setup. In this work, we introduce a time-based, amplification-free single-molecule barcoding technique using easy-to-design nucleic acid strands. A dye-labeled complementary reporter strand transiently binds to the programmed nucleic acid strands to emit temporal intensity signals. We program the DNA strands to emit uniquely identifiable temporal signals for molecular-scale fingerprinting. Since the reporters bind transiently to the DNA devices, our method offers relative immunity to photobleaching. We use a single universal reporter strand for all DNA devices, making our design extremely cost-effective. We show that DNA strands can be programmed to generate a multitude of uniquely identifiable molecular barcodes. Our technique can easily be incorporated with the existing orthogonal methods that use wavelength or geometry to generate a large pool of distinguishable molecular barcodes, thereby enhancing the overall multiplexing capabilities of single-molecule imaging. The proposed project has exciting transformative potential for nanoscale applications in fluorescence microscopy and cell biology, since the development of temporal barcodes would allow for applications such as sensing miRNAs, which are strongly associated with disease diagnosis and therapeutics.
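The temporal barcoding idea can be illustrated with a toy simulation. The sketch below is hypothetical and not from the thesis: it models transient reporter binding as exponentially distributed bright and dark dwell times, with the programmed mean dark time serving as the distinguishing temporal signature of each device.

```python
import numpy as np

rng = np.random.default_rng(1)

def temporal_trace(mean_on, mean_dark, n_events):
    """Simulate a blinking single-molecule trace: a reporter strand binds
    (bright dwell ~ Exp(mean_on)) and unbinds (dark dwell ~ Exp(mean_dark))."""
    on = rng.exponential(mean_on, n_events)
    dark = rng.exponential(mean_dark, n_events)
    return on, dark

# Two hypothetical DNA devices programmed with different dark times; the
# measured mean dark time acts as the device's temporal barcode.
on_a, dark_a = temporal_trace(1.0, 2.0, 500)
on_b, dark_b = temporal_trace(1.0, 8.0, 500)
print(dark_a.mean(), dark_b.mean())  # ~2 vs ~8: separable barcodes
```

Because both devices use the same reporter (same `mean_on` here), only the programmed binding kinetics differ, mirroring the single-universal-reporter design described above.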
The regulation of cellular and molecular processes typically involves complex biochemical networks. Synthetic nucleic acid reaction networks (both enzyme-based and enzyme-free) can be systematically designed to approximate sophisticated biochemical processes. However, most of the prior experimental protocols for chemical reaction networks (CRNs) relied on either strand-displacement hybridization or restriction and exonuclease enzymatic reactions, and the resulting synthetic systems usually suffer from either slow rates or leaky reactions. This work proposes an alternative architecture to implement arbitrary reaction networks that is based entirely on strand-displacing polymerase reactions with nonoverlapping I/O sequences. First, the design for a simple protocol that can approximate arbitrary unimolecular and bimolecular reactions using polymerase strand displacement reactions is presented. Then these fundamental reaction systems are used as modules to show large-scale applications of the architecture, including an autocatalytic amplifier, a molecular-scale consensus protocol, and a dynamic oscillatory system. Finally, we engineer an in vitro catalytic amplifier system as a proof of concept of our polymerase architecture, since such sustainable amplifiers require careful sequence design and implementation.
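The target semantics of such a CRN module can be written down as mass-action ODEs. The sketch below is illustrative only: it integrates the abstract bimolecular reaction A + B → C that a polymerase module would approximate, not the molecular implementation itself.

```python
# Mass-action ODE semantics for the bimolecular reaction A + B ->(k) C,
# the abstract behavior a strand-displacing polymerase module approximates.
def simulate(a0, b0, k=1.0, dt=1e-3, steps=5000):
    a, b, c = a0, b0, 0.0
    for _ in range(steps):
        rate = k * a * b          # mass-action propensity
        a -= rate * dt            # forward Euler integration step
        b -= rate * dt
        c += rate * dt
    return a, b, c

a, b, c = simulate(1.0, 0.8)
print(round(c, 3))  # c rises toward the limiting concentration, 0.8
```

Note the conservation laws a + c = a0 and b + c = b0 hold at every step, a quick sanity check that also applies to well-designed molecular implementations where fuel species are accounted for separately.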
Item Open Access
Qualitative Performance Analysis for Large-Scale Scientific Workflows (2008-05-30) Buneci, Emma

Today, large-scale scientific applications are both data driven and distributed. To support the scale and inherent distribution of these applications, significant heterogeneous and geographically distributed resources are required over long periods of time to ensure adequate performance. Furthermore, the behavior of these applications depends on a large number of factors related to the application, the system software, the underlying hardware, and other running applications, as well as potential interactions among these factors.
Most Grid application users are primarily concerned with obtaining the result of the application as fast as possible, without worrying about the details involved in monitoring and understanding factors affecting application performance. In this work, we aim to provide the application users with a simple and intuitive performance evaluation mechanism during the execution time of their long-running Grid applications or workflows. Our performance evaluation mechanism provides a qualitative and periodic assessment of the application's behavior by informing the user whether the application's performance is expected or unexpected. Furthermore, it can help improve overall application performance by informing and guiding fault-tolerance services when the application exhibits persistent unexpected performance behaviors.
This thesis addresses the hypotheses that in order to qualitatively assess application behavioral states in long-running scientific Grid applications: (1) it is necessary to extract temporal information in performance time series data, and that (2) it is sufficient to extract variance and pattern as specific examples of temporal information. Evidence supporting these hypotheses can lead to the ability to qualitatively assess the overall behavior of the application and, if needed, to offer a most likely diagnostic of the underlying problem.
To test the stated hypotheses, we develop and evaluate a general qualitative performance analysis framework that incorporates (a) techniques from time series analysis and machine learning that extract structural and temporal features associated with application performance from data, and learn from them, in order to reach a qualitative interpretation of the application's behavior, and (b) mechanisms and policies to reason, over time and across the distributed resource space, about the behavior of the application.
Experiments with two scientific applications from meteorology and astronomy, comparing signatures generated from instantaneous values of performance data with those generated from temporal characteristics, support the former hypothesis: extracting temporal information from performance time series data is necessary to accurately interpret the behavior of these applications. Furthermore, temporal signatures incorporating variance and pattern information have distinct characteristics during well-performing versus poor-performing executions. This leads to the framework's accurate classification of instances of similar behaviors, which represents supporting evidence for the latter hypothesis. The proposed framework's ability to generate a qualitative assessment of performance behavior for scientific applications using temporal information present in performance time series data represents a step towards simplifying and improving the quality of service for Grid applications.
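The variance-and-pattern idea behind these temporal signatures can be sketched concretely. The example below is an illustration under assumed definitions, not the thesis's framework: each window of a performance metric series is summarized by its variance and a simple pattern feature (lag-1 autocorrelation), and these features separate a steady, well-performing run from a drifting, poorly performing one.

```python
import numpy as np

def temporal_signature(series, window):
    """Summarize a performance-metric series by per-window variance and a
    lag-1 autocorrelation, as stand-ins for variance and pattern features."""
    feats = []
    for i in range(0, len(series) - window + 1, window):
        w = series[i:i + window]
        var = float(np.var(w))
        wc = w - w.mean()
        denom = float(wc @ wc)
        ac1 = float(wc[:-1] @ wc[1:]) / denom if denom > 0 else 0.0
        feats.append((var, ac1))
    return np.array(feats)

rng = np.random.default_rng(2)
steady = rng.normal(10.0, 0.1, 300)                   # well-performing run
drifting = 10.0 + np.cumsum(rng.normal(0, 0.3, 300))  # drifting, poor run
sig_ok = temporal_signature(steady, 50)
sig_bad = temporal_signature(drifting, 50)
print(sig_ok[:, 0].mean() < sig_bad[:, 0].mean())  # True: variance separates
```

Instantaneous values alone could not make this distinction: both runs pass through similar metric values, and only the temporal structure (variance growth and strong serial correlation in the drifting run) tells them apart.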