Browsing by Subject "Cloud computing"
A Cloud-Based Infrastructure for Cancer Genomics (2020)
Panea, Razvan Ioan. Open Access.
The advent of new genomic approaches, particularly next-generation sequencing (NGS), has resulted in explosive growth of biological data. As biological datasets keep growing at exponential rates, new methods for data management and processing are becoming essential in bioinformatics and computational biology. Indeed, data analysis has now become the central challenge in genomics.
NGS has provided rich tools for defining the genomic alterations that cause cancer. However, processing time and computing requirements have now become a serious bottleneck in characterizing and analyzing these alterations. Moreover, as the adoption of NGS continues to increase, the computing power required often exceeds what any single institution can provide, placing major constraints on the type and number of analyses that can be performed.
Cloud computing represents a potential solution to this problem. On a cloud platform, computing resources are available on demand, allowing users to implement scalable and highly parallel methods. However, few centralized frameworks give the average researcher the ability to run bioinformatics workflows on cloud resources. Moreover, bioinformatics approaches face multiple processing challenges, such as variability in the methods or data used and the reproducibility requirements of research analyses.
Here, we present CloudConductor, a software system that is specifically designed to harness the power of cloud computing to perform complex analysis pipelines on large biological datasets. CloudConductor was designed with five central features in mind: scalability, modularity, parallelism, reproducibility and platform agnosticism.
We demonstrate the processing power afforded by CloudConductor on a real-world genomics problem. Using CloudConductor, we processed and analyzed 101 whole-genome tumor-normal paired samples from Burkitt lymphoma subtypes to identify novel genomic alterations. We identified a total of 72 driver genes associated with the disease. Somatic events were identified in both coding and non-coding regions of nearly all driver genes, notably IGLL5, BACH2, SIN3A, and DNMT1. We further developed the analysis framework by implementing a graphical user interface, a back-end database system, a data loader, and a workflow management system.
In this thesis, we develop the concepts and describe an implementation of an automated cloud-based infrastructure to analyze genomics data, creating a fast and efficient analysis resource for genomics researchers.
Cumulon: Simplified Matrix-Based Data Analytics in the Cloud (2016)
Huang, Botong. Open Access.
Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, and hardware provisioning and configuration settings (such as what type of machines to acquire and how many). Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests the best bidding strategies for them. Cumulon explores two alternative approaches toward supporting such markets, with different trade-offs between system and optimization complexity. Experimental studies demonstrate the efficiency of Cumulon's execution engine, as well as the optimizer's effectiveness in finding the optimal plan in the vast plan space.
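The abstract does not spell out Cumulon's optimizer. As a rough illustration of the kind of decision it automates (which machine type to acquire, how many, and whether a deadline is met at acceptable cost), here is a minimal Python sketch of exhaustive plan search under a toy cost model. The `MachineType` fields, `choose_plan`, the linear-speedup assumption, and the whole-hour billing rule are all hypothetical simplifications, not Cumulon's actual model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str            # hypothetical instance type label
    hourly_price: float  # dollars per machine-hour (assumed)
    speed: float         # throughput relative to a baseline machine (assumed)


def choose_plan(work_baseline_hours, machine_types, max_machines, deadline_hours):
    """Return (cost, hours, type_name, count) for the cheapest plan that meets
    the deadline, or None if no plan does. Ties on cost prefer the faster plan."""
    best = None
    for mt in machine_types:
        for count in range(1, max_machines + 1):
            # Toy model: perfectly linear speedup, no startup or communication overhead.
            hours = work_baseline_hours / (mt.speed * count)
            if hours > deadline_hours:
                continue
            # Assumed billing rule: every machine is charged for whole hours.
            cost = mt.hourly_price * count * math.ceil(hours)
            candidate = (cost, hours, mt.name, count)
            if best is None or candidate < best:
                best = candidate
    return best
```

For example, with `types = [MachineType("small", 0.10, 1.0), MachineType("large", 0.35, 4.0)]`, `choose_plan(20.0, types, 20, 3.0)` selects five `large` machines (about $1.75 under this toy model). Exhaustive search is adequate only because this plan space is tiny; the abstract notes that Cumulon's real plan space is vast.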
Head into the Cloud: An Analysis of the Emerging Cloud Infrastructure (2016)
Chandrasekaran, Balakrishnan. Open Access.
We are witnessing a paradigm shift in computing: people are increasingly using Web-based software for tasks that only a few years ago were carried out using software running locally on their computers. The increasing use of mobile devices, which typically have limited processing power, is catalyzing the idea of offloading computations to the cloud. It is within this context of cloud computing that this thesis attempts to address a few key questions: (a) With more computations moving to the cloud, what is the state of the Internet's core? In particular, do routing changes and consistent congestion in the Internet's core affect end users' experiences? (b) With software-defined networking (SDN) principles increasingly being used to manage cloud infrastructures, are the software solutions robust (i.e., resilient to bugs)? With service outage costs being prohibitively expensive, how can we support network operators in experimenting with novel ideas without crashing their SDN ecosystems? (c) How can we build a large-scale passive IP geolocation system that geolocates the entire IP address space at once, so that cloud-based software can use the geolocation database to enhance the end-user experience? (d) Why is the Internet so slow? Since a low-latency network allows more offloading of computations to the cloud, how can we reduce latency in the Internet?
Nonlinear Prediction in Credit Forecasting and Cloud Computing Deployment Optimization (2015)
Jarrett, Nicholas Walton Daniel. Open Access.
This thesis presents data analysis and methodology for two prediction problems. The first problem is forecasting midlife credit ratings from personality information collected during early adulthood. The second problem is the analysis of matrix multiplication in cloud computing.
The goal of the credit forecasting problem is to determine whether there is a link between personality assessments of young adults and their propensity to develop credit in middle age. The data come from a longitudinal study spanning over 40 years. We do find an association between credit risk and personality in this cohort. Such a link has obvious implications for lenders, but it can also be used to improve social utility through more efficient resource allocation.
We analyze matrix multiplication in the cloud and model I/O and local computation for individual tasks. We establish conditions under which the distribution of job completion times can be obtained explicitly, and we further generalize these results to cases where analytic derivations are intractable.
We develop models that emulate the multiplication procedure, allowing job times for different deployment parameter settings to be emulated after observing only a subset of tasks, or subsets of tasks for nearby deployment parameter settings.
The resulting modeling framework sheds new light on the problem of determining expected job completion time for sparse matrix multiplication.
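The abstract gives no formulas for the thesis's job-time models. A much simpler stand-in that conveys the idea of emulating job times after observing only a subset of tasks is to resample the observed task durations (a bootstrap) and simulate how those tasks would pack onto a given number of parallel slots. The function names and the greedy earliest-free-slot scheduling rule below are assumptions for illustration, not the thesis's model.

```python
import heapq
import random


def makespan(task_times, slots):
    """Job completion time when tasks run in order on `slots` parallel workers,
    each task going to whichever worker becomes free first (assumed scheduler)."""
    finish = [0.0] * slots  # min-heap of per-worker finish times
    for t in task_times:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + t)
    return max(finish)


def emulate_job_times(observed, n_tasks, slots, trials=1000, seed=0):
    """Emulate the job-completion-time distribution by resampling the observed
    task durations with replacement (a bootstrap) for each simulated job."""
    rng = random.Random(seed)
    return [makespan(rng.choices(observed, k=n_tasks), slots) for _ in range(trials)]
```

With a single observed duration of 2.0 seconds, `emulate_job_times([2.0], n_tasks=10, slots=4)` always yields 6.0, since ten tasks on four slots run in three waves. With varied observations, the spread of the returned list approximates the job-time distribution under this simplified model.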
Predicting Application Performance in the Cloud (2011)
Zong, Xuanran. Open Access.
Despite the prominence of cloud computing, customers lack a direct way to select the cloud that delivers the best performance, because performance varies widely across cloud providers. Existing solutions either migrate the application to each cloud and evaluate its performance individually, or benchmark each cloud along various dimensions and predict the application's overall performance. However, the former incurs significant migration and configuration overhead, while the latter may suffer from low prediction accuracy.
This thesis introduces two systems to address this issue. CloudProphet predicts web application performance by tracing the on-premise resource demand and replaying it on cloud machines. DTRCP extends this prediction to general applications; in particular, it addresses the execution path divergence that manifests when the on-premise resource demand is replayed. Our experimental results show that both systems can accurately predict application performance.
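The trace-and-replay idea can be illustrated with a toy replayer: a trace recorded on premises lists resource demands, and replaying the same demands on a candidate machine while timing them gives a crude performance estimate for that machine. The trace format (`("cpu", iterations)` and `("disk", bytes)` records) and the `replay` function are hypothetical; this sketch ignores network demand and the execution path divergence that DTRCP addresses.

```python
import os
import tempfile
import time


def replay(trace):
    """Replay a recorded resource-demand trace (hypothetical format) on the
    current machine and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with tempfile.TemporaryFile() as scratch:
        for kind, amount in trace:
            if kind == "cpu":
                # Perform `amount` iterations of integer work to mimic recorded CPU demand.
                acc = 0
                for i in range(amount):
                    acc += i * i
            elif kind == "disk":
                # Write `amount` bytes and force them to storage to mimic disk demand.
                scratch.write(b"\0" * amount)
                scratch.flush()
                os.fsync(scratch.fileno())
            else:
                raise ValueError(f"unknown demand kind: {kind}")
    return time.perf_counter() - start
```

Running `replay([("cpu", 1_000_000), ("disk", 1 << 20)])` on each candidate machine and comparing the elapsed times ranks the machines for this particular demand profile only.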
Towards Systematic and Accurate Environment Selection for Emerging Cloud Applications (2012)
Li, Ang. Open Access.
As cloud computing gains popularity, many application owners are migrating their applications into the cloud. However, because of the diversity of cloud environments and the complexity of modern applications, it is very challenging to determine which cloud environment best suits a given application.
In this dissertation, we design and build systems to help application owners select the most suitable cloud environments for their applications. The first part of this dissertation focuses on how to compare the general fitness of cloud environments. We present CloudCmp, a novel comparator of public cloud providers. CloudCmp measures the elastic computing, persistent storage, and networking services offered by a cloud along metrics that directly reflect their impact on the performance of customer applications. CloudCmp strives to ensure the fairness, representativeness, and compliance of these measurements while limiting measurement cost. Applying CloudCmp to four cloud providers that together account for most of today's cloud customers, we find that their offered services vary widely in performance and cost, underscoring the need for thoughtful cloud environment selection. Through case studies on three representative cloud applications, we show that CloudCmp can guide customers in selecting the best-performing provider for their applications.
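A minimal sketch of a final comparison step, assuming repeated benchmark measurements per provider have already been collected: rank providers by median task completion time and report an approximate cost per task from the hourly price. The provider names, data layout, `rank_providers` function, and the use of the median are illustrative assumptions, not CloudCmp's actual metrics.

```python
import statistics


def rank_providers(measurements, prices):
    """Rank providers by median task completion time.

    measurements: provider -> list of repeated task completion times (seconds)
    prices: provider -> price per instance-hour (dollars)
    Returns a list of (provider, median_seconds, cost_per_task), fastest first.
    """
    rows = []
    for provider, times in measurements.items():
        # The median resists outliers, e.g. a run slowed by a noisy neighbor.
        median_s = statistics.median(times)
        # Assumes one task occupies one instance for its whole duration.
        cost_per_task = prices[provider] * median_s / 3600.0
        rows.append((median_s, cost_per_task, provider))
    rows.sort()
    return [(provider, median_s, cost) for median_s, cost, provider in rows]
```

For example, `rank_providers({"A": [10, 12, 11], "B": [8, 30, 9]}, {"A": 0.36, "B": 0.50})` puts `B` first (median 9 s) even though one of its runs took 30 s; ranking by the mean would have put `A` first.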
The second part focuses on how to let customers compare cloud environments in the context of their own applications. We describe CloudProphet, a novel system that can accurately estimate an application's performance inside a candidate cloud environment without the need for migration. CloudProphet generates highly portable shadow programs that mimic the behavior of a real application and deploys them inside the cloud to estimate the application's performance. We use the trace-and-replay technique to automatically generate high-fidelity shadows, and leverage the popular dispatcher-worker pattern to accurately extract and enforce inter-component dependencies. Our evaluation on three popular cloud platforms shows that CloudProphet can help customers pick the best-performing cloud environment and can also accurately estimate the performance of a variety of applications.