Browsing by Subject "Data analytics"
Item Open Access Automating Memory Management in Data Analytics (2019) Kunjir, Mayuresh
Recent years have seen unprecedented growth in the volume, velocity, and variety of the data managed by data analytics platforms. At the same time, the skilled IT staff required to develop and operate the datacenters is growing at a much slower pace. This trend has created strong interest in making data analytics platforms more autonomic (or, more popularly, self-driving). There are, however, several major challenges in this task. First, multiple 'one-size' systems need to co-exist and co-operate in order to support a variety of computation needs such as log processing, business predictions, and real-time analysis. Second, cluster resources are managed at multiple levels, with complex interactions between the many distributed system components. Finally, multiple tenants share a cluster, each with specific performance expectations, which restricts opportunities for optimal use of resources.
We have built an integrated management platform, called Thoth, that provides a data-centric view over the data analytics system environment. This platform is used to develop multiple auto-tuning algorithms that help systems meet their performance goals. We focus specifically on memory-based data analytics, given the growing size of memory in data processing systems and, in effect, its more aggressive use. Our first contribution is a cache manager targeted at multi-tenant cluster setups. It supports a novel fairness model that gives tenants guarantees on the performance speedups experienced by their workloads.
Our second contribution is automatic tuning of the memory management decisions taken at multiple levels during an application's execution. We approach this problem in two ways: (i) a black-box modeling approach assisted by knowledge of system internals, and (ii) an empirically driven white-box approach. The two algorithms that we have developed significantly improve on state-of-the-art tuning techniques, while exhibiting different trade-offs between convergence guarantees and speed of optimization.
We expect the work presented here to act as a major step towards building self-driving data processing systems, motivating further work on automating components such as the physical design of data storage and root cause analysis of performance problems.
Item Open Access Knowledge-Based Statistical Inference Method for Plan Quality Quantification (2019) Zhang, Jiang
The aim of this study is to develop a geometrically adaptive and statistically robust plan quality inference method. A knowledge-based plan quality inference method is proposed. It refers to similar plans in the historical database for patient-specific plan quality evaluation. Similar plans are retrieved using a novel plan similarity metric, and dosimetric statistical inferences are obtained from the selected similar plans. Two plan quality metrics, dosimetric result probability (DRP) and dose deviation index (DDI), are proposed to quantify plan quality relative to prior similar plans. A total of 927 clinically approved head-and-neck treatment plans with two planning targets were exported and used as the historical database. Eight organs-at-risk (OARs) were analyzed in this study: brainstem, spinal cord, larynx, mandible, pharynx, oral cavity, left parotid, and right parotid. Statistical analysis was performed to validate the similarity of the selected reference plans. Twelve sub-optimal plans identified by DRP were re-planned to validate the capability of the proposed method in identifying inferior plans. To demonstrate the potential of the proposed method as a plan quality data analytics tool, a population-wise analysis was conducted on all retrieved plans, grouped into two-year periods. A ready-to-use stand-alone application was also developed to streamline the evaluation process.
After re-planning, the left and right parotid median doses were reduced by 31.7% and 18.2%, respectively. Of these cases, 83% would not have been identified as sub-optimal without the proposed similar-plan selection. The population plan quality analysis reveals that average parotid sparing increased by 21.7% from 2005 to 2018. Notably, the increase in dose sparing over time in the retrospective plan quality analysis is strongly correlated with the increasing dose prescription ratios for the two planning targets, revealing a collective trend in planning conventions.
The proposed similar-plan retrieval and analysis methodology has been shown to be predictive of current plan quality. The proposed workflow can therefore potentially be applied in the clinic as a real-time plan quality assurance tool. The proposed metrics can also support plan quality analytics by revealing connections and historical trends in the clinical treatment planning workflow.