Integrated Management of the Persistent-Storage and Data-Processing Layers in Data-Intensive Computing Systems
Over the next decade, it is estimated that the number of servers (virtual and physical) in enterprise datacenters will grow by a factor of 10, the amount of data managed by these datacenters will grow by a factor of 50, and the number of files these datacenters have to deal with will grow by a factor of 75. Meanwhile, the number of skilled information technology (IT) staff available to manage the growing number of servers and data will increase by a factor of less than 1.5. Thus, a system administrator will face the challenging task of managing larger and larger numbers of production systems. We have developed solutions to make the system administrator more productive by automating some of the hard and time-consuming tasks in system management. In particular, we make new contributions in the Monitoring, Problem Diagnosing, and Testing phases of the system management cycle.
We start by describing our contributions in the Monitoring phase. We have developed a tool called Amulet that can continuously monitor production systems and proactively detect problems on them. A notoriously hard problem that Amulet can detect is data corruption, where bits of data in persistent storage differ from their true values. Once a problem is detected, our DiaDS tool helps in diagnosing its cause. DiaDS uses a novel combination of machine learning techniques and domain knowledge encoded in a symptoms database to guide the system administrator towards the root cause of the problem.
Before applying any change (e.g., changing a configuration parameter setting) to the production system, the system administrator needs to thoroughly understand the effect that this change can have. Well-meaning changes to production systems have led to performance or availability problems in the past. For this Testing phase, our Flex tool enables administrators to evaluate the change hypothetically in a manner that is fairly accurate while avoiding overheads on the production system. We have conducted a comprehensive evaluation of Amulet, DiaDS, and Flex in terms of their effectiveness, their efficiency, their integration in the system management cycle, and how they bring data-intensive computing systems closer to the goal of self-managing systems.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations