Browsing by Author "Chase, Jeffrey S"
Item (Open Access): A Logical Controller Architecture for Network Security (2020). Yao, Yuanjun.

Networked infrastructure-as-a-service testbeds are evolving with higher capacity and more advanced capabilities. Modern testbeds offer stitched virtual circuit capability, programmable dataplanes with software-defined networking (SDN), and in-network processing on adjacent cloud servers. With these capabilities they are able to host virtual network service providers (NSPs) that peer and exchange traffic with edge subnets and with other NSPs in the testbeds. Testbed tenants may configure and program their NSPs to provide a range of functions and capabilities. Programmable NSPs enable innovation in network services and protocols following the pluralist philosophy of network architecture.
Advancing testbeds offer an opportunity to harness their power to deploy production NSPs with topology and value-added features tailored to the needs of specific user communities. For example, one objective of this research is to define abstractions and tools to support built-to-order virtual science networks for data-intensive science collaborations that share and exchange datasets securely at high speed. A virtual science network may combine dedicated high-speed circuits on advanced research fabrics with integrated in-network processing on virtual cloud servers, and links to exchange traffic with customer campus networks and/or testbed slices. We propose security-managed science networks with additional security features including access control, embedded virtual security appliances, and managed connectivity according to customer policy. A security-managed NSP is in essence a virtual software-defined exchange (SDX) that applies customer-specified policy to mediate connectivity.
This dissertation proposes control abstractions for dynamic NSPs, with a focus on managing security in the control plane based on programmable security policy. It defines an architecture for automated NSP controllers that orchestrate and program an NSP's SDN dataplane and manage its interactions with customers and peer NSPs. A key element of the approach is to use declarative trust logic to program the control plane: all control-plane interactions---including route advertisements, address assignments, policy controls, and governance authority---are represented as signed statements in a logic (trust datalog). NSP controllers use a logical inference engine to authorize all interactions and check for policy compliance.
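As a rough illustration of the control-plane style described above, the sketch below represents a route advertisement and a trust assertion as signed logic statements and authorizes a request only when both are present and verified. It is not the SAFE or ExoPlex API; the predicate names (advertises, trusts), the HMAC stand-in for public-key signatures, and the per-principal keys are all hypothetical.

```python
# Illustrative sketch only: NOT the SAFE/ExoPlex API. Shows the flavor of
# signed logic statements and a simple authorization check over them.
import hmac, hashlib

KEYS = {"nsp_a": b"key-a", "customer_c": b"key-c"}   # stand-in signing keys per principal

def sign(principal, statement):
    tag = hmac.new(KEYS[principal], statement.encode(), hashlib.sha256).hexdigest()
    return {"issuer": principal, "statement": statement, "sig": tag}

def verify(stmt):
    expected = hmac.new(KEYS[stmt["issuer"]], stmt["statement"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, stmt["sig"])

# Context: authenticated statements gathered for this trust decision.
context = [
    sign("nsp_a", "advertises(nsp_a, prefix_10_0_0_0_24)"),
    sign("customer_c", "trusts(customer_c, nsp_a)"),
]

def authorize(context, customer, prefix):
    """Accept a route for `customer` only if a trusted NSP advertises it."""
    facts = {s["statement"] for s in context if verify(s)}
    return any(
        f"advertises({nsp}, {prefix})" in facts and f"trusts({customer}, {nsp})" in facts
        for nsp in ("nsp_a", "nsp_b")
    )

print(authorize(context, "customer_c", "prefix_10_0_0_0_24"))   # True
```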
To evaluate these ideas, we develop the ExoPlex controller framework for secure policy-based networking over programmable network infrastructures. An ExoPlex NSP combines a logical NSP controller with an off-the-shelf SDN controller and an existing trust logic platform (SAFE), both of which were enhanced for this project. Experiments with the software on testbeds---ExoGENI, ESnet, and Chameleon---demonstrate the power and potential of the approach. The dissertation presents the research in four parts.
The first part introduces the foundational capabilities of research testbeds that enable the approach, and presents the design of the ExoPlex controller framework to leverage those capabilities for hosted NSPs. We demonstrate a proof-of-concept deployment of an NSP with network function virtualization, an elastic dataplane, and managed traffic security on the ExoGENI testbed.
The second part introduces logical trust to structure control-plane interactions and program security policy. We show how to use declarative trust logic to address the challenges of managing identity, resource access, peering, connectivity, and secure routing. We present off-the-shelf SAFE logic templates and rules to demonstrate a virtual SDX that authorizes network stitching and connectivity with logical trust.
The third part applies the controller architecture to secure policy-based interdomain routing among transit NSPs based on a logical trust plane. Signed logic exchanges propagate advertised routes and policies through the network. We show that trust logic rules capture and represent current and evolving Internet security protocols, affording protection equivalent to BGPsec for secure routing and RPKI for origin authentication. The logic also supports programmable policy for managed connectivity with end-to-end trust, allowing customers to specify which NSPs are permitted to carry their traffic so that it does not pass through untrusted NSPs (path control).
The last part introduces SCIF, which extends logical peering and routing to incorporate customizable policies to defend against packet spoofing and route leaks. It uses trust logic to define more expressive route advertisements and compliance checks to filter advertisements that propagate outside of their intended scope. For SCIF, we extended the ExoPlex SDN dataplanes to configure ingress packet filters automatically from accepted routes (unicast Reverse Path Forwarding). We present logic templates that capture the defenses of valley-free routing and the Internet MANRS approach based on a central database of route ingress/egress policies (RADb/RPSL). We show how to extend their expressive power for stronger routing security, and complement it with path control policies that constrain the set of trusted NSPs for built-to-order internetworks.
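The following sketch illustrates one way an SDN controller could derive ingress packet filters from accepted routes, in the spirit of the unicast Reverse Path Forwarding defense mentioned above. The data structures and helper names are hypothetical and do not reflect the ExoPlex dataplane code.

```python
# Hedged sketch: derive per-port ingress filters from accepted route
# advertisements, so a source prefix is permitted only on the port where
# its route was accepted (uRPF-style spoofing defense).
from ipaddress import ip_network, ip_address
from collections import defaultdict

# Accepted routes: prefix -> ingress port on which the advertisement arrived.
accepted_routes = {
    ip_network("10.1.0.0/16"): "port1",
    ip_network("10.2.0.0/16"): "port2",
}

def build_ingress_filters(routes):
    """Group accepted prefixes by the port on which they were learned."""
    allowed = defaultdict(set)
    for prefix, port in routes.items():
        allowed[port].add(prefix)
    return allowed

def permit(filters, port, src_ip):
    """Forward a packet only if its source matches a prefix accepted on this port."""
    return any(ip_address(src_ip) in prefix for prefix in filters.get(port, ()))

filters = build_ingress_filters(accepted_routes)
print(permit(filters, "port1", "10.1.2.3"))   # True: matches the accepted route
print(permit(filters, "port2", "10.1.2.3"))   # False: spoofed source, dropped
```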
Item (Open Access): An Operating System Architecture for Networked Server Infrastructure (2007-12-14). Irwin, David Emory.

Collections of hardware components are the foundation of computation and consist of interconnections of different types of the same core elements: processors, disks, memory cards, I/O devices, and network links. Designing a system for managing collections of hardware is challenging because modern infrastructures (i) distribute resource control across multiple autonomous sites, (ii) operate diverse sets of hardware, and (iii) support a variety of programming models for developing and executing software services. An operating system is a software layer that manages hardware by coordinating its interaction with software. This thesis defines and evaluates an architecture for a networked operating system that manages collections of hardware in infrastructures spread across networks, such as the Internet.

The foundation of a networked operating system determines how software services share a common hardware platform. A fundamental property common to all forms of resource sharing is that software services, by definition, share hardware components and do not use them forever. A lease is a natural construct for restricting the use of a shared resource to a well-defined length of time. Our architecture employs a general neutrality principle, which states that a networked operating system should be policy-neutral, since only users and site administrators, and not operating system developers, know how to manage their software and hardware. Experience building, deploying, and using a prototype has led us to view neutrality as a guiding design principle.

Our hypothesis is that an operating system architecture for infrastructure resource management that focuses narrowly on leasing control of hardware provides a foundation for multi-lateral resource negotiation, arbitration, and fault tolerance. In evaluating our hypothesis we make the following contributions:

* Introduce a set of design principles for networked operating systems. The principles adapt and extend principles from node operating system design to a networked environment. We evaluate existing systems with respect to these principles, describe how they deviate from them, and explore how these deviations limit the capabilities of higher-level software.
* Combine the idea of a reconfigurable data center with the Sharp framework for secure resource peering to demonstrate a prototype networked operating system capable of sharing aggregations of resources in infrastructures.
* Design, implement, and deploy the architecture using a single programming abstraction---the lease---and show how the lease abstraction embodies the design principles of a networked operating system.
* Show that leases are a foundational primitive for addressing arbitration in a networked operating system. Leasing currency defines a configurable tradeoff between proportional-share scheduling and a market economy, and also serves as a basis for implementing other forms of arbitration.
* Show how combining the use of leases for long-term resource management with state recovery mechanisms provides robustness to transient faults and failures in a loosely coupled distributed system that coordinates resource allocation.
* Evaluate the flexibility and performance of a prototype by managing aggregations of physical and virtual hardware present in modern data centers, and showing that the architecture could scale to manage thousands of machines.
* Present case studies of integrating multiple software services, including the PlanetLab network testbed, the Plush distributed application manager, and the GridEngine batch scheduler, and leverage the architecture to prototype and evaluate Jaws, a new lightweight batch scheduler that instantiates one or more virtual machines per task.

Item (Open Access): Extensible Resource Management for Networked Virtual Computing (2007-12-14). Grit, Laura Ellen.

Advances in server virtualization offer new mechanisms to provide resource management for shared server infrastructures. Resource sharing requires coordination across self-interested system participants (e.g., providers from different administrative domains or third-party brokering intermediaries). Assignments of the shared infrastructure must be fluid and adaptive to meet the dynamic demands of clients.

This thesis addresses the hypothesis that a new, foundational layer for virtual computing is sufficiently powerful to support a diversity of resource management needs in a general and uniform manner. Incorporating resource management at a lower virtual computing layer provides the ability to dynamically share server infrastructure between multiple hosted software environments (e.g., grid computing middleware and job execution systems). Resource assignments within the virtual layer occur through a lease abstraction, and extensible policy modules define management functions. This research makes the following contributions:

* Defines the foundation for resource management in a virtual computing layer. Defines protocols and extensible interfaces for formulating resource contracts between system participants. Separates resource management functionalities across infrastructure providers, application controllers, and brokering intermediaries, and explores the implications and limitations of this structure.
* Demonstrates policy extensibility by implementing a virtual computing layer prototype, Shirako, and evaluating a range of resource arbitration policies for various objectives. Provides results with proportional share, priority, worst-fit, and multi-dimensional resource slivering.
* Defines a proportional share policy, WINKS, that integrates a fair queuing algorithm with a calendar scheduler. Provides a comprehensive set of features and extensions for virtual computing systems (e.g., requests for multiple resources, advance reservations, multi-dimensional allocation, and dynamic resource pools). Shows the policy preserves fairness properties across queue transformations and calendar operations needed to implement these extensions.
* Explores at what layer, and at what granularity, decisions about resource control should occur. Shows that resource management at a lower layer can expose dynamic resource control to hosted middleware, at a modest cost in fidelity to the goals of the policy.

Item (Open Access): Multi-version Indexing in Flash-based Key-Value Stores (2019-12-02). Misra, Pulkit A; Chase, Jeffrey S; Gehrke, Johannes; Lebeck, Alvin R.

Item (Open Access): Output Performance of Petascale File Systems (2017). Xie, Bing.

HPC applications generate periodic output bursts for intermediate results, checkpointing, and visualization. For a typical HPC application, its entire execution stalls while it writes a burst, until all data reach the disks. In general, since cores must idle during a burst, a supercomputer and its I/O system must absorb output bursts quickly to use compute cores efficiently.
This thesis studies the performance of file writes in supercomputer file systems under production load, including quantitative behavior analysis, component performance profiling, and performance prediction modeling. Our target environment is Titan, the 4th fastest supercomputer in the world, and its Lustre parallel file stores. The results of behavior analysis and performance profiling can inform file system configuration choices and the design of I/O software in the application, operating system, and adaptive I/O middleware systems. Moreover, the predictive model we build is useful for output performance prediction of supercomputer file systems in live use.
To quantify the performance behavior of production supercomputer file systems, we introduce a statistical benchmarking methodology to measure the impact of parameter choices on burst absorption rates. Our approach combines many samples of their impacts over time to filter out interference caused by transient congestion from competing workloads in the production setting. These samples are also used to characterize the performance of individual stages and components in the multi-stage write pipelines, and their variations over time.
We find that Titan's I/O system is variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write-sharing of files across clients (compute nodes). I/O parallelism is most effective when the application, or its I/O middleware system, distributes the I/O load so that each target stores files for multiple clients and each client writes files on multiple targets, in a balanced way with minimal contention. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify "good locations" in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we do not observe diurnal load patterns that are predictable.
Beyond the observation of performance variability on Titan, we also observe that (1) mean performance is stable and consistent over typical application run times, and (2) output performance is non-linearly related to its correlated parameters due to interference and saturation at individual stages on the write path. These observations enable us to build a predictive model of expected write times for output patterns and I/O configurations, using feature transformations to capture non-linear relationships. We identify the candidate features based on the structure of the Lustre/Titan write path, and use feature transformation functions to produce a linear model space with 135,000 candidate models. By searching for the minimal mean square error in this space we identify a good model and show that it is effective.
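To make the model-search idea concrete, the sketch below enumerates combinations of feature transformations, fits a linear model for each combination, and keeps the one with the lowest mean squared error. The features and data are synthetic placeholders rather than Titan measurements, and the transformation set is illustrative.

```python
# Hedged sketch of a feature-transformation model search with an MSE criterion.
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Placeholder features (e.g., write size, stripe count, clients); synthetic data only.
X = rng.uniform(1, 100, size=(200, 3))
y = 0.5 * np.log(X[:, 0]) + 0.1 * np.sqrt(X[:, 1]) + 0.01 * X[:, 2] + rng.normal(0, 0.05, 200)

transforms = {"id": lambda v: v, "log": np.log, "sqrt": np.sqrt}

best = None
for combo in itertools.product(transforms, repeat=X.shape[1]):
    # Apply one transformation per feature, then fit a linear model with intercept.
    feats = np.column_stack([transforms[t](X[:, j]) for j, t in enumerate(combo)])
    A = np.column_stack([feats, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = np.mean((A @ coef - y) ** 2)
    if best is None or mse < best[0]:
        best = (mse, combo)

print("best transforms:", best[1], "mse:", round(float(best[0]), 4))
```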
Item (Open Access): SAFE: A Declarative Trust-Agile System with Linked Credentials (2016). Thummala, Vamsidhar.

Secure Access For Everyone (SAFE) is an integrated system for managing trust using a logic-based declarative language. Logical trust systems authorize each request by constructing a proof from a context---a set of authenticated logic statements representing credentials and policies issued by various principals in a networked system. A key barrier to practical use of logical trust systems is the problem of managing proof contexts: identifying, validating, and assembling the credentials and policies that are relevant to each trust decision.
SAFE addresses this challenge by (i) proposing a distributed authenticated data repository for storing the credentials and policies, and (ii) introducing a programmable credential discovery and assembly layer that generates the appropriate tailored context for a given request. The authenticated data repository is built upon a scalable key-value store with its contents named by secure identifiers and certified by the issuing principal. The SAFE language provides scripting primitives to generate and organize logic sets representing credentials and policies, materialize the logic sets as certificates, and link them to reflect delegation patterns in the application. The authorizer fetches the logic sets on demand, then validates and caches them locally for further use. Upon each request, the authorizer constructs the tailored proof context and provides it to the SAFE inference engine for certified validation. Delegation-driven credential linking with certified data distribution provides flexible and dynamic policy control, enabling the security and trust infrastructure to be agile while addressing the perennial problems related to today's certificate infrastructure: automated credential discovery, scalable revocation, and issuing credentials without relying on a centralized authority.
We envision SAFE as a new foundation for building secure network systems. We used SAFE to build secure services based on case studies drawn from practice: (i) a secure name service resolver, similar to DNS, that resolves a name across multi-domain federated systems; (ii) a secure proxy shim to delegate access control decisions in a key-value store; (iii) an authorization module for a networked infrastructure-as-a-service system with a federated trust structure (the NSF GENI initiative); and (iv) a secure cooperative data analytics service that adheres to individual secrecy constraints while disclosing the data. We present an empirical evaluation based on these case studies and demonstrate that SAFE supports a wide range of applications with low overhead.
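As a loose illustration of credential linking and context assembly, the sketch below follows links between logic sets stored in a key-value map and collects the facts reachable from a root credential. It is a simplification under assumed names (links, facts, cred:*), not the SAFE language or its repository interface.

```python
# Illustrative sketch only (not the SAFE API): assemble a tailored proof
# context by following delegation links between stored logic sets.
store = {
    "cred:alice_policy":   {"links": ["cred:bob_delegation"], "facts": ["mayAccess(bob, objX)"]},
    "cred:bob_delegation": {"links": [],                      "facts": ["delegates(bob, carol, objX)"]},
}

def assemble_context(root_ids, fetch=store.get):
    """Collect the facts reachable from the root credentials via links."""
    context, seen, frontier = [], set(), list(root_ids)
    while frontier:
        cid = frontier.pop()
        if cid in seen:
            continue
        seen.add(cid)
        logic_set = fetch(cid)
        if logic_set is None:
            continue                      # missing credential: leave the gap
        context.extend(logic_set["facts"])
        frontier.extend(logic_set["links"])
    return context

print(assemble_context(["cred:alice_policy"]))
# ['mayAccess(bob, objX)', 'delegates(bob, carol, objX)']
```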
Item (Open Access): System Support for Strong Accountability (2009). Yumerefendi, Aydan Rafet.

Computer systems not only provide unprecedented efficiency and numerous benefits, but also offer powerful means and tools for abuse. This reality is increasingly evident as deployed software spans trust domains and enables interactions among self-interested participants with potentially conflicting goals. As systems grow more complex and interdependent, there is a growing need to localize, identify, and isolate faults and unfaithful behavior.
Conventional techniques for building secure systems, such as secure perimeters and Byzantine fault tolerance, are insufficient to ensure that trusted users and software components are indeed trustworthy. Secure perimeters do not work across trust domains and fail when a participant acts within the limits of the existing security policy and deliberately manipulates the system to her own advantage. Byzantine fault tolerance offers techniques to tolerate misbehavior, but offers no protection when replicas collude or are under the control of a single entity.
Complex interdependent systems necessitate new mechanisms that complement existing solutions to identify improper behavior and actions, limit the propagation of incorrect information, and assign responsibility when things go wrong. This thesis addresses the problems of misbehavior and abuse by offering tools and techniques to integrate accountability into computer systems. A system is accountable if it offers means to identify and expose semantic misbehavior by its participants. An accountable system can construct undeniable evidence to demonstrate its correctness---the evidence serves as explicit proof of misbehavior and can be strong enough to be used as a basis for social sanction external to the system.
Accountability offers strong disincentives for abuse and misbehavior, but it may have to be ``designed-in'' to an application's specific protocols, logic, and internal representation; achieving accountability using general techniques is a challenge. Extending responsibility to end users for actions performed by software components on their behalf is not trivial, as it requires an ability to determine whether a component correctly represents a user's intentions. Leaks of private information are yet another concern---even correctly functioning applications can leak sensitive information, for which their owners may be accountable. Important infrastructure services, such as distributed virtual resource economies, raise a range of application-specific issues such as fine-grained resource delegation, virtual currency models, and complex workflows.
This thesis addresses the aforementioned problems by designing, implementing, applying, and evaluating a generic methodology for integrating accountability into network services and applications. Our state-based approach decouples application state management from application logic to enable services to demonstrate that they maintain their state in compliance with user requests, i.e., that state changes do take place and that the service presents a consistent view to all clients and observers. Internal state managed in this way can then be used to feed application-specific verifiers to determine the correctness of the service's logic and to identify the responsible party. The state-based approach provides support for strong accountability---any detected violation can be proven to a third party without depending on replication and voting.
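A minimal sketch of the flavor of strong accountability described above: the service keeps a hash-chained record of state changes, hands back each entry's digest as a receipt, and any third party can replay the chain to detect tampering. The structure and field names are illustrative assumptions, not the system built in the thesis.

```python
# Hedged sketch: a hash-chained state log that an outside verifier can replay.
import hashlib, json

class StateLog:
    def __init__(self):
        self.entries = []
        self.head = "genesis"

    def append(self, request, new_state):
        record = {"prev": self.head, "request": request, "state": new_state}
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append((record, digest))
        self.head = digest
        return digest            # returned to the client as a receipt

def verify(entries):
    """A third party replays the chain; any tampering breaks a digest."""
    prev = "genesis"
    for record, digest in entries:
        if record["prev"] != prev:
            return False
        if hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True

log = StateLog()
log.append("put(x, 1)", {"x": 1})
log.append("put(y, 2)", {"x": 1, "y": 2})
print(verify(log.entries))    # True; altering any record would make this False
```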
In addition to the generic state-based approach, this thesis explores how to leverage application-specific knowledge to integrate accountability in an example application. We study the invariants and accountability requirements of an example application---a lease-based virtual resource economy. We present the design and implementation of several key elements needed to provide accountability in the system. In particular, we describe solutions to the problems of resource delegation, currency spending, and lease protocol compliance. These solutions illustrate a technique complementary to the general-purpose state-based approach developed in the earlier parts of this thesis.
Separating the actions of software from those of its user is at the heart of the third component of this dissertation. We design, implement, and evaluate an approach to detect information leaks in a commodity operating system. Our novel OS abstraction---a doppelganger process---helps track information flow without requiring application rewrites or instrumentation. Doppelganger processes help identify sensitive data as they are about to leave the confines of the system. Users can then be alerted to the potential breach and can choose to prevent the leak to avoid becoming accountable for the actions of software acting on their behalf.
Item (Open Access): Workload Management for Data-Intensive Services (2013). Lim, Harold Vinson Chao.

Data-intensive web services are typically composed of three tiers: (i) a display tier that interacts with users and serves rich content to them, (ii) a storage tier that stores the user-generated or machine-generated data used to create this content, and (iii) an analytics tier that runs data analysis tasks in order to create and optimize new content. Each tier has different workloads and requirements that result in a diverse set of systems being used in modern data-intensive web services.
Servers are provisioned dynamically in the display tier to ensure that interactive client requests are served as per the latency and throughput requirements. The challenge is not only deciding automatically how many servers to provision but also when to provision them, while ensuring stable system performance and high resource utilization. To address these challenges, we have developed a new control policy for provisioning resources dynamically in coarse-grained units (e.g., adding or removing servers or virtual machines in cloud platforms). Our new policy, called proportional thresholding, converts a user-specified performance target value into a target range in order to account for the relative effect of provisioning a server on the overall workload performance.
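A minimal sketch of the intuition behind proportional thresholding as described above: the single latency target becomes a band whose width shrinks as the cluster grows, since adding or removing one server then has a smaller relative effect. The specific control law and parameter names are assumptions for illustration, not the thesis's policy.

```python
# Hedged sketch: turn a single performance target into a size-dependent band
# and make coarse-grained add/remove decisions against it.
def target_range(target_latency_ms, num_servers, spread=1.0):
    """Band width is proportional to the relative effect of one server (~1/num_servers)."""
    margin = spread * target_latency_ms / max(num_servers, 1)
    return (target_latency_ms - margin, target_latency_ms + margin)

def decide(observed_latency_ms, target_latency_ms, num_servers):
    low, high = target_range(target_latency_ms, num_servers)
    if observed_latency_ms > high:
        return num_servers + 1          # under-provisioned: add a server
    if observed_latency_ms < low and num_servers > 1:
        return num_servers - 1          # over-provisioned: remove a server
    return num_servers                  # within the band: hold steady

print(decide(observed_latency_ms=130, target_latency_ms=100, num_servers=4))  # 5
print(decide(observed_latency_ms=95,  target_latency_ms=100, num_servers=4))  # 4
```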
The storage tier is similar to the display tier in some respects, but poses the additional challenge of needing redistribution of stored data when new storage nodes are added or removed. Thus, there will be some delay before the effects of changing a resource allocation will appear. Moreover, redistributing data can cause some interference to the current workload because it uses resources that can otherwise be used for processing requests. We have developed a system, called Elastore, that addresses the new challenges found in the storage tier. Elastore not only coordinates resource allocation and data redistribution to preserve stability during dynamic resource provisioning, but it also finds the best tradeoff between workload interference and data redistribution time.
The workload in the analytics tier consists of data-parallel workflows that can either be run in a batch fashion or continuously as new data becomes available. Each workflow is composed of smaller units that have producer-consumer relationships based on data. These workflows are often generated from declarative specifications in languages like SQL, so there is a need for a cost-based optimizer that can generate an efficient execution plan for a given workflow. Building a cost-based optimizer for data-parallel workflows raises a number of challenges, including characterizing the large execution plan space, developing cost models to estimate execution costs, and efficiently searching for the best execution plan. We have built two cost-based optimizers: Stubby for batch data-parallel workflows running on MapReduce systems, and Cyclops for continuous data-parallel workflows, where the choice of execution system is made part of the execution plan space.
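The sketch below shows the general shape of a cost-based plan search: enumerate a small plan space and pick the plan with the lowest estimated cost. The plan knobs and the toy cost model are invented placeholders; they are not the actual plan spaces or cost models of Stubby or Cyclops.

```python
# Hedged sketch: exhaustive search over a toy execution plan space with a
# made-up cost model, illustrating the optimizer structure described above.
import itertools

plan_space = {
    "num_reduce_tasks": [8, 32, 128],
    "combiner":         [True, False],
    "engine":           ["mapreduce", "streaming"],
}

def estimated_cost(plan, input_gb=100):
    """Toy cost model: shuffle cost falls with more reducers; a combiner trims it."""
    shuffle = input_gb / plan["num_reduce_tasks"] * (0.5 if plan["combiner"] else 1.0)
    startup = 0.05 * plan["num_reduce_tasks"]
    engine_overhead = 2.0 if plan["engine"] == "streaming" else 5.0
    return shuffle + startup + engine_overhead

def best_plan(space):
    keys = list(space)
    candidates = (dict(zip(keys, values)) for values in itertools.product(*space.values()))
    return min(candidates, key=estimated_cost)

print(best_plan(plan_space))
```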
We have conducted a comprehensive evaluation that shows the effectiveness of each tier's automated workload management solution.