Show simple item record

dc.contributor.advisor Yang, Jun en_US
dc.contributor.author Chandramouli, Badrish en_US
dc.date.accessioned 2008-08-15T11:57:00Z
dc.date.available 2008-08-15T11:57:00Z
dc.date.issued 2008-08-01 en_US
dc.identifier.uri http://hdl.handle.net/10161/698
dc.description Dissertation en_US
dc.description.abstract <p>With the advent of Web 2.0 and the Digital Age, we are witnessing an unprecedented increase in the amount of information collected, and in the number of users interested in different types of information. This growth means that traditional techniques, where users poll data sources for information of interest, are no longer sufficient. Polling too frequently does not scale, while polling less often may result in users missing important updates. The alternative push technology has long been the goal of publish/subscribe systems, which proactively push updates (events) to users with matching interests (expressed as subscriptions). The push model is better suited for ensuring scalability and timely delivery of updates, important in many application domains: personal (e.g., RSS feeds, online auctions), financial (e.g., portfolio monitoring), security (e.g., reporting network anomalies), etc.</p><p>Early publish/subscribe systems were based on predefined subjects (channels), and were too coarse-grained to meet the specific interests of different subscribers. The second generation of content-based publish/subscribe systems offer greater flexibility by supporting subscriptions defined as predicates over message contents. However, subscriptions are still stateless filters over individual messages, so they cannot express queries across different messages or over the event history. The few systems that support more powerful database-style subscriptions do not address the problem of efficiently delivering updates to a large number of subscribers over a wide-area network. Thus, there is a need to develop next-generation publish/subscribe systems that unify the support for richer database-style subscription queries and flexible wide-area notification. This support needs to be complemented with robust processing and dissemination techniques that scale to high event rates and large databases, as well as to a large number of subscribers over the Internet.</p><p>The main contribution of our work is a collection of techniques to support efficient and scalable event processing and notification dissemination for an Internet-scale publish/subscribe system with a rich subscription model. We investigate the interface between event processing by a database server and notification delivery by a dissemination network. Previous research in publish/subscribe has largely been compartmentalized; database-centric and network-centric approaches each have their own limitations, and simply putting them together does not lead to an efficient solution. A closer examination of database/network interfaces yields a spectrum of new and interesting possibilities. In particular, we propose message and subscription reformulation as general techniques to support stateful subscriptions over existing content-driven networks, by converting them into equivalent but stateless forms. We show how reformulation can successfully be applied to various stateful subscriptions including range-aggregation, select-joins, and subscriptions with value-based notification conditions. These techniques often provide orders-of-magnitude improvement over simpler techniques adopted by current systems, and are shown to scale to millions of subscriptions. Further, the use of a standard off-the-shelf content-driven dissemination interface allows these techniques to be easily deployed, managed, and maintained in a large-scale system.</p><p>Based on our findings, we have built a high-performance publish/subscribe system named ProSem (to signify the inseparability of database processing and network dissemination). ProSem uses our novel techniques for group-processing many types of complex and expressive subscriptions, with a per-event optimization framework that chooses the best processing and dissemination strategy at runtime based on online statistics and system objectives.</p> en_US
dc.format.extent 3205992 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.subject Computer Science en_US
dc.subject database en_US
dc.subject publish en_US
dc.subject subscribe en_US
dc.subject network en_US
dc.subject processing en_US
dc.subject dissemination en_US
dc.title Unifying Databases and Internet-Scale Publish/Subscribe en_US
dc.type Dissertation en_US
dc.department Computer Science en_US

Files in this item

This item appears in the following Collection(s)

Show simple item record