dc.description.abstract |
<p>With the advent of Web 2.0 and the Digital Age, we are witnessing an unprecedented
increase in the amount of information collected, and in the number of users interested
in different types of information. This growth means that traditional techniques,
where users poll data sources for information of interest, are no longer sufficient.
Polling too frequently does not scale, while polling less often may result in users
missing important updates. The alternative, push technology, has long been the goal
of publish/subscribe systems, which proactively push updates (events) to users with
matching interests (expressed as subscriptions). The push model is better suited for
ensuring scalability and timely delivery of updates, both important in many application
domains: personal (e.g., RSS feeds, online auctions), financial (e.g., portfolio monitoring),
security (e.g., reporting network anomalies), etc.</p><p>Early publish/subscribe systems
were based on predefined subjects (channels), and were too coarse-grained to meet
the specific interests of different subscribers. The second generation of content-based
publish/subscribe systems offers greater flexibility by supporting subscriptions defined
as predicates over message contents. However, subscriptions are still stateless filters
over individual messages, so they cannot express queries across different messages
or over the event history. The few systems that support more powerful database-style
subscriptions do not address the problem of efficiently delivering updates to a large
number of subscribers over a wide-area network. Thus, there is a need to develop next-generation
publish/subscribe systems that unify the support for richer database-style subscription
queries and flexible wide-area notification. This support needs to be complemented
with robust processing and dissemination techniques that scale to high event rates
and large databases, as well as to a large number of subscribers over the Internet.</p><p>The
main contribution of our work is a collection of techniques to support efficient and
scalable event processing and notification dissemination for an Internet-scale publish/subscribe
system with a rich subscription model. We investigate the interface between event
processing by a database server and notification delivery by a dissemination network.
Previous research in publish/subscribe has largely been compartmentalized; database-centric
and network-centric approaches each have their own limitations, and simply putting
them together does not lead to an efficient solution. A closer examination of database/network
interfaces yields a spectrum of new and interesting possibilities. In particular,
we propose message and subscription reformulation as general techniques to support
stateful subscriptions over existing content-driven networks, by converting them into
equivalent but stateless forms. We show how reformulation can successfully be applied
to various stateful subscriptions including range-aggregation, select-joins, and subscriptions
with value-based notification conditions. These techniques often provide orders-of-magnitude
improvement over simpler techniques adopted by current systems, and are shown to scale
to millions of subscriptions. Further, the use of a standard off-the-shelf content-driven
dissemination interface allows these techniques to be easily deployed, managed, and
maintained in a large-scale system.</p><p>Based on our findings, we have built a high-performance
publish/subscribe system named ProSem (to signify the inseparability of database processing
and network dissemination). ProSem uses our novel techniques for group-processing
many types of complex and expressive subscriptions, with a per-event optimization
framework that chooses the best processing and dissemination strategy at runtime based
on online statistics and system objectives.</p>
|
|