G(arbage) C(ollected) X(Query)

A Streaming XQuery Engine with Static and Dynamic Buffer Minimization



The G(arbage) C(ollected) X(Query) engine is an in-memory XQuery engine, which is the first streaming XQuery engine that implements active garbage collection, a novel buffer management strategy in which both static and dynamic analysis are exploited. This technique actively purges main memory buffers at runtime based on the current status of query evaluation. In our paper, we show the various stages in evaluating composition-free XQuery with the GCX engine. Our technique has a significant impact on reducing both main memory consumption and query execution time, as can be seen in our experiments.


Active Garbage Collection in XQuery Engines

Garbage collection is a well-understood technique for automatic memory management in programming languages. The basic principle of any garbage collector is to determine which data objects in a program will not be accessed in the future, and consequently, to reclaim the storage used by these objects.

A simple yet effective garbage collection strategy is reference counting, where every object counts the number of references to it. When a reference is created to an object, its reference count is incremented. Likewise, the reference count is decremented when a reference is removed. Once the count reaches zero, the object is deleted and its memory is reclaimed. A major advantage of this approach is that the memory overhead is small.

The approach underlying active garbage collection for XQuery engines is strongly related to reference counting insofar as each single node in the buffer keeps track whether it is still relevant to the remaining XQuery evaluation. Instead of counting references, we employ the concept of roles which are assigned to nodes. Intuitively, a role serves as a metaphor for the future relevance of a given node.

Roles are statically derived from the XPath expressions in the query. While reading the input stream, the XML nodes are matched against the set of possible roles. A node can be assigned several roles when it is used in the query in several different contexts. Moreover, a role can be assigned to a node several times; this can happen if queries involve XPath expressions with descendant-axes.

While a traditional garbage collector is passive in the sense that it is invoked whenever there is no more space to allocate new objects, our approach differs in that it is active. That is, we purge buffers from irrelevant nodes early on. In fact, garbage collection is invoked whenever the scope of a variable ends. Thus, both the high watermark and the average main memory consumption remain low.

There is also a nice example demonstrating the GCX engine in action.

Project Members

Last updated: 2009-05-26
Albert-Ludwigs-Universtät Freiburg
Universtät des Saarlandes
Valid CSS! Valid XHTML 1.0 Strict