G(arbage) C(ollected) X(Query)
A Streaming XQuery Engine with Static and Dynamic Buffer Minimization
News
- (2009-05-25) The second release (v2.1) of the GCX engine is now available for download
- (2008-04-07) GCX is now a Sourceforge Project
- (2008-10-29) Diploma thesis on GCX integrates aggregate functions and multi-step path expressions into the GCX framework
-
Gunnar Jehl:
"Erweiterung und Optimierung der Garbage Collected XQuery Engine – Pfade mit mehreren Schritten und Aggregatfunktionen –" (engl. "Extension and Optimization of the Garbage Collected XQuery Engine – Multi-step Paths and Aggregate Functions –") [.pdf] (only available in german)
-
Gunnar Jehl:
- (2007-10-05) We succesfully demonstrated the GCX engine at VLDB‘07 in Vienna
- (2007-07-03) We gave a demonstration of the GCX engine at XIME-P‘07 in Beijing
- (2007-01-26) The first release (v1.0β) of the GCX engine is now available for download
Summary
The G(arbage) C(ollected) X(Query) engine is an in-memory XQuery engine, which is the first streaming XQuery engine that implements active garbage collection, a novel buffer management strategy in which both static and dynamic analysis are exploited. This technique actively purges main memory buffers at runtime based on the current status of query evaluation. In our paper, we show the various stages in evaluating composition-free XQuery with the GCX engine. Our technique has a significant impact on reducing both main memory consumption and query execution time, as can be seen in our experiments.
Publications/Papers
-
Michael Schmidt, Stefanie Scherzinger, Christoph Koch:
Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation, In Proc. ICDE 2007 [.pdf] -
Christoph Koch, Stefanie Scherzinger, Michael Schmidt:
The GCX System: Dynamic Buffer Minimization in Streaming XQuery Evaluation, In Proc. VLDB 2007, Demo paper [.pdf]
Active Garbage Collection in XQuery Engines
Garbage collection is a well-understood technique for automatic memory management in programming languages.
The basic principle of any garbage collector is to determine which data objects in a program will not be accessed in the future,
and consequently, to reclaim the storage used by these objects.
A simple yet effective garbage collection strategy is reference counting, where every object counts
the number of references to it. When a reference is created to an object, its reference count is incremented.
Likewise, the reference count is decremented when a reference is removed.
Once the count reaches zero, the object is deleted and its memory is reclaimed.
A major advantage of this approach is that the memory overhead is small.
The approach underlying active garbage collection for XQuery engines is strongly related to reference counting
insofar as each single node in the buffer keeps track whether it is still relevant to the remaining XQuery evaluation.
Instead of counting references, we employ the concept of roles which are assigned to nodes.
Intuitively, a role serves as a metaphor for the future relevance of a given node.
Roles are statically derived from the XPath expressions in the query.
While reading the input stream, the XML nodes are matched against the set of possible roles.
A node can be assigned several roles when it is used in the query in several different contexts.
Moreover, a role can be assigned to a node several times; this can happen if queries involve XPath expressions
with descendant-axes.
While a traditional garbage collector is passive in the sense that it is invoked whenever there is no more space to allocate
new objects, our approach differs in that it is active. That is, we purge buffers from irrelevant nodes early on.
In fact, garbage collection is invoked whenever the scope of a variable ends.
Thus, both the high watermark and the average main memory consumption remain low.
⇒ There is also a nice example demonstrating the GCX engine in action.
Project Members
- Michael Schmidt (Freiburg University, Contact Person)
- Gunnar Jehl (Freiburg University, Contact Person)
- Prof. Dr. Christoph Koch (Cornell University)
- Prof. Dr. Georg Lausen (Freiburg University)
- Stefanie Scherzinger (IBM Böblingen)
Last updated: 2009-05-26