The GCX XQuery Engine – Benchmark Results for GCX v1.0β

G(arbage) C(ollected) X(Query) Engine – An open source in-memory XQuery engine

The GCX engine is an in-memory XQuery engine designed for memory-efficient XQuery evaluation against large XML documents. The C++ prototype, released as v1.0β, supports a powerful fragment of the XQuery language. The following experiments are part of our publication at ICDE 2007.

Experiments with the XMark Benchmark

We measured the performance of the GCX engine v1.0β on benchmark data from XMark – An XML Benchmark Project. To this end, we generated XML documents of sizes between 10MB and 200MB with the XMark data generator xmlgen. As the GCX engine does not yet support the full XQuery standard, we rewrote the XMark queries to fit the supported fragment; the rewritten queries are listed under "XMark Queries" below.

In our experiments, all XQuery engines evaluated the same rewritten queries on exactly the same input XML documents.

Reference Implementations

The GCX engine has two main characteristics: it is an in-memory XQuery engine, and it is geared towards streaming XQuery evaluation. With this in mind, we chose the following reference implementations: FluXQuery, Galax, MonetDB/XQuery, Saxon, and Qizx/open.

Execution Platform

We ran our experiments on a 3GHz Intel Pentium IV CPU with 2GB RAM, running SuSE Linux 10.0. All Java-based systems were executed using J2RE v1.4.2.

The benchmarks focus primarily on main memory consumption, but we also report query execution time. Time is given either in seconds (abbreviated "s"; e.g. 1.59s means 1 second and 590 milliseconds) or in minutes (abbreviated "m"; e.g. 02:07m means 2 minutes and 7 seconds). Memory consumption is given in megabytes ("MB") or gigabytes ("GB").

Main memory consumption was measured with the Linux "top" command. For each system and each size of the input XML document, we measured the high watermark of non-swapped memory consumption and the total query evaluation time. For each system and query we set a timeout of one hour. "Not available" ("n/a") indicates that the query could not be expressed in the language supported by the specific engine, while a dash ("-") denotes failure, e.g. caused by a segmentation fault.

With the Java-based engines, we observed that, due to effects caused by automatic memory management and the Java Virtual Machine, memory consumption often increased with the XML document size even though the buffer size remained constant (e.g. in the case of the FluXQuery engine).
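To make the notation concrete, the following is a minimal sketch of how one result cell (time and memory separated by a slash) could be converted into seconds and megabytes. The function names are hypothetical and not part of any GCX tooling, and the conversion 1GB = 1024MB is an assumption, since the text does not say whether decimal or binary units were used.

```python
import re


def parse_time(text):
    """Convert a time such as '1.59s' or '02:07m' to seconds."""
    text = text.strip()
    if text.endswith("s"):
        # Decimal seconds: '1.59s' is 1.59 seconds, i.e. 1 s and 590 ms.
        return float(text[:-1])
    if text.endswith("m"):
        # Minutes and seconds: '02:07m' is 2 * 60 + 7 seconds.
        minutes, seconds = text[:-1].split(":")
        return int(minutes) * 60 + int(seconds)
    raise ValueError(f"unrecognized time: {text!r}")


def parse_memory(text):
    """Convert a size such as '1.2MB' or '1.8GB' to megabytes."""
    match = re.fullmatch(r"([0-9.]+)(MB|GB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized memory size: {text!r}")
    value, unit = match.groups()
    # Assumption: 1GB = 1024MB.
    return float(value) * (1024 if unit == "GB" else 1)


def parse_cell(cell):
    """Return (seconds, megabytes), or None for 'timeout', 'n/a' and '-'."""
    cell = cell.strip()
    if cell in {"timeout", "n/a", "-", ""}:
        return None
    time_part, memory_part = cell.split("/")
    return parse_time(time_part), parse_memory(memory_part)
```

For example, parse_cell("02:07m / 1.8GB") yields 127 seconds, while a "timeout" cell yields None so that it can be skipped in any later arithmetic.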

Benchmark Results

The table below summarizes the runtime and memory consumption of the GCX engine (v1.0β) compared to the other XQuery engines. Note that the benchmark results are not kept up to date. The runtime and memory consumption of the current version (v2.1) of the GCX engine might differ from the results given in the following table.


Each cell gives query evaluation time / memory consumption.

| Query | XML document size | GCX v1.0β | FluXQuery | Galax v0.6.8 | MonetDB v4.12.0 / XQuery v0.12.0 | Saxon v8.7.1 | Qizx/open v1.1 |
|---|---|---|---|---|---|---|---|
| Q1 | 10MB | 0.18s / 1.2MB | 1.59s / 50MB | 5.45s / 186MB | 0.86s / 30MB | 1.48s / 80MB | 1.20s / 38MB |
| Q1 | 50MB | 0.92s / 1.2MB | 3.96s / 111MB | 42.33s / 880MB | 3.69s / 98MB | 4.29s / 292MB | 3.74s / 195MB |
| Q1 | 100MB | 1.87s / 1.2MB | 6.94s / 111MB | 02:07m / 1.8GB | 7.19s / 225MB | 7.96s / 547MB | 6.56s / 285MB |
| Q1 | 200MB | 3.53s / 1.2MB | 12.27s / 111MB | timeout | 13.60s / 244MB | 14.30s / 973MB | 11.82s / 480MB |
| Q6 | 10MB | 0.34s / 1.2MB | n/a | 7.66s / 240MB | 0.98s / 29MB | 1.73s / 82MB | 1.56s / 33MB |
| Q6 | 50MB | 1.68s / 1.2MB | n/a | 57.98s / 1.2GB | 5.06s / 111MB | 5.78s / 292MB | 6.13s / 169MB |
| Q6 | 100MB | 3.33s / 1.2MB | n/a | 05:08m / 2GB | 9.94s / 253MB | 10.85s / 622MB | 11.74s / 484MB |
| Q6 | 200MB | 6.42s / 1.2MB | n/a | timeout | 19.95s / 337MB | 20.14s / 1.2GB | 20.33s / 805MB |
| Q8 | 10MB | 13.15s / 9.8MB | 18.04s / 128MB | 01:04m / 377MB | 02:56m / 407MB | 6.61s / 145MB | 9.89s / 148MB |
| Q8 | 50MB | 05:13m / 43MB | 06:51m / 169MB | 33:08m / 1.8GB | 03:26m / 1.35GB | 02:02m / 352MB | 03:38m / 265MB |
| Q8 | 100MB | 22:07m / 86MB | 27:01m / 216MB | timeout | - | 08:39m / 650MB | 14:27m / 397MB |
| Q8 | 200MB | timeout | timeout | timeout | - | 32:43m / 1.15GB | 52:05m / 636MB |
| Q13 | 10MB | 0.17s / 1.2MB | 1.60s / 52MB | 5.92s / 182MB | 0.80s / 31MB | 1.53s / 48MB | 1.26s / 28MB |
| Q13 | 50MB | 0.85s / 1.2MB | 3.98s / 111MB | 43.91s / 899MB | 3.64s / 98MB | 4.45s / 292MB | 3.85s / 195MB |
| Q13 | 100MB | 1.69s / 1.2MB | 7.00s / 111MB | 02:04m / 1.8GB | 7.34s / 224MB | 8.35s / 547MB | 6.81s / 285MB |
| Q13 | 200MB | 3.24s / 1.2MB | 12.33s / 111MB | timeout | 13.52s / 271MB | 15.02s / 1.05GB | 12.30s / 480MB |
| Q20 | 10MB | 0.25s / 1.2MB | 1.65s / 48MB | 6.95s / 215MB | 0.85s / 34MB | 1.65s / 62MB | 1.43s / 39MB |
| Q20 | 50MB | 1.24s / 1.2MB | 4.19s / 111MB | 53.08s / 1.5GB | 4.17s / 120MB | 4.90s / 292MB | 4.18s / 195MB |
| Q20 | 100MB | 2.48s / 1.2MB | 7.37s / 111MB | 03:14m / 2GB | 8.47s / 247MB | 9.13s / 622MB | 8.71s / 350MB |
| Q20 | 200MB | 4.74s / 1.2MB | 13.14s / 111MB | timeout | 16.40s / 296MB | 16.58s / 1.15GB | 15.80s / 628MB |
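
As a rough reading aid, the sketch below expresses the other engines' time and memory for Q1 on the 200MB document as multiples of the GCX figures. The numbers are copied from the table above; the helper name relative_to is hypothetical, and Galax is left out because it timed out on this input.

```python
# Q1 on the 200MB document, copied from the results table:
# engine -> (evaluation time in seconds, memory in megabytes).
Q1_200MB = {
    "GCX": (3.53, 1.2),
    "FluXQuery": (12.27, 111),
    "MonetDB/XQuery": (13.60, 244),
    "Saxon": (14.30, 973),
    "Qizx/open": (11.82, 480),
}


def relative_to(baseline, results):
    """Return (time factor, memory factor) of every other engine vs. the baseline."""
    base_time, base_memory = results[baseline]
    return {
        engine: (time / base_time, memory / base_memory)
        for engine, (time, memory) in results.items()
        if engine != baseline
    }


if __name__ == "__main__":
    for engine, (time_factor, memory_factor) in relative_to("GCX", Q1_200MB).items():
        print(f"{engine}: {time_factor:.1f}x time, {memory_factor:.1f}x memory")
```

The same helper can be applied to any other row of the table, provided that cells marked "timeout", "n/a" or failure are removed first.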

XMark Queries

Note: All following (XMark) queries were taken from XMark – An XML Benchmark Project and modified to match the GCX v1.0β supported XQuery fragment.

XMark Q1

<query1> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if ($person/person_id="person0")
          then <result> {$person/name} </result>
          else ()
} </query1>
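
To illustrate why a query like Q1 can be answered with a small, constant buffer, here is a minimal Python sketch of a streaming evaluation with the standard library's xml.etree.ElementTree.iterparse. It is not how GCX works internally; it only shows the general idea of discarding each person subtree as soon as it has been checked. The function name query1 is hypothetical, and the sketch assumes input in which person_id is a child element, as in the rewritten query above.

```python
import io
import xml.etree.ElementTree as ET


def query1(source):
    """Yield the <name> texts of each /site/people/person with person_id 'person0'."""
    path = []  # tags of the currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        # "end" event: elem is complete and path still ends with its tag.
        if path == ["site", "people", "person"]:
            # Existential comparison, as in XQuery: any person_id may match.
            if any(pid.text == "person0" for pid in elem.findall("person_id")):
                yield [name.text for name in elem.findall("name")]
            # Free the person's children; the now empty element itself
            # stays attached to its parent.
            elem.clear()
        path.pop()


if __name__ == "__main__":
    sample = (
        b"<site><people>"
        b"<person><person_id>person0</person_id><name>Ann</name></person>"
        b"<person><person_id>person1</person_id><name>Bob</name></person>"
        b"</people></site>"
    )
    print(list(query1(io.BytesIO(sample))))
```

Only one person subtree is held in full at any time, which mirrors the roughly constant memory footprint that a streaming engine can achieve on Q1, although the absolute numbers of such a Python sketch are not comparable to those in the table.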

XMark Q6

<query6> {
  for $site in //site return
    for $regions in $site/regions return
      $regions//item
} </query6>

XMark Q8

<query8> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        <item> {
          (
            <person> {$person/name} </person>,
            <items_bought> {
              for $site2 in /site return
                for $cas in $site2/closed_auctions return
                  for $ca in $cas/closed_auction return
                    for $buyer in $ca/buyer return
                      if ($buyer/buyer_person=$person/person_id)
                        then <result> {$ca} </result>
                        else ()
            } </items_bought>
          )
        } </item>
} </query8>

XMark Q13

<query13> {
  for $site in /site return
    for $regions in $site/regions return
      for $australia in $regions/australia return
        for $item in $australia/item return
          <item> {
            (
              <name> {$item/name} </name>,
              <desc> {$item/description} </desc>
            )
          } </item>
} </query13>

XMark Q20

<query20> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if (fn:not(fn:exists($person/person_income)))
          then $person
          else ()
} </query20>

Last updated: 2009-05-26
Albert-Ludwigs-Universität Freiburg
Universität des Saarlandes