Research Project: Pathfinder

Our XQuery compiler Pathfinder turns relational database systems into scalable XQuery and XPath processors that can cope with huge XML inputs. Pathfinder comes packaged as a retargetable standalone compiler and is also an integral part of MonetDB/XQuery.

Research Project: Ferry

Ferry establishes a new connection between two somewhat distant shores, programming languages and database technology: how far can we push the idea of relational database engines that directly and seamlessly participate in program evaluation?

Research Project: Nautilus

Nautilus aims at providing database developers with tools that support them in analyzing, debugging, fixing, and testing data transformation processes within their database applications.

Download: MonetDB/XQuery

MonetDB/XQuery is the tight integration of our relational XQuery compiler Pathfinder with MonetDB, the super-efficient column store developed by our friends at CWI, Amsterdam. Open source. Try it out!

Research
Nautilus

Overview

"The deepest parts of [data transformation] are totally unknown to us. No soundings have been able to reach them. What goes on in those distant depths? What [data and transformations] inhabit, or could inhabit, those regions [...]? What is the constitution of these [...]? It's almost beyond conjecture." Slightly adapted from Jules Verne's 20,000 Leagues under the Sea, Chapter II.

Database developers may have a similar thought when designing or modifying database applications that derive data by applying queries or, more generally, transformations to source data. Indeed, when using declarative languages such as SQL to specify queries, developers often find that they cannot properly inspect or debug their query or transformation code. All they see is the tip of the iceberg once the result data has been computed. If the result does not meet the developers' expectations, they usually go through one or more tedious and mostly manual analyze-fix-test cycles until the expected result appears. The goal of Nautilus is to support developers throughout this process with a suite of tools.

Ferry

Ferry Overview

With project Ferry we try to establish a connection between two somewhat distant shores: programming languages and database technology. Ferry explores how far we can push the idea of relational database engines that directly and seamlessly participate in program evaluation. Programmers continue to use their language's very own syntax, idioms, and functions, while Ferry is in charge of deciding where the computation described by a given program will take place: on the programming language's heap or inside the relational database back-end.

Programs that touch and move huge amounts of data (think computational science) benefit the most. You may continue to analyse your experimental data using, say, Ruby scripts; Ferry will compile selected fragments of your Ruby code into (sequences of collaborating) database queries.

To this end, we search for, design, and implement new compilation strategies that allow programming and scripting language concepts like complex types (e.g., ordered lists, arrays, dictionaries) and control structures (e.g., nested iteration, conditionals, variable assignment and reference) to be mapped into efficient set-oriented algebraic programs. Ferry builds on technology developed in the context of our project Pathfinder.
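As a rough illustration of the kind of mapping meant here, the sketch below encodes an ordered list as a table with a `pos` column and expresses a nested iteration with a conditional as one set-oriented SQL query. The `(pos, item)` encoding, the table names, the helper `to_table`, and the use of SQLite are assumptions made for this example only; they are not taken from Ferry itself.

```python
import sqlite3


def to_table(conn, name, values):
    """Store an ordered list as rows (pos, item); pos preserves list order.

    Hypothetical helper for this sketch, not part of Ferry.
    """
    conn.execute(f"CREATE TABLE {name} (pos INTEGER PRIMARY KEY, item INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", enumerate(values))


conn = sqlite3.connect(":memory:")
xs, ys = [1, 2, 3], [10, 20]
to_table(conn, "xs", xs)
to_table(conn, "ys", ys)

# Evaluation on the programming language's heap:
# nested iteration plus a conditional.
heap_result = [x * y for x in xs for y in ys if x != 2]

# The same computation as a single set-oriented query.
# Ordering by (xs.pos, ys.pos) reproduces the iteration order of the loops.
db_result = [
    item
    for (item,) in conn.execute(
        """
        SELECT xs.item * ys.item
        FROM xs, ys
        WHERE xs.item <> 2
        ORDER BY xs.pos, ys.pos
        """
    )
]

print(heap_result)  # [10, 20, 30, 60]
print(db_result)    # [10, 20, 30, 60]
```

The point of the sketch is only that list order, which a relational engine does not maintain on its own, can be carried explicitly in a column and restored at the end.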

Ferry targets relational database engines, but any platform that implements some variant of a set-oriented execution model can assume the role of a Ferry back-end. Currently, we have IBM's DB2, CWI's MonetDB, and Kx Systems' kdb+ on the workbench.

Pathfinder: A Purely Relational XQuery Processor

Pathfinder is a purely relational XQuery compiler. At the core of this research project lies our desire to answer the question:

"How far can we push relational database technology to construct an efficient and scalable XQuery implementation?"

If you are interested in a copy of Pathfinder's SQL code generator and/or MonetDB/XQuery, please have a look at the Download page. If you are interested in the ideas and the technology behind Pathfinder, please have a look at the Technology and Publications pages. If you just want a first overview, please read on.


Pathfinder Overview


Pathfinder is a re-targetable query compiler that turns XQuery expressions into table algebra queries. While Pathfinder is tightly coupled with MonetDB, we also provide a SQL code generator that allows any database to become a faithful XQuery processor.

 

The Approach

Pathfinder assumes a database that stores shredded XML documents, that is, documents that have been transformed into a relational encoding. An incoming XQuery query is compiled by Pathfinder into a relational query plan. The database evaluates the generated query plan against the shredded XML documents and returns a table. A serializer consumes this table and transforms it into an XQuery result sequence. (In MonetDB/XQuery, automatic shredding and serialization as well as the tight integration of Pathfinder lead to a runtime in which the relational approach is no longer visible to the user.)
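To make shredding and serialization more concrete, here is a minimal sketch in Python. The text above does not spell out the relational encoding, so the `(pre, size, level, tag)` scheme used here is an assumption for illustration, as are the function names `shred` and `serialize`. The sketch handles element nodes only (no text, attributes, or namespaces), and `serialize` simply turns the encoded table back into markup rather than consuming a real query result.

```python
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Shred an XML document into rows (pre, size, level, tag).

    pre   : preorder rank of the element
    size  : number of elements in the subtree below it
    level : depth of the element (root = 0)
    """
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)  # reserve the slot; size is known only after the subtree
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1
        rows[pre] = (pre, size, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows


def serialize(rows):
    """Turn rows in pre order back into nested element markup."""
    out = []
    open_elems = []  # stack of (tag, pre rank of last descendant)
    for pre, size, _level, tag in rows:
        # Close every open element whose subtree ended before this row.
        while open_elems and open_elems[-1][1] < pre:
            out.append(f"</{open_elems.pop()[0]}>")
        out.append(f"<{tag}>")
        open_elems.append((tag, pre + size))
    while open_elems:
        out.append(f"</{open_elems.pop()[0]}>")
    return "".join(out)


rows = shred("<a><b><c/></b><d/></a>")
for row in rows:
    print(row)
# (0, 3, 0, 'a')
# (1, 1, 1, 'b')
# (2, 0, 2, 'c')
# (3, 0, 1, 'd')

print(serialize(rows))  # <a><b><c></c></b><d></d></a>
```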

Motivation

We believe that relational databases are the most researched and best engineered query processing infrastructures available today. They are able to query very large volumes of data efficiently. By using a relational database as the runtime environment for an XQuery processor, we can carry 30+ years of research over to the XQuery domain and build a processor that scales well with increasing input sizes. (For these benefits we are willing to pay the extra costs of shredding, serialization, and compilation.)

Download MonetDB/XQuery or the SQL Code Generator?

MonetDB/XQuery integrates the Pathfinder compiler into the MonetDB product and furthermore extends it with runtime extensions for offline and online shredding, serialization (with multiple serialization modes), efficient path step algorithms such as Staircase Join, and support for updates.
MonetDB/XQuery inherits the scalability of its database back-end: you may feed XML documents larger than 1 GB into the system and still expect reasonable, interactive query response times (for example, with the XMark benchmark).

Pathfinder's SQL Code Generator is still limited in its functionality. It is the ideal playground for everybody who is interested in extending their favourite database with XML support. We have tested the generated SQL code on DB2 v9. B-tree indexes faithfully speed up the range predicates that implement XPath's location step semantics. (The combination of Pathfinder and DB2 was even able to outperform DB2's built-in XQuery support on larger XML documents.) The performance analysis for other SQL back-ends such as PostgreSQL (and for other XML encodings) is currently underway.
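The following sketch shows how a location step can turn into a range predicate. It assumes a `pre`/`size` encoding of the document (an assumption for this example, not a statement about the SQL that Pathfinder actually generates) and uses SQLite as a self-contained stand-in for DB2. The table name `doc` and the function `descendants` are hypothetical. Declaring `pre` as `INTEGER PRIMARY KEY` makes SQLite store the table as a B-tree keyed on `pre`, so the range condition can be answered by an index range scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc ("
    "  pre INTEGER PRIMARY KEY,"  # preorder rank, B-tree key in SQLite
    "  size INTEGER,"             # number of nodes in the subtree below
    "  level INTEGER,"
    "  tag TEXT)"
)
# Encoding of <a><b><c/></b><d/></a> under the assumed scheme.
conn.executemany(
    "INSERT INTO doc VALUES (?, ?, ?, ?)",
    [(0, 3, 0, "a"), (1, 1, 1, "b"), (2, 0, 2, "c"), (3, 0, 1, "d")],
)


def descendants(conn, context_pre):
    """Descendant step: all nodes d with c.pre < d.pre <= c.pre + c.size."""
    return conn.execute(
        """
        SELECT d.pre, d.tag
        FROM doc AS c
        JOIN doc AS d
          ON d.pre > c.pre AND d.pre <= c.pre + c.size
        WHERE c.pre = ?
        ORDER BY d.pre
        """,
        (context_pre,),
    ).fetchall()


print(descendants(conn, 0))  # [(1, 'b'), (2, 'c'), (3, 'd')]
print(descendants(conn, 1))  # [(2, 'c')]
print(descendants(conn, 3))  # []
```

Ordering by `pre` returns the nodes in document order, which is what an XPath step requires.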

Extensibility of Pathfinder

Pathfinder is a re-targetable compiler that is able to produce optimized algebra plans (in XML format). These plans feature standard table algebra operators (such as select, project, join, ...) and some XML-specific operators such as path joins and node access. The XML-specific operators are the ones that interact with the encoded XML document, and they keep the interface to that document abstract. This allows a new back-end to choose the XML encoding that works best for it.
Based on plans generated and optimized by Pathfinder, we have, for example, turned Kx Systems' kdb+ into an efficient XQuery processor.
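The split between generic and XML-specific operators can be sketched as follows. Everything here is hypothetical and simplified: the plan is built from in-memory Python objects rather than Pathfinder's XML plan format, the operator classes and the `PreSizeBackend` class are invented for this example, and only one XML-specific operator (a descendant step) is shown. The generic operators are evaluated without any knowledge of the XML encoding, and only the step operator is delegated to the back-end.

```python
from dataclasses import dataclass


# Generic table algebra operators (illustrative subset).
@dataclass
class Scan:
    rows: list


@dataclass
class Select:
    child: object
    pred: object  # callable: row -> bool


@dataclass
class Project:
    child: object
    cols: list


# XML-specific operator: its meaning depends on the chosen encoding.
@dataclass
class DescendantStep:
    child: object


class PreSizeBackend:
    """One possible back-end, assuming a pre/size encoding of the document."""

    def __init__(self, doc):
        self.doc = doc

    def descendants(self, node):
        lo, hi = node["pre"], node["pre"] + node["size"]
        return [n for n in self.doc if lo < n["pre"] <= hi]


def evaluate(plan, backend):
    """Evaluate a plan; only DescendantStep touches the XML encoding."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Select):
        return [r for r in evaluate(plan.child, backend) if plan.pred(r)]
    if isinstance(plan, Project):
        return [{c: r[c] for c in plan.cols} for r in evaluate(plan.child, backend)]
    if isinstance(plan, DescendantStep):
        return [d for r in evaluate(plan.child, backend) for d in backend.descendants(r)]
    raise TypeError(f"unknown operator: {plan!r}")


# Encoding of <a><b><c/></b><d/></a> under the assumed scheme.
doc = [
    {"pre": 0, "size": 3, "tag": "a"},
    {"pre": 1, "size": 1, "tag": "b"},
    {"pre": 2, "size": 0, "tag": "c"},
    {"pre": 3, "size": 0, "tag": "d"},
]

# Leaf elements below <a>, projected to their tag names.
plan = Project(
    Select(
        DescendantStep(Select(Scan(doc), lambda r: r["tag"] == "a")),
        lambda r: r["size"] == 0,
    ),
    ["tag"],
)
result = evaluate(plan, PreSizeBackend(doc))
print(result)  # [{'tag': 'c'}, {'tag': 'd'}]
```

A back-end with a different encoding would replace only `PreSizeBackend.descendants`; the plan and the generic operators would stay the same.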