


Nautilus

Overview

"The deepest parts of [data transformation] are totally unknown to us. No soundings have been able to reach them. What goes on in those distant depths? What [data and transformations] inhabit, or could inhabit, those regions [...]? What is the constitution of these [...]? It's almost beyond conjecture." Slightly altered excerpt from Jules Verne's 20,000 Leagues Under the Sea, Chapter II.

Database developers may well have a similar thought when designing or modifying database applications that derive data by applying queries or, more generally, transformations to source data. Indeed, when using declarative languages such as SQL to specify queries, developers often find that they cannot properly inspect or debug their query or transformation code. All they see is the tip of the iceberg: the result data, once it has been computed. If the result does not match the developers' expectations, they usually go through one or more tedious and mostly manual analyze-fix-test cycles until the expected result appears. The goal of Nautilus is to support developers throughout this cycle with a suite of tools.

First, Nautilus allows developers to analyze what is going on below the surface, and thus serves as a debugging tool for queries and data transformation processes. Currently, Nautilus allows developers to analyze view definition queries by asking why data they expected is missing from the query result. The explanations Nautilus produces using the Artemis algorithm [1] are based on the source data; in the future, we also plan to provide explanations based on the query. In addition to explaining missing data, Nautilus should also support explaining existing data, which relates to the well-known data lineage or data provenance problem.
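To make the "why is my expected data missing?" question concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not the Artemis algorithm and not part of Nautilus: the tables, the view well_paid_berlin, and the helper why_not are all hypothetical names invented for this illustration. The sketch simply restates each condition of a view as a separate check on the source data and reports which checks fail for a missing answer.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with a view (all names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (name TEXT, dept_id INTEGER, salary INTEGER);
        CREATE TABLE dept (id INTEGER, city TEXT);
        INSERT INTO employee VALUES
            ('Ann', 1, 5200), ('Bob', 2, 4100), ('Eve', 3, 6000);
        INSERT INTO dept VALUES (1, 'Berlin'), (2, 'Berlin');
        CREATE VIEW well_paid_berlin AS
            SELECT e.name
            FROM employee e JOIN dept d ON e.dept_id = d.id
            WHERE d.city = 'Berlin' AND e.salary > 5000;
        """
    )
    return conn


# Each condition of the view, restated by hand as a standalone check
# on the source data (an assumption of this sketch, not derived automatically).
CHECKS = [
    ("employee exists", "SELECT 1 FROM employee WHERE name = ?"),
    ("has a matching dept row",
     "SELECT 1 FROM employee e JOIN dept d ON e.dept_id = d.id "
     "WHERE e.name = ?"),
    ("dept is in Berlin",
     "SELECT 1 FROM employee e JOIN dept d ON e.dept_id = d.id "
     "WHERE e.name = ? AND d.city = 'Berlin'"),
    ("salary > 5000",
     "SELECT 1 FROM employee WHERE name = ? AND salary > 5000"),
]


def why_not(conn, name):
    """Return the labels of all checks that the source data fails for `name`."""
    return [label for label, sql in CHECKS
            if conn.execute(sql, (name,)).fetchone() is None]


if __name__ == "__main__":
    conn = build_db()
    print(conn.execute("SELECT name FROM well_paid_berlin").fetchall())
    for name in ("Bob", "Eve"):
        print(name, "is missing because of:", why_not(conn, name))
```

In this toy setup, Bob is filtered out by the salary condition, while Eve has no join partner in dept. A real tool would derive such checks from the query itself rather than rely on a hand-written list.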

Instance-based and query-based explanations allow developers to better understand the queries and transformations they formulate. Nautilus will go further: we plan to generate sensible suggestions for repairing the analyzed transformations, thus supporting the fix phase of the previously manual analyze-fix-test cycle.

Finally, Nautilus tracks the changes made during the fix phase as well as the changes they cause throughout the transformation and the data. Based on this information, Nautilus can support developers in the test phase, for instance by pointing out unexpected and possibly undesired changes.
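One simple way to picture this kind of test-phase support is to compare the result of a query before and after a fix. The following Python sketch does so with the standard sqlite3 module. It only illustrates the idea and does not describe how Nautilus tracks changes: the employee table, the two queries, and the helpers result_set, diff_results, and demo are hypothetical.

```python
import sqlite3


def result_set(conn, sql):
    """Evaluate a query and return its rows as a set of tuples."""
    return set(conn.execute(sql).fetchall())


def diff_results(conn, old_sql, new_sql, intended=frozenset()):
    """Compare the results of two query versions.

    `intended` holds the row changes the developer wanted to cause;
    every other added or removed row is reported as unexpected.
    """
    old, new = result_set(conn, old_sql), result_set(conn, new_sql)
    added, removed = new - old, old - new
    return {
        "added": added,
        "removed": removed,
        "unexpected": (added | removed) - set(intended),
    }


def demo():
    """Hypothetical fix: relax a salary filter so that Bob appears in the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (name TEXT, salary INTEGER);
        INSERT INTO employee VALUES ('Ann', 5200), ('Bob', 4100), ('Dan', 4000);
        """
    )
    old_sql = "SELECT name FROM employee WHERE salary > 5000"
    new_sql = "SELECT name FROM employee WHERE salary >= 4000"
    return diff_results(conn, old_sql, new_sql, intended={("Bob",)})


if __name__ == "__main__":
    print(demo())
```

Here the relaxed filter adds Bob as intended, but it also adds Dan, which the sketch flags as an unexpected change that the developer may want to review.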


Publications

  1. Artemis: A System for Analyzing Missing Answers (also available: poster).
    Melanie Herschel, Mauricio A. Hernandez, Wang Chiew Tan.
    Proceedings of the VLDB Endowment, Volume 2, August 2009.
    This work was done while Melanie Herschel was a post-doc researcher at the IBM Almaden Research Center.