Fundamentals of Physical Design and Query Compilation
Dr. David Toman (University of Waterloo, Canada)
David Toman is an Associate Professor in the David R. Cheriton School of Computer Science at the University of Waterloo. His research focuses on database theory and systems, in particular on query processing under constraints and on query compilation; on temporal aspects of data management; and, more generally, on logic in computer science. He has published extensively in his research area, including invited contributions to several reference collections, such as the Encyclopedia of Database Systems and the Handbook of Temporal Reasoning in Artificial Intelligence. He earned Bachelor's and Master's degrees from Masaryk University in Czechoslovakia in 1992 and a PhD from Kansas State University in 1996, all in Computer Science. He has been awarded numerous research grants and honours, including the NATO-NSERC Postdoctoral Fellowship and the Ontario Premier's Research Excellence Award.
National Institute of Informatics, 20F, 2009 Meeting Room (Lecture 6 is held in the 2005 Lecture Room)
Current approaches to physical design closely follow conceptual design, e.g., creating base files for all tables and then adding indices. New applications and performance requirements have led to the introduction of additional physical structures, e.g., materialized views, but query optimization technology has fallen behind and typically uses only ad-hoc techniques for incorporating materialized views into query plans. The lecture will survey current practices, identify their weaknesses, and outline possible solutions. It will also introduce the unifying theme for the remaining lectures: the development of a uniform and integrated approach to physical design that is decoupled from conceptual schemes, and to query compilation and optimization in this setting.
How do we describe actual physical designs and how do we link them to a conceptual view of the data? The lecture will review data models and integrity constraints with the help of Description Logic and show how such a development relates to classical database constraints such as functional and inclusion dependencies. Furthermore, it will discuss additional annotations, such as binding patterns, and their use in describing physical designs, possibly down to the level of (sets of) main-memory records connected by pointers. The theoretical underpinnings will be accompanied by examples of fine-grained descriptions of physical designs, in which traditionally monolithic data structures (such as B+ trees) are described via constraints. The lecture will also consider the trade-off between reasoning complexity (decidability) and expressive power in schema languages: what the right trade-off might be, and how it affects query languages, query evaluation, and query "safety".
The lecture will study chase-based approaches to query rewriting and their limitations (e.g., the inability to rewrite conjunctive queries over conjunctive views), the impact of binding patterns on accessing indices, and the integration with (simple) cost models. Other issues discussed in this lecture relate to handling duplicates and the order of data, and to approaches for accommodating these crucial features in query plans.
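As a rough illustration of the chase step that such rewriting approaches build on, the following minimal sketch chases the body of a conjunctive query with functional dependencies only. It is not the lecture's own algorithm: the representation (atoms as relation/term tuples, dependencies as position lists), the function names, and the Emp/Dept example are all assumptions made for illustration, and every term is assumed to be a variable.

```python
# Minimal FD-only chase over the body of a conjunctive query.
# Assumptions (hypothetical, for illustration only): an atom is
# (relation_name, tuple_of_variable_names); a functional dependency is
# (relation_name, determining_positions, determined_position).

def substitute(atoms, old, new):
    """Replace every occurrence of variable `old` by `new` in all atoms."""
    return {(rel, tuple(new if t == old else t for t in terms))
            for rel, terms in atoms}

def find_violation(atoms, fds):
    """Return two variables that some dependency forces to be equal, or None."""
    for rel, lhs, rhs in fds:
        matching = [terms for r, terms in atoms if r == rel]
        for first in matching:
            for second in matching:
                agree = all(first[i] == second[i] for i in lhs)
                if agree and first[rhs] != second[rhs]:
                    return first[rhs], second[rhs]
    return None

def chase_with_fds(atoms, fds):
    """Equate variables until no functional dependency is violated."""
    atoms = set(atoms)
    while True:
        violation = find_violation(atoms, fds)
        if violation is None:
            return atoms
        # Keep the lexicographically smaller name so the result is deterministic.
        keep, drop = sorted(violation)
        atoms = substitute(atoms, drop, keep)

# Query body: Emp(e, d1), Emp(e, d2), Dept(d1, m)
query_body = [("Emp", ("e", "d1")), ("Emp", ("e", "d2")), ("Dept", ("d1", "m"))]
# Dependency: in Emp, position 0 determines position 1.
dependencies = [("Emp", [0], 1)]

chased_body = chase_with_fds(query_body, dependencies)
```

Here the dependency forces d1 and d2 to be equal, so the two Emp atoms collapse into one. Each step removes one variable, which is why this restricted chase always terminates; chasing with more general constraints need not.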
The lecture will first discuss the shortcomings of existing ad-hoc approaches to rewriting complex queries, such as those based on the query graph model (QGM). It will then introduce a novel technique based on applying Craig's Interpolation Theorem to the query rewriting problem; in particular, it will show how to extract rewritings from refutation proofs. In addition, it will discuss the usual extensions needed for efficient query processing, e.g., binding patterns, duplicates, and ordering.
The last lecture (organized as a seminar) will conclude with an overview of topics for future investigation and research. These topics relate to constraints and queries beyond first-order logic, such as Datalog and inductive data types, and their impact on physical design and query processing; to the issue of updates in decoupled conceptual and physical designs; and to the impact of transactions.
The presentation will consider an alternative to query rewriting (equivalent under constraints): the computation of certain answers. It will introduce the approach and discuss its computational cost as a function of the expressive power of the query and constraint languages used. It will show that such an approach is computationally feasible only for relatively weak languages. It will then discuss the possibility of generating certain answers based on first-order rewritability in such a setting, i.e., for conjunctive queries over ontologies formulated in families of suitably restricted description logics, such as ELH and DL-Lite.
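To make the idea of computing certain answers through first-order rewriting more concrete, here is a minimal sketch for a deliberately tiny fragment: a single atomic query, and inclusions whose left-hand side is either a concept name or the domain of a role. This covers only a small part of what DL-Lite allows and is not the procedure presented in the lecture; the data layout, the function names, and the Teacher/Professor example are assumptions made for illustration.

```python
from collections import deque

# Assumptions (hypothetical, for illustration only): a basic concept is
# ("concept", name) or ("exists", role_name), the latter meaning "has an
# outgoing role_name edge"; an inclusion is a pair (sub, sup).

def rewrite_atomic_query(target, inclusions):
    """Collect every basic concept whose instances must be instances of target."""
    rewriting = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for sub, sup in inclusions:
            if sup == current and sub not in rewriting:
                rewriting.add(sub)
                queue.append(sub)
    return rewriting

def evaluate(rewriting, concept_facts, role_facts):
    """Evaluate the rewritten query (a union of atoms) over the stored data only."""
    answers = set()
    for kind, name in rewriting:
        if kind == "concept":
            answers |= concept_facts.get(name, set())
        elif kind == "exists":
            answers |= {subject for subject, _ in role_facts.get(name, set())}
    return answers

# Ontology: Professor is included in Teacher; anyone who teaches is a Teacher.
inclusions = [
    (("concept", "Professor"), ("concept", "Teacher")),
    (("exists", "teaches"), ("concept", "Teacher")),
]
# Explicitly stored facts.
concept_facts = {"Professor": {"ann"}, "Teacher": {"bob"}}
role_facts = {"teaches": {("carl", "db101")}}

rewriting = rewrite_atomic_query(("concept", "Teacher"), inclusions)
certain_answers = evaluate(rewriting, concept_facts, role_facts)
```

The ontology is consulted only while rewriting the query; the rewritten query is then evaluated over the stored facts alone, which is the point of first-order rewritability.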