Query optimization through the looking glass and some lessons from building an LLVM-based query compiler
In this talk I will present material from my academic paper "How Good Are Query Optimizers, Really?", which was published at PVLDB 2015. This tries to shed light on the query optimization process.
In addition, I will talk about some experiences building a query engine that compiles SQL queries to machine code.
In this talk I will present material from my academic paper "How Good Are Query Optimizers, Really?", which was published at PVLDB 2015.
Finding a good join order is crucial for query performance. Most systems use a cost-based approach: The query optimizer enumerates some subset of the valid join orders, for example using dynamic programming. Using cardinality estimates as its principal input, the
cost model then chooses the cheapest alternative from semantically equivalent plan alternatives. We investigate the cardinality estimates
of PostgreSQL and a number of commercial systems and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily
on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumeration techniques comparing exhaustive dynamic
programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the sub-optimal cardinality
Compiling queries to machine code has become the method of choice for executing queries in modern database systems. Compilation is used by a large and growing number of commercial systems (e.g., Microsoft Hekaton, MemSQL, Spark, and Cloudera Impala) as well as research projects (e.g., HyPer, HIQUE, Tupleware, LegoBase, Peloton). The main advantage of compilation is, of course, efficiency. By generating code for a given query, compilation avoids the interpretation overhead of traditional execution engines and thereby achieves much higher performance. In this talk, I will present our experience of the main-memory database system HyPer, which has been one of the pioneering systems compiling queries to machine code via LLVM since 2011.