4. ROW-ORIENTED EXECUTION
In this section, we discuss several different techniques that can
be used to implement a column-database design in a commercial
row-oriented DBMS (hereafter, System X). We look at three differ-
ent classes of physical design: a fully vertically partitioned design,
an “index only” design, and a materialized view design. In our
evaluation, we also compare against a “standard” row-store design
with one physical table per relation.
Vertical Partitioning: The most straightforward way to emulate
a column-store approach in a row-store is to fully vertically parti-
tion each relation [16]. In a fully vertically partitioned approach,
some mechanism is needed to connect fields from the same row
together (column stores typically match up records implicitly by
storing columns in the same order, but such optimizations are not
available in a row store). To accomplish this, the simplest approach
is to add an integer “position” column to every table – this is of-
ten preferable to using the primary key because primary keys can
be large and are sometimes composite (as in the case of the line-
order table in SSBM). This approach creates one physical table for
each column in the logical schema, where the ith table has two
columns, one with values from column i of the logical schema and
one with the corresponding value in the position column. Queries
are then rewritten to perform joins on the position attribute when
fetching multiple columns from the same relation. In our implementation, System X by default chose hash joins for these position joins, which proved to be expensive. For that reason, we experimented with adding clustered indices on the position column of every table and forcing System X to use index joins, but this did not improve performance – the additional I/Os incurred by index accesses made index joins slower than hash joins.
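To make this design concrete, the sketch below shows two of the resulting two-column tables for the SSBM lineorder relation and a rewritten query that fetches two columns; the table names are illustrative, and System X's actual DDL may differ.

    -- One physical table per logical column, each carrying the
    -- added integer "position" column.
    CREATE TABLE lo_quantity_vp (position INTEGER, lo_quantity INTEGER);
    CREATE TABLE lo_discount_vp (position INTEGER, lo_discount INTEGER);

    -- Fetching both columns requires a join on position.
    SELECT q.lo_quantity, d.lo_discount
    FROM lo_quantity_vp q
    JOIN lo_discount_vp d ON q.position = d.position;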
Index-only plans: The vertical partitioning approach has two
problems. First, it requires the position attribute to be stored in ev-
ery column, which wastes space and disk bandwidth. Second, most
row-stores store a relatively large header on every tuple, which
further wastes space (column stores typically – or perhaps even
by definition – store tuple headers in separate columns to avoid these
overheads). To ameliorate these concerns, the second approach we
consider uses index-only plans, where base relations are stored us-
ing a standard, row-oriented design, but an additional unclustered
B+Tree index is added on every column of every table. Index-only
plans – which require special support from the database, but are
implemented by System X – work by building lists of (record-
id,value) pairs that satisfy predicates on each table, and merging
these rid-lists in memory when there are multiple predicates on the
same table. When required fields have no predicates, a list of all
(record-id,value) pairs from the column can be produced. Such
plans never access the actual tuples on disk. Though indices still
explicitly store rids, they do not store duplicate column values, and
they typically have a lower per-tuple overhead than the vertical par-
titioning approach since tuple headers are not stored in the index.
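As a sketch of this design (the index names are ours, the predicates are borrowed from SSBM query flight 1, and System X's actual syntax may differ), an unclustered index is added on each column, and a multi-predicate query can then be answered by merging the rid-lists the indices produce, without touching the heap file:

    -- One unclustered B+Tree index per column (two shown here).
    CREATE INDEX idx_lo_quantity ON lineorder (lo_quantity);
    CREATE INDEX idx_lo_discount ON lineorder (lo_discount);

    -- Each index yields a (record-id, value) list for its predicate;
    -- the two rid-lists are merged in memory.
    SELECT COUNT(*)
    FROM lineorder
    WHERE lo_quantity < 25 AND lo_discount BETWEEN 1 AND 3;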
One problem with the index-only approach is that if a column has no predicate on it, the entire index must be scanned to extract the needed values, which can be slower than scanning a heap file (as would occur in the vertical partitioning approach). Hence, an optimization to the index-only approach
is to create indices with composite keys, where the secondary keys
are from predicate-less columns. For example, consider the query
SELECT AVG(salary) FROM emp WHERE age>40 – if we
have a composite index with an (age,salary) key, then we can an-
swer this query directly from this index. If we have separate indices
on (age) and (salary), an index-only plan will have to find record-ids
corresponding to records with satisfying ages and then merge this
with the complete list of (record-id, salary) pairs extracted from
the (salary) index, which will be much slower. We use this opti-
mization in our implementation by storing the primary key of each
dimension table as a secondary sort attribute on the indices over the
attributes of that dimension table. In this way, we can efficiently ac-
cess the primary key values of the dimension that need to be joined
with the fact table.
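For the emp example above, this optimization amounts to the following minimal sketch (the index name is ours):

    -- Composite key: the predicate column first, the predicate-less
    -- column as the secondary key.
    CREATE INDEX idx_emp_age_salary ON emp (age, salary);

    -- Answerable entirely from the index: no merge with a complete
    -- (record-id, salary) list is needed.
    SELECT AVG(salary) FROM emp WHERE age > 40;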
Materialized Views: The third approach we consider uses mate-
rialized views. In this approach, we create an optimal set of materi-
alized views for every query flight in the workload, where the opti-
mal view for a given flight has only the columns needed to answer
queries in that flight. We do not pre-join columns from different
tables in these views. Our objective with this strategy is to allow
System X to access just the data it needs from disk, avoiding the
overheads of explicitly storing record-id or positions, and storing
tuple headers just once per tuple. Hence, we expect it to perform
better than the other two approaches, although it does require the
query workload to be known in advance, making it practical only
in limited situations.
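For instance, assuming the queries in one SSBM flight touch only four lineorder columns, the optimal view for that flight might look like the sketch below; the view name and column set are illustrative, and materialized-view syntax varies across systems:

    -- One view per query flight, containing only the columns that
    -- flight needs; no pre-joining of columns from different tables.
    CREATE MATERIALIZED VIEW mv_flight1 AS
    SELECT lo_orderdate, lo_quantity, lo_discount, lo_extendedprice
    FROM lineorder;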
5. COLUMN-ORIENTED EXECUTION
Having presented our row-oriented designs, we now review three common optimizations used to improve performance in column-oriented database systems and introduce the invisible join.
5.1 Compression
Compressing data using column-oriented compression algorithms
and keeping data in this compressed format as it is operated upon
has been shown to improve query performance by up to an or-
der of magnitude [4]. Intuitively, data stored in columns is more
compressible than data stored in rows. Compression algorithms
perform better on data with low information entropy (high data
value locality). Take, for example, a database table containing in-
formation about customers (name, phone number, e-mail address,
snail-mail address, etc.). Storing data in columns allows all of the
names to be stored together, all of the phone numbers together,
etc. Certainly phone numbers are more similar to each other than
surrounding text fields like e-mail addresses or names. Further,
if the data is sorted by one of the columns, that column will be
super-compressible (for example, runs of the same value can be
run-length encoded).
Of course, the above observation only immediately affects compression ratio, and disk space is cheap and getting cheaper rapidly (though reducing the number of disks needed also reduces power consumption, a cost factor of increasing importance). However, compression improves performance in addition to reducing disk space: if data is compressed, less time is spent in I/O as data is read from disk into memory (or from memory to CPU). Consequently, some of the “heavier-weight” compression schemes that optimize for compression ratio (such as Lempel-Ziv, Huffman, or arithmetic encoding) might be
less suitable than “lighter-weight” schemes that sacrifice compres-
sion ratio for decompression performance [4, 26]. In fact, com-
pression can improve query performance beyond simply saving on
I/O. If a column-oriented query executor can operate directly on
compressed data, decompression can be avoided completely and
performance can be further improved. For example, for schemes
like run-length encoding – where a sequence of repeated values is
replaced by a count and the value (e.g., 1, 1, 1, 2, 2 → 1 × 3, 2 × 2)
– operating directly on compressed data allows a query executor to perform the same operation on multiple column values at once, further reducing CPU costs.
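To make the CPU saving concrete, consider computing a SUM over the run-length encoded column above. Instead of decompressing and adding five values, the executor can perform one multiply-add per run:

    SUM = Σ (value × run length) = (1 × 3) + (2 × 2) = 7

so the cost scales with the number of runs rather than the number of tuples.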
Prior work [4] concludes that the biggest difference between