The first approach is to create a new SQL-based language that is specifically designed for JSON
data. Rather than operating on tables, this language operates on JSON collections, typically arrays
of JSON objects. The goal of this approach is to find the minimum set of SQL extensions that are
sufficient for JSON queries and transforms. In 2015, Dr. Yannis Papakonstantinou and others at the
University of California, San Diego proposed an SQL-based JSON query language called SQL++ [1, 2].
Versions of SQL++ have been implemented by the AsterixDB project at the University of California,
Irvine [3, 4], and by Couchbase, Inc. [5, 6]. The Couchbase implementation is called N1QL, a name
that suggests “not first normal form” because N1QL is not limited to querying flat (first normal form)
tables. An open-source language called PartiQL, similar to SQL++, is supported by Amazon Web
Services [7]. The SQL++ example queries that you are about to read are written in N1QL and have
run successfully on Couchbase Server and (with minor syntactic variations) on Amazon’s PartiQL
reference interpreter.
The second approach is to extend SQL to allow JSON documents to be stored in columns of tables.
This approach has been adopted by the latest version of the international SQL language standard,
released in 2016 and informally referred to as SQL:2016. In this approach, JSON documents are stored
in the form of strings, and a set of SQL functions is provided for processing the strings with JSON
semantics. The current version of the SQL standard, at several thousand pages, is daunting to read
(or to pick up, for that matter), and you have to pay to get a copy. Fortunately, the JSON-related
features of SQL:2016 have been summarized in a smaller and more readable technical report that is
available for free [8].
In the following sections, we’ll introduce SQL++ and SQL:2016, take a look at some example queries
written in both languages, and conclude with an analysis and comparison of the two languages. Each
example query will be labeled with the language in which it is written. One note on terminology:
various terms have been used for the name-value pairs that are found inside JSON objects. Following
the convention in [8], we’ll use the term "members" for these name-value pairs.
The Languages
In comparing SQL++ and SQL:2016, the first thing to notice is that the two languages are designed
to operate in quite different environments. An SQL++ query operates on JSON inputs and generates
a JSON output. An SQL:2016 query operates on tables as inputs and generates a table as an output.
As far as SQL:2016 is concerned, JSON exists only in a column of a table. We’ll need to take these
differences into account when we construct example queries to compare the two languages.
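To make the table-oriented environment concrete, here is a minimal Python sketch using the standard sqlite3 module. It is an analogy only, not SQL:2016 itself: the table name orders, its columns, the sample data, and the helper function json_member are all hypothetical, and json_member is a user-defined function registered for this sketch rather than one of the standard's JSON functions. What it illustrates is the shape of the approach: the JSON lives as a string in one column, a function interprets that string with JSON semantics, and the query result is still a table of rows.

```python
import json
import sqlite3


def json_member(doc, name):
    """Hypothetical helper: return one top-level member of a JSON object stored as text."""
    value = json.loads(doc).get(name)
    # Scalars are returned as-is; nested values are re-serialized to a JSON string.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


conn = sqlite3.connect(":memory:")
# Register the helper so that SQL statements can call it on a string column.
conn.create_function("json_member", 2, json_member)

# The JSON documents exist only as strings in the "doc" column of a table.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO orders (id, doc) VALUES (?, ?)",
    [
        (1, json.dumps({"customer": "Ann", "total": 30})),
        (2, json.dumps({"customer": "Bob", "total": 12})),
    ],
)

# Table in, table out: each result row is a tuple of column values, not a JSON object.
rows = conn.execute(
    "SELECT id, json_member(doc, 'customer') AS customer "
    "FROM orders WHERE json_member(doc, 'total') > 20 ORDER BY id"
).fetchall()
print(rows)
```

An SQL++ query over the same data would instead take the array of objects directly as its input and return JSON as its output.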
SQL++
SQL++ is based on the observation that a JSON object is very similar to a row of a table. Both consist
of (name, value) pairs (in a row, the name is a column-name). Furthermore, an array of JSON objects
can be similar to a table if the objects have a common structure. Based on these similarities, SQL++
applies the familiar operations of SQL to JSON arrays of objects. The goal is to remain as close to
SQL as possible. The keywords of SQL including SELECT, FROM, WHERE, GROUP BY, HAVING,
ORDER BY, and JOIN are all present in SQL++, and have pretty much the same meaning that they
have in SQL. For example, FROM is used to iterate over the objects in an array, WHERE is used to
choose a subset of these objects based on their values, GROUP BY forms a list of objects into groups,
and SELECT is used to specify the desired output.
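The roles of these clauses can be sketched in a few lines of Python operating on a list of dictionaries, which stands in for a JSON array of objects. This is only an illustration of the semantics described above, not an SQL++ implementation; the collection name orders, its members, and the sample values are hypothetical, and the SQL++ query shown in the comment is an approximate equivalent.

```python
from collections import defaultdict

# Hypothetical input: a JSON array of objects that share a common structure.
orders = [
    {"customer": "Ann", "total": 30},
    {"customer": "Bob", "total": 12},
    {"customer": "Ann", "total": 25},
    {"customer": "Cy", "total": 40},
]

# Roughly corresponds to:
#   SELECT o.customer, COUNT(*) AS order_count, SUM(o.total) AS sum_total
#   FROM orders AS o
#   WHERE o.total > 20
#   GROUP BY o.customer
#   ORDER BY o.customer

# FROM iterates over the objects in the array; WHERE keeps a subset of them.
selected = [o for o in orders if o["total"] > 20]

# GROUP BY forms the selected objects into groups keyed by customer.
groups = defaultdict(list)
for o in selected:
    groups[o["customer"]].append(o)

# SELECT specifies the output: one JSON object per group (sorted() plays ORDER BY).
result = [
    {
        "customer": name,
        "order_count": len(members),
        "sum_total": sum(m["total"] for m in members),
    }
    for name, members in sorted(groups.items())
]
print(result)
```

Note that both the input and the output are arrays of objects, which is the sense in which an SQL++ query takes JSON in and produces JSON out.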