a collection of related objects that are treated as one unit [32]. In our case, a NoSQL document is the unit of
work, and each sub-document, key-value array, or key-value pair is a related object. The aggregate notion gives
the NoSQL database the ability to express complex structures that are not possible (or very difficult) to
implement in the relational model. The relational model is confined to storing data inside a limited data
structure otherwise known as a tuple [32]. Although the tuple is capable of storing a set of values, it is unable to
store another tuple. In other words, nesting of records is not possible. Therefore, a tuple can represent one and
only one data record. This is the complete opposite of the NoSQL document object. The document object is a
flexible structure which allows for nesting of records and storing of non-uniform data [32]. This leads us to our
first question, “How do we model the document object in a data model?”
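To make the contrast concrete, the following minimal sketch places a flat, relational-style record next to a nested document. The customer data, the field names, and the helper `nesting_depth` are hypothetical and used only for illustration; the Python tuple merely stands in for a relational row of atomic values.

```python
# Hypothetical customer record, used only for illustration.

# Relational-style row: a flat, fixed-length sequence of atomic values.
customer_row = ("C001", "Ada Lovelace", "London")

# Document-style aggregate: one unit containing related, nested objects.
customer_doc = {
    "_id": "C001",
    "name": "Ada Lovelace",
    "address": {"city": "London", "country": "UK"},  # sub-document
    "phones": [                                       # array of key-value pairs
        {"type": "home", "number": "555-0100"},
        {"type": "work", "number": "555-0101"},
    ],
}


def nesting_depth(value):
    """Return how many levels of containers are nested inside value."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, (list, tuple)):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # atomic value
```

Under these assumptions the flat row has a depth of one, while the document reaches a depth of three (document, array, key-value pair), which is the structure a single relational tuple cannot express.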
There are several parts of the document object which will need to be defined, such as data type
assignments, primary and alternate key identification, and foreign key identification. All are areas of the data
modeling process that are lacking for NoSQL. We could easily create an IDEF1x entity and use it to represent
the document container itself, but that is where the process would end. There must be a way to represent the
key-value arrays, key-value pairs, and sub-documents of the document object. There must also be a way to represent
the aggregate relationship of those components to the document itself. We previously stated that the document
object has the capability to store non-uniform data. In the context of this paper, non-uniform data refers to
data where each record contains a different field set [32]. In the relational model, this can be accomplished
through nullable columns, which can lead to sparse tables. Alternatively, the modeler could use generically named
columns (e.g. Column1, Column2). The NoSQL document allows the application (or input source) to store whatever fields
it wants to store (without adhering to a defined schema). This behavior is not allowed in the relational model
due to the defined schema. The liberty granted by the “schemaless” aspect is not without consequence. The
responsibility of schema awareness is transferred from the database system to the application layer. This shift
in responsibility leads to the creation of an implicit schema [32] in the application code. The schema is implicit
because the code behavior (selects, inserts, updates, and deletes) could be used to derive a possible database
schema. This is not a fool-proof method of schema derivation because there could be other applications that
use the database. In this case, the information stored by one application could be different from the
information stored by a previous application. This leads us to another issue that will need to be addressed,
“How do we create an explicit schema for a NoSQL database?” In addition to explicit schema declaration, there
must be a way to transform the physical model into a physical implementation. Current tools (e.g. Erwin) do
not support this transformation for NoSQL.
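The notions of non-uniform data and implicit schema can be illustrated with a short sketch. It is not the method proposed in this paper; it only shows how a candidate schema might be inferred from stored documents, and why that inference is fragile. The sample documents and the function `derive_schema` are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents in one collection, written by different applications.
documents = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "rank": "Rear Admiral"},
    {"_id": 3, "name": "Alan", "email": None, "papers": ["On Computable Numbers"]},
]


def derive_schema(docs):
    """Collect, per field, the observed type names and whether every document has it."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for doc in docs:
        for field_name, value in doc.items():
            types[field_name].add(type(value).__name__)
            counts[field_name] += 1
    return {
        field_name: {
            "types": sorted(types[field_name]),
            "required": counts[field_name] == len(docs),
        }
        for field_name in types
    }


schema = derive_schema(documents)
```

In this sketch only `_id` and `name` appear in every document; `email`, `rank`, and `papers` are optional, and `email` is observed with two different types. A schema derived this way reflects only the documents examined so far, so a later application that writes a new field set would invalidate it, which is why an explicit schema declaration is needed.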
We are aware that what we are requiring pushes the boundaries of current data modeling. The
concepts that NoSQL brings are in some cases completely new to the data modeler. However, the new concepts
are not foreign to the application developer. For example, a document could be represented as a class in an
object-oriented programming language (e.g. C#). The key-value pair array could be represented as an array of
dictionary key-value pairs. A sub-document could be represented as a class property of type document. As you
can see, the NoSQL concepts seem to map to object-oriented programming languages fairly well. We want to
combine concepts like the aforementioned (from our software engineering experience) with relational data
modeling to produce an innovative, thorough data modeling solution. It is our belief that the emergence of
NoSQL as a viable data storage option warrants the thorough data modeling solution we are proposing.
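The mapping described above can be sketched as follows. The text names C# as an example language; Python is used here only for brevity, and the class names `Customer` and `Address` and their fields are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Address:
    """Sub-document: a class in its own right."""
    city: str
    country: str


@dataclass
class Customer:
    """Document: the aggregate that is treated as one unit."""
    id: str
    name: str
    address: Address  # class property whose type is another document
    # Key-value pair array: a list of dictionaries.
    phones: List[Dict[str, Any]] = field(default_factory=list)


customer = Customer(
    id="C001",
    name="Ada Lovelace",
    address=Address(city="London", country="UK"),
    phones=[{"type": "home", "number": "555-0100"}],
)

# asdict() converts nested dataclasses recursively, yielding a nested
# dictionary that has the same shape as the stored document.
document = asdict(customer)
```

Here the class definitions play the role of an explicit schema held in application code: the field names and types are declared once, and the nested dictionary produced by `asdict` mirrors the document that would be stored.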