Data Model Design for MongoDB, Release 3.2.3
Modeling application data for MongoDB depends on both the data itself and the characteristics of MongoDB.
For example, different data models may allow applications to use more efficient queries, increase the throughput
of insert and update operations, or distribute activity to a sharded cluster more effectively.
These factors are operational, or address requirements that arise outside of the application, but they impact the
performance of MongoDB-based applications. When developing a data model, analyze all of your application’s read
operations and write operations in conjunction with the following considerations.
3.2.1 Document Growth
Changed in version 3.0.0.
Some updates to documents can increase the size of documents. These updates include pushing elements to an array
(i.e. $push) and adding new fields to a document.
When using the MMAPv1 storage engine, document growth can be a consideration for your data model. For
MMAPv1, if the document size exceeds the allocated space for that document, MongoDB will relocate the document
on disk. With MongoDB 3.0.0, however, the default use of power-of-2 allocation minimizes the occurrences
of such re-allocations and allows for the effective reuse of the freed record space.
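The effect of power-of-2 allocation on document growth can be sketched in a few lines of Python. This is a simplified model, not MMAPv1's implementation: the function names and the 32-byte minimum are assumptions made for illustration, and the sketch ignores record header overhead and any special handling of very large documents.

```python
def power_of_2_allocation(document_size: int, minimum: int = 32) -> int:
    """Round a document size in bytes up to the next power of two.

    Simplified model: `minimum` is an assumed smallest record size.
    """
    size = minimum
    while size < document_size:
        size *= 2
    return size


def would_relocate(old_size: int, new_size: int) -> bool:
    """Return True if growing from old_size to new_size bytes would
    exceed the record space allocated for the original document."""
    return new_size > power_of_2_allocation(old_size)


if __name__ == "__main__":
    print(power_of_2_allocation(100))  # 128
    print(would_relocate(100, 120))    # False: still fits in 128 bytes
    print(would_relocate(100, 130))    # True: exceeds the 128-byte record
```

Under this model, a 100-byte document can grow by up to 28 bytes in place, and only growth past the 128-byte boundary would force a move.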
When using MMAPv1, if your applications require updates that will frequently cause document growth to exceed
the current power-of-2 allocation, you may want to refactor your data model to use references between data in distinct
documents rather than a denormalized data model.
You may also use a pre-allocation strategy to explicitly avoid document growth. Refer to the Pre-Aggregated Reports
Use Case [6] for an example of the pre-allocation approach to handling document growth.
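A pre-allocation strategy might look like the following minimal sketch, which builds a document with every counter field already present so that later updates only overwrite existing values. The document shape and the names `preallocate_daily_stats` and `record_hit` are hypothetical and are not taken from the referenced use case; the in-memory increment stands in for an update the application would send to MongoDB.

```python
def preallocate_daily_stats(page: str, day: str) -> dict:
    """Build a statistics document with all 24 hourly counters set to zero.

    Hypothetical schema: field names are chosen for illustration only.
    """
    return {
        "_id": f"{day}/{page}",
        "page": page,
        "day": day,
        "total": 0,
        "hourly": {str(hour): 0 for hour in range(24)},
    }


def record_hit(doc: dict, hour: int) -> None:
    """Increment existing counters in place; no new fields are added."""
    doc["total"] += 1
    doc["hourly"][str(hour)] += 1


if __name__ == "__main__":
    stats = preallocate_daily_stats("index.html", "2016-02-01")
    fields_before = len(stats["hourly"])
    record_hit(stats, 13)
    # The set of fields is unchanged after the update.
    print(fields_before == len(stats["hourly"]))  # True
```

Because every field exists from the moment the document is inserted, recording a hit changes values but not the document's set of fields.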
See https://docs.mongodb.org/manual/core/mmapv1 for more information on MMAPv1.
3.2.2 Atomicity
In MongoDB, operations are atomic at the document level. No single write operation can change more than one
document. Operations that modify more than a single document in a collection still operate on one document at a time. [7]
Ensure that your application stores all fields with atomic dependency requirements in the same document. If the
application can tolerate non-atomic updates for two pieces of data, you can store these data in separate documents.
A data model that embeds related data in a single document facilitates these kinds of atomic operations. For data
models that store references between related pieces of data, the application must issue separate read and write operations
to retrieve and modify these related pieces of data.
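As a sketch of the embedded approach, the following Python builds the filter and update documents for a single write that changes two dependent fields of one document together. The schema (a book with an `available` count and an embedded `checkout` array) and the function name are assumptions chosen for illustration; `$gt`, `$inc`, and `$push` are standard MongoDB operators.

```python
def build_checkout_update(book_id, member: str, when: str):
    """Return (filter, update) for checking out one copy of a book.

    Hypothetical schema: `available` and `checkout` live in the same
    document, so one write operation modifies both atomically.
    """
    # Match the book only if at least one copy is still available.
    query = {"_id": book_id, "available": {"$gt": 0}}
    update = {
        "$inc": {"available": -1},
        "$push": {"checkout": {"by": member, "date": when}},
    }
    return query, update


if __name__ == "__main__":
    query, update = build_checkout_update(123456789, "joe", "2016-02-01")
    print(query)
    print(update)
```

With a driver such as PyMongo, these two dictionaries could be passed to a single `update_one(query, update)` call on the collection. If the checkout records were stored in a separate collection instead, the application would need a second write, and the two changes would no longer be applied as one atomic operation.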
See Model Data for Atomic Operations (page 30) for an example data model that provides atomic updates for a single
document.
3.2.3 Sharding
MongoDB uses sharding to provide horizontal scaling. Sharded clusters support deployments with large data sets and
high-throughput operations. Sharding allows users to partition a collection within a database to distribute the
collection’s documents across a number of mongod instances or shards.
To distribute data and application traffic in a sharded collection, MongoDB uses the shard key. Selecting the proper
shard key has significant implications for performance, and can enable or prevent query isolation and increased write
capacity. It is important to consider carefully the field or fields to use as the shard key.
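To illustrate why the shard key affects query isolation, here is a toy model of range-based partitioning in Python. It is not how MongoDB routes operations internally: the class name, split points, and shard names are all hypothetical, and the model only handles equality matches on a single-field shard key.

```python
import bisect


class RangeShardedCollection:
    """Toy model: each shard owns one contiguous range of shard key values."""

    def __init__(self, shard_key: str, split_points: list, shards: list):
        # N split points divide the key space into N + 1 ranges.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.shard_key = shard_key
        self.split_points = sorted(split_points)
        self.shards = list(shards)

    def shard_for(self, key_value):
        """Return the shard whose range contains key_value.

        Each range includes its lower bound and excludes its upper bound.
        """
        return self.shards[bisect.bisect_right(self.split_points, key_value)]

    def shards_for_query(self, query: dict) -> list:
        """Return the shards that an equality query must visit."""
        if self.shard_key in query:
            # The query includes the shard key: one shard can answer it.
            return [self.shard_for(query[self.shard_key])]
        # No shard key in the query: every shard must be consulted.
        return list(self.shards)


if __name__ == "__main__":
    users = RangeShardedCollection(
        "user_id", [1000, 2000], ["shard0", "shard1", "shard2"]
    )
    print(users.shards_for_query({"user_id": 1500}))  # ['shard1']
    print(users.shards_for_query({"name": "alice"}))  # all three shards
```

In this model, a query that includes the shard key touches a single shard, while a query on any other field fans out to all of them. Writes behave similarly: keys that spread across ranges distribute inserts over the shards, while keys that cluster in one range concentrate them on one shard.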
[6] https://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports
[7] Document-level atomic operations include all operations within a single MongoDB document record: operations that
affect multiple embedded documents within that single record are still atomic.