Java - MongoDB Multi-Document ACID Transactions
Introduced in June 2018 with MongoDB 4.0, multi-document ACID transactions are now supported.
But wait... Does that mean MongoDB did not support transactions before that?
No, MongoDB has consistently supported transactions, initially in the form of single-document transactions.
MongoDB 4.0 extends these transactional guarantees across multiple documents, multiple statements, multiple collections,
and multiple databases. What good would a database be without any form of transactional data integrity guarantee?
Before delving into the details, you can access the code and experiment with multi-document ACID
transactions.
Updates:
- Update to Java 21
- Update Java Driver to 5.0.0
- Update logback-classic to 1.2.13
Requirements:
- Java 21
- Maven 3.8.7
- Docker (optional)
Or you can start an ephemeral single-node replica set using Docker for quick testing:
This demo contains two main programs: ChangeStreams.java and Transactions.java.
- The ChangeStreams class enables you to receive notifications of any data changes within the two collections used in this tutorial.
- The Transactions class is the demo itself.
You need two shells to run them.
First shell:
Second shell:
Note: Always execute the ChangeStreams program first because it creates the product collection with the
required JSON Schema.
Let’s compare our existing single-document transactions with MongoDB 4.0’s ACID-compliant multi-document transactions
and see how we can leverage this new feature with Java.
Even in MongoDB 3.6 and earlier, every write operation is represented as a transaction scoped to the level of an
individual document in the storage layer. Because the document model brings together related data that would otherwise
be modeled across separate parent-child tables in a tabular schema, MongoDB’s atomic single-document operations provide
transaction semantics that meet the data integrity needs of the majority of applications.
Every typical write operation modifying multiple documents actually happens in several independent transactions: one for
each document.
Let’s take an example with a very simple stock management application.
First of all, I need a MongoDB replica set, so please follow the
instructions given above to start MongoDB.
Now, let’s insert the following documents into a product collection:
Let’s imagine there is a sale on, and we want to offer our customers a 20% discount on all our products.
But before applying this discount, we want to monitor when these operations are happening in MongoDB with Change
Streams.
Keep this shell on the side, open another MongoDB shell, and apply the discount:
As you can see, both documents were updated with a single command but not in a single transaction.
Here is what we can see in the change stream shell:
As you can see, the cluster times (see the clusterTime key) of the two operations are different: The operations
occurred during the same second, but the counter of the timestamp has been incremented by one.
Thus, here each document is updated one at a time, and even if this happens really fast, someone else could read the
documents while the update is running and see only one of the two products with the discount.
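To make that intermediate state concrete, here is a minimal plain-Java sketch that involves no MongoDB driver at all. The product names and prices are made up for illustration (they are not the documents used in this tutorial), and the PerDocumentDiscount class is hypothetical. It applies the 20% discount to one entry at a time and records what a concurrent reader could observe after each step.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration; this is not MongoDB code.
final class PerDocumentDiscount {
    // A 20% discount means multiplying each price by 0.8.
    static final BigDecimal FACTOR = new BigDecimal("0.8");

    /**
     * Discounts the prices one entry at a time and returns a copy of the
     * whole price map after each individual update, i.e. every state a
     * reader could see while the multi-document update is in progress.
     */
    static List<Map<String, BigDecimal>> applyOneByOne(Map<String, BigDecimal> prices) {
        List<Map<String, BigDecimal>> observed = new ArrayList<>();
        // Iterate over a copy of the keys so the map can be updated safely.
        for (String id : new ArrayList<>(prices.keySet())) {
            prices.put(id, prices.get(id).multiply(FACTOR));
            observed.add(new LinkedHashMap<>(prices));
        }
        return observed;
    }

    public static void main(String[] args) {
        // Assumed sample data, chosen only for this sketch.
        Map<String, BigDecimal> prices = new LinkedHashMap<>();
        prices.put("beer", new BigDecimal("4.00"));
        prices.put("wine", new BigDecimal("10.00"));

        // Print each state a reader could have observed.
        applyOneByOne(prices).forEach(System.out::println);
    }
}
```

In the first recorded state, the beer is already discounted while the wine is still at full price, which is exactly the window described above.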
Most of the time, this is something you can tolerate in your MongoDB database because, as much as possible, we try to
embed tightly linked (or related) data in the same document.
Consequently, two updates on the same document occur within a single transaction:
However, sometimes, you cannot model all of your related data in a single document, and there are a lot of valid reasons
for choosing not to embed documents.
Multi-document ACID transactions in MongoDB closely resemble what
you may already be familiar with in traditional relational databases.
MongoDB’s transactions are a conversational set of related operations that must atomically commit or fully roll back with
all-or-nothing execution.
Transactions are used to make sure operations are atomic even across multiple collections or databases. Consequently,
with snapshot isolation reads, another user can only observe either all the operations or none of them.
Let’s now add a shopping cart to our example.
For this example, two collections are required because we are dealing with two different business entities: the stock
management and the shopping cart each client can create during shopping. The lifecycles of each document in these
collections are different.
A document in the product collection represents an item I’m selling. This contains the current price of the product and
the current stock. I created a POJO to represent it: Product.java.
A shopping cart is created when a client adds their first item to the cart and is removed when the client proceeds to
check out or leaves the website. I created a POJO to represent it: Cart.java.
The challenge here resides in the fact that I cannot sell more than I possess: If I have five beers to sell, I cannot have
more than five beers distributed across the different client carts.
To ensure that, I have to make sure that the operation creating or updating the client cart is atomic with the stock
update. That’s where the multi-document transaction comes into play.
The transaction must fail in case someone tries to buy something I do not have in my stock. I will add a constraint
on the product stock:
Note that this is already included in the Java code of the ChangeStreams class.
To monitor our example, we are going to use MongoDB Change Streams that were introduced in MongoDB 3.6.
In ChangeStreams.java, I am going to monitor the database test, which contains our two collections. It'll print each
operation with its associated cluster time.
In this example, we have five beers to sell.
Alice wants to buy two beers, but we are not going to use a multi-document transaction for this. We will
observe in the change streams two operations at two different cluster times:
- One creating the cart
- One updating the stock
Then, Alice adds two more beers to her cart, and we are going to use a transaction this time. The result in the change
stream will be two operations happening at the same cluster time.
Finally, she will try to order two extra beers, but the jsonSchema validator will fail the product update (as there is only
one in stock) and result in a rollback. We will not see anything in the change stream.
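The scenario above can be modeled in a few lines of plain Java to show the all-or-nothing rule in isolation. This is only an in-memory sketch under stated assumptions: it does not use the MongoDB driver, the CartAndStockModel class and its method names are hypothetical, and a simple "stock must not go below zero" check stands in for the jsonSchema validator.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the cart/stock scenario; not MongoDB code.
final class CartAndStockModel {
    private final Map<String, Integer> stock = new HashMap<>();
    private final Map<String, Integer> cart = new HashMap<>();

    CartAndStockModel(String product, int initialStock) {
        stock.put(product, initialStock);
    }

    /**
     * Adds the quantity to the cart and removes it from the stock.
     * Either both changes are applied or neither is.
     */
    boolean addToCart(String product, int quantity) {
        // Work on copies so a failed check leaves the real state untouched.
        Map<String, Integer> newCart = new HashMap<>(cart);
        Map<String, Integer> newStock = new HashMap<>(stock);
        newCart.merge(product, quantity, Integer::sum);
        newStock.merge(product, -quantity, Integer::sum);

        // Stand-in for the validator: the stock may never be negative.
        if (newStock.get(product) < 0) {
            return false; // "rollback": the copies are simply discarded
        }

        // "commit": publish both changes together.
        cart.clear();
        cart.putAll(newCart);
        stock.clear();
        stock.putAll(newStock);
        return true;
    }

    int stockOf(String product) {
        return stock.getOrDefault(product, 0);
    }

    int inCart(String product) {
        return cart.getOrDefault(product, 0);
    }

    public static void main(String[] args) {
        CartAndStockModel model = new CartAndStockModel("beer", 5);
        // Alice adds two beers three times; the third attempt must fail.
        for (int attempt = 1; attempt <= 3; attempt++) {
            boolean ok = model.addToCart("beer", 2);
            System.out.println("attempt " + attempt + ": committed=" + ok
                    + ", stock=" + model.stockOf("beer")
                    + ", cart=" + model.inCart("beer"));
        }
    }
}
```

After the third attempt, the stock is still one and the cart still holds four beers: the failed operation left no partial change behind, which is the behavior the real transaction provides across the two collections.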
Below is the source code for Transactions.java:
Here is the console of the change stream:
As you can see here, we only get five operations because the last two operations were never committed to the database,
and therefore, the change stream has nothing to show.
- The first operation is the product collection initialization (create the product document for the beers).
- The second and third operations are the first two beers Alice adds to her cart without a multi-doc transaction. Notice that the two operations do not happen at the same cluster time.
- The last two operations are the two additional beers Alice adds to her cart with a multi-doc transaction. Notice that this time the two operations are atomic, and they happen at exactly the same cluster time.
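One way to read such a change stream log is to group the events by cluster time: events that share the same seconds and counter values were committed together. The sketch below is plain Java with no MongoDB driver. The Event record, the ClusterTimeGroups class, and the timestamps are all made up for illustration; only the shape of the five operations follows the list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reasoning about cluster times; not MongoDB code.
final class ClusterTimeGroups {
    /** Stand-in for a change event: epoch seconds, per-second counter, label. */
    record Event(long seconds, int counter, String description) {}

    /** Groups event descriptions by "seconds:counter", keeping arrival order. */
    static Map<String, List<String>> groupByClusterTime(List<Event> events) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Event e : events) {
            String key = e.seconds() + ":" + e.counter();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(e.description());
        }
        return groups;
    }

    /** Assumed sample data mirroring the five operations described above. */
    static List<Event> sampleEvents() {
        return List.of(
                new Event(1700000000L, 1, "insert product beer"),
                // Without a transaction: two distinct cluster times.
                new Event(1700000001L, 1, "insert cart for Alice"),
                new Event(1700000001L, 2, "update product beer"),
                // With a transaction: both events share one cluster time.
                new Event(1700000002L, 1, "update cart for Alice"),
                new Event(1700000002L, 1, "update product beer"));
    }

    public static void main(String[] args) {
        groupByClusterTime(sampleEvents())
                .forEach((time, ops) -> System.out.println(time + " -> " + ops));
    }
}
```

With this sample data, only the last group contains two operations, which corresponds to the pair written inside the multi-document transaction.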
Here is the console of the transaction Java process that sums up everything I said earlier.
Thanks for taking the time to read my post. I hope you found it useful and interesting.
As a reminder, all the code is available on the GitHub repository for you to experiment with.
If you're seeking an easy way to begin with MongoDB, you can achieve that in just five clicks using
our MongoDB Atlas cloud database service.