MongoDB: Modifying Documents Using Aggregation Pipelines and Update Expressions

MongoDB Modifying Documents Updating documents on MongoDB prior to version 4.2 was quite limited. It was not possible to set values to a conditional expression, combine fields, or update a field based on the value of another field on the server-side. Tracing a parallel to the SQL update statements, for example, it wasn’t possible to do something like the following:

Update t1 set t1.f1 = t1.f2 where…

It wasn’t possible to use a conditional expression either, something easily achieved with SQL standards:

UPDATE t1 SET t1.f1 = CASE WHEN f2 = 1 THEN 1 WHEN f2 = 2 THEN 5 END WHERE…

If something similar to both examples above was required and the deployment was 3.4+, probably the usage of $addFields would be an alternative way to accomplish it. However, it would not touch the current document because the $out output destination could only be a different collection.

With older versions, the only way around was creating a cursor with aggregation pipelines and iterating it on the client side. Inside the loop, it was possible to update using the proper $set values. It was a difficult and tedious task, which would result in a full javascript code.

With MongoDB 4.2 and onwards, it is possible to use an aggregation pipeline to update MongoDB documents conditionally, allowing the update/creation of a field based on another field. This article presents some very common/basic operations which are easily achieved with SQL databases.

Field Expressions in Update Commands (v4.2+)

Updating a field with the value of some other field:

This is similar to the classic example of an SQL command: update t1 set t1.f1 = t1.f2 + t1.f3

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.update({_id:3},[{$set:{result:{$add: [ "$f2", "$f3" ] } }} ]);
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.find({_id:3});
{ "_id" : 3, "f1" : 30, "f2" : 300, "f3" : 3000, "result" : 3300 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.update({_id:3},[{$set:{result:{$add: [ "$f2", "$f3" ] } }} ]);

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.find({_id:3});

{ "_id" : 3, "f1" : 30, "f2" : 300, "f3" : 3000, "result" : 3300 }

The key point is the “$” on the front of the field names being referenced (“f2” and “f3” in this example). These are the simplest type of field path expression, as they’re called in the MongoDB documentation. You’ve probably seen them in the aggregation pipeline before, but it was only in v4.2 that you could also use them in a normal update command.

Applying “CASE” conditions:

It is quite suitable now to determine conditions for a field value while updating a collection:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({_id:3});
{ "_id" : 3, "grade" : 8 }


replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.update(
  { _id: 3}, 
  [
    { $set: {result : { 
      $switch: {branches: [
        { case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, 
        { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, 
        { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } 
      ] } 
    } } } 
  ] 
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })


replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({ _id: 3});
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({_id:3});

{ "_id" : 3, "grade" : 8 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.update(

{ _id: 3},

[

{ $set: {result : {

$switch: {branches: [

{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" },

{ case: { $lte: [ "$grade", 5 ] }, then: "NOPE" },

{ case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" }

] }

} } }

]

)

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({ _id: 3});

{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Adding new fields for a specific filtered doc:

Let’s say that you want to stamp a document with the updated date = NOW and add a simple comment field:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})
{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z") }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.update( 
  { _id: 3 }, 
  [ 
    { $set: { "comment": "Comment3", mod_date: "$$NOW"} } 
  ] 
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})
{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z"), "comment" : "Comment3", "mod_date" : ISODate("2021-07-05T18:48:44.710Z") }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})

{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z") }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.update(

{ _id: 3 },

[

{ $set: { "comment": "Comment3", mod_date: "$$NOW"} }

]

)

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})

{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z"), "comment" : "Comment3", "mod_date" : ISODate("2021-07-05T18:48:44.710Z") }

Reaching several Docs with the same expression:

It is possible now to either use the command updateMany() and reach multiple docs with the same pipeline.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});
{ "_id" : 1, "grade" : 8}
{ "_id" : 2, "grade" : 5}
{ "_id" : 3, "grade" : 8}

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.updateMany({}, 
  [
    { $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } 
  ] 
)
{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 2 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5, "result" : "NOPE" }
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});

{ "_id" : 1, "grade" : 8}

{ "_id" : 2, "grade" : 5}

{ "_id" : 3, "grade" : 8}

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.updateMany({},

[

{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } }

]

)

{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 2 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});

{ "_id" : 1, "grade" : 8, "result" : "PASSED" }

{ "_id" : 2, "grade" : 5, "result" : "NOPE" }

{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Or use the command option { multi: true } if you want to stick to using the original db.collection.update() command. Note that the default is false, which means that only the first occurrence will be updated.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ] )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5 }
{ "_id" : 3, "grade" : 8 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ] )

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});

{ "_id" : 1, "grade" : 8, "result" : "PASSED" }

{ "_id" : 2, "grade" : 5 }

{ "_id" : 3, "grade" : 8 }

When specifying {multi:true} the expected outcome is finally achieved:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ],{multi:true} )
WriteResult({ "nMatched" : 3, "nUpserted" : 0, "nModified" : 2 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5, "result" : "NOPE" }
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

WriteResult({ "nMatched" : 3, "nUpserted" : 0, "nModified" : 2 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});

{ "_id" : 1, "grade" : 8, "result" : "PASSED" }

{ "_id" : 2, "grade" : 5, "result" : "NOPE" }

{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Update by $merge Stage in the Aggregation Pipeline

Prior to version 4.2, addressing the result of an aggregate pipeline to a new collection was achieved by using $out. Starting on version 4.2 it is possible to use $merge which is way more flexible considering that while using $out, the entire collection will be replaced, and with merge, it is possible to replace a single document and a few or more things. You may want to refer to the comparison table described here:

https://www.mongodb.com/docs/manual/reference/operator/aggregation/merge/#std-label-merge-out-comparison

With MongoDB 4.4 and onwards, it is allowed to update a collection directly on the aggregate pipeline through the $merge stage. The magic happens after determining the output collection with the same name as the one being aggregated. The example below illustrates how to flag the max grade of the student Rafa in math class:

Original document

replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"})
{ "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math" }

1 2	replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"}) { "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math" }

The aggregation pipeline

replset:PRIMARY> db.getSiblingDB("dbtest").students2.aggregate( [{ $match : { "name": "Rafa","class":"math" } }, {$project:{maxGrade:{$max:"$grades"}}}, {$merge : { into: { db: "dbtest", coll: "students2" }, on: "_id",  whenMatched: "merge", whenNotMatched:"discard"} } ]);

1	replset:PRIMARY> db.getSiblingDB("dbtest").students2.aggregate( [{ $match : { "name": "Rafa","class":"math" } }, {$project:{maxGrade:{$max:"$grades"}}}, {$merge : { into: { db: "dbtest", coll: "students2" }, on: "_id", whenMatched: "merge", whenNotMatched:"discard"} } ]);

Checking the result

replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"})
{ "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math", "maxGrade" : 9 }

1 2	replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"}) { "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math", "maxGrade" : 9 }

Note that the maxGrade field was merged into the doc, flagging that the max grade achieved by that student in math was 9.

Watch out: behind the scenes, the merge will trigger an update against the same collection. If that update changes the physical location of the document, the update might revisit the same document multiple times or even get into an infinite loop (Halloween Problem)

The other cool thing is using the $merge stage to work exactly how a SQL command INSERT AS SELECT works (and this is possible with MongoDB 4.2 and onwards). The example below demonstrates how to fill the collection colltest_reporting with the result of an aggregation hit against colltest5.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest5.aggregate( [{ $match : { class: "A" } }, { $group: { _id: "$class",maxGrade: { $max: "$grade" } }},  {$merge : { into: { db: "dbtest", coll: "colltest_reporting" }, on: "_id",  whenMatched: "replace", whenNotMatched: "insert" } } ] );
replset:PRIMARY> db.getSiblingDB("dbtest").colltest_reporting.find()
{ "_id" : "A", "maxGrade" : 8 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest5.aggregate( [{ $match : { class: "A" } }, { $group: { _id: "$class",maxGrade: { $max: "$grade" } }}, {$merge : { into: { db: "dbtest", coll: "colltest_reporting" }, on: "_id", whenMatched: "replace", whenNotMatched: "insert" } } ] );

replset:PRIMARY> db.getSiblingDB("dbtest").colltest_reporting.find()

{ "_id" : "A", "maxGrade" : 8 }

Conclusion

There are plenty of new possibilities which will make a developer’s life easier (especially the life of those developers who are coming from SQL databases) considering that the aggregation framework provides several operators and various different stages to play. Although, it is important to highlight that the complexity of a pipeline may incur performance degradation (that may be a topic for another blog post). For more information on updates with aggregation pipelines, please refer to the official documentation.

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Download Percona Distribution for MongoDB Today!

0 Comments

Inline Feedbacks

View all comments

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MongoDB: Modifying Documents Using Aggregation Pipelines and Update Expressions

Field Expressions in Update Commands (v4.2+)

Updating a field with the value of some other field:

Applying “CASE” conditions:

Adding new fields for a specific filtered doc:

Reaching several Docs with the same expression:

Update by $merge Stage in the Aggregation Pipeline

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

Did MyDumper LIKE Triggers?

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MongoDB: Modifying Documents Using Aggregation Pipelines and Update Expressions

Field Expressions in Update Commands (v4.2+)

Updating a field with the value of some other field:

Applying “CASE” conditions:

Adding new fields for a specific filtered doc:

Reaching several Docs with the same expression:

Update by $merge Stage in the Aggregation Pipeline

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

Did MyDumper LIKE Triggers?

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation