Aggregation In MongoDB

Introduction

 
Welcome back to the MongoDB article series - Part 7. In the previous article, we discussed the advanced index concept in MongoDB. We have also discussed capped collections, TTL index, Full-Text Index, etc. Now, in this article, we will discuss how to implement the aggregation framework concepts in a MongoDB Database. If you want to read the previous articles of this series, then follow the links.
In the previous article, we have discussed the index in MongoDB. Now, in this article, we will discuss the aggregation framework in MongoDB, and also, we will see how we can use this in our data searching query. So, Data Aggregation is one kind of process where all types of information are collected and then represented in a summary format. The main purpose of the data aggregation is doing the analysis work like statistical analysis, comparison analysis, computations, etc.
 
Aggregation Frameworks
 
In MongoDB, the aggregation framework is mainly used for displaying results for the statistical and analytical analysis of data. This framework always provides us with the transformation of a document within a collection. Basically, the aggregation framework builds a pipeline that processes the data through several blocks of aggregation framework like – filtering, projecting, grouping, sorting, limiting, and skipping. This operation always processes the data from the documents and returns the projected results. This framework groups all the data from the different multiple documents together and then performs a variety of operations on this grouped data to return a single result.
 
The aggregate framework always returns an array as a result of the documents. This makes the aggregation framework as an alternative knowledge flow pipeline. It actually acts as a process flow. This framework first performs the transformation of documents and then combines the documents within a collection and returns the output. Every operator in an aggregation framework receives a stream of documents, then performs some kind of transformation on these documents, and then passes on the results of the transformation to the users. If it's the last pipeline operator, then the result returns to the user directly. Otherwise, the result is passed to the next transformation process as an input. Operators may be combined in any order and recur again and again as necessary. For instance, you'll perform "$match", "$group", and then again "$match" once more with totally different criteria.
 
The MongoDB aggregation pipeline always consists of stages. Every stage transforms the documents as they submit them to the pipeline. Pipeline stages don't get to turn out one output document for each input document, e.g. some stages might generate new documents or filtered documents as a result. Pipeline stages will execute multiple times within the pipeline. MongoDB provides the db.collection.aggregate() methodology within the data collections and therefore the combination command for aggregation pipeline.
 
$Project
 
Projection is one of the additional powerful syntaxes within the aggregation pipeline. It is mainly used in the aggregation pipeline rather than using it as a “normal” command syntax. "$project" permits you to extract fields from subdocuments, rename fields, and perform attention-grabbing operations on them. The simplest operation "$project" will perform is solely choosing fields from your incoming documents to incorporate or exclude a field, use the identical syntax you have used in the second argument of a command. In the below example, the syntax will return only those documents as a result that contains the field "author" within the collection.
 
By default, "_id" is often coming if it exists within the incoming document (some pipeline operators take away the "_id" or it is often removed in an exceedingly former projection). You can exclude it as shown in the below example. Inclusion and exclusion rules generally work identically as same as “normal” queries. You can also additionally rename the projected field. For instance, if you needed to come back the "_id" of each user as you can do,
  1. db.articles.aggregate({"$project" : {"author" : 1, "_id" : 0}})  
In the above syntax, “author” is used to represent the fieldname value within the aggregation framework. For example, "$age" would get replaced with the contents of the age field (and most likely be variety, not a string), and "$tags.3" would get replaced with the fourth component of the tags array. Thus, "$_id" is replaced by the "_id" field of every document coming back through the pipeline. Note that we just need to specifically exclude "_id" field so that it will not return as part of the result to the user. We'll be able to use this syntax to make multiple copies of a field for later use in an exceedingly "$group", say.
  1. db.users.aggregate({"$project" : {"userId" : "$_id""_id" : 0}})  
The simplest "$project" expressions are unit inclusion, exclusion, and rename field names ("$fieldname"). However, there are also several options available under $project syntax including the different types of expressions. There are several expressions available with aggregation which you can combine and nest to any depth to create more complex expressions.
  1. Mathematical Expression 
     
    Arithmetic expressions allow you to manipulate numeric values. You typically use these expressions by specifying an associate array of numbers to control. For example, the subsequent expression will calculate the total value of the "salary" and "bonus" fields. These are the syntax of each expression.
     
    • $add (expr1[, expr2, ...]) - Accepts one or more expressions and adds them together
    • $subtract (expr1, expr2) - Accepts two expressions and subtracts the second from the first.
    • $multiply (expr1[, expr2, ...]) - Takes one or more expressions and multiplies them together.
    • $divide (expr1, expr2) - Takes two expressions and divides the first by the second.
    • $mod (expr1, expr2) - Takes two expressions and returns the remainder of dividing the first by the second. 
       
  2. Date Expression
     
    Sometimes, we need to performs aggregations command on the time-based unit value like - What was happening last week? Or Last month? Or Over the last year? Therefore, aggregation features a set of expressions that may be used to extract date info in some additional helpful ways like "$year", "$month", "$week", "$dayOfMonth", "$dayOfWeek", "$dayOfYear", "$hour", "$minute", and "$second". So we can solely use date operations on fields that basically contain data type value. 
     
  3. String Expression
     
    There are a couple of basic string operations which we need to perform some times. Their signatures are as follows,
    • $substr (expr, startOffset, numToReturn)
       
      This returns a substring of the primary argument, beginning at the startOffset-th computer memory unit and together with consequent numToReturn bytes (note that this is often measured in bytes, not characters, therefore multibytes encodings can watch out of this). expr must be valuable to a string.
    • $concat (expr1[, expr2, ..., exprN])
       
      Concatenates each string expression (or string) given.
    • $toLower (expr)
       
      Returns the string in lower case. expr must evaluate to a string.
    • $toUpper (expr)
       
      Returns the string in upper case. expr must evaluate to a string
Sample Example of $add
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "totalPay": {  
  4.             "$add": ["$salary""$allowances"]  
  5.         }  
  6.     }  
  7. })  
Sample Example of $subtract
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "totalPay": {  
  4.             "$subtract": [{  
  5.                 "$add": ["$salary""$bonus"]  
  6.             }, "$401k"]  
  7.         }  
  8.     }  
  9. })  
Sample Example of Date expression
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "hiredIn": {  
  4.             "$month""$joinDate"  
  5.         }  
  6.     }  
  7. })  
Sample Example of $substr
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "email": {  
  4.             "$concat": [{  
  5.                 "$substr": ["$firstName", 0, 1]  
  6.             }, ".""$lastName""@example.com"]  
  7.         }  
  8.     }  
  9. }) 
$Match
 
$match syntax basically filters the documents so that we'll be able to run associate aggregation commands on a set of documents. For example, if you simply need to search out the documents concerning users of a specific State, you may add a "$match" expression like,
  1. {$match : {"state" : "ORL"}}  
"$match" will use all of the usual question operators ("$gt", "$lt", "$in", etc.). One notable exception is that we cannot use geospatial operators during a "$match". Generally, smart observe is to place "$match" expressions as early as potential within the pipeline. This has two benefits,
  1. It permits you to separate out needless documents quickly to provide faster performance in the pipeline.
  2. It will use indexes if it's run before any projections or groupings.
$group
 
Grouping is the process mechanism that basically gives us the opportunity to group or bundle documents on the basis of some fields and also can combine the values of those documents. Some samples of groupings in the database level are -
  • If we have a tendency to have per-minute measurements and we wish to search out the common humidity per day, we might group by the "day" field.
  • If we have a tendency to have a set of scholars and we wish to prepare students into teams based on grade, we have a tendency to group by their "grade" field.
  • If we have a tendency to have a set of users and we wish to check what percentage of users we have by town, we may cluster by each of the "state" and "city" fields, making one cluster per city/ state pair. 
When we opt for a field or field to group by, then we need to pass it to the "$group" syntax which performs as the group’s "_id" field. Thus, for the examples higher than, we have,
  1. {"$group" : {"_id" : "$day"}}  
  2. {"$group" : {"_id" : {"state" : "$state""city" : "$city"}}}  
In MongoDB, $group aggregation function contains some group operator which will actually perform the basic aggregation operations. There are two types of operators in the $group – one is Arithmetic Operator and another is Extreme Operator. The Arithmetic operator contains $sum and $avg operators. Whereas the Extreme operators contains $max, $min, $first, $last operators. 
  1. $sum
     
    This operator adds value for each document and returns the result. Also, although the example above used a literal (1), this can also take more complex values and returns the result.
     
  2. $avg
     
    Returns the average value of all input documents during the group.
     
  3. $max
     
    Returns the maximum values within the documents.
     
  4. $min
     
    Returns the minimum values within the documents
     
  5. $first
     
    Returns the first value within the group. 
     
  6. $last
     
    Returns the last value within the group.
Sample Example of $sum
  1. db.sales.aggregate({  
  2.     "$group": {  
  3.         "_id""$country",  
  4.         "totalRevenue": {  
  5.             "$sum""$revenue"  
  6.         }  
  7.     }  
  8. })  
Sample Example of $avg
  1. db.sales.aggregate({  
  2.     "$group": {  
  3.         "_id""$country",  
  4.         "totalRevenue": {  
  5.             "$average""$revenue"  
  6.         },  
  7.         "numSales": {  
  8.             "$sum": 1  
  9.         }  
  10.     }  
  11. })  
Sample Example of $last
  1. db.scores.aggregate({  
  2.     "$group": {  
  3.         "_id""$grade",  
  4.         "lowestScore": {  
  5.             "$min""$score"  
  6.         },  
  7.         "highestScore": {  
  8.             "$max""$score"  
  9.         }  
  10.     }  
  11. }) 
$sort
 
We can perform any type of sort operation either on a particular field or on multiple fields of a document by using the $sort syntax. If we need to sort a large volume of documents, then it is always highly recommended that first we will sort the data in the required order at the beginning of the pipeline and apply an index on it. Otherwise, it will take much more time and in that way, the process response will be slow or delayed. $sort is another example of roadblock operations just like a $group. Since, in this process, it needs to arrange all the records accordingly to sort the records in a specified process, then it needs to sort all the records, and then only will it send the individual shards to the mongo results for further processing.
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "compensation": {  
  4.             "$add": ["$salary""$bonus"]  
  5.         },  
  6.         "name": 1  
  7.     }  
  8. }, {  
  9.     "$sort": {  
  10.         "compensation": -1,  
  11.         "name": 1  
  12.     }  
  13. })  

Conclusion

 
In this article, we discussed some aggregation framework syntax like a $match, $group, $sort, $project etc. These syntaxes are basically used for performing many types of mathematical or analysis operations. In the next article, we will discuss some more syntaxes of the aggregation framework in MongoDB. I hope this article will help you. Any feedback or queries related to this article are most welcome.


Similar Articles