Losing a server without prior warning is no longer a question of if, but when it happens — so how do you ensure MongoDB high availability is not at stake? This article provides a solution that automatically spins up a replacement server when one of the servers in your replica set suffers an unexpected failure. This approach helps reduce the overall time your replica set is exposed to a fault tolerance value of 0.

Who should read it?

The intended readers of this article are operations and DevOps teams with basic knowledge of AWS products, MongoDB enthusiasts, and all the technologists who want to know anything and everything. The technologies used in this article are MongoDB, AWS EC2, Lambda, Route 53, DynamoDB, CloudWatch, VPC, Auto Scaling, Ansible, IAM, and Python.

Why should you read it?

In an unexpected server failure situation, the fault tolerance of a replica set may drop to 0. In other words, the replica set is prone to having "no primary" should one more server fail. This article helps you

- Build a knowledge base on the above situation
- Understand the steps required to restore the fault tolerance value to greater than 0
- Automate the process in AWS

Although the article discusses the solution for Amazon Web Services, you could apply the knowledge to other platforms such as Google Cloud Platform, Pivotal Cloud Foundry, or Kubernetes.

What will you learn?

The article outlines the following topics in detail

- Worst-case scenarios when the fault tolerance of a replica set is 0
- The typical process followed by operations teams to resolve the issue
- Ways to speed up the replacement process via automation
- Pros and cons of the re-spawn approach
- Scope for enhancements to fit your needs

What's the solution?

Considering that this article gets deep into details, I would like to give an executive brief on 'How it's done?'. Unlike a stateless web server, a MongoDB database server is all about state. So, I have used

- CloudWatch and Lambda to detect a dead/deteriorating EC2 instance
- Auto Scaling to re-spawn and replace a dead server
- Ansible and resource tags to associate disk volumes with EC2 instances
- Route 53 and Lambda to resolve/update the DNS/server names
- Automatic replication to let the new replica set member sync up on its own

Hopefully that's a short and sweet summary!

Kudos

I would like to thank Alex Komyagin, Gupta Garuda and Anant Srivastava for their valuable time in reviewing this article.

Cautionary note

The solution discussed in this article will not help you attain higher fault tolerance. It primarily helps reduce the duration your replica set is exposed to a fault tolerance value of 0. I want you to be aware that neither this article nor the solution discussed here is officially backed or supported by MongoDB Inc. So, feel free to use it at your own risk!

Source code

The source code is available at the repo: https://github.com/sarjarapu/whitepapers/tree/master/mongodb/respawn/code

Introduction

Being a consulting engineer in the field, I work with different client-side teams consisting of operations, DBAs, architects and developers. Many of these clients host their database servers on cloud providers such as Amazon Web Services, Google Cloud Platform, etc. Some of the larger clients with a big footprint in Amazon AWS, and/or who have been using AWS for a few years, have come across issues like the ones below

- Corrupt EC2 instances
- Could not SSH to the server
- Amazon retired your instance
- etc.
One of the very interesting and challenging questions that surfaced while working with such clients was: "Losing an EC2 instance without prior warning is no longer a question of if, but rather when it does happen — what actions can we take to ensure MongoDB is highly available?"

Like any responsible operations team, these teams are highly concerned about the worst-case scenarios and want to be proactive by having contingency plans to act upon. With one of the replica set members facing an unexpected server failure, the fault tolerance of the replica set could be at 0. In such a scenario your replica set is in a vulnerable state, as it cannot sustain an additional system failure and remain highly available.

To restore the fault tolerance back to normal, the process typically requires

- Diagnosis of the server issue
- Analysis of whether replacement of the server is warranted
- If replacement of the server is required
  - Re-configuring the MongoDB replica set on the new server
  - Data synchronization
  - Updating the application connection string to use the new server

I would like to highlight that this approach requires manual intervention from the operations team and consumes a lot of time. Apart from the tasks involved in re-spawning a new server, the operations team has to be highly vigilant about overall system health, ensuring no additional failures happen while the fault tolerance of your replica set is at 0. This article offers you a solution which may help you sleep better through the night in spite of server failures. So, let's get started!

High availability

MongoDB offers redundancy and high availability of the database via replication. Whenever a member in the replica set goes down, the fault tolerance of the MongoDB replica set is reduced by 1.

If a majority of the replica set members are inaccessible or unavailable, then you cannot elect a primary member and the replica set will not be able to accept writes. So it is crucial to have a primary member, or to have enough members to elect a primary. Most importantly, reduce the time you are exposed to the vulnerable state of losing the primary with the loss of one additional member.

If you are scared to death of losing yet another member and being left without a primary, then you need a higher fault tolerance than what you currently have. Referring to the number of members vs. fault tolerance matrix, to attain peace of mind with a fault tolerance of 2 you would need a replica set of at least 5 members. However, if you cannot afford a 5-member replica set, then this article may help you take steps to reduce the time period during which the fault tolerance of your replica set is 0.

The solution — Manual version

If a replica set member becomes unresponsive and cannot be brought back up without replacing the entire server, then you would need manual intervention to resolve the issue. As discussed in the 'What's the solution?' summary section, you would need to do the following

- Replace the unresponsive server with a new replacement server
- Reconfigure the replica set with the new replacement server
- Wait for data synchronization to complete
- Update the application connection string with the new server

The image below shows a scenario with a replica set of 3 members where Server 03 is unresponsive. Its disk volume, Disk 3, is STALE when compared to the other two members, as the data is no longer being replicated to Server 03. A quick way to confirm the state of each member before deciding on a replacement is sketched right after the image.

Image: MongoDB Server Replacement
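Before replacing anything, the diagnosis step boils down to asking the replica set which members are healthy and how much fault tolerance is left. Here is a minimal sketch of that check, assuming Python with pymongo and hypothetical hostnames; it is not part of the article's automation, just a convenience for the operations team, and it ignores arbiters and non-voting members for simplicity.

```python
# check_fault_tolerance.py - report member health and remaining fault tolerance.
# Assumes pymongo is installed; the hostnames below are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://server01:27017,server02:27017,server03:27017/"
                     "?replicaSet=rs0&serverSelectionTimeoutMS=5000")

status = client.admin.command("replSetGetStatus")
members = status["members"]
healthy = [m for m in members if m["health"] == 1]

# A primary can only be sustained while a strict majority of members is healthy.
majority = len(members) // 2 + 1
fault_tolerance = max(len(healthy) - majority, 0)

for m in members:
    print(f'{m["name"]:30} {m["stateStr"]:12} health={m["health"]}')
print(f"Healthy members : {len(healthy)}/{len(members)}")
print(f"Fault tolerance : {fault_tolerance}")
```

With a 3-member replica set and one member down, the script reports a fault tolerance of 0, which is exactly the window this article tries to shorten.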
Depending on the data size, a full data synchronization from scratch on a brand-new server takes a lot of time. So first, let's discuss how to make the data synchronization faster.

Faster data synchronization

You could achieve faster syncs by reusing the data volumes

- Un-mount the MongoDB data volume, Disk 3, from Server 03
- Mount the data volume, Disk 3, on the replacement server Server 04

Doing so, you can completely avoid the initial sync and only synchronize the oplog data that was not yet replicated. In other words, if your old/dead server is replaced in 10 minutes, then the replacement server only needs to replicate the oplog entries from those 10 minutes.

A few assumptions made while considering the reuse of existing disk volumes to achieve faster sync times are

- Use of a separate data volume: The MongoDB data files are written onto a separate disk volume from the underlying operating system disk, so that you can un-mount it and re-mount it elsewhere.
- Re-mountable disks: EBS volumes can be attached to the replacement EC2 instance. However, EC2 Instance Store / ephemeral disks are physically attached and cannot be mounted onto another instance.
- Disk or data is not corrupted: If the disk volume or the MongoDB data files are corrupted, then re-mounting the same disk volume on the replacement server leaves you in the same corrupted state. To fix corrupted disks/data you may have to perform a full initial sync on a new disk volume.

The solution — Automated version

The previously mentioned solution works fine, but it still requires manual intervention to do the following

- Creating a replacement server
- Reconfiguring the replica set with the newer member
- Mounting the re-used disk volume and data synchronization

The automation process is the core of this article; it gets thick real quick and is definitely not for the faint of heart. So, I would like to introduce you to the automation process in order of increasing complexity. Before we get into the details, here is a high-level overview of how you could automate the above steps.

High-level overview: When the replacement server, Server 04, comes up, a shell script will mount all the disk volumes tagged to Server 03. AWS CloudWatch will invoke an AWS Lambda function based on the 'running' event. The Lambda function, written in Python, will update the DNS entry on AWS Route 53 so that the FQDN of Server 03 is now mapped onto the IP address of Server 04.

Automate: Mounting of data volumes

In order to automate the process of mounting the disk volumes onto an EC2 instance, you first need to identify/search for the resources. I am using AWS resource tags to search for the EC2 instances and their associated disk volumes. The image below shows a few tags associated with the disk volume '/dev/xvdb' mounted on the EC2 instance skamon_demoapp_rs03.

Image: AWS Resource Tags

If the EC2 instance associated with the tag instance_name: skamon_demoapp_rs03 is deemed unresponsive, then you need to provision a new replacement EC2 instance for it. When the new server boots, an Ansible script fetches the disk volumes using the tags and, if they are not already mounted, mounts them onto the current EC2 instance. You could use any tools at your disposal, including the AWS Command Line Interface, Puppet, Chef, etc., to achieve the same; a sketch of the tag lookup follows.
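The article's repo drives this step with Ansible, so the following is only a minimal Python/boto3 sketch of the same idea; the tag name instance_name, the device name /dev/xvdb, and the region are placeholders taken from the example above, and IMDSv1 instance metadata is used to discover the current instance id.

```python
# attach_tagged_volume.py - find the EBS volume tagged for this replica set
# member and attach it to the instance this script runs on.
import urllib.request
import boto3

# Ask the EC2 instance metadata service who we are (IMDSv1 for brevity).
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Look up the data volume by the same resource tag the article uses.
volumes = ec2.describe_volumes(
    Filters=[
        {"Name": "tag:instance_name", "Values": ["skamon_demoapp_rs03"]},
        {"Name": "status", "Values": ["available"]},
    ]
)["Volumes"]

if volumes:
    volume_id = volumes[0]["VolumeId"]
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device="/dev/xvdb")
    print(f"Attached {volume_id} to {instance_id} as /dev/xvdb")
else:
    print("No available volume found for tag instance_name=skamon_demoapp_rs03")
```

Mounting the filesystem and starting mongod would still happen in the boot-time script, as described above.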
Avoid replica set reconfiguration

None of the members in the replica set can work with Server 04, as that server is not part of the replica set yet. However, reconfiguring a replica set can cause the current primary member to step down, which may trigger an election for a new primary. You can completely avoid the reconfiguration of the replica set by using the Fully Qualified Domain Name (FQDN) of the servers in your replica set configuration rather than the default private DNS name, which is unique for every EC2 instance.

The rest of this section discusses the implementation of

- VPC Setup & Components
- Dynamic DNS

VPC Setup & Components

I have created a VPC with public and private subnets as shown in the diagram below.

Image: AWS VPC Components

The detailed steps for creating the VPC, subnets, etc. are out of scope for the current article. If you are not aware of the concepts in VPC, or are not sure how to isolate database servers from the public internet, then I highly recommend you read the articles 'What is Amazon VPC' and 'VPC with Public and Private Subnets'. I also found the videos below to be extremely useful.

Dynamic DNS

As mentioned earlier, you can completely avoid reconfiguration of the replica set by using the Fully Qualified Domain Name (FQDN) of the servers in your replica set configuration rather than the default private DNS name. In this section, I shall go into the details of setting up Dynamic DNS in AWS using Amazon Route 53 and a slew of other products.

"While the default VPC DNS can provide basic name resolution for your VPC, the Route 53 private hosted zones offer richer functionality by comparison. Route 53 allows customers to modify DNS records via web service calls. Using this API one can automate the creation/removal of record sets and hosted zones." — Jeremy Cowan & Efrain Fuentes

I would like to thank Jeremy Cowan and Efrain Fuentes from the Amazon Web Services team for compiling a very interesting article, 'Building a Dynamic DNS for Route 53 using CloudWatch Events and Lambda'. I highly recommend you read it to learn more about Dynamic DNS.

The diagram below depicts the various events that trigger behind the scenes and shows how the various systems work together when a new replacement server takes over the place of a dead server. Please read through the sequence of events to follow along with the diagram; a sketch of the Lambda function from step 9 follows the list.

Image: Sequence of events while spawning a new server

The sequence of events:

1. Server 03 (ip-10.12.230.160) in the replica set is dead
2. The other members of the replica set see Server 03 as 'Unreachable'
3. The fault tolerance of the replica set drops to 0, with only 2 out of 3 members up and running
4. A new replacement server, Server 04 (ip-10.12.230.200), is provisioned
5. The Ansible script on the new server copies various tags from Disk 03 onto itself (Server 04)
6. The script updates the hostname of the server, then finds and mounts Disk 03 using resource tags
7. The CloudWatch event for 'EC2 instance state: running' on Server 04 is triggered
8. The AWS Lambda function subscribed to the above CloudWatch event runs the Python code
9. The Python code fetches the CNAME / ZONE resource tags on Server 04 and updates the A, PTR, and CNAME records in Route 53 (replacing Server 03 with Server 04)
10. The DNS name, skamon_demoapp_rs03, now resolves to Server 04
11. The replication process starts, keeps all the members synchronized, and the fault tolerance of the replica set is back to 1
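For completeness, here is a minimal sketch of what such a Lambda handler could look like, assuming Python 3 with boto3, a placeholder hosted zone id, and a tag named CNAME holding the DNS name to update; the article's actual implementation (see the source code repo and the AWS Dynamic DNS article referenced above) may differ.

```python
# update_dns.py - Lambda handler for the 'EC2 instance state: running' event.
# Re-points the replica set member's DNS name at the replacement instance.
import boto3

HOSTED_ZONE_ID = "Z0000000000000"     # placeholder; the article derives the zone from a ZONE tag
DOMAIN_SUFFIX = ".demoapp.internal."  # placeholder private zone suffix

ec2 = boto3.client("ec2")
route53 = boto3.client("route53")

def lambda_handler(event, context):
    # CloudWatch EC2 state-change events carry the instance id in 'detail'.
    instance_id = event["detail"]["instance-id"]
    reservation = ec2.describe_instances(InstanceIds=[instance_id])
    instance = reservation["Reservations"][0]["Instances"][0]

    private_ip = instance["PrivateIpAddress"]
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    dns_name = tags["CNAME"] + DOMAIN_SUFFIX   # e.g. skamon_demoapp_rs03 + suffix

    # UPSERT the A record so the FQDN now resolves to the replacement server.
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": dns_name,
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": private_ip}],
            },
        }]},
    )
    return f"{dns_name} -> {private_ip}"
```

The PTR and CNAME updates mentioned in step 9 follow the same change_resource_record_sets pattern against their respective hosted zones.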
Automate: Provisioning the replacement server

The final piece of the solution that needs to be automated is spawning a new MongoDB replacement server automatically when a failure is detected. You can use Auto Scaling to detect an impaired Amazon EC2 instance or an unhealthy application and replace the instance without your intervention. If you are new to Auto Scaling, then I strongly advise you to watch Michael Hanisch's webinar 'Automating management of Amazon EC2 instances with Auto Scaling'.

To implement the automatic provisioning of replacement hardware

- Create and configure a new Auto Scaling group
- Put the EC2 instances of your MongoDB replica set in the group
- Set the group to always maintain a min/max count equal to the replica set member count
- Use an AMI with all the required tools and software in the launch configuration

Once the Auto Scaling group is configured as above, a new EC2 instance is automatically created whenever one of the servers is terminated. You may configure the health check rules of this group to fit your needs; a sketch of such a group definition follows this paragraph. I want to make sure you understand that the 'AWS Auto Scaling' feature is used purely to help ensure that you are running the desired number of Amazon EC2 instances, not for the scalability of your MongoDB database, which is achieved through proper use of MongoDB sharding.
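The article does not prescribe a specific tool for creating the group, so here is a minimal boto3 sketch under assumed names (the launch configuration name, AMI id, security group, and subnet are all placeholders); the same could be done from the console or the AWS CLI.

```python
# create_respawn_group.py - Auto Scaling group that keeps exactly 3 MongoDB
# replica set members running, replacing any instance that dies.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Launch configuration: an AMI pre-baked with mongod, Ansible and the
# boot-time scripts discussed earlier (all names are placeholders).
autoscaling.create_launch_configuration(
    LaunchConfigurationName="demoapp-rs-launch-config",
    ImageId="ami-0123456789abcdef0",
    InstanceType="m5.large",
    SecurityGroups=["sg-0123456789abcdef0"],
)

# min = max = desired = replica set member count, so a terminated instance
# is always replaced rather than the fleet being scaled up or down.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="demoapp-rs-group",
    LaunchConfigurationName="demoapp-rs-launch-config",
    MinSize=3,
    MaxSize=3,
    DesiredCapacity=3,
    VPCZoneIdentifier="subnet-0123456789abcdef0",  # private subnet(s)
    Tags=[{
        "Key": "server_group_name",
        "Value": "skamon_demoapp_rs",
        "PropagateAtLaunch": True,
    }],
)
```

Note that the group only guarantees the instance count; which data volume and DNS name a fresh instance picks up is decided by the tag-driven scripts and the Lambda function covered in the previous sections.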
If you find the use of the 'AWS Auto Scaling' feature for a MongoDB database an interesting use case, then you may also like the other use case scenarios listed in the next section.

Use cases & Scope for improvement

Other use case scenarios

Combining this approach with the 'AWS Auto Scaling' feature and a 'MongoDB rolling update', you could upgrade the underlying hardware configuration of the replica set members, perform a MongoDB version upgrade, and so on. To achieve this, you must rebuild or 'Patch an AMI and Update an Auto Scaling Group', and recreate all the EC2 instances of the replica set members in a rolling upgrade fashion.

As much as I like the use case of upgrading the hardware, using this approach for a trivial process such as a MongoDB version upgrade is overkill in my opinion. A typical MongoDB version upgrade involves an in-place binary version upgrade, which is a lot simpler and less time consuming than spawning an entire EC2 instance with an upgraded AMI. That said, I did come across a client who preferred patching the AMI over the simpler route in order to keep the overall operations process consistent. So it's purely your personal preference.

Scope for improvement

While the current solution may be adequate for some of you, there is still scope for further improvement. A few improvements that I thought through are

Cross data center support

The current solution uses only one Auto Scaling group. If your MongoDB deployment is geographically distributed across multiple data centers, then you may have to create one Auto Scaling group per data center. This will help ensure that there is at least one server up and running per data center.

Network partition

If your MongoDB deployment is spread across three data centers, then inherently you are covered by MongoDB's high availability. The picture below shows a scenario in which one of the replica set members has failed along with a network partition. Despite spawning a new MongoDB server, the fault tolerance of the replica set would still be 0 because of the network partition.

Image: Multiple Data Centers with Network Partition

Multiple failures at the same time

When the Ansible script runs on the replacement server, it searches for available disks using the tag names associated with server_group_name. If two or more replica set members are down at the very same moment, then the scripts on both replacement servers may try to mount the same disk. I am currently using a random delay before mounting disks as a poor man's mechanism to overcome the possibility of two replacement servers trying to mount the same disk volume at the same time.

AWS credentials

The Ansible script makes use of the AWS credentials on the replacement server to search for and mount the disk volumes. Therefore the AWS credentials file must be copied onto the EC2 instance and secured in such a way that only the user running the cron job has access to it.

Eliminate cron jobs

The logic baked into the Ansible scripts could be moved into the AWS Lambda function to completely eliminate the need for a cron job invoking the Ansible scripts.

If you can think of improvements that are not listed here, then feel free to drop me a note in the comments.

Closing thoughts

I strongly recommend you review the mongod logs to diagnose and fix the real issue rather than blindly replace the server in trouble. Keeping enough servers up to maintain the fault tolerance is of utmost importance; however, not fixing the issue that caused the situation in the first place is like kicking the can further down the road.

It is possible that your server went down for reasons that could have been addressed elsewhere. For example, the mongod process got killed by the OOM Killer. In this scenario, spawning a new MongoDB server is no different from restarting the mongod process. As an alternative, allocating swap space will avoid issues with memory contention and prevent the OOM Killer on Linux systems from killing mongod.

Finally, I want to congratulate you for making it this far through this lengthy article. If you are new to some of the technologies listed here, then I can only imagine how much of an information overload you have been through. You deserve a pat on the back for this awesome feat. If you are planning to use these concepts in your environment, or you think of any interesting use cases not listed above, or I happened to stir some inspiration in you to customize it further, please drop a comment about it.