Thursday, December 11, 2014

Geographically Redundant Replica Set

Just needed some clarification on how 'redundant' my current setup actually is.

We have a 5-node replica set configured as below:

 "members" : [
                {
                        "_id" : 0,
                        "host" : "mongo001",
                        "priority" : 0,
                        "slaveDelay" : 3600
                },
                {
                        "_id" : 1,
                        "host" : "mongo002",
                        "priority" : 10
                },
                {
                        "_id" : 2,
                        "host" : "mongo003",
                        "priority" : 10
                },
                {
                        "_id" : 3,
                        "host" : "mongo004",
                        "priority" : 5
                },
                {
                        "_id" : 4,
                        "host" : "mongo005",
                        "priority" : 5
                }


Nodes 0, 1 and 2 live in DC1; nodes 3 and 4 live in DC2.

We are going to run a DR test that simulates a complete data centre blackout in DC1, which will leave only the 2 DC2 members up. I believe both remaining members will stay SECONDARY, as there will be no majority to elect a primary - is this correct?

If so, what would people suggest is the best way to alleviate this - an arbiter in the cloud?



It depends on whether you are talking about DR (Disaster Recovery), which for me is different from High Availability with automatic failover, and the answer also varies depending on whether your cluster is sharded or not.

I will answer for a single replica set, since you do not mention sharding in your question.

For DR:
 - you can keep the configuration you have; with the priorities you have set in DC1, the primary will be located there by default.
 - if you lose DC1 entirely, you will need a manual action to get a primary node in DC2; a good way to do this is to add a new node or arbiter and reconfigure the replica set in DC2 (see the sketch below).
 > BUT keep in mind this is a manual step
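
A minimal sketch of that manual step, assuming the shell is run on a surviving DC2 secondary and that the arbiter hostname is illustrative:

    // Run on a surviving DC2 secondary (e.g. mongo004) after DC1 is lost.
    // Keep only the DC2 members and force the reconfig, since there is no
    // primary available to accept a normal reconfig.
    cfg = rs.conf()
    cfg.members = cfg.members.filter(function (m) {
        return m.host === "mongo004" || m.host === "mongo005"
    })
    rs.reconfig(cfg, { force: true })

    // Once a primary has been elected in DC2, add an arbiter there (run on
    // the new primary) so the set has an odd number of voters.
    rs.addArb("mongo-arb.dc2.example.net:27017")   // hostname is illustrative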

For H/A with automatic failover:
 - this is where you need to add another DC (for example in the cloud) where you deploy an arbiter (or a data-bearing node) that is visible to both DCs and provides the majority.
 - So you can keep the same number of nodes and the configuration you have in your current DCs.
 - Just add an arbiter in the cloud (see the sketch below).
 - In this case, if DC1 disappears, the primary will be elected in DC2.
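
A sketch of that change, with an illustrative arbiter hostname. One note: five data-bearing voters plus an arbiter make six voters in total, so DC2 plus the arbiter (3 votes) would still fall short of a strict majority; giving up one DC1 vote, for example on the delayed priority-0 member, brings the total to five voters so DC2 plus the arbiter can elect a primary on their own:

    // Run on the current primary while both DCs are healthy.
    rs.addArb("mongo-arb.cloud.example.net:27017")   // arbiter in a third site

    // Drop the delayed member's vote so the set has 5 voters instead of 6,
    // letting DC2 (2 votes) plus the arbiter (1 vote) form a majority.
    cfg = rs.conf()
    cfg.members[0].votes = 0   // mongo001, the delayed / priority-0 member
    rs.reconfig(cfg)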

You can find more information here:

Hope that helps



Thanks for the reply.

I agree with what you have said above, but in testing we found that, ideally, we wouldn't want to reconfigure the replica set to remove the now-defunct DC1 nodes - as in reality, we would hope they would be back within 24 hours (max).

We started playing with different scenarios and found the following worked really well for us:

- Lose the 3 DC1 nodes
- This leaves a minority with 2 SECONDARY nodes
- Force a reconfig that removes the voting power of the DC1 nodes (.votes = 0) - see the sketch after this list
- Also make sure priority favours DC2
- This elects a PRIMARY in DC2
- If the outage is long, add an ARBITER in DC2 so the set keeps an odd number of voters
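
A minimal sketch of that forced reconfig, assuming it is run in the mongo shell on one of the DC2 secondaries while DC1 is down:

    // No primary is reachable, so the reconfig has to be forced from a
    // DC2 secondary (e.g. mongo004).
    cfg = rs.conf()
    cfg.members.forEach(function (m) {
        if (m.host === "mongo001" || m.host === "mongo002" || m.host === "mongo003") {
            m.votes = 0      // DC1 members no longer count toward the majority
            m.priority = 0   // non-voting members should not be electable
        }
    })
    rs.reconfig(cfg, { force: true })

    // mongo004 and mongo005 are now the only voters (2 of 2 reachable),
    // so one of them can be elected PRIMARY. If the outage drags on, an
    // arbiter can be added in DC2, e.g.:
    // rs.addArb("mongo-arb.dc2.example.net:27017")   // hostname is illustrative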

When DC1 is back:
- Remove the arbiter, if one was added
- Start the DC1 nodes with no voting power
- On the DC2 primary, run a reconfig to give the 3 DC1 nodes their voting power back (.votes = 1) - see the sketch after this list
- A re-election occurs, but the primary stays in DC2
- Re-prioritise if DC1 needs to become PRIMARY again.
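
A sketch of the recovery step, run on the DC2 primary once the DC1 nodes have caught up (the arbiter hostname is again illustrative):

    // If an arbiter was added during the outage, remove it first.
    rs.remove("mongo-arb.dc2.example.net:27017")

    // Give the DC1 members their votes back. Priorities are left alone so
    // the primary stays in DC2; raise the DC1 priorities afterwards if the
    // primary should move back to DC1.
    cfg = rs.conf()
    cfg.members.forEach(function (m) {
        if (m.host === "mongo001" || m.host === "mongo002" || m.host === "mongo003") {
            m.votes = 1
        }
    })
    rs.reconfig(cfg)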


What does everyone think? Any complications in doing this?

