Sunday, December 28, 2014

Best practices for using rs.stepDown and replacing secondaries to minimize data loss

We run MongoDB in AWS, so a common scenario is the rolling upgrade, where we:

1. Change a server configuration in chef
2. Kill secondary #1 in each replica set
3. Wait for their replacements to come up and fully replicate
4. Kill (hidden) secondary #2 in each replica set
5. Wait for their replacements to come up and fully replicate
6. Run rs.stepDown(120) on each primary so that one of the new secondaries is elected primary.
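
One way to make the "fully replicate" waits in steps 3 and 5 concrete is to compare each secondary's last applied operation time with the primary's, using the document returned by rs.status(). The sketch below is only an illustration: it relies on the `members[].stateStr` and `members[].optimeDate` fields of that document, while the helper names `maxSecondaryLagSeconds` and `safeToProceed` and any thresholds passed to them are hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper: largest replication lag, in seconds, among the
// SECONDARY members of an rs.status()-style document.
// Returns null when there is no primary or no secondary to compare.
function maxSecondaryLagSeconds(status) {
  var primary = null;
  var secondaries = [];
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") primary = m;
    else if (m.stateStr === "SECONDARY") secondaries.push(m);
  });
  if (primary === null || secondaries.length === 0) return null;

  var maxLag = 0;
  secondaries.forEach(function (m) {
    var lag = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    if (lag > maxLag) maxLag = lag;
  });
  return maxLag;
}

// Hypothetical gate to check before killing the next member or stepping
// down: enough members must be PRIMARY or SECONDARY, and the slowest
// secondary must be within maxLagSeconds of the primary.
function safeToProceed(status, expectedMembers, maxLagSeconds) {
  var healthy = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY" || m.stateStr === "SECONDARY";
  });
  if (healthy.length < expectedMembers) return false;

  var lag = maxSecondaryLagSeconds(status);
  return lag !== null && lag <= maxLagSeconds;
}
```

In the mongo shell this could be called as `safeToProceed(rs.status(), 3, 10)` for a three-member set. A replacement that is still in STARTUP2 (initial sync) is not counted as healthy, so the check keeps returning false until it has become a SECONDARY and caught up.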

It is my understanding that some data loss is inevitable at step 6.

However, we also seem to lose data during steps 2 and 4, when we kill the secondaries. I believe our write concern is the default, which is 1, so I am confused. Not a lot of data is lost, but I would expect losing a secondary to cause no data loss at all.
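
For context on the write concern mentioned above: with w: 1 a write is acknowledged once the primary alone has applied it, so a write that has not yet reached a secondary can be rolled back when the primary changes. A minimal sketch of asking for majority acknowledgment instead follows. It assumes a shell version whose insert accepts a `writeConcern` option (2.6 or later); the helper name `majorityWriteOptions`, the collection name `events`, and the 5000 ms timeout are purely illustrative.

```javascript
// Hypothetical helper: build write options that request acknowledgment
// from a majority of replica set members, giving up after timeoutMs.
function majorityWriteOptions(timeoutMs) {
  return { writeConcern: { w: "majority", wtimeout: timeoutMs } };
}

// In the mongo shell (collection name is illustrative):
// db.events.insert({ _id: 1, payload: "x" }, majorityWriteOptions(5000));
```

The trade-off is latency: each write waits for replication to a majority, and with one member down during a rolling replacement there is less slack before writes start timing out.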

Any suggestions on how to improve this procedure? 



How are you detecting the data loss, and what exactly is the nature and amount of the lost data?

I cannot think of a scenario where bringing down a secondary can cause data loss, so it's possible you've uncovered a defect. Please also include the version of MongoDB that you are running or have seen this with.