Thursday, December 4, 2014

Reads timing out when using majority writes but a secondary is down

I have a three-member replica set, and one of the members is a hidden, priority 0 secondary which is in another data center as a passive backup. My application is (or rather was) configured to use a "majority" write concern.
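
For reference, the client-side setup is roughly along these lines (just a sketch -- the hostnames, replica set name, and database/collection names below are placeholders, not my real ones):

<?php
// A rough sketch of the setup described above (legacy PHP driver 1.5.x).
// Hostnames, the replica set name, and the db/collection names are
// placeholders, not the real ones.
$client = new MongoClient(
    'mongodb://host1.example:27017,host2.example:27017/?replicaSet=rs0',
    array('w' => 'majority')   // default write concern for every write
);

$collection = $client->selectDB('mydb')->selectCollection('mycoll');

// A typical upsert; it inherits w=majority from the client above.
$collection->update(
    array('_id' => new MongoId()),
    array('visits' => 1, 'active' => true),
    array('upsert' => true)
);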

This was all working fine until a few days ago when I took the off-site secondary offline for maintenance. Within a few minutes, reads to the remaining two servers started timing out, and I can't figure out why. Either putting the secondary back online or switching to w=1 fixed the problem.

When troubleshooting it today, I found that reads and writes appear to work fine for a few minutes, but then reads start timing out:

Uncaught exception 'MongoCursorTimeoutException' with message 'nz2.wb.gs:27017: Read timed out after reading 0 bytes, waited for 30.000000 seconds'

I can't isolate which reads are timing out; it appears to be random.

I'm using the PHP driver (version 1.5.8, the latest) with MongoDB 2.6.3 (not the latest, but please let me know if you suspect a newer version will fix it).

Any ideas?



The problem with majority writes on a three-node replica set is that majority=2, which means that when you take one member out of the set, a write won't return until both remaining members acknowledge it as successful. If you are not specifying a wtimeout value with your w:majority, you will end up waiting for that acknowledgment until it comes. So if the secondary falls behind even by 5 seconds, then for those 5 seconds every single write will be sitting around waiting for its ack, and no reader will make progress because the writers will eventually block all of them or use up all the connections in the connection pool.
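
For example, something along these lines with the 1.5.x PHP driver (just a sketch -- the collection, _id, and field name are placeholders) would make each write give up after 5 seconds instead of waiting indefinitely:

<?php
// A rough sketch: cap how long a w:majority write may wait for replication.
// Assumes $collection is a MongoCollection and $id an existing MongoId;
// the field name is a placeholder. The 1.5.x driver accepts 'wTimeoutMS'
// (the older 'wtimeout' key is a deprecated alias). If the timeout is
// exceeded, the write fails with an exception instead of holding a
// connection open indefinitely.
$collection->update(
    array('_id' => $id),
    array('$set' => array('active' => true)),
    array(
        'upsert'     => true,
        'w'          => 'majority',
        'wTimeoutMS' => 5000   // stop waiting for the second ack after 5s
    )
);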

Can you check if there was any replication lag during those time periods? Do you have a wtimeout value set on the write requests?
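
If it helps, one rough way to check lag from PHP (a sketch only -- the hostnames are placeholders) is to compare each member's optimeDate from replSetGetStatus:

<?php
// A rough sketch: estimate replication lag by comparing member optimes
// from replSetGetStatus. Hostnames and the set name are placeholders.
$client = new MongoClient('mongodb://host1.example:27017,host2.example:27017/?replicaSet=rs0');
$status = $client->selectDB('admin')->command(array('replSetGetStatus' => 1));

$primaryOptime = null;
foreach ($status['members'] as $member) {
    if ($member['stateStr'] === 'PRIMARY') {
        $primaryOptime = $member['optimeDate']->sec;   // MongoDate epoch seconds
    }
}

foreach ($status['members'] as $member) {
    if ($primaryOptime !== null && $member['stateStr'] === 'SECONDARY') {
        $lag = $primaryOptime - $member['optimeDate']->sec;
        printf("%s is ~%d seconds behind the primary\n", $member['name'], $lag);
    }
}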



I did think about that, but the two remaining secondaries are hugely over-provisioned -- they're brand new grunty machines with good SSDs on a very fast network with only a fast switch between them; they just never break a sweat with anything I've ever thrown at them.

I did look at replication lag but according to MMS it's not reliable unless it's over 10 seconds -- is that accurate?

Besides, remember that the off-site secondary is off-site -- it's also a non-grunty machine with spinning disks. If the problem is really that writes are taking too long to acknowledge with majority=2, then surely the local, fast machine would always acknowledge writes several orders of magnitude faster than the commodity machine in another city with spinning disks.

Any other thoughts?



Good points, all - just to confirm, the total number of replica set members in the initial configuration is three? Two regular members and one hidden/priority=0?

The reason I ask is that you mention the two remaining secondaries, but only one secondary (and one primary) would remain if the setup is as I understood it.



One more thing - it would be helpful to see the logs from (for example) the primary for the 30+ seconds up to and including one of these random reads to the primary timing out...



> Good points, all - just to confirm, the total number of replica set members in the initial configuration is three? Two regular members and one hidden/priority=0?
Correct -- two regular and one hidden/priority=0.

I never looked at the server logs (sorry); I only looked at the PHP errors. Having just gone and checked, though, here is an example of a slow update from the primary's log:

2014-11-15T10:59:16.120+1300 [conn65935709] command wgnz.$cmd command: update { update: "Stats_Visit", updates: [ { q: { _id: ObjectId('5465c4603143345cbd800e86') }, u: { RF: "www.boardingkennelsnorthshore.co.nz/large-dog-care", IP: 1985742753, A: false, Site_ID: 43021690, Instance_ID: 43021682, UAID: ObjectId('17ccc622add28e107e31e398'), _id: ObjectId('5465c4603143345cbd800e86') }, multi: false, upsert: true } ], writeConcern: { w: "majority" }, ordered: true, upsert: true, w: "majority" } keyUpdates:0 numYields:0  reslen:146 3273ms
 
So the updates are taking several seconds, but the question is why...
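
One thing I'm thinking of trying is turning on the database profiler on the primary so slow operations like that get recorded with timestamps, which should make them easier to line up with whatever the secondary was doing at the time. Roughly (a sketch -- the 1000ms threshold is arbitrary):

<?php
// A rough sketch: profile operations slower than 1000ms (an arbitrary
// threshold) on the database from the log line above, assuming $client
// is a connected MongoClient. Level 1 records only ops over slowms.
$db = $client->selectDB('wgnz');
$db->command(array('profile' => 1, 'slowms' => 1000));

// Later: pull the most recent slow operations out of system.profile.
$cursor = $db->selectCollection('system.profile')
             ->find(array('millis' => array('$gt' => 1000)))
             ->sort(array('ts' => -1))
             ->limit(10);

foreach ($cursor as $op) {
    printf("%s %s %s %dms\n", date('c', $op['ts']->sec), $op['op'], $op['ns'], $op['millis']);
}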

