Monday, December 8, 2014

MongoDB Performance Issue During Background Flush on CentOS

A few days ago we experienced a strange issue. Database writes and logins took longer than a minute, and the primary server appeared to be down to the secondary replicas. Performance was so bad that our client application could not connect to the MongoDB database at all. There was no unusual database activity, and none of the physical resources on the server were stressed. I checked network logs and SAN activity with our storage vendor, and nothing appeared out of the norm. All evidence indicates that this was caused by the MongoDB background flush operation. I've seen reports online of this happening on Windows servers, but that doesn't apply in our case since we're running on CentOS 6.5. Any insight would be greatly appreciated.

Our production configuration is as follows: MongoDB 2.4.10, 3-node replica set, 24 logical CPUs, 144 GB RAM, RAID 5, CentOS 6.5, C# driver.

Logs captured during event:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond XX.XXX.XXX.XX1:27017
   at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
   at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP)
   at MongoDB.Driver.Internal.MongoConnection.Open()

Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info XX.XXX.XXX.XX2:27017 thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info XX.XXX.XXX.XX3:27017 thinks that we are down

Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: { authenticate: 1, nonce: "XXX", user: "master", key: "XXX" } ntoreturn: 1 keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus query: { _id: "XXX" } update: { _id: "XXX", _t: "AgentLegLockableItem", StateLogs: {}, Client: { ClientId: "XXX", ProviderId: null, UseFakeProvider: false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX", CallRecordingEnabled: false, GhostNumber: "XXX", GhostNumberEnabled: false, MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null, AgentId: "XXX", ConferenceId: "XXX", ExternalRefId: null, DurationSeconds: 0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new Date(-62135596800000), Type: 1, CallType: 1, Disavowed: false, PhoneNumber: "XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16, LastSavedState: 0, ResponseToProvider: null, RecordingEnabled: false, Agent: { AgentId: "XXX", _id: null, Phones: [ { Phone: "XXX", PhoneExtension: "", PhoneType: "Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX", ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null }, PromptTimeout: 0, WhisperMessage: null, ResultMessage: null, PromtpToAcceptTransfer: false, TransferStyle: 0, PromptMessageToAccept: null } idhack:1 nupdated:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms




In the logs there would also be a line logged for a slow flush operation - was that present in the vicinity of these slow operations?



I do see the slow flush operation. Here's the info from the log:

Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms for 77 files



19449ms, or about 19 seconds, is very slow given that mongod flushes data files (asynchronously) every 60 seconds - this suggests your disk IOPS are not sufficient for the amount of writes you are doing (if this only happens occasionally, it could be that some unusual write pattern was happening at that time?).
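
For reference, the flush timings that log line reports can also be watched over time through serverStatus. Below is a minimal sketch using pymongo; the connection string is a placeholder, not the deployment described above:

# Minimal sketch: poll the MMAPv1 background flush stats from serverStatus.
# The host/port here are placeholders, not the deployment described above.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

while True:
    flush = client.admin.command("serverStatus")["backgroundFlushing"]
    print("flushes=%d last_ms=%d average_ms=%.1f"
          % (flush["flushes"], flush["last_ms"], flush["average_ms"]))
    time.sleep(60)  # mongod syncs data files roughly every 60 seconds

On MMAPv1 (the only storage engine in 2.4), average_ms creeping upward alongside spikes in last_ms is a fairly direct sign the disks cannot absorb the 60-second sync.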



That's why I'm at a loss here. A long write operation would be logged, and there's nothing unusual except the length of time these queries took to execute. We are not even close to hitting any thresholds on the SAN. Just to make sure, I had the logs checked by the network engineers and the storage vendor, and nothing unusual could be identified.



I don't mean a long-running write, I mean more random writes than normal - you can have some number of updates that (for example) cause a move of several documents which have large indexed arrays, which means many index entries have to be updated, and that's more random writes than a "typical" update...
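
As a rough illustration of the kind of update being described - the collection and field names below are invented for the example, and this assumes the MMAPv1 storage engine that 2.4 uses:

# Illustration only (invented collection/field names, not the schema above):
# under MMAPv1, an update that grows a document past its allocated record
# forces a document move, and every entry in the multikey index on "tags"
# must then be rewritten to point at the new location - far more random
# writes than an in-place update.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client.exampledb.legs

coll.create_index("tags")  # multikey index: one entry per array element
coll.insert_one({"_id": 1, "tags": ["a", "b", "c"], "payload": ""})

# Growing both the payload and the indexed array can exceed the record's
# padding, relocate the document, and touch every one of its index entries.
coll.update_one(
    {"_id": 1},
    {
        "$set": {"payload": "x" * 4096},
        "$push": {"tags": {"$each": ["tag%d" % i for i in range(100)]}},
    },
)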



Nothing different than normal on that day. We do have a lot of updates happening all the time. MongoDB is used in a highly transactional manner.



Not sure this is what you are seeing, but since you said "highly transactional manner"...

We do the same thing and have seen performance crashes. What we found happening was that the normal writeback was occurring, but the number of dirty pages on the system was still growing. There is a memory setting in the kernel (/proc/sys/vm/dirty_background_ratio); once you trip it, the kernel starts writing dirty pages back to disk, effectively, as fast as it can. For small-memory systems the writeback is not that big a deal. For big(ger)-memory systems like yours, the additional writeback of even a few gigabytes of memory can exhaust all of your remaining disk bandwidth.
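
A quick way to see how close a host is to that threshold is to compare the amount of dirty memory against the configured ratio. Here is a minimal sketch reading the standard /proc files; note the kernel applies the ratio to roughly the available memory, so treat this as an approximation, and dirty_background_bytes (if set) overrides the ratio:

# Minimal sketch: compare current dirty memory against the kernel's
# dirty_background_ratio threshold (a percentage; approximated here
# against MemTotal - the kernel actually uses available memory).
def read_meminfo_kb(key):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])  # values are reported in kB
    raise KeyError(key)

total_kb = read_meminfo_kb("MemTotal")
dirty_kb = read_meminfo_kb("Dirty")

with open("/proc/sys/vm/dirty_background_ratio") as f:
    ratio = int(f.read().strip())

threshold_kb = total_kb * ratio // 100
print("dirty: %d MB; background writeback starts around %d MB (%d%% of RAM)"
      % (dirty_kb // 1024, threshold_kb // 1024, ratio))

On a large-memory host, lowering dirty_background_ratio (or setting dirty_background_bytes to an absolute value) makes writeback start earlier and in smaller bursts, which is the usual mitigation for the kind of stall described here.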

Your storage team will say that the SAN and storage controller are asleep because (they are, and) the bottleneck is either at the machine's SAN interface or (in my case) they pulled all of the replica set members' disk partitions from the same small set of spindles. They don't look at the drive utilization unless you specifically ask, and even then they sometimes don't have a granular enough view to see the individual drives.

Long story short, I agree with Asya that you were probably starved for disk I/O. Does your MMS have the platform statistics? You should be able to see the disk I/O in there, if I remember correctly...

If that is the case, the only real fix is to get a better disk subsystem or to spread the load across more machines/disks. I would also go to the storage team to get a setup tuned for random reads/writes: make sure the members are hitting different sets of spindles and that they have a reasonable RAID setting for the workload.

We ended up giving up on disk entirely and decided to just stay in memory. I don't recommend it unless you have thought through all of your failure modes, but it is fast and has less performance jitter.



Thanks, Rob. This is very helpful information. I've already started re-evaluating the disk I/O situation and found some things that can be improved. Since this doesn't seem to be ringing a bell with the MongoDB folks (thank you, Asya), I'll continue down this path.

