I am facing some performance issues with MongoDB for a module in Production. Let me give an overview of our setup:
1. We have a 3-node replica set in Production; each node has 32 GB RAM and a 4-core CPU, running RHEL.
2. Our application is read-intensive. Writes are done only by a cron job at night, which sometimes lasts until 9-10 AM.
3. Read preference in the drivers is set to 'primaryPreferred'.
4. Slowness: whenever writes are in progress, they lock out reads on the nodes; queries take more than 700 ms and cause the application to time out.
5. Read operations execute queries that use the aggregation framework. To avoid it we even further de-normalized the data, but got only a marginal performance gain.
Can tag-based configuration be of any help, or is there a better approach to tackle this issue?
Can you describe the nature of the writes done by the cron? Are they threaded?
What is the size of the data set? Do the reads mainly use a small subset of the data while the writes touch different/all of the documents and push the read documents out of memory? (You would see that as a change in the number of faults.)
The fix for the writes starving the reads is to slow down the writes enough to allow the reads some time to do work. There are a few things to try, but knowing what types of writes are occurring (inserts? single updates? multi-updates? findAndModify? deletes?) would help me give better suggestions on how to optimize the write operations so they do not have such a large impact.
1) On the write side, try changing the write concern to ACKNOWLEDGED or even w:1 (one replica). That will slow the writes to the server enough to allow reads to happen between the writes (assuming the individual writes are fast). The downside is that the cron will now take even longer. You can claim back some of that time using batched writes (with MongoDB 2.6); just watch that the batches don't get so big that you start starving the reads again.
2) If the reads are multi-threaded, reduce the concurrency and, again, compensate with batching.
3) Since the updates are on a cron, I am going to assume that having all of the readers see all of the updates at once is not a huge issue. If that is the case, you can spread the reads using a non-primary read preference, probably 'nearest', since it will choose from all of the nodes. (A small configuration sketch for points 1 and 3 follows this list.)
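A minimal pymongo sketch of points 1 and 3; the host names, replica set name, and database/collection names are placeholders, not your real topology:

```python
from pymongo import MongoClient

# Loader client: w=1 makes each write wait for the primary's acknowledgement,
# which paces the writes enough for reads to interleave between them.
loader = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0", w=1)

# Application client: 'nearest' spreads reads across all replica set members
# instead of always hitting the primary while the nightly load is running.
app = MongoClient(
    "mongodb://node1,node2,node3/?replicaSet=rs0",
    readPreference="nearest",
)

results = app["appdb"]["results"]  # hypothetical database/collection names
```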
Thanks for the response! Let me explain the nature of our operations in detail:
1. The write is done daily as a cron job via a Python script, and we shape the data the way our application needs it so we avoid restructuring results while responding to API requests. For writes we use the Python driver, and the write concern is Acknowledged by default; we have not changed it.
2. The cron job doesn't update records. Every day we delete all the data and re-load fresh data at night.
3. Reads are not multithreaded, but we have a high-traffic app that is read-intensive.
> 1. The write is done daily as a cron job via a Python script, and we shape the data the way our application needs it so we avoid restructuring results while responding to API requests. For writes we use the Python driver, and the write concern is Acknowledged by default; we have not changed it.
> 2. The cron job doesn't update records. Every day we delete all the data and re-load fresh data at night.
Do you overwrite/update each document? Delete all and then insert as fast as possible? Delete 1 and insert 1? Write into a new collection and then rename the new collection into place and drop the old?
How much change is there in each document every day?
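For reference, the last option (load into a new collection, then rename it into place) might look like this hedged sketch; the names and the load_documents() helper are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0")
db = client["appdb"]  # hypothetical database name

staging = db["results_staging"]
staging.drop()  # start from an empty staging collection
staging.insert_many(load_documents())  # load_documents() is a hypothetical loader

# Rename the staging collection over the live one: readers keep querying
# "results" and only ever see a complete data set; dropTarget=True drops
# the old collection as part of the rename.
staging.rename("results", dropTarget=True)
```

The advantage over delete-all-then-insert is that reads never see a half-loaded collection or compete with a storm of deletes.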
> 3. Reads are not multithreaded, but we have a high-traffic app that is read-intensive.
Sorry - I meant the writes. Is the Python load app threaded?
I would try changing the loading application to use the bulk/batch write API and then increase the write concern from Acknowledged to w:majority. I think that might give you a good balance between getting the updates done and also allowing the reads time to finish.
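A minimal sketch of that suggestion using pymongo's bulk write API (the batch size, names, and load_documents() generator are assumptions):

```python
from pymongo import MongoClient, InsertOne
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0")
coll = client["appdb"].get_collection(
    "results", write_concern=WriteConcern(w="majority")
)

BATCH_SIZE = 1000  # keep batches modest so they don't starve reads again
batch = []
for doc in load_documents():  # hypothetical generator over the fresh data
    batch.append(InsertOne(doc))
    if len(batch) >= BATCH_SIZE:
        coll.bulk_write(batch, ordered=False)  # one round trip per batch
        batch = []
if batch:
    coll.bulk_write(batch, ordered=False)
```

Waiting for majority acknowledgement at the end of each batch creates natural pauses in which queued reads can run.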