Sunday, December 28, 2014

WiredTiger cache size: the answer to limiting Mongo's RAM usage?

I've noticed over the short time I've been learning Mongo that a frequently asked question is "Why is Mongo taking up all my RAM?", to which the answer is inevitably, "Because that's the way it works." The next question is, "But can I limit this?" And the answer is, "Not directly through Mongo, no," and playing with the OS is often not a good solution either. In practice the RAM usage isn't really a problem, as Mongo will be "pushed back" by the OS should any other application or service need RAM. But that means possible contention for RAM, which the user is then responsible for managing.

But now comes WiredTiger. It is obviously a whole different take on storage than Mongo's "homebrewed" MMAP system. WT doesn't allocate file space up front, it uses compression, and it brings document-level locking to the table, which means higher write concurrency. (I'm assuming the "optimistic concurrency control algorithms" in WT are being ported to MMAP too, which would mean it gets document-level locking as well. Yippee!)

So now to my question, getting back to the first paragraph about memory management. There is a setting for WT called "storage.wiredTiger.engineConfig.cacheSizeGB". The docs say this about it:

Defines the maximum size of the cache that WiredTiger will use for all data. Ensure that storage.wiredTiger.engineConfig.cacheSizeGB is sufficient to hold the entire working set for the mongod instance.
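
As a concrete illustration, the setting lives under the storage section of the mongod YAML config (the path and the 4 GB value here are my own example, not from the docs):

```yaml
# mongod.conf -- example values; caps the WiredTiger cache at 4 GB
storage:
  dbPath: /var/lib/mongodb
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
```

The equivalent command-line flags are `--storageEngine wiredTiger --wiredTigerCacheSizeGB 4`.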


Does this mean Mongo can now be "told" to use only a certain amount of RAM? What if you configure a big cache size and the OS needs room for other apps or services? Will Mongo relinquish the RAM to the OS, or will there be issues with RAM contention? Or does this solve RAM contention problems? And what happens if the cache size isn't sufficient for the working set?

This is exciting stuff, because it is so new and mysterious. Though really, the excitement should come from the new part, not the mysterious part! ;)



I've run into the same issues you've described in my similarly short time working with MongoDB. My group's usage so far has been benchmarking loading ~1 billion documents into an empty db, as we're trying to come up with a way to deliver a very large dataset to less technical folks for an application we'll be delivering. So far on 2.6 the memory issue has been a serious one: after about 100M documents (which takes anywhere from 3-6 hours), the bulk inserts slow down to about 10M inserts/hour, and any other I/O operations on any one shard quickly turn that shard into the bottleneck.

We've tested on a variety of hardware, and whether the environment has 64G or 256G of memory, you can be sure that by the time the collection reaches ~100M documents, memory is maxed out and the system is unusable for anything but mongod.

Enter WiredTiger: the results have improved dramatically. Simply using defaults, the bulk insert gets to ~500M documents in about 9 hours, and perhaps even more surprisingly, the mongod processes are only taking 10G of RAM. Experimenting with the cacheSizeGB parameter is next on the list, as we can actually afford to allocate more RAM to each mongod than it is taking by default.
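
For anyone curious what this kind of load looks like, here is a minimal sketch of a batched bulk insert. The batching helper is plain Python; the `bulk_load` function and its use of pymongo's `insert_many` are my own illustration, not our exact harness:

```python
# Sketch of a batched bulk load. Only the batching logic is shown
# concretely; the collection object is assumed to come from pymongo.

def batches(docs, batch_size=1000):
    """Yield lists of up to batch_size documents from an iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def bulk_load(collection, docs, batch_size=1000):
    """Insert documents batch by batch; `collection` is a pymongo
    Collection (an assumption, any driver with a bulk call works)."""
    for batch in batches(docs, batch_size):
        collection.insert_many(batch, ordered=False)

if __name__ == "__main__":
    fake_docs = ({"_id": i, "v": i * 2} for i in range(2500))
    sizes = [len(b) for b in batches(fake_docs, 1000)]
    print(sizes)  # [1000, 1000, 500]
```

Unordered inserts let the server keep going past individual failures, which matters when a single bad document shouldn't stall a billion-row load.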



FYI: by default, 2.8.0 with WiredTiger allocates half of physical RAM for the cache size setting.

You can tune it up or down, but it seemed like a good starting point for everyone. Obviously, running multiple mongod processes on one machine requires tuning this number, but I don't anticipate many people needing to run multiple mongods on the same machine unless one of them is something really small (in which case it won't use the majority of its cache anyway).
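
The arithmetic behind that advice can be sketched in a few lines. The function names and the idea of splitting the budget across processes are my own illustration of the comment, not MongoDB source code:

```python
# Sketch of the stated default: the WiredTiger cache defaults to
# roughly half of physical RAM in the 2.8.0 builds.

def default_cache_size_gb(physical_ram_gb):
    """Approximate default: half of physical RAM."""
    return physical_ram_gb / 2

def per_process_cache_gb(physical_ram_gb, n_mongods):
    """If several mongods share a box, split the half-RAM budget
    rather than letting each process claim half of RAM for itself."""
    return (physical_ram_gb / 2) / n_mongods

print(default_cache_size_gb(64))      # 32.0
print(per_process_cache_gb(256, 4))   # 32.0
```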

