Thursday, December 4, 2014

Multiple number of databases

What's the best practice for supporting a lot of users (say, well over a million, for the sake of discussion), each with a small amount of data and no need to share any data across users?
Is using a dedicated database per user (with several collections in each) the right way?
What are the downsides of such an approach?



One concern is the amount of disk space each database takes up. Mongo pre-allocates its database files, so each database uses more disk space than its data actually needs. At the most "inefficient" moment, this could mean over 3GB of unused disk space per user. Multiply this by millions of users and all that empty disk space could get quite expensive. You could set up the databases to use smaller file sizes, but this is not recommended for production use.
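As a rough back-of-the-envelope check of that worst case, the sketch below multiplies the 3GB figure quoted above by the number of users. The function name and the use of decimal units (1 TB = 1000 GB) are assumptions made for illustration only.

```python
# Back-of-the-envelope estimate of disk space lost to pre-allocation
# when every user gets a dedicated database.
# Assumptions: the ~3 GB worst-case figure quoted above, and decimal
# units (1 TB = 1000 GB). The function name is hypothetical.


def unused_disk_tb(users: int, unused_gb_per_database: float = 3.0) -> float:
    """Return the worst-case unused disk space in terabytes."""
    return users * unused_gb_per_database / 1000


if __name__ == "__main__":
    for users in (1_000, 100_000, 1_000_000):
        print(f"{users:>9,} users: {unused_disk_tb(users):,.0f} TB unused at worst")
```

At one million users the worst case works out to roughly 3,000 TB of pre-allocated but empty space, which is why a database per user gets expensive quickly.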

There is also a limit on the number of databases that can actually be running at one time in a single database instance. The exact number depends on your hardware setup, but there is a limit.

I'd say you'll want to do a mix of both approaches, if the application each user is using is distinct. You can run a single database and have a good number of users in it. You can have up to about 3 million namespaces per database. So if, say, you have 3 collections and one index per collection per user, that is 6 namespaces per user, and you can have 500K users in one database. I personally wouldn't stretch these limits though. For instance, you might want to add a feature that requires a new collection, and maxing out the namespaces will leave you unable to add collections or indexes later. It is also a general rule in IT not to get close to any limits, as doing so usually means real and serious problems with the technology, the business, or both. So a better direction is to have, say, a few thousand customers in one database and increase the number of databases as your user base grows. At some point, you might need another database server cluster for more users.
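The namespace arithmetic and the "few thousand customers per database" idea can be sketched as follows. This is only an illustration: the function names, the `users_NNNN` naming scheme, and the 5,000-user bucket size are assumptions, and the code does not talk to MongoDB at all.

```python
# Namespace budgeting for a shared database, using the figures from the
# answer above. All names and the bucket size are hypothetical.

NAMESPACE_LIMIT = 3_000_000  # approximate per-database limit quoted above


def max_users_per_database(collections_per_user: int,
                           indexes_per_collection: int,
                           namespace_limit: int = NAMESPACE_LIMIT) -> int:
    """Each collection and each index consumes one namespace.

    Ignores the handful of system namespaces every database also has.
    """
    namespaces_per_user = collections_per_user * (1 + indexes_per_collection)
    return namespace_limit // namespaces_per_user


def database_for_user(user_number: int, users_per_database: int = 5_000) -> str:
    """Map a sequential user number to a database name.

    One database is filled before the next is started, which keeps each
    database far below the hard namespace limit.
    """
    return f"users_{user_number // users_per_database:04d}"


if __name__ == "__main__":
    print(max_users_per_database(3, 1))  # 500000
    print(database_for_user(0))          # users_0000
    print(database_for_user(12_345))     # users_0002
```

With 5,000 users per database and 6 namespaces per user, each database uses about 30,000 namespaces, which leaves plenty of room for new collections or indexes later.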

This suggestion is also totally dependent on your use case and what you are trying to accomplish. You might actually want to just have one database and store the user data across different collections. Without knowing more about your application, it is hard to tell. If you don't need physical separation of data between the users, this might be the better way to go.

One thing is certain: Mongo is a highly flexible and remarkable database. It almost allows too much freedom, because smart choices need to be made to use it properly. Too often these choices are wrong, and then people get bent out of shape saying Mongo is a poor database, which simply isn't true. It is good that you are asking questions first, so you can make the right choices later.



Thanks a lot for your detailed answer.
Indeed it seems like a good solution.
However, since I expect a lot of concurrent writes and MongoDB locks the entire database on writes, users will affect each other even though they share no common data.
Is there any solution to this issue?



Database-level write lock contention is generally not a bottleneck unless you have a very high number of concurrent writes, poor schema, or under-provisioned servers.

If you are planning on scaling up to a large number of users, you need to consider how your infrastructure scales in general. Your end user interaction will presumably be managed through some sort of app or API, so there is a level of indirection rather than a direct connection to your database backend. You could choose to scale out using a sharded MongoDB deployment, or by partitioning users across multiple deployments if there really is no need for a single logical database. There are many different success stories of using MongoDB at scale: http://www.mongodb.com/mongodb-scale.
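Partitioning users across multiple deployments could be handled in the app or API layer with a small routing function. The sketch below is one possible approach, not something the answer prescribes: the deployment addresses are placeholders, the function name is hypothetical, and hash-based assignment is just one of several ways to pick a deployment.

```python
import hashlib

# Hypothetical deployment addresses; replace with real connection strings.
DEPLOYMENTS = [
    "mongodb://cluster-a.example.com:27017",
    "mongodb://cluster-b.example.com:27017",
    "mongodb://cluster-c.example.com:27017",
]


def deployment_for_user(user_id: str, deployments=DEPLOYMENTS) -> str:
    """Pick a deployment for a user with a stable hash of the user id.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not route consistently.
    """
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(deployments)
    return deployments[bucket]


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", deployment_for_user(user))
```

One caveat with plain modulo hashing: adding or removing a deployment reassigns most users. Storing an explicit user-to-deployment mapping at signup avoids that, at the cost of one extra lookup per request.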

FYI, one of the headline features for the upcoming MongoDB 2.8 release is improved concurrency. There is also a new pluggable storage API which enables additional storage engines to be implemented: http://blog.mongodb.org/post/102461818738/announcing-mongodb-2-8-0-rc0-release-candidate-and-bug.

MongoDB 2.8's default memory-mapped storage engine (now referred to as "MMAPv1") has collection-level locking; the new WiredTiger storage engine supports document-level locking and some new features like on-disk compression.

