Thursday, January 1, 2015

Guidance on Clustering in MongoDB

Scenario: I have a requirement wherein the data volume amounts to 20 TB, growing by 10% annually. All of this data consists of digital assets (including multi-gigabyte videos, etc.). The information must be available 99.4% of the time.
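To put the requirement in concrete numbers, here is a back-of-the-envelope sketch. It assumes the 10% growth compounds yearly and that 99.4% availability is measured over a 365-day year; the function names are hypothetical helpers, not part of any MongoDB tooling.

```python
# Back-of-the-envelope sizing for the scenario above.
# Assumptions: 10% growth compounds once per year; a year has 365 days.

HOURS_PER_YEAR = 24 * 365  # ignores leap years


def projected_size_tb(initial_tb: float, growth_rate: float, years: int) -> float:
    """Data volume after `years` of compound growth."""
    return initial_tb * (1 + growth_rate) ** years


def allowed_downtime_hours(availability: float) -> float:
    """Maximum downtime per year that still meets the availability target."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for year in range(6):
        print(f"year {year}: {projected_size_tb(20, 0.10, year):.1f} TB")
    print(f"downtime budget: {allowed_downtime_hours(0.994):.2f} h/year")
```

Under these assumptions the data reaches about 32.2 TB after five years, and 99.4% availability leaves a budget of roughly 52.6 hours of downtime per year.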

We have decided to go ahead with MongoDB as the datastore in AWS. We have two servers available on the dev box.

Queries:
 1) What is the best architectural approach for this requirement: a replica set, sharding, or a mix of both? I am assuming a sharded cluster with replica sets would be a good choice. If this approach is wrong, please suggest an alternative.
 2) With currently one available data center and two servers, how many instances of mongos, config servers, and shards are possible?
 3) What is the benefit of sharding? What advantage does it have over a replica set?
 4) What needs to be done to ensure that bulk uploads of digital assets won't cause any hiccups?
 5) How does sharding happen? Is there any configuration on the config server to ensure that, beyond a certain number of GB, data goes to the other shard?

Please help.



Will the digital assets be available for public consumption, or are they only for private use? Just asking out of curiosity.

As for sharding and replication: in a production environment, you will always have replication. Considering the size of the data, you'll probably want to shard too. But that also depends on how the system will be used.
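One rough way to see why 20 TB and up points toward sharding is to estimate how many shards it would take to keep each one comfortably below its disk capacity. This is only a planning sketch: `usable_tb_per_shard` and `fill_target` are hypothetical inputs you would choose yourself, not MongoDB settings, and the 4 TB figure below is purely illustrative.

```python
import math


def shards_needed(data_tb: float, usable_tb_per_shard: float, fill_target: float = 0.7) -> int:
    """Minimum shard count so that no shard is filled beyond `fill_target`.

    Both `usable_tb_per_shard` and `fill_target` are hypothetical planning
    inputs; MongoDB itself has no such parameters.
    """
    if usable_tb_per_shard <= 0 or not 0 < fill_target <= 1:
        raise ValueError("capacity must be positive and fill_target in (0, 1]")
    # Divide the data by the space we are willing to use on each shard, rounding up.
    return math.ceil(data_tb / (usable_tb_per_shard * fill_target))


if __name__ == "__main__":
    print(shards_needed(20, 4))       # today's 20 TB with hypothetical 4 TB shards
    print(shards_needed(32.2102, 4))  # the same data after five years of 10% growth
```

With 4 TB of usable space per shard and a 70% fill target, this gives 8 shards today and 12 after five years. Since each shard should itself be a replica set, multiply by the number of members per set to estimate the number of data-bearing machines.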

You're also going to need at least 3 machines for a production replica set, so you are already one server short.
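The reason for three machines is how replica set elections work: a primary can only be elected when a strict majority of the voting members can see each other. The small sketch below computes that majority and how many voting members can fail while an election is still possible (the function names are illustrative). With two members, losing either one leaves no majority, so there is no automatic failover; a third voting member (which can be an arbiter) fixes that.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def failures_tolerated(voting_members: int) -> int:
    """How many voting members can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: majority {majority(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Note that going from 3 to 4 members does not increase fault tolerance; odd-sized sets make better use of each machine.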

You can read more about sharding and replicating in the MongoDB manual. 


The manual explains the advantages of sharding. There isn't really a comparison to make between sharding and replication: replication is something you do all the time, while sharding is something you add when necessary.
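The difference between the two can be pictured with a toy model: sharding decides *which* shard a document lives on (by ranges of a shard key), and replication keeps *several copies* of each shard. This is a simplified illustration, not how mongos or the config servers are implemented; `ToyRouter`, `ReplicaSet`, and the split points are hypothetical. In real MongoDB the key ranges are split into chunks and a balancer moves them between shards, rather than you configuring a size threshold per shard.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ReplicaSet:
    """A shard; conceptually every member holds a full copy of its data."""
    name: str
    members: list[str]
    docs: dict = field(default_factory=dict)


class ToyRouter:
    """Routes each document to exactly one shard by shard-key range.

    `split_points` are hypothetical range boundaries. Shard i owns keys in
    [split_points[i-1], split_points[i]), i.e. the lower bound is inclusive.
    """

    def __init__(self, split_points: list[str], shards: list[ReplicaSet]):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shards = shards

    def shard_for(self, key: str) -> ReplicaSet:
        # bisect_right counts how many boundaries are <= key.
        return self.shards[bisect.bisect_right(self.split_points, key)]

    def insert(self, key: str, doc: dict) -> str:
        shard = self.shard_for(key)
        shard.docs[key] = doc
        return shard.name


if __name__ == "__main__":
    router = ToyRouter(["m"], [ReplicaSet("shardA", ["a1", "a2", "a3"]),
                               ReplicaSet("shardB", ["b1", "b2", "b3"])])
    print(router.insert("alpha", {"title": "intro.mp4"}))
    print(router.insert("zulu", {"title": "outro.mp4"}))
```

Here "alpha" lands on `shardA` and "zulu" on `shardB`; each of those shards is a three-member replica set, so the two techniques combine rather than compete.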

Also, files larger than 16 MB (MongoDB's maximum document size) have to be stored using GridFS, a separate storage convention within MongoDB that splits each file into chunks. http://docs.mongodb.org/manual/core/gridfs/
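To illustrate what GridFS does with a large video, here is a minimal sketch that splits a byte string into chunk documents shaped like GridFS `fs.chunks` entries (`files_id`, `n`, `data`), using the 255 KB default chunk size. It only mimics the layout; a real upload would go through a driver's GridFS API (for example PyMongo's `gridfs` module), which also writes the matching `fs.files` metadata document. The helper name and the string `files_id` are hypothetical.

```python
from typing import Iterator

BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB document limit
GRIDFS_DEFAULT_CHUNK_BYTES = 255 * 1024     # default GridFS chunk size


def gridfs_style_chunks(files_id: str, data: bytes,
                        chunk_size: int = GRIDFS_DEFAULT_CHUNK_BYTES) -> Iterator[dict]:
    """Yield dicts shaped like GridFS chunk documents: files_id, n, data."""
    if chunk_size <= 0 or chunk_size >= BSON_MAX_DOCUMENT_BYTES:
        raise ValueError("chunk_size must be positive and below the document limit")
    # n is the zero-based sequence number used to reassemble the file in order.
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        yield {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}


if __name__ == "__main__":
    video = b"\0" * (1024 * 1024)  # stand-in for a 1 MiB file
    chunks = list(gridfs_style_chunks("video-1", video))
    print(len(chunks), len(chunks[-1]["data"]))
```

A 1 MiB file becomes 5 chunks (the last one 4096 bytes); at the same chunk size a 5 GiB video would become 20,561 chunks, each well under the document limit.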

