Thursday, January 1, 2015

Is mongos a bottleneck when my MapReduce job communicates with MongoDB?

I am still in the architecture phase of my product and have yet to decide on its database (the processing will definitely be in Hadoop, but the database is still open). So I have a question. As far as I understand, MR jobs depend on data locality, so HDFS/HBase would be a good fit. But since HBase does not support all my use cases, it is not an option. Now, if I use MongoDB with sharding:

1) Will I lose the data locality advantage, so that the data has to be transferred from the MongoDB node to the Hadoop node when the MR job processes it?
2) Will there be a single mongos through which my Hadoop cluster communicates with the MongoDB shards? If so, won't that mongos become a bottleneck and a SPOF?
3) Is there a way to run MongoDB on the same nodes as Hadoop and thereby take advantage of data locality?

