Sunday, December 7, 2014

Is it better to issue an update for each document, or in batch for collection sharded on hashed bson id?

I have a collection that is sharded on _id:hashed. I have more than a dozen shards and I frequently need to write to millions of documents at once. I currently write to 100 at a time using a query with {_id:{$in:[...]}}, but the writes can cause 100%+ locking according to MMS, and performance decreases. I'm going to add more shards, but is it better to make one write per document instead of one write per X documents given this shard key?




This is a tricky situation. Because of the hashed shard key, there is
no way to target a batch of updates at a specific shard. This is one
of the disadvantages of a hashed shard key: there are no ranges of key
values to take advantage of.

Sending individual updates will be "faster" for mongos to handle, but
would incur a lot more calls from your app to mongos.

Assuming you're using 2.6, there is a possible third option, which is
to send batches of single updates (no $in) to mongos.  This way you
minimize the number of calls to mongos and the number of round trips,
but allow mongos to distribute the updates in a likely more efficient
manner.

Be sure to use the option ordered: false to allow mongos to parallelize the batch.

http://docs.mongodb.org/manual/reference/command/update/#dbcmd.update
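As a rough sketch of that third option, the function below builds one 2.6-style update command that carries many single-document updates instead of one $in update. The collection name "mycoll", the ids, and the $set modifier are made up for illustration; only the command shape (update, updates, q, u, upsert, multi, ordered) comes from the update command reference above.

```javascript
// Build ONE update command holding many single-document updates (no $in).
// buildBatchedUpdate is a hypothetical helper, not part of any driver.
function buildBatchedUpdate(collName, ids, modifier) {
  var updates = ids.map(function (id) {
    // Each statement matches exactly one _id, so mongos can route it
    // to the one shard that owns that key.
    return { q: { _id: id }, u: modifier, upsert: false, multi: false };
  });
  // ordered: false lets mongos send the statements out in parallel.
  return { update: collName, updates: updates, ordered: false };
}

// Hypothetical collection name, ids, and modifier.
var cmd = buildBatchedUpdate("mycoll", [1, 2, 3], { $set: { processed: true } });

// In the mongo shell, connected to mongos, this would be sent with:
//   db.runCommand(cmd)
```

The whole batch is still one call from the app to mongos, which is the point of the exercise.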

Depending on which driver you are using, there may be more "friendly"
ways of building these batches using bulk ops - here's a reference for
the shell, but your driver should have support for this as well.
http://docs.mongodb.org/manual/reference/method/js-bulk/
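With the shell's bulk API, the same idea might look something like the sketch below. The helper name bulkUpdateByIds and the example modifier are assumptions for illustration; initializeUnorderedBulkOp(), find(), updateOne() and execute() are the shell methods described in the reference above.

```javascript
// Queue one single-document update per _id in an UNORDERED bulk op,
// then send the whole batch in one go.
// bulkUpdateByIds is a hypothetical helper name.
function bulkUpdateByIds(coll, ids, modifier) {
  if (ids.length === 0) {
    return null; // nothing to send; avoid executing an empty batch
  }
  var bulk = coll.initializeUnorderedBulkOp();
  ids.forEach(function (id) {
    // One statement per document, no $in.
    bulk.find({ _id: id }).updateOne(modifier);
  });
  return bulk.execute();
}

// In the mongo shell, connected to mongos (collection name is made up):
//   bulkUpdateByIds(db.mycoll, idsToUpdate, { $set: { processed: true } })
```

The unordered variant plays the same role here as ordered: false does in the raw update command.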

P.S. 100% write lock is not a problem by itself - all it means is that
the mongod is spending all its time writing.  The problem is only when
that starves out reads (and/or other writes) and that may be unrelated
to how you are sending the writes over...



Thanks. You say that individual updates will be "faster" for mongos -- does it have an impact on the entirety of the writes? Since I'm updating 100 documents in each write, there's a high probability that it's hitting all the shards. Does each shard hold the write lock for the duration of the entire operation, or just the duration that the shard is writing (i.e., with a dozen shards I'd expect roughly 8-9 documents would be updated per shard)? I'm trying to understand what makes it "faster".




The write lock is only held while a single individual write is being
applied (in RAM).   I don't think that's going to be your problem here.

The problem is that mongos doesn't re-write the $in list and each
shard gets the full statement even if it only owns a subset of the
keys in it...
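To put a rough number on how often that happens, here is a back-of-the-envelope sketch. It assumes the hashed key spreads documents uniformly and independently across shards, which is an idealization and not something measured on a real cluster; the function name is made up.

```javascript
// Expected number of distinct shards that own at least one of numKeys
// keys, assuming each key lands on any of numShards shards with equal
// probability (an idealized model of a hashed shard key).
function expectedShardsHit(numKeys, numShards) {
  var probShardMissed = Math.pow(1 - 1 / numShards, numKeys);
  return numShards * (1 - probShardMissed);
}

// The case from the question: 100 ids per $in, a dozen shards.
var shardsHit = expectedShardsHit(100, 12);
var docsPerShard = 100 / 12; // the "roughly 8-9 documents" per shard
```

Under that assumption, shardsHit comes out just short of 12, so practically every 100-key $in reaches every shard, and each shard receives all 100 keys while owning only about docsPerShard of them.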




Does mongos not rewrite any $in queries? Even if I have an $in with one element, will it still send it to all shards?



If you have $in with one value, it will only need to send it to one shard (if that's the shard key).
If it's $in 1,2,3 and all three are on different shards, the query has to be sent to all the shards.
See/vote SERVER-1007

