Tuesday, December 16, 2014

sh.shardCollection fails

I tried to enable sharding on an existing collection which is currently 22TB in size and after 4 days it failed with:

mongos> sh.shardCollection("mail.message_data", { "message_identifier": 1 } )
{
        "code" : 13345,
        "ok" : 0,
        "errmsg" : "exception: splitVector command failed: { timeMillis: 141975845, errmsg: \"exception: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.\", code: 13548, ok: 0.0 }"
}

We are currently running MongoDB 2.4.12.



22TB is more than the initial sharding command can handle when it tries to
calculate split points.  Take a look at this table:

http://docs.mongodb.org/manual/reference/limits/#Sharding-Existing-Collection-Data-Size

With the default chunk size of 64MB, the maximum collection size you can shard depends on your shard key size.
What's the size of your shard key?   Depending on how small or large it is, you may be able to get past the initial splitVector by temporarily raising your chunk size.  It looks like the returned document was growing to about twice what could be handled; for 22TB of data, that suggests message_identifier may be around 100 bytes or so?   If that's the case, setting the chunk size to 128MB or maybe even a bit higher will allow shardCollection to succeed.   Afterwards you will probably want to lower the chunk size back to 64MB; eventually all the large chunks will end up being split again.
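
To make the arithmetic behind that table concrete, here is a rough sketch in plain JavaScript. It assumes the formula on the limits page is maxSplits = 16MB / average shard key size and maxCollectionSize = maxSplits * (chunkSize / 2); that is my reading of the docs rather than anything stated in this thread, and the function names are made up for illustration.

```javascript
// BSON document size limit in bytes (16MB); the splitVector response must fit in it.
var MAX_BSON_BYTES = 16777216;

// Assumed formula from the limits page:
//   maxSplits         = 16MB / average shard key size
//   maxCollectionSize = maxSplits * (chunkSize / 2)
function maxCollectionSizeMB(avgShardKeyBytes, chunkSizeMB) {
  var maxSplits = Math.floor(MAX_BSON_BYTES / avgShardKeyBytes);
  return maxSplits * (chunkSizeMB / 2);
}

// Convert megabytes to terabytes for readability.
function toTB(megabytes) {
  return megabytes / (1024 * 1024);
}

// Try a few chunk sizes for a key of roughly 100 bytes (the guess above).
[64, 128, 256, 384].forEach(function (chunkSizeMB) {
  var tb = toTB(maxCollectionSizeMB(100, chunkSizeMB));
  console.log(chunkSizeMB + "MB chunks: up to about " + tb.toFixed(1) + "TB");
});
```

Real split points carry some BSON overhead on top of the key itself, so treat the result as an upper bound and leave yourself headroom when picking a chunk size.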

I do want to point out that 22TB is a pretty huge DB to shard.
Assuming you have two shards (one of them new), 11TB of data will need to be migrated to the new shard before you are balanced.   If you are adding, say, three new shards, then over 16TB of data needs to be migrated to the new shards before you are fully balanced...
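
Those migration figures come from simple proportions. A minimal sketch, assuming all of the data starts out on the existing shards and that "balanced" means every shard ends up with an equal share (the function name is hypothetical):

```javascript
// Amount of data that must move to the new shards to reach an even balance.
// Assumes the new shards start empty and the target is an equal share per shard.
function dataToMigrateTB(totalTB, existingShards, newShards) {
  var totalShards = existingShards + newShards;
  return totalTB * newShards / totalShards;
}

console.log(dataToMigrateTB(22, 1, 1)); // one existing shard plus one new shard: 11
console.log(dataToMigrateTB(22, 1, 3)); // one existing shard plus three new shards: 16.5
```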



I set the size to 192MB and now when I run it I get:


mongos> sh.shardCollection("mail.message_data", { "message_identifier": 1 } )
{
        "code" : 13345,
        "ok" : 0,
        "errmsg" : "exception: splitVector command failed: { timeMillis: 145862074, errmsg: \"exception: BSONObj size: 27840613 (0x65D0A801) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { message_identifier: \"00006145...\", code: 10334, ok: 0.0 }"
}




Any idea what might be going on here?



You might need to make the chunk size significantly larger.   Try doubling the size you already tried.
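
For reference, a sketch of how the chunk size could be doubled from 192MB to 384MB. As far as I know the cluster-wide chunk size lives in the settings collection of the config database as a document with _id "chunksize" and a value in megabytes, and the allowed range is 1 to 1024MB; check both against the docs for your version. The helper name is made up, and the save only runs inside a mongo shell connected to a mongos, where `db` is defined.

```javascript
// Build the settings document that controls the chunk size (value is in MB).
// The 1..1024 range is an assumption based on my recollection of the docs.
function chunkSizeSetting(megabytes) {
  if (Math.floor(megabytes) !== megabytes || megabytes < 1 || megabytes > 1024) {
    throw new RangeError("chunk size must be a whole number of MB between 1 and 1024");
  }
  return { _id: "chunksize", value: megabytes };
}

// Double the 192MB that was already tried.
var setting = chunkSizeSetting(2 * 192);

// `db` only exists in the mongo shell; elsewhere this step is skipped.
if (typeof db !== "undefined") {
  db.getSiblingDB("config").settings.save(setting);
}
```

Once shardCollection has succeeded, the same call with 64 puts the chunk size back to the default.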

I'm assuming that your selected shard key is granular enough to allow splits into specified size chunks.



Is this a chunk size problem?  It is complaining about the 16MB BSON size limit.  The shard key is the message identifier, which is always around 85 bytes, and there will be no more than 220 documents associated with any identifier.



The problem is that when the collection is initially sharded, the mongod needs to compute the boundaries for all the chunk ranges.   The response (listing all the ranges) must go into a BSON document (response to the split command).

Assuming your key range can be evenly split, I would have thought that a 22TB collection could be split into ~200MB chunks...
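
A back-of-the-envelope way to pick the next chunk size from the error itself, assuming the size of the split point response is inversely proportional to the chunk size (an assumption, not something the server reports). The numbers are taken from the 192MB attempt above; the function name is hypothetical.

```javascript
// Upper bound on BSON object size quoted in the error message, in bytes.
var MAX_RESPONSE_BYTES = 16793600;

// Assumption: doubling the chunk size halves the number of split points,
// and therefore roughly halves the size of the response.
function minChunkSizeMB(triedChunkSizeMB, observedResponseBytes) {
  return Math.ceil(triedChunkSizeMB * observedResponseBytes / MAX_RESPONSE_BYTES);
}

// The 192MB attempt produced a 27840613-byte response.
var needed = minChunkSizeMB(192, 27840613);
console.log("chunk size needed: at least about " + needed + "MB");
```

On that estimate, doubling 192MB to 384MB should leave some margin under the limit.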



Sharding is enabled on the collection now and it is balancing.  Thanks for your help!




Glad to hear it!   Balancing this much data will take a while!
Thanks for your patience.


