Thursday, December 11, 2014

Unbalanced shards and balancing error

I am running MongoDB 2.6.4 on 4 servers with 8 shards (each server hosts four shard members, and each shard has one additional replica). I have several mongos routers and three config servers.

While balancing a collection that had not been pre-split, I ended up with this chunk distribution:
       {  "_id" : "rty",  "partitioned" : true,  "primary" : "rty" }
                rty.users
                        shard key: { "_id" : "hashed" }
                        chunks:
                                rty-sh4  880
                                rty-sh5  880
                                rty-sh7  880
                                rty-sh6  880
                                rty-sh8  880
                                rty-sh1  881
                                rty-sh3  880
                                rty      881
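If the collection already held data when it was sharded, all of that data started out on the primary shard and the balancer had to split and migrate it afterwards. A minimal sketch of pre-splitting a hashed-key collection at creation time instead (the numInitialChunks value below is only illustrative, and it is honoured only for hashed shard keys on an empty collection):

// Run against a mongos before loading any data into rty.users
sh.enableSharding("rty")
db.adminCommand({
    shardCollection: "rty.users",
    key: { _id: "hashed" },
    numInitialChunks: 8192    // illustrative value, not taken from the original post
})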


As you can see, the data distribution looks very wrong:
Totals
 data : 218.57GiB docs : 631746259 chunks : 7042
 Shard rty contains 19.7% data, 19.21% docs in cluster, avg obj size on shard : 380B
 Shard rty-sh1 contains 40.58% data, 45.32% docs in cluster, avg obj size on shard : 332B
 Shard rty-sh3 contains 5.86% data, 4.98% docs in cluster, avg obj size on shard : 437B
 Shard rty-sh4 contains 7.39% data, 6.89% docs in cluster, avg obj size on shard : 398B
 Shard rty-sh5 contains 5.89% data, 5.04% docs in cluster, avg obj size on shard : 434B
 Shard rty-sh6 contains 7.32% data, 6.73% docs in cluster, avg obj size on shard : 404B
 Shard rty-sh7 contains 5.89% data, 5.04% docs in cluster, avg obj size on shard : 434B
 Shard rty-sh8 contains 7.33% data, 6.75% docs in cluster, avg obj size on shard : 403B
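Output in this format can be produced with the shell helper db.collection.getShardDistribution(), which reports only the documents in the sharded collection itself, for example:

// Run against a mongos; prints per-shard data size, document count and
// average object size for the rty.users sharded collection only
db.getSiblingDB("rty").users.getShardDistribution()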

Going through the logs, I found the following:
2014-10-08T06:02:05.959-0500 [LockPinger] cluster10.30.1.162:27019,10.30.1.166:27019,10.30.1.78:27019 pinged successfully at Wed Oct  8 06:02:05 2014 by distributed lock pinger '10.30.1.162:27019,10.30.1.166:27019,10.30.1.78:27019/ded1326:27017:1409603898:1804289383', sleeping for 30000ms
2014-10-08T06:02:07.087-0500 [conn474] SyncClusterConnection connecting to [10.30.1.162:27019]
2014-10-08T06:02:07.088-0500 [conn474] SyncClusterConnection connecting to [10.30.1.166:27019]
2014-10-08T06:02:07.088-0500 [conn474] SyncClusterConnection connecting to [10.30.1.78:27019]
2014-10-08T06:02:07.112-0500 [conn475] ChunkManager: time to load chunks for rty.users: 24ms sequenceNumber: 2265652 version: 3523|1||5406025d06417b544f8da8a4 based on: 3522|1||5406025d06417b544f8da8a4
2014-10-08T06:02:07.566-0500 [Balancer] couldn't find database [rty_test] in config db
2014-10-08T06:02:07.695-0500 [Balancer]          put [rty_test] on: rty-sh4:rty-sh4/10.30.1.170:27019,10.30.1.66:27019
2014-10-08T06:02:07.735-0500 [Balancer] warning: could not move chunk  min: { _id: MinKey } max: { _id: -9223075039127251768 }, continuing balancing round :: caused by :: 10181 not sharded:rty_test.users_test
2014-10-08T06:02:07.735-0500 [Balancer] moving chunk ns: perf_test.test3 moving ( ns: perf_test.test3, shard: rty:rty/10.30.1.122:27017,10.30.1.158:27017, lastmod: 2|50||000000000000000000000000, min: { _id: MinKey }, max: { _id: -8849721541900570023 }) rty:rty/10.30.1.122:27017,10.30.1.158:27017 -> rty-sh3:rty-sh3/10.30.1.122:27019,10.30.1.158:27019
2014-10-08T06:02:08.661-0500 [Balancer] moveChunk result: { ok: 0.0, errmsg: "ns not found, should be impossible" }
2014-10-08T06:02:08.662-0500 [Balancer] balancer move failed: { ok: 0.0, errmsg: "ns not found, should be impossible" } from: rty to: rty-sh3 chunk:  min: { _id: MinKey } max: { _id: -8849721541900570023 }
2014-10-08T06:02:08.852-0500 [conn477] ChunkManager: time to load chunks for rty.users: 104ms sequenceNumber: 2265653 version: 3523|1||5406025d06417b544f8da8a4 based on: (empty)
2014-10-08T06:02:08.872-0500 [Balancer] distributed lock 'balancer/srv0001:27017:1409603898:1804289383' unlocked. 

After that, there are numerous entries about various moves failing with the same error message. Can someone shed some light on how this can happen, what to do to rebalance it (if there are options other than a dump and re-import, since there are over 300M documents), and how to prevent this from happening in the future?
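One way to see what the balancer has been doing around these failed moves (a sketch, assuming access to a mongos; the queries only read cluster metadata) is to look at the migration history and lock state recorded in the config database:

// Recent chunk-migration events recorded in the config database
var conf = db.getSiblingDB("config")
conf.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(5).pretty()
// Current holder of the balancer's distributed lock (the lock seen in the log above)
conf.locks.find({ _id: "balancer" }).pretty()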



The error messages don't have to do with the namespace rty.users, which is the one you've shown the chunk information for; rty.users looks fine from what you've posted. The balancer tries to even up the number of chunks between the shards, and it looks like it succeeded at that. Are the namespaces perf_test.test3 or rty_test.users_test present in the config database or on any of the shards?
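One way to check, as a sketch (assuming you can connect to a mongos; these queries just read the cluster metadata and change nothing):

// Look for leftover metadata for the dropped namespaces in the config database
var conf = db.getSiblingDB("config")
conf.collections.find({ _id: { $in: [ "perf_test.test3", "rty_test.users_test" ] } }).pretty()
conf.chunks.count({ ns: { $in: [ "perf_test.test3", "rty_test.users_test" ] } })
conf.databases.find({ _id: { $in: [ "perf_test", "rty_test" ] } }).pretty()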



Those were dropped on purpose as part of testing. I thought the log entries could point to something being wrong with the config servers, so I pasted the messages even though they're not related to the same namespace. Are you saying that it's normal to have such a discrepancy between the amounts of data on the shards?



The balancer balances based on the number of chunks, and chunks can be of different sizes, so the amount of data on the shards can differ, especially if some chunks are forced to be large by the nature of the shard key. It is odd for the imbalance to be this extreme, though, and odder still with a hashed shard key. Where did the statistics on the distribution of data come from? The documents on the shards with "too much" data are noticeably smaller on average than the docs on the other shards. Maybe you are counting more than just the docs in the sharded collection, including some from unsharded collections whose primary shard is rty or rty-sh1?
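A quick way to test that last hypothesis, sketched for the mongo shell connected to a mongos: list the databases whose primary shard is rty or rty-sh1, since all of their unsharded collections live entirely on that shard and would inflate its totals.

// Databases whose unsharded collections reside wholly on rty or rty-sh1
db.getSiblingDB("config").databases.find({ primary: { $in: [ "rty", "rty-sh1" ] } }).pretty()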



The situation is like this:
rty:PRIMARY> show dbs;
admin   (empty)
local           24.066GB
rty             61.924GB
rty2            15.946GB
<few very small databases>


rty-sh5:PRIMARY> show dbs;
admin   (empty)
local  12.072GB
rty    25.941GB
rty2   15.946GB

If we ignore the 'local' difference, you can see that rty is much larger on the 'rty' shard than on 'rty-sh5'. We repaired the databases on the 'rty' shard after the balancing; before that they were even bigger.
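Note that show dbs reports allocated file size on disk, which with the MMAPv1 storage engine includes preallocated and fragmented space left behind by migrations and deletes; that is presumably also why repairDatabase shrank the files. To compare logical data size rather than file size, something like the following sketch, run while connected to each shard's primary, is more telling:

// Compare dataSize (logical) with storageSize and fileSize (what "show dbs" reflects).
// The scale argument reports all sizes in megabytes.
db.getSiblingDB("rty").stats(1024 * 1024)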

