Friday, December 12, 2014

Slow writes in sharded environment (via mongoimport)

We're running a sharded cluster with 5 shard nodes and 1 config server on 5 different VMs. The config server and router (mongos) run on one of the shard servers (this setup is just for testing). Our data looks like a standard HTTP access log, and we shard it by {ts: "hashed"} (ts is a Unix timestamp).
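For reference, a hashed shard key like the one described above would be set up roughly as follows. This is a sketch, not the poster's actual commands; the mongos host/port is an assumption, while the database, collection, and key come from the post.

```shell
# Connect to the mongos (host/port assumed) and enable hashed sharding
# on test.transactions, keyed by the ts field.
mongo --host localhost --port 27017 --eval '
  sh.enableSharding("test");
  sh.shardCollection("test.transactions", { ts: "hashed" });
'
```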

sroot@xxx:~$ mongoimport -d test -c transactions --upsert < transactions_2014-11-01.json
connected to: 127.0.0.1
2014-11-03T11:33:03.145+0100  Progress: 8398932/497500910 1%
2014-11-03T11:33:03.145+0100  14500 3625/second
2014-11-03T11:33:06.003+0100  Progress: 12543313/497500910 2%
2014-11-03T11:33:06.003+0100  21700 3100/second
2014-11-03T11:33:09.871+0100  Progress: 15947107/497500910 3%
2014-11-03T11:33:09.871+0100  27600 2760/second
2014-11-03T11:33:12.447+0100  Progress: 18826564/497500910 3%
2014-11-03T11:33:12.447+0100  32600 2507/second
2014-11-03T11:33:15.250+0100  Progress: 21765679/497500910 4%
2014-11-03T11:33:15.250+0100  37700 2356/second
2014-11-03T11:33:18.014+0100  Progress: 24764702/497500910 4%
2014-11-03T11:33:18.014+0100  42900 2257/second

That's only around 2K inserts/second.

On both a standalone mongod and a replica set we used to get 10K/sec. To add to that, CPU, iostat, and network statistics don't show any significant load on any of the shard servers, and when we write directly to one of the mongod shard instances we get around 12K/sec, but then the data is not balanced.

I am pretty clueless. I have tried different datasets, different machines, and different hardware, but everywhere writes were 5x slower than standalone. Are we missing something, or might it be a MongoDB bug?


Thanks in advance,
Deepak Thukral
Smartstream.tv Gmbh


Below are some stats:

sh.status()
  {  "_id" : "test",  "partitioned" : true,  "primary" : "shard0004" }
    test.transactions
      shard key: { "ts" : "hashed" }
      chunks:
        shard0003  879
        shard0002  878
        shard0004  879
        shard0000  879
        shard0001  878
      too many chunks to print, use verbose if you want to force print


mongos> db.serverStatus()
{
  "host" : "xxxxxx",
  "version" : "2.6.4",
  "process" : "mongos",
  "pid" : NumberLong(4336),
  "uptime" : 293555,
  "uptimeMillis" : NumberLong(293554773),
  "uptimeEstimate" : 291113,
  "localTime" : ISODate("2014-11-03T10:42:01.503Z"),
  "asserts" : {
    "regular" : 0,
    "warning" : 0,
    "msg" : 0,
    "user" : 0,
    "rollovers" : 0
  },
  "connections" : {
    "current" : 1,
    "available" : 51199,
    "totalCreated" : NumberLong(107)
  },
  "extra_info" : {
    "note" : "fields vary by platform",
    "heap_usage_bytes" : 4145744,
    "page_faults" : 61
  },
  "network" : {
    "bytesIn" : 91926680821,
    "bytesOut" : 10568379,
    "numRequests" : 147274168
  },
  "opcounters" : {
    "insert" : 55711,
    "query" : 1413,
    "update" : 147138725,
    "delete" : 78,
    "getmore" : 0,
    "command" : 78247
  },
  "mem" : {
    "bits" : 64,
    "resident" : 16,
    "virtual" : 162,
    "supported" : true
  },
  "metrics" : {
    "getLastError" : {
      "wtime" : {
        "num" : 0,
        "totalMillis" : 0
      }
    }
  },
  "ok" : 1
}



You won't fully saturate the cluster unless you parallelize the inserts across multiple threads. mongoimport is presently (as of 2.6) single-threaded, so try running several imports in parallel or writing a multithreaded script to handle the insertions. Also try using mongostat with --discover to examine the activity on the servers; the statistics above aren't very helpful for determining resource limitations.
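One simple way to parallelize a single dump file is to split it and run one mongoimport per chunk. The sketch below is a dry run, with assumptions clearly in play: the sample file, worker count, and names are illustrative, and `echo` prints each command instead of executing it (remove `echo` to actually import).

```shell
# Stand-in for the real dump file (illustrative content).
printf '{"ts":1}\n{"ts":2}\n{"ts":3}\n{"ts":4}\n' > transactions_sample.json

# Split into one chunk per worker, rounding the chunk size up.
WORKERS=2
LINES=$(wc -l < transactions_sample.json)
split -l $(( (LINES + WORKERS - 1) / WORKERS )) transactions_sample.json chunk_

# Launch one import per chunk in the background, then wait for all.
# Dry run: 'echo' prints the command; drop it to run for real.
for f in chunk_*; do
  echo mongoimport -d test -c transactions --file "$f" &
done
wait
```

With real data, each worker then keeps its own connection to mongos busy instead of serializing everything through one client.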

Also, I'd recommend running with 3 config servers, even in a testing/development setup.
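For the 2.6-era config server layout, that would look roughly like the fragment below. The hostnames cfg1-cfg3, dbpath, and ports are assumptions; the key points are three config servers and the same comma-separated --configdb string on every mongos.

```shell
# On each of three separate hosts (cfg1, cfg2, cfg3 are illustrative names):
mongod --configsvr --dbpath /data/configdb --port 27019

# Every mongos must list all three, in the same order:
mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019
```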



Thanks for replying. We did try parallel imports, but then the rate dropped to 700-1000/sec per thread; moreover, we observed locks and page faults.

Here is an output of mongostat while running three instances of mongoimport.
https://gist.githubusercontent.com/iapain/0d8a0d80a1127d039899/raw/b35ebba3f816f7067286f796e29fb5ca4aa63fb3/gistfile1.txt

It's still unclear to me why we get 10K/sec on a standalone server but only 2K/sec in the sharded environment.

Regarding config servers, I will try with 3 config servers to see if it improves writes, though I'm pretty skeptical.



How many mongos processes were you importing against?



We tested 2-5 mongoimport processes against one mongos process. Shouldn't there be only one mongos process?



No, absolutely not: mongos can become a bottleneck just like any other component. Generally it's a good plan to have a single mongos *per* application server (and to run it on the same machine, which avoids the extra network hop and keeps latency down).

I would recommend trying multiple imports against multiple mongos processes - preferably not all on a single machine unless it's a pretty beefy box (otherwise all the processes are just fighting for the same CPU cycles).
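Spreading imports across several mongos endpoints could look like the dry-run sketch below. The hostnames app1-app3, port, and per-host file names are illustrative; the commands are recorded to a file rather than executed.

```shell
# Build one mongoimport command per mongos endpoint (dry run: the
# commands are written to cmds.txt instead of being executed).
: > cmds.txt
for host in app1 app2 app3; do
  echo "mongoimport --host $host --port 27017 -d test -c transactions --file part_$host.json" >> cmds.txt
done
cat cmds.txt
```

In a real run, each command would be launched in the background (ideally from the machine hosting that mongos) so the routers share the load.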



Thanks Asya, now it makes more sense to me.

