I have a bunch of data that I want to split based on geography.
I have the following setup:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId(" 547496dd009cc54d845c2ff1")
}
shards:
{ "_id" : "shard0", "host" : "shard0/mongohost1:27017", "tags" : [ "BAN" ] }
{ "_id" : "shard1", "host" : "shard1/mongohost2:27017", "tags" : [ "CAN", "DEU", "JPN" ] }
{ "_id" : "shard2", "host" : "shard2/mongohost3:27017", "tags" : [ "TAI", "PER" ] }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "widgets", "partitioned" : true, "primary" : "shard2" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0" }
{ "_id" : "db", "partitioned" : false, "primary" : "shard2" }
My actual data in widgets looks like this:
shard1:PRIMARY> db.nwidget.findOne()
{
"_id" : ObjectId(" 54763621e0c1b5494079ce16"),
"location" : "BAN",
"name" : "BAN-Widget-1-VV-5",
"rt_id" : 1976,
"type" : "widget_type2"
}
shard1:PRIMARY>
Since all documents in the nwidgets collection will have a location field, it would be nice if MongoDB could use this field to decide where to put the record. In this case, I would like this record to live on shard0, but right now everything is on shard1, regardless of what the location field contains.
How much total data do you have (bytes)? What does the chunk distribution (from sh.status) look like? Is the balancer on? You can check with sh.getBalancerState(). Have you set shard key tag ranges in addition to tagging the shards? See Manage Shard Tags in the manual.
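Roughly speaking, a tag-aware setup has two halves: tags on the shards, and tag ranges declared on the shard key of a sharded collection. As a minimal sketch (run from a shell connected to the mongos; names are borrowed from your output and purely illustrative):
// enable sharding and shard the collection on the field the ranges will use
sh.enableSharding("widgets")
sh.shardCollection("widgets.nwidgets", { location: 1 })
// tag a shard, then map a range of shard key values to that tag (min inclusive, max exclusive)
sh.addShardTag("shard0", "BAN")
sh.addTagRange("widgets.nwidgets", { location: "BAN" }, { location: "BAO" }, "BAN")
// the balancer has to be enabled for chunks to actually move
sh.getBalancerState()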
Right now, I have a piddly amount of data, literally 25-30 small documents per "location". But if possible, I would like to see MongoDB split the data between the different shards based on the location.
Since my first post, I've made the following changes:
1. Created the following tag ranges:
mongos> sh.addTagRange("widgets. nwidgets", { location_range: 151000 }, { location_range: 200000 }, "CAN")
mongos> sh.addTagRange("widgets. nwidgets", { location_range: 251000 }, { location_range: 300000 }, "PER")
mongos> sh.addTagRange("widgets. nwidgets", { location_range: 101000 }, { location_range: 150000 }, "DEU")
mongos> sh.addTagRange("widgets. nwidgets", { location_range: 51000 }, { location_range: 100000 }, "USA")
mongos> sh.addTagRange("widgets. nwidgets", { location_range: 1 }, { location_range: 50000 }, "JPN")
mongos> sh.addTagRange("widgets. nwidgets", { location_range: 201000 }, { location_range: 250000 }, "TAI")
mongos> sh.addTagRange("widgets.
mongos> sh.addTagRange("widgets.
mongos> sh.addTagRange("widgets.
mongos> sh.addTagRange("widgets.
mongos> sh.addTagRange("widgets.
So now, sh.status() shows the following:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId(" 547496dd009cc54d845c2ff1")
}
shards: { "_id" : "shard0", "host" : "jjrs0/mongohost1:27017", "tags" : [ "USA" ] }
{ "_id" : "shard1", "host" : "jjrs1/mongohost2:27017", "tags" : [ "CAN", "DEU", "TAI" ] }
{ "_id" : "shard2", "host" : "jjrs2/mongohost3:27017", "tags" : [ "JPN", "PER" ] }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "widgets", "partitioned" : true, "primary" : "jjrs2" }
{ "_id" : "test", "partitioned" : false, "primary" : "jjrs0" }
{ "_id" : "db", "partitioned" : false, "primary" : "jjrs2" }
mongos>
2. Dumped the data in the primary (jjrs2) and re-imported it, but included a new field called location_range. The location_range is just a temporary field until I figure out how MongoDB works and how I want to organize my data. But for now, it is unique across the collection... and divided up to represent different countries, as you can see from the tag ranges defined above.
Problem / Question
I've checked the shards, and the only one with the "widgets" database on it is the primary one, jjrs1. I was hoping that each shard would have a part of the widgets database on it.
To answer your question, the balancer is on... please note the following:
mongos> sh.getBalancerState()
true
mongos>
I realize that the ranges I've created are limiting the growth of the system, but this is just a test... and for now, I know I will never exceed these values. However, if you have any other comments on how I can design a good key, please let me know.
In case this helps, I found out how to query the config database for more information about my setup:
mongos> use config
switched to db config
mongos> db.shards.find({tags:"BRK"})
{ "_id" : "shard0", "host" : "jjrs0/mongohost1:27017", "tags" : [ "BRK" ] }
mongos> db.tags.find({tags:"BRK"})
mongos>
As you can see, shard0 is correctly set up to hold records with the BRK tag... but it doesn't look like I set up the range properly.
I'm not sure what I've done wrong.
You have specified your tag assignment on a field that doesn't exist in your document: { location_range: 151000 }
What's the shard key for your collection? That's the min/max values that you should be passing to addTagRange. What that command says is "for documents that have shard key values from <min> to <max>, put them on the shard tagged with <XXX>".
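One quick way to check both from the mongos is to look at the config database (a sketch):
use config
db.collections.find({ _id: "widgets.nwidgets" })  // the "key" field is the shard key; no document here means the collection isn't sharded
db.tags.find({ ns: "widgets.nwidgets" })          // the ranges addTagRange recorded; min/max must be expressed in that shard key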
Hi there Asya, thanks for taking the time to respond / help me out.
So... in fact that field does exist, and I was hoping to use it as the shard key... I want to use custom tags.
me@mongohost2:/tmp$ mongo
MongoDB shell version: 2.6.5
connecting to: test
shard1:PRIMARY> show databases
admin (empty)
local 1.078GB
widgets 0.078GB
shard1:PRIMARY> use widgets
switched to db widgets
shard1:PRIMARY> db.nwidgets.find().pretty()
{
"_id" : ObjectId(" 54777c121519a051323f4309"),
"location_range" : 201001,
"location" : "BRK",
"name" : "B-w-991",
"rt_id" : 806,
"type" : "the best widgets ever"
}
{
"_id" : ObjectId(" 54777c131519a051323f430a"),
"location_range" : 201002,
"location" : "BRK",
"name" : "B-w-ss-21",
"rt_id" : 807,
"type" : "test widget"
}
{
"_id" : ObjectId(" 54777c131519a051323f430b"),
"location_range" : 201003,
"location" : "BRK",
"name" : "B-w-123123",
"rt_id" : 857,
"type" : "another test widget"
}
You can see from the data above that 1) I do have the location_range field defined... and 2) the BRK data is on shard1 when I was hoping it would be on shard0.
Hope this clarifies a little more for you.
Is the collection sharded? What's the shard key? What does sh.status() show? In the first post in this thread it showed that *no collection was sharded*!
Tags can only tell the balancer to balance chunks of a sharded collection based on values (or prefixes of values) of shard keys.
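In other words, the collection has to be sharded on the field the tag ranges refer to before any of them can take effect. A rough sketch, assuming location_range is the intended shard key:
sh.enableSharding("widgets")                                   // enable sharding for the database if not already done
sh.shardCollection("widgets.nwidgets", { location_range: 1 })  // needs an index on location_range (created automatically if the collection is empty)
// only after this do ranges expressed in location_range mean anything to the balancer
sh.addTagRange("widgets.nwidgets", { location_range: 1 }, { location_range: 50000 }, "JPN")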
In my repeated attempts to set this thing up, I must have skipped the sh.shardCollection() command the last time I tried. So... I reran that command and now it looks a little better:
mongos> sh.shardCollection("widgets. nwidgets", { location_range: 1} )
{ "collectionsharded" : "widgets.nwidgets", "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId(" 547496dd009cc54d845c2ff1")
}
shards: { "_id" : "shard0", "host" : "shard0/mongohost1:27017", "tags" : [ "BAN" ] }
{ "_id" : "shard1", "host" : "shard1/mongohost2:27017", "tags" : [ "CAN", "DEU", "JPN" ] } { "_id" : "shard2", "host" : "shard2/mongohost3:27017", "tags" : [ "TAI", "PER" ] }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "widgets", "partitioned" : true, "primary" : "shard2" }
widgets.nwidgets
shard key: { "location_range" : 1 }
chunks:
shard2 2
{ "location_range" : { "$minKey" : 1 } } -->> { "location_range" : 1 } on : shard2 Timestamp(1, 1)
{ "location_range" : 1 } -->> { "location_range" : { "$maxKey" : 1 } } on : shard2 Timestamp(1, 2)
tag: JPN { "location_range" : 1 } -->> { "location_range" : 50000 }
tag: USA { "location_range" : 51000 } -->> { "location_range" : 100000 }
tag: DEU { "location_range" : 101000 } -->> { "location_range" : 150000 }
tag: CAN { "location_range" : 151000 } -->> { "location_range" : 200000 }
tag: TAI { "location_range" : 201000 } -->> { "location_range" : 250000 }
tag: PER { "location_range" : 251000 } -->> { "location_range" : 300000 }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0" }
{ "_id" : "db", "partitioned" : false, "primary" : "shard2" }
{ "_id" : "records", "partitioned" : false, "primary" : "shard2" }
mongos> sh.shardCollection("widgets.
{ "collectionsharded" : "widgets.nwidgets", "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("
}
shards: { "_id" : "shard0", "host" : "shard0/mongohost1:27017", "tags" : [ "BAN" ] }
{ "_id" : "shard1", "host" : "shard1/mongohost2:27017", "tags" : [ "CAN", "DEU", "JPN" ] } { "_id" : "shard2", "host" : "shard2/mongohost3:27017", "tags" : [ "TAI", "PER" ] }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "widgets", "partitioned" : true, "primary" : "shard2" }
widgets.nwidgets
shard key: { "location_range" : 1 }
chunks:
shard2 2
{ "location_range" : { "$minKey" : 1 } } -->> { "location_range" : 1 } on : shard2 Timestamp(1, 1)
{ "location_range" : 1 } -->> { "location_range" : { "$maxKey" : 1 } } on : shard2 Timestamp(1, 2)
tag: JPN { "location_range" : 1 } -->> { "location_range" : 50000 }
tag: USA { "location_range" : 51000 } -->> { "location_range" : 100000 }
tag: DEU { "location_range" : 101000 } -->> { "location_range" : 150000 }
tag: CAN { "location_range" : 151000 } -->> { "location_range" : 200000 }
tag: TAI { "location_range" : 201000 } -->> { "location_range" : 250000 }
tag: PER { "location_range" : 251000 } -->> { "location_range" : 300000 }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0" }
{ "_id" : "db", "partitioned" : false, "primary" : "shard2" }
{ "_id" : "records", "partitioned" : false, "primary" : "shard2" }
But... it looks like all the data is on shard1, which is not really what I'm aiming for...
shard1:PRIMARY> db.nwidgets.count()
78
shard1:PRIMARY>
shard0:PRIMARY> db.nwidgets.count()
0
shard0:PRIMARY>
shard2:PRIMARY> db.nwidgets.count()
0
shard2:PRIMARY>
But the good news is that at least now, I have a widgets database on all my shards....
I will try to add more data to see if that makes a difference in how the data is split up between the three shards, but in the interim, if you can see what else I've missed, I'm all ears.
Is any of the data in ranges that should be on shards other than shard2?
It might take a few minutes - if you ran sh.status() right after you sharded the collection, I'd give it a few minutes (and of course you have to have the balancer running).
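A few things worth checking from the mongos shell while you wait (a sketch):
sh.getBalancerState()               // is the balancer enabled?
sh.isBalancerRunning()              // is a balancing round in progress right now?
use widgets
db.nwidgets.getShardDistribution()  // per-shard document and chunk counts for the sharded collection
sh.status()                         // chunk ranges and which shard currently owns each one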
On shard1, I ran "db.nwidgets.find().pretty()" and took some snippets for you to consider:
{
"_id" : ObjectId(" 54777c1a1519a051323f435b"),
"location_range" : 251031,
"location" : "PER",
"name" : "PER-WGD-815-1",
"rt_id" : 2687,
"type" : "test"
}
{
"_id" : ObjectId(" 54777c241519a051323f435c"),
"location_range" : 151001,
"location" : "CAN",
"name" : "CAN-WHS-991-1",
"rt_id" : 878,
"type" : "test"
}
{
"_id" : ObjectId(" 54777c131519a051323f4318"),
"location_range" : 201016,
"location" : "TAI",
"name" : "TAI-124-2",
"rt_id" : 2768,
"type" : "test"
}
{
"_id" : ObjectId(" 54777c131519a051323f4319"),
"location_range" : 201017,
"location" : "TAI",
"name" : "TAI-124-1",
"rt_id" : 2769,
"type" : "test"
}
{
"_id" : ObjectId(" 54777c131519a051323f431a"),
"location_range" : 201018,
"location" : "TAI",
"name" : "TAI-124-AW-2",
"rt_id" : 2770,
"type" : "test"
}
As it stands right now, I have approximately 20 documents for each location in the database. I let the system run over the weekend and just checked the record count on each shard this morning. Everything is as I left it, meaning all documents are still only on shard1.
"location_range" : 251031,
"location" : "PER",
"name" : "PER-WGD-815-1",
"rt_id" : 2687,
"type" : "test"
}
{
"_id" : ObjectId("
"location_range" : 151001,
"location" : "CAN",
"name" : "CAN-WHS-991-1",
"rt_id" : 878,
"type" : "test"
}
{
"_id" : ObjectId("
"location_range" : 201016,
"location" : "TAI",
"name" : "TAI-124-2",
"rt_id" : 2768,
"type" : "test"
}
{
"_id" : ObjectId("
"location_range" : 201017,
"location" : "TAI",
"name" : "TAI-124-1",
"rt_id" : 2769,
"type" : "test"
}
{
"_id" : ObjectId("
"location_range" : 201018,
"location" : "TAI",
"name" : "TAI-124-AW-2",
"rt_id" : 2770,
"type" : "test"
}
Thanks for your continued assistance.
Would the fact that I imported my data into the database using the "mongoimport" command from shard1's command line be the issue?
I've been searching around to see how to import from a mongos command prompt / from the query router instead... but I haven't found anything.
Did mongoimport load it into the shard mongod or into the mongos? (It shouldn't matter what machine it's run from, just which host/port it's connecting to.)
But if you loaded data directly into the shard, bypassing mongos, that would explain some of the issue. What do you see when you run the printShardingSizes() command from the mongos mongo shell?
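For example, something along these lines points mongoimport at the mongos rather than at a shard (host, port, and file name are placeholders):
mongoimport --host <mongos-host> --port 27017 --db widgets --collection nwidgets --file nwidgets.json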