Tuesday, December 2, 2014

Indexes on system.namespaces

Why is it illegal to create an index on the system.namespaces collection in 2.6? We've been doing this for a long time, since we do lookups against system.namespaces in various parts of our code to see if collections or indexes already exist. With the index, lookups are very fast and log at < 0ms. Without it we see nscanned as high as 25k in sample workloads, with average duration in the 100s of ms and peak durations as high as 8s. All of our queries are against the name field and are generally exact match lookups or prefix searches. Our nssize is 256, so we do run with larger namespaces.

This caught us off guard when testing 2.6, because when we run a 2.6 binary against our existing data files, the index is there and is still used. Only when we create a new database and attempt to index its namespace collection do we run into problems. Further, if a 2.6 secondary is syncing to a 2.4 primary, an index build on system.namespaces on the primary is happily picked up and built on the secondary without error. Given this, the restriction seems arbitrary. I tested one of our sample workloads against 2.6.5 where all of the existing indexes on system.namespaces were removed, and saw an 11% performance degradation. Is indexing system.namespaces harmful in some way? Are there any workarounds for doing the kind of lookups we're doing now?



You shouldn't be querying against a system collection, since it's an implementation detail. It will not be available with other storage engines anyway.

Why do you need to query to see if a collection exists? You can just run a count against it: if it's not there, you'll get 0.
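As a rough illustration of the count-based check suggested here, the following Python sketch assumes a PyMongo-style collection object that exposes `count_documents` (available in PyMongo 3.7 and later). The helper name `collection_has_documents` is hypothetical. Note that a collection which exists but is empty also reports 0, so this only distinguishes "has data" from "missing or empty".

```python
def collection_has_documents(collection):
    """Return True if the collection holds at least one document.

    `collection` is assumed to behave like a pymongo Collection.
    A missing collection and an empty one both yield a count of 0.
    """
    # limit=1 lets the server stop counting after the first match.
    return collection.count_documents({}, limit=1) > 0
```

With a real connection this might be called as `collection_has_documents(client["mydb"]["mycoll"])`, where the database and collection names are placeholders.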



Didn't know that system.namespaces might be unavailable in other storage engines. That's definitely good to know. :)

Turns out our biggest use cases are actually checking whether indexes already exist, and checking the total # of indexes on a collection. Some parts of our application logic check if an index already exists before building it. We also need to know the total # of indexes so we don't exceed the max indexes per collection. We use caching to avoid repeatedly querying for this information, but it's still important that it remains fast when we do go to the db. Without an index on system.namespaces we are left without an efficient way to check if an index already exists.
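For reference, both checks described here (does an index with this key already exist, and how many indexes does the collection have) can be sketched without touching system.namespaces. This is only an illustration, not what the poster's application does: it assumes a PyMongo-style collection whose `index_information()` returns a dict mapping index names to specs with a `"key"` list of `(field, direction)` pairs. The helper name `index_status` is hypothetical, and 64 is MongoDB's documented per-collection index limit.

```python
MAX_INDEXES_PER_COLLECTION = 64  # MongoDB's documented limit


def index_status(collection, key_spec):
    """Return (exists, total, has_room) for a proposed index.

    `collection` is assumed to behave like a pymongo Collection.
    `key_spec` is a list of (field, direction) pairs, e.g. [("name", 1)].
    """
    info = collection.index_information()
    wanted = [tuple(pair) for pair in key_spec]
    # An index matches when its key pattern has the same fields and
    # directions in the same order.
    exists = any(
        [tuple(pair) for pair in spec["key"]] == wanted
        for spec in info.values()
    )
    total = len(info)
    return exists, total, total < MAX_INDEXES_PER_COLLECTION
```

A caller could cache the returned tuple per collection and only build the index when `exists` is false and `has_room` is true.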

We don't want to make unnecessary calls to ensureIndex either. From what we've been able to tell, mongo has to do a collection scan when determining if an index already exists: https://github.com/mongodb/mongo/blob/r2.6.5/src/mongo/db/catalog/index_catalog.cpp#L1044-L1052



There might be ways around this - remind me again, how big are your larger system.namespaces collections (i.e. how many indexes and collections across a single database)?



Our largest system.namespaces collection is around 32000.

