Thursday, December 4, 2014

Single collection vs Multiple dynamically created Collections through Java UI

Currently, I have a single collection storing varied types of documents that grow by 20% every day. Until now we have not faced any issues, but lately we have started thinking about various performance optimization options. One such thought was to use multiple collections (one collection per document type). With multiple collections, I would have indexes on each collection. My questions are:

1. Will this model of having one collection for each document type help, since the single collection grows variably?
2. How would indexing be optimized? I think an equal number of indexes would be loaded into memory in both cases.
3. Will it improve query performance?

Note: I have not yet considered sharding, for either the single-collection or the multiple-collection model.



More information would help me give you an informed answer. What are the different types of documents in this collection, and how are you using them other than updating them in a way that makes them grow? What does the data in them represent? What about the updates that grow them? Do you always use the full document that's there?

It's possible that you could change your schema/approach in a different way: keep the documents in the same collection, but store things over time in different documents rather than updating existing documents with new information. I don't know whether that would work without more details.



Answers Inline:



I'm sorry, maybe there is a terminology mix-up here.

There isn't a general "write" operation. You can "insert" a document, you can "update" a document, or you can "remove" a document. Those are the available write operations.

You can only "insert" a document once: once a document exists, you can update it to change its contents, but you cannot insert it again. If you are using the "save" operation (which is provided by most drivers), it does an insert if the _id of the document does not exist, and if the _id does exist it updates that document with the new value. I guess one other possibility is removing the existing document and inserting a new version of it, but since that's not atomic there is no advantage to it, so I'm going to discount that sequence of writes.
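To make the distinction concrete, here is a minimal sketch of those semantics. It is a toy in-memory model in plain Java, not the MongoDB driver API: the class name `WriteOpsSketch`, its method signatures, and the use of a `HashMap` keyed by _id are all assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of a collection keyed by _id. NOT the MongoDB driver API. */
public class WriteOpsSketch {
    // Hypothetical storage: _id -> document body (bodies are assumed non-null).
    private final Map<String, String> docs = new HashMap<>();

    /** insert: succeeds only if no document with this _id exists yet. */
    public boolean insert(String id, String body) {
        return docs.putIfAbsent(id, body) == null;
    }

    /** update: changes an existing document; does nothing if the _id is absent. */
    public boolean update(String id, String body) {
        return docs.replace(id, body) != null;
    }

    /** remove: deletes the document with this _id, if there is one. */
    public boolean remove(String id) {
        return docs.remove(id) != null;
    }

    /** save: insert when the _id is new, otherwise replace the stored document. */
    public void save(String id, String body) {
        if (!insert(id, body)) {
            update(id, body);
        }
    }

    /** Returns the stored body for this _id, or null if there is none. */
    public String find(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        WriteOpsSketch c = new WriteOpsSketch();
        System.out.println(c.insert("a", "v1")); // true
        System.out.println(c.insert("a", "v2")); // false: "a" already exists
        c.save("a", "v3");                       // _id exists, so this acts as an update
        System.out.println(c.find("a"));         // v3
    }
}
```

In this model an insert-only workload never takes the update branch, which is why it matters which of the operations you are actually issuing.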

Which of these are you doing? And what are the different types of documents that you are considering separating into different collections?



Sorry. My bad. Only Inserts, no updates.

