Wednesday, January 7, 2015

Storing XML files in MongoDB

I have a use case I need help with. I have roughly 50 terabytes of XML data to start with, supplied by an external source, and roughly 1 - 2 terabytes of additional XML data will arrive every day to be stored in MongoDB. The ultimate aim is to query the XML data from a web app and build an analytics dashboard: top 10 products, top 10 routes, etc. Since MongoDB's native format is JSON, is it viable to convert this much XML data to JSON, or can I just store the XML as it is, without the transformation?

Any suggestions on the storing part and the querying part? Also, if there are, say, 50 million such documents, roughly how long would queries that do joins take?



You will want to convert the XML to JSON/BSON; otherwise, it won't be possible to do any reasonable queries. It's difficult to say much more than that about your questions because we have so little information. Can you be more specific about:

- what the XML looks like
- what the use case is for the dashboard: what specific analytics do you want?
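To make the conversion step concrete, here is a minimal sketch in Python. Since the real XML layout isn't known, the `<order>` element, its child tags, and the `order_to_document` helper are all hypothetical; treat this as an illustration of the idea rather than a recommended schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record; the real XML layout is not known.
SAMPLE_XML = """
<order id="1001">
  <product>widget</product>
  <route>NYC-LAX</route>
  <quantity>3</quantity>
</order>
""".strip()


def order_to_document(xml_text):
    """Flatten one <order> element into a dict that could be stored as a document."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": int(root.attrib["id"]),
        "product": root.findtext("product"),
        "route": root.findtext("route"),
        # Store numbers as numbers so they can be summed and sorted later.
        "quantity": int(root.findtext("quantity")),
    }


doc = order_to_document(SAMPLE_XML)
print(json.dumps(doc, sort_keys=True))
```

The point is that fields you want to query on (product, route, quantity) become real, typed document fields instead of text buried inside an XML string.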

You will want to design the transformed JSON/BSON documents to fit your use case, especially given the large quantity of data that you have. MongoDB doesn't do joins; joins must be done on the application side. You should design the documents to make large-scale joins unnecessary, or necessary only for rare queries that you're willing to wait for.
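As a rough illustration of designing documents so that no join is needed, the sketch below assumes each document already carries its product and route (the field names and sample data are made up). The pipeline uses MongoDB's standard `$group`, `$sort`, and `$limit` aggregation stages; the `top_products` function is just a plain-Python equivalent so the idea can be tried without a database.

```python
from collections import Counter

# Hypothetical denormalized documents: each one already carries the product
# and route, so no second collection has to be consulted at query time.
orders = [
    {"order_id": 1, "product": "widget", "route": "NYC-LAX", "quantity": 3},
    {"order_id": 2, "product": "gadget", "route": "NYC-LAX", "quantity": 1},
    {"order_id": 3, "product": "widget", "route": "SFO-SEA", "quantity": 2},
]

# Aggregation pipeline that could be passed to a collection's aggregate()
# call to get the top 10 products by total quantity.
TOP_PRODUCTS_PIPELINE = [
    {"$group": {"_id": "$product", "total": {"$sum": "$quantity"}}},
    {"$sort": {"total": -1}},
    {"$limit": 10},
]


def top_products(docs, limit=10):
    """In-memory equivalent of the pipeline above, for illustration only."""
    totals = Counter()
    for doc in docs:
        totals[doc["product"]] += doc["quantity"]
    return totals.most_common(limit)


top = top_products(orders)
print(top)
```

A "top 10 routes" query would have the same shape, grouping on the route field instead. At tens of terabytes, you would likely also want to think about precomputing such totals rather than scanning everything per request, but that depends on details not given here.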

