I am trying to accomplish this in MongoDB. Didn't think it could get this complicated. thought the problem was interesting to solve.
I am trying to get a count of students by scores for various subjects.
the example below shows 2 subjects, in reality we could run this ad-hoc query for 1 or more subjects. (so cannot "can" and should be real-time)
( for e.g. run the query for a bunch of schools and a bunch of subjects )
the grades are always 1-5 no decimal point
use students
db.studentsummary.insert ( {school:'atl1', sname : 'Sean' , sub1: 4, sub2 :5 })
db.studentsummary.insert ( {school:'atl1', sname : 'chris' , sub1: 4, sub2 :3 })
db.studentsummary.insert ( {school:'atl1', sname : 'becky' , sub1: 5, sub2 :4 })
db.studentsummary.insert ( {school:'atl1', sname : 'sam' , sub1: 5, sub2 :4 })
db.studentsummary.insert ( {school:'atl2', sname : 'dustin' , sub1: 2, sub2 :2 })
db.studentsummary.insert ( {school:'atl2', sname : 'greg' , sub1: 3, sub2 :4 })
db.studentsummary.insert ( {school:'atl2', sname : 'peter' , sub1: 5, sub2 :1 })
db.studentsummary.insert ( {school:'atl2', sname : 'brad' , sub1: 2, sub2 :2 })
db.studentsummary.insert ( {school:'atl2', sname : 'liz' , sub1: 3, sub2 :null })
Desired Output:(Would like to see how close we could get to the desired output below)
show how many got a 5; how many got a 4 and so on...
I tried quite a bit - trying to group by each subject and run different pipelines based on the subjects chosen for query and let the front end manage the merge and pivot , performance was unacceptable. what not.
help will be very highly appreciated.
If I'm reading your question correctly, the crux of the problem is counting the number of instances of each value of the triple (school, subject, score). As your data is currently modeled, this will be difficult because you have a piece of data, the subject, stored as a key rather than as a value. Remodelling the data like so
{ "school" : "atl1", "sname" : "Sean" , "exams" : [{ "subject" : "sub1", "score" : 4 }, { "subject" : "sub2", "score" : 5 }] }
you can compute the counts using the following aggregation pipeline
db.studentsummary.aggregate([
{ "$unwind" : "$exams" },
{ "$group" : { "_id" : { "school" : "$school", "subject" : "$exams.subject", "score" : "$exams.score" }, "count" : { "$sum" : 1 } } }
])
thanks for taking the time to looking into the problem.
There are other queries written based on the current schema.
I will see the impact with the new schema and report back here.
But I greatly appreciate your help.
You *can* do this aggregation against the current schema, but it involves knowing in advance what all the subjects are which can be problematic and may not be maintainable.
You can do the reverse of this aggregation to project the opposite schema:
db.studentsummary.aggregate( [
{$group:{_id:"$_id",sub1:{$ addToSet:{"subject":{$literal: "sub1"},score:"$sub1"}},sub2:{ $addToSet:{subject:{$literal:" sub2"},score:"$sub2"}},school: {$first:"$school"},sname:{$ first:"$sname"}}},
{$project:{exams:{$setUnion:[" $sub1","$sub2"]}, school:1,sname:1}},
{ "$unwind" : "$exams" },
{ "$group" : { "_id" : { "school" : "$school", "subject" : "$exams.subject", "score" : "$exams.score" }, "count" : { "$sum" : 1 } } },
{$sort:{school:1,subject:1, score:1,count:1}}
] )
The first step turns "subjectName":"score" pairs into array of single element in format: {subject:"subjectName", score: "score" } documents.
Second step creates a single array of all subject scores. The rest is from Will's message.
Note that you need to have as many transformation fields in $project as you do different subjects and you must know them in advance to write the pipeline.
Much of these queries get built at runtime based on the columns chosen by the user. we will redesign the schema (A significant shift) from where we are at currently and see what the impact is. If other queries get easier and the performance is better. I guess, we will just have to change the schema and reload !
Thanks for the time taken to figure this out and provide a direction to us. it is immense.
Since MongoDB doesn’t support transposing rows to columns directly and have difficulty in realizing enumeration grouping, you should first retrieve the desired data and then perform the operation using programming languages like Java and Php.
But it also gets trouble when realizing set operations. In view of this, you can try using esProc to help MongoDB with your task. For more information , please refer to http://dataccelerator. wordpress.com/2014/11/19/ esproc-helps-with-computation- in-mongodb-cross-summarizing/ .
esProc provides a JDBC interface for the Java program and the reporting tool. The access of an esProc program is similar to that of a database.
댓글 없음:
댓글 쓰기