Monday, December 1, 2014

How to improve the performance of an aggregation command

DBObject project=new BasicDBObject();
project.put("_id",1);
project.put("position",1);
project.put("cellid",1);    
project.put("mnc",1);    
project.put("mcc",1);
project.put("tac",1);
project.put("pci",1);
project.put("sinr",1);
project.put("rsrp",1);
project.put("rssi",1);
project.put("rsrq",1);
project.put("pingLatency",1);
project.put("ftpDlAvgSpeed",1);
project.put("ftpDlMaxSpeed",1);
project.put("ftpUlAvgSpeed",1);
project.put("ftpUlMaxSpeed",1);
project.put("providerName",1);
project.put("createdTime",1);
project.put("network",1);
project.put("captureHour",1);
project.put("captureMinute",1);
project.put("captureTime",1);
project.put("siteData",1);
project.put("zoom"+zoomLevel,1);
project.put("make",1);
project.put("model",1);
project.put("deviceOS",1);
DBObject groupby=new BasicDBObject();
groupby.put("_id", new BasicDBObject()
        .append("year", new BasicDBObject("$year", "$createdTime"))
        .append("month", new BasicDBObject("$month", "$createdTime"))
        .append("day", new BasicDBObject("$dayOfMonth", "$createdTime"))
        .append("zoom" + zoomLevel, "$zoom" + zoomLevel));
groupby.put("sinr" , new BasicDBObject().append( "$avg" , "$sinr"));
groupby.put("tac" , new BasicDBObject().append( "$first" , "$tac"));
groupby.put("mcc" , new BasicDBObject().append( "$first" , "$mcc"));
groupby.put("providerName" , new BasicDBObject().append("$first" , "$providerName") );
groupby.put("network" ,new BasicDBObject().append( "$first" , "$network") );
groupby.put("rsrp" ,new BasicDBObject().append( "$avg" , "$rsrp"));
groupby.put("mnc" ,new BasicDBObject().append( "$first" , "$siteData.mnc")  );
groupby.put("cellid" ,new BasicDBObject().append( "$first" , "$siteData.cellid") );
groupby.put("rsrq" ,new BasicDBObject().append( "$avg" , "$rsrq") );
groupby.put("rssi" ,new BasicDBObject().append( "$avg" , "$rssi") );
groupby.put("pingLatency" ,new BasicDBObject().append( "$avg" , "$pingLatency") );
groupby.put("ftpDlAvgSpeed" ,new BasicDBObject().append( "$avg" , "$ftpDlAvgSpeed") );
groupby.put("ftpDlMaxSpeed" ,new BasicDBObject().append( "$avg" , "$ftpDlMaxSpeed") );
groupby.put("ftpUlAvgSpeed" ,new BasicDBObject().append( "$avg" , "$ftpUlAvgSpeed") );
groupby.put("ftpUlMaxSpeed" ,new BasicDBObject().append( "$avg" , "$ftpUlMaxSpeed") );
groupby.put("count" ,new BasicDBObject().append( "$sum" , 1));
groupby.put("position",new BasicDBObject().append("$first","$position"));
groupby.put("siteData",new BasicDBObject().append("$first","$siteData"));
groupby.put("createdTime",new BasicDBObject().append("$first","$createdTime"));
groupby.put("captureTime",new BasicDBObject().append("$first","$createdTime"));
groupby.put("pci",new BasicDBObject().append("$first","$pci"));
groupby.put("id",new BasicDBObject().append("$first","$_id"));
groupby.put("imsi" ,new BasicDBObject().append( "$first" , "$siteData.cellid"));
groupby.put("imei",new BasicDBObject().append( "$first" , "$siteData.cellid"));
groupby.put("make",new BasicDBObject().append("$first","$make"));
groupby.put("model",new BasicDBObject().append("$first","$model"));
groupby.put("deviceOS",new BasicDBObject().append("$first","$deviceOS"));
DBObject projectSelection=new BasicDBObject();
projectSelection.put("position",1);
projectSelection.put("cellid",1);
projectSelection.put("mnc",1);
projectSelection.put("mcc",1);
projectSelection.put("tac",1);
projectSelection.put("pci",1);
projectSelection.put("sinr",1);
projectSelection.put("rsrp",1);
projectSelection.put("rssi",1);
projectSelection.put("rsrq",1);
projectSelection.put("pingLatency",1);
projectSelection.put("ftpDlAvgSpeed",1);
projectSelection.put("ftpDlMaxSpeed",1);
projectSelection.put("ftpUlAvgSpeed",1);
projectSelection.put("ftpUlMaxSpeed",1);
projectSelection.put("providerName",1);
projectSelection.put("createdTime",1);
projectSelection.put("network",1);
projectSelection.put("captureHour",1);
projectSelection.put("captureMinute",1);
projectSelection.put("captureTime",1);
projectSelection.put("siteData",1);
projectSelection.put("imsi",1);
projectSelection.put("imei",1 );
projectSelection.put("count",1 );
projectSelection.put("make",1);
projectSelection.put("model",1);
projectSelection.put("deviceOS",1);
//aggregate data date wise
DBObject projectData=new BasicDBObject().append("$project", project);
DBObject projectSelectionData=new BasicDBObject().append("$project", projectSelection);
DBObject groupbyData=new BasicDBObject().append("$group", groupby);
List<DBObject> pipeline=Arrays.asList(projectData,groupbyData,projectSelectionData);
AggregationOutput output;
output=repo.getCollection("rawSignalData").aggregate(pipeline);
Iterable<DBObject> listObjects=output.results();
for (DBObject dbobject : listObjects) {
    dbobject.removeField("_id");
    SignalData signalData = repo.getConverter().read(SignalData.class, dbobject);
    signalData.setYear(signalData.getCaptureTime().getYear() + 1900);
    signalData.setMonth(signalData.getCaptureTime().getMonth() + 1);
    signalData.setDay(signalData.getCaptureTime().getDate());
    signalData.setHour(0);
    updateAggregatedSignalData(signalData, zoomLevel, "DATE");
}



You're computing average values of fields grouped by year-month-day and zoom level. Each aggregation pipeline processes the entire collection of documents, and it looks like you're storing the resulting statistics somewhere else, perhaps back in MongoDB. You can add an initial $match stage with conditions that can use an index, so the pipeline only processes a subset of the data, or, if you want global statistics about the collection, consider pre-aggregation. You can maintain a second collection with one document per year-month-day and zoom level, resembling the following:

{
    "date" : ISODate("2014-11-29T00:00:00.000Z"),
    "zoomLevel" : "zoomLevel0",
    "total_sinr" : 691,
    "total_rsrp" : 383,
    ... // totals of the other fields for which you want the average per (year-month-day, zoomLevel)
    "count" : 36
}

When you insert a document into the first collection, you update the corresponding document in the second with the new totals. Averages can then be computed by retrieving the proper document from the pre-aggregated collection and dividing each total_* by count. This maintains the statistics incrementally on document insertion/update, rather than recomputing them from scratch over the entire set of documents.
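A minimal sketch of that incremental update in the question's Java driver style. The class name, the helper name, and the two sample fields are assumptions for illustration; real code would $inc one running total per field you want averaged:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Date;

public class PreAggregateSketch {
    // Builds the filter and the $inc update that would be applied (with
    // upsert=true) to the pre-aggregated collection for each raw document.
    public static DBObject[] buildUpsert(Date day, String zoomCell, DBObject raw) {
        DBObject filter = new BasicDBObject("date", day).append("zoomLevel", zoomCell);
        DBObject totals = new BasicDBObject()
                .append("total_sinr", raw.get("sinr"))   // running sum of sinr
                .append("total_rsrp", raw.get("rsrp"))   // running sum of rsrp
                .append("count", 1);                     // documents seen so far
        DBObject update = new BasicDBObject("$inc", totals);
        // statsCollection.update(filter, update, true /*upsert*/, false /*multi*/);
        return new DBObject[] { filter, update };
    }
}
```

Reading an average then costs a single-document fetch (total_sinr / count) instead of a full-collection $group.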

I'm less certain what the idea is behind the $first operators in the $group: in the absence of a $sort, $first returns the value of the field from some indeterminate document. If you just want some value for those fields, you could store their values from the first document for each year-month-day, say.
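The $match suggestion above, sketched against the question's pipeline. The date bounds and the wrapper class are illustrative; the stage must come first so it can use an index on createdTime:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class MatchFirstSketch {
    // Prepend a $match on createdTime so only the chosen date range's
    // documents ever reach the $group stage.
    public static List<DBObject> pipeline(Date from, Date to,
                                          DBObject groupbyData,
                                          DBObject projectSelectionData) {
        DBObject match = new BasicDBObject("$match",
                new BasicDBObject("createdTime",
                        new BasicDBObject("$gte", from).append("$lt", to)));
        return Arrays.asList(match, groupbyData, projectSelectionData);
    }
}
```

With an index on createdTime, the $match stage lets the server scan only that range instead of the whole collection.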



I'm finding it absolutely impossible to read aggregations as Java source, but I can see that your first stage is a $project and yet it does no useful projecting: it computes no new fields as far as I can see, and transforms nothing. Get rid of any stage that's useless; it only makes your aggregation slower.
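Concretely, since that first $project only passes existing fields through unchanged, a sketch of the leaner pipeline simply drops it and keeps the $group and final $project stage objects exactly as the question builds them (the wrapper class is illustrative):

```java
import com.mongodb.DBObject;
import java.util.Arrays;
import java.util.List;

public class LeanPipelineSketch {
    // groupbyData and projectSelectionData are the same stage objects the
    // question already builds; only the pass-through $project is omitted.
    public static List<DBObject> build(DBObject groupbyData,
                                       DBObject projectSelectionData) {
        return Arrays.asList(groupbyData, projectSelectionData);
    }
}
```

The result set is identical, since $group only reads fields the removed stage was passing through anyway.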

If you could re-post it in readable form, maybe there are other things we could spot. Will has already pointed out that those $first accumulators look strange. Can you explain exactly what you're trying to do? A sample document would help too.

