2014년 11월 29일 토요일

Mongos being killed by out of memory problem

I am been working in setting a shard environment and I have notice that in a context of heavy write load the memory usage of the mongos process increases until it's killed due to an out of memory condition.

The platform that I am using is  Ubuntu 12.04 (64 bits).

Nov 21 12:23:14 echo kernel: [214498.702641] Out of memory: Kill process 8442 (mongos) score 923 or sacrifice child 
Nov 21 12:23:14 echo kernel: [214498.703077] Killed process 8442 (mongos) total-vm:24302468kB, anon-rss:14556584kB, file-rss:0kB 
Nov 21 12:23:16 echo kernel: [214500.763752] init: mongos main process (8442) killed by KILL signal


The machine has 16 GB of memory and I wasn't running anything else using a significant amount of memory in the machine.  I am triggering this inserting data using pymongo with scripts running in the local machine running without write concern and I make sure that I am closing the connections before exiting the script. And the sharded data is distributed in three replicasets in different machines. Is this a known problem or I am doing that I am not supposed to do? 



It's hard to say if this is a known problem without knowing what version of mongos this is.

It's possible this is some sort of memory leak, or you are doing something that's causing heavy memory usage - anything interesting or out of the ordinary in mongos logs right before it's killed?



It's the last release at this moment 2.6.5. I think that I am doing something that causes heavy memory usage, but It doesn't seem to release the memory


Something extraordinary. There was something, the number of connections according to the mongos process was constantly growing. Once in a while a connection was closed. But, most of them seem to stay open. 


2014-11-21T12:23:01.360+0000 [mongosMain] connection accepted from 127.0.0.1:56850 #55608 (717 connections now open)
2014-11-21T12:23:01.360+0000 [mongosMain] connection accepted from 127.0.0.1:56851 #55609 (718 connections now open)
2014-11-21T12:23:01.360+0000 [mongosMain] connection accepted from 127.0.0.1:56853 #55610 (719 connections now open)
2014-11-21T12:23:01.360+0000 [mongosMain] connection accepted from 127.0.0.1:56854 #55611 (720 connections now open)
2014-11-21T12:23:01.360+0000 [mongosMain] connection accepted from 127.0.0.1:56855 #55612 (721 connections now open)
2014-11-21T12:23:01.361+0000 [mongosMain] connection accepted from 127.0.0.1:56856 #55613 (722 connections now open)
2014-11-21T12:23:01.362+0000 [mongosMain] connection accepted from 127.0.0.1:56857 #55614 (723 connections now open)
2014-11-21T12:23:01.363+0000 [mongosMain] connection accepted from 127.0.0.1:56858 #55615 (724 connections now open)
2014-11-21T12:23:01.363+0000 [mongosMain] connection accepted from 127.0.0.1:56859 #55616 (725 connections now open)
2014-11-21T12:23:01.364+0000 [mongosMain] connection accepted from 127.0.0.1:56861 #55617 (726 connections now open)
2014-11-21T12:23:01.364+0000 [mongosMain] connection accepted from 127.0.0.1:56860 #55618 (727 connections now open)
2014-11-21T12:23:01.365+0000 [mongosMain] connection accepted from 127.0.0.1:56863 #55619 (728 connections now open)
2014-11-21T12:23:01.365+0000 [mongosMain] connection accepted from 127.0.0.1:56864 #55620 (729 connections now open)
2014-11-21T12:23:01.365+0000 [mongosMain] connection accepted from 127.0.0.1:56865 #55621 (730 connections now open)
2014-11-21T12:23:01.366+0000 [mongosMain] connection accepted from 127.0.0.1:56866 #55622 (731 connections now open)
2014-11-21T12:23:01.367+0000 [mongosMain] connection accepted from 127.0.0.1:56867 #55623 (732 connections now open)
2014-11-21T12:23:01.367+0000 [mongosMain] connection accepted from 127.0.0.1:56868 #55624 (733 connections now open)
2014-11-21T12:23:01.367+0000 [mongosMain] connection accepted from 127.0.0.1:56869 #55625 (734 connections now open)
2014-11-21T12:23:02.970+0000 [mongosMain] connection accepted from 127.0.0.1:56870 #55626 (735 connections now open)
2014-11-21T12:23:02.983+0000 [mongosMain] connection accepted from 127.0.0.1:56871 #55627 (736 connections now open)
2014-11-21T12:23:03.029+0000 [mongosMain] connection accepted from 127.0.0.1:56873 #55628 (737 connections now open)
2014-11-21T12:23:03.047+0000 [mongosMain] connection accepted from 127.0.0.1:56874 #55629 (738 connections now open)
2014-11-21T12:23:03.132+0000 [mongosMain] connection accepted from 127.0.0.1:56875 #55630 (739 connections now open)
2014-11-21T12:23:03.134+0000 [mongosMain] connection accepted from 127.0.0.1:56876 #55631 (740 connections now open)
2014-11-21T12:23:03.381+0000 [mongosMain] connection accepted from 127.0.0.1:56879 #55632 (741 connections now open)
2014-11-21T12:23:03.383+0000 [mongosMain] connection accepted from 127.0.0.1:56880 #55633 (742 connections now open)
2014-11-21T12:23:03.411+0000 [mongosMain] connection accepted from 127.0.0.1:56882 #55634 (743 connections now open)
2014-11-21T12:23:03.423+0000 [mongosMain] connection accepted from 127.0.0.1:56883 #55635 (744 connections now open)
2014-11-21T12:23:03.435+0000 [mongosMain] connection accepted from 127.0.0.1:56884 #55636 (745 connections now open)
2014-11-21T12:23:03.544+0000 [mongosMain] connection accepted from 127.0.0.1:56885 #55637 (746 connections now open)
2014-11-21T12:23:04.711+0000 [mongosMain] connection accepted from 127.0.0.1:56887 #55638 (747 connections now open)
2014-11-21T12:23:04.838+0000 [mongosMain] connection accepted from 127.0.0.1:56889 #55639 (748 connections now open)
2014-11-21T12:23:04.905+0000 [mongosMain] connection accepted from 127.0.0.1:56891 #55640 (749 connections now open)
2014-11-21T12:23:05.036+0000 [mongosMain] connection accepted from 127.0.0.1:56893 #55641 (750 connections now open)
2014-11-21T12:23:05.072+0000 [mongosMain] connection accepted from 127.0.0.1:56895 #55642 (751 connections now open)
2014-11-21T12:23:05.073+0000 [mongosMain] connection accepted from 127.0.0.1:56896 #55643 (752 connections now open)
2014-11-21T12:23:05.126+0000 [mongosMain] connection accepted from 127.0.0.1:56897 #55644 (753 connections now open)
2014-11-21T12:23:05.134+0000 [mongosMain] connection accepted from 127.0.0.1:56898 #55645 (754 connections now open)
2014-11-21T12:23:05.226+0000 [mongosMain] connection accepted from 127.0.0.1:56899 #55646 (755 connections now open)
2014-11-21T12:23:05.257+0000 [mongosMain] connection accepted from 127.0.0.1:56900 #55647 (756 connections now open)
2014-11-21T12:23:05.378+0000 [mongosMain] connection accepted from 127.0.0.1:56902 #55648 (757 connections now open)
2014-11-21T12:23:05.419+0000 [mongosMain] connection accepted from 127.0.0.1:56904 #55649 (758 connections now open)
2014-11-21T12:23:05.564+0000 [mongosMain] connection accepted from 127.0.0.1:56905 #55650 (759 connections now open)
2014-11-21T12:23:06.185+0000 [conn55609] end connection 127.0.0.1:56851 (758 connections now open)
2014-11-21T12:23:06.257+0000 [mongosMain] connection accepted from 127.0.0.1:56907 #55651 (759 connections now open)
2014-11-21T12:23:06.436+0000 [mongosMain] connection accepted from 127.0.0.1:56908 #55652 (760 connections now open)
2014-11-21T12:23:06.436+0000 [mongosMain] connection accepted from 127.0.0.1:56909 #55653 (761 connections now open)
2014-11-21T12:23:06.443+0000 [mongosMain] connection accepted from 127.0.0.1:56910 #55654 (762 connections now open)
2014-11-21T12:23:06.664+0000 [mongosMain] connection accepted from 127.0.0.1:56912 #55655 (763 connections now open)
2014-11-21T12:23:06.688+0000 [mongosMain] connection accepted from 127.0.0.1:56913 #55656 (764 connections now open)
2014-11-21T12:23:06.709+0000 [mongosMain] connection accepted from 127.0.0.1:56914 #55657 (765 connections now open)
2014-11-21T12:23:06.722+0000 [mongosMain] connection accepted from 127.0.0.1:56916 #55658 (766 connections now open)
2014-11-21T12:23:06.742+0000 [mongosMain] connection accepted from 127.0.0.1:56917 #55659 (767 connections now open)
2014-11-21T12:23:06.830+0000 [mongosMain] connection accepted from 127.0.0.1:56918 #55660 (768 connections now open)
2014-11-21T12:23:07.530+0000 [mongosMain] connection accepted from 127.0.0.1:56919 #55661 (769 connections now open)
2014-11-21T12:23:07.838+0000 [mongosMain] connection accepted from 127.0.0.1:56920 #55662 (770 connections now open)
2014-11-21T12:23:07.838+0000 [mongosMain] connection accepted from 127.0.0.1:56922 #55663 (771 connections now open)
2014-11-21T12:23:07.838+0000 [mongosMain] connection accepted from 127.0.0.1:56923 #55664 (772 connections now open)
2014-11-21T12:23:08.224+0000 [conn55618] end connection 127.0.0.1:56860 (771 connections now open)
2014-11-21T12:23:08.560+0000 [mongosMain] connection accepted from 127.0.0.1:56924 #55665 (772 connections now open)
2014-11-21T12:23:08.562+0000 [mongosMain] connection accepted from 127.0.0.1:56925 #55666 (773 connections now open)
2014-11-21T12:23:09.162+0000 [mongosMain] connection accepted from 127.0.0.1:56926 #55667 (774 connections now open)
2014-11-21T12:23:09.254+0000 [mongosMain] connection accepted from 127.0.0.1:56927 #55668 (775 connections now open)
2014-11-21T12:23:09.346+0000 [mongosMain] connection accepted from 127.0.0.1:56928 #55669 (776 connections now open)
2014-11-21T12:23:09.369+0000 [mongosMain] connection accepted from 127.0.0.1:56929 #55670 (777 connections now open)
2014-11-21T12:23:09.387+0000 [mongosMain] connection accepted from 127.0.0.1:56930 #55671 (778 connections now open)
2014-11-21T12:23:09.438+0000 [mongosMain] connection accepted from 127.0.0.1:56931 #55672 (779 connections now open)
2014-11-21T12:23:09.554+0000 [mongosMain] connection accepted from 127.0.0.1:56932 #55673 (780 connections now open)
2014-11-21T12:23:10.086+0000 [mongosMain] connection accepted from 127.0.0.1:56933 #55674 (781 connections now open)
2014-11-21T12:23:10.119+0000 [conn55621] end connection 127.0.0.1:56865 (780 connections now open)


The logic is very simple.


client = pymongo.MongoClient("localhost") 
client.write_concern['w'] = 0
client.test.data.insert(docs)
client.close()

And the version of pymongo is 2.7.1


Something that I should think about?



Is this your entire code?


client = pymongo.MongoClient("localhost")  
client.write_concern['w'] = 0
client.test.data.insert(docs)
client.close()

I don't understand why you're closing connection (and opening it each time).

You should just have a single client object and use that in all of your operations - let the driver deal with connections - if this is how you handle each insert then that's extremely inefficient on top of everything else...

Sorry if I jumped to conclusions about what your entire code is doing.



No, It's not my entire code. It's modified version of a subset

However, the code is effectively opening a connection for every batch of work. 

The reason: I am using python multiprocessing for perform tasks over a big data set, and when every process is done inserts to MongoDB the results. 

I am not sure what's the best approach for that pattern with MongoDB. 

When I did the test that actually produced that killed the database, most of the he



You do not need a separate connection for every batch of work - in fact, that is a bad idea.
Use a single shared MongoClient between all the workers - they will effectively use the connection pool.

I don't know if that's directly part of the problem, but it can't be helping it...



I am going to check if this fixes the problem that I am seeing. However, maybe it's worth to check in MongoDB there is a memory leak when you do an unacknowledged write and you close the connection immediately.



It's certainly possible -though there isn't really a true
unacknowledged write anymore - the write command always returns a
result, but different drivers implement unacknowledged differently (I
think) once the new write commands came along in 2.6...

댓글 없음:

댓글 쓰기