Saturday, November 29, 2014

MMS account totally horked and completely stuck in spinner hell.

My MMS account seems to be totally horked now and I'm looking for any tips on how to recover.  I've done so many things now that I can't totally recall steps, but here's the gist:

- Create new MMS account and get pushed into the wizard flow.  Start to go through it and set up a replica set in AWS on 3 amazon-linux machines that I've already provisioned and installed mongo on.  I follow the instructions to install the agent on all 3, and the MMS wizard shows all of them as verified.  I'm about to hit the deploy button when I realize it says that it will install mongo.  I confirm with someone else that I shouldn't have installed Mongo already.

- Try to recover - I decide that, since this is AWS, the way to get back to a clean state is to just terminate the instances and start up new ones (note that I'm using static IPs for my instances via predefined network interfaces so that the mongo boxes are easily findable by my other apps).  So the new instances show up with the same IPs.  I go back to MMS and the page still shows that the agents are already validated, and it won't refresh.  So I back out of the wizard completely and go back in - it still shows the agents as validated (but with no green dot) and it won't give me the instructions for reinstalling the agents.  I find in the deployment view that my servers are all listed in the "servers" tab but can't figure out any way to remove them.  I finally find the instructions somewhere else and install the agents on the new EC2 instances.  I go back through the wizard and when it gets to the server list, it again shows the agents as validated, but now the dots are green... so I'm thinking I'm good to go.  I note that the next screen, which shows what's going to be installed where (monitoring/backup/etc), is the same as before, but I don't see the replica set listed for some reason.  I go back and forth through the wizard one more time, making sure I picked replica set, but still no difference.  So I go ahead and deploy.

- Deployment says it worked, but the "deployment" view doesn't show any replica set - only the servers in the server tab.  So I go into edit mode and use the popup modal dialog to create a replica set (specifying ports 27000-28000 as suggested).  It finds the servers and now shows the replica set.  But a status of some sort seems to be missing.  I let it sit for 5 minutes and then attach to the instances to see what's running and what Mongo shows.  I find out mongo isn't installed on the boxes at all.  So I install mongo.  But when I try to connect with mongo, I get a connection error and it fails.

- I try to start over.  I figure I need to unmanage them before I terminate them this time.  So I first try to "remove" the instances from my replica set via the deployment edit view and then unmanage them individually.  I remove 2 of them successfully from that view, but they still show in the server tab.  I try to do the 3rd and it won't even disappear from the topology view.  I try to unmanage the replica set as a whole.  The page changes to show a bunch of spinners by everything.  I wait for 15 minutes, no change.  I finally try to edit the configuration and tell it to cancel what it's doing.  The spinners go away, but as soon as I do anything else, they come back.  This happens if I try to "discard changes", or "apply changes", or make a new change.  Nothing works anymore and everything leads back to spinners across the board.

- I try to resolve it from the client side.  I stop the mongo service, I stop the mms-automation-agent... but still no impact on the spinners or on anything else in the MMS dash.  I am totally stuck.  All I want to do is delete everything it thinks it knows and start over, but MMS won't let me do anything.  I believe I got into this state because the wizard seemed to remember the old state and not work with the new state.  I *hope* that if I can just delete all existing references to servers/replica sets/etc, it will let me set this up again, but I don't know how.

Has anyone had any experience like this?  Have any ideas on how to get out of this state?



Am facing a spinner hell issue as well. I started a new deployment. 

The EC2 instance running the MMS agent had DNS and firewall incorrectly configured, so MMS couldn't find it. Once I fixed the server issues, the spinner didn't stop; it went on for hours. The option to Unmanage is also disabled.



You can NOT point MMS to an existing MongoDB deployment.

The machines must be virgins!



You install the agent, then the agent installs MongoDB for you, based on the configuration that you made from the MMS Portal.



That seems to be completely incorrect.
As far as I know, you can. Deployment and Monitoring are completely different functions, and MMS started out WITHOUT auto-deployment functions to AWS to begin with.
MongoDB University also teaches that you actually can monitor existing systems.
I do not see where you get the statement of fact that you cannot.



If you say so, I'm only going on what I read in here, and from experience of trying to get MMS to work.

I see no way of configuring the agent to monitor a current instance of MongoDB.

Oh and then there's this...

https://docs.mms.mongodb.com/faq/

Can MMS Manage an Existing MongoDB Deployment?

If you have an existing MongoDB deployment you cannot directly use MMS to manage this system.



Hmmm. Seems that I need to step back from my earlier position!
I did find that paragraph in the docs earlier, but simply assumed (yes, I know!) that it was outdated information somehow, since from the MongoDB University Course I understood that you COULD use it on existing installations. According to the docs you just mentioned, you can't. Which could be why I and others were not able to reach the defined hosts with the monitoring agent to begin with.
I do think that this is very unclear, and as a matter of fact I have not found any other reference to this issue anywhere.
It would have saved me and others a lot of time if this had been stated much more clearly on the MMS homepage instead of as a one-liner somewhere far away in the docs.
Which now means I have gone to the trouble and effort of creating an account and installing an agent exactly for nothing. Very disappointing. Should have read better.



Until October 15, 2014, MMS allowed configuring monitoring and backup on existing deployments for all users. On that day, we released a new version of MMS that is centered around deploying MongoDB.

If you have an MMS group created before the Oct 15th release, you can monitor and back up existing deployments. These are called "MMS Classic" groups. Note, to make matters more confusing, an MMS account can access multiple groups, both classic and "new."

So the answer to this question is, it depends. If you go to mms.mongodb.com and create an account, you can only monitor and back up what you deploy. But if you are a long-standing user, you can monitor and back up an existing cluster (with no deployment functionality).

The MongoDB University classes are unfortunately now a little dated with respect to this issue. We plan to fix them soon.

Sorry for the confusion. If you have an MMS classic account, it says "Classic" in the header when you are logged in.
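
As a rough illustration of the rule described in this post (not anything MMS itself exposes), here is a minimal Python sketch. The function names are hypothetical, and it assumes the only thing that matters is whether the group was created before the October 15, 2014 release:

```python
from datetime import date

# Release date of the deployment-centered MMS, as stated in this post.
NEW_MMS_RELEASE = date(2014, 10, 15)


def is_classic_group(group_created: date) -> bool:
    """Hypothetical helper: groups created before the release are 'Classic'."""
    return group_created < NEW_MMS_RELEASE


def can_monitor_existing_deployment(group_created: date) -> bool:
    """Classic groups can monitor and back up an existing deployment."""
    return is_classic_group(group_created)


def can_deploy_from_mms(group_created: date) -> bool:
    """Only non-Classic ('new') groups have the deployment functionality."""
    return not is_classic_group(group_created)


if __name__ == "__main__":
    # A group created in November 2014 is a "new" group.
    created = date(2014, 11, 15)
    print(can_monitor_existing_deployment(created))
    print(can_deploy_from_mms(created))
```

In practice the "Classic" label in the header remains the thing to check; the sketch only restates the cutoff.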



This thread has deviated slightly.  First, I don't think I have a Classic account - at least it doesn't show in the header and I didn't create this till a couple weeks ago (when I posted this).

Yes, I had already installed the mongo package on my first attempt (a co-worker had just successfully done this a week before).  But I quickly realized my error and tried to cancel/start over.  In that process MMS totally bit it and left me in an unrecoverable state.  There appears to be no way to just say "delete everything you think you know and start over", and that makes fixing a problem a non-trivial, change-then-wait-a-day procedure... unacceptable for a production deployment... i.e., you make one mistake (trying to add a new node/cluster/whatever) and you get stuck in this completely unusable state for a day.

We can't afford to have our hands completely tied for a day in our production environment.  Right, I did something wrong at first, but not being able to fix/correct is a real problem.  How is one supposed to recover when everything on the dash turns to spinners?



Thanks for responding!
Well, I do have a "classic" account, but it isn't working anyway...... please see my independent thread a bit down about this.



The general way to get out of "spinner hell", as you put it, is to click the "Edit Configuration" link on the "edit mode" of the "Deployment" view. Confirm it, and then you can make changes again.

Since you want to "delete" everything, the correct thing to do is "Unmanage". The way you do this is you click the arrow under the edit column for your cluster/replica set, and on the sidebar that pops up, click the gear, and then select "Unmanage". This will cause MMS to "let go" – if the processes are running, they will keep running, the data directories will still exist, etc. Confirm and push your changes and your deployment will be empty.

I hope this helps clear things up.

