canonical-ci-engineering team mailing list archive
Message #00541
Re: otto containers left running, lxc-stop hanging
>>>>> Stéphane Graber <stephane.graber@xxxxxxxxxxxxx> writes:
<snip/>
> Did you try "lxc-stop -n <container> -k" which is the upstream supported
> way of forcefully killing a container?
Yes. As mentioned, I even tried lxc-stop -k -t <timeout>
> In theory lxc-stop sends SIGPWR, then waits 30s and sends SIGKILL to
> init.
Ah, good. So maybe I didn't wait long enough on my last test, but I'm pretty
sure I did.
Now, while debugging this I did try to kill the init process, as at
least compiz and X were listed as defunct.
> If SIGKILL doesn't work, then you have much bigger problems
> (typically kernel related).
That could very well be the case.
But then I would expect lxc-stop to respect the -t timeout and fail with
an error code, in which case I can fall back to a reboot, but only in
that case.
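To make that expectation concrete, here is a minimal sketch (not what otto actually does) of a wrapper that enforces its own deadline around the stop command and only triggers a fallback when the stop fails or hangs. The names run_with_deadline, stop_or_fallback and log_reboot_needed are hypothetical, and the only lxc-stop invocation assumed is the "-n <container> -k" form quoted above.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s, reap_s=5.0):
    """Run cmd; return True only if it exits 0 within deadline_s seconds."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=deadline_s) == 0
    except subprocess.TimeoutExpired:
        # The command overran its deadline: kill it, but only wait a bounded
        # time for it to be reaped so this wrapper can never hang itself.
        proc.kill()
        try:
            proc.wait(timeout=reap_s)
        except subprocess.TimeoutExpired:
            pass
        return False


def log_reboot_needed():
    # Hypothetical placeholder: the real "reboot hack" is not shown in this thread.
    print("lxc-stop failed or hung; reboot fallback would run here", file=sys.stderr)


def stop_or_fallback(container, deadline_s, fallback=log_reboot_needed):
    """Try a forced stop; call fallback() only if the stop fails or hangs."""
    # "-n <container> -k" is the invocation quoted in this thread.
    stopped = run_with_deadline(["lxc-stop", "-n", container, "-k"], deadline_s)
    if not stopped:
        fallback()
    return stopped
```

The point of the design is that the deadline lives in the caller, so a hung lxc-stop still produces a definite failure that can gate the reboot. If the kernel really has made a task unkillable, the wrapper cannot fix that; it only reports it within a bounded time.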
> So please try with -k,
I did.
> if that doesn't work,
It didn't.
> please let me access one of those hanging machines so I can
> confirm that it's not an LXC issue and that something in the
> kernel is indeed making one of the tasks unkillable.
With pleasure, but that will have to wait :-/
I had to put the reboot hack in place to restore service, so we'll need to
plan an interruption to give you access (I don't think we can reproduce
that on a different host).
And I'll be sprinting this week and on vacation for the next 2 weeks.
But rest assured I'll get back to you ;)
So thanks a lot for the quick feedback (on the bug too!).
I'm pretty sure you're right about the deeper kernel issue: it matches
my tests last Friday. I couldn't kill the init process and I had issues
killing the other ones, so... I had to reboot in the end.
And stay tuned, I'll ping you as soon as I can set up a reproducing env ;)
Vincent