canonical-ci-engineering team mailing list archive
Message #00485
Re: Autorestarting jenkins slaves
On 11 December 2013 17:17, Vincent Ladeuil <vila+ci@xxxxxxxxxxxxx> wrote:
> >> Now, I stopped counting at 40 when listing all nodes where we want to do
> >> that (see https://app.asana.com/0/8740321118011/9113941145537).
> >>
> >> 40 is too high for a manual fix and deploy strategy :-/
>
> > Can you please elaborate?
>
> There are two approaches right now (using jnlp):
>
> - /usr/local/bin/start-jenkins-slaves which supports multiple slaves on
> the same host but does not use upstart,
>
> - creating upstart services for each slave by copying the needed files
> and putting the right slave name where needed (including in the file
> names).
>
> Since we don't want the former, the latter requires creating 40 copies (I
> kind of understand why it's done this way for phones, but 40 ??? /me
> faint).
Couldn't we do this programmatically? Alternatively, couldn't we have
a single upstart job that uses the instance stanza, so that one job
file serves every slave:
http://upstart.ubuntu.com/cookbook/#instance
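To make "programmatically" a bit more concrete, here is a rough sketch of
generating one upstart job file per slave from a single template. The
template contents, the jenkins-slave-<name>.conf file naming, the Jenkins
URL and the slave.jar path are all placeholders I made up for
illustration, not what is deployed today.

```python
import os
import tempfile

# Hypothetical job template. The exec line, the URL and the jar path are
# assumptions for illustration, not the current production setup.
JOB_TEMPLATE = """\
description "Jenkins slave {name}"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec java -jar {slave_jar} -jnlpUrl {jenkins_url}/computer/{name}/slave-agent.jnlp
"""


def render_job(name, jenkins_url, slave_jar):
    """Return the text of the upstart job for a single slave."""
    return JOB_TEMPLATE.format(
        name=name, jenkins_url=jenkins_url, slave_jar=slave_jar)


def write_jobs(names, dest_dir,
               jenkins_url="http://jenkins.example.com",
               slave_jar="/usr/local/lib/jenkins/slave.jar"):
    """Write one job file per slave name into dest_dir; return the paths."""
    paths = []
    for name in names:
        path = os.path.join(dest_dir, "jenkins-slave-{}.conf".format(name))
        with open(path, "w") as job_file:
            job_file.write(render_job(name, jenkins_url, slave_jar))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Write into a scratch directory rather than /etc/init while testing.
    with tempfile.TemporaryDirectory() as scratch:
        for job_path in write_jobs(["slave-a", "slave-b"], scratch):
            print(job_path)
```

With the instance stanza the loop goes away and only the list of slave
names needs to live somewhere, which is why I'd look at that first.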
> > I'm not happy about us having that number of nodes not under
> > centralised provisioning, but there are things we can do to
> > mitigate the problem somewhat, like putting a bzr branch of all
> > the code/config that these things could use under /srv (*not* as
> > mounted remote volume) and symlinking to that.
>
> /srv on which server? And when would the slave nodes pull from it? Or do you
> mean on all nodes?
>
> Once again I'd love to have a package for that and some meta-packages
> for servers and slaves which would address the deployment issues but
> that sounds overkill just for that specific issue.
>
> Or are you suggesting to do that without packages but with some
> branch(es) as a first/interim step ?
>
> None of that is lightweight so far which is why I stopped and thought
> about ssh which at least puts the burden on the server itself (bar the
> ssh key deployment but that should be a once only thing). But then Larry
> rightly raised the previous issue with that.
A package would work as well. I merely picked /srv because that's what
IS uses. They're at least using Puppet (old services) or Juju (new
services). The format is roughly:
/srv/graphite.engineering.canonical.com/{production, staging}/{bin, etc, var}
/srv/graphite.engineering.canonical.com/tmp/
So the thought is that we'd create a bzr branch that gets written to
/srv/uci-juju-stopgap/ with any *new* config files, binaries, whatever
in it. (I don't want to go creating a lot of work here when we're going
to move even these older, pre-Airline services into Juju.) The files in
it then get symlinked into place, e.g. as /etc/init/uci-jenkins.conf
and so on.
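As a rough sketch of the symlinking step, assuming the job files sit in a
single directory of the branch (that layout, and the link_configs name,
are my invention here):

```python
import os


def link_configs(branch_dir, target_dir, suffix=".conf"):
    """Symlink each file in branch_dir ending in suffix into target_dir.

    Returns the links that were created or repointed. Links that already
    point at the right file are left alone, so a re-run is harmless. A
    regular file already at the link path is treated as an error rather
    than silently replaced.
    """
    branch_dir = os.path.abspath(branch_dir)
    created = []
    for entry in sorted(os.listdir(branch_dir)):
        source = os.path.join(branch_dir, entry)
        if not entry.endswith(suffix) or not os.path.isfile(source):
            continue
        link = os.path.join(target_dir, entry)
        if os.path.islink(link):
            if os.readlink(link) == source:
                continue  # already correct
            os.remove(link)  # stale link, repoint it
        elif os.path.exists(link):
            raise FileExistsError(
                "{} exists and is not a symlink".format(link))
        os.symlink(source, link)
        created.append(link)
    return created


# Hypothetical usage, paths are assumptions:
# link_configs("/srv/uci-juju-stopgap/etc/init", "/etc/init")
```

Refusing to overwrite a real file is deliberate: hand-edited files on a
node are exactly the customisation we would want to find out about.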
It's just an idea though, and perhaps it's a bad one. I want us to
shoot for something expedient that tries to avoid the system
customisation mistakes that bit us hard in the 1SS move without going
so far as to build a poor man's Puppet. Honestly, I'd rather invest
any significant amount of time in charming our brittle infrastructure.