canonical-ci-engineering team mailing list archive

Re: [Merge] lp:~doanac/ubuntu-ci-services-itself/ppa-assigner into lp:ubuntu-ci-services-itself

To: canonical-ci-engineering <canonical-ci-engineering@xxxxxxxxxxxxxxxxxxx>
From: vila+ci@xxxxxxxxxxxxx
Date: Wed, 04 Dec 2013 09:57:21 +0100
In-reply-to: <CAOe9oG6WDeO0PE7M3cCYL53xVytQggsMuhBARWZGph5zsWerOg@mail.gmail.com> (Evan Dandrea's message of "Tue, 03 Dec 2013 16:51:22 -0000")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

I'm a bit late for the review, so I'm switching to the ML instead. Sorry
about the delay :-/

>>>>> Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> writes:

<snip/>

    > I know you took some care to explain this to me on the standup
    > yesterday, but may I ask that you walk me through it again in text, so
    > I have something to refer back to?

    > Could we not flip this around and have a pool of workers, each
    > assigned to a specific PPA. The PPA assigner could then put a job on a
    > Rabbit queue that the PPA workers then pull from. This would seemingly
    > get us away from needing a database here, which is another potential
    > point of failure.

I like that design very much :)

I'll go even further and try to find a way where we don't need any
database at all for the ppa assigner.

I think launchpad provides enough fields associated with a ppa to keep
track of any state we need:

- ppa state ('used', 'dirty', 'free')
- owner
- time last touched by the ppa assigner

Then the ppa assigner can be blindly re-started from scratch and start a
worker for each ppa (from a list of ppas that would be the only data the
ppa assigner cares about).
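
A minimal sketch of that restart path, assuming the three fields above can
be read back per ppa. `PpaRecord`, `Worker`, `start_workers` and
`fake_read_record` are hypothetical names used for illustration only; the
fake lookup stands in for whatever launchpad query would be used, which
this message does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class PpaRecord:
    """State kept alongside the ppa itself, not in a local database."""
    name: str
    state: str              # 'used', 'dirty' or 'free'
    owner: str              # empty string when nobody holds the ppa
    last_touched: datetime  # last time the ppa assigner touched it


class Worker:
    """Hypothetical worker bound to exactly one ppa."""

    def __init__(self, record: PpaRecord) -> None:
        self.record = record


def start_workers(ppa_names: List[str],
                  read_record: Callable[[str], PpaRecord]) -> Dict[str, Worker]:
    """Rebuild every worker from scratch using only the list of ppas."""
    return {name: Worker(read_record(name)) for name in ppa_names}


def fake_read_record(name: str) -> PpaRecord:
    """Stand-in for the remote lookup (assumption: not a real launchpad call)."""
    return PpaRecord(name, "free", "", datetime.now(timezone.utc))


workers = start_workers(["ppa-1", "ppa-2"], fake_read_record)
```

Nothing survives a restart except the list of ppa names, which is the
property being argued for here.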

Each worker then handles a very simple state automaton that can:
- assign a ppa (synchronous),
- clean a ppa (asynchronous).
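
The automaton could look roughly like the sketch below. The three states
come from the list of ppa states; `PpaWorker` and its method names are
hypothetical. The `release` step (used to dirty) is an assumption, since
only assign and clean are named here, and `clean` runs inline in this
sketch although it is meant to be asynchronous.

```python
from enum import Enum


class PpaState(Enum):
    FREE = "free"
    USED = "used"
    DIRTY = "dirty"


class PpaWorker:
    """Hypothetical per-ppa worker holding the state automaton."""

    def __init__(self, name, state=PpaState.FREE):
        self.name = name
        self.state = state
        self.owner = None

    def assign(self, owner):
        """free -> used. Synchronous: the caller gets an answer right away."""
        if self.state is not PpaState.FREE:
            return False
        self.state = PpaState.USED
        self.owner = owner
        return True

    def release(self):
        """used -> dirty. Assumed transition, not named in the design."""
        if self.state is not PpaState.USED:
            raise ValueError("can only release a used ppa")
        self.state = PpaState.DIRTY
        self.owner = None

    def clean(self):
        """dirty -> free. Asynchronous in the design, inline in this sketch."""
        if self.state is not PpaState.DIRTY:
            raise ValueError("can only clean a dirty ppa")
        self.state = PpaState.FREE


worker = PpaWorker("ppa-1")
first = worker.assign("ticket-1")   # accepted, the ppa was free
second = worker.assign("ticket-2")  # refused, the ppa is already used
```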

The only critical part seems to be for the ppa assigner to reply to the
request to assign a ppa, synchronously.

All of the above also seems to be testable in two parts:

- test launchpad features (only the worker needs to do that so a single
  ppa is involved),

- test the ppa assigner (which can be achieved first with fake workers
  and then with real workers for integration tests).

- and I hope Ursula can set us up with a test launchpad instance (ok,
  ok, staging will do, but hey, I can keep asking, right? ;)

Thoughts? Don't we get a pretty good resiliency story there?

         Vincent

P.S.: Yeah, yeah, I removed rabbit MQ from the picture, but only because
I consider that a task in the queue is removed only when a worker says
"I did it". So as long as no worker handles a ppa assignment, the task is
still pending and some other worker will handle it later.
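
A toy, in-memory sketch of that "removed only when done" rule, with no
message broker involved. `PendingTasks` and its methods are hypothetical
names; a real queue would provide this through its own acknowledgement
mechanism.

```python
from collections import deque


class PendingTasks:
    """Tasks stay pending until a worker explicitly reports completion."""

    def __init__(self):
        self._pending = deque()

    def put(self, task):
        self._pending.append(task)

    def offer(self):
        """Return the oldest pending task without removing it."""
        return self._pending[0] if self._pending else None

    def done(self, task):
        """Called by a worker only once it has really handled the task."""
        self._pending.remove(task)

    def __len__(self):
        return len(self._pending)


tasks = PendingTasks()
tasks.put("assign a ppa for ticket-1")

# A first worker picks the task up and then dies before reporting back.
picked_by_first = tasks.offer()
still_pending = len(tasks)

# A second worker sees the same task later and completes it.
picked_by_second = tasks.offer()
tasks.done(picked_by_second)
```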