
canonical-ci-engineering team mailing list archive

Re: On resiliency to failure

 

Sure. Can you give me five or so bullet points of features that the app needs
(endpoints, database requirements, etc.) so I can relate Flask to them?


On 3 December 2013 16:28, Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> wrote:

> Tristram,
>
> Could you make a case for this on canonical-tech@xxxxxxxxxxxxxxxxxxx?
> In our particular case we need to move quickly, so we don't have a lot
> of time here to wait for a decision on whether Flask would be okay to
> Mark or to ramp up on it. But knowing whether it's an option for
> future components of this system would be super-helpful. We'll be
> adding quite a few more in phase two of the project, come February.
>
> Thanks!
>
> On 3 December 2013 16:17, Tristram Oaten <tristram.oaten@xxxxxxxxxxxxx>
> wrote:
> > Django is based on WSGI. Does that imply that WSGI is a preferred
> > technology? I guess not; you can run Python on Windows, after all. But I
> > think we should add it: for simple apps it is lightning fast, and if you
> > add a little Werkzeug library into the mix you have the bare bones of
> > what you need to support an app like this.
> >
> > If pure WSGI skills are difficult to find, Flask is an alternative: its
> > Jinja2 templating syntax is very close to Django's, you can bring your
> > choice of ORM, and you can generate an admin with the well-used
> > Flask-Admin plugin.
> >
> > Django is a great CMS. Use it for anything else and you handcuff your
> > developers.
> >
> >
> > On 3 December 2013 16:07, Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> wrote:
> >>
> >> I had a quick chat with Nick Moffitt and Liam Young of webops/GSA,
> >> then Tristram of the web team, which I think would be useful to all of
> >> you.
> >>
> >> As we are standardising around a model of using Django 1.5 for the
> >> individual components (as defined in lp:ubuntu-ci-services-itself,
> >> docs/style.rst), it's worth thinking about the various ways any one of
> >> these components can fail.
> >>
> >> A broader discussion would be of what happens when a component
> >> completely goes down and cannot be talked to. What does the other end
> >> do in this circumstance to gracefully handle the failed request and
> >> prevent a domino effect? We cannot assume that the REST API we're
> >> talking to will reply, or reply within a given timeout (and we should
> >> always be setting timeouts). I won't cover this here, but you should
> >> definitely be thinking about how to handle it.
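
A minimal sketch of the "always set a timeout, and degrade gracefully" idea from the paragraph above. The function name `fetch_json`, the default timeout value, and the fallback behaviour are assumptions made for illustration; nothing here comes from the project's code.

```python
import json
import urllib.request


def fetch_json(url, timeout=2.0, default=None):
    """Fetch JSON from another component, never waiting longer than `timeout`.

    Hypothetical helper: returns `default` instead of raising when the peer
    is down, slow, or replies with something that is not valid JSON, so the
    caller can degrade gracefully rather than propagate the failure.
    """
    try:
        # The timeout applies to connecting and to each blocking read.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, refused connections and socket timeouts;
        # ValueError covers undecodable or malformed JSON bodies.
        return default
```

A caller would pass whatever "service unavailable" value makes sense for it as `default`, then decide whether to retry later, queue the work, or report the outage.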
> >>
> >> So, how can our little Django worker fail? Well, for a start, the node
> >> it is running on could fall over. That's okay: Django itself is
> >> horizontally scalable. So we create N WSGI servers (gunicorn) hosting
> >> the Django code and put them behind HAProxy with a health check set.
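
For the health check, each worker needs something cheap for the load balancer to poll. The following is a sketch of a bare WSGI endpoint that gunicorn could serve; the `/health` path and the response bodies are assumptions, not something the thread specifies.

```python
def application(environ, start_response):
    """Tiny WSGI app exposing a health-check URL (hypothetical path /health)."""
    if environ.get("PATH_INFO") == "/health":
        status, body = "200 OK", b"OK\n"
    else:
        status, body = "404 Not Found", b"Not Found\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

In a real component the check would more likely live alongside the Django routes and might also verify that the database is reachable; this version only proves that the worker process is answering requests.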
> >>
> >> With a bit of extra work (we cannot just juju upgrade-charm), this
> >> would also let us deploy code worker by worker, checking for a bad
> >> deployment along the way. The online services team is trying to get
> >> this deployment strategy in place. It's worth talking to bloodearnest
> >> if you head down that road.
> >>
> >> But Django also talks to a Postgres database. How do we handle
> >> Postgres falling over and leaving Django with nothing to talk to?
> >> Pgbouncer helps here. If we put pgbouncer in front of a number of
> >> postgres instances with a set master instance, we can tolerate some
> >> instances falling over. Of course, pgbouncer then becomes a single
> >> point of failure (SPOF). From talking to Nick, it doesn't sound like
> >> this has bitten IS often.
> >>
> >> It's definitely worth talking to Stuart Bishop (stub) about how to
> >> best handle postgres in this service-oriented architecture. He's our
> >> in-house database expert.
> >>
> >> Now, replicating postgres like this potentially falls over if we're
> >> using it to store locks. You've got to wait for the lock state to be
> >> synchronised across all the postgres nodes.
> >>
> >> Also keep in mind whether we really need to store anything in a
> >> database at all. If you're talking to Launchpad for your information,
> >> you can probably leave the data there. If you're creating locks, it's
> >> probably worth asking whether you can flip that around: rather than
> >> going to find a place to put a task, put it on a big queue for some
> >> workers to grab from.
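
To illustrate the queue-instead-of-locks idea above, here is a small sketch in which tasks go onto one shared queue and idle workers pull from it. It uses Python's in-process `queue.Queue` and threads purely as a stand-in for whatever message queue the real system would use; `run_workers` and its parameters are hypothetical names.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=3):
    """Process `tasks` by letting workers pull from a shared queue.

    No task is assigned to a particular worker and no lock is taken per
    task: each worker grabs the next item whenever it is free.
    """
    pending = queue.Queue()
    finished = queue.Queue()
    for task in tasks:
        pending.put(task)

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do, so this worker exits
            finished.put((task, handler(task)))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # All workers have exited, so draining the results queue is safe here.
    results = []
    while not finished.empty():
        results.append(finished.get())
    return results
```

Each task is handled exactly once, by whichever worker gets to it first, and adding capacity means adding workers rather than coordinating lock ownership. Results come back in completion order, not submission order, and error handling for a failing `handler` is left out of this sketch.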
> >>
> >> Expanding on that, just how much of Django do you really need? I can't
> >> imagine we'll need the administrative interface, the templating
> >> engine, the ORM, or really anything above the routing code in most
> >> cases. It's probably worth disabling the rest.
> >>
> >> Django is pretty heavyweight. Tristram benchmarked it against Flask
> >> and others and came up with some interesting results:
> >> https://workflowy.com/shared/1574979c-4603-a345-a145-a6dbb7174885/
> >>
> >> Unfortunately, the Preferred Technologies page pretty much forces us
> >> to use Django, but that doesn't mean we cannot strip it down to just
> >> what we need in each case.
> >>
> >> Attached is the diagram Nick and Liam drew for how we might lay out
> >> each component. Keep in mind this is for a single microservice. We'd
> >> want this layout for each one. You can ignore the bit at the top for
> >> squid. We won't need that on the front of most things. Instead, a
> >> simple Apache in front of HAProxy will suffice.
> >>
> >> For good examples of how to do haproxy in prodstack, both psearch (in
> >> lp:ubuntuone-servers-deploy) and certification (in
> >> lp:~canonical-losas/canonical-is-charms/certification) were
> >> recommended.
> >>
> >> Thanks!
> >>
> >> (Tristram, Liam, and Nick, if I got any of the above wrong, please do
> >> correct me.)
> >
> >
>
