
canonical-ci-engineering team mailing list archive

Re: Flaky tests

 

Here's a list I started, with a brief rationale for each:
https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AjwxZmhDIclsdFdiV2c3dFhGYWpfckhqT1N1ZWpUY1E&usp=sharing

Certainly things could change per image here; this is just meant to give
us an easy way to build the list.


On Mon, Nov 4, 2013 at 10:53 AM, Francis Ginther <
francis.ginther@xxxxxxxxxxxxx> wrote:

> A possible first-step solution would be to add a very simple retry to
> our CI test runner scripts: if the retry passes, the test passes.
> The harder part would be to identify the test cases that failed on
> the original attempt and make them visible. This could be done by
> mining the jenkins data itself and could probably be made smart enough
> to file bugs when it finds tests that passed on retry, although it's
> hard to determine why a test failed in the first place (e.g. maybe
> it's a flaky test, or maybe unity8 crashed).
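>
> Something as rough as this sketch would do for a first pass (the
> command and log message here are only illustrative, not our actual
> runner invocation):
>
>     import subprocess
>     import sys
>
>     MAX_ATTEMPTS = 2  # the original run plus one retry
>
>     def run_suite(cmd):
>         """Run the test command; True means it exited cleanly."""
>         return subprocess.call(cmd) == 0
>
>     def main():
>         cmd = sys.argv[1:]  # e.g. ['python', '-m', 'unittest', 'discover']
>         for attempt in range(1, MAX_ATTEMPTS + 1):
>             if run_suite(cmd):
>                 if attempt > 1:
>                     # The retry passed: log it so the flaky run stays visible.
>                     sys.stderr.write("suite passed on retry %d\n" % attempt)
>                 return 0
>         return 1
>
>     if __name__ == '__main__':
>         sys.exit(main())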
>
> Francis
>
> On Mon, Nov 4, 2013 at 3:43 AM, Vincent Ladeuil <vila+ci@xxxxxxxxxxxxx>
> wrote:
> >>>>>> Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> writes:
> >
> >     > Can someone provide the set of tests we're running that are flaky?
> >
> > There is a whole spectrum of flaky tests, and identifying them is an
> > art. That's the theory ;)
> >
> > In practice, there are several ways to automatically identify flaky
> > tests.
> >
> > Each failing test could be retried in the same context and considered
> > flaky only if the retry passes.
> >
> > Ha, ouch, "same context" is darn hard to guarantee: what if the test is
> > flaky only when some other tests run before it?
> >
> > What if a flaky test always succeeds when run alone?
> >
> > What if the flakiness is caused by a specific piece of hardware (and by
> > that I don't mean a brand or a product, but one unique physical unit)?
> >
> >
> >     > If we don't have a good way of getting this list, could we wire up
> >     > job retries and log when that occurs?
> >
> > While I've encountered all of the above in real life, they are not the
> > majority of cases, so re-trying the test itself only requires a
> > specific test runner (getting all projects to use such a runner is
> > achievable, slowly, and who would refuse to be protected against flaky
> > tests?). We can log such occurrences, but the log will be disconnected
> > from the test results.
> >
> > Alternatively, the job can be re-run with only the failing tests, which
> > makes it slightly easier to both report and process the flaky tests.
> > This doesn't require a specific test runner, but it does require a
> > subunit test result ;) and a specific test loader to select only the
> > failing tests from a previous run [1].
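> >
> > A minimal sketch of such a loader, using only stdlib unittest (the
> > file of failing test ids is an invented detail; in practice the ids
> > would come from the previous subunit stream):
> >
> >     import unittest
> >
> >     def iter_tests(suite):
> >         """Flatten a (possibly nested) TestSuite into test cases."""
> >         for item in suite:
> >             if isinstance(item, unittest.TestSuite):
> >                 for test in iter_tests(item):
> >                     yield test
> >             else:
> >                 yield item
> >
> >     def load_failing(start_dir, failing_ids_path):
> >         """Build a suite holding only the tests that failed last run."""
> >         with open(failing_ids_path) as f:
> >             wanted = set(line.strip() for line in f if line.strip())
> >         everything = unittest.TestLoader().discover(start_dir)
> >         return unittest.TestSuite(
> >             t for t in iter_tests(everything) if t.id() in wanted)
> >
> >     if __name__ == '__main__':
> >         unittest.TextTestRunner(verbosity=2).run(
> >             load_failing('.', 'failing-tests.txt'))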
> >
> > Even jenkins should be able to display such results as long as we tweak
> > the flaky test names (don't ever try to display the details of a test
> > failure in jenkins when the test you're interested in is named like a
> > previous failure; you'll always get the latter ;). And even the
> > dashboard could be taught to display green (pass) / yellow (flaky) /
> > red (errors) instead of just a percentage.
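> >
> > Suffixing the retried run's test id is enough to keep the two results
> > distinct in jenkins (the id and suffix format below are only examples):
> >
> >     def flaky_id(test_id, attempt):
> >         # 'tests.TestFoo.test_bar' -> 'tests.TestFoo.test_bar (flaky, retry 2)'
> >         return '%s (flaky, retry %d)' % (test_id, attempt)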
> >
> > Note that unique test names, while not enforced by python's unittest,
> > are a simple constraint we want for multiple reasons, including the
> > jenkins one mentioned above. This is not a bug as far as I'm concerned;
> > there is just no solution to ambiguous names: what if I tell you 'test
> > X' is failing and you're looking at the code of test X (the other one)?
> > A unique name is also needed when you want to select a single test to
> > run.
> >
> > Now, there are a few risks associated with automatically re-trying flaky
> > tests:
> >
> > - people will care less about them (in the same way any dev starts
> >   ignoring warnings when there are too many of them, making it
> >   impossible to notice the new ones),
> >
> > - the root cause of the flakiness can be in the ci engine, the test
> >   infrastructure or the code itself. The right people should be involved
> >   to fix them, and as of today all causes are intermixed, so nobody has
> >   a clear view of what *they* should fix.
> >
> > So getting rid of flaky tests takes more than automatic retries: people
> > need to be involved to track and fix them.
> >
> >        Vincent
> >
> >     > Thanks!
> >
> >     > On 31 October 2013 11:23, Julien Funk <julien.funk@xxxxxxxxxxxxx> wrote:
> >     >> So, it would be great if we could get a list of flaky tests re:
> >     >> the discussion today.  I don't mind rerunning the tests
> >     >> automatically when they fail, but I think action should be
> >     >> mandatory on any flaky tests we discover, and we should maintain
> >     >> a list of them somewhere with a process to deal with them.
> >
> > Yeah, that's the social part ;) It's at least as important as the
> > technical part, and probably far more so ;)
> >
> >      Vincent
> >
> >
> > [1]: Test loaders, filters, results and runners already exist in
> > lp:selenium-simple-test (and lp:subunit of course). Re-running the
> > failing tests could easily be added on top of those.
> >
>
>
>
> --
> Francis Ginther
> Canonical - Ubuntu Engineering - Continuous Integration Team
>
>
