Re: Flaky tests
A possible first step would be to add a very simple retry to our CI test
runner scripts: if the retry passes, the test passes. The harder part
would be to identify the test cases that failed on the original attempt
and make those visible. This could be done by mining the Jenkins data
itself, and could probably be made smart enough to file bugs when it
finds tests that only passed on the retry, although it's hard to
determine why the tests failed in the first place (i.e. maybe it's a
flaky test, or maybe unity8 crashed, etc.).
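To make the idea concrete, here is a minimal sketch of such a wrapper
(pure illustration; the log file name and command-line handling are
invented, not our actual runner):

    # Illustrative sketch only; not our actual runner script.
    import subprocess
    import sys

    def run_suite(cmd):
        """Run the test command once, return (exit code, combined output)."""
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr

    def run_with_retry(cmd):
        code, first_output = run_suite(cmd)
        if code == 0:
            return 0
        # First attempt failed: keep its output so the flaky candidates
        # stay visible, then retry once.
        with open("first-attempt.log", "w") as log:
            log.write(first_output)
        code, _ = run_suite(cmd)
        if code == 0:
            print("Suite passed on retry; the tests that failed initially "
                  "(flaky candidates) are recorded in first-attempt.log")
        return code

    if __name__ == "__main__":
        sys.exit(run_with_retry(sys.argv[1:]))

Mining the Jenkins data would then mostly amount to comparing the
first-attempt log against the final result.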
Francis
On Mon, Nov 4, 2013 at 3:43 AM, Vincent Ladeuil <vila+ci@xxxxxxxxxxxxx> wrote:
>>>>>> Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> writes:
>
> > Can someone provide the set of tests we're running that are flaky?
>
> There is a whole spectrum of flaky tests, and identifying them is an
> art. That's the theory ;)
>
> In practice, there are several ways to automatically identify flaky
> tests.
>
> Each failing test could be retried, and considered flaky only if it
> passes on the retry in the same context.
>
> Ha, ouch, "same context" is darn hard to guarantee: what if the test is
> flaky only when some other tests run before it?
>
> What if a flaky test always succeeds when run alone?
>
> What if the flakiness is caused by specific hardware (and by that I
> don't mean a brand or a product but a unique piece of hardware)?
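>
> As a purely illustrative sketch (the helper name and in-process
> approach are invented), the ordering question can at least be probed by
> re-running a failing test alone and then again after the tests that
> preceded it:
>
>     # Illustrative sketch only; names are invented.
>     import unittest
>
>     def probe_ordering(failing_id, preceding_ids):
>         """Run the failing test alone, then after the tests that ran
>         before it originally, and compare the two outcomes."""
>         loader = unittest.TestLoader()
>         alone = loader.loadTestsFromNames([failing_id])
>         in_context = loader.loadTestsFromNames(preceding_ids + [failing_id])
>         runner = unittest.TextTestRunner(verbosity=0)
>         return (runner.run(alone).wasSuccessful(),
>                 runner.run(in_context).wasSuccessful())
>
> If it passes alone but fails in context, the flakiness is
> ordering-dependent rather than purely random.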
>
>
> > If we don't have a good way of getting this list, could we wire up
> > job retries and log when that occurs?
>
> While I've encountered the above ones in real life, they are not the
> majority, so re-trying the test itself only requires a specific test
> runner (having all projects use such a test runner is achievable, if
> slowly, but who would refuse to be protected against flaky tests?). We
> can log such occurrences but the log will be disconnected from the test
> results.
>
> Alternatively, a job can be re-run with only the failing tests, and it
> becomes slightly easier to both report and process the flaky
> tests. This doesn't require a specific test runner but a subunit test
> result ;) And a specific test loader to select only the failing tests
> from a previous run[1].
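>
> For illustration only (plain unittest, in-process, invented names; the
> real thing would read the subunit stream instead), selecting and
> re-running just the previous failures could look like:
>
>     # Illustrative sketch only; a real implementation would consume
>     # a subunit stream rather than run everything in-process.
>     import unittest
>
>     class FailureCollector(unittest.TestResult):
>         """Record the ids of tests that failed or errored."""
>         def __init__(self):
>             super().__init__()
>             self.failed_ids = []
>
>         def addFailure(self, test, err):
>             super().addFailure(test, err)
>             self.failed_ids.append(test.id())
>
>         def addError(self, test, err):
>             super().addError(test, err)
>             self.failed_ids.append(test.id())
>
>     def rerun_failures(suite):
>         """Run the suite, then re-run only what failed; anything that
>         passes the second time is a flaky candidate."""
>         first = FailureCollector()
>         suite.run(first)
>         if not first.failed_ids:
>             return [], []
>         retry = unittest.TestLoader().loadTestsFromNames(first.failed_ids)
>         second = FailureCollector()
>         retry.run(second)
>         flaky = [t for t in first.failed_ids if t not in second.failed_ids]
>         return flaky, second.failed_ids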
>
> Even Jenkins should be able to display such results as long as we tweak
> the flaky test names (don't ever try to display the details of a test
> failure in Jenkins if the test you're interested in is named like a
> previous failure, you'll always get the latter ;). And even the
> dashboard could be taught to display green(pass)/yellow(flaky)/red(errors)
> instead of just a %.
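>
> Hypothetically (invented names, just to show the shape of it), both the
> renaming and the dashboard mapping are one-liners once the flaky set is
> known:
>
>     # Illustrative sketch only; names are invented.
>     def flaky_display_id(test_id, attempt):
>         """Give a retried test a distinct name so Jenkins does not
>         collapse it onto an earlier failure with the same id."""
>         return "%s (flaky, attempt %d)" % (test_id, attempt)
>
>     def dashboard_colour(failed, flaky):
>         """Red for real failures, yellow for flaky-only, green otherwise."""
>         return "red" if failed else ("yellow" if flaky else "green")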
>
> Note that unique test names, while not enforced by python's unittest,
> are a simple constraint we want for multiple reasons, the Jenkins one
> mentioned above being one of them; this is not a bug as far as I'm
> concerned, there is just no solution to ambiguous names: what if I tell
> you 'test X' is failing and you're looking at the code of test X (the
> other one)? A unique name is also needed when you want to select a
> single test to run.
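>
> Checking for duplicates is cheap enough to do up front; a rough sketch
> (helper name invented):
>
>     # Illustrative sketch only; the helper name is invented.
>     import collections
>     import unittest
>
>     def duplicate_test_ids(suite):
>         """Return test ids appearing more than once in a suite, since
>         unittest itself does not enforce uniqueness."""
>         counts = collections.Counter()
>         def walk(tests):
>             for item in tests:
>                 if isinstance(item, unittest.TestSuite):
>                     walk(item)
>                 else:
>                     counts[item.id()] += 1
>         walk(suite)
>         return sorted(tid for tid, n in counts.items() if n > 1)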
>
> Now, there are a few risks associated with automatically re-trying flaky
> tests:
>
> - people will care less about them (in the same way any dev starts
> ignoring warnings when too many pile up, making it impossible to notice
> the new ones),
>
> - the root cause of the flakiness can be in the CI engine, the test
> infrastructure or the code itself. The right people should be involved
> to fix them and, as of today, all causes are intermixed so nobody has
> a clear view of what *he* should fix.
>
> So getting rid of the flaky tests takes more than automatic retries;
> people should be involved to track and fix them.
>
> Vincent
>
> > Thanks!
>
> > On 31 October 2013 11:23, Julien Funk <julien.funk@xxxxxxxxxxxxx> wrote:
> >> So, it would be great if we could get a list of flaky tests re: the
> >> discussion today. I don't mind rerunning the tests automatically when
> >> they fail, but I think action should be mandatory on any flaky tests
> >> we discover, and we should maintain a list of them somewhere with a
> >> process to deal with them.
>
> Yeah, that's the social part ;) At least as important as the technical
> part, and probably far more ;)
>
> Vincent
>
>
> [1]: Test loaders, filters, results and runners already exist in
> lp:selenium-simple-test (and lp:subunit of course). Teaching the test
> runner to re-run failing tests would be easy to add.
>
--
Francis Ginther
Canonical - Ubuntu Engineering - Continuous Integration Team