
canonical-ci-engineering team mailing list archive

Re: Flaky tests

 

>>>>> Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> writes:

    > Can someone provide the set of tests we're running that are flaky?

There is a whole spectrum of flaky tests, and identifying them is an
art. That's the theory ;)

In practice, there are several ways to automatically identify flaky
tests.

Each failing test could be retried and considered a genuine failure
only if it fails again in the same context; if it passes on the retry,
it's flaky.
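
Something along these lines would do, as a bare sketch (this is not the
runner we actually have; run_test is a made-up helper standing in for
whatever executes a single test):

    def classify(test_id, run_test, retries=1):
        """Return 'pass', 'fail' or 'flaky' for one test (sketch only)."""
        if run_test(test_id):
            return 'pass'
        # The test failed: re-run it in (as much as possible) the same context.
        for _ in range(retries):
            if run_test(test_id):
                # Failed once, passed on a retry: that's the flaky signature.
                return 'flaky'
        # Failed every time: treat it as a genuine failure.
        return 'fail'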

Ha, ouch: "same context" is darn hard to guarantee. What if the test is
flaky only when some other tests run before it?

What if a flaky test always succeeds when run alone?

What if the flakiness is caused by specific hardware (and by that I
don't mean a brand or a product, but one unique piece of hardware)?


    > If we don't have a good way of getting this list, could we wire up
    > job retries and log when that occurs?

While I've encountered all of the above in real life, they are not the
majority of cases, so re-trying the failing test itself only requires a
specific test runner (getting all projects to use such a runner is
achievable, if slowly; who would refuse to be protected against flaky
tests?). We can log such occurrences, but the log will be disconnected
from the test results.

Alternatively, a job can be re-run with only the failing tests, and it
becomes slightly easier to both report and process the flaky
tests. This doesn't require a specific test runner, only a subunit test
result ;) And a specific test loader to select only the failing tests
from a previous run[1].
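
As a rough illustration (this is not the sst or subunit code itself,
and the file format below is invented for the example), such a loader
could be as simple as reading one test id per line from the previous
run and loading only those:

    import unittest

    def load_failing(previous_failures_path):
        # One dotted test id per line, e.g. package.module.TestClass.test_name;
        # the real thing would extract them from the previous subunit stream.
        with open(previous_failures_path) as f:
            test_ids = [line.strip() for line in f if line.strip()]
        loader = unittest.TestLoader()
        suite = unittest.TestSuite()
        for test_id in test_ids:
            # loadTestsFromName accepts dotted names down to a single method.
            suite.addTests(loader.loadTestsFromName(test_id))
        return suite

    # unittest.TextTestRunner().run(load_failing('failing-tests.txt'))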

Even jenkins should be able to display such results as long as we tweak
the flaky test names (don't ever try to display the details of a test
failure in jenkins if the test you're interested in is named like a
previous failure, you'll always get the latter ;). And even the dashboard
could be taught to display green (pass) / yellow (flaky) / red (errors)
instead of just a %.
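
Purely to illustrate (every name below is made up for the example), the
name tweak and the three-colour summary can be as dumb as:

    from collections import Counter

    def flaky_name(test_id, attempt):
        # Give each retry of a flaky test its own id so jenkins doesn't
        # collapse it onto a previous failure with the same name.
        return "%s (flaky, retry %d)" % (test_id, attempt)

    def summarize(results):
        # results: iterable of (test_id, status) pairs,
        # status in 'pass', 'flaky' or 'error'.
        counts = Counter(status for _, status in results)
        return {'green': counts['pass'],
                'yellow': counts['flaky'],
                'red': counts['error']}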

Note that unique test names, while not enforced by python's unittest,
are a simple constraint we want for multiple reasons, the jenkins one
mentioned above among them; this is not a bug as far as I'm concerned,
there is simply no fix for ambiguous names: what if I tell you 'test X'
is failing and you're looking at the code of test X (the other one)? A
unique name is also needed when you want to select a single test to run.
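
Checking for that is cheap; a sketch of a duplicate-id check over a
loaded suite (again, just an illustration, not something we run today):

    import unittest
    from collections import Counter

    def iter_tests(suite):
        # Walk nested TestSuites down to the individual test cases.
        for item in suite:
            if isinstance(item, unittest.TestSuite):
                for test in iter_tests(item):
                    yield test
            else:
                yield item

    def duplicate_ids(suite):
        """Return the test ids that appear more than once in the suite."""
        counts = Counter(test.id() for test in iter_tests(suite))
        return [test_id for test_id, n in counts.items() if n > 1]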

Now, there are a few risks associated with automatically re-trying flaky
tests:

- people will care less about them (in the same way any dev starts
  ignoring warnings when too many pile up, making it impossible to
  notice the new ones),

- the root cause of the flakiness can be in the ci engine, the test
  infrastructure or the code itself. The right people should be involved
  to fix them, and as of today all causes are inter-mixed, so nobody has
  a clear view of what *he* should fix.

So getting rid of flaky tests takes more than automatic retries: people
should be involved to track and fix them.

       Vincent

    > Thanks!

    > On 31 October 2013 11:23, Julien Funk <julien.funk@xxxxxxxxxxxxx> wrote:
    >> so, it would be great if we could get a list of flaky tests re: discussion
    >> today.  I don't mind rerunning the tests automatically when they fail, but I
    >> think action should be mandatory on any flaky tests we discover and we
    >> should maintain a list of them somewhere with a process to deal
    >> with them.

Yeah, that's the social part ;) At least as important as, and probably
far more important than, the technical part ;)

     Vincent


[1]: Test loaders, filters, results and runners already exist in
lp:selenium-simple-test (and lp:subunit of course). Teaching the test
runner to re-run failing tests could be added easily.

