
canonical-ci-engineering team mailing list archive

Re: Flaky tests

 

Adding the T&T group to this thread as they will be making progress in step
with CI towards conquering flaky tests :)

~J


On Mon, Nov 4, 2013 at 1:36 PM, Paul Larson <paul.larson@xxxxxxxxxxxxx> wrote:

> Here's a list I started, with a brief rationale for each:
>
> https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AjwxZmhDIclsdFdiV2c3dFhGYWpfckhqT1N1ZWpUY1E&usp=sharing
>
> Certainly things could change per image here; this is just to give us an
> easy way to make a list.
>
>
> On Mon, Nov 4, 2013 at 10:53 AM, Francis Ginther <
> francis.ginther@xxxxxxxxxxxxx> wrote:
>
>> A possible first-step solution would be to add a very simple retry to
>> our CI test runner scripts: if the retry passes, the test passes.
>> The harder part would be to identify the test cases which failed on
>> the original attempt and make those visible. This could be done by
>> mining the jenkins data itself, and could probably be made smart enough
>> to file bugs when it finds tests that passed on the retry, although it's
>> hard to determine why a test failed in the first place (i.e. maybe it's
>> a flaky test or maybe unity8 crashed, etc.).
>>
>> Francis
>>
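A minimal sketch of that retry idea in plain Python unittest terms (the
run_with_retry helper and its return value are made up for illustration,
not how the CI runner scripts actually work): run the suite once, re-run
only the failures, and keep the tests that only passed on the second
attempt visible as flaky.

    import unittest

    def run_with_retry(suite):
        """Run a suite once, re-run its failures, and report which tests
        only passed on the second attempt (i.e. look flaky)."""
        loader = unittest.TestLoader()
        first = unittest.TextTestRunner(verbosity=0).run(suite)
        failed_ids = [t.id() for t, _ in first.failures + first.errors]
        flaky, genuine = [], []
        for test_id in failed_ids:
            retry = unittest.TextTestRunner(verbosity=0).run(
                loader.loadTestsFromName(test_id))
            (flaky if retry.wasSuccessful() else genuine).append(test_id)
        return flaky, genuine

Whatever ends up in the flaky list is exactly what Francis describes
wanting to make visible (and possibly file bugs for).
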
>> On Mon, Nov 4, 2013 at 3:43 AM, Vincent Ladeuil <vila+ci@xxxxxxxxxxxxx>
>> wrote:
>> >>>>>> Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx> writes:
>> >
>> >     > Can someone provide the set of tests we're running that are flaky?
>> >
>> > There is a whole spectrum of flaky tests, and identifying them is an
>> > art. That's the theory ;)
>> >
>> > In practice, there are several ways to automatically identify flaky
>> > tests.
>> >
>> > Each failing test could be retried and considered flaky only if it
>> > passes on the retry in the same context.
>> >
>> > Ha, ouch, "same context" is darn hard to guarantee. What if the test is
>> > flaky only when some other tests run before it?
>> >
>> > What if a flaky test always succeeds when run alone?
>> >
>> > What if the flakiness is caused by a specific piece of hardware (and by
>> > that I don't mean a brand or a product, but one unique piece of hardware)?
>> >
>> >
>> >     > If we don't have a good way of getting this list, could we wire up
>> >     > job retries and log when that occurs?
>> >
>> > While I've encountered the cases above in real life, they are not the
>> > majority, so re-trying the test itself is a reasonable default. It only
>> > requires a specific test runner (having all projects use such a runner
>> > is achievable, slowly, but who would refuse to be protected against
>> > flaky tests?). We can log such occurrences, but the log will be
>> > disconnected from the test results.
>> >
>> > Alternatively, a job can be re-run with only the failing tests, which
>> > makes it slightly easier to both report and process the flaky
>> > tests. This doesn't require a specific test runner, but it does require
>> > a subunit test result ;) And a specific test loader to select only the
>> > failing tests from a previous run[1].
>> >
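For the "specific test loader" part, the selection itself can stay very
small. Assuming the failing test ids from the previous run have already
been extracted (e.g. from a stored subunit stream), a hypothetical helper
like this would rebuild a suite containing only those tests:

    import unittest

    def load_failing_tests(failing_ids):
        """Build a suite containing only the tests that failed in the
        previous run; failing_ids is a list of dotted test names."""
        loader = unittest.TestLoader()
        suite = unittest.TestSuite()
        for test_id in failing_ids:
            suite.addTests(loader.loadTestsFromName(test_id))
        return suite
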
>> > Even jenkins should be able to display such results, as long as we tweak
>> > the flaky test names (don't ever try to display the details of a test
>> > failure in jenkins if the test you're interested in is named like a
>> > previous failure: you'll always get the latter ;). And even the dashboard
>> > could be taught to display green (pass) / yellow (flaky) / red (errors)
>> > instead of just a percentage.
>> >
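One way the name tweaking could look, sketched against plain JUnit XML
(the "(flaky)" suffix and the mark_flaky helper are illustrative
assumptions, not an existing tool): rename the retried-and-passed cases
so jenkins shows them under their own entry rather than folding them into
a previous failure of the same name.

    import xml.etree.ElementTree as ET

    def mark_flaky(junit_xml_path, flaky_ids):
        """Append a suffix to the test cases that only passed on retry so
        they get a distinct name in the jenkins test report."""
        tree = ET.parse(junit_xml_path)
        for case in tree.iter("testcase"):
            test_id = "%s.%s" % (case.get("classname"), case.get("name"))
            if test_id in flaky_ids:
                case.set("name", case.get("name") + " (flaky)")
        tree.write(junit_xml_path)
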
>> > Note that unique test names, while not enforced by python's unittest,
>> > are a simple constraint we want for multiple reasons, the jenkins one
>> > mentioned above being one of them. This is not a bug as far as I'm
>> > concerned; there is just no workaround for ambiguous names: what if I
>> > tell you 'test X' is failing and you're looking at the code of test X
>> > (the other one)? A unique name is also needed when you want to select a
>> > single test to run.
>> >
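That constraint is cheap to check. A rough sketch (iter_tests and
find_duplicate_ids are hypothetical names, not part of unittest) that
walks a suite and flags any ambiguous ids:

    import collections
    import unittest

    def iter_tests(suite):
        """Yield the individual test cases from a (possibly nested) suite."""
        for item in suite:
            if isinstance(item, unittest.TestSuite):
                for test in iter_tests(item):
                    yield test
            else:
                yield item

    def find_duplicate_ids(suite):
        """Return the test ids that appear more than once, i.e. the names
        that would be ambiguous in a failure report."""
        counts = collections.Counter(test.id() for test in iter_tests(suite))
        return sorted(name for name, count in counts.items() if count > 1)
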
>> > Now, there are a few risks associated with automatically re-trying flaky
>> > tests:
>> >
>> > - people will care less about them (in the same way any dev starts
>> >   ignoring warnings once there are too many to notice the new ones),
>> >
>> > - the root cause of the flakiness can be in the ci engine, the test
>> >   infrastructure or the code itself. The right people should be involved
>> >   to fix each of them, and as of today all causes are intermixed, so
>> >   nobody has a clear view of what *they* should fix.
>> >
>> > So getting rid of flaky tests takes more than automatic retries; people
>> > need to be involved to track and fix them.
>> >
>> >        Vincent
>> >
>> >     > Thanks!
>> >
>> >     > On 31 October 2013 11:23, Julien Funk <julien.funk@xxxxxxxxxxxxx> wrote:
>> >     >> so, it would be great if we could get a list of flaky tests re:
>> >     >> discussion today.  I don't mind rerunning the tests automatically
>> >     >> when they fail, but I think action should be mandatory on any
>> >     >> flaky tests we discover and we should maintain a list of them
>> >     >> somewhere with a process to deal with them.
>> >
>> > Yeah, that's the social part ;) At least as important as, and probably
>> > far more important than, the technical part ;)
>> >
>> >      Vincent
>> >
>> >
>> > [1]: Test loaders, filters, results and runners already exist in
>> > lp:selenium-simple-test (and lp:subunit of course). Teaching the test
>> > runner to re-run failing tests would be an easy addition.
>> >
>>
>>
>>
>> --
>> Francis Ginther
>> Canonical - Ubuntu Engineering - Continuous Integration Team
>>
>> --
>> Mailing list: https://launchpad.net/~canonical-ci-engineering
>> Post to     : canonical-ci-engineering@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~canonical-ci-engineering
>> More help   : https://help.launchpad.net/ListHelp
>>
>
>
