
canonical-ci-engineering team mailing list archive

Re: Strange test failure on cyclops-node13

 

Hi,

Apologies for not coming back to you earlier.

[1] is the list of entries that were present in /tmp on the node
(cyclops-13) when I first looked into it after the failures. For your
information, there were also some leftover xvfb processes (similar to [2],
but more of them) owned by pbuilder running on the same node.

To take care of both of these leaks on every cyclops node, we now
periodically run a clean-up job [3] that removes any leftover content from
/tmp while a node is not running any jobs and kills any leftover
processes owned by pbuilder.
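
For illustration only, here is a minimal sketch of the two steps such a
clean-up could perform. It is not the actual contents of the job in [3]; the
function names clean_tmp and kill_leftovers are hypothetical, and the check
that a node is idle is left out because how the job determines that is not
shown here. The find options -mindepth and -maxdepth assume GNU findutils.

```shell
#!/bin/sh
# Hypothetical sketch, not the real cleanup-cyclops-nodes job.

# Remove everything directly under the given directory, including dotfiles
# such as .X99-lock and .X11-unix, while keeping the directory itself.
clean_tmp() {
    dir=$1
    [ -d "$dir" ] || return 1
    find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
}

# Ask every process owned by the given user to exit, then force-kill
# whatever is still running after a short grace period.
kill_leftovers() {
    user=$1
    # pkill exits non-zero when nothing matches; that is not an error here.
    pkill -u "$user" 2>/dev/null || true
    sleep 5
    pkill -KILL -u "$user" 2>/dev/null || true
}
```

On an idle node this would be invoked as "clean_tmp /tmp" followed by
"kill_leftovers pbuilder".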

This should take care of this particular issue in the future.
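
If it is useful for spotting the same problem on other nodes, the sketch
below lists display numbers whose lock file names a process that no longer
exists. It relies on the X server writing its PID into /tmp/.X<n>-lock and on
/proc being available (Linux); the function name stale_x_locks is
hypothetical and this is not part of the clean-up job.

```shell
#!/bin/sh
# Hypothetical helper: print the display numbers of stale X lock files.
stale_x_locks() {
    dir=${1:-/tmp}
    for lock in "$dir"/.X*-lock; do
        [ -f "$lock" ] || continue
        # The lock file holds the server PID, space-padded, plus a newline.
        pid=$(tr -d ' \n' < "$lock")
        case $pid in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a PID
        esac
        # No /proc entry means the owning process is gone.
        if [ ! -d "/proc/$pid" ]; then
            name=${lock##*/.X}
            echo "${name%-lock}"
        fi
    done
}
```

Running "stale_x_locks /tmp" prints one display number per line; an empty
result means every lock file still belongs to a running process.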

Thanks



[1]: http://paste.ubuntu.com/14003898/
[2]: http://paste.ubuntu.com/14004905/
[3]: http://s-jenkins.ubuntu-ci:8080/job/cleanup-cyclops-nodes/

On Mon, Dec 14, 2015 at 2:27 PM, Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx>
wrote:

> Siva is looking into this and will update you with his progress. Thanks!
> On Mon, Dec 14, 2015 at 00:10, Michi Henning <michi.henning@xxxxxxxxxxxxx>
> wrote:
>
>> Could you fix node13 as a matter of urgency please? It’s been damn near
>> impossible for us to merge anything because we keep falling over the broken
>> builder.
>>
>> Thanks!
>>
>> Michi.
>>
>>
>> On 14 Dec 2015, at 11:41 , James Henstridge <
>> james.henstridge@xxxxxxxxxxxxx> wrote:
>>
>> On 14 December 2015 at 09:26, Michi Henning <michi.henning@xxxxxxxxxxxxx>
>> wrote:
>>
>>
>> On 12 Dec 2015, at 18:17 , James Henstridge <
>> james.henstridge@xxxxxxxxxxxxx>
>> wrote:
>>
>> So, looking at the xvfb-run man page, it sends the X server logs to
>> /dev/null by default:
>>
>>      -e file, --error-file=file
>>             Store output from xauth and Xvfb in file.  The default is
>>             /dev/null.
>>
>> We should adjust the test to do something useful with those logs (e.g.
>> print them out if the test fails).  Presumably the cause of the test
>> failure will be obvious once we can see that.
>>
>>
>> Did that. Here is what comes out:
>>
>> 8: Test command:
>> /tmp/buildd/thumbnailer-2.3+16.04.20151102.2bzr314pkg0vivid283/tests/qml/run_test.sh
>> "/tmp/buildd/thumbnailer-2.3+16.04.20151102.2bzr314pkg0vivid283/obj-arm-linux-gnueabihf/stderr.log"
>> "/tmp/buildd/thumbnailer-2.3+16.04.20151102.2bzr314pkg0vivid283/obj-arm-linux-gnueabihf/plugins"
>> 8: Test timeout computed to be: 1500
>> 8: QXcbConnection: Could not connect to display :109
>> 8: Aborted
>> 8: _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
>> 8: _XSERVTransMakeAllCOTSServerListeners: server already running
>> 8: (EE)
>> 8: Fatal server error:
>> 8: (EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
>> 8: _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
>> 8: _XSERVTransMakeAllCOTSServerListeners: server already running
>> 8: (EE)
>>
>>
>> Here's the relevant code from xvfb-run script:
>>
>> SERVERNUM=99
>>
>> # Find a free server number by looking at .X*-lock files in /tmp.
>> find_free_servernum() {
>>    # Sadly, the "local" keyword is not POSIX.  Leave the next line
>>    # commented in the hope Debian Policy eventually changes to allow it
>>    # in /bin/sh scripts anyway.
>>    #local i
>>
>>    i=$SERVERNUM
>>    while [ -f /tmp/.X$i-lock ]; do
>>        i=$(($i + 1))
>>    done
>>    echo $i
>> }
>>
>>
>> So to get these results, there must have been /tmp/.X99-lock to
>> /tmp/.X108-lock files on the system, and the /tmp/.X11-unix/X109
>> socket either existed or couldn't be created due to permission issues.
>>
>> The code looks like it could be prone to race conditions if other jobs
>> were trying to start Xvfb servers at the same time, but that
>> wouldn't explain the repeated failures.  It seems more likely that
>> some previous test run (or multiple runs) has left garbage behind
>> under /tmp.
>>
>> James.
>>
>>
>> --
>> Mailing list: https://launchpad.net/~canonical-ci-engineering
>> Post to     : canonical-ci-engineering@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~canonical-ci-engineering
>> More help   : https://help.launchpad.net/ListHelp
>>
>
