canonical-ci-engineering team mailing list archive
Message #01235
Re: Strange test failure on cyclops-node13
Hi,
Apologies for not getting back to you sooner.
[1] is the list of entries that were present in /tmp on the node
(cyclops-node13) when I first looked into it after the failures. For your
information, there were also a number of leftover xvfb processes owned by
pbuilder running on the same node (similar to [2], but more of them).
To deal with both of these leaks on every cyclops node, we now periodically
run a clean-up job [3] that, whenever a node is not running any jobs, removes
any leftover content from /tmp and kills any leftover processes owned by
pbuilder.
This should prevent this particular issue from recurring.
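The clean-up job [3] itself is not reproduced here. As a rough illustration of the /tmp half of it, the following POSIX sh sketch takes a narrower approach than wiping /tmp: it removes only X lock files whose server process is gone, together with the matching socket. It relies on the X server recording its PID in /tmp/.X<n>-lock and assumes Linux (liveness is checked via /proc/<pid>). The function name remove_stale_x_locks and its directory argument are hypothetical.

```shell
# Hypothetical sketch, not the cleanup-cyclops-nodes job itself.
# remove_stale_x_locks [DIR]
#   Scans DIR (default /tmp) for X display lock files. The X server writes
#   its PID into .X<n>-lock; if no /proc entry exists for that PID (Linux
#   only), the lock is left over from a dead server and is removed along
#   with the display socket .X11-unix/X<n>. An empty PID counts as stale.
remove_stale_x_locks() {
    dir=${1:-/tmp}
    for lock in "$dir"/.X*-lock; do
        # If nothing matches, the glob stays unexpanded: skip it.
        [ -f "$lock" ] || continue
        # The PID is stored padded with spaces and a trailing newline.
        pid=$(tr -d ' \n' < "$lock")
        if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
            continue    # server still running: leave it alone
        fi
        # Turn ".../.X<n>-lock" into "<n>".
        num=${lock##*/.X}
        num=${num%-lock}
        echo "removing stale display :$num"
        rm -f "$lock" "$dir/.X11-unix/X$num"
    done
}
```

Because locks held by live processes are kept, this is safer to run while jobs are active than a blanket removal. The flip side is that leftover X servers which are still running keep their locks, so such processes would still have to be killed first, as the job described above does.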
Thanks
[1]: http://paste.ubuntu.com/14003898/
[2]: http://paste.ubuntu.com/14004905/
[3]: http://s-jenkins.ubuntu-ci:8080/job/cleanup-cyclops-nodes/
On Mon, Dec 14, 2015 at 2:27 PM, Evan Dandrea <evan.dandrea@xxxxxxxxxxxxx>
wrote:
> Siva is looking into this and will update you with his progress. Thanks!
> On Mon, Dec 14, 2015 at 00:10, Michi Henning <michi.henning@xxxxxxxxxxxxx>
> wrote:
>
>> Could you fix node13 as a matter of urgency please? It’s been damn near
>> impossible for us to merge anything because we keep falling over the broken
>> builder.
>>
>> Thanks!
>>
>> Michi.
>>
>>
>> On 14 Dec 2015, at 11:41 , James Henstridge <
>> james.henstridge@xxxxxxxxxxxxx> wrote:
>>
>> On 14 December 2015 at 09:26, Michi Henning <michi.henning@xxxxxxxxxxxxx>
>> wrote:
>>
>>
>> On 12 Dec 2015, at 18:17 , James Henstridge <
>> james.henstridge@xxxxxxxxxxxxx>
>> wrote:
>>
>> So, looking at the xvfb-run man page, it sends the X server logs to
>> /dev/null by default:
>>
>>     -e file, --error-file=file
>>            Store output from xauth and Xvfb in file. The default is /dev/null.
>>
>> We should adjust the test to do something useful with those logs (e.g.
>> print them out if the test fails). Presumably the cause of the test
>> failure will be obvious once we can see that.
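One way to act on that suggestion, sketched in POSIX sh: run the test through xvfb-run with -e pointing at a temporary file, and print that file only if the test fails. The -a (--auto-servernum) and -e options are existing xvfb-run options; the helper name run_under_xvfb and the XVFB_RUN override (useful for substituting a stub) are hypothetical.

```shell
# Hypothetical helper: run a command under xvfb-run, capturing the
# xauth/Xvfb output (normally discarded to /dev/null) in a temporary file
# and printing it to stderr only if the command fails.
# XVFB_RUN may name a substitute for xvfb-run (e.g. a stub in tests).
run_under_xvfb() {
    xvfb_log=$(mktemp) || return 1
    # -a: pick a free server number; -e: file for xauth/Xvfb output.
    "${XVFB_RUN:-xvfb-run}" -a -e "$xvfb_log" "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "test failed (exit status $status); X server log:" >&2
        cat "$xvfb_log" >&2
    fi
    rm -f "$xvfb_log"
    return "$status"
}
```

It can stand in for an existing `xvfb-run -a <command>` invocation; since xvfb-run passes through the exit status of the command it runs, a failing test still fails the build, but now with the server's complaints attached.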
>>
>>
>> Did that. Here is what comes out:
>>
>> 8: Test command: /tmp/buildd/thumbnailer-2.3+16.04.20151102.2bzr314pkg0vivid283/tests/qml/run_test.sh "/tmp/buildd/thumbnailer-2.3+16.04.20151102.2bzr314pkg0vivid283/obj-arm-linux-gnueabihf/stderr.log" "/tmp/buildd/thumbnailer-2.3+16.04.20151102.2bzr314pkg0vivid283/obj-arm-linux-gnueabihf/plugins"
>> 8: Test timeout computed to be: 1500
>> 8: QXcbConnection: Could not connect to display :109
>> 8: Aborted
>> 8: _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
>> 8: _XSERVTransMakeAllCOTSServerListeners: server already running
>> 8: (EE)
>> 8: Fatal server error:
>> 8: (EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
>> 8: _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
>> 8: _XSERVTransMakeAllCOTSServerListeners: server already running
>> 8: (EE)
>>
>>
>> Here's the relevant code from the xvfb-run script:
>>
>> SERVERNUM=99
>>
>> # Find a free server number by looking at .X*-lock files in /tmp.
>> find_free_servernum() {
>>     # Sadly, the "local" keyword is not POSIX. Leave the next line commented in
>>     # the hope Debian Policy eventually changes to allow it in /bin/sh scripts
>>     # anyway.
>>     #local i
>>
>>     i=$SERVERNUM
>>     while [ -f /tmp/.X$i-lock ]; do
>>         i=$(($i + 1))
>>     done
>>     echo $i
>> }
>>
>>
>> So to get these results, the lock files /tmp/.X99-lock through
>> /tmp/.X108-lock must all have existed on the system, and the
>> /tmp/.X11-unix/X109 socket either already existed or couldn't be created
>> due to permission issues.
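To make that reasoning concrete, here is the same scan as find_free_servernum with the lock directory passed in as a parameter (a hypothetical change, so it can run against a scratch directory instead of /tmp). With stale locks for :99 through :108 in place it settles on 109, the display the test tried to use. The loop only looks at lock files, never at /tmp/.X11-unix, which is why a leftover X109 socket (or an unwritable socket directory) goes unnoticed until Xvfb itself fails.

```shell
# Same loop as find_free_servernum in xvfb-run, except that the lock
# directory (and optionally the starting number) are parameters: a
# hypothetical change so it can run against a scratch directory.
find_free_servernum_in() {
    dir=$1
    i=${2:-99}                  # xvfb-run's SERVERNUM default
    while [ -f "$dir/.X$i-lock" ]; do
        i=$((i + 1))
    done
    echo "$i"
}

# Recreate the suspected state of the node: stale locks for :99 to :108.
simulate_node13() {
    tmpdir=$(mktemp -d) || return 1
    n=99
    while [ "$n" -le 108 ]; do
        : > "$tmpdir/.X$n-lock"
        n=$((n + 1))
    done
    find_free_servernum_in "$tmpdir"    # only lock files are checked
    rm -rf "$tmpdir"
}
```

Running `simulate_node13` prints 109; on an empty directory `find_free_servernum_in` returns 99.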
>>
>> The code looks like it could be prone to race conditions if other jobs
>> were trying to start Xvfb servers at the same time, but that
>> wouldn't explain the repeated failures. It seems more likely that
>> some previous test run (or multiple runs) has left garbage behind
>> under /tmp.
>>
>> James.
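On the race-condition point: choosing a number and Xvfb creating the lock for it are separate steps, so two jobs can both pick the same "free" number. A common shell idiom for closing that window is to claim the number with mkdir, which atomically either creates the directory or fails. The sketch below assumes every job on the node follows the same hypothetical .xvfb-claim-<n> convention; it is not something xvfb-run does, and the function names are illustrative only.

```shell
# Hypothetical sketch: reserve an X display number atomically.
# mkdir creates the directory or fails if it already exists, in one step,
# so two jobs racing for the same number cannot both succeed.
# claim_servernum [DIR [START]]  -> prints the claimed number
claim_servernum() {
    dir=${1:-/tmp}
    i=${2:-99}                      # xvfb-run's default starting number
    while [ "$i" -lt 1000 ]; do     # arbitrary bound to avoid looping forever
        # Skip numbers already locked by an X server, then try to take
        # the claim; only one concurrent mkdir can succeed.
        if [ ! -f "$dir/.X$i-lock" ] &&
           mkdir "$dir/.xvfb-claim-$i" 2>/dev/null; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# release_servernum DIR NUM: drop a claim once Xvfb has exited.
release_servernum() {
    rmdir "$1/.xvfb-claim-$2"
}
```

A job would claim a number, hand it to xvfb-run with -n (--server-num), and release it afterwards. Claims left behind by crashed jobs leak just like lock files, so they would need the same kind of periodic clean-up.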
>>
>>
>> --
>> Mailing list: https://launchpad.net/~canonical-ci-engineering
>> Post to : canonical-ci-engineering@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~canonical-ci-engineering
>> More help : https://help.launchpad.net/ListHelp
>>
>
References
- Moving mako testing to krillin
  From: Francis Ginther, 2015-11-06
- Re: Moving mako testing to krillin
  From: Francis Ginther, 2015-11-10
- Move persistent-cache-cpp and persistent-cache-cpp-devel to xenial
  From: Michi Henning, 2015-12-09
- Re: Move persistent-cache-cpp and persistent-cache-cpp-devel to xenial
  From: Francis Ginther, 2015-12-10
- Re: Move persistent-cache-cpp and persistent-cache-cpp-devel to xenial
  From: Michi Henning, 2015-12-10
- Strange test failure on cyclops-node13
  From: Michi Henning, 2015-12-12
- Re: Strange test failure on cyclops-node13
  From: Michi Henning, 2015-12-12
- Re: Strange test failure on cyclops-node13
  From: Michi Henning, 2015-12-12
- Re: Strange test failure on cyclops-node13
  From: James Henstridge, 2015-12-12
- Re: Strange test failure on cyclops-node13
  From: Michi Henning, 2015-12-14
- Re: Strange test failure on cyclops-node13
  From: James Henstridge, 2015-12-14
- Re: Strange test failure on cyclops-node13
  From: Michi Henning, 2015-12-14
- Re: Strange test failure on cyclops-node13
  From: Evan Dandrea, 2015-12-14