
jiocloud-devops team mailing list archive

[Bug 1468176] Re: ct1 controller was restarted but we can't trace who started it

 

Some contextual info attached for completeness:

anshup [4:46 PM] 
soren: something (puppet?) is restarting contrail-collector.. ideas?

soren [4:46 PM] 
Puppet would very likely do it.

soren [4:46 PM]
It would say so in syslog.

anshup [4:47 PM] 
checking

anshup [4:52 PM]
soren: looks like it gets restarted during package upgrades... what's the right way to disable it?

soren [4:52 PM] 
Uninstall it.

soren [4:52 PM]
The philosophy is: If you install it, it's because you want it running.

amar [4:54 PM] 
in syslog it got restarted at

amar [4:54 PM]
Jun 19 06:38:21 ct1-production logger: contrail-collector start/running, process 44937
Jun 23 08:21:33 ct1-production logger: contrail-collector start/running, process 48287

amar [4:54 PM]
19 and today

anshup [4:54 PM] 
amar: yeah, look at the preceding lines; a package upgrade was happening at that time..

amar [4:55 PM] 
but it is not done by puppet it seems

amar [4:55 PM]
will have to check puppet logs

amar [4:57 PM]
tasks done by puppet contains puppet-user in log

anshup [4:57 PM] 
File_line[sensitive_service_contrail-collector]: !ruby/object:Puppet::Resource::Status
     resource: File_line[sensitive_service_contrail-collector]
     file: /usr/share/puppet/modules/rjil/manifests/system/sensitive_services/activator.pp
     line: 14
     evaluation_time: 0.000183772
     change_count: 0
     out_of_sync_count: 0
     tags: 
       - file_line
       - sensitive_service_contrail-collector
       - "rjil::system::sensitive_services::activator"
       - rjil
       - system
       - sensitive_services
       - activator
       - contrail-collector
       - class
       - "rjil::system::sensitive_services"
       - "rjil::base"
       - base
       - node
       - ctd
     time: 2015-06-23 08:22:04.843132 +00:00

anshup [4:57 PM]
amar: ^^ from puppet reports..

anshup [4:58 PM]
so it was running before this..? since change count is 0?

amar [4:58 PM] 
yup
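In a Puppet report, a resource status with `change_count: 0` and `out_of_sync_count: 0` means Puppet found that resource already in its desired state and changed nothing for it during that run (here, the run at 08:22:04, about 30 seconds after the 08:21:33 restart). A minimal sketch of pulling those two counters out of an excerpt like the one above; the helper names are hypothetical, and it uses plain text matching because the `!ruby/object` tags would need custom handling in a standard YAML loader.

```python
def parse_status_counts(excerpt):
    """Pull the integer change counters out of a Puppet resource-status excerpt."""
    counts = {}
    for raw in excerpt.splitlines():
        # Lines look like "change_count: 0"; other keys are ignored.
        key, sep, value = raw.strip().partition(": ")
        if sep and key in ("change_count", "out_of_sync_count"):
            counts[key] = int(value)
    return counts


def puppet_changed_resource(counts):
    """True if Puppet found the resource out of sync or changed it in this run."""
    return counts.get("change_count", 0) > 0 or counts.get("out_of_sync_count", 0) > 0
```

For the `File_line[sensitive_service_contrail-collector]` excerpt, both counters parse as 0, so `puppet_changed_resource` returns `False`: Puppet did not touch that resource in that run.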
anshup [5:01 PM]
Added Untitled in operation_issues
Jun 23 08:21:02 ct1-production logger: Unpacking contrail-dns (1.21+3288+38d2a21) over (1.21+3287+03bcb5c) ...
Jun 23 08:21:03 ct1-production logger: Preparing to unpack .../contrail-control_1.21+3288+38d2a21_amd64.deb ...
Jun 23 08:21:03 ct1-production kernel: [10183944.712468] init: contrail-control main process (44731) killed by TERM signal
Jun 23 08:21:03 ct1-production logger: contrail-control stop/waiting
Jun 23 08:21:03 ct1-production logger: Unpacking contrail-control (1.21+3288+38d2a21) over (1.21+3287+03bcb5c) ...
Jun 23 08:21:03 ct1-production logger: Preparing to unpack .../contrail-analytics_1.21+3288+38d2a21_amd64.deb ...
Jun 23 08:21:03 ct1-production kernel: [10183945.214814] init: contrail-analytics-api main process (44899) killed by TERM signal
Jun 23 08:21:03 ct1-production logger: contrail-analytics-api stop/waiting
+ 84 more lines...

anshup [5:02 PM] 
Jun 23 08:20:01 ct1-production logger: Yes, there is an update pending

anshup [5:02 PM]
so looks like it was triggered due to the upgrades
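The correlation done by eye here (a `start/running` line shortly after dpkg "Unpacking" lines) can be sketched as a small script. This is a hypothetical heuristic, not tooling from the thread: it labels each `<service> start/running` logger line with the nearest preceding cause inside a time window. It assumes the year 2015 (classic syslog timestamps omit it), an arbitrary 120-second window, and, following amar's note, that Puppet-initiated actions carry a `puppet-user` tag in syslog.

```python
import re
from datetime import datetime

# Assumption: logs are from 2015; classic syslog timestamps carry no year.
YEAR = 2015
START_RE = re.compile(r"logger: (\S+) start/running")
UPGRADE_MARKERS = ("Unpacking ", "Preparing to unpack ")


def parse_ts(line):
    """Parse the leading 'Mmm dd HH:MM:SS' syslog timestamp (first 15 chars)."""
    return datetime.strptime(f"{YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def attribute_restarts(lines, window_s=120):
    """Label each '<service> start/running' line with the nearest preceding cause."""
    results = []
    for i, line in enumerate(lines):
        match = START_RE.search(line)
        if not match:
            continue
        started = parse_ts(line)
        cause = "unknown"
        # Walk backwards from the start line; give up once outside the window.
        for prev in reversed(lines[:i]):
            if (started - parse_ts(prev)).total_seconds() > window_s:
                break
            if "puppet-user" in prev:  # assumed tag for Puppet-run actions
                cause = "puppet"
                break
            if any(marker in prev for marker in UPGRADE_MARKERS):
                cause = "package upgrade"
                break
        results.append((match.group(1), cause))
    return results
```

Fed the 08:21:02 to 08:21:33 lines from the snippet above, it attributes the `contrail-collector` start to a package upgrade; the Jun 19 start on its own, with no context lines, comes back as `unknown`, which is exactly the gap the bug description asks monitoring to close.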

amar [5:02 PM] 
how was the service stopped earlier, manually or through puppet?

anshup [5:03 PM] 
manually

anshup [5:03 PM]
afaik

anshup [5:03 PM]
soren: in this case, wouldn't it be better to just set the service to ensure stopped rather than uninstall it?

amar [5:05 PM] 
checking the whole flow of sensitive services

amar [5:10 PM] 
"/etc/sensitive_services" contains the names of services that should not be started during the initial bootstrapping package install

amar [5:11 PM]
and activator deletes the line from "/etc/sensitive_services"

amar [5:11 PM]
so in ct1 there is nothing in "/etc/sensitive_services"

anshup [5:11 PM] 
yes
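The flow amar describes can be modelled as a gate consulted whenever a package tries to start a service. The thread does not say how the gate is wired in; this sketch assumes it behaves like a Debian `policy-rc.d` hook (exit 101 forbids the action, exit 0 allows it), and all function names are hypothetical. It shows why an empty `/etc/sensitive_services` on ct1 lets a package upgrade restart `contrail-collector`.

```python
# Assumed policy-rc.d style exit codes: 0 = action allowed, 101 = forbidden.
ALLOWED, FORBIDDEN = 0, 101


def read_sensitive(text):
    """Parse /etc/sensitive_services contents: one service name per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def start_policy(service, action, sensitive):
    """Exit code the hypothetical gate returns for (service, action)."""
    if action in ("start", "restart") and service in sensitive:
        return FORBIDDEN
    return ALLOWED


def activate(service, sensitive):
    """Model of the activator: drop the service's line so it may start."""
    return sensitive - {service}
```

While `contrail-collector` is listed, `start_policy("contrail-collector", "start", ...)` returns 101; once the activator has removed its line (the state ct1 is in), every start or restart during an upgrade returns 0 and goes through.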

amar [5:12 PM] 
so what is the puppet report telling us?

amar [5:13 PM]
nothing is changed for the resource sensitive_service_contrail-collector

amar [5:13 PM]
which is fine

amar [5:14 PM]
but how did contrail-collector start?

anshup [5:15 PM] 
packages were upgraded and service restarted. right?

anshup [5:15 PM]
since there is nothing stopping it from starting

amar [5:15 PM] 
yup that could happen

anshup [5:16 PM] 
so we can remove collector from the service list before it's passed to the activator

amar [5:21 PM] 
we can do that, or we can remove collector itself :simple_smile:

amar [5:24 PM]
erb and activator are using same array

amar [5:26 PM]
@anshup^^

anshup [5:27 PM] 
yup.. so we can do a pop before passing it to collector
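Because the erb template and the activator share one array, popping a service from that array would also keep it out of `/etc/sensitive_services`. One way to read the suggestion is to filter only the copy handed to the activator, so a disabled service stays listed and therefore blocked. A minimal Python model of that data flow; the function names are hypothetical stand-ins for the Puppet template and activator, not the actual `rjil` module code.

```python
def render_sensitive_file(services):
    """Stand-in for the erb template: one service name per line."""
    return "".join(f"{s}\n" for s in services)


def services_to_activate(services, disabled):
    """Drop services that should stay stopped before handing the list on."""
    return [s for s in services if s not in disabled]


def run_activator(file_text, to_activate):
    """Stand-in for the activator: delete each activated service's line."""
    remaining = [line for line in file_text.splitlines() if line not in to_activate]
    return "".join(f"{line}\n" for line in remaining)
```

With `services = ["contrail-collector", "contrail-dns"]` and `disabled = {"contrail-collector"}`, the rendered file lists both, but after the activator runs only `contrail-collector` is left, so later package upgrades would still be prevented from starting it.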

soren [6:18 PM] 
anshup: Sure, making puppet disable and even uninstall the service if it's not enabled is a fine idea.

** Changed in: jio
       Status: New => Fix Released

** Changed in: jio
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Reliance
Jio DevOps, which is a bug assignee.
https://bugs.launchpad.net/bugs/1468176

Title:
  ct1 controller was restarted but we can't trace who started it

Status in Jio:
  Fix Released

Bug description:
  We probably need some sort of monitoring in place in order to track
  these events/commands being executed. Placeholder to discuss.

To manage notifications about this bug go to:
https://bugs.launchpad.net/jio/+bug/1468176/+subscriptions

