jiocloud-devops team mailing list archive
Message #00080
[Bug 1468176] Re: ct1 controller was restarted but we can't trace who started it
Some contextual info attached for completeness:
anshup [4:46 PM]
soren: something (puppet?) is restarting contrail-collector.. ideas?
soren [4:46 PM]
Puppet would very likely do it.
soren [4:46 PM]
It would say so in syslog.
anshup [4:47 PM]
checking
anshup [4:52 PM]
soren: looks like it gets restarted during package upgrades.. what's the right way to disable it?
soren [4:52 PM]
Uninstall it.
soren [4:52 PM]
The philosophy is: If you install it, it's because you want it running.
amar [4:54 PM]
in syslog it got restarted at
amar [4:54 PM]
Jun 19 06:38:21 ct1-production logger: contrail-collector start/running, process 44937
Jun 23 08:21:33 ct1-production logger: contrail-collector start/running, process 48287
amar [4:54 PM]
19 and today
anshup [4:54 PM]
amar: yeah, look at the preceding lines, package upgrade was happening at that time..
amar [4:55 PM]
but it is not done by puppet it seems
amar [4:55 PM]
will have to check puppet logs
amar [4:57 PM]
tasks done by puppet contain puppet-user in the log
anshup [4:57 PM]
File_line[sensitive_service_contrail-collector]: !ruby/object:Puppet::Resource::Status
resource: File_line[sensitive_service_contrail-collector]
file: /usr/share/puppet/modules/rjil/manifests/system/sensitive_services/activator.pp
line: 14
evaluation_time: 0.000183772
change_count: 0
out_of_sync_count: 0
tags:
- file_line
- sensitive_service_contrail-collector
- "rjil::system::sensitive_services::activator"
- rjil
- system
- sensitive_services
- activator
- contrail-collector
- class
- "rjil::system::sensitive_services"
- "rjil::base"
- base
- node
- ctd
time: 2015-06-23 08:22:04.843132 +00:00
anshup [4:57 PM]
amar: ^^ from puppet reports..
anshup [4:58 PM]
so it was running before this..? since change count is 0?
amar [4:58 PM]
yup
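The reasoning above (a change count of 0 means this Puppet run did not modify the resource) can be sketched as a small check on a report fragment. This is a minimal sketch, not the team's tooling: the function names are hypothetical, and it reads only the two counter fields shown in the report excerpt with a regular expression instead of loading the full Ruby-tagged YAML.

```python
import re


def resource_counts(fragment):
    """Extract the integer counters shown in a resource status fragment.

    Hypothetical helper: only looks for the two fields quoted in the chat.
    """
    counts = {}
    for key in ("change_count", "out_of_sync_count"):
        match = re.search(rf"^\s*{key}:\s*(\d+)\s*$", fragment, re.MULTILINE)
        if match:
            counts[key] = int(match.group(1))
    return counts


def puppet_changed(fragment):
    """Return True/False if change_count is present, None if it is missing."""
    counts = resource_counts(fragment)
    if "change_count" not in counts:
        return None
    return counts["change_count"] > 0


# Excerpt of the fields quoted in the chat above.
FRAGMENT = """\
resource: File_line[sensitive_service_contrail-collector]
evaluation_time: 0.000183772
change_count: 0
out_of_sync_count: 0
"""

if __name__ == "__main__":
    print(resource_counts(FRAGMENT))
    print(puppet_changed(FRAGMENT))
```

A False result only says that this resource was already in its desired state during the run; it says nothing about what else touched the service.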
anshup [5:01 PM]
Added Untitled in operation_issues
Jun 23 08:21:02 ct1-production logger: Unpacking contrail-dns (1.21+3288+38d2a21) over (1.21+3287+03bcb5c) ...
Jun 23 08:21:03 ct1-production logger: Preparing to unpack .../contrail-control_1.21+3288+38d2a21_amd64.deb ...
Jun 23 08:21:03 ct1-production kernel: [10183944.712468] init: contrail-control main process (44731) killed by TERM signal
Jun 23 08:21:03 ct1-production logger: contrail-control stop/waiting
Jun 23 08:21:03 ct1-production logger: Unpacking contrail-control (1.21+3288+38d2a21) over (1.21+3287+03bcb5c) ...
Jun 23 08:21:03 ct1-production logger: Preparing to unpack .../contrail-analytics_1.21+3288+38d2a21_amd64.deb ...
Jun 23 08:21:03 ct1-production kernel: [10183945.214814] init: contrail-analytics-api main process (44899) killed by TERM signal
Jun 23 08:21:03 ct1-production logger: contrail-analytics-api stop/waiting
+ 84 more lines...
anshup [5:02 PM]
Jun 23 08:20:01 ct1-production logger: Yes, there is an update pending
anshup [5:02 PM]
so looks like it was triggered due to the upgrades
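The manual correlation done above (a "start/running" line shortly after package unpack lines) can be sketched as a small syslog scan. This is a rough illustration under stated assumptions: the function names, the two-minute window, and the year (syslog lines carry none; 2015 is taken from the Puppet report timestamp) are all choices made for the example, and the sample lines are copied from this thread.

```python
import re
from datetime import datetime, timedelta

# "Jun 23 08:21:33 ct1-production logger: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) ([^:]+): (.*)$")


def parse(line, year=2015):
    """Return (timestamp, message) or None. The year is an assumption."""
    match = LINE_RE.match(line)
    if not match:
        return None
    stamp, _host, _tag, message = match.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, message


def starts_near_upgrades(lines, window=timedelta(minutes=2)):
    """List (service, time, near_upgrade) for every 'start/running' line.

    near_upgrade is True when a dpkg unpack line was logged at most
    `window` before the start event.
    """
    last_unpack = None
    results = []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        when, message = parsed
        if message.startswith(("Unpacking ", "Preparing to unpack ")):
            last_unpack = when
        elif " start/running" in message:
            service = message.split(" ", 1)[0]
            near = last_unpack is not None and when - last_unpack <= window
            results.append((service, when, near))
    return results


SAMPLE = [
    "Jun 23 08:21:03 ct1-production logger: Unpacking contrail-control (1.21+3288+38d2a21) over (1.21+3287+03bcb5c) ...",
    "Jun 23 08:21:33 ct1-production logger: contrail-collector start/running, process 48287",
]

if __name__ == "__main__":
    for service, when, near in starts_near_upgrades(SAMPLE):
        print(service, when.isoformat(), "near upgrade" if near else "unexplained")
```

A start event flagged as "unexplained" would be the interesting case for the monitoring this bug asks for; the scan does not identify who issued a command, only whether an upgrade was in progress.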
amar [5:02 PM]
how the service was stopped earlier, manually or through puppet
amar [5:02 PM]
?
anshup [5:03 PM]
manually
anshup [5:03 PM]
afaik
anshup [5:03 PM]
soren: in this case I guess it would be better to just do service ensure stopped than uninstall?
amar [5:05 PM]
checking the whole flow of sensitive services
amar [5:10 PM]
"/etc/sensitive_services" contains the names of services which should not be started during the initial bootstrapping package install
amar [5:11 PM]
and activator deletes the line from "/etc/sensitive_services"
amar [5:11 PM]
so in ct1 there is nothing in "/etc/sensitive_services"
anshup [5:11 PM]
yes
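The flow described above can be modelled in a few lines. This is a sketch only: it assumes "/etc/sensitive_services" holds one service name per line, the function names are hypothetical, and it works on the file's text instead of touching the real file or reproducing how the package scripts consult it.

```python
def blocked_services(text):
    """Service names currently listed, assuming one name per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def may_start(service, text):
    """A service may start on package install only if it is not listed."""
    return service not in blocked_services(text)


def activate(service, text):
    """Mimic the activator: drop the service's line, keep everything else."""
    kept = [line for line in text.splitlines() if line.strip() != service]
    return "".join(line + "\n" for line in kept)


if __name__ == "__main__":
    listing = "contrail-collector\ncontrail-control\n"
    print(may_start("contrail-collector", listing))   # listed, so blocked
    listing = activate("contrail-collector", listing)
    print(may_start("contrail-collector", listing))   # line removed
    print(may_start("contrail-collector", ""))        # empty file, as on ct1
```

With an empty file, as reported for ct1, nothing is blocked, which matches the observation that a package upgrade was free to start contrail-collector again.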
amar [5:12 PM]
so what is the puppet report telling us
amar [5:13 PM]
nothing is changed for the resource sensitive_service_contrail-collector
amar [5:13 PM]
which is fine
amar [5:14 PM]
but how contrail-collector started
anshup [5:15 PM]
packages were upgraded and service restarted. right?
anshup [5:15 PM]
since there is nothing stopping it from starting
amar [5:15 PM]
yup that could happen
anshup [5:16 PM]
so we can remove collector from service list before it's passed to activator
amar [5:21 PM]
we can do that, or we can remove collector itself :simple_smile:
amar [5:24 PM]
erb and activator are using same array
amar [5:26 PM]
@anshup^^
anshup [5:27 PM]
yup.. so we can do a pop before passing it to collector
soren [6:18 PM]
anshup: Sure, making puppet disable and even uninstall the service if it's not enabled is a fine idea.
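The idea discussed above (the erb template and the activator share one array, so the collector is dropped only from the copy handed to the activator) might look like the following. It is a Python illustration of the list handling, not the Puppet change that was actually released; the function name and the service list are hypothetical.

```python
def split_service_lists(services, keep_blocked):
    """Return (listed, to_activate).

    listed      -> what the template would write to /etc/sensitive_services
    to_activate -> what the activator would later remove from that file
    Services in keep_blocked stay listed and are never activated.
    """
    blocked = set(keep_blocked)
    to_activate = [name for name in services if name not in blocked]
    return list(services), to_activate


if __name__ == "__main__":
    services = ["contrail-control", "contrail-collector", "contrail-analytics-api"]
    listed, to_activate = split_service_lists(services, ["contrail-collector"])
    print(listed)
    print(to_activate)
```

Filtering only the activator's copy keeps the collector's line in the file, so a later package upgrade would still find it blocked. Soren's alternative of having Puppet disable or uninstall the service is a separate change and is not covered by this sketch.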
** Changed in: jio
Status: New => Fix Released
** Changed in: jio
Importance: Undecided => Low
--
You received this bug notification because you are a member of Reliance
Jio DevOps, which is a bug assignee.
https://bugs.launchpad.net/bugs/1468176
Title:
ct1 controller was restarted but we can't trace who started it
Status in Jio:
Fix Released
Bug description:
We probably need some sort of monitoring in place in order to track
these events/commands being executed. Placeholder to discuss.
To manage notifications about this bug go to:
https://bugs.launchpad.net/jio/+bug/1468176/+subscriptions