canonical-ci-engineering team mailing list archive


Re: proposal for next sprint

To: Francis Ginther <francis.ginther@xxxxxxxxxxxxx>
From: Thomi Richards <thomi.richards@xxxxxxxxxxxxx>
Date: Tue, 5 May 2015 08:49:13 +1200
Cc: canonical-ci-engineering <canonical-ci-engineering@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAB2r3jLBETXK=7NEoTA32iJ5nigOz=oz3ybRqjjiUjLOWHa5oQ@mail.gmail.com>

Hi,

Thanks for the reply, Francis,

On Tue, May 5, 2015 at 7:37 AM, Francis Ginther <francis.ginther@xxxxxxxxxxxxx> wrote:

>
> I think there are a number of statistical metrics we should be monitoring,
> and this should really be a part of sprint planning. Like all criteria, we
> need to have an idea of what metrics would be useful for the given
> solution. Attempting to come up with all possible metrics up front would
> lead to many that would only add noise. If we had had some basic
> metrics in place from the beginning (and monitored them), we would have had
> better insight into the impact of the cloud-config additions and, ideally,
> better performance comparisons with the existing VM solution.
>

Indeed. There's a card for that:

https://trello.com/c/cWbIbcDa/171-we-didn-t-consider-performance-metrics-as-we-developed-the-system

:D

I think we all understood the importance of logging data, but kind of
dropped the ball on stats data. In the future, I think we should treat
"logging & metrics" as integral parts of developing a new system.
Lesson learned!

>
> Celso has mentioned using ELK plugins for reporting metrics; this could be
> another alternative. I have not looked at this myself.
>

I'd love to get some more information on ELK plugins. I don't have much
experience with elasticsearch, and the little bit I tried to do (backing up
and restoring elasticsearch when we migrated the elk deployment to
production) proved to be tricky.

>
>
<snip>

> I've only had a chance to skim the resources so far. From past experience,
> push metrics worked for everything, but then again, when that's all that's
> available (thinking statsd/graphite), that's all you think about :-).
>
>
Right - there's a good FAQ answer here:
http://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push?

but it's important to note that Prometheus supports both. I imagine we'd want
'pull' for our core services (rabbit, logstash, kibana, and anything behind
a floating IP like adt-cloud-service), and 'push' for all our ephemeral
services and anything where we scale out to multiple nodes.
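To make that split a bit more concrete, here's a rough sketch (stdlib Python only, not an agreed design). It assumes a simple rule: long-lived services get scraped ('pull'), while anything ephemeral or scaled out pushes. It also shows what a pull target actually serves: counters rendered in Prometheus's plain-text exposition format. The `Service` fields, the `collection_mode` rule, the `tests_run_total` metric and the "adt-cloud-worker" service are all hypothetical. A real deployment would more likely use Prometheus's official Python client than hand-rolled rendering like this.

```python
from dataclasses import dataclass


# Hypothetical service description; fields are assumptions for illustration.
@dataclass
class Service:
    name: str
    ephemeral: bool   # short-lived; may exit before it could be scraped
    scaled_out: bool  # runs on multiple, changing nodes


def collection_mode(svc: Service) -> str:
    """Assumed rule: scrape ('pull') long-lived services, 'push' the rest."""
    return "push" if svc.ephemeral or svc.scaled_out else "pull"


def _escape_label(value: str) -> str:
    # Label values escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    """
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        if labels:
            label_str = ",".join(f'{k}="{_escape_label(v)}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    services = [
        Service("rabbitmq", ephemeral=False, scaled_out=False),
        Service("adt-cloud-service", ephemeral=False, scaled_out=False),
        Service("adt-cloud-worker", ephemeral=True, scaled_out=True),  # hypothetical
    ]
    for svc in services:
        print(svc.name, collection_mode(svc))
    print(render_counter("tests_run_total", "Test runs processed.",
                         {(("result", "pass"),): 3, (("result", "fail"),): 1}))
```

The pull side is just "serve that text on an HTTP endpoint and let Prometheus scrape it". The push side exists because a worker that finishes in seconds may never be around when the scrape happens, so it hands its final numbers to a gateway instead.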

>> So I'm curious - does anyone else see this need? What's the correct way to
>> propose work for the next sprint? I think this would be a nice piece of
>> work for someone to work on for the next few weeks. If no one else wants
>> to, I'll certainly volunteer myself...
>>
>
> I really like the utility we've established with logging to ELK. It's
> become quite painless to add logging content with rich metadata around it.
> If there is a metrics equivalent, I'm all for it.
>
>
Awesome.

I'm really keen to hear from the rest of the team as well. Anyone have any
insights here?

-- 
Thomi Richards
thomi.richards@xxxxxxxxxxxxx

Follow ups

Re: proposal for next sprint
From: Paul Larson, 2015-05-06
Re: proposal for next sprint
From: Evan Dandrea, 2015-05-06

References

proposal for next sprint
From: Thomi Richards, 2015-05-04
Re: proposal for next sprint
From: Francis Ginther, 2015-05-04