canonical-ci-engineering team mailing list archive
Message #01085
Re: proposal for next sprint
Hi,
Thanks for the reply Francis,
On Tue, May 5, 2015 at 7:37 AM, Francis Ginther <
francis.ginther@xxxxxxxxxxxxx> wrote:
>
> I think there are a number of statistical metrics we should be monitoring,
> and this should really be a part of sprint planning. Like all criteria, we
> need to have an idea of what metrics would be useful for the given
> solution. Attempting to come up with all possible metrics up front would
> lead to many that would only add noise. If we would have had some basic
> metrics in place from the beginning (and monitored them) we would have had
> better insight into the impacts of the cloud-config additions and ideally
> had some better performance comparisons with the existing VM solution.
>
Indeed. There's a card for that:
https://trello.com/c/cWbIbcDa/171-we-didn-t-consider-performance-metrics-as-we-developed-the-system
:D
I think we all understood the importance of logging data, but kind of
dropped the ball on stats data. In the future, I think we should think of
"logging & metrics" as being integral parts of developing a new system.
Lesson learned!
>
> Celso has mentioned using ELK plugins for reporting metrics, this could be
> another alternative. I have not looked at this myself.
>
I'd love to get some more information on ELK plugins. I don't have much
experience with elasticsearch, and the little bit I tried to do (backing up
and restoring elasticsearch when we migrated the elk deployment to
production) proved to be tricky.
>
>
<snip>
> I've only had a chance to skim the resources so far. From past experience,
> push metrics worked for everything, but then again, when it's all that was
> available (thinking statsd/graphite) that's all you think about :-).
>
>
Right - there's a good FAQ answer here:
http://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push?
but it's important to note that Prometheus supports both. I imagine we'd want
'pull' for our core services (rabbit, logstash, kibana, and anything behind
a floating IP like adt-cloud-service), and a 'push' for all our ephemeral
services, and anything where we scale out to multiple nodes.
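To make the pull/push split a bit more concrete, here's a rough sketch using
only the Python standard library. Everything in it is an assumption for
illustration: the metric names, the gateway host, and the handler are
hypothetical, and the /metrics/job/<name> path is my understanding of the
Prometheus Pushgateway convention, which should be checked against whatever
version we'd deploy. It is not a proposal for the actual implementation.

```python
from http.server import BaseHTTPRequestHandler
from urllib.request import Request


def render_metrics(samples):
    """Render a {name: value} dict as plain-text 'name value' lines."""
    return "".join(f"{name} {value}\n" for name, value in sorted(samples.items()))


# Hypothetical in-process counters for a long-lived core service.
SAMPLES = {"requests_total": 0}


class MetricsHandler(BaseHTTPRequestHandler):
    """Pull model: the metrics server scrapes GET /metrics on the service."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_push_request(gateway_url, job, samples):
    """Push model: an ephemeral worker sends its samples before it exits.

    Only builds the request; the caller decides when to send it.
    The URL layout is an assumption (see the note above).
    """
    return Request(
        url=f"{gateway_url.rstrip('/')}/metrics/job/{job}",
        data=render_metrics(samples).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

A core service would serve MetricsHandler with http.server.HTTPServer, while
an ephemeral worker would pass the result of build_push_request to
urllib.request.urlopen just before shutting down.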
>> So I'm curious - does anyone else see this need? What's the correct way to
>> propose work for the next sprint? I think this would be a nice piece of
>> work for someone to work on for the next few weeks. If no one else wants
>> to, I'll certainly volunteer myself...
>>
>
> I really like the utility we've established with logging to ELK. It's
> become quite painless to add logging content with rich meta-data around it.
> If there is a metrics equivalent, I'm all for it.
>
>
awesome.
I'm really keen to hear from the rest of the team as well. Anyone have any
insights here?
--
Thomi Richards
thomi.richards@xxxxxxxxxxxxx