Wednesday, October 31, 2012

Agile Metrics


Overview

There are no “standard” agile metrics, because what's easy to measure tends to distract us, and what's important to measure is hard to quantify. The most important thing about agile metrics is to have a clear objective for using them: normally we start with a hypothesis, say, that if the defect count drops then lead time will go down too. Many of the metrics below would be used for a short period of time and then dropped once the objective is reached.

That said, I've seen clients pair push-and-pull metrics, such as lead time and defect count, so that the pair can be used for a longer period of time: anyone who starts gaming one number gets penalized on the other.

The following may also be useful:

·       Lead Time
·       Defect count (at various phases; what’s a bug?)
·       Work in Progress
·       Code coverage
·       Unplanned Changes
·       Velocity (story points or story count per sprint)
·       Return on investment
·       Innovations per sprint
·       Artifacts generated
·       Slack time
·       Failure Load (firefighting time)
·       Iteration Burn-Down
·       Unfinished Stories
·       Customer Satisfaction
·       LOC (lines of code)
·       Un-deployed Stories
·       # Blocks
·       Budget/Schedule Compliance
·       Flow Efficiency (lead time / touch time)
·       Release Burn-Up

Definitions

Definitions and cross-references for all these metrics follow.

Lead Time
Defined:            Time from “concept to cash”, the total time it takes to develop an idea and sell it to a paying customer.
Caution:             It may be difficult to measure actual lead time, so many teams approximate it as the time between a request entering the development process and that request reaching the definition of done. This approximation may be a reasonable place to start measuring, but it may cause micro-optimization (changes that actually detract from corporate goals) or reduce customer discovery (learning what the customer would pay more for).
Side effects:     If we blindly push this metric to a minimum, we may see:
·       increased defect count
·       reduced code coverage
·       increased failure load
Benefits:           customer satisfaction, flow efficiency, un-deployed stories, work in progress
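As a rough illustration of the approximation described under Caution, the sketch below measures lead time as the elapsed days between a request entering the development process and reaching the definition of done. The function name and the sample dates are hypothetical, and this is the team-level approximation, not true “concept to cash”.

```python
from datetime import datetime


def approximate_lead_time_days(entered: datetime, done: datetime) -> float:
    """Elapsed days from entering the development process to 'done'."""
    if done < entered:
        raise ValueError("done must not be earlier than entered")
    return (done - entered).total_seconds() / 86400  # seconds per day


# Hypothetical request: entered on 1 Oct, reached done on 15 Oct.
entered = datetime(2012, 10, 1, 9, 0)
done = datetime(2012, 10, 15, 9, 0)
print(approximate_lead_time_days(entered, done))  # 14.0
```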


Defect Count
Defined:            Total count of surprises, unexpected behavior, flaws, and shortcomings of the product identified during or after an iteration demo.
Caution:             Aggressive definitions of “defect” help everyone focus on customer satisfaction; anything short of the definition above will
Side effects:     If we blindly push this metric to a minimum, we may see:
·       reduced velocity
·       reduced innovation
·       more unfinished stories
·       more blocks
Benefits:           code coverage, unplanned changes, customer satisfaction, lines of code

Work In Progress (WIP)
Defined:            Number of items we are actively working on. The higher the WIP, the more multi-tasking hurts our efficiency.
Caution:             While a WIP limit of 1 per person may seem ideal, research suggests it’s closer to 2 per person, so that there is still something to work on when a block prevents us from working on the highest-priority item.
Side effects:     If we blindly decrease this metric, we may see:
·       excessive slack time
Benefits:           lead time, defect count, velocity, ROI, unfinished stories, un-deployed stories, blocks, budget/schedule compliance, flow efficiency
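One way to keep an eye on WIP is to count in-progress items per person and flag anyone above the limit. The sketch below assumes a hypothetical board represented as (owner, status) pairs and uses the limit of 2 per person mentioned under Caution; both the data shape and the status label are assumptions.

```python
from collections import Counter

WIP_LIMIT_PER_PERSON = 2  # from the Caution above; tune for your team


def wip_per_person(items):
    """Count in-progress items per owner. Each item is (owner, status)."""
    return Counter(owner for owner, status in items if status == "in progress")


def over_limit(items, limit=WIP_LIMIT_PER_PERSON):
    """Return the owners whose in-progress count exceeds the limit, sorted."""
    return sorted(o for o, n in wip_per_person(items).items() if n > limit)


# Hypothetical board contents.
board = [
    ("ana", "in progress"), ("ana", "in progress"), ("ana", "in progress"),
    ("ben", "in progress"), ("ben", "done"),
]
print(over_limit(board))  # ['ana']
```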

Code Coverage
Defined:            Percentage of production code tested by the automated regression suite.
Caution:             Static and dynamic code coverage evaluation tools cannot tell us if the code just happened to be executed or if it was verified for proper behavior. The only strategy for full coverage of behavior is Test-Driven Design / TDD / BDD. Short of automation, we cannot find regressions fast enough to keep up with development.
Side effects:     If we blindly increase this metric, we may see:
·       increased failure load
·       decreased velocity
·       reduced innovation
Benefits:           lead time, defect count

Unplanned Changes
Defined:            Number of unanticipated change requests we were able to include in this product increment. Since Agile is all about being more responsive, this is a metric that shows how adaptive we’ve become.
Caution:             Tracking this metric could be burdensome—what counts as a change request? A font style change on the UI? An increase in scope? Pick a granularity to track and stick with it.
Side effects:     If we blindly increase this metric, we may see:
·       decreased velocity (churn)
·       excessive innovation (lack of focus)
Benefits:           customer satisfaction, return on investment, lead time

Velocity
Defined:            Abstract quantity of work that can be completed in a given iteration. Velocity automatically accounts for regular meeting overhead and business-as-usual activities. Velocity is often reported in units of Story Points, Ideal Days, Ideal Hours, or Story Count. Story Points tend to encompass effort, doubt & complexity, so they’re packed with more information than a simple estimate.
Caution:             For large organizations, it helps to normalize Story Points to approximately 1 Ideal Day to simplify strategic and roadmap-level planning. Story Points should not be used to evaluate past performance; they’re only intended for forward planning.
Side effects:     If we blindly increase this metric, we may see:
·       reduced customer satisfaction
·       increased failure load
·       reduced artifacts generated
·       reduced innovation
·       reduced slack time
Benefits:           lead time, budget/schedule compliance, flow efficiency, release burn-up
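Since velocity is meant for forward planning, a simple use is to average the last few iterations and project how many more iterations a backlog needs. This is a minimal sketch; the three-iteration window, the function names, and the sample numbers are assumptions, not a recommendation.

```python
import math


def average_velocity(completed_points, window=3):
    """Mean story points completed over the most recent `window` iterations."""
    recent = completed_points[-window:]
    if not recent:
        raise ValueError("need at least one completed iteration")
    return sum(recent) / len(recent)


def iterations_remaining(backlog_points, velocity):
    """Whole iterations needed to finish the backlog at the given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)


history = [18, 22, 20, 24]   # hypothetical points completed per iteration
v = average_velocity(history)  # (22 + 20 + 24) / 3
print(v, iterations_remaining(100, v))  # 22.0 5
```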

Return On Investment
Defined:            Percent earnings based on revenue, capital investment and operational cost.
Caution:             Many teams don’t have access to this data, or don’t track it long enough to see the impact of their work on ROI. Yet it’s key to justifying investment in software.
Side effects:     If we blindly increase this metric, we may see:
·       reduced innovation
·       reduced customer satisfaction
·       increased failure load
Benefits:           lead time, budget/schedule compliance
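The definition above does not give a formula, so the sketch below assumes one common form: net earnings (revenue minus capital investment and operational cost) divided by total cost, expressed as a percentage. The function name and figures are hypothetical, and your finance group may calculate ROI differently.

```python
def roi_percent(revenue, capital_investment, operational_cost):
    """ROI as a percentage: net earnings divided by total cost (assumed form)."""
    total_cost = capital_investment + operational_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return 100.0 * (revenue - total_cost) / total_cost


# Hypothetical figures for one product increment.
print(roi_percent(150_000, 80_000, 20_000))  # 50.0
```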

Innovations per sprint
Defined:            As an agile team becomes more cross-functional, the whole team gains a greater appreciation for what the customer finds valuable. When this results in feature ideas that the Product Owner selects for the backlog, we consider this a success of the whole team.
Caution:             Innovation must be customer-centric—in Kano’s terms, either a linear feature or an exciter/delighter.
Side effects:     If we blindly increase this metric, we may see:
·       reduced release burn-up
·       excessive unplanned changes
·       increased lead time
Benefits:           customer satisfaction, return on investment

Artifacts Generated
Defined:            Any document or non-source-code electronic file generated as a result of the software development process is an artifact. We may want to track help files generated to get a sense of whether our development is sustainable.
Caution:             Some artifacts were historically created for visibility into a long development cycle. If you can rely on automated customer tests instead, this type of “executable specification” will be demonstrably current.
Side effects:     If we blindly increase this metric, we may see:
·       increased lead time
·       increased work in progress
·       reduced budget/schedule compliance
Benefits:           n/a

Slack Time
Defined:            Buffer, maintenance, or creative work that is tangentially related to prioritized product backlog items. Just as a highway has serious congestion at 80% utilization, software teams loaded above 80% hit serious performance bottlenecks.
Caution:             Slack time is not vacation or goofing off. It is one of the only steps in an agile SDLC that consistently reduces technical debt.
Side effects:     If we blindly increase this metric, we may see:
·       reduced velocity
·       more unfinished stories
·       more un-deployed stories
Benefits:           lead time, failure load, innovations per sprint, customer satisfaction

Failure Load
Defined:            Percent of time spent fixing defects. Failure load is waste; it’s forcing our customers to pay for features twice. We want to avoid failure load whenever practical. You can’t go fast without high quality!
Caution:             n/a
Side effects:     If we blindly decrease this metric, we may see:
·       reduced velocity
·       reduced innovation
·       more unfinished stories
·       more blocks
Benefits:           code coverage, unplanned changes, customer satisfaction, lines of code
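Failure load is a simple percentage once defect-fixing time is tracked separately. A minimal sketch, assuming hours are logged per iteration; the function name and the sample hours are hypothetical.

```python
def failure_load(defect_hours, total_hours):
    """Percent of total working time spent fixing defects."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= defect_hours <= total_hours:
        raise ValueError("defect_hours must be between 0 and total_hours")
    return 100.0 * defect_hours / total_hours


# Hypothetical iteration: 30 of 200 team hours went to fixing defects.
print(failure_load(30, 200))  # 15.0
```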

Iteration Burn-Down
Defined:            Bar chart showing hours or story points remaining per day of the iteration. The trajectory of the bars shows whether we’re on schedule or not.
Caution:             Without small enough stories, teams will see a “clumping” effect where most of the work tends to get finished at the end of the iteration. This is not desirable—find ways to get to done earlier so there is time to make unforeseen adjustments.
Side effects:     If we blindly improve this metric, we may see:
·       increased blocks
Benefits:           unfinished stories, work in progress, return on investment, customer satisfaction, budget/schedule compliance
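To read the trajectory of a burn-down, teams often compare the actual remaining work against a straight “ideal” line from the starting total down to zero. The sketch below assumes that linear ideal and flags the days where the team is above it; the function names and sample data are hypothetical.

```python
def ideal_remaining(total, days, day):
    """Work remaining on `day` (0-based) if it burned down linearly."""
    return total * (days - day) / days


def days_behind(actual_remaining, total, days):
    """Days (0-based) on which actual remaining work exceeds the ideal line."""
    return [
        d for d, remaining in enumerate(actual_remaining)
        if remaining > ideal_remaining(total, days, d)
    ]


# Hypothetical 10-day iteration with 40 points; readings for days 0 to 4.
actual = [40, 38, 36, 30, 20]
print(days_behind(actual, total=40, days=10))  # [1, 2, 3]
```

A long run of flagged days followed by a sudden drop is the “clumping” effect described under Caution.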

Unfinished Stories
Defined:            Any story that did not reach the “definition of done” in the same iteration in which it was begun is an unfinished story. A product owner may cancel, re-schedule, split, re-scope, or defer such a story.
Caution:             Unfinished stories come from a lack of discipline. There’s always a way to negotiate a good story so that it can be split or completed this iteration.
Side effects:     If we blindly decrease this metric, we may see:
·       n/a
Benefits:           customer satisfaction, lead time, release burn-up

Customer Satisfaction
Defined:            Increased customer retention or increased revenue.
Caution:             Learning about customer retention is slow, and we need safe sandboxes in which to experiment and learn more quickly (e.g., pilot markets or beta tests).
Side effects:     If we blindly increase this metric, we may see:
·       reduced innovation
·       reduced slack time
Benefits:           return on investment

Lines of Code (LOC)
Defined:            One source-code line. From an agile perspective, each line of code increases the risk of system failure and the cost of maintenance. We seek elegant, clean code and avoid duplication in the code base.
Caution:             Mature software shouldn’t always grow. At some point, re-factoring will keep the LOC count stable while we continue to add features. At the same time, if we make code difficult to read or understand, we’ll introduce additional risk for system maintainers.
Side effects:     If we blindly decrease this metric, we may see:
·       increased defect count
Benefits:           lead time, failure load

Un-deployed Stories
Defined:            Stories that have reached a team’s definition of done but are not yet actually earning money or being used by a customer.
Caution:             Until a paying customer uses our product increment, there is risk that delivery teams will need to get involved in supporting it.
Side effects:     If we blindly decrease this metric, we may see:
·       decreased customer satisfaction (a product that changes too often?)
Benefits:           lead time, defect count, work in progress, unplanned changes, innovations

# Blocks
Defined:            The number of impediments that development teams have asked for help on.
Caution:             A large number of blocks may mean teams aren’t being as proactive as they could be, or they don’t have an adequate “definition of ready” before accepting work.
Side effects:     If we blindly decrease this metric, we may see:
·       unfinished stories
·       un-deployed stories
Benefits:           lead time

Budget/Schedule Compliance
Defined:            Comparison of the estimate for a strategic or roadmap-level portfolio item with the team-level estimates (for completed stories only).
Caution:             Until a product increment is considered deployable (a minimally marketable feature), we cannot make any assessment of its cost.
Side effects:     If we blindly optimize this metric, we may see:
·       reduced innovation
·       fewer unplanned changes
·       reduced customer satisfaction
Benefits:           increased return on investment, reduced lead time, reduced work in progress

Flow Efficiency
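Defined:            Flow efficiency = lead time / touch time; that is, the time an item takes to go through the whole system divided by the time someone is actively working on it. With this definition, lower is better and 1 is the floor. Many Kanban sources report the inverse (touch time / lead time) as a percentage.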
Caution:             Flow efficiency highlights wait time in the existing process, though we really need to focus on value added time. Use this to identify red flags but only as a secondary method to value-added optimization.
Side effects:     If we blindly decrease this metric, we may see:
·       excessive slack time
·       excessively limited WIP
Benefits:           lead time, return on investment
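A quick worked example of the ratio as defined in this post, with the reciprocal percentage form included for comparison with tools that report it that way. The function names and sample figures are hypothetical.

```python
def flow_ratio(lead_time_days, touch_time_days):
    """Lead time divided by touch time, as defined above (1.0 is the floor)."""
    if touch_time_days <= 0:
        raise ValueError("touch_time_days must be positive")
    if lead_time_days < touch_time_days:
        raise ValueError("lead time cannot be shorter than touch time")
    return lead_time_days / touch_time_days


def touch_share_percent(lead_time_days, touch_time_days):
    """Reciprocal form: percent of the lead time spent actively working."""
    return 100.0 / flow_ratio(lead_time_days, touch_time_days)


# Hypothetical item: 20 days in the system, 4 days of hands-on work.
print(flow_ratio(20, 4))           # 5.0
print(touch_share_percent(20, 4))  # 20.0
```

In this example the item spent 16 of its 20 days waiting, which is the kind of red flag the Caution suggests looking for.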

Feedback

What's missing? What would you change?

5 comments:

john miller said...

Great post! Every metric should come with a warning label, just like medicine you buy. I will be archiving this post for future use! Thank you for sharing this.

-John

Mark Levison said...

Andre - thank you this post is beautiful. I just shared with a CSM class. Their reaction nearly every measure can be gamed and so perhaps we should avoid metrics.

Android Watch Phone said...

You have shared very nice post. I think you have great collection of it. I think we should avoid the metrics and this is the message which you want to convey by your post. Keep it up.

Huet Landry said...

Great post! I like the framework for the metrics definitions.

The Side effects sections contain indicators for when a metric might be "stopped". It might be useful to reword the first line as follows: "If any of the following begin to be noticed, consider removing this metric for a time."

I am not sure why you do not consider source code an artifact. To me, it is one of the primary artifacts.

Under "Defect Count" the "Caution" section appears truncated.

Unknown said...

Your article was one of the inspirations to describe my real-life examples of metrics enriched by decisions based on them. https://www.linkedin.com/pulse/real-life-agility-metrics-visualizations-lead-you-piotr-maksimczyk