Metrics! Bah! We’ve measured all sorts of things in IT for years: quality, cost, safety, people, customers, availability, reliability, continuity, and so forth. Every major framework has suggested measures, but they don’t seem to help me get my teams to satisfy my customers or stakeholders. HELP!!!
This is the dilemma of most IT leaders. How do you know what is good, bad, or great in terms of what is occurring in your organisation? How do you set some measures to help your people and suppliers understand and know how they’re performing against agreed expectations? Let me share a story with you, and it goes back four decades to when I first started in IT at a large Texas bank.
Taking Us Back to Texas
My CIO, ex-IBM, ran almost the entire IT division on one metric: MTBK (Mean Time Between Kicking). Not when he kicked us, but when he got kicked.
You see, he walked the halls of the bank, went to branches, and visited suppliers weekly. Every time he heard “Ah, your IT is terrible” or “Your IT is not helping me” or “Why did you make that change?” he marked it down. When he returned to his office he went to a board and logged the number of ‘kicks’ he’d received.
Over a short period of time he noticed something. When we did IT changes, the number of kicks was higher. The other thing he noticed was that there was never a day with zero ‘kicks’.
The thing is, my CIO wanted zero ‘kicks’ for at least one day. So he challenged his team to come up with metrics in their area that would help the IT division have a day of zero ‘kicks’, aka no bad comments. If we did this, he promised a night out at a major restaurant.
He could have told us which metrics he wanted us to hit, but instead he coached us to find metrics that aligned across all the teams to make quality and performance as great as possible. Development looked at how they found defects and improved testing by using a Right First Time metric. Development and the PMO realised, though, that issues happen, so they asked the Service Desk to help create a robust incident handling process to minimise the downtime or impact of an incident. Infrastructure added performance metrics and used these to improve how applications or services worked, through load balancing, more memory when needed, tuning of databases, etc.
Think about what we were really doing though. Think about the problems we were finding and addressing, not just in our own silo but across the silos. In my role as Service Desk Manager I could fix nothing. I had to find a way for my team to work with other areas of IT to develop faster incident response and resolution times, and they had to find ways to stop incidents from occurring or at least help my team. Collaboration, not just “here, read and do this” communication.
Three months later we had our first zero day. I remember my CIO walking into the computer room and shaking every person’s hand. He then went to Dev, Tech Support, the Service Desk, and he called every supplier. The impact was amazing.
After that we never had to be told to aim for a week or a month or a year of zero ‘kicks’; we did that ourselves. We never made a year, but there were times when we did achieve an entire month free of complaints.
So, what measures did we use?
We used:
- Right first time
- Escalation is bad
- Monitor-Alert-Respond Fast
- No rollback, no roll forward
- Quality before being on time
Some of these you will recognise from suggested metrics in DevOps or ITSM. A few we made up just to make it fun. For example, we had a measure to count the bouncing incident (the number of times we escalated an incident and it bounced back for more information). The challenge was to have no more than one bounce.
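To make the bouncing incident measure concrete, here is a minimal sketch of how it could be counted today. The event log, the incident IDs, and the event names (`escalated`, `bounced_back`) are all made up for illustration; this is not how we recorded it at the bank.

```python
from collections import Counter

# Hypothetical event log: (incident_id, event). IDs and event names are assumptions.
events = [
    ("INC-1", "escalated"),
    ("INC-1", "bounced_back"),
    ("INC-1", "escalated"),
    ("INC-2", "escalated"),
    ("INC-3", "escalated"),
    ("INC-3", "bounced_back"),
    ("INC-3", "escalated"),
    ("INC-3", "bounced_back"),
    ("INC-3", "escalated"),
]

MAX_BOUNCES = 1  # the challenge from the story: no more than one bounce


def count_bounces(log):
    """Count how many times each incident bounced back for more information."""
    return Counter(incident for incident, event in log if event == "bounced_back")


def over_limit(log, limit=MAX_BOUNCES):
    """Return the incidents that bounced more often than the agreed limit."""
    return sorted(i for i, n in count_bounces(log).items() if n > limit)


bounces = count_bounces(events)
offenders = over_limit(events)
print(offenders)
```

The point of a measure like this is that neither the Service Desk nor the resolving team can improve it alone: the first has to send better information, the second has to say what it needs.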
None of these were ITIL-based, but then this was before ITIL existed. We were also not afraid to prune our KPI tree. Yes, think of the main trunk as “no bad comments”; all of the branches from all of the areas linked to that trunk to keep it healthy. If a metric no longer mattered or needed to be adjusted (pruned), we did so as a team, and since we were all working on the same tree, we had to do this collaboratively.
Our metrics became linked to customer satisfaction. Any metric that could not be linked to customer satisfaction was removed. An example would be: “We were only down for 80 minutes, so we hit our SLA.” We looked harder and saw that 80 minutes of downtime at the end of the month (payday for many people) was not really acceptable. The other metric we had was Mean Time Between Failures (MTBF). “Hey, it has been 4 days since we had an issue!!!” “Who cares? You had an issue and it took you hours to get it resolved,” was the response from our customers. Out went MTBF, to be replaced by Mean Time to Get It Back Up.
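The difference between the two measures is easy to see with a little arithmetic. The sketch below uses invented outage times (including an 80-minute outage at month end, echoing the example above) and hypothetical function names; it assumes MTBF is taken as the average gap between the end of one outage and the start of the next.

```python
from datetime import datetime, timedelta

# Invented outage windows (start, end), in chronological order. Illustrative data only.
outages = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),     # 30 minutes
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 16, 0)),  # 2 hours
    (datetime(2024, 1, 31, 8, 0), datetime(2024, 1, 31, 9, 20)),   # 80 minutes at month end
]


def mean_time_to_get_it_back_up(windows):
    """Average outage duration: how long customers waited for service to return."""
    total = sum((end - start for start, end in windows), timedelta())
    return total / len(windows)


def mean_time_between_failures(windows):
    """Average gap between the end of one outage and the start of the next."""
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(windows, windows[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)


print("Mean time between failures:", mean_time_between_failures(outages))
print("Mean time to get it back up:", mean_time_to_get_it_back_up(outages))
```

With this data the gap between failures looks comfortable at nearly two weeks, yet each failure still cost customers well over an hour on average, which is the number they actually felt.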
The better we became, the less it cost to manage our services. No staff loss. Staff instead were busier, as we were able to offer more of the services that the bank wanted, so training time went up. Quality up, performance up, customer satisfaction up, employee satisfaction up, costs down. Yes, this was not easy, and yes, we still had our bad days. But we had the trust of the bank that we would get it right.
So how do YOU help your teams to do better, faster, and safer IT?
Do you let them set their own goals and allow them the time, training and environment to achieve them? Do you celebrate success?
Key Performance Indicators (KPIs) are forward-looking indicators. They should help teams make immediate decisions. They should act as a guide to whether what teams are doing is good, bad, or great in terms of quality, cost, safety, performance, and satisfaction. Let your teams create the indicators that are KEY to them. Then just watch the magic happen!
Want to learn more?
This is the very topic that I will be discussing at the upcoming itSMF UK Conference 20th-21st November. Join me for what is set to be one of the best itSMF events in many years, and learn a thing or two about improving your metrics to help ensure you actually deliver business value.