Calculating the Service Design Figure

22/07/2014 at 12:57:26 GMT
Posts: 38
Calculating the Service Design Figure
Reposted from Facebook for Matt Vincent

Hi, I was wondering if you could help or point me in the right direction. I am in the process of designing an availability assessment of IT services, and we currently have a section called the IT Service Design Figure. This is calculated by taking the SLA of each component of the service and multiplying those percentages together to produce an overall design figure. For example, an IT service with a 99.50% OLA might be made up of five components, each with its own OLA, calculated as 99% x 99.50% x 98% x 98% x 98%.

The problem with this is that it treats each component in isolation, with no allowance for resiliency or failover. The final design figure is always lower than the agreed service level, sometimes by a long way if the service is made up of many components. Do you know of a more accurate way of calculating this design figure, or of any reading material that could help me derive it? Any help is very much appreciated.
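The serial multiplication Matthew describes can be sketched in a few lines; the component percentages below are the illustrative OLAs from the question, not real figures:

```python
from functools import reduce

def serial_availability(components):
    """Multiply per-component availabilities (as fractions) for a serial
    chain: the service is up only if every component is up."""
    return reduce(lambda acc, a: acc * a, components, 1.0)

# Illustrative OLAs from the question: 99%, 99.5%, 98%, 98%, 98%
design_figure = serial_availability([0.99, 0.995, 0.98, 0.98, 0.98])
print(f"Design figure: {design_figure:.2%}")  # well below any single OLA
```

This makes the problem in the question concrete: five components, none worse than 98%, still combine to a design figure around 92.7%, far below a 99.5% service-level target.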

Thanks,

Matthew



Mark Lillycrop
Marketing and Publishing Manager
ITSMF UK
mark.lillycrop@itsmf.co.uk


24/07/2014 at 09:19:12 GMT
Posts: 3
Some initial thoughts for you...

Firstly you need to determine whether you wish to report against uptime or downtime availability, as this will drive which is the best calculation method for you to use.

You essentially have 3 options for availability reporting:
The first option (DOWNTIME) requires that you determine your resilience capabilities e.g. that you offer single server, active:passive, active:active, multiple active – each of these will drive an availability capability. You would need to do this for each type of component. This is probably the preferred method, but takes time to set up, and also some strategic decisions around what components are on the critical path to deliver service. The theory behind this says for example that rather than 99.5% OLA, where a service uses an active:active configuration, that capability is actually closer to 99.75% and so on. You will also need to consider the capabilities within virtual as opposed to physical delivery. Once you have completed your modelling, you can then continue to use your current calculation method using whatever monitoring processes you use today.
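The resilience modelling Karen describes can be expressed with the standard parallel-availability formula, 1 - (1 - a)^n for n identical, independent nodes; a minimal sketch, with illustrative figures:

```python
def parallel_availability(a, n=2):
    """Availability of n identical, independent nodes where any one can
    carry the service (e.g. active:active): the service fails only if
    every node fails at once."""
    return 1 - (1 - a) ** n

single = 0.995  # illustrative single-server OLA of 99.5%
print(f"active:active pair: {parallel_availability(single, 2):.4%}")
```

Under the independence assumption, two 99.5% nodes in active:active come out markedly higher than a single node, which is why modelling resilience per component type closes the gap between the naive serial product and the agreed service level.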

The second option (UPTIME) still assesses according to resilience but starts from the top down. You would need to determine your patterns of capability. These would turn into infrastructure, database, application, and web-service patterns that could be utilised by any service. Your availability reporting would then be a case of monitoring against pattern capability, and all IT services would need to be designed using these patterns. For example, you may have a service that requires top-notch infrastructure capability, a standard database, a standard application, and top-notch web services; the capability associated with these might be 99.999% x 99% x 99% x 99.99%. In practice you would design in monitoring that documents MTBF and the uptime capability of the service. This option is much easier to automate, but it has the constraint that it does not take into account customer perception of service.
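The pattern product in Karen's example can be checked numerically; this is a sketch reusing her illustrative percentages, with hypothetical pattern names:

```python
# Pattern capabilities from the example: top-notch infrastructure,
# standard database, standard application, top-notch web services.
patterns = {
    "infrastructure": 0.99999,
    "database": 0.99,
    "application": 0.99,
    "web_services": 0.9999,
}

design = 1.0
for name, capability in patterns.items():
    design *= capability
print(f"Designed service capability: {design:.4%}")
```

Once each pattern's capability is agreed, any new service designed from those patterns gets its design figure mechanically, which is what makes this option easy to automate.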

The final option (DOWNTIME) is purely to utilise your Incident Management Logging Tool – to ensure that all Severity 1 & 2 Incidents have associated downtime and cause statements. This is a much more raw method of calculation but can take into account customer perception of service (i.e. you can record both the actual downtime compared to when the Incident was logged and then approved for closure by the Customer).
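The incident-based option can be sketched as a simple downtime sum over a reporting period; the incident records below are entirely hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical Severity 1/2 incident records for July:
# (time the incident was logged, time the customer approved closure)
incidents = [
    (datetime(2014, 7, 1, 9, 0), datetime(2014, 7, 1, 10, 30)),
    (datetime(2014, 7, 15, 14, 0), datetime(2014, 7, 15, 14, 45)),
]

period = timedelta(days=31)  # the July reporting period
downtime = sum((closed - logged for logged, closed in incidents), timedelta())
availability = 1 - downtime / period
print(f"Downtime: {downtime}, availability: {availability:.3%}")
```

Recording both the technically measured outage and the logged-to-customer-closure window, as Karen suggests, lets the same calculation be run twice: once for actual downtime and once for perceived downtime.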

Availability is a complex area and how you progress will depend almost entirely on your strategic vision for defining and reporting against service. I wish you success!!

Karen


02/08/2014 at 07:26:59 GMT
Posts: 15
Very helpful. Calculating the cost of downtime in terms of the effect on a customer's business can be quite scary, especially if the service is critical.


Last edited 02 August 2014
05/12/2014 at 14:57:22 GMT
Posts: 4

Mark,

I have done quite a lot of research and analysis into service availability. It certainly is not as simple as multiplying together the SLA figures for individual components for all sorts of reasons.

I created a white paper titled "Defining Availability for an IT Service" which was published on the previous version of the UK itSMF web site somewhere (but I can't find it now) as well as a corresponding presentation which you can find on Slideshare at http://www.slideshare.net/stuartrance7/stuart-rance-defining-availability-for-an-it-service

 

EDITED TO ADD

I found the white paper, you can download it from

http://www.itsmf.co.uk/?page=whitepapers_0001



Last edited 05 December 2014
11/12/2014 at 13:11:13 GMT
Posts: 49
I heard Stuart give an excellent presentation on this at ITSM12 (I think). The key point I took away was that there are different ways of measuring and if you limit yourself to one you won't get the real picture.

What I've done since then has been to get Vital Business Functions defined, get the business to weight them for relative significance, and then gear monitoring/reporting around this using a mix of what monitoring tools tell us, what incidents tell us, and what I know from interpretation. For example, if a service is down for an hour, comes back up for 10 mins and then goes back down, can you really count it as being up for those 10 mins when no one would have been in a position to do anything useful with it in that time? Or if all the monitoring says the service is up but response times are deteriorating, at what point do you say it is effectively unavailable (documented unacceptable response times help here)?
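Richard's weighted Vital Business Function approach could be sketched like this; the VBF names, weights, and measured availabilities are entirely hypothetical:

```python
# Hypothetical VBFs: (name, business-agreed weight, measured availability).
# The measured figure for each VBF would come from the mix of monitoring,
# incident records, and interpretation Richard describes.
vbfs = [
    ("order entry", 0.5, 0.998),
    ("reporting", 0.2, 0.990),
    ("batch billing", 0.3, 0.995),
]

total_weight = sum(w for _, w, _ in vbfs)
weighted = sum(w * a for _, w, a in vbfs) / total_weight
print(f"Weighted service availability: {weighted:.3%}")
```

Interpretation rules, such as discounting a 10-minute uptime window sandwiched between two outages, would be applied when producing each VBF's measured figure before it is fed into the weighted average.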

What the result lacks in terms of technical precision it more than makes up for in correlating with customer experience.

Thanks

Richard



16/01/2015 at 22:09:31 GMT
Posts: 0
Stuart and Richard's answers are great but I think they are not addressing the specific question. Understanding how to calculate the availability achieved is important and Stuart's whitepaper remains an excellent resource. And Richard takes into account the more realistic view of the perceived impact to the end user and potential business impact of the related downtime. All very useful and worth understanding.
Karen's response is valuable and detailed. I think the availability of the active:active component in Karen's example would actually be 99.9975% (a typo, I think). Maybe someone can check my working on that?

But I think the question was to do with calculating the availability the system was designed to achieve.

There's an interesting article here that clearly explains why the serial component availability in an end to end service must be multiplied together to understand the overall availability.
http://www.itsmportal.com/columns/availability-1-calculating-planned-availability-flipping-easy#.VLmAPIHfWrU

Additionally, you mention the challenge of taking into account resilience or failover designed into a specific component. In that case you would calculate the designed availability of the resilient component first, before plugging it into the rest of your calculation. Say the resilient component is composed of two sub-components with 95% availability each: the unavailability of the pair is the two non-availability figures multiplied together, 5% x 5% = 0.25%, giving a designed availability of 99.75%. You would then plug this into your end-to-end calculation of availability.
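Both figures discussed in this thread can be checked with the same parallel formula, then fed into a serial chain; the surrounding 99% and 99.9% serial components below are illustrative:

```python
def parallel(a, n=2):
    """A group of n redundant nodes fails only if all n fail at once
    (assuming independent failures)."""
    return 1 - (1 - a) ** n

# The resilient pair from this post: two 95% components in parallel.
pair = parallel(0.95)      # 1 - 0.05 * 0.05 = 0.9975
# Karen's active:active example earlier in the thread: two 99.5% nodes.
aa = parallel(0.995)       # 1 - 0.005 * 0.005 = 0.999975

# Plug the resilient pair into a serial end-to-end chain alongside two
# illustrative non-resilient components at 99% and 99.9%.
end_to_end = 0.99 * pair * 0.999
print(f"pair={pair:.2%}  active:active={aa:.4%}  end-to-end={end_to_end:.3%}")
```

This confirms both the 99.75% figure in this post and the 99.9975% correction suggested for Karen's active:active example, and shows how a resilient group collapses to a single number inside the serial product.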

