Undertaking effective ITIL-based problem management is essential for any IT organisation that wants to deliver high levels of service availability and consistently high-quality IT services.
It is, however, an unfortunate fact that the great majority of IT organisations fail to realise any noticeable benefits from the time and effort they spend undertaking problem management.
This article provides you with five simple and practical tips to help you to set up and maintain effective problem management.
Tip 1: Focus your problem management efforts on finding permanent solutions
Forget about using problem management to find temporary solutions.
If you want problem management to deliver real and tangible value then you must develop a problem management process that focuses on finding permanent solutions.
Only by focusing your problem management activities on permanent solutions will you be able to substantially reduce your volume of incidents. And it is in the reduction in the volume of incidents that the real value of problem management is to be found.
It is important to be clear about the distinction I am making between permanent and temporary solutions:
- A permanent solution is where the underlying cause of the incident(s) has been addressed in such a way as to prevent the incident from occurring again
- A temporary solution is one which is employed to get the user up and running again (‘as quickly as possible’) but which will not necessarily prevent the incident from re-occurring in the future – e.g. temporarily re-routing a user to a different print queue
Please note that I try to avoid using the term ‘workaround’. It confuses everybody because it can be applied to both permanent and temporary solutions. It is best to stick with ‘permanent’ and ‘temporary’ solutions, which are terms that everybody can understand.
So, the key to getting real value from problem management is to build a process that concentrates on finding permanent solutions. Only by doing this will you be able to deliver the value (reduced volume of incidents) that justifies the time, effort and resources involved in the investigation of problems and the subsequent deployment of permanent solutions.
Tip 2: Make sure your problem management process delivers real value
Most organisations that are familiar with ITIL undertake some form of problem management. And with very few exceptions they are wasting their time. This is because they have put in place a problem management process that delivers no real (tangible, meaningful and measurable) value.
How do you know if your problem management process is actually delivering value? Well, first of all you need to forget all about the various (irrelevant) metrics that are often mentioned, such as the time taken to complete a root cause analysis or the number of problems in the backlog. None of these actually indicates, reflects or measures the value of undertaking problem management.
There is in fact only one truly tangible and meaningful measure that your problem management process is delivering value – and this is a reduction in the volume of incidents. If this is happening you have a worthwhile problem management process. If this is not the case… well, sorry to be blunt but you are just wasting your time. It is that simple.
You must regularly (every three months?) review your problem management process to check that it is delivering value and that the volume of incidents is falling. If it is not then you need to stop what you are doing and change. Change your process. Make it better – make it effective.
If you want, you can go a stage further and set some specific targets. For example, reduce Priority 1 incidents by 40% within three months (this is a very valid target because it ensures that you are addressing the incidents with the greatest adverse impact), reduce network printing incidents by 60% over six months, etc.
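As a rough illustration of checking targets like these, the sketch below compares incident counts for two periods and reports the percentage reduction. The function names and the incident counts are hypothetical, made up purely for illustration; in practice the figures would come from your own incident records.

```python
def percent_reduction(baseline: int, current: int) -> float:
    """Percentage fall in incident volume from baseline to current.

    A negative result means the volume has gone up.
    """
    if baseline <= 0:
        raise ValueError("baseline must be a positive incident count")
    return (baseline - current) * 100.0 / baseline


def target_met(baseline: int, current: int, target_percent: float) -> bool:
    """True if the reduction achieved is at least the target."""
    return percent_reduction(baseline, current) >= target_percent


# Hypothetical counts: incidents logged in the period before and after.
p1_baseline, p1_current = 50, 28            # Priority 1 incidents
print_baseline, print_current = 200, 90     # network printing incidents

p1_ok = target_met(p1_baseline, p1_current, 40.0)           # 44% reduction
print_ok = target_met(print_baseline, print_current, 60.0)  # 55% reduction

print(f"Priority 1 target met: {p1_ok}")
print(f"Network printing target met: {print_ok}")
```

With these made-up figures the Priority 1 target is met and the network printing target is not, which is exactly the kind of result the regular review should act on.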
Now I am fully aware that other factors can impact the volume of incidents, and a lot of organisations use this as an excuse for not using incident volumes as a measure of problem management effectiveness. I do accept that many factors can impact the volume of incidents, and in some circumstances even with good problem management in place you might see incident volumes (temporarily) increasing.
But even with all these potential issues the volume of incidents still remains the single most important indication of the effectiveness of your problem management process. Forget everything else: just make sure you are achieving this and, if not, then do something about it!
Tip 3: Make your service desk/incident manager your problem manager
Yes – I know. This is something that ITIL advises against. And I know and understand why ITIL advises against this. But I believe that in this case ITIL has got it wrong.
There is in fact a very good reason why your service desk/incident manager should also be your problem manager. But before I go into that I want to be very clear on what the role of problem manager actually involves.
Problem manager is not a technical role, and the problem manager does not undertake problem investigations. Problem investigations (both the root cause analysis and the solution identification) are undertaken by an appropriate subject matter expert (SME).
The primary responsibilities of the problem manager are in fact administration, co-ordination and facilitation. The specific activities, among other things, include:
- Reviewing requests for problem investigations to see which are justified
- Developing the business case for each problem investigation in order to justify the resources required for that investigation
- Arranging for the appropriate SME resources to be made available and assigned to a relevant problem investigation
- Producing the terms of reference to be followed by each SME undertaking a problem investigation
- Overseeing the investigations undertaken by the SME
- Validating the root cause analysis and the proposed permanent solution
- Reviewing the effectiveness of the deployed permanent solutions.
In the majority of organisations there is rarely the justification for a full-time resource to act as problem manager. Therefore it is often combined with another role. The first, and very important point, is that you must appoint somebody to this role. You will not be able to undertake effective problem management without somebody acting as (an effective!) problem manager.
Now back to the original point. Why do I disagree with ITIL and recommend that you combine the roles of service desk/incident manager and problem manager?
Well, the reason is simple. The primary goal of the problem management process is to reduce the volume of incidents (see Tip 2). So you want to ensure that the problems prioritised for investigation are those that will deliver the greatest benefit – i.e. will lead to the biggest reduction in re-occurring incidents (or will eliminate the most ‘damaging’ incidents). Makes sense so far?
So – who is best placed to decide which problem investigations will deliver the greatest benefit? Who is best placed to decide which problems will justify the resources required for investigation?
It is clear this has to be the person who stands to benefit the most (in the IT organisation) from the reduction in incident volumes, and the person who has the greatest visibility and understanding of current incident trends and volumes.
This is clearly the service desk/incident manager. This is in fact the IT role that has the greatest vested interest in ensuring that you are undertaking effective problem management. And this is why the roles should be undertaken by the same person.
There is no ‘conflict’ between these roles; there are only shared goals and benefits. So combine the roles – it is the only sensible option.
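The prioritisation argument in this tip can be sketched in a few lines: rank candidate problem requests by the recurring incidents a permanent solution would be expected to remove, weighted by how damaging those incidents are. The `ProblemRequest` fields, the weighting scheme and the sample data are all assumptions for illustration; they are not part of ITIL or of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ProblemRequest:
    title: str
    incidents_per_month: int  # recurring incidents attributed to this problem
    impact_weight: float      # hypothetical weighting: higher = more damaging


def expected_benefit(request: ProblemRequest) -> float:
    """Crude benefit score: incident volume scaled by impact."""
    return request.incidents_per_month * request.impact_weight


def prioritise(requests):
    """Return the requests ordered from greatest to least expected benefit."""
    return sorted(requests, key=expected_benefit, reverse=True)


# Made-up candidate problems as a service desk/incident manager might see them.
candidates = [
    ProblemRequest("Print queue stalls", 120, 1.0),
    ProblemRequest("Payroll batch failure", 4, 50.0),
    ProblemRequest("Password reset loops", 60, 1.5),
]

ranked_titles = [r.title for r in prioritise(candidates)]
print(ranked_titles)
```

The point of the weighting is that a low-volume but highly damaging problem (the payroll failure here) can outrank a high-volume nuisance, which matches the idea of eliminating the most ‘damaging’ incidents as well as the most frequent ones.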
Tip 4: Get on with it – you do not need specialist resources or tools
Some organisations hesitate to put problem management in place because they believe they lack the ‘resources’ to undertake problem management – i.e. the required people and an integrated incident/problem management system.
This is nonsense. You do not need any special resources in order to undertake effective problem management. You can do it perfectly well with the resources you already have.
Unless you are a large third-party IT service provider you do NOT need to set up a dedicated problem management team, full of technical experts to undertake problem investigations. This is in fact completely the wrong thing to do.
The SMEs you allocate to undertake the problem investigations will in fact come from your existing IT departments and teams. The key is to make sure that the time and effort they spend on problem investigations is fully justified. This is a responsibility of the problem manager (see Tip 3). How much time will the SMEs need to allocate to problem investigations? Well, this depends on the nature and number of the problems you decide to investigate.
The bottom line is that you can start problem management with the staff you currently have.
One important point to note: please be aware that a significant proportion (the majority?) of problem investigations will be non-technical in nature. The great majority of incidents are caused by failures in process or procedure, by human error, etc. Problem investigations are often more about finding out why a procedure failed than why technology failed. So an SME is not necessarily a technical resource.
You do not need to spend money on specialist tools/systems for problem management. There is absolutely no need to dynamically link incident records to problem records, and problem management is not dependent on an integrated incident/problem toolset.
And contrary to what is commonly believed, there isn’t any real benefit in the service desk being able to access the problem records.
So, to be clear – you do not need any special resources to start undertaking effective problem management. Just get on with it.
Tip 5: Don’t worry about distinguishing between pro-active and reactive problem management
Who cares where problem requests come from?
Does it really matter if they come from undertaking regular trend analysis of incidents, or if they are generated automatically as a result of a Priority 1 incident being logged? No – from the perspective of developing an effective process for the investigation of problems the source of those problems is largely irrelevant.
Many organisations set up overly complicated problem management processes because of the perceived need to have separate procedures for pro-active and reactive problem management. But the distinction between pro-active and reactive problem management is an unnecessary complication. You do not need two separate sets of problem management procedures.
Now it is undeniably important to carefully identify how, when and from where potential problem requests originate, and how they are to be submitted to the problem manager for consideration. For example, who has responsibility for analysing the incident trends, how often should they be analysed, how are they to be analysed, etc.? If a problem request is to be raised after a specific event (e.g. a high-priority network failure), then how is this to happen? All of this should be defined and clearly documented within your problem management process documentation.
But from the point where the problem request is submitted to the problem manager the process for considering the justification for that request, accepting and prioritising the investigation, assigning the resources to the investigation, validating the results of the investigation etc., is the same irrespective of the source of that request.
So avoid the unnecessary complication and ignore the irrelevant distinction between pro-active and reactive problem management. Develop a single end-to-end problem management process.
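The incident trend analysis mentioned above does not require a specialist tool. As a minimal sketch, assuming you can export incidents as simple (month, category) pairs, the snippet below counts incidents per category and lists those that recur often enough to be worth raising as a problem request. The log format, the category names and the threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical export from an incident log: (month, category) pairs.
incident_log = [
    ("2024-01", "network printing"),
    ("2024-01", "password reset"),
    ("2024-02", "network printing"),
    ("2024-02", "email delivery"),
    ("2024-03", "network printing"),
    ("2024-03", "password reset"),
]


def recurring_categories(log, threshold):
    """Categories with at least `threshold` incidents, most frequent first."""
    counts = Counter(category for _, category in log)
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]


# Candidates to submit to the problem manager for consideration.
candidates = recurring_categories(incident_log, threshold=2)
print(candidates)
```

Whatever produces the list, each candidate then enters the same single process for justification, prioritisation and investigation.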