ITIL 4 is the latest release of the ITIL (Information Technology Infrastructure Library) best-practice framework for IT Service Management. After a long period of silence, the necessary updates have been approved to make the publication current and trustworthy again! In this article I will bring you up to speed on what’s new in Incident Management.
The practices
In ITIL 4 the former processes are no longer part of the core publications as we used to know them; they are now grouped together as practices, 34 in total. Detailed information can be found on the Axelos website once you have purchased an ITIL subscription. I couldn’t find any other publication yet. And to be honest, I think it is in everybody’s interest to combine the publications from ITIL v3 and ITIL 4 to get the most out of ITIL.

What deserves your attention?
So, what is new in ITIL 4 Incident Management? Not that much, actually, but I have gathered some points that receive more attention in ITIL 4.
Technical debt
The term “technical debt” was not used much previously. Technical debt describes the work that still needs to be done to fully recover from incidents or events that were resolved by means of a workaround. The workaround may have been implemented to reduce the impact or frequency of an incident, but the chosen resolution was not ideal and needs to be corrected to return to the original or preferred state.
Major event
A major event (or major incident) is best described as a high-priority incident (so one with substantial impact on the business) that requires a coordinated approach to its resolution. Typically it requires the ad hoc formation of a major event team, led by a major event manager or coordinator. In many cases team members are geographically spread, and meetings are held remotely to discuss how the major event will be tackled and who will take care of the required actions.
One difficult aspect is finding a good definition of a major event. The distinction between a major event and an ordinary incident with the highest priority should be clear to every service desk agent at the very least, and it should be documented as such.
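To illustrate what a documented distinction could look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and not something ITIL 4 prescribes: the `Ticket` fields, the priority scale where 1 is the highest, and the threshold of 100 affected users are all hypothetical. Your own organization has to pick the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    priority: int               # assumed scale: 1 = highest priority
    affected_users: int         # hypothetical proxy for business impact
    needs_multiple_teams: bool  # True if resolution needs coordination


# Hypothetical threshold; agree on the real value with the business.
MAJOR_EVENT_USER_THRESHOLD = 100


def is_major_event(ticket: Ticket) -> bool:
    """Return True if the ticket matches the (assumed) major event rule."""
    # Only incidents with the highest priority can become major events.
    if ticket.priority != 1:
        return False
    # On top of that, the incident must need a coordinated approach
    # or affect a large number of users.
    return (
        ticket.needs_multiple_teams
        or ticket.affected_users >= MAJOR_EVENT_USER_THRESHOLD
    )
```

A rule this explicit is easy to put in front of a service desk agent: a highest-priority incident affecting five users and handled by one team stays a regular incident, while the same incident affecting 500 users is escalated as a major event.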
Swarming
Swarming is an agile technique for solving a complex task, in which multiple people with different areas of expertise work together until it becomes clear which competencies are required to resolve the issue. At the end of the swarming activity, one or more people are often assigned to fix the issue, based on the advice from the swarm. This makes the technique very useful when dealing with major events.
I have to admit that I initially, and wrongly, thought swarming was about temporarily deploying a specialized team to fix a whole lot of similar incidents in order to lower the backlog for the service desk, or to help keep the incident queue under control. My assumption was incorrect, but it remains a good approach in certain cases ;-).
Practice Success Factors (PSFs)
Practice Success Factors replace the critical success factors of the past, so this looks like a purely cosmetic change. The PSFs for incident management are:
- Detecting incidents early. One important note is that the Service Desk can also rely on responsible key users of certain services, systems or applications to help detect incidents early.
- Resolving incidents quickly and efficiently. Speed is key, for sure. Besides that, this factor also emphasizes the importance of quality data inside the incidents (see the next section).
- Continually improving incident management approaches. This is the CSI part (continual service improvement) and should happen regularly by reviewing specific sets of incidents (e.g. the incidents that breached their SLA).
The importance of data
The data inside an incident ticket should be:
- Concurrent: information should be registered during the event, not afterwards. This ensures that no important piece of information is forgotten. It also helps to put together a correct timeline if an analysis is required afterwards.
- Complete: make sure the steps executed during the investigation are written down in enough detail to describe correctly and fully what happened, without shortcuts or room for assumptions.
- Comprehensive: don’t forget to mention why certain actions were executed. Thinking about the why can be revealing as well.
Processes
The incident management practice now consists of 2 processes:
- Incident handling and resolution.
- Periodic incident review.
The latter is mainly meant to identify new incident models or update existing ones, and to determine how to communicate these to all stakeholders.
Competencies and profiles
I have understood that ITIL 4 gives you complete freedom when it comes to competencies and roles. Nevertheless, it uses some general profiles.
| Leader (L) | The manager, team lead or leader role. Provides guidance, makes decisions, delegates tasks, monitors and evaluates outcomes and activities, motivates agents, sets objectives. |
| Administrator (A) | Takes care of administrative overhead, such as reporting and registering tickets. Can also assign and prioritize incidents. Allowed to make basic improvements. |
| Coordinator / communicator (C) | Responsible for all kinds of communication to stakeholders. As the name suggests, this role also coordinates activities between parties. |
| Methods and techniques expert (M) | Designs and implements work techniques, documents procedures, drives continual improvement. Knows the processes used, so can analyse the work. |
| Technical expert (T) | Has the technical expertise to resolve incidents. |
Team dynamics
In ITIL 4 agile concepts have become more important. It prefers flat structures over tiered ones, and working together in incident management matters more than before. To avoid teams going “out of sync”, there is a need for:
- No-blame culture: working as a team on incidents must be a safe environment at all times. People need to trust each other and must be allowed to fail safely.
- Collective responsibility: the entire team is responsible for meeting SLA targets. Team members are encouraged to help each other and share knowledge where necessary.
- Continuous learning: team members should be able to experiment and find new ways of working. Keep learning, on the fly or through the usual ways of gathering knowledge, and share that knowledge!
Some interesting key metrics
- % of incidents detected by monitoring and event management.
- % of waiting time in the overall incident handling time.
- % of incidents resolved automatically.
- Number of reassignments.
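To make these metrics concrete, here is a minimal sketch of how they could be calculated from a set of incident records. The `Incident` fields and the `key_metrics` function are my own assumptions for illustration; your ITSM tool will store this information under different names, and waiting time in particular may need to be derived from status changes.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical; map them to your own tool.
    detected_by_monitoring: bool
    resolved_automatically: bool
    handling_minutes: float   # total time from registration to resolution
    waiting_minutes: float    # part of the handling time spent waiting
    reassignments: int


def key_metrics(incidents: list[Incident]) -> dict[str, float]:
    """Compute the four key metrics for a non-empty list of incidents."""
    total = len(incidents)
    if total == 0:
        raise ValueError("no incidents to report on")

    handling = sum(i.handling_minutes for i in incidents)
    waiting = sum(i.waiting_minutes for i in incidents)
    detected = sum(i.detected_by_monitoring for i in incidents)
    automatic = sum(i.resolved_automatically for i in incidents)

    return {
        "pct_detected_by_monitoring": 100 * detected / total,
        # Guard against division by zero when no handling time is logged.
        "pct_waiting_time": 100 * waiting / handling if handling else 0.0,
        "pct_resolved_automatically": 100 * automatic / total,
        "total_reassignments": sum(i.reassignments for i in incidents),
    }


sample = [
    Incident(True, False, 120, 60, 2),
    Incident(False, False, 60, 30, 1),
    Incident(True, True, 10, 0, 0),
    Incident(False, False, 10, 10, 1),
]
metrics = key_metrics(sample)
```

For this made-up sample, half of the incidents were detected by monitoring, half of the total handling time was waiting time, a quarter of the incidents were resolved automatically, and there were four reassignments in total.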
If you believe something is missing, or you think of other important aspects in Incident Management, do let me know.