Applying Lean to Major Incident Management
Incident Management is arguably the most popular (or most adopted) ITIL process in IT organisations. Even if IT organisations have not adopted ITIL, they will have some form of incident management. Wikipedia states "The first goal of the incident management process is to restore a normal service operation as quickly as possible and to minimize the impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained" (1).
The most demanding incidents to resolve tend to be those with the greatest impact and urgency. They are often classed as major or critical incidents (or P1s, Severity 1s, etc.). The service desk or the support groups may assign these incidents to dedicated personnel (commonly called Incident Managers), who centrally manage the incident and its communications.
It is not surprising that during such incidents, many stakeholders are in urgent need of communications and updates. In most cases, the key question being asked is "when will the service return to normal?" Even where a formal major incident management process has been established, including detailed procedures for managing communications, it is not uncommon for senior IT management to eagerly seek additional information or updates outside the agreed process. In part, this may be due to pressure from higher levels of management, who are striving to show control in such chaotic circumstances.
So let's picture such a scenario. A support group calls Incident Management to report that a major service disruption has occurred. The incident manager opens a phone conference, a desktop collaboration session (virtual meeting), or both with the support personnel who are already involved in the incident. The incident manager may need to call in other support teams to help establish the situation and its impact. Once the incident is confirmed, initial communications are disseminated and incident investigation and resolution begin. Managing the incident takes a great deal of focus: interpreting the technical conversation, marshalling personnel, testing hypotheses, reassessing business impacts, recording key events and decisions, and compiling carefully worded communications. Suddenly the incident manager gets a tap on the shoulder, or phone calls start coming in from IT managers seeking progress updates.
Let's take a step back from this scenario and cast a Lean lens on it. The ultimate goal of Lean "is to provide perfect value to the customer through a perfect value creation process that has zero waste" (3). Lean names seven forms of waste to be identified and eliminated: Transportation, Inventory, Motion, Waiting, Over-processing, Over-production and Defects. In the small example above I see two forms of waste (2):
a. Waiting. Whenever goods are not in transport or being processed, they are waiting. In this scenario, the "goods" are the service restoration, which is left waiting while the Incident Manager is distracted by providing a duplicate progress update to the manager, and
b. Over-processing. Over-processing occurs any time more work is done on a piece than is required by the customer. In this case, more work is being generated by providing a duplicate progress update to the IT manager, on top of the updates already sent to stakeholders through the agreed channels.
Some simple solutions to this problem may include:
a. the IT managers join the conference call and hear progress first hand (and therefore do not distract the engaged personnel) or
b. an additional person joins the conference and acts as the incident secretary, managing all communications on behalf of the incident manager.
I hope this provides some insight into how Lean could be used to identify and eliminate waste in critical services like major incident management. If you have similar ideas or thoughts, please do not hesitate to add a comment.