Incident resolution and closure: Waiting for the fat lady to sing

As I have learned during all my years in IT Service Management, it is always good to start at the beginning: “An incident is an unplanned interruption to an IT service or reduction in the quality of an IT service or a failure of a CI that has not yet impacted an IT service.” (ITIL Service Operation book)

The aim of Incident Management is to restore the service to the customer as soon as possible. We restore the service and close the incident, right? Not so fast, hombre.

From the old days of Total Quality Management at Toyota, and from Dr. Deming’s influence, we know that it is not over until the customer confirms it is over. Hence the two-step resolution:

  • The assigned engineer says it is over.
  • The customer contact person (the fat lady, in this case) confirms it is over, or a reasonable amount of time passes after the resolution without the customer complaining.
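
The two-step resolution behaves like a small state machine. Below is a minimal Python sketch of that idea, assuming hypothetical status names and a hypothetical `Incident` class rather than any particular ticketing tool’s data model: an incident can reach “Closed” only by passing through “Resolved”, and it can go back to work if the customer disagrees.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names; real ticketing tools use their own.
    IN_PROGRESS = "In Progress"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Allowed transitions in the two-step model.
ALLOWED = {
    Status.IN_PROGRESS: {Status.RESOLVED},                 # step 1: engineer resolves
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},  # step 2: close, or reopen
    Status.CLOSED: {Status.IN_PROGRESS},                   # late reopen after closure
}


class Incident:
    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.status = Status.IN_PROGRESS

    def _move(self, new_status: Status) -> None:
        # Reject any transition the two-step model does not allow.
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.incident_id}: {self.status.value} -> {new_status.value} not allowed"
            )
        self.status = new_status

    def resolve(self) -> None:
        """Step 1: the assigned engineer declares the service restored."""
        self._move(Status.RESOLVED)

    def close(self) -> None:
        """Step 2: the customer confirms, or the waiting period passes quietly."""
        self._move(Status.CLOSED)

    def reopen(self) -> None:
        """The customer reports that the service is not actually restored."""
        self._move(Status.IN_PROGRESS)


inc = Incident("INC-001")
inc.resolve()            # engineer: "it is over"
inc.close()              # customer: "yes, it is over"
print(inc.status.value)  # Closed
```

Calling `close()` on an incident that was never resolved raises an error, which is exactly the shortcut the two-step model is meant to prevent.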

Here is a more detailed review of the issues that have to be addressed during incident resolution and closure:

Resolution

An assigned operator or engineer declares that the service is restored and puts the incident record into “Resolved” status.

Remember: who owns the incident and is responsible for its lifecycle? The Service Desk (SD). So SD operators have to check whether the service is really fully restored. Also, when the customer is charged by the time spent on the incident, the Service Desk checks whether the work hours have been entered in the incident record or in the corresponding timesheet application.

It is also good to have some kind of resolution categorization, including at least “Successful,” “Out of scope,” “Customer error,” “Known error,” “Resolved through another incident,” “Incident opened by error” or similar. The Service Desk enters this information or checks that the assigned person entered it correctly. It serves as a solid foundation for management reporting.
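
To make the categorization usable for reporting, it helps to keep the categories in a fixed list rather than free text. The sketch below assumes a hypothetical `resolution_report` helper that counts one month of resolved incidents per category; the category names are the ones listed above.

```python
from collections import Counter
from enum import Enum


class ResolutionCategory(Enum):
    # The categories listed above; extend to suit your own process.
    SUCCESSFUL = "Successful"
    OUT_OF_SCOPE = "Out of scope"
    CUSTOMER_ERROR = "Customer error"
    KNOWN_ERROR = "Known error"
    RESOLVED_ELSEWHERE = "Resolved through another incident"
    OPENED_BY_ERROR = "Incident opened by error"


def resolution_report(resolved: list[ResolutionCategory]) -> dict[str, int]:
    """Count resolved incidents per category for a management report."""
    counts = Counter(resolved)
    # List every category, even with zero incidents, so monthly reports line up.
    return {category.value: counts[category] for category in ResolutionCategory}


month = [
    ResolutionCategory.SUCCESSFUL,
    ResolutionCategory.SUCCESSFUL,
    ResolutionCategory.CUSTOMER_ERROR,
    ResolutionCategory.KNOWN_ERROR,
]
for name, count in resolution_report(month).items():
    print(f"{name}: {count}")
```

Because the list is closed, a rising “Customer error” or “Incident opened by error” count becomes visible month over month instead of being buried in free-text resolution notes.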

This is also a good time to check the initial category or affected services set during the diagnosis/classification phases of the incident. Example: very often, a network outage looks like a mail service malfunction. The incident is categorized as “Mail service down,” escalated to the mail service team, and then rerouted to the network team, without anyone changing the initial category. This introduces entropy into the reporting system, wastes the time of middle and senior management, and keeps management from making good decisions quickly. So, the Service Desk should check the category of every incident during resolution and correct it if necessary.
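
The Service Desk checks described in this section can be collected into one validation step that runs before an incident is accepted as resolved. This is a sketch only: `ResolutionRecord`, its fields and the “Network down” category are hypothetical names, and a real tool would read these values from the incident record and the timesheet application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolutionRecord:
    # Hypothetical fields; map them to whatever your ticketing tool stores.
    restoration_verified: bool               # SD checked the service really works
    billed_by_time: bool                     # customer is charged per hour worked
    work_hours: float                        # hours entered in the record/timesheet
    resolution_category: Optional[str]       # e.g. "Successful", "Customer error"
    initial_category: str                    # category set at classification
    reviewed_category: Optional[str] = None  # category confirmed by SD at resolution


def resolution_issues(rec: ResolutionRecord) -> list[str]:
    """List what the Service Desk must fix before accepting "Resolved"."""
    issues = []
    if not rec.restoration_verified:
        issues.append("service restoration not verified")
    if rec.billed_by_time and rec.work_hours <= 0:
        issues.append("no work hours entered")
    if not rec.resolution_category:
        issues.append("resolution category missing")
    if rec.reviewed_category is None:
        issues.append("initial category not reviewed")
    return issues


# The mail/network example: reviewed at resolution and recategorized.
rec = ResolutionRecord(
    restoration_verified=True,
    billed_by_time=True,
    work_hours=1.5,
    resolution_category="Successful",
    initial_category="Mail service down",
    reviewed_category="Network down",
)
print(resolution_issues(rec))  # []
```

Keeping `initial_category` next to `reviewed_category` preserves both values, so reports can use the corrected category while the original misdiagnosis stays visible.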


Customer satisfaction

There are a number of ways to rate customer satisfaction during incident closure. The most obvious is to ask the customer to answer a set of short questions about every single incident during the resolution phase. Three to five questions, rated 1-4 (avoiding a neutral middle grade) or 1-10, can be sent to the customer via e-mail, a telephone conversation or a web application. Since customers may find this too intrusive, some ticketing tools let us set the share of incidents to be surveyed, for example every third incident or 25% of incidents.
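
The two sampling policies mentioned above behave differently: “every third incident” is deterministic, while “25% of incidents” is usually a random draw. A minimal Python sketch of both follows, with hypothetical function names; ticketing tools expose this as a setting rather than as code.

```python
import random
from typing import Optional

_default_rng = random.Random()


def survey_every_nth(sequence_number: int, n: int = 3) -> bool:
    """Deterministic sampling: survey every n-th resolved incident."""
    return sequence_number % n == 0


def survey_by_rate(rate: float = 0.25, rng: Optional[random.Random] = None) -> bool:
    """Random sampling: survey about `rate` of resolved incidents."""
    source = rng if rng is not None else _default_rng
    return source.random() < rate


# Every third incident out of the first twelve resolved ones:
print([i for i in range(1, 13) if survey_every_nth(i)])  # [3, 6, 9, 12]

# Roughly 25% of 10,000 resolved incidents (seeded so the run is repeatable):
seeded = random.Random(7)
print(sum(survey_by_rate(0.25, seeded) for _ in range(10_000)))
```

The every-nth variant hits an exact share but follows a predictable pattern; the random variant only approximates the rate, which is usually acceptable for satisfaction statistics.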

A more general impression of customer satisfaction can be gained through periodic meetings with customer representatives, for example monthly or quarterly. This method is less intrusive, and it gives an overview of less quantifiable aspects, such as the customer’s overall feelings about the process. Though subjective, these impressions are nevertheless relevant, since we can react to them and steer them.

Other methods, like phone surveys and group interviews, can be effective, but they are outside the scope of this post. We will probably address them in a separate article.

Knowledge

If a Knowledge Management (KM) process is implemented, the Service Desk can mark this resolution as a candidate for a knowledge base article. The Knowledge Manager will periodically review these candidates and process them appropriately. From what I’ve observed, even without a KM process, assigned personnel tend to search the incident database to check whether this kind of incident has occurred before. This is where a good ticketing tool really justifies its investment.

Closure

We expect the customer to confirm the resolution of the incident. This can be done by a phone call with the Service Desk, a reply to an automatic notification from the ticketing application, or via a web-based application. This is the second important step in our two-step incident closure. After the customer confirms, the incident record can safely be put into “Closed” status, and the customer should be notified of the closure.

Sadly, some customers do not seem to find this important once they have their service back. Since the customer is almost always right, it has somehow become an industry standard to state in the small print of the resolution notification that the resolved incident will be closed automatically if the resolution is not confirmed within, say, 24 or 48 hours. An automated job can then run daily and close all resolved incidents older than this threshold.
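
The daily auto-closure job could look roughly like this. It is a sketch under stated assumptions: tickets are plain in-memory objects with a hypothetical `Ticket` shape, statuses are strings, and the threshold defaults to 48 hours; a real implementation would query and update the ticketing tool instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    status: str                           # "In Progress", "Resolved", "Closed"
    resolved_at: Optional[datetime] = None


def auto_close(tickets: list[Ticket], now: datetime,
               threshold: timedelta = timedelta(hours=48)) -> list[str]:
    """Close every resolved ticket older than `threshold`; return the closed IDs.

    Meant to run once a day from a scheduler. Confirmed tickets are already
    "Closed", so anything still "Resolved" is waiting on the customer.
    """
    closed = []
    for t in tickets:
        expired = t.resolved_at is not None and now - t.resolved_at > threshold
        if t.status == "Resolved" and expired:
            t.status = "Closed"
            closed.append(t.ticket_id)  # send a closure notification for these
    return closed


now = datetime(2024, 3, 8, 6, 0)
tickets = [
    Ticket("INC-101", "Resolved", now - timedelta(days=3)),   # silent customer
    Ticket("INC-102", "Resolved", now - timedelta(hours=5)),  # still waiting
    Ticket("INC-103", "In Progress"),
]
print(auto_close(tickets, now))  # ['INC-101']
```

Returning the closed IDs makes it easy to send the closure notification, which, as the next section explains, is often the message customers actually read.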

Reopen

Very often, because of their business priorities, busy customers treat automatic resolution notifications as spam, and in some cases (for example, when the mail service is down) they cannot receive them at all. Somehow, they are more responsive to a closure notification. In that case, the Service Desk has to reopen the incident and treat it as “work in progress.” This can be a slippery slope, since some SLAs require us to treat reopened incidents as customer complaints and process them accordingly.

In these cases, keep track of the confirmed resolution notifications, and treat the reopened incidents as “work as usual.” The important thing is to keep the time after the incident resolution “off the clock,” in order to protect the service organization from SLA breaches.
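
Keeping post-resolution time “off the clock” amounts to summing only the intervals in which the incident is being worked on. The sketch below is a simplification: it uses calendar time rather than the business hours most SLAs specify, treats only a hypothetical “In Progress” status as on the clock, and assumes that work after a reopening counts again while the gap between resolution and reopening does not.

```python
from datetime import datetime, timedelta

# Statuses that consume SLA time; an assumption, adjust to your SLA wording.
ON_THE_CLOCK = {"In Progress"}


def sla_time(history: list[tuple[datetime, str]], now: datetime) -> timedelta:
    """Sum the time an incident spent in on-the-clock statuses.

    `history` is a chronological list of (timestamp, status entered) events.
    Time in "Resolved" or "Closed" is off the clock, so a reopened incident
    only resumes counting from the moment it is reopened.
    """
    total = timedelta()
    # Pair each event with the next one (or `now` for the current status).
    ends = [ts for ts, _ in history[1:]] + [now]
    for (start, status), end in zip(history, ends):
        if status in ON_THE_CLOCK:
            total += end - start
    return total


day1, day2 = datetime(2024, 3, 4), datetime(2024, 3, 5)
history = [
    (day1.replace(hour=9), "In Progress"),           # incident opened
    (day1.replace(hour=11), "Resolved"),             # engineer resolves
    (day2.replace(hour=10), "In Progress"),          # customer reopens next day
    (day2.replace(hour=10, minute=30), "Resolved"),  # resolved again
]
print(sla_time(history, now=day2.replace(hour=12)))  # 2:30:00
```

The 23 hours the incident spent waiting in “Resolved” before the customer reacted never reach the SLA figure.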

Another very important point: some service organizations handle high-priority incidents by lowering their priority once a resolution is in sight, or once the assigned personnel have declared them resolved. They then oversee the process through open incidents of lower priority, in case anything else needs to be done. This can show our best-effort intentions to a friendly customer, but at the end of the day it spoils our Mean Time To Resolve (MTTR) metrics and can make us look bad in spite of our best intentions. What does your service organization do in this case?
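
A quick worked example, with made-up numbers, shows one way the MTTR figure suffers when an incident is kept open for monitoring instead of being resolved when the service is restored. MTTR here is simply the mean of resolution time minus opening time; `mttr_hours` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttr_hours(incidents: list[tuple[datetime, datetime]]) -> float:
    """Mean Time To Resolve, in hours, from (opened_at, resolved_at) pairs."""
    return mean((resolved - opened) / timedelta(hours=1)
                for opened, resolved in incidents)


start = datetime(2024, 3, 4, 8, 0)
# Hypothetical figures: three incidents resolved in 2, 4 and 3 hours.
prompt = [(start, start + timedelta(hours=h)) for h in (2, 4, 3)]
print(mttr_hours(prompt))  # 3.0

# Same incidents, but the second one is kept open at lower priority
# for 72 more hours of "just in case" monitoring before it is resolved.
monitored = [(start, start + timedelta(hours=h)) for h in (2, 76, 3)]
print(mttr_hours(monitored))  # 27.0
```

Three extra days of monitoring on one incident out of three raise the MTTR from 3 to 27 hours, even though the service was restored just as quickly.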
