SERVICE PROVIDERS/SERVICES
Improve service levels with meaningful measurement

by Anne Brazao

Measuring service levels is one of the most important aspects of service-level management (SLM). If you cannot accurately measure service levels, you cannot reliably improve them. Managing something that is not being measured is difficult, and without proof points that verify improvement over time, success is impossible to quantify.

In any service organization, service levels must be measured in two directions. The first is the service level that the IT organization or service provider delivers to its end-user customers (customer service-level measurement, or CSM). The second is the service level delivered by the vendors that supply equipment or services to the IT or provisioning infrastructure (vendor service-level measurement, or VSM). Every service provider, whether an enterprise IT organization, an Internet service provider (ISP), an application service provider (ASP) or a carrier, is itself the customer of other service or equipment providers. The service level these vendors provide is an important factor in the service levels that a business, in turn, supplies to its customers.

Techniques for measuring customer and vendor service levels are quite different. Most SLM products today focus on CSM—measuring the level of service being provided to the end user. Customer service levels are measured in terms of affected users or sites, depending on the nature of the service provided. An outage affecting a single user should not be viewed as the equivalent of an outage affecting a thousand users, even though the extent of downtime may be identical. The primary metric must, therefore, be the service level experienced by the average user of the service. This metric is referred to as total service availability (TSA).

TSA is a weighted availability metric that does not penalize you for deploying redundant components for critical services. If, for example, an enterprise has two routers, each supporting 100 users, and one of the routers fails, the availability is 50%, with or without TSA, because 50% of the equipment is unavailable and 50% of the users will experience downtime.

If, however, the two routers are redundantly deployed for automatic failover and one of the routers fails, “raw” device availability is still 50%, but total service availability is 100%, because no users of the service are affected by the failure. While this is a simple concept, many SLM solutions do not use TSA, so the service metrics they deliver are based on “raw” device availability and not service availability.
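The two-router example can be worked through in a few lines of Python. This is only a sketch of the idea, not the calculation used by any particular SLM product; the class and function names (Router, UserGroup, total_service_availability) are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Router:
    name: str
    up: bool


@dataclass
class UserGroup:
    users: int
    # Names of the routers that can carry this group's traffic.
    # More than one name means the group has a failover path.
    served_by: List[str]


def raw_device_availability(routers: List[Router]) -> float:
    """Fraction of devices that are up, ignoring who depends on them."""
    return sum(r.up for r in routers) / len(routers)


def total_service_availability(routers: List[Router], groups: List[UserGroup]) -> float:
    """Fraction of users who still have service (user-weighted availability)."""
    up = {r.name for r in routers if r.up}
    total_users = sum(g.users for g in groups)
    served_users = sum(g.users for g in groups if any(n in up for n in g.served_by))
    return served_users / total_users


routers = [Router("r1", up=True), Router("r2", up=False)]

# Case 1: each router independently supports 100 users.
independent = [UserGroup(100, ["r1"]), UserGroup(100, ["r2"])]
# Case 2: the same 200 users, with the routers deployed for automatic failover.
redundant = [UserGroup(200, ["r1", "r2"])]

print(raw_device_availability(routers))                  # 0.5
print(total_service_availability(routers, independent))  # 0.5
print(total_service_availability(routers, redundant))    # 1.0
```

Raw device availability is 50% in both cases; only the user-weighted figure reflects that the failover deployment kept every user in service.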

Metrics based on the value of particular business processes are also critical to successful service management. The business impact of one outage vs. another, in terms of revenue, productivity or other business priorities, must be weighed as well. What is the relative importance of a service outage for a group of online traders vs. a group of casual computer users?
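One simple way to express relative importance is to weight each outage by a business-value factor. The sketch below is an assumption made for illustration: the formula (users × minutes × weight) and the weights themselves are hypothetical, and a real organization would derive its own from revenue or productivity data.

```python
def weighted_outage_impact(users_affected: int, minutes_down: float,
                           business_weight: float) -> float:
    """Illustrative impact score: user-minutes lost, scaled by business value."""
    return users_affected * minutes_down * business_weight


# Same outage size and duration, different (hypothetical) business weights.
traders = weighted_outage_impact(users_affected=50, minutes_down=30, business_weight=10.0)
casual = weighted_outage_impact(users_affected=50, minutes_down=30, business_weight=1.0)

print(traders)  # 15000.0
print(casual)   # 1500.0
```

With these assumed weights, an identical outage counts ten times as heavily for the traders as for the casual users.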

UNDERSTANDING CUSTOMER EXPECTATIONS

Measuring the service levels experienced by customers can be challenging. The first challenge is identifying the service being measured. Services are provided at many levels, and, in some cases, the customer’s view of a service is different from the provider’s view. The service must be defined from a common perspective, based on a shared understanding between the customer and provider. The essential definition of a service is “whatever the customer believes that the provider has agreed to supply.” Measurement of the service level must then be defined on that basis.

The next step is to quantify the customer’s service expectations. What are acceptable response times for various transactions? What is the full set of facilities that must be available for the service to be considered operational?

Once the service being measured is understood, an automated system is required to collect and analyze outage information from a user and service perspective, rather than from a device perspective. Ideally, such a solution would be built on the management infrastructure (network, systems and applications) already deployed in the environment, rather than replicating the data collection those systems already perform.

In order for a service-level measurement system to work, the services to be measured must first be described to the system. A number of attributes need to be described:

  • Basic service information, such as name, description and customer name.
  • POP information. What are the network points of presence from which this service is being provided? How many users does each POP support?
  • Service requirements. What is required in order for this service to be considered operational for a given set of users? The network? Particular servers? Particular application processes? Particular transactions? Particular transaction response times? These requirements can be combined logically, for example the network and at least one of two servers.
  • Hours of operation. During what period is the customer expecting service levels to be maintained?
  • Service-level thresholds. What are the acceptable service levels for this customer?
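The attributes listed above map naturally onto a small data structure. The following is a minimal sketch, assuming a simplified model in which each requirement is a true/false test on component status; every name in it (ServiceDefinition, Pop, all_of, any_of, the component names and the customer) is hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Dict, List

# Component name -> whether it is currently up.
Status = Dict[str, bool]
# A requirement is a test applied to the current status.
Requirement = Callable[[Status], bool]


def component(name: str) -> Requirement:
    """Requirement that a single named component is up."""
    return lambda status: status.get(name, False)


def all_of(*reqs: Requirement) -> Requirement:
    """Logical AND of several requirements."""
    return lambda status: all(r(status) for r in reqs)


def any_of(*reqs: Requirement) -> Requirement:
    """Logical OR of several requirements (e.g. redundant servers)."""
    return lambda status: any(r(status) for r in reqs)


@dataclass
class Pop:
    name: str
    users: int
    requirement: Requirement


@dataclass
class ServiceDefinition:
    name: str
    description: str
    customer: str
    pops: List[Pop]
    hours_start: time
    hours_end: time
    availability_threshold: float  # acceptable service level, e.g. 0.995

    def in_service_hours(self, t: time) -> bool:
        return self.hours_start <= t < self.hours_end

    def users_served(self, status: Status) -> int:
        """Users at POPs whose requirements are currently met."""
        return sum(p.users for p in self.pops if p.requirement(status))


email = ServiceDefinition(
    name="email",
    description="Hosted mail",
    customer="Example Corp",
    pops=[
        Pop("boston", 100, all_of(component("bos-router"),
                                  any_of(component("mail-1"), component("mail-2")))),
        Pop("denver", 50, all_of(component("den-router"), component("mail-1"))),
    ],
    hours_start=time(8, 0),
    hours_end=time(18, 0),
    availability_threshold=0.995,
)

status = {"bos-router": True, "den-router": True, "mail-1": False, "mail-2": True}
print(email.users_served(status))  # 100: Boston fails over to mail-2, Denver cannot
```

Expressing requirements as composable tests is one design choice among several; it keeps the "interrelated" conditions explicit without tying the definition to any particular device.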

In order to achieve reliable service-level measurement, this information must be accurately defined and maintained as changes occur. Because this process can be time-consuming and error-prone, a self-maintaining system is highly desirable. A “rules-based” service-level measurement system allows the creation of “meta” service rules that describe how service definitions are to be created and maintained. Such a system will continually adjust the service definitions as users are added, removed or relocated, or as new POPs are brought online. Service requirements will automatically be applied according to the specified rules.
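A rules-based approach can be illustrated by regenerating the service definitions from the current user inventory each time it changes. This is a hedged sketch of the concept only; the rule, the inventory format and all names (User, per_pop_rule, build_definitions) are assumptions, not the design of any actual product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class User:
    name: str
    pop: str
    service: str


# A "meta" rule: given a service and a POP, return the components required there.
Rule = Callable[[str, str], List[str]]


def per_pop_rule(service: str, pop: str) -> List[str]:
    # Hypothetical rule: each POP needs its own router plus the shared service server.
    return [f"{pop}-router", f"{service}-server"]


def build_definitions(users: List[User], rule: Rule) -> Dict[Tuple[str, str], dict]:
    """Derive per-POP service definitions from the inventory by applying the rule."""
    definitions: Dict[Tuple[str, str], dict] = {}
    for u in users:
        key = (u.service, u.pop)
        entry = definitions.setdefault(key, {"users": 0, "requires": rule(u.service, u.pop)})
        entry["users"] += 1
    return definitions


users = [User("ann", "boston", "email"), User("bob", "boston", "email")]
before = build_definitions(users, per_pop_rule)

# A user is added at a POP that was not previously in service.
users.append(User("cy", "denver", "email"))
after = build_definitions(users, per_pop_rule)

print(sorted(before))  # [('email', 'boston')]
print(sorted(after))   # [('email', 'boston'), ('email', 'denver')]
print(after[("email", "denver")])
# {'users': 1, 'requires': ['denver-router', 'email-server']}
```

Because the definitions are derived rather than hand-edited, the new POP and its requirements appear without any manual change to the service description.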

In addition to the automatic maintenance of service definitions, several other features of a service-level measurement system are mandatory for serious provider environments:

  • Planned outages. The system must allow certain planned downtime to be announced to the customers and not counted in the measured service levels.
  • Administrative outages. Customers may experience outages that the provider was not able to detect. The system must support administrative entries that adjust the service-level numbers accordingly.
  • Service-level agreement (SLA) generation. The meta service rules should contain enough information to produce detailed SLAs for customers. Generating them from the service-level measurement system ensures that the measurements are consistent with the customer agreement.
  • Access to detailed data. Online reports should allow drill-down to as much detail as needed to support the summarized availability numbers. They should allow access, when desired, to each outage that occurred, including planned outages and administrative overrides.
  • Customer reports. Service-level reports should isolate the information for each customer, allowing customers to log in and see their own information.
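The effect of planned and administrative outages on the reported number can be shown with a short calculation. The sketch below assumes one possible treatment: planned downtime is removed from the measurement period entirely, while administrative entries count the same as detected outages. Other treatments are possible, and the names used here (Outage, measured_availability) are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outage:
    minutes: float
    kind: str  # "detected", "planned" or "administrative"


def measured_availability(outages: List[Outage], period_minutes: float) -> float:
    """Availability over the period, excluding announced planned downtime."""
    planned = sum(o.minutes for o in outages if o.kind == "planned")
    counted = sum(o.minutes for o in outages
                  if o.kind in ("detected", "administrative"))
    # Assumption: planned downtime shrinks the period the customer is measured over.
    scheduled = period_minutes - planned
    return (scheduled - counted) / scheduled


outages = [
    Outage(30.0, "detected"),
    Outage(100.0, "planned"),         # announced maintenance window
    Outage(20.0, "administrative"),   # customer-reported, entered by hand
]

print(measured_availability(outages, period_minutes=1100.0))  # 0.95
```

Keeping each outage as its own record, tagged by kind, also supports the drill-down described above: the summary figure can always be traced back to the individual entries behind it.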

ASSESSING VENDORS

Even if an organization is already measuring and managing its customer service levels, chances are that the solution it uses for CSM will not support vendor service-level management. Such an organization sees only half the information it needs to improve the customer service levels it measures.

Vendor service level is measured on an aggregate basis, by grouping components according to vendor, family or type. This allows a comparative view into the service levels provided by various vendors, as well as the products or product families offered by those vendors. Effective vendor measurement lets the organization manage its services more effectively and, as a result, provide higher levels of service to customers.
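Aggregate vendor measurement amounts to grouping component uptime by an attribute and comparing the totals. Here is a minimal sketch, assuming each component reports its uptime for a common period; the vendor names, families and figures are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    vendor: str
    family: str
    uptime_minutes: float
    total_minutes: float


def availability_by(components: List[Component], attribute: str) -> Dict[str, float]:
    """Aggregate availability for each value of the given attribute (e.g. 'vendor')."""
    up: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for c in components:
        key = getattr(c, attribute)
        up[key] += c.uptime_minutes
        total[key] += c.total_minutes
    return {key: up[key] / total[key] for key in total}


inventory = [
    Component("rtr-1", "VendorA", "edge-router", 990.0, 1000.0),
    Component("rtr-2", "VendorA", "core-router", 1000.0, 1000.0),
    Component("rtr-3", "VendorB", "edge-router", 950.0, 1000.0),
]

print(availability_by(inventory, "vendor"))
# {'VendorA': 0.995, 'VendorB': 0.95}
print(availability_by(inventory, "family"))
# {'edge-router': 0.97, 'core-router': 1.0}
```

The same function yields the vendor view and the product-family view, which is the comparative perspective the text describes.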

Applications are now becoming available that offer a suite of interrelated, complementary service-management modules, built on existing network management infrastructures, to form an effective and complete service-management layer. Full service-level management requires a set of integrated applications covering robust rules-based customer service-level measurement, vendor service-level measurement, capacity assurance, asset tracking and deployment management. Finally, these discrete application modules must be fully collaborative, working together to give a more complete view of all aspects of the service-management process.

Brazao is vice president of marketing at Opticom, Inc., of Andover, MA.