Features

April 2008

Special Focus: Testing & Monitoring

Close the loop on network problems

Integration of change awareness enables correlation of network events with configuration changes.

by Pam Snaith

Preventing network downtime and performance degradation is every IT manager's goal. Causes are not always preventable (major power outages and other external events do occur), but a considerable amount of disruption can be prevented. Industry analysts agree that erroneous network configuration changes, often manually entered, cause a significant portion of network downtime and performance degradation, perhaps as much as 80 percent. With so many network faults caused by configuration changes, a key to preventing network outages is to ensure that network management tools are "change aware."

Integration of change awareness into the network service and fault-management solution closes the loop on network problems that stem from configuration changes. Integration enables correlation of network events with configuration changes, giving insight into problematic configuration changes as part of the root-cause analysis. It can also provide a configuration audit trail of any selected network device through the history that is typically retained by network-management tools.

Why are configuration changes so often incorrect?

There are simply so many of them. Device configuration changes are numerous and networks teem with routers and switches that, in the normal course of business, need configuration adjustments. Typically, they are from multiple vendors, each with its own command-line structures. The number of changes can be daunting, and the variety of syntax makes management a challenge.

Network device configuration is detailed and is often handled by a few experienced technology professionals. Manual input is still common, and even seasoned professionals can make mistakes when keeping track of so many details.

Discrepancies develop between the startup and running configurations. This can happen when the configuration change is correct but is not saved to non-volatile RAM. In the event of a device reboot, the device reverts to the old configuration.
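A discrepancy of this kind can be surfaced by comparing the two configurations line by line. The following is a minimal sketch, assuming both configurations have already been retrieved as plain text; the function name `config_drift` and the sample configuration lines are illustrative, not taken from any particular product.

```python
import difflib

def config_drift(startup, running):
    """Return the lines that differ between startup and running configs.

    Lines prefixed with "- " exist only in the startup configuration;
    lines prefixed with "+ " exist only in the running configuration.
    """
    diff = difflib.ndiff(startup.splitlines(), running.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]

# Hypothetical sample data: an MTU change was applied but never saved.
startup = "hostname edge1\ninterface Gi0/1\n description uplink\n"
running = "hostname edge1\ninterface Gi0/1\n description uplink\n mtu 9000\n"

drift = config_drift(startup, running)
for line in drift:
    print(line)
```

An empty result means a reboot would not change the device's behavior; any output marks a change that would be lost on reboot.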

Time delays caused by manual input can cause problems. If five routers need the same change, some of them will be done before others. Until the last router is updated, the devices run mismatched configurations, and these incompatibilities may create problems while the task is in process. In addition, if multiple people handle configuration changes, there is an opportunity for less-knowledgeable staff to input a change incorrectly.

THE BENEFITS OF CHANGE

Preventing network problems, such as those stemming from configuration changes, is an opportunity to increase network uptime, provide better business services and improve business continuity. There are a few steps to keep in mind when incorporating configuration and change awareness into a network-management solution.

One decision is what solution to deploy: a niche standalone application or one incorporated into the network service and fault-management solution. Here is an opportunity to unify and simplify overall network management and provide new efficiency to the IT staff.

A niche tool exhibits its excellence only in its native environment. With many different vendors and platforms, the amount of serial workflow required is significant, and the opportunity to speed up and clean up change processes can be lost.

An integrated solution for network service, fault and configuration management, with a centralized control point, will not only enable better network management in the first place, but also will allow staff to rapidly spot and resolve configuration errors. For example, intelligent thresholds are essential to problem detection and should include proactive alarming on key performance indicators for a particular service, such as voice over IP. With integrated, change-aware network management, configuration changes can be correlated with network events and alarms, resulting in easier corrections and higher availability of critical business services that rely on the IT infrastructure.
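The correlation described above can be sketched as a time-window match between an alarm and recent changes on the same device. This is an illustration only: the `ConfigChange` and `Alarm` record types, the 30-minute window and the sample data are all assumptions, and a real product would draw on its own event and change stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class ConfigChange:
    device: str
    changed_at: datetime
    author: str

@dataclass(frozen=True)
class Alarm:
    device: str
    raised_at: datetime
    message: str

def suspect_changes(alarm, changes, window=timedelta(minutes=30)):
    """Return changes made to the alarm's device within `window`
    before the alarm was raised, most recent first."""
    hits = [
        c for c in changes
        if c.device == alarm.device
        and timedelta(0) <= alarm.raised_at - c.changed_at <= window
    ]
    return sorted(hits, key=lambda c: c.changed_at, reverse=True)

# Hypothetical sample data.
changes = [
    ConfigChange("core-rtr-1", datetime(2008, 4, 1, 9, 50), "alice"),
    ConfigChange("core-rtr-1", datetime(2008, 4, 1, 7, 0), "bob"),
    ConfigChange("edge-sw-4", datetime(2008, 4, 1, 9, 55), "carol"),
]
alarm = Alarm("core-rtr-1", datetime(2008, 4, 1, 10, 0),
              "VoIP jitter above threshold")

suspects = suspect_changes(alarm, changes)
```

Here only the 9:50 change on `core-rtr-1` falls inside the window, so it is the one change offered to the operator as a candidate root cause.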

Managing configuration changes correctly takes two key capabilities: awareness and automation. Network change and configuration management needs to "notice and notify" when changes are made to network devices. Awareness of configuration changes will not always prevent downtime or degradation, but it does provide the opportunity to make corrections quickly, such as a fallback to a previous, working configuration. Change awareness, integrated into the network service and fault-management solution, should identify configuration changes in real time, verify them against established correct configurations and notify the correct individual regarding unexpected changes.
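One way to picture "notice and notify" is to compare a fingerprint of each device's current configuration with the fingerprint of its approved configuration. The sketch below makes several assumptions: the helper names `fingerprint` and `check_device` are hypothetical, the approved baselines are held in a simple dictionary, and `notify` stands in for whatever alerting mechanism is in use.

```python
import hashlib

def fingerprint(config_text):
    """Hash a configuration, ignoring blank lines and trailing spaces
    so that purely cosmetic differences do not raise alerts."""
    lines = [ln.rstrip() for ln in config_text.splitlines() if ln.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def check_device(device, current_config, approved, notify):
    """Return True if the device matches its approved baseline.
    Otherwise call notify(device, reason) and return False."""
    expected = approved.get(device)
    if expected is None:
        notify(device, "no approved baseline on record")
        return False
    if fingerprint(current_config) != expected:
        notify(device, "configuration differs from approved baseline")
        return False
    return True

# Hypothetical sample data: the SNMP community string was changed.
approved = {"edge1": fingerprint("hostname edge1\nsnmp-server community ops RO\n")}
sent = []

def record_alert(device, reason):
    sent.append((device, reason))

ok = check_device(
    "edge1",
    "hostname edge1\nsnmp-server community public RW\n",
    approved,
    record_alert,
)
```

Comparing hashes only says that something changed; a line-by-line diff against the stored baseline would still be needed to show what changed.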

Automation provides the opportunity to complement awareness with rapid action. Today's network-management solutions depend on automation to detect developing performance problems and to take immediate action to prevent downtime. While automated actions should be based on business policies established by trusted technical advisors, automating the resulting action eliminates a great deal of risk. Automation improves both proactive and reactive change management.

PROACTIVE AUTOMATION

Proactively, automation can implement scheduled upgrades and deliver immediate notification of unauthorized changes. Stored configurations can be uploaded to multiple devices simultaneously. Changes are automatically tracked.
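Uploading one stored configuration to several devices at once can be sketched with a thread pool. This is a simplified illustration: `push_to_all` is a hypothetical helper, and the `push` callable is a placeholder for whatever mechanism actually delivers a configuration to a device.

```python
from concurrent.futures import ThreadPoolExecutor

def push_to_all(devices, config, push, max_workers=8):
    """Push `config` to every device concurrently.

    Returns a dict mapping each device to None on success,
    or to the exception raised while pushing to it.
    """
    def attempt(device):
        try:
            push(device, config)
            return device, None
        except Exception as exc:  # record the failure, keep going
            return device, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(attempt, devices))

# Hypothetical stand-in for a real delivery mechanism.
applied = {}

def fake_push(device, config):
    if device == "r3":
        raise RuntimeError("device unreachable")
    applied[device] = config

results = push_to_all(["r1", "r2", "r3"], "ntp server 10.0.0.1", fake_push)
failed = [d for d, err in results.items() if err is not None]
```

Collecting failures per device, rather than stopping at the first error, lets the change be tracked and retried for only the devices that missed it.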

Reactively, alerts are automatically sent to appropriate individuals when changes have been made to device configurations. This gives them the opportunity to make corrections or take other action to ensure overall network reliability. If problems do occur, automation can roll back network device configurations to their last known good state. Manual corrections could never be as fast.
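A rollback to the last known good state presupposes a stored history in which verified configurations are flagged. The sketch below assumes such a history; the `ConfigHistory` class, the `roll_back` function and the `push` callable are hypothetical names used only for illustration.

```python
class ConfigHistory:
    """Saved configurations for one device, oldest first."""

    def __init__(self):
        self._versions = []  # each entry: (config_text, known_good)

    def record(self, config_text, known_good=False):
        self._versions.append((config_text, known_good))

    def last_known_good(self):
        """Return the newest configuration flagged as good, or None."""
        for config_text, known_good in reversed(self._versions):
            if known_good:
                return config_text
        return None

def roll_back(history, push):
    """Push the last known good configuration back to the device.
    Returns the restored text, or None if no good version exists."""
    target = history.last_known_good()
    if target is None:
        return None
    push(target)
    history.record(target, known_good=True)  # keep the audit trail complete
    return target

# Hypothetical sample data.
history = ConfigHistory()
history.record("interface Gi0/1\n mtu 1500\n", known_good=True)
history.record("interface Gi0/1\n mtu 9000\n")  # unverified change

pushed = []
restored = roll_back(history, pushed.append)
```

Recording the rollback as a new history entry means the audit trail shows both the faulty change and its reversal.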

If configuration changes are accurate and timely, many causes of outages are eliminated. Integration of configuration management within network service and fault management helps to bring network availability to a new level of reliability. Change awareness ties together fault and configuration management, simplifying the growing complexity of managing large infrastructures and bringing network management in line with the importance of the network itself in delivering business services.

Pam Snaith is product marketing manager, infrastructure management, at CA, Islandia, N.Y.


The NBA network

by Gnanesh Dholakia

The rapid proliferation of virtualization, optimization and Web services technologies has increased the complexity of IT infrastructures and changed the relationship between infrastructure components, applications and users. The way current tools view the network no longer provides the information that is vital to effective management of business service delivery. Network behavior analysis (NBA) systems can provide an effective way to view the infrastructure.

A number of factors contribute to the challenge of maintaining satisfactory performance and availability on an ongoing basis. Organizational growth, mergers and acquisitions, the increasing prevalence of Internet-savvy users, and the proliferation of rich media mean that network bottlenecks and slowdowns become more frequent, often due to bandwidth-hogging applications.

Available monitoring tools, however, might not be able to keep pace with increasing infrastructure complexity and escalating service-level demands. Status monitoring tools, for example, report on/off status without indicating why a device is off or what effect it is having on service delivery. Performance-monitoring tools tend to focus on identifying symptoms such as latency, increases in round-trip time and jitter, but they do not provide any insight into the cause of these problems.

To identify the cause, alert affected users and resolve the problem, the context of the problem needs to be understood, including:

  • what activity led to the problem;
  • how that activity differs from typical behavior;
  • whether changes could have caused service degradation or interruption; and
  • which users, applications, devices, ports and protocols are involved or affected.
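The question of how activity differs from typical behavior can be illustrated with a simple statistical baseline. This is a generic sketch, not a description of how any NBA product works: the function name `deviates`, the three-standard-deviation threshold and the sample traffic figures are all assumptions.

```python
from statistics import mean, stdev

def deviates(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` standard
    deviations from the mean of past measurements in `history`."""
    if len(history) < 2:
        return False  # not enough data to define "typical"
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold

# Hypothetical hourly traffic volumes for one host, in megabytes.
typical = [100, 110, 95, 105, 90]

spike = deviates(typical, 400)    # far outside the usual range
normal = deviates(typical, 108)   # within the usual range
```

A real system would keep separate baselines per user, application and time of day, since what is typical at noon on a weekday is not typical at midnight.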

Organizations should not necessarily discard the tools in place today and start from scratch. Rather, organizations should look to add a new layer of capabilities that addresses the challenges presented by the increased complexity and service-level requirements.

NBA systems analyze network traffic to provide valuable information about the interactions of, and dependencies between, users, applications and systems. Customers benefit from proactive problem resolution and reduced mean time to repair, while ensuring the availability, performance and security of business services.

NBA systems collect network flow data and enhance it with application and user identification and behavioral analytics to present a complex infrastructure in a business context. Predefined and customizable analyses enable users to identify performance and availability issues before they disrupt business services.
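Enhancing flow data with user and application identification can be pictured as a lookup-and-aggregate step. The sketch below is a deliberately simplified illustration: flow records are reduced to a source address, destination port and byte count, and the lookup tables `ip_to_user` and `port_to_app` are hypothetical stand-ins for an identity-management system and an application classifier.

```python
from collections import Counter

def top_talkers(flows, ip_to_user, port_to_app, n=3):
    """Aggregate bytes per (user, application) pair and return the
    `n` largest consumers as ((user, app), total_bytes) tuples."""
    usage = Counter()
    for src_ip, dst_port, nbytes in flows:
        user = ip_to_user.get(src_ip, src_ip)             # fall back to the address
        app = port_to_app.get(dst_port, f"port {dst_port}")
        usage[(user, app)] += nbytes
    return usage.most_common(n)

# Hypothetical sample data: (source IP, destination port, bytes).
flows = [
    ("10.0.0.5", 443, 1200),
    ("10.0.0.5", 443, 800),
    ("10.0.0.9", 5060, 500),
    ("10.0.0.7", 8080, 5000),
]
ip_to_user = {"10.0.0.5": "alice", "10.0.0.9": "bob"}
port_to_app = {443: "https", 5060: "sip"}

report = top_talkers(flows, ip_to_user, port_to_app)
```

Unidentified sources and ports stay visible in the report under their raw address or port number, which is often where unexpected traffic shows up first.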

Role-based presentations enable users across IT to access this data in a format tailored to the specific needs and workflows of security, applications and network teams. Usage and dependency data enable informed optimization and change-management decisions.

NBA systems use all of this information to intelligently interoperate with other systems to add value and improve workflow. They learn from other systems, such as identity-management systems and traffic accelerators, to provide business context. They feed network-management systems and security event-management systems and update change and configuration management databases. They allow other systems to understand how business services are delivered across the infrastructure.

NBA systems enable users to manage change in their IT infrastructures. As a result, customers are able to ensure the availability, performance and security of business services, such as voice over IP, Web services and enterprise applications, as well as reduce costs and satisfy regulatory requirements.

Gnanesh Dholakia is director, product marketing, Mazu Networks, Cambridge, Mass.
