SPECIAL FOCUS:
From the September 2006
Analysis tools cure IT headaches
One of the biggest challenges IT departments face is the constant flood of non-critical or false alerts.
by Nicola Sanna
Imagine if an alarm went off at the nearest police station every time someone broke the speed limit. Though this “static threshold” is not meant to be broken, thousands of drivers break it every day. In most cases, cars are traveling just a few miles per hour over the posted speed, which breaks the limit but hardly warrants an emergency alert. In such a scenario, staff at the police station would be scrambling to find the source of thousands of alerts to warn officers in the field. Or worse, they would stop paying attention and miss real accidents.

This is happening within IT departments every day, as staff spend too much time and company money chasing non-critical alarms triggered when static monitoring thresholds are broken.

One of the biggest challenges IT departments face involves the constant flood of non-critical or false alerts generated by today’s monitoring tools, which are considered a “must have” for keeping tabs on such areas as system performance, network security and business service quality. As these tools become more complex, and as more information is generated and delivered, even a small irregularity, such as increased network traffic or the addition of new users to the system, can flood an IT department with alarms.

Traditionally, the onus has been on system administrators to wade through piles of data pouring in around the clock and struggle to differentiate real alerts from the rest. After finding relevant data, they must then figure out how to respond to what might, in the end, turn out to be nothing but an unimportant storage disk on the fritz.

Performance-management tools provide alerts when enterprise systems exceed established performance thresholds, but most rely on manually set static thresholds, which inundate the IT staff with hundreds or thousands of false alerts every day. No wonder many system administrators just stop paying attention altogether.
Flood of alerts unheeded

With the customer communication system down, business came to a halt. Everyone from the sales reps to the e-commerce manager to the CFO was screaming, while the IT department pointed fingers and searched for the problem. After six hours of troubleshooting, and a full day of employee frustration and lost business, the system administrator discovered the needle in the haystack: a dozen related alerts pointing to a simple memory leak.

For the CIO, monitoring the enterprise without real-time analysis tools means a flood of alerts from poorly tuned thresholds, which translates into significant inefficiency for the IT organization. If real problems are not addressed first, IT staff will spend the bulk of their time chasing false alarms. The challenge for today’s CIO is to:
To meet this challenge, a shift from traditional tools that rely on static thresholds to adaptive analysis software can help ensure accurate insight into performance and security problems. By learning on its own how a system, network, application or security environment normally performs, this real-time capability supports quick, timely decision making and the forecasting of future problems, enabling IT departments to take corrective measures before problems occur.
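As a rough illustration of the difference between a static limit and a learned one, here is a minimal Python sketch. It is not the method used by any particular product; it assumes a simple stand-in technique (the mean of recent samples plus a multiple of their standard deviation), and the class name `AdaptiveThreshold` and its parameters `window`, `k` and `min_samples` are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class AdaptiveThreshold:
    """Hypothetical sketch: learn a limit from recent samples
    instead of relying on a manually set, fixed value."""

    def __init__(self, window=60, k=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # keep only the most recent readings
        self.k = k                           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # stay quiet until a baseline exists

    def observe(self, value):
        """Record a reading; return True if it is abnormally high
        compared with the baseline learned so far."""
        alert = False
        if len(self.samples) >= self.min_samples:
            limit = mean(self.samples) + self.k * stdev(self.samples)
            alert = value > limit
        self.samples.append(value)
        return alert


if __name__ == "__main__":
    detector = AdaptiveThreshold()
    # Assumed sample data: CPU usage hovering between 40 and 44 percent.
    for i in range(30):
        detector.observe(40 + i % 5)
    print(detector.observe(45))  # slightly above normal
    print(detector.observe(90))  # far above normal
```

With this assumed data, a reading of 45 stays within the learned range, while a reading of 90 is flagged. A static limit set at 43 would have fired on every routine reading of 44.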
an evolving environment

False alarms are the result of IT departments manually setting performance thresholds for what they perceive as normal. A CPU usage spike on a busy Monday morning at the office could be enough to break a pre-set static threshold, leading to a false alarm if baselines were set to monitor activity during a slow weekend. That is about the equivalent of an alert going off for a driver going 66 mph in a 65-mph zone.

Existing performance-monitoring solutions require labor-intensive manual threshold administration, but with systems changing every second, those thresholds quickly become obsolete, forcing IT to play catch-up by resetting thresholds to match system performance. Real-time analysis software learns system behavior and automates the threshold-setting process, adjusting thresholds in step with the changes occurring within the enterprise environment. Actual system performance drives threshold configuration.

Real-time analysis software can also base threshold administration on the date-based history of the environment. For instance, if that CPU spike occurred every Monday, thresholds would automatically be set accordingly. Likewise, for the busy Christmas buying season, real-time analysis software would automatically configure thresholds each winter for servers powering credit card machines to match the spike in server usage during that period.

What does this mean for end users? It gives customers, especially those in transaction-intensive businesses like financial services and e-commerce, an opportunity to proactively manage the critical IT services that support their businesses. IT managers can see in real time how infrastructure performance issues affect bottom-line business performance, and make smart decisions accordingly.
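The Monday-morning example can be sketched in a few lines of Python. This is an illustrative assumption, not a description of how any vendor's software works: it keeps a separate history for each weekday-and-hour slot and derives each slot's limit as the mean plus a multiple of the standard deviation. The class name `SeasonalBaseline`, its parameters and the CPU readings are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


class SeasonalBaseline:
    """Hypothetical sketch: one learned threshold per (weekday, hour) slot."""

    def __init__(self, k=3.0, min_samples=3):
        self.history = defaultdict(list)  # (weekday, hour) -> past readings
        self.k = k                        # allowed deviation, in standard deviations
        self.min_samples = min_samples    # readings needed before a slot has a limit

    @staticmethod
    def _slot(ts):
        return (ts.weekday(), ts.hour)

    def learn(self, ts, value):
        """Add a past reading to the history of its time slot."""
        self.history[self._slot(ts)].append(value)

    def threshold(self, ts):
        """Return the learned limit for this time slot, or None if too little history."""
        past = self.history.get(self._slot(ts), [])
        if len(past) < self.min_samples:
            return None
        return mean(past) + self.k * stdev(past)

    def is_abnormal(self, ts, value):
        limit = self.threshold(ts)
        return limit is not None and value > limit


if __name__ == "__main__":
    baseline = SeasonalBaseline()
    # Assumed CPU readings (percent) at 9 a.m. on four Mondays and four Saturdays.
    for day, cpu in zip((4, 11, 18, 25), (85, 88, 90, 87)):
        baseline.learn(datetime(2006, 9, day, 9), cpu)   # Mondays in September 2006
    for day, cpu in zip((2, 9, 16, 23), (20, 22, 19, 21)):
        baseline.learn(datetime(2006, 9, day, 9), cpu)   # Saturdays in September 2006

    print(baseline.is_abnormal(datetime(2006, 10, 2, 9), 89))   # a Monday morning
    print(baseline.is_abnormal(datetime(2006, 9, 30, 9), 89))   # a Saturday morning
```

Under these assumed readings, 89 percent CPU usage at 9 a.m. on a Monday falls within the learned range, while the same reading on a Saturday morning is flagged. The same idea could be keyed by week of the year to cover a seasonal peak such as the Christmas buying period.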
Nicola Sanna is president and CEO of Netuitive, Reston, Va.