Imagine if an alarm went off at the
nearest police station every time someone broke the speed limit. Though this
“static threshold” is not meant to be broken, thousands of drivers do every
day. In most cases, cars are traveling just a few miles per hour over the
posted speed, breaking the speed limit but not necessarily warranting an
emergency alert.
In such a scenario, staff at the police station would be scrambling to find
the source of thousands of alerts to warn officers in the field. Or worse,
they would stop paying attention and miss real accidents. The same thing is
happening within IT departments every day, as staff spend too much time and
company money chasing non-critical alarms triggered whenever a static
monitoring threshold is crossed.
One of the biggest challenges IT departments face involves the constant
flood of non-critical or false alerts generated by today’s monitoring
tools, which are considered a “must have” for keeping tabs on such areas as
system performance, network security and business service quality. As these
tools become more complex, and as more information is generated and
delivered, even a small irregularity, such as increased network traffic or
the addition of new users to the system, can flood an IT department with
alarms.
Traditionally, the onus has been on system administrators to swim through
piles of data pouring in around the clock and struggle to differentiate real
alerts from the rest. After finding relevant data, they must then figure out
how to respond to what might, in the end, turn out to be nothing but an
unimportant storage disk on the fritz.
Performance-management tools provide alerts when enterprise systems exceed
established performance thresholds, but most rely on manually set static
thresholds, which inundate the IT staff with hundreds or thousands of false
alerts every day. No wonder many system administrators just stop paying
attention altogether.
Flood of alerts unheeded
What is the bottom-line impact of this reactive approach
on your business? Imagine this scenario: On an average day, a company’s
information systems produced eight hundred alerts. Understandably, the
system administrator did not pay much attention to the “noisy” monitoring
console unless someone complained to the help desk or there was an actual
outage. That was the situation when this company’s customer relationship
management system failed.
With the customer communication system down, business came to a halt.
Everyone from the sales reps to the e-commerce manager to the CFO was
screaming, while the IT department pointed fingers and searched for the
problem. After six hours of troubleshooting, and a full day of employee
frustration and lost business, the system administrator discovered the
needle in the haystack: a dozen related alerts pointing to a simple memory
leak.
Essentially, to the CIO, monitoring the enterprise without real-time
analysis tools means that too many alerts from poorly tuned thresholds
equate to a significant inefficiency in the IT organization. If real
problems are not the ones addressed first, IT staff will spend the bulk of
their time chasing false alarms. The challenge for today’s CIO is to:
- put more adaptive systems in place, which can evolve with the
business in real time;
- retain and utilize the knowledge garnered from existing systems and
staff; and
- minimize the cost of support and maintenance so those funds can be
used for investment in innovation.
To meet this challenge, a shift from traditional tools that rely on static
thresholds to adaptive analysis software can help ensure accurate insight
into performance and security problems. Because such software learns the
normal behavior of a system, network, application or security environment
on its own, it supports quick, timely decision making and the forecasting
of future problems, enabling IT departments to take corrective measures
before problems occur.
An evolving environment
Real-time analysis software integrates with existing
performance-monitoring tools but takes them further: it automatically runs
its own algorithms and, within minutes, begins learning the environment and
turning raw data into meaningful information. This
enables automatic setting of dynamic thresholds that change and evolve along
with the system environment.
False alarms are the result of IT departments manually setting performance
thresholds for what they perceive as normal. A CPU usage spike on a busy
Monday morning at the office could be enough to break a pre-set static
threshold, leading to a false alarm if baselines were set to monitor
activity during a slow weekend. That is about the equivalent of an alert
going off for a driver going 66 mph in a 65-mph zone.
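The Monday-morning problem can be shown with a few lines of Python. This is only a minimal sketch: the CPU figures, the `margin` multiplier and the function names are all invented for illustration and do not come from any particular monitoring product.

```python
# Illustrative CPU utilization samples (percent); all values are made up.
weekend_cpu = [12, 15, 11, 14, 13, 16, 12, 15]   # quiet period used as the baseline
monday_cpu = [48, 52, 55, 51, 49, 53, 95, 50]    # normal busy morning, one real spike

def static_threshold(baseline, margin=2.0):
    """Fix the threshold once, at `margin` times the highest baseline sample."""
    return max(baseline) * margin

def count_alerts(samples, threshold):
    """Count the samples that exceed the threshold."""
    return sum(1 for s in samples if s > threshold)

threshold = static_threshold(weekend_cpu)
alerts = count_alerts(monday_cpu, threshold)
print(f"threshold={threshold}%, alerts on Monday: {alerts} of {len(monday_cpu)}")
```

With these numbers, every Monday sample trips the weekend-derived threshold, even though only the single reading of 95 looks out of the ordinary for a busy morning.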
Existing performance-monitoring solutions require labor-intensive manual
threshold administration, but with systems changing every second, those
thresholds become obsolete almost immediately, forcing IT to play
catch-up, resetting thresholds to match system performance. Real-time analysis
software learns behavior and therefore automates the threshold-setting
process, adjusting thresholds in step with the changes occurring within
the enterprise environment.
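One simple way to picture a threshold that follows the system is a rolling baseline: the limit is recomputed from the most recent readings instead of being set once by hand. The sketch below assumes a limit of mean plus `k` standard deviations over a sliding window. That rule, the window size and the sample data are illustrative assumptions, not a description of how any commercial product works.

```python
from collections import deque
from statistics import mean, stdev

def dynamic_alerts(samples, window=5, k=3.0):
    """Return the indices of samples above mean + k * stdev of the
    preceding `window` samples. No alerts are raised until the window fills."""
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        if len(recent) == window:
            limit = mean(recent) + k * stdev(recent)
            if value > limit:
                flagged.append(i)
        recent.append(value)  # every reading feeds the learned baseline
    return flagged

# Made-up readings: quiet weekend, then a busy Monday with one real spike (95).
cpu = [12, 14, 13, 15, 14, 50, 52, 51, 49, 53, 50, 95, 51]
flagged = dynamic_alerts(cpu)
print(flagged)
```

On this data the rolling limit raises one alert when the load first jumps to the Monday level, adapts to the new level, and then flags only the reading of 95, where a fixed weekend threshold would have fired on every Monday sample.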
Actual system performance drives threshold configuration. Real-time analysis
software can also base thresholds on the date-stamped history of
the environment. For instance, if that CPU spike occurred every Monday,
thresholds would automatically be set accordingly. Likewise, in the busy
Christmas buying season, the software would automatically adjust thresholds
each winter for servers powering credit card machines to match
the spike in server usage during that period.
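History-based thresholds of this kind can be sketched by keeping a separate baseline for each recurring period, here each day of the week. As before, the mean-plus-`k`-standard-deviations rule, the function names and the readings are assumptions made for the example. A real product would use its own, likely more sophisticated, models.

```python
from collections import defaultdict
from statistics import mean, stdev

def weekday_thresholds(history, k=3.0):
    """Map each weekday to mean + k * stdev of its past readings.
    Each weekday needs at least two readings for stdev to be defined."""
    by_day = defaultdict(list)
    for weekday, value in history:
        by_day[weekday].append(value)
    return {day: mean(vals) + k * stdev(vals) for day, vals in by_day.items()}

def is_alert(weekday, value, limits):
    """True when a reading exceeds the learned limit for that weekday."""
    return value > limits[weekday]

# Made-up history of (weekday, CPU percent) readings.
history = [
    ("Mon", 50), ("Mon", 52), ("Mon", 51), ("Mon", 49),
    ("Sat", 12), ("Sat", 14), ("Sat", 13), ("Sat", 15),
]
limits = weekday_thresholds(history)

print(is_alert("Mon", 53, limits))  # typical Monday load
print(is_alert("Sat", 53, limits))  # same load on a quiet Saturday
print(is_alert("Mon", 95, limits))  # unusual even for a Monday
```

The same reading of 53 percent is treated as normal on a Monday but as an anomaly on a Saturday, which is the behavior the Monday and holiday-season examples describe. The same grouping could be keyed by month or season instead of weekday.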
What does this mean to end users? It gives customers, especially those in
transaction-intensive businesses like financial services and e-commerce, an
opportunity to proactively manage the critical IT services that support
their businesses. IT managers can see in real time how infrastructure
performance issues affect bottom-line business performance, and make smart
decisions accordingly.
Nicola Sanna is president and CEO of Netuitive, Reston, Va.
For more information:
www.rsleads.com/609cn-252