Web Feature

 

Delivering capacity and performance

Centralized storage caching and global namespaces work together

by Gary Orenstein

Network-attached storage (NAS) environments have evolved significantly from the early deployments for limited departmental file sharing. Today, NAS has grown to support key enterprise applications such as databases, financial analytics, design automation, simulation, business intelligence and the majority of scalable Web applications.

The rapid evolution of NAS to support enterprise-wide solutions requires consolidating multiple file systems across storage devices. This approach is known as a global namespace: multiple NAS devices are linked together so that servers need to access only one file system, which may be distributed across multiple devices.

Global namespaces can help NAS users conquer capacity management, but they do not directly improve performance for storage systems. In fact, some global namespace implementations can hamper performance due to ballooning directory structures. New solutions based on centralized storage caching, however, can improve the performance of global namespaces by delivering more I/O operations per second and lower-latency responses. The result is a consolidated file system that offers greater storage capacity without a performance penalty.

The flexibility of a centralized file system enables the rapid addition of new clients. Early NAS devices, however, could only expand to a finite capacity, and additional storage requirements mandated the deployment of a new device. That device, in turn, had to have its own unique file system, requiring separate management and administration. For many IT managers, the proliferation of unique NAS devices led to an unwieldy number of file systems and a delicate balancing act for storage management. This process is akin to operating a computer with a dozen or more individual disk drives, requiring a search through every drive each time a user wants to find a file.

Understandably, larger-scale NAS solutions were held back by an “island-like” management approach. This dilemma led to the development of global namespaces, which provide an abstraction layer to aggregate multiple unique file systems into a single, coherent, shared file system. Global namespaces can be implemented through appliances within a network environment or as part of the NAS storage layer. Typically, parallel or clustered file systems use global namespaces to aggregate large amounts of storage capacity into an easily managed pool.

Performance Challenges

By eliminating the need to micromanage individual file systems, a global namespace removes previous limitations on adding new clients and NAS devices. This provides an unimpeded growth path to expand the client and storage infrastructure.

Global namespaces can simplify NAS client and server expansion.

Although global namespaces solve a capacity management issue, they do not directly improve I/O performance. Aggregating multiple NAS devices would appear to deliver such a boost, but several factors create the opposite effect.

As global namespaces grow, so does the directory information. In fact, large directories present a performance challenge in their own right. For example, finding a file now means searching through a larger file system, often referred to as “walking the directory tree,” which adds significant latency. Specifically, each level of the tree that must be traversed requires additional NFS operations.
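To see why deeper trees cost more, consider a rough sketch of path resolution. It assumes a simplified model in which the client issues one lookup per path component and has nothing cached from earlier requests. The function names, the sample path and the 0.5 ms round-trip figure are illustrative assumptions, not measurements.

```python
def lookups_for_path(path: str) -> int:
    """Count the per-component lookups needed to resolve a path from the root.

    Simplified model (an assumption): one lookup per path component,
    with no client-side caching of earlier results.
    """
    components = [part for part in path.split("/") if part]
    return len(components)


def walk_latency_ms(path: str, round_trip_ms: float) -> float:
    """Estimate the latency spent walking the tree before any data is read."""
    return lookups_for_path(path) * round_trip_ms


if __name__ == "__main__":
    # Hypothetical path inside a large consolidated namespace.
    sample = "/projects/chip/sim/run42/results.dat"
    print(lookups_for_path(sample), "lookups")
    print(walk_latency_ms(sample, round_trip_ms=0.5), "ms before the first read")
```

Under these assumptions, the five-component sample path costs five lookups, or 2.5 ms of round trips, before the first byte of file data is requested. Every additional directory level adds another round trip.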

Global namespaces also impact performance because they are primarily disk-based. While aggregating disk drives together can increase throughput (or bandwidth), this architecture cannot directly improve two other critical measures of storage system performance: I/O operations per second (IOPS) and latency.

Disks provide the greatest amount of capacity, but due to the mechanical nature of disk spindles, they are limited in the overall amount of IOPS they can deliver. Further, because each request includes head seek time and the rotation of the magnetic media, latency for disk-based requests can be significant.
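A back-of-the-envelope calculation shows how seek time and rotation cap the IOPS of a single spindle. The sketch below assumes small random requests, ignores transfer time and queuing, and takes average rotational latency as half a revolution. The 3.5 ms seek time and 15,000 rpm figures are illustrative inputs, not vendor specifications.

```python
def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def disk_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough upper bound on random IOPS for one spindle.

    Assumption: each request pays one average seek plus one average
    rotational delay; transfer time and queuing are ignored.
    """
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Illustrative inputs only.
    print(round(disk_iops(avg_seek_ms=3.5, rpm=15_000)), "IOPS per spindle")
```

With these inputs, each request takes about 5.5 ms (3.5 ms of seek plus 2.0 ms of rotation), which works out to roughly 180 IOPS per spindle. Adding spindles raises aggregate IOPS, but the latency of each individual request stays the same.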

Caching, on the other hand, makes use of memory to deliver not only throughput, but more importantly, high IOPS and ultra-low latency. For I/O-constrained applications, this combination improves application performance by significantly increasing the number of transactions and dramatically reducing processing time.
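The effect of a memory cache on latency can be approximated with a weighted average of the cache and disk response times. The sketch below assumes every request is either a cache hit or a disk read. The hit ratio and both latency figures are illustrative assumptions, not measurements of any product.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average latency when a fraction of requests is served from memory.

    hit_ratio is the fraction of requests served from cache (0.0 to 1.0);
    the remainder are assumed to go to disk.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Illustrative inputs: 90% hits, 0.1 ms from memory, 5.5 ms from disk.
    print(effective_latency_ms(hit_ratio=0.9, cache_ms=0.1, disk_ms=5.5))
```

With a 90 percent hit ratio, average latency under these assumptions falls from 5.5 ms to roughly 0.64 ms. The remaining misses still dominate the average, which is why the size of the cache relative to the active data set matters.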

The implementation of a global namespace can provide relief from capacity management headaches, but can also result in the need for performance improvements.

Caching Improves Performance

New centralized storage caching solutions directly boost I/O operations per second and reduce access time (i.e., deliver low latency), complementing the capacity-management features of global namespaces. This combination is ideal for customers who have large data sets that require simplified management and who need to access data frequently with ultra-low latency in applications such as databases, financial analytics and simulations. Centralized caching is emerging as the high-performance component of the global namespace.

Implementation of this solution involves deploying one or more scalable caching appliances that serve data from high-speed RAM, offloading requests that would otherwise go to slower, mechanical disks. With a caching solution, all data remains protected on the persistent storage, and IT managers can retain existing storage management, backup, recovery, snapshot, replication and provisioning features.

Most traffic to the application servers is delivered from the caching appliance. For data that has yet to be cached, the appliance will retrieve it from the persistent storage layer upon first request, and then continue to serve the data from cache.
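The behavior described here is commonly called a read-through cache. The following is a minimal sketch of that pattern, not a description of any particular appliance. The class name, the fetch_from_storage callback and the hit and miss counters are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve reads from memory, fetching from storage only on first request."""

    def __init__(self, fetch_from_storage: Callable[[str], bytes]) -> None:
        # fetch_from_storage stands in for the persistent storage layer.
        self._fetch = fetch_from_storage
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._cache:
            # Already cached: serve from memory without touching storage.
            self.hits += 1
            return self._cache[path]
        # First request: retrieve from persistent storage, then keep a copy.
        self.misses += 1
        data = self._fetch(path)
        self._cache[path] = data
        return data


if __name__ == "__main__":
    cache = ReadThroughCache(lambda path: ("contents of " + path).encode())
    cache.read("/vol/db/table.dat")  # miss: fetched from storage
    cache.read("/vol/db/table.dat")  # hit: served from memory
    print(cache.hits, cache.misses)
```

The sketch handles reads only, and the authoritative copy always stays on the storage layer, which is why existing backup and replication features are unaffected. A real deployment would also need to bound the cache size and keep cached data consistent with writes, both of which are left out here.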

Caching is dynamic by nature and, once installed, requires relatively little management. Multiple applications accessing different data sets can benefit from a single caching appliance because it continually makes the most recently accessed data available. If the active data set becomes larger than the existing capacity of the appliance, expansion can take place on the fly by adding another appliance.
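One common way to keep the most recently accessed data available is least-recently-used (LRU) eviction. The article does not name a specific policy, so treating it as LRU is an assumption. The sketch below uses Python's standard OrderedDict, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Drop the entry that has gone longest without being accessed.
            self._items.popitem(last=False)

    def keys(self) -> List[str]:
        """Return keys ordered from least to most recently used."""
        return list(self._items)


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", b"1")
    cache.put("b", b"2")
    cache.get("a")        # "a" becomes the most recently used entry
    cache.put("c", b"3")  # over capacity, so "b" is evicted
    print(cache.keys())
```

No administrator decides what stays in the cache; the access pattern does. When the active data set outgrows the capacity, entries start to be evicted before they are reused, which is the point at which additional cache capacity helps.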

Global namespaces provide a valuable addition to large-scale NAS deployments by streamlining capacity management. This helps improve utilization, reduces manual data movement and allows for easy expansion. Centralized storage caching delivers the performance boost on top of global namespaces for data centers that are both capacity- and performance-constrained. Together, these two technologies maximize the effectiveness and efficiency of large-scale NAS deployments.

Gary Orenstein is vice president of marketing at Gear6, Mountain View, Calif.