Part 4: Things to Consider Before Upgrading Your NAS

If you’re at all concerned with the scalability of your infrastructure when considering upgrades, you should know that adding new filers, more high speed drives and/or Flash modules to a NAS installation to improve performance is a short-term solution at best. It’s only a matter of time before application demands once again outstrip the NAS infrastructure’s ability to scale performance and you’re back to ripping out old gear and replacing it with new. In contrast, with Avere’s two stage NAS architecture, system scalability is built in. As more clients and new applications are added to the mix (requiring higher IOPS performance) an Avere FXT cluster can be easily expanded with the non-intrusive addition of new nodes. Up to 25 appliances can be added to a cluster, delivering plenty of horsepower without having to touch any other devices already in place. And because the Avere FXT cluster can serve multiple storage servers, there is no need to add Flash to each and every filer – the Avere cluster becomes an extensible fast media layer in front of all of them, serving up performance to hot spots without over provisioning.

Manageability is another hidden cost to upgrading an existing NAS infrastructure. With falling prices and improved durability making new storage media such as Flash SSDs widely available to boost application performance, many companies are tempted to install Flash at tier zero and expect that it will solve their performance problems, albeit at a relatively high cost. But installing fast-access storage media solves only part of the problem. IT then has to figure out which applications are best served by the new tier of storage, often having to become an expert in the latest storage media read and write rates and application QoS schemes in order to optimize the utilization of the more costly storage media. In comparison, an Avere FXT cluster has the intelligence to dynamically allocate data to the appropriate storage tier and media based upon both data and access characteristics, which balances the cost/performance equation with no administrative overhead.

Rebecca Thompson

Part 3: Things to Consider Before Upgrading Your NAS

With traditional NAS, controller upgrades are part of the typical lifecycle of the system. At the time of the initial purchase, the controllers are selected to deliver on the performance requirements at that time but not much more since that would mean spending more money. A year or two down the road, the controllers are out of gas and something must be done to get more performance for the data.

At this point controller upgrades are the typical course of action. Controller upgrades are so common that people have become numb to the pain. Let me remind people of the problems that come with this and propose a better way.

Controller upgrades with a traditional NAS system involve the following steps:
1. Purchase higher performance controllers (typically a model or two up the traditional NAS vendor’s product line, if a higher model exists)
2. Purchase new disk shelves or Flash-based PCI cards, required to get more performance out of the new controllers
3. Purchase new licenses for all software (e.g. NFS, CIFS, mirroring) at the higher price tier of the new controllers
4. Take the original NAS system offline
5. Remove the old controllers
6. Install the new controllers
7. Add the new disk shelves
8. Bring the upgraded system back online

The above process is expensive, disruptive, and requires more disks, power, and space even if no additional storage capacity is required. As an alternative, consider boosting the performance of your existing NAS with Avere FXT appliances.

Avere FXT appliances boost the performance of all NAS applications by accelerating read, write, and metadata operations without the need to add more disks to your NAS system. FXT appliances scale from 1 to 25 nodes per cluster to match your initial needs and gracefully scale as your needs grow. If more capacity is needed, then this can be accomplished by adding cost-effective SATA drives to your existing NAS system. FXT appliances are simple to install in existing environments and require no changes to existing applications, clients, NAS systems, or data retention procedures such as backup and mirroring. FXT appliances accelerate NAS performance and enable a 5:1 reduction in disks, power, and space when compared to traditional NAS.

With Avere you get the benefits of a clustered storage solution today without the need to migrate your data to a new storage system. For more information on FXT appliances, please visit the Avere website product page.

Jeff Tabor

Part 2: Things to Consider Before Upgrading Your NAS

Integrating Flash storage into NAS systems is a new method of improving performance that is being promoted by vendors of traditional NAS. List pricing for Flash in a NAS system from one of the leading vendors runs from $170/GB to $300/GB. This compares with roughly $2/GB (again, list pricing) for SATA storage. At these prices, customers need to be using Flash very efficiently. Sadly, vendors are not always making this possible.

Typically, Flash is added to a NAS system in one of three ways: as a PCI card, an SSD array, or a caching appliance. Let’s look at the leading vendor in each category and the efficiency of their approach.

NetApp offers Flash as a PCI card they call PAM. Up to five PAM cards with a total of 2.5TB of Flash can be installed in some NetApp controllers and pricing is in the $170-200/GB range. PAM is inefficient for two reasons. First, PAM is read-only so customers still need many hard drives to handle the write workload. Second, PAM can only help the read performance of the controller in which it is installed. This means you may be adding up to five PAM cards to every NetApp controller in your environment. This approach is expensive and highly inefficient in environments with multiple NetApp systems, since Flash is added to individual storage “silos” to handle the peak load on each silo but is underutilized much of the time.

EMC offers Flash as an SSD array. Up to 16 SSDs and 6.4TB of Flash are supported per array and pricing is in the $200-300/GB range, making a fully populated array very expensive. Beyond the price, EMC’s SSD arrays are inefficient since data movement to Flash is slow and not granular. Data movement is triggered by a policy engine that measures data activity across long periods (e.g. days) and cannot immediately respond to a hot application. Also, data is moved in large volume-level chunks, meaning lots of cold data is using expensive Flash storage alongside the hot data.

Avere offers Flash as a caching appliance that sits in front of NAS systems from other vendors. The Avere FXT 2700 has 512GB of Flash per appliance, scales to 25 appliances (13TB) per cluster, and accelerates read and write workloads. The Avere architecture is highly efficient since it provides a consolidated Flash layer that is shared by all the NAS systems. This allows customers to provision the right amount of Flash to deliver the aggregate performance across all the NAS systems. In addition, the FXT 2700 makes the most efficient use of the Flash storage by moving data in real-time and at the finest level of granularity possible: blocks within files.
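
To put the list prices above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the capacities and per-GB price ranges quoted in this post. Two things are assumptions on my part: that 1 TB is 1,000 GB, and that the quoted per-GB range applies uniformly to a fully populated configuration. The function name is purely illustrative.

```python
def flash_list_cost(capacity_gb, low_per_gb, high_per_gb):
    """Return the (low, high) list cost in dollars for a given Flash capacity."""
    return capacity_gb * low_per_gb, capacity_gb * high_per_gb

# Fully populated NetApp controller: 2.5 TB of PAM at $170-200/GB.
# Assumption: 2.5 TB is treated as 2,500 GB.
pam_low, pam_high = flash_list_cost(2500, 170, 200)

# Fully populated EMC array: 6.4 TB of SSD at $200-300/GB.
# Assumption: 6.4 TB is treated as 6,400 GB.
ssd_low, ssd_high = flash_list_cost(6400, 200, 300)

# Shared Flash in a full Avere cluster: 25 appliances x 512 GB each.
fxt_cluster_gb = 25 * 512

print(f"PAM, per controller: ${pam_low:,} to ${pam_high:,}")
print(f"SSD array:           ${ssd_low:,} to ${ssd_high:,}")
print(f"FXT 2700 cluster:    {fxt_cluster_gb:,} GB of shared Flash")
```

The per-controller and per-array figures are the ones that multiply when Flash is added silo by silo, which is the point of the comparison above.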

For more information about the FXT 2700 and other FXT models, please visit the Avere website product page.

Jeff Tabor

Part 1: Things to Consider Before Upgrading Your NAS

In today’s homogeneous-media NAS architectures, users will invariably be asked to add hard disk drives to increase storage capacity or system performance. Before doing so, users need to consider the total cost of adding hard disk drives to a NAS infrastructure. Beyond the acquisition cost of enterprise-class Fibre Channel or SAS drives, you also have to factor in the hidden costs of power and cooling, rack space, and data center floor space. Furthermore, adding hard disk drives is a costly and largely ineffective method of increasing NAS performance. HDDs are inefficient at providing IOPS, so adding expensive HDDs is not a good IT investment from a cost-performance standpoint.

The value proposition becomes even worse when companies over-provision capacity and short-stroke drives in a futile attempt to achieve higher performance. Instead of trying to extend HDD technology, NAS users will get a better return on their storage investment by moving from a homogeneous solution to a tiered NAS solution. Adding intelligent storage tiering that makes the best use of newer solid state disks produces a dramatic gain in IOPS performance, with far lower impact on power, cooling, rack space and data center floor space. This approach eliminates the expense of adding potentially hundreds of enterprise HDDs to the NAS environment, recovers the capacity on existing HDDs that was previously unavailable due to short-stroking, and solves NAS performance issues while enabling IT to leverage low-speed (and low cost) SATA drives for capacity requirements.

The two-stage tiered NAS architecture, using Avere’s FXT family of products, provides for all of these benefits by permitting the addition of solid state media to any existing NAS infrastructure.

Ron Bianchini

Tiering is Dead, Long Live Tiering

The term Dynamic Tiering has really been abused in the network storage industry lately. Everyone talks about tiering in one form or another. Vendors that do not have currently shipping products talk about futures and those vendors that have shipping products do not actually disclose how their tiering works, which applications benefit, nor by how much.

Unfortunately, one of the giants in the industry has even further muddied the waters. According to a TechTarget article last week regarding NetApp’s earnings call, NetApp’s CEO said that “the whole concept of tiered storage is going to go away.” Presumably, this refers to EMC’s FAST technology, since one industry Goliath always needs to beat up on another Goliath. The unfortunate thing for NetApp is that on the same call, they completely reversed their position on tiering when they touted the success of their form of tiering, which includes their SSD-based performance accelerator. Tiering is dead, long live tiering!

The simple truth is that no single technology has ever proven to be the panacea of data storage. SATA drives have the lowest cost per bit and are great for archival storage. FC or SAS drives offer a compromise of performance and capacity, excelling at large block accesses like those found in large sequentially read files. Solid State Devices, based on Flash, offer unsurpassed performance for random reads and small block sizes.

More importantly, the differences from one technology to another are measured by orders of magnitude, not by mere percentage improvements. Because of this, a solution that can leverage the strengths of all the technologies is guaranteed to out-perform and cost less than a solution that only uses one or two.

Rather than predicting which storage tiers will win and the capacity of those tiers in a solution, the important information needed to judge a tiering solution is how the tiers are used. Most vendors are completely silent on this. Here are three examples of the more egregious mis-steps in dynamic tiering.

The first mis-step is tiering at too large a level of granularity. Consider a solution that tiers at the volume level. If a few files in a volume become active, the entire volume will need to be promoted to a more expensive tier to get the performance needed for the few files. This results in cost inefficiency as extra data is promoted to the expensive storage and performance inefficiency as the entire volume consumes read/write bandwidth of both tiers that are involved in the promotion of the volume.
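
A small Python sketch makes the cost of coarse granularity concrete. Everything in it is hypothetical: the volume layout, the file sizes, the set of hot files, and the function name are illustrative assumptions, not measurements from any product.

```python
def promoted_gb(file_sizes_gb, hot_files, granularity):
    """Return how many GB must move to the fast tier for a set of hot files."""
    if granularity == "volume":
        # Any hot file drags the entire volume to the expensive tier.
        return sum(file_sizes_gb.values()) if hot_files else 0
    if granularity == "file":
        # Only the hot files themselves are promoted.
        return sum(file_sizes_gb[name] for name in hot_files)
    raise ValueError(f"unknown granularity: {granularity}")

# Hypothetical volume: 1,000 files of 1 GB each, 3 of which become active.
volume = {f"file{i}": 1 for i in range(1000)}
hot = ["file7", "file42", "file99"]

by_volume = promoted_gb(volume, hot, "volume")
by_file = promoted_gb(volume, hot, "file")
print(f"volume-level: {by_volume} GB promoted, file-level: {by_file} GB promoted")
```

In this made-up case the volume-level scheme moves hundreds of times more data than is actually hot, and all of that data crosses both tiers during the promotion.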

The second mis-step is not tiering frequently enough. Several vendors have proposed tiering schedules that are measured in terms of days. This is crazy. Consider the file that I am editing for this blog. I might work on this file for a few days and then rarely, if ever, look at the source file again once the blog is posted. If activity is measured across days, by the time this source file is promoted, it should be archived.

The third mis-step is not using the correct media. Most vendors actually completely avoid this question and require the administrator to set policies. In those instances where the vendor does decide, the wrong media is frequently chosen.

An example of this third mis-step is the two stage architecture promoted on NTAP’s earnings call – SSD & SATA. In their architecture the SSD-based performance accelerator is apparently only used for read data. All write data is sent to SATA storage. This is terribly inefficient and is even proven in their SPECsfs®08 posting. To achieve the same performance in a NTAP 3160 with the accelerator module, they required almost twice as many SATA disks as when they ran the same benchmark using FC disks (96 SATA disks versus 56 FC disks). Since the SATA disks have over 3x the capacity of the FC disks, they deployed over 6x as much capacity to store the same amount of data. This over-provisioning is a result of not tiering the media properly and is extremely inefficient in terms of space, power and equipment costs.
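
The over-provisioning arithmetic can be checked with a few lines of Python. The disk counts come from the posting cited above. The per-disk capacity ratios are assumptions, since only “over 3x” is stated here, and the function name is illustrative.

```python
sata_disks, fc_disks = 96, 56

# How many more spindles the SATA configuration needed.
disk_ratio = sata_disks / fc_disks

def deployed_capacity_ratio(per_disk_capacity_ratio):
    """Total SATA capacity deployed relative to FC, for a given per-disk ratio."""
    return disk_ratio * per_disk_capacity_ratio

print(f"disk count ratio: {disk_ratio:.2f}x")
# Assumed per-disk capacity ratios (SATA capacity / FC capacity).
for ratio in (3.0, 3.5, 4.0):
    total = deployed_capacity_ratio(ratio)
    print(f"per-disk {ratio}x -> deployed capacity {total:.2f}x")
```

Under these assumptions, the deployed capacity reaches 6x once each SATA disk holds about 3.5 times as much as each FC disk.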

What is clear from all of the press on “Dynamic Tiering” is that the term is both extremely overused and misunderstood. Because of the orders of magnitude differences in storage media costs and performance, data storage solutions can clearly benefit from tiering if executed properly.

Ron Bianchini

Adding SSD Performance to NAS-Based Applications

Today we announced the FXT 2700, which is our first product based on SLC Flash technology. In the time that customers have been evaluating this product, we have talked to many organizations interested in testing the benefits of SSD technology. The good news is that for those looking to experiment with the performance benefits of SSD, the FXT 2700 turns out to be an easy and cost effective way to add SSD performance to any NAS-based application.

Using Avere’s 2-stage implementation, the FXT 2700 can be deployed in any vendor’s environment. Deployment and removal require no downtime. You are not limited to the offerings of a single vendor. You are not limited to expensive internal hardware adapters, nor are you limited to entire arrays of SSD media that must be allocated to specific volumes.

The FXT 2700 allocates to the SSD media only the portion of the namespace that needs the added performance. The SSD media is shared across the entire namespace. It can be deployed in any environment with an existing NFSv3-based NAS server. And the SSD media can be scaled by clustering FXT 2700s, which globally share the media across all nodes and all application workloads.

Avere built the FXT 2700 specifically to support two types of workloads: very high performance applications and applications with high levels of random-access data.

For high performance applications, our typical customer is already running a NAS server, usually with costly Fibre Channel media by necessity. Introducing an FXT 2700 cluster in that environment dramatically reduces the workload on the NAS server. In some of these deployments, short stroking is used to eke out even more performance. By reducing the workload on the NAS server, the requirement for short stroking is eliminated, which effectively frees up previously unusable storage space on already spinning media.

For applications with high levels of random-access data, hard drive based NAS servers can be inefficient. Hard drives are excellent at reading data from their platters once the read head is located at or near the proper track. If the read head is not near the proper track, which is usually the case for random workloads, the head seek time reduces the performance of a hard drive from thousands to hundreds of operations per second. Adding the solid state tier of the FXT 2700 in such systems not only improves application performance, but it also allows the hard drives to be used for bulk updates, which increases their efficiency as well.

Through the FXT 2700 launch and customer evaluations, I continue to be fascinated by the many applications for which users have chosen to deploy its SSD technology, even outside of our expected workloads. This breadth of applications and workloads can be attributed to the FXT 2700’s ease of deployment, made possible by Avere’s vendor neutral 2-stage architecture, and to its cost effectiveness, in that the SSD tier is automatically used only where needed.

Ron Bianchini

Avoiding Long Lines at Christmas (and at Your File Server)

The holidays are upon us and traffic is backing up everywhere. I don’t do much of the Christmas shopping at my house so typically I’m not really impacted. This year is different. The Avere office is near one of the largest malls in Pittsburgh and there are lines everywhere I go. Traffic lights are backed up. The parking lot at my favorite burrito shop is packed. Even the drugstore is full of holiday shoppers.

Waiting sucks.

(Beware of segue.)

And waiting is exactly what you’re doing if you’re using a file server with high latency. Check out Jeff Butler’s blog post on this very point from last week.

Latency can kill your company’s productivity and, more importantly, profitability. Amazon found that every 100ms of latency cost them 1% in sales. Google saw a 20% drop in search usage from 500ms additional latency. TABB Group estimates that if a broker’s electronic trading platform is 5 milliseconds behind the competition, it could lose at least 1% of its flow; that’s $4 million in revenues per millisecond.

In a previous blog post, I explained how Avere is providing a high-performance NFS file server that uses, on average, 5x fewer disks and 5x less power and space than competitive offerings. Avere demonstrated this using SPECsfs2008 results that can be found on the spec.org site. This time let me use these same results to demonstrate our latency advantage.

Like before, let’s compare all the results that achieved greater than 100,000 ops/sec. (Note: SPECsfs2008 reports latency as ORT, or overall response time, in msec.)

Avere FXT 2500 6 Node Cluster: 131,591 ops/sec, 1.38msec ORT
BlueArc Mercury 100 Cluster: 146,076 ops/sec, 3.34msec ORT
Exanet ExaStore 8 Node Cluster: 119,550 ops/sec, 2.07msec ORT
HP BL860c 4 Node Cluster: 134,689 ops/sec, 2.53msec ORT
Huawei Symantec N8500 Clustered NAS: 176,728 ops/sec, 1.67msec ORT
NetApp FAS6080 FCAL Disks: 120,011 ops/sec, 1.95msec ORT

Avere is the clear winner when it comes to latency (i.e. ORT). The average of all the non-Avere latencies above is 2.31msec. This is 67% higher than Avere’s latency of 1.38msec. This difference can have a huge impact on application performance and the productivity and profitability of your business. For example, this can mean completing a large data processing job in an hour on Avere rather than an hour and forty minutes on average with the other solutions. Add this up across all the people and jobs in your company and this can mean getting your product to market months earlier with Avere.
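
For anyone who wants to check the arithmetic, here is a short Python sketch that recomputes the comparison from the ORT values listed above. The job-time line rests on one assumption that is mine: the job is entirely latency-bound, so its run time scales directly with ORT. Real jobs will only be partly latency-bound.

```python
# ORT values in msec, copied from the list above.
ort_msec = {
    "Avere FXT 2500 6 Node Cluster": 1.38,
    "BlueArc Mercury 100 Cluster": 3.34,
    "Exanet ExaStore 8 Node Cluster": 2.07,
    "HP BL860c 4 Node Cluster": 2.53,
    "Huawei Symantec N8500 Clustered NAS": 1.67,
    "NetApp FAS6080 FCAL Disks": 1.95,
}

avere = ort_msec["Avere FXT 2500 6 Node Cluster"]
others = [ort for name, ort in ort_msec.items() if not name.startswith("Avere")]

avg_other = sum(others) / len(others)
pct_higher = (avg_other / avere - 1) * 100

# Assumption: a fully latency-bound job that takes 60 minutes on Avere.
job_minutes_other = 60 * avg_other / avere

print(f"average non-Avere ORT: {avg_other:.2f} msec")
print(f"higher than Avere by:  {pct_higher:.1f}%")
print(f"60-minute job becomes: {job_minutes_other:.0f} minutes")
```

Depending on where the rounding is applied, the percentage lands between 67% and 68%, and the one-hour job comes out at roughly an hour and forty minutes.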

Don’t wait. Check out the product pages on our website for more information.

Now if we could only get some elves to help with our end-of-year shipments.

SPEC® and the benchmark name SPECsfs®2008® are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Dec 21, 2009. Above we compare all SPECsfs2008_nfs.v3 results that achieved greater than 100k ops/sec throughput. For the latest SPECsfs2008 benchmark results, visit http://www.spec.org/sfs2008.

Jeff Tabor

Record-Setting Latency and Why it Matters

Low latency is near and dear to our hearts at Avere Systems. In my last blog post, I talked about our record-breaking SPECsfs2008 performance results in general. In this post, I’m only going to talk about latency. Our result was not only one of the highest throughput results; at 1.38 ms of overall response time (ORT), it is also one of the lowest latency results ever posted for SPECsfs2008.

Latency is much harder to get your mind around than throughput. Throughput is easy to understand — whoever has a higher number wins. Sometimes, even more importantly, whoever has better price performance ($/operations/second) wins. But there’s a very important reason that the SPEC benchmark disclosure rules state that you must disclose both a throughput number and an overall response time. Latency numbers are meaningless without a corresponding throughput number. Likewise, throughput numbers are meaningless without the attached latency.

So why all the fuss about latency?

Latency can determine application performance and user experience just as much as raw throughput. Let’s start with a simple example. There are two ways to build a system that attains 1000 operations / second.

The first would involve just processing requests serially at a rate of 1 operation / 1 ms (if these were SPECsfs2008 operations, the result would be quoted at 1000 ops/second @ 1 ms). The second approach would involve processing two operations in parallel at a rate of 1 operation / 2 ms (1000 ops/second @ 2 ms). This example is a vastly simplified version of what happens in real life with any system that involves parallel processing (from CPU architecture to storage systems).

Both approaches have advantages.

The first approach has a lower latency. Additionally, this approach only requires one request stream to get performance out of the system. Imagine one client just making requests for data one after another in a serialized stream.

The second approach may be lower cost because getting a single request through the system in 1 ms may be much more expensive than getting two operations in parallel through in 2 ms (maybe faster CPUs are required, more memory, faster disks). Furthermore, it may be easier to take the parallel system and make four operations run in parallel with 2 ms latency. The big drawback here is that two clients need to be sending requests in parallel. Not only do two clients need to be sending requests in parallel, but the two clients need to be active all of the time. Moreover, if you have two users active on the system, their experience with a 2 ms system is approximately 1/2 as good.
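
The two approaches can be compared with a tiny Python model. It is a deliberately simplified sketch that assumes every operation takes exactly the stated latency and that each request stream issues its next request as soon as the previous one completes. The function name is illustrative.

```python
def throughput_ops_per_sec(parallel_ops, latency_ms):
    """Steady-state throughput when parallel_ops requests are always in flight."""
    return parallel_ops * 1000.0 / latency_ms

serial_design = throughput_ops_per_sec(1, 1.0)    # one stream, 1 ms per op
parallel_design = throughput_ops_per_sec(2, 2.0)  # two streams, 2 ms per op

# What a single client sees on the 2 ms system if the second client goes idle.
lone_client = throughput_ops_per_sec(1, 2.0)

# Extending the parallel design to four streams at the same latency.
four_wide = throughput_ops_per_sec(4, 2.0)

print(serial_design, parallel_design, lone_client, four_wide)
```

Both designs reach 1000 ops/second in aggregate, but a lone client on the 2 ms system gets only half of that, which is the drawback described above.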

2 ms may not seem like a lot of time, but it can add up for a complex NFS operation such as READDIRPLUS. A READDIRPLUS operation involves fetching directory entries and their attributes. Let’s say the directory has 10 entries and each entry takes 2 ms to fetch its attributes: the user may be waiting 20 ms before the response to a simple directory listing (e.g. ‘ls -l’) returns. Every complex system has some mixture of serial and parallel operations. The resulting performance measured by SPEC determines on average how long a single operation requires to flow through the system.

To simplify matters a bit, SFS results are reported with an overall response time (ORT) metric. This number allows us to summarize the latency of a given system. This metric is actually derived by calculating the area under the throughput vs. average response time curve (see below, this graph is also part of every detailed SPECsfs2008 result). For Avere’s 131,591 Ops/Sec @ 1.38 ms ORT result, the full curve shows the details of average response time at different load points:

http://averesystems.files.wordpress.com/2009/12/latencygraph.png?w=600&h=290
Figure 1. Average response time at each throughput load point for Avere’s 131,591 Ops/Sec @ 1.38 ms ORT result.
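
To show how a whole curve can be collapsed into one number, here is a Python sketch of an area-under-the-curve summary. It is not the official SPEC calculation: the normalization (dividing the area by the throughput range covered) is an assumption chosen so the result reads as an average response time, and the load points are made up for illustration. The SPECsfs2008 run and reporting rules define the exact procedure.

```python
def area_based_response_time(points):
    """Summarize a curve of (throughput ops/sec, avg response ms) load points.

    Integrates response time over throughput with the trapezoid rule, then
    divides by the throughput range (an assumed normalization).
    """
    points = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (points[-1][0] - points[0][0])

# Hypothetical load points, not taken from any published result.
curve = [(10_000, 0.5), (50_000, 0.8), (100_000, 1.5), (130_000, 4.5)]
summary_ms = area_based_response_time(curve)
print(f"summary response time: {summary_ms:.2f} ms")
```

Because the area is weighted by throughput, a system that stays fast across most of its load range gets a low summary value even if response time climbs sharply at the very last load point.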

For the first point on the graph, we executed 1 operation / 0.5 ms. This meant that a single stream could do 2000 operations / second. Roughly 6.5 request streams running in parallel would be required to achieve the result at this load point. This was a 6-node system, so this also meant that there was on average only about 1 operation executing at the storage servers at a time. At the other end of the spectrum, we executed one operation in 4.6 ms, which means that a single stream would execute 217 operations per second. To achieve 131,591 operations per second, roughly 606 parallel request streams would be required. Notice that our tests were run with 18 load generators each running 48 processes for a total of 864 parallel streams.
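
The stream counts above follow from a simple relationship: streams in flight equal throughput multiplied by latency. Here is a minimal Python sketch of that calculation. The 13,000 ops/sec figure for the first load point is not stated in this post; it is inferred from 6.5 streams at 2000 operations / second each. The function name is illustrative.

```python
def streams_required(throughput_ops_per_sec, latency_ms):
    """Parallel request streams needed to sustain a throughput at a latency."""
    return throughput_ops_per_sec * latency_ms / 1000.0

# First load point: inferred 13,000 ops/sec at 0.5 ms per operation.
first_point = streams_required(13_000, 0.5)

# Peak load point: 131,591 ops/sec at 4.6 ms per operation.
peak_point = streams_required(131_591, 4.6)

# Streams the test harness could offer: 18 load generators x 48 processes.
available = 18 * 48

print(f"first load point: {first_point:.1f} streams")
print(f"peak load point:  {peak_point:.0f} streams (of {available} available)")
```

The peak figure comes out a stream or so below 606 because this sketch does not round the per-stream rate to 217 operations per second first. Either way, the harness had more streams available than the peak load point required.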

Latency can dramatically affect application performance. Nothing annoys software developers (myself included) more than slow builds. A software build is generally a stream of serial requests to storage. Even with parallel build processes, there’s a limited amount of parallelism that can be exposed to the storage system. Fortunately, at Avere we have a high-throughput, low-latency storage product that can make our compile jobs (and parallel builds) go fast! This application example is a bit self-serving (but it does provide us with lots of motivation to make the product go fast). In future posts, we’ll talk more about the business impact of low-latency on different application workloads.

The bottom line

When evaluating storage systems for an application, you must consider latency as a decision-making criterion as important as price, capacity and throughput. First, make sure all systems under consideration are meeting your requirements for price, capacity, and throughput, and then compare the latency. When comparing latency numbers, calculate a percentage difference between the numbers. Latency numbers are small and the differences between them are small, but it is the cumulative effect of latency that impacts the performance of your application and this is best measured by the percentage difference.

As you can imagine, comparing latency numbers between two systems is sometimes challenging, but it can easily be used as a tiebreaker if throughput or prices are similar.

SPEC® and the benchmark name SPECsfs®2008 are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Dec. 13, 2009.  For the latest SPECsfs2008 benchmark results, visit http://www.spec.org/sfs2008.

Jeff Butler

ESG Research & Avere on Tiered Storage

Tiered storage is in the news again this week with EMC’s announcement on the availability of FAST.

In prior posts, I’ve shared with you the questions you should ask of any vendor that claims it can support dynamic or automated tiering. More recently, I had a conversation with Terri McClure of ESG Research about the technological challenges that have limited tiering adoption to date, such as a lack of standards, limited storage system functionality, and the pain of data migration between tiers, and about how Avere Systems addresses each of these challenges with its FXT Series. We recorded our conversation, and it is now available as a podcast:

ESG Avere Tiered Storage Podcast

Rebecca Thompson

Scalability of Dynamic Storage Tiering: The Missing Link

Less than a month ago I wrote a post titled “Two-Stage NAS Storage: Data Delivery and Data Management”. In that post, I wrote that a dynamically tiered NAS architecture only works if the implementation allows the relative capacities of the tiered media to be scaled independently. I reviewed the Avere Systems FXT product series, which uses a dynamically tiered architecture based on a two-stage implementation that allows independent scaling of the performance and capacity tiers.

To illustrate independent performance scaling, Avere posted SPECsfs2008_nfs performance numbers for one-node, two-node and six-node FXT clusters and showed linear scaling of the benchmark results.

Since my original post, there has been much press about dynamic tiering. In every case, the vendors claiming dynamic tiering seem to have missed the point. What they are missing is the ability to scale performance tiers independently from capacity tiers. Consider one vendor, NetApp, and their recent SPECsfs2008 postings as an example. In the past few months, NetApp has posted three more server configurations to SPECsfs2008, bringing their total postings to seven. Each of the seven configurations is unique and there is no way to scale non-disruptively from one configuration to another.

For NetApp, performance scaling requires the addition of PAM cards (DRAM or SSD modules designed to accelerate reads), until the chassis runs out of PAM slots or hits a CPU bottleneck, and then the wholesale replacement of the server. The only way to scale non-disruptively is to add disk drives, which scales both performance and capacity at the same time. This results in huge inefficiencies, unless the target application requires performance and capacity in exactly the ratio provided by the added disk drives.

In the Avere FXT postings, one common architecture is represented in all postings: one-node, two-node and six-node FXT clusters to achieve the performance target and a single Mass server to achieve the required capacity. Performance scales linearly among the configurations by adding FXT nodes – nodes are added and join the cluster automatically and on-line. Capacity is scaled among the configurations by adding high-density disk drives to the Mass. This high level of configurability is where Avere’s two-stage implementation of dynamic tiering really excels. It allows the deployment to scale performance and capacity independently to match the exact requirements of the application.

To illustrate the efficiency of the Avere two-stage implementation consider the performance level achieved on the SPECsfs2008 benchmark divided by the number of disk drives used in the configuration, or Ops/disk. This is an excellent metric of the efficiency of a solution as it indicates the amount of equipment needed to achieve a given performance and capacity level.

This also happens to be the subject of a recent blog post by NetApp. What NetApp’s blog fails to indicate is that the Avere SPECsfs2008 postings achieve the highest number of Ops/drive of any vendor (see Figure 1).

http://averesystems.files.wordpress.com/2009/12/opsdrive3.png?w=500&h=391
Figure 1. Ops/drive for SPECsfs2008 postings.

For a given performance level and capacity requirement, Avere’s solution uses the least amount of equipment, while the application sees the exact same performance and capacity provided by the NAS server. Avere achieves this by using the highest density disk drives for capacity and faster media tiers to deliver performance which allows the system to scale either attribute independently.

Ron Bianchini