Low latency is near and dear to our hearts at Avere Systems. In my last blog post, I talked about our record-breaking SPECsfs2008 performance results in general. In this post, I’m only going to talk about latency. Our result was not only one of the highest-throughput results; at 1.38 ms of overall response time (ORT), it is also one of the lowest-latency results ever posted for SPECsfs2008.
Latency is much harder to get your mind around than throughput. Throughput is easy to understand — whoever has a higher number wins. Sometimes, even more importantly, whoever has better price performance ($/operations/second) wins. But there’s a very important reason that the SPEC benchmark disclosure rules state that you must disclose both a throughput number and an overall response time. Latency numbers are meaningless without a corresponding throughput number. Likewise, throughput numbers are meaningless without the attached latency.
So why all the fuss about latency?
Latency can determine application performance and user experience just as much as raw throughput. Let’s start with a simple example. There are two ways to build a system that attains 1000 operations / second.
The first would involve just processing requests serially at a rate of 1 operation / 1 ms (if these were SPECsfs2008 operations, the result would be quoted at 1000 ops/second @ 1 ms). The second approach would involve processing two operations in parallel at a rate of 1 operation / 2 ms (1000 ops/second @ 2 ms). This example is a vastly simplified version of what happens in real life with any system that involves parallel processing (from CPU architecture to storage systems).
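The arithmetic behind both designs is the same relationship, often stated as Little's Law: throughput equals the number of operations in flight divided by the time each one takes. A minimal Python sketch of that relationship follows; the function name is ours, chosen only for illustration.

```python
def throughput_ops_per_sec(in_flight: int, latency_ms: float) -> float:
    """Steady-state throughput for `in_flight` concurrent operations,
    each taking `latency_ms` milliseconds to complete."""
    return in_flight * 1000.0 / latency_ms


# Design 1: one operation at a time, 1 ms each.
serial = throughput_ops_per_sec(in_flight=1, latency_ms=1.0)

# Design 2: two operations at a time, 2 ms each.
parallel = throughput_ops_per_sec(in_flight=2, latency_ms=2.0)

print(serial, parallel)  # both designs reach 1000 ops/second
```

The two systems are indistinguishable by throughput alone, which is why the latency figure has to be quoted alongside it.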
Both approaches have advantages.
The first approach has a lower latency. Additionally, this approach only requires one request stream to get performance out of the system. Imagine one client just making requests for data one after another in a serialized stream.
The second approach may be lower cost because getting a single request through the system in 1 ms may be much more expensive than getting two operations in parallel through in 2 ms (maybe faster CPUs are required, more memory, faster disks). Furthermore, it may be easier to take the parallel system and make four operations run in parallel with 2 ms latency. The big drawback here is that two clients need to be sending requests in parallel, and both need to be active all of the time, for the system to reach its rated throughput. Moreover, each of those two users waits 2 ms per operation instead of 1 ms, so their experience on the 2 ms system is approximately half as good.
2 ms may not seem like a lot of time, but it may add up for a complex NFS operation such as READDIRPLUS. A READDIRPLUS operation involves fetching directory entries and their attributes. Let’s say the directory has 10 entries and the attributes for each entry take 2 ms to fetch. If those fetches happen one after another, the user may be waiting 20 ms before the response to a simple directory listing (e.g. ‘ls -l’) returns. Every complex system has some mixture of serial and parallel operations. The response time measured by SPEC reflects how long, on average, a single operation takes to flow through the system.
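To make the accumulation concrete, here is a small Python sketch of the directory-listing example. It assumes, as the example does, that the attribute fetches are strictly serial; the function name is hypothetical and a real server may overlap some of this work.

```python
def serial_wait_ms(entries: int, per_entry_ms: float) -> float:
    """Total wait when each entry is fetched one after another."""
    return entries * per_entry_ms


# The 10-entry directory from the example, on the 1 ms and 2 ms systems.
wait_on_1ms_system = serial_wait_ms(entries=10, per_entry_ms=1.0)
wait_on_2ms_system = serial_wait_ms(entries=10, per_entry_ms=2.0)

print(wait_on_1ms_system, wait_on_2ms_system)  # 10.0 ms versus 20.0 ms
```

The per-operation gap is only 1 ms, but the user-visible gap grows in proportion to the number of serial steps.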
To simplify matters a bit, SFS results are reported with an overall response time (ORT) metric. This number allows us to summarize the latency of a given system. The metric is derived by calculating the area under the throughput vs. average response time curve and dividing it by the peak throughput (see below; this graph is also part of every detailed SPECsfs2008 result). For Avere’s 131,591 Ops/Sec @ 1.38 ms ORT result, the full curve shows the details of average response time at different load points:

For the first point on the graph, we executed one operation in 0.5 ms, which means that a single stream could do 2000 operations per second. Roughly 6.5 request streams running in parallel would be required to achieve the throughput at this load point. This was a 6-node system, so on average only about one operation was executing on each node at a time. At the other end of the spectrum, we executed one operation in 4.6 ms, which means that a single stream would execute 217 operations per second. To achieve 131,591 operations per second, roughly 606 parallel request streams would be required. Note that our tests were run with 18 load generators each running 48 processes, for a total of 864 parallel streams.
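The stream counts above can be reproduced with the same relationship run in reverse: required concurrency equals target throughput multiplied by latency. The Python sketch below uses only figures quoted in this post; the helper name is ours. It yields about 605 rather than 606 because it does not round the per-stream rate to 217 first.

```python
def streams_needed(target_ops_per_sec: float, latency_ms: float) -> float:
    """Parallel request streams needed to sustain a target throughput
    when each stream issues one operation at a time."""
    per_stream_ops_per_sec = 1000.0 / latency_ms
    return target_ops_per_sec / per_stream_ops_per_sec


# Peak load point: 131,591 ops/second at 4.6 ms per operation.
peak_streams = streams_needed(131_591, 4.6)

# Streams the test harness offered: 18 load generators x 48 processes.
available_streams = 18 * 48

print(round(peak_streams, 1), available_streams)
```

The load generators offered more streams than the peak required, so the client side had room to keep the system busy.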
Latency can dramatically affect application performance. Nothing annoys software developers (myself included) more than slow builds. A software build is generally a stream of serial requests to storage. Even with parallel build processes, there’s a limited amount of parallelism that can be exposed to the storage system. Fortunately, at Avere we have a high-throughput, low-latency storage product that makes our compile jobs, including parallel builds, go fast! This application example is a bit self-serving (but it does provide us with lots of motivation to make the product go fast). In future posts, we’ll talk more about the business impact of low latency on different application workloads.
The bottom line
When evaluating storage systems for an application, you must consider latency as a decision-making criterion as important as price, capacity and throughput. First, make sure all systems under consideration are meeting your requirements for price, capacity, and throughput, and then compare the latency. When comparing latency numbers, calculate a percentage difference between the numbers. Latency numbers are small and the differences between them are small, but it is the cumulative effect of latency that impacts the performance of your application, and this is best measured by the percentage difference.
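As a rough illustration, the percentage difference can be computed as below. This is one reasonable definition (difference relative to a chosen baseline), not a formula prescribed by SPEC, and the function name is hypothetical. The inputs are the 1 ms and 2 ms systems from the earlier example.

```python
def latency_pct_difference(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage by which the candidate's latency differs from the
    baseline's; positive means the candidate is slower."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# The 2 ms system measured against the 1 ms system, and the reverse.
slower = latency_pct_difference(baseline_ms=1.0, candidate_ms=2.0)
faster = latency_pct_difference(baseline_ms=2.0, candidate_ms=1.0)

print(slower, faster)  # 100.0 and -50.0
```

A 1 ms absolute gap looks negligible, yet it means every serial operation takes twice as long on the slower system.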
As you can imagine, comparing latency numbers between two systems is sometimes challenging, but latency can easily serve as a tiebreaker if throughput or prices are similar.
SPEC® and the benchmark name SPECsfs®2008 are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Dec. 13, 2009. For the latest SPECsfs2008 benchmark results, visit http://www.spec.org/sfs2008.
