Latency is the time between starting an operation and the arrival of the first data. Latency has become increasingly important in computers in recent years as techniques have come into use that allow data to be streamed at very high rates once a transfer starts. When, as is often the case, isolated items of data are accessed, performance may be determined far more by the latency of reaching the data than by the high streaming rates that would apply to additional data moved from or to the same place.

There has been great emphasis on the dazzling data rates of streamed data. Those rates certainly matter in situations such as moving graphical data to a video card. For many applications, however, performance is governed by the latency of moving data, and in many cases such as DRAM access, latency is improving much more slowly than streaming speed. In the 20 years after the introduction of the IBM PC, streaming bandwidth to and from memory improved by a factor of more than 500 while latency improved by only a factor of 3 or 4. The rather modest improvement in latency has been obscured by vendors changing their characterization of DRAM performance from latency-based numbers in the 1980s and early 1990s to bandwidth-based numbers in the late 1990s.
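The trade-off above can be sketched with a back-of-envelope model: the time to move a block is a fixed latency plus the block size divided by the streaming bandwidth. The figures used below (60 ns latency, 10 GB/s bandwidth, 64-byte line) are illustrative assumptions, not measurements of any particular memory system.

```python
def transfer_time_ns(block_bytes, latency_ns, bandwidth_gb_s):
    """Time to move one block: fixed access latency plus streaming time.
    1 GB/s equals 1 byte per nanosecond, so the division below is in ns."""
    streaming_ns = block_bytes / bandwidth_gb_s
    return latency_ns + streaming_ns

# A single 64-byte cache line under the assumed figures: latency dominates.
small = transfer_time_ns(64, 60.0, 10.0)
print(small)  # -> 66.4 (60 ns of latency, only 6.4 ns of streaming)

# A 1 MB streamed transfer under the same figures: streaming dominates.
big = transfer_time_ns(1 << 20, 60.0, 10.0)
print(big)    # -> 104917.6 (latency is a negligible 60 ns of the total)
```

Note that for the isolated-access case the 500-fold bandwidth improvements of the streaming era barely move the total, while halving latency nearly halves it.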

Caching, prefetching, and parallelism in system design all act to reduce the impact of latency in many situations. They also make it very difficult to analyze exactly what that impact is.
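One access pattern those techniques cannot hide is a dependency chain, where each address comes from the previous load, so accesses serialize and latency sets the pace. The classic illustration is a pointer chase through a random permutation; the sketch below (a conceptual illustration, not a memory benchmark) uses Sattolo's algorithm, which guarantees the permutation forms one big cycle.

```python
import random

def sattolo_cycle(n, seed=0):
    """Build a random single-cycle permutation (Sattolo's algorithm).
    Chasing it visits every slot exactly once before returning to the start."""
    a = list(range(n))
    rng = random.Random(seed)
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)       # j < i guarantees a single cycle
        a[i], a[j] = a[j], a[i]
    return a

def chase(a, steps):
    """Dependent loads: each index is the result of the previous load, so a
    prefetcher cannot predict the next address and the accesses cannot be
    overlapped -- this is the latency-bound pattern, not the streaming one."""
    i = 0
    for _ in range(steps):
        i = a[i]
    return i

perm = sattolo_cycle(4096)
# After exactly len(perm) dependent steps the chase returns to slot 0.
print(chase(perm, len(perm)))  # -> 0
```

Sequential streaming over the same array would touch identical data but let the hardware overlap and prefetch the accesses, which is why the two patterns can differ enormously in speed despite moving the same bytes.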

Copyright 1994-2002 by Donald Kenney.