Sunday, April 23, 2006

Under the covers of a memory access

When someone writes a computer program, they use many statements that look something like A=B+1. This simple operation retrieves a piece of data from memory location B, adds one to it, and then stores the result in memory location A. Statements of this form are incredibly common; they probably make up the majority of the operations performed by programs. With something so simple and common, you would think that it would be a trivial matter for the computer to carry it out. This is not the case at all. In this posting, I will give a very high-level introduction to some of the steps involved in just retrieving a single piece of data from memory. I have made a number of simplifications (such as placing the memory address translation before any of the caches), but hopefully it is detailed enough to give people a taste of the complexities involved.
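
To make this concrete, here is roughly what that statement looks like in C (the variable names are just the ones from the example); the comments spell out the load, add, and store steps the processor performs for it.

    #include <stdio.h>

    int main(void)
    {
        int B = 41;
        int A;

        /* One source line, but (conceptually) three steps for the processor:
         *   1. load the value stored at B's memory location
         *   2. add one to it
         *   3. store the result at A's memory location
         * The load and the store are the memory accesses discussed below. */
        A = B + 1;

        printf("A = %d\n", A);   /* prints A = 42 */
        return 0;
    }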

The first step is to translate the memory address used by the program into a different address, called a physical address. Most modern computers use a special piece of hardware called a memory-management unit (MMU) to allow each process on a system to pretend that it is the only thing running. It does this by having the process use a large set of virtual ("fake") addresses which are then translated by the MMU into physical ("real") addresses on demand. Of course, this translation can be slow; the computer must keep a set of translation tables for each process that map virtual addresses to physical ones, and it must consult these tables for every single memory access. The tables are themselves stored in the computer's memory, and thus each memory access potentially turns into TWO memory accesses. Fortunately, a high-speed caching mechanism called the translation lookaside buffer (TLB) is used to speed up translation table lookups, which makes the virtual-to-physical translation process fast enough to be practical.
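
As a rough illustration of the translation step, here is a toy version in C, assuming a single-level page table, 4 KB pages, and a tiny direct-mapped TLB; every name and size here is invented, and a real MMU does all of this in hardware with multi-level tables.

    #include <stdio.h>
    #include <stdint.h>

    /* A toy, single-level translation scheme with 4 KB pages and a tiny
     * direct-mapped "TLB".  Real MMUs do this in hardware with multi-level
     * tables; every name and size here is invented for illustration. */

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define NUM_PAGES  16
    #define TLB_SIZE   4

    static uint64_t page_table[NUM_PAGES];                    /* virtual page -> physical frame */
    static struct { uint64_t vpage, pframe; int valid; } tlb[TLB_SIZE];

    static uint64_t translate(uint64_t vaddr)
    {
        uint64_t vpage  = vaddr >> PAGE_SHIFT;                /* assumes vpage < NUM_PAGES */
        uint64_t offset = vaddr & (PAGE_SIZE - 1);
        int slot = (int)(vpage % TLB_SIZE);

        if (!(tlb[slot].valid && tlb[slot].vpage == vpage)) { /* TLB miss: */
            tlb[slot].vpage  = vpage;
            tlb[slot].pframe = page_table[vpage];             /* the extra access to the table in memory */
            tlb[slot].valid  = 1;
        }
        return (tlb[slot].pframe << PAGE_SHIFT) | offset;     /* the physical address */
    }

    int main(void)
    {
        for (uint64_t i = 0; i < NUM_PAGES; i++)
            page_table[i] = 100 + i;                          /* made-up physical frame numbers */

        printf("virtual 0x%llx -> physical 0x%llx\n",
               0x2345ULL, (unsigned long long)translate(0x2345));
        return 0;
    }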

Once the computer has the physical address of the desired item, it proceeds to look for it in a set of high-speed caches. Although the RAM in a computer is incredibly fast when compared to something like a hard drive, it is actually somewhat slow when compared to the speed of a CPU. Thus, memory caches sit between the processor and memory; they are small but fast. Whenever a piece of data is retrieved from RAM, it is subsequently placed into one or more of these caches so that future references to it are much quicker. A lot of effort has been put into making these various caches as fast and efficient as possible, as they can have a dramatic effect on the performance of a computer. In many modern computers, there are two levels of caches: L1, which is very small but also very fast, and L2, which is somewhat larger but also somewhat slower. Some processors even have a third cache (L3) that is larger still. Each of these caches must be checked in turn to see if one of them contains the requested data.
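
The order of the lookups can be sketched with a toy software model like the one below; real caches are hardware, hold whole cache lines, and are far larger, so the names and sizes here are purely illustrative.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    /* Toy model of the lookup order only: three tiny direct-mapped "caches"
     * in front of a "RAM" array.  Real caches are hardware, hold whole lines,
     * and are much bigger; the names and sizes are invented for illustration. */

    #define RAM_WORDS 1024

    typedef struct { int entries; uint64_t tag[512]; uint64_t val[512]; bool ok[512]; } cache;

    static uint64_t ram[RAM_WORDS];
    static cache l1 = { .entries = 8 }, l2 = { .entries = 64 }, l3 = { .entries = 512 };
    static cache *levels[] = { &l1, &l2, &l3 };

    static uint64_t load(uint64_t addr)
    {
        /* Check L1, then L2, then L3. */
        for (int i = 0; i < 3; i++) {
            int slot = (int)(addr % levels[i]->entries);
            if (levels[i]->ok[slot] && levels[i]->tag[slot] == addr) {
                printf("hit in L%d\n", i + 1);
                return levels[i]->val[slot];
            }
        }

        /* Missed everywhere: go to RAM, filling the caches on the way back. */
        printf("miss, going to main memory\n");
        uint64_t data = ram[addr];
        for (int i = 0; i < 3; i++) {
            int slot = (int)(addr % levels[i]->entries);
            levels[i]->tag[slot] = addr;
            levels[i]->val[slot] = data;
            levels[i]->ok[slot]  = true;
        }
        return data;
    }

    int main(void)
    {
        ram[42] = 7;
        load(42);    /* first access misses and goes to RAM */
        load(42);    /* second access now hits in L1 */
        return 0;
    }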

If the requested data is not in any of the caches, then the computer is forced to actually retrieve it from main memory. This is a somewhat lengthy process and is one of the big obstacles to speeding up computers; in the time that it takes to retrieve a piece of data from memory, a processor could execute hundreds or even thousands of instructions. Unfortunately, the processor must often simply sit and twiddle its thumbs while the memory system services the request.
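
One rough way to see this cost for yourself is to chase pointers through an array that is far too large for any of the caches, so that nearly every load has to go all the way to main memory. The exact numbers depend entirely on the machine, and this little benchmark is only a sketch, but on typical hardware the per-load time is dramatically worse than for data that fits in a cache.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const size_t n = (size_t)1 << 24;        /* 16M entries -- far larger than any cache */
        size_t *next = malloc(n * sizeof *next);
        if (next == NULL) return 1;

        /* Sattolo's algorithm: build one big random cycle so that each
         * load jumps somewhere the caches cannot predict. */
        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (((size_t)rand() << 15) | (size_t)rand()) % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        clock_t start = clock();
        size_t pos = 0;
        for (size_t i = 0; i < n; i++)
            pos = next[pos];                     /* each load depends on the previous one */
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("%zu dependent loads in %.2f s (about %.0f ns each, ended at %zu)\n",
               n, secs, secs * 1e9 / (double)n, pos);
        free(next);
        return 0;
    }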

There is another complication, though. Sometimes the operating system will tell a little white lie to the running processes by letting them believe that something is in RAM when it is really on the hard drive. This can happen in a few different circumstances, the most well-known of which is when the operating system "swaps" something out from memory to the drive. This works rather well when a block of data won't be needed again soon, but it does complicate memory accesses because the operating system needs to dynamically and transparently move things in and out of memory. Thus, if the requested piece of data has been swapped out to the hard drive, the memory system will essentially wake up the operating system and tell it to bring the data back into memory. This process is incredibly time consuming and involves thousands upon thousands of additional memory accesses, as the operating system must determine where the data is located on the hard drive and communicate with the disk to load it in.
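
In toy form, the swapped-out case might look something like the code below: a handful of "pages" live on a pretend disk, "RAM" only has room for a couple of them, and a fault handler copies a page back in (evicting another) when a process touches something that isn't resident. Every name, size, and the trivial round-robin eviction policy here is invented; a real kernel's page-fault path is vastly more involved.

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy model of swapping: "RAM" has room for 2 page frames, the "disk"
     * holds 4 pages, and the fault handler copies a page back in when it
     * is not resident.  Sizes, names, and the round-robin eviction are all
     * invented for illustration; a real kernel is far more complicated. */

    #define PAGE_WORDS 4
    #define NUM_FRAMES 2
    #define NUM_PAGES  4

    static int disk[NUM_PAGES][PAGE_WORDS];                 /* swap space */
    static int ram[NUM_FRAMES][PAGE_WORDS];                 /* physical memory */
    static struct { bool present; int frame; } pte[NUM_PAGES];
    static int next_victim;                                 /* trivial round-robin eviction */

    static void page_fault(int vpage)
    {
        int frame = next_victim;
        next_victim = (next_victim + 1) % NUM_FRAMES;

        /* Evict whatever currently owns this frame (write it back to "disk"). */
        for (int p = 0; p < NUM_PAGES; p++)
            if (pte[p].present && pte[p].frame == frame) {
                for (int w = 0; w < PAGE_WORDS; w++) disk[p][w] = ram[frame][w];
                pte[p].present = false;
            }

        /* The painfully slow part on real hardware: read the page from disk. */
        for (int w = 0; w < PAGE_WORDS; w++) ram[frame][w] = disk[vpage][w];
        pte[vpage].present = true;
        pte[vpage].frame   = frame;
        printf("page fault: brought page %d into frame %d\n", vpage, frame);
    }

    static int read_word(int vpage, int offset)
    {
        if (!pte[vpage].present)          /* the "white lie" is discovered here */
            page_fault(vpage);
        return ram[pte[vpage].frame][offset];
    }

    int main(void)
    {
        disk[3][1] = 99;                           /* pretend page 3 was swapped out earlier */
        printf("value = %d\n", read_word(3, 1));   /* faults, then reads 99 */
        printf("value = %d\n", read_word(3, 1));   /* already resident this time, no fault */
        return 0;
    }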

Once the requested piece of data has been found, the memory system returns it to the processor (placing it into one or more of the caches along the way). After all is said and done, that single memory access has potentially caused numerous additional memory accesses, instruction executions, and even hard disk accesses. It is easy to see why computer and operating system designers work very hard to mitigate the costs associated with memory accesses by taking full advantage of high-speed caching mechanisms and advanced algorithms to predict program behavior. Even compilers use special tricks to try to ameliorate the situation, by doing things like reordering instructions and making the most efficient use of low-level processor resources.

This overview of a computer's memory system has been very brief, and thus is somewhat incorrect in its details. For further information, I would recommend that people read a good computer architecture book (such as Computer Organization and Design) as well as an introductory operating systems book (Operating System Concepts covers many of these details). Wikipedia also has extensive coverage of many low-level computer architecture details like this.