The computer industry pays close attention whenever Dave Patterson, a computer science professor at the University of California, Berkeley, begins a new research project. In the early 1980s, Patterson and fellow researchers pioneered RISC - reduced instruction-set computing - a technology that revolutionised microprocessor design and allowed small companies such as Sun and SGI to leap-frog Intel. Then, in 1987, Patterson helped develop RAID, a disk drive technology that provides fast and cheap mass storage and is used in almost every large file server. Now Patterson has begun a new project called IRAM, for intelligent RAM. The idea is to put a microprocessor into a memory chip - a move that may radically improve computer performance. Again.
With RAID, Patterson exploited the economies of scale that make it cheaper to combine lots of little hard drives than to try to build one large drive. His IRAM project is motivated by a similar insight. "I wrote a survey article for Scientific American trying to predict the future, and it struck me that we have two industries making semiconductors: the memory industry and the microprocessor industry," he explains. "With the cost of building fabrication lines going up, it won't make sense to build two separate but very similar fabrication plants." Once Patterson started thinking about the possibility of combining memory and microprocessors, he realised that IRAM could solve one of the biggest problems facing computer designers.
That problem was dramatically spelled out in 1994 by two professors at the University of Virginia, who pointed out that, based on current trends in memory and microprocessor speed, in less than a decade computer performance would be completely determined by how long it takes to access memory - a barrier they dubbed the memory wall. You could improve processors all you wanted, but computers wouldn't run any faster.
To understand why, take a look at the trend lines. In the last 20 years, microprocessors have improved in speed a hundredfold. Memory chips have increased in capacity by a factor of 256,000. But memory speed has improved by only a factor of 10. The result? Processors spend more and more of their time waiting for data from memory, and less and less doing useful computation. Computer designers have been trying for some time to work around the problem with memory caches and clever tricks. But even these techniques are being overwhelmed by the ever-widening gap between memory and processor speed.
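To see how quickly the waiting swamps the computing, consider a rough back-of-the-envelope sketch. The growth rates below are illustrative assumptions in the spirit of the trends above, not measured figures:

```python
# Back-of-the-envelope sketch of the memory wall (illustrative numbers only).
# Assume processor speed improves ~60% a year and memory latency only ~7% a year;
# these rates are assumptions chosen to echo the trends described in the article.

def effective_speedup(years, cpu_growth=1.60, mem_growth=1.07,
                      base_cpu_time=1.0, base_mem_time=1.0):
    """Time per operation = compute time + memory-access time.
    Each component shrinks at its own rate; the slower one eventually dominates."""
    cpu_time = base_cpu_time / (cpu_growth ** years)
    mem_time = base_mem_time / (mem_growth ** years)
    return (base_cpu_time + base_mem_time) / (cpu_time + mem_time)

for years in (0, 5, 10, 15):
    print(f"after {years:2d} years: overall speedup ~{effective_speedup(years):4.1f}x")

# The processor term alone shrinks ~100-fold over ten years (1.6**10 is about 110),
# yet the overall speedup crawls, because it is soon dominated by the memory term,
# which improves only a few per cent a year.
```

However fast the processor term shrinks, the overall figure is quickly held hostage by the memory term - which is exactly the wall the Virginia professors described.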
The reason IRAM may be able to demolish the memory wall has to do with how standard RAM chips are built. Main computer memory is usually a variant of RAM known as dynamic RAM. A DRAM chip contains a matrix of memory cells, with each cell holding a single bit. A cell is made up of a capacitor - a tiny device that can hold an electrical charge - and a transistor, which acts like a switch. To read a cell, the transistor is activated, triggering the capacitor. A discharge of current signifies a 1, and the capacitor must be recharged. No discharge represents a 0. To select which bit to read, a computer uses a grid of wires connected to the memory cell matrix. First, a small voltage is applied to the selected row, thus selecting all bits in that row; then the appropriate column wire is triggered. The combination is enough to activate the transistor at that row and column. Doing this takes time, however, and much of the latency involved in memory accesses comes from the time it takes to select the desired bits.
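As a toy illustration of that row-then-column dance (a simplified model, not real DRAM circuit behaviour), the access sequence looks roughly like this:

```python
# Toy model of reading one bit from a DRAM cell matrix. Illustrative only: real DRAM
# timing, sense amplifiers and refresh logic are far more involved than this sketch.

class ToyDRAM:
    def __init__(self, rows, cols):
        # Each cell holds a single bit (charge on a capacitor in real hardware).
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        self.cells[row][col] = bit

    def read(self, row, col):
        # Step 1: drive the selected row line, activating every transistor in that row.
        selected_row = self.cells[row]
        # Step 2: trigger the chosen column line to pick one cell out of the active row.
        bit = selected_row[col]
        # Step 3: the read drains the capacitor, so the value must be written back -
        # part of why "dynamic" RAM accesses take time.
        self.cells[row][col] = bit
        return bit

mem = ToyDRAM(rows=4, cols=8)
mem.write(2, 5, 1)
print(mem.read(2, 5))   # -> 1
```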
Current DRAM designs are also hamstrung by the low bandwidth between the processor and memory. Many DRAM chips used in PCs have just four output lines, so only four bits can be read from a chip at a time. To widen this tiny straw, computer designers hook up eight DRAM chips in parallel for a memory path 32 bits wide. This grows increasingly expensive, however, and still acts as a bottleneck. IRAM solves both these problems. If predictions are accurate, it should be able to cut memory latency by a factor of two or three and improve memory bandwidth by a factor of 100. The latency improvement comes simply because the processor and memory are closer together: the time signals spend travelling between chips disappears, and the tighter coupling may allow for more sophisticated, hence quicker, addressing schemes.
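Plugging the article's figures into an assumed transfer rate (the bus clock below is a placeholder, used only to make the arithmetic concrete) shows the scale of the difference:

```python
# Rough bandwidth comparison using the article's figures. The bus clock rate below is
# a placeholder assumption, not a specification of any real system.

def bandwidth_mbytes_per_s(bits_per_transfer, transfers_per_s):
    return bits_per_transfer * transfers_per_s / 8 / 1e6

# Conventional setup: eight 4-bit DRAM chips in parallel = a 32-bit path to memory.
conventional_bits = 4 * 8
bus_clock = 66e6                      # assumed transfers per second (placeholder)

# IRAM: processor and memory share a die, so the path can be hundreds or thousands of
# bits wide; here we simply assume enough width to realise the claimed ~100x gain.
iram_bits = conventional_bits * 100
iram_clock = bus_clock                # same transfer rate assumed for simplicity

print(f"conventional: {bandwidth_mbytes_per_s(conventional_bits, bus_clock):8.0f} MB/s")
print(f"IRAM:         {bandwidth_mbytes_per_s(iram_bits, iram_clock):8.0f} MB/s")
```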
The incredible increase in bandwidth comes because the processor and memory are on the same chip, which makes it possible to run lots and lots of wires between them. In contrast, the number of wires that can run between separate chips is severely restricted by power requirements, as well as by cost and space.
Although IRAM ends up helping more with bandwidth than latency, this is exactly what we need for a growing class of applications. Almost anything that involves graphics, such as 3D animation, involves manipulating long streams of bits. IRAM opens up the necessary fire hose of data between the memory and processor.
Nonetheless, IRAM faces significant roadblocks. Chip makers have got rich stamping out dumb memory for years - why change now? "This would be a tough nut to crack," Patterson admits, "if the DRAM industry had a bright and rosy future." However, the DRAM industry is facing a crisis due to slowing demand and a glut of capacity. By offering a new use for fabrication lines, IRAM may provide an escape hatch.
Perhaps what's most exciting about IRAM is that it levels the playing field. As Michael Slater, publisher of Microprocessor Report, points out, established players like Intel are unlikely to begin building IRAM-type devices since they don't know anything about DRAMs. So IRAM may finish off the job that Patterson started with RISC: destroying the Intel and Motorola duopoly.
Steve G. Steinberg (steve@wired.com) is a section editor at Wired US.