dc.description.abstract | Wire-exposed, programmable microarchitectures, including Trips [11], Smart Memories [8], and Raw [13], offer an opportunity to schedule instruction execution and data movement explicitly. This paper proposes stream algorithms, which, along with a decoupled systolic architecture, provide an excellent match for the physical and technological constraints of single-chip tiled architectures. Stream algorithms enable programmed systolic computations for different problem sizes without incurring the cost of memory accesses. To that end, we decouple memory accesses from computation and move the memory accesses off the critical path. By structuring computations in systolic phases and deferring memory accesses to dedicated memory processors, stream algorithms can solve many regular problems of varying sizes on a constant-sized tiled array. Contrary to common sense, the compute efficiency of stream algorithms increases as we increase the number of processing elements. In particular, we show that the compute efficiency of stream algorithms can approach 100% asymptotically, that is, for large numbers of processors and an appropriate problem size. | en_US |