Browsing LCS Technical Memos (1974 - 2003) by Author "Agarwal, Anant"

Analyzing Multiprocessor Cache Behavior Through Data Reference Modeling

Tsai, Jory; Agarwal, Anant (1993-02)

This paper develops a data reference modeling technique to estimate with high accuracy the cache miss ratio in cache-coherent multiprocessors. The technique involves analyzing the dynamic data referencing behavior of ...

APRIL: A Processor Architecture for Multiprocessing

Agarwal, Anant; Lim, Beng-Hong; Kranz, David; Kubiatowicz, John (1991)

Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching processor called APRIL with ...

Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-memory Multiprocessors

Agarwal, Anant; Kranz, David A.; Natarajan, Venkat (1995-09)

This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane ...

Automatic Partitioning of Parallel Loops for Cache-coherent Multiprocessors

Agarwal, Anant; Kranz, David; Natarajan, Venkat (1992-12)

This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. The framework introduces the notion of uniformly intersecting ...

Column-associative Caches: A Technique for Reducing the Miss Rate of Direct-mapped Caches

Agarwal, Anant; Pudar, Steven D. (1993-11)

Direct-mapped caches are a popular design choice for high-performance processors; unfortunately, direct-mapped caches suffer systematic interference misses when more than one address maps into the same cache set. This paper ...

Communication-Minimal Partitioning of Parallel Loops and Data Arrays for Cache-Coherent Distributed-Memory Multiprocessors

Barua, Rajeev; Kranz, David; Agarwal, Anant (1995-01)

Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage multiple levels of memory ...

FUGU: Implementing Translation and Protection in a Multiuser, Multimodel Multiprocessor

Mackenzie, Kenneth; Kubiatowicz, John; Agarwal, Anant; Kaashoek, M. Frans (1994-10)

Multimodel multiprocessors provide both shared memory and message passing primitives to the user for efficient communication. In a multiuser machine, translation permits machine resources to be virtualized and protection ...

Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory

Prasanna, G.N. Srinivasa; Agarwal, Anant; Musicus, Bruce R. (1992-10)

This paper presents a hierarchical approach for compiling macro dataflow graphs for multiprocessors with local memory. Macro dataflow graphs comprise several nodes (or macro operations) that must be executed subject to ...

Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory

Prasanna, G.N. Srinivasa; Agarwal, Anant; Musicus, Bruce R. (1992-12)

This paper presents a hierarchical approach for compiling macro dataflow graphs for multiprocessors with local memory. Macro dataflow graphs comprise several nodes (or macro operations) that must be executed subject to ...

Integrating Message-passing and Shared-memory: Early Experience

Kranz, David; Johnson, Kirk; Agarwal, Anant; Kubiatowicz, John; Lim, Beng-Hong (1992-10)

This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. Because message-passing mechanisms are much more efficient than ...

Low-cost Support for Fine-grain Synchronization in Multiprocessors

Kranz, David; Lim, Beng-Hong; Agarwal, Anant (1992-06)

As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of parallelism required to utilize ...

Maps: a Compiler-Managed Memory System for RAW Machines

Barua, Rajeev; Lee, Walter; Amarasinghe, Saman; Agarwal, Anant (1998-07)

Microprocessors of the next decade and beyond will be built using VLSI chips employing billions of transistors. In this generation of microprocessors, achieving a high level of parallelism at a reasonable clock speed will ...

The Sensitivity of Communication Mechanisms to Bandwidth and Latency

Chong, Frederic T.; Barua, Rajeev; Dahlgren, Fredrik; Kubiatowicz, John D.; Agarwal, Anant

The goal of this paper is to gain insight into the relative performance of communication mechanisms as bisection bandwidth and network latency vary. We compare shared memory with and without prefetching, message passing ...

Stream Algorithms and Architecture

Hoffmann, Henry; Strumpen, Volker; Agarwal, Anant (2003-03)

Wire-exposed, programmable microarchitectures including Trips [11], Smart Memories [8], and Raw [13] offer an opportunity to schedule instruction execution and data movement explicitly. This paper proposes stream algorithms, ...

Virtual Wires: Overcoming Pin Limitations in FPGA-based Logic Emulators

Babb, Jonathan; Tessier, Russell; Agarwal, Anant (1992-11)

Existing FPGA-based logic emulators suffer from limited inter-chip communication bandwidth, resulting in low gate utilization (10 to 20 percent). This resource imbalance increases the number of chips needed to emulate a ...