CASA: Computer Architecture and System Applications


    We are involved in the design of next-generation computing systems in cooperation with several European universities and companies (STMicroelectronics, NXP, IBM, ARM and others). In this area, the main challenges we face today derive from the expected shift towards multi-core architectures on a chip, which is happening in this decade and will deeply shape the coming ones. This shift requires, for example, a focus on power-efficient architectures, high-performance core and memory design, dedicated and adaptive hardware structures, on-the-fly software optimizations, and tools that support the move to new programming models. Moreover, the billions of transistors available on a single chip give a fundamental role to computer architecture research, which traditionally bridges software requirements (multimedia real-time computation, medical imaging, web and transactional applications) and the available hardware resources. Classical architectures have to be revised in order to scale out and to cope with new problems such as wire delay, decentralized resources, and power and temperature issues.

    We contribute to addressing these challenges within the Network of Excellence on High Performance Embedded Architecture and Compilers (HiPEAC) and the Scalable Architectures (SARC) Integrated Project. Our current research focuses on three areas: support for thread-level parallelism (TLP) in multi-core architectures, low-power adaptive computers, and compiler optimizations for cache/memory performance.

    For TLP, we propose a new execution model, the Decoupled Threaded Architecture (DTA), which decouples memory accesses from thread execution: a thread is started only when all of its inputs are available in a local store. The approach relies on the large number of threads available in current and future computing systems, and it yields predictable execution times, which is very important in real-time systems.
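    The DTA firing rule (a thread starts only once all of its inputs sit in its local store) can be illustrated with a minimal Python sketch. It assumes a per-thread frame carrying a synchronization counter that producers decrement as they write inputs into it; the names `Frame`, `Scheduler`, `store`, and the toy four-thread graph are hypothetical illustrations, not the DTA hardware interface, and the sketch models only thread activation, not the decoupled memory pipeline itself.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

# A write produced by a thread: (target frame name, input slot, value).
Write = Tuple[str, str, int]


@dataclass
class Frame:
    """Hypothetical per-thread frame kept in a core's local store."""
    name: str
    sync_count: int                                   # inputs still missing
    body: Callable[[Dict[str, int]], List[Write]]     # the thread's code
    inputs: Dict[str, int] = field(default_factory=dict)


class Scheduler:
    """Toy dataflow-style scheduler: a thread fires only when its frame is full."""

    def __init__(self) -> None:
        self.frames: Dict[str, Frame] = {}
        self.ready: Deque[Frame] = deque()
        self.trace: List[str] = []                    # order in which threads ran

    def create(self, name: str, n_inputs: int,
               body: Callable[[Dict[str, int]], List[Write]]) -> None:
        frame = Frame(name, n_inputs, body)
        self.frames[name] = frame
        if n_inputs == 0:                             # no inputs: ready at once
            self.ready.append(frame)

    def store(self, target: str, slot: str, value: int) -> None:
        frame = self.frames[target]
        if frame.sync_count == 0:
            raise RuntimeError(f"frame {target!r} already complete")
        frame.inputs[slot] = value                    # write into the consumer's frame
        frame.sync_count -= 1
        if frame.sync_count == 0:                     # last input arrived: schedule it
            self.ready.append(frame)

    def run(self) -> None:
        while self.ready:
            frame = self.ready.popleft()
            self.trace.append(frame.name)
            # The body reads only its local frame, so it never waits on an input.
            for target, slot, value in frame.body(frame.inputs):
                self.store(target, slot, value)


def demo() -> Tuple[Dict[str, int], List[str]]:
    """Compute (a + b) * (a - b) for a = 7, b = 3 as four tiny threads."""
    results: Dict[str, int] = {}

    def src(_: Dict[str, int]) -> List[Write]:
        return [("add", "a", 7), ("add", "b", 3), ("sub", "a", 7), ("sub", "b", 3)]

    def add(i: Dict[str, int]) -> List[Write]:
        return [("mul", "x", i["a"] + i["b"])]

    def sub(i: Dict[str, int]) -> List[Write]:
        return [("mul", "y", i["a"] - i["b"])]

    def mul(i: Dict[str, int]) -> List[Write]:
        results["product"] = i["x"] * i["y"]
        return []

    sched = Scheduler()
    sched.create("mul", 2, mul)
    sched.create("add", 2, add)
    sched.create("sub", 2, sub)
    sched.create("src", 0, src)
    sched.run()
    return results, sched.trace


if __name__ == "__main__":
    print(demo())  # ({'product': 40}, ['src', 'add', 'sub', 'mul'])
```

    Because a frame enters the ready queue only when its counter reaches zero, a running thread never stalls waiting for an input: in the toy graph, `mul` is scheduled only after both `add` and `sub` have delivered their results.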
    The DTA model also enables a high degree of parallelism and hides memory latency. Scalability up to hundreds of cores has been verified on both parallel (SPLASH-2) and embedded-system (MiBench) benchmarks.

    Classical solutions for multiprocessor systems have to be updated for the new multi-core chips, where unnecessary (passive) sharing makes old coherence protocols impractical. Our proposal, Passive Sharing Copy Removal (PSCR), matches the performance of MESI while achieving much higher scalability. Another proposal concerns adaptive caches that switch off unused cells, which account for a large share of power consumption in 70 nm and future technologies; the adaptivity can be shaped by the locality of programs.

    Compiler optimizations for cache/memory performance: even in traditional, simpler designs, applications tend to make scarce use of the processor's hardware resources. In particular, our studies on cache hierarchies showed that an application is typically "served" by less than 25% of the available cache: only a quarter of the transistors, chip area, and power consumption could be enough to deliver the same average memory access time to the processor. We have studied specific compiler-level optimizations, guided by profile information from application execution, that can dramatically raise cache efficiency to above 90-95%, even with small and simple caches. This makes it possible to achieve the same performance with reduced (less complex, less power-hungry) hardware or, for given hardware, to reach higher performance. On compiler optimizations, we collaborate with several European institutions: the Universities of Edinburgh and Ghent, and INRIA.

    Our group co-organizes a yearly international workshop on memory performance issues (MEDEA) and a yearly special issue of ACM Computer Architecture News on the topic. Group members have also organized a number of special issues of journals (e.g. Journal of Embedded Systems) on computer architecture topics. Sandro Bartolini is an associate editor of the EURASIP Journal on Embedded Systems.
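    To make the cache-usage argument above more concrete (an application "served" by only a small part of the cache, and compiler-level layout changes improving cache efficiency), here is a minimal Python sketch of a direct-mapped cache simulator that reports the miss rate and the fraction of cache lines an address trace ever touches. It is not the group's profiling tool or optimization: the cache geometry (32 lines of 32 bytes), the two-address trace, and the "relaid" layout are hypothetical, and shifting one hot block by a single line merely stands in for the broader family of profile-guided code-placement optimizations.

```python
from typing import Iterable, List, Optional, Tuple


def simulate_direct_mapped(trace: Iterable[int], num_lines: int = 32,
                           line_size: int = 32) -> Tuple[float, float]:
    """Return (miss_rate, fraction_of_lines_touched) for a direct-mapped cache.

    The geometry defaults (32 lines x 32 bytes = 1 KiB) are hypothetical.
    """
    tags: List[Optional[int]] = [None] * num_lines   # tag held by each line
    touched = set()                                  # line indices ever used
    accesses = misses = 0
    for addr in trace:
        block = addr // line_size                    # memory block number
        index = block % num_lines                    # the only line this block may use
        tag = block // num_lines
        accesses += 1
        touched.add(index)
        if tags[index] != tag:                       # cold or conflict miss
            misses += 1
            tags[index] = tag                        # evict and refill the line
    if accesses == 0:
        return 0.0, 0.0
    return misses / accesses, len(touched) / num_lines


if __name__ == "__main__":
    # Two hot code blocks that a naive layout places exactly one cache size apart.
    naive = [0, 1024] * 100
    # Hypothetical profile-guided layout: shift the second block by one line (32 bytes).
    relaid = [0, 1056] * 100
    print(simulate_direct_mapped(naive))   # (1.0, 0.03125)
    print(simulate_direct_mapped(relaid))  # (0.01, 0.0625)
```

    With the naive layout both hot blocks map to line 0 and evict each other on every access, so the miss rate is 100% while 31 of the 32 lines sit idle; after the shift they occupy different lines and only the two cold misses remain. A real study would feed the simulator traces collected from profiled application runs.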