The advent of multi-core architectures has increased the popularity of chip multi-processors (CMPs) and the use of networks-on-chip (NoCs) as the fabric interconnecting cores in high performance computers. Traditionally, the NoC design space has been evaluated with traces; full system simulation is a less common alternative. Traces do not capture the message dependencies of real applications, which makes replaying a trace less accurate than a full system simulation. While full system simulations provide high accuracy, they are hindered by extremely long run times and limits on the number of cores. Previous attempts at generating traces with message dependencies rely on full-system simulations, which are platform dependent and extremely difficult to run, especially for massive multi-core systems (i.e., hundreds of cores).
In this work, we present a platform-independent, dependency-tracked, event-based NoC evaluation methodology. Because the events track dependencies between multiple threads, the methodology can replay messages across the network in the correct order, which ensures accuracy, without simulating the functionality of a microprocessor as full system simulators do. In addition, the framework scales easily to evaluate future NoCs for massive multi-core CMPs comprising hundreds of nodes. The methodology is used to explore the design space in the CMP-NoC co-design process.
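The idea of dependency-tracked replay can be illustrated with a minimal sketch. This is not SynchroTrace's trace format or code: the `Event` record, its field names, and the fixed per-message latency are all assumptions made for illustration. Each event is issued only after every event it depends on has finished, which is what keeps messages from different threads in the correct order. Program order within a thread is not implied by the `thread` field here; it would have to be listed in `deps` as well.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical trace record; the field names are illustrative only."""
    eid: int        # unique event ID
    thread: int     # thread that issues the event
    latency: int    # assumed network latency of the message, in cycles
    deps: list = field(default_factory=list)  # IDs of events that must finish first


def replay(events):
    """Replay events in dependency order and return each event's finish time."""
    by_id = {e.eid: e for e in events}
    pending = {e.eid: len(e.deps) for e in events}  # unmet dependency counts
    dependents = {e.eid: [] for e in events}
    for e in events:
        for d in e.deps:
            dependents[d].append(e.eid)

    # Events with no dependencies can be issued immediately.
    ready = deque(eid for eid, n in pending.items() if n == 0)
    finish = {}
    while ready:
        eid = ready.popleft()
        e = by_id[eid]
        # An event starts when the last of its dependencies has finished.
        start = max((finish[d] for d in e.deps), default=0)
        finish[eid] = start + e.latency
        for nxt in dependents[eid]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(finish) != len(events):
        raise ValueError("cyclic dependencies in trace")
    return finish


# Made-up four-event trace across two threads.
trace = [
    Event(0, thread=0, latency=5),
    Event(1, thread=1, latency=3),
    Event(2, thread=1, latency=4, deps=[0, 1]),  # waits on both threads
    Event(3, thread=0, latency=2, deps=[2]),
]
finish_times = replay(trace)
```

In a real evaluation flow the fixed `latency` would be replaced by the delay reported by the NoC model under study, so the replay order would react to network contention rather than follow recorded timestamps.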
Check out our Software Release: SynchroTrace
Collaborator(s): Mark Hempstead, Tufts (Computer Architecture)
Design and Automation of Low Swing Clocking
Operating the clock network with low swing is one technique explored to reduce the power consumption attributed to the clock network of a high-performance architecture. Low-swing operation can be adopted at varying levels of a clock tree, with different implications. However, low-swing applicability remains limited in practice due to a number of factors, including (i) degradation in skew performance, (ii) a smaller power reduction than expected, (iii) degradation in data timing due to slew degradation, (iv) the need for level shifters of varying sizes, and (v) the need for low-swing FF designs. Furthermore, the automation of low-swing clocking has not been addressed. This research studies the effectiveness of fully or partially low-swing clock trees, the design of the custom cell blocks needed for low-swing operation, and the determination of the optimal low-swing voltage level. The design flow is also targeted for automation in order to address the different performance, architecture and physical constraints.
Sponsor(s): Semiconductor Research Corporation
Collaborator(s): Emre Salman, SUNY-Stony Brook (Circuits and Systems)
Wireless On-Chip Interconnects
Increasing functionality and complexity in the design of integrated circuits (ICs) require careful planning of on-chip resources such as area and power. Critical design decisions are often made based on the availability of these resources within increasingly stringent design budgets. Among these typical IC design budgets, wire interconnects are one of the most expensive items. Significantly impacting timing, power and area, wire interconnects constitute the complex infrastructure that establishes communication and synchronization within a conventional, state-of-the-art IC.
In this project, wireless communication principles are investigated in order to replace the resource-demanding, conventional, wire-based interconnect networks within integrated circuits. By implementing one or many transmitter and receiver antennas on the same chip, wireless communication principles will be used to communicate between distant components within a chip. The proposed on-chip wireless communication implementations bear a constant overhead in area and power budgets in order to implement the antennas and surrounding circuitry. However, the increasing size and complexity of conventional wire interconnects (particularly for heavy-duty global interconnects such as clock and power lines) are mitigated, solving one of the major problems in the state-of-the-art IC design process. Wireless communication will provide a solution to the IC communication challenge that is highly scalable into the future, as increases in technology scaling and die size dimensions are forecast by the semiconductor industry.
Also see our article titled "What is Wireless Interconnect?" in the February 2012 edition of the ACM SIGDA newsletter, which is distributed to 3000+ recipients.
Sponsor(s): National Science Foundation (ECCS-1232164), Mosis
Collaborator(s): Kapil Dandekar (Wireless Systems)
Clock Tree/Mesh Synthesis
In this research, the utilization of computing power to improve an essential step of integrated circuit (IC) physical design flow, clock network design, is investigated. Clock network design entails a series of computationally intensive, large-scale design and optimization tasks. Automation for conventional, zero skew, buffered clock trees is common. However, high performance clock tree design remains a tedious task with increasing requirements for higher speed through skew scheduling, variation-awareness and constrained power budgets. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research.
In the traditional integrated circuit design flow, the placement and clock network synthesis stages are performed sequentially. It is desirable to combine the placement and clock network synthesis stages to provide a better physical design. In this project, the integration of placement and clock network synthesis is investigated for the purpose of reducing clock power dissipation. Moreover, various types of novel clock distribution architectures are studied.
Ultra Low-Power Adiabatic Circuit Design
Adiabatic switching preserves energy by circulating the switching energy back into the circuit. The recirculation of energy has significantly limited the frequency of operation. The frequency of operation is dictated by a synchronizing clock signal called the power-clock, which also acts as the power source for the adiabatic logic. Some adiabatic logic families, however, require multiple phases of the power-clock for pipelined operation (alternatively, logic pipelining can be sacrificed). Also affecting the adoption of adiabatic logic is the recovery path resistance, which lowers the Q of the LC resonator and thereby impedes the quality of synchronization and the power recovery. Consequently, adiabatic circuit families have faced difficulties in being adopted in IC design due to (1) the low switching frequency of the power-clock signals, and (2) the difficulty in logic pipelining, primarily due to the power dissipation required to provide the complex clocking schemes with multiple phases.
In this project, novel synchronous circuit implementation methodologies of adiabatic logic design are explored. These methodologies enable unprecedented low power operation through charge recovery on the logic and the power-clock network. Ultimately, this research will resolve the well-known shortcomings of adiabatic logic, such as the operating frequency, and help improve the energy efficiency and applicability of adiabatic logic families.
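The link between energy recovery, operating frequency and path resistance can be made concrete with the standard first-order estimate for charging a capacitance C through a resistance R with a linear ramp of duration T: the dissipated energy is approximately (RC/T)·C·V², compared with ½·C·V² for conventional switching. The sketch below evaluates both expressions. The component values are made up for illustration, and the approximation holds only when T is much longer than RC.

```python
def conventional_energy(c_load, v_dd):
    """Energy dissipated when a conventional CMOS gate charges c_load to v_dd."""
    return 0.5 * c_load * v_dd ** 2


def adiabatic_energy(r_path, c_load, v_dd, t_ramp):
    """First-order dissipation for a linear power-clock ramp of duration t_ramp.

    Valid only when t_ramp is much longer than r_path * c_load.
    """
    return (r_path * c_load / t_ramp) * c_load * v_dd ** 2


# Illustrative (made-up) values: 1 kOhm path, 10 fF load, 1 V swing.
R_PATH = 1e3
C_LOAD = 10e-15
V_DD = 1.0

e_conv = conventional_energy(C_LOAD, V_DD)

# Ratio of adiabatic to conventional dissipation for two ramp times.
ratios = {
    t_ramp: adiabatic_energy(R_PATH, C_LOAD, V_DD, t_ramp) / e_conv
    for t_ramp in (1e-9, 10e-9)
}
```

The ratio equals 2RC/T, so the savings grow only as the ramp slows down, and a larger path resistance erodes them directly. Both effects correspond to the adoption barriers listed above.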
Ph.D. Student(s): Leo Filippini
Collaborator(s): Emre Salman SUNY-Stony Brook, Milutin Stanaćević SUNY-Stony Brook (Circuits, Systems), Lunal Khuon - Drexel Engineering Technology (RF, analog, and biomedical ICs).
Energy Efficient Computing with OptoElectronics
In order to achieve energy efficient computing for systems ranging from datacenters down to mobile electronics, novel devices, techniques, and methodologies are necessary to reduce the terawatts of power consumed by computational devices. We are proposing an effort to bring together researchers from all levels of the device to systems hierarchy (Devices -> Circuits -> Architecture -> Systems -> Data Center) in a vertically integrated approach addressing the energy challenges of future computing devices. Our vision is to build upon novel optoelectronic devices capable of computing a bit while consuming attojoules (10^-18 J) of energy, and progress to energy efficient techniques and methodologies for data centers that consume terawatts of power from the electrical grid. Energy efficient innovations at the circuits, systems/interconnect, architecture, and server/mobile/datacenter platform level have the potential to significantly reduce overall power consumption and address this grand challenge in energy needs. Our team will leverage the energy efficiency of novel optoelectronic elements and focus research efforts on reducing the total power consumption of electronic devices through energy efficient techniques and methodologies for IC chips, devices, and ultimately data centers.
Collaborator(s): Bahram Nabet (Photonics), Ioannis Savidis (Circuits and Systems), Naga Kandasamy (HPC), Lunal Khuon - Drexel Engineering Technology (RF, analog, and biomedical ICs).
GPU System Co-design
As with the CMP-NoC co-design challenges, the co-design of hardware and software on GPU systems is explored. Platform-independent dependencies between threads are analyzed on GPUs, leading to the analysis of software and hardware co-design principles.
Sponsor(s): Samsung GRO
Collaborator(s): Mark Hempstead (Computer Architecture)
Resonant Clocking Technologies
Achieving high quality synchronization with low power dissipation is a major objective in synchronous VLSI circuit design at high frequency regimes. In order to meet this objective, conventional clock design methodologies are constantly being improved. Also, next-generation alternatives to conventional clocking have been emerging. Resonant clocking technologies provide operating frequencies and power dissipation levels that are unprecedented in state-of-the-art bulk-CMOS VLSI IC implementations. These technologies must be characterized for on-chip variations, have robust simulation models and be supported by specific design flows in order to be viable in high volume production. This project addresses such challenges in the design and design automation of resonant clocking technologies for high-volume IC production.
With improved nanoscale design characterization and design automation methodologies, resonant clocking technologies can be seamlessly integrated within the mainstream VLSI IC design flow. The broader impacts of this project are in revolutionizing the clock synchronization methodology of digital VLSI synchronous circuits for low-power, multi-GHz operation and sustaining it across semiconductor technology scaling. The proposed low-power, multi-GHz high-performance clocking operation will have a major impact on all microelectronic systems, from field-deployable low power sensors to the world's fastest supercomputers.
Sponsor(s): National Science Foundation (CCF-0845270), ACM SIGDA, Mosis
Clock Skew Scheduling
Integrated circuit design at sub-micron levels, particularly in the transition to 60nm and lower technologies, requires paradigm shifts. In order to achieve high-performance, robust and high-yield production, design and manufacturing techniques are being investigated more carefully. A successful design at a sub-60nm technology can be achieved by employing a combination of design principles. Investigating and improving each design principle is important and contributes to prolonging the success of Moore's Law in CMOS based IC design.
In this research, an additional design principle, clock skew scheduling, is investigated to aid deep sub-micron IC design. The performance-enhancing effects of clock skew scheduling have been known for over 20 years. Designers employ ad hoc tricks to delay clock signals on paths with timing violations in order to satisfy design budgets. Due to the limited scalability of the conventional application techniques, however, clock skew scheduling typically cannot be used to its full advantage. The commonly known advantages of skew scheduling are fixing timing violations and improving the operating frequencies of circuits. In the deep sub-micron design era, skew scheduling can also be used effectively to improve timing yield and enable low power design alternatives. Provided that the increasing computing power of multi-core systems can be applied to remedy the scalability problem and the objectives are reformulated, clock skew scheduling can be used as an additional design principle to enable high-yield IC design at 45nm and lower technologies.
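How skew scheduling improves operating frequency can be shown with the textbook difference-constraint formulation. This is a minimal sketch and not necessarily the formulation used in this research. For a data path from register i to register j with minimum and maximum delays d_min and d_max, setup requires t_i + d_max + setup <= t_j + T, and hold requires t_i + d_min >= t_j + hold, where t_i is the clock arrival time at register i and T is the clock period. A period is feasible exactly when the resulting constraint graph has no negative cycle, which Bellman-Ford detects. The function names, the three-register circuit and its delays are made up for illustration.

```python
def feasible_schedule(n_regs, paths, period, setup=0.0, hold=0.0):
    """Return clock arrival times meeting all constraints, or None if infeasible.

    paths holds (i, j, d_min, d_max) tuples for data paths from register i to j.
    """
    edges = []
    for i, j, d_min, d_max in paths:
        # Setup: t_i - t_j <= period - d_max - setup  (edge j -> i)
        edges.append((j, i, period - d_max - setup))
        # Hold:  t_j - t_i <= d_min - hold            (edge i -> j)
        edges.append((i, j, d_min - hold))

    # Bellman-Ford from a virtual source tied to every register with weight 0.
    dist = [0.0] * n_regs
    for _ in range(n_regs):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist  # converged: dist is a valid skew schedule
    return None  # still relaxing after n_regs rounds: negative cycle


def min_period(n_regs, paths, setup=0.0, hold=0.0, tol=1e-6):
    """Binary-search the smallest feasible clock period, to within tol."""
    lo = 0.0
    hi = max(max(p[3] for p in paths) + setup, tol)
    # Grow the upper bound until it is feasible (hold constraints may force this).
    for _ in range(64):
        if feasible_schedule(n_regs, paths, hi, setup, hold) is not None:
            break
        hi *= 2.0
    else:
        raise ValueError("no feasible period: hold constraints conflict")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible_schedule(n_regs, paths, mid, setup, hold) is not None:
            hi = mid
        else:
            lo = mid
    return hi


# Made-up three-register ring: (source, sink, d_min, d_max).
paths = [(0, 1, 2.0, 8.0), (1, 2, 1.0, 4.0), (2, 0, 1.0, 3.0)]
zero_skew_period = max(p[3] for p in paths)   # limited by the slowest path
scheduled_period = min_period(3, paths)
schedule = feasible_schedule(3, paths, scheduled_period)
```

For this example the zero-skew period is set by the 8.0 path, while scheduling the skews lowers the minimum period to about 6.0, limited by the difference between the maximum and minimum delay of that same path. Each feasibility check takes time proportional to registers times paths, which hints at why scalability becomes the bottleneck on large designs.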
Ph.D. Student(s): Jianchao Lu (graduated)
Quantum-Dot Cellular Automata (QCA) based Nanoarchitectures
It is expected that the physical barrier in the nanoscale implementation of CMOS devices will soon be reached. The development of next generation computation systems will stem from the exploration of nanoscale materials and biological systems. Properties and applications of several nanoscale technologies, such as Quantum-dot Cellular Automata (QCA) investigated in this work, are being explored intensively. Basic design methods and simulators have been developed to show the potential of QCA technology in meeting future computation needs. What is missing in the current agenda of QCA research are studies on layout optimization and system-level architecture design. The challenge in performing these studies is the necessity to address the high levels of pipelining, parallelism, and fault-tolerance required for high performance operation of QCA systems.
The objective of the proposed research is to investigate fault-tolerant QCA architectures using advanced clocking schemes for practical implementation of QCA-based nanocomputers. Towards this end, essential circuit components for such computers and system-level integration of these components will be investigated. In the project, the emphasis is on novel circuit architectures and clocking schemes to perform computations with this emerging technology. Manufacturing challenges will be addressed to capture the fault-tolerance properties for architecture design.
Ph.D. Student(s): None