Publications

Tutorials

For tutorials click here.

Papers

The papers are provided for personal use and are subject to copyright of the publishers.

2014

  • J. Wang, J. Beu, R. Bheda, T. Conte, Z. Dong, C. Kersey, M. Rasquinha, G. Riley, W. Song, H. Xiao, P. Xu, and S. Yalamanchili, “Manifold: A Parallel Simulation Framework for Multicore Systems”, Technical Report GIT-CERCS-14-07, Georgia Tech, 2014. [pdf]
  • J. Wang, J. Beu, R. Bheda, T. Conte, Z. Dong, C. Kersey, M. Rasquinha, G. Riley, W. Song, H. Xiao, P. Xu, and S. Yalamanchili, “Manifold: A Parallel Simulation Framework for Multicore Systems”, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2014.  [pdf] [abstract]
    This paper presents Manifold, an open-source parallel simulation framework for multicore architectures. Unlike traditional computer architecture simulators, Manifold is, first and foremost, a component-based software framework. It consists of a parallel simulation kernel, a set of microarchitecture components, and an integrated library of power, thermal, reliability, and energy models. Using the components as building blocks, users can assemble multicore architecture simulation models and perform serial or parallel simulations to study the architectural and/or the physical characteristics of the models. Users can also create new components for Manifold or port existing models. Importantly, Manifold’s component-based design provides the user with the ability to easily replace a component with another for efficient explorations of the design space. It also allows components to evolve independently and makes it easy for simulators to incorporate new components as they become available. The distinguishing features of Manifold include i) transparent parallel execution, ii) integration of power, thermal, reliability, and energy models, iii) full system simulation, e.g., operating system and system binaries, and iv) component-based design. In this paper we provide a description of the software architecture of Manifold, and its main elements – a parallel multicore emulator front-end and a parallel component-based back-end timing model. We describe a few simulators that are built with Manifold components to illustrate its flexibility, and present test results of the scalability obtained on full-system simulation of coherent shared-memory multicore models with 16, 32, and 64 cores executing PARSEC and SPLASH-2 benchmarks.
  • Z. Dong, J. Wang, G. Riley, and S. Yalamanchili, “An Efficient Front-End for Timing-Directed Parallel Simulation of Multi-Core System”, The 7th International ICST Conference on Simulation Tools and Techniques (SIMUTools 2014), March 2014. [pdf]

2013

  • Z. Dong, J. Wang, G. Riley, and S. Yalamanchili, “A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems”, IEEE 21st International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS’13), August 2013.  [abstract]
    There has been little research that studies the effect of partitioning on parallel simulation of multicore systems. This paper presents our study of this important problem in the context of Null-message-based synchronization algorithms for parallel multicore simulation. This paper focuses on coarse-grained parallel simulation where each core and its cache slices are modeled within a single logical process (LP). It is common to encapsulate the entire on-chip interconnection network into a single logical process due to the fine-grained nature of inter-router communication. However, we show that such an organization is an impediment to scalable simulation. This baseline partitioning and two other schemes are investigated. Experiments are conducted on a subset of the PARSEC benchmarks with 16-, 32-, 64- and 128-core models. Results show that the partitioning scheme has a significant impact on simulation performance and parallel efficiency. Beyond a certain system scale, one scheme consistently outperforms the other two schemes, and the performance as well as efficiency gaps with the baseline partitioning scheme increase as the size of the model increases, with up to 4.1 times faster speed and 277% better efficiency for 128-core models. We explain the reasons for this behavior, which can be traced to the features of the Null-message-based synchronization algorithm. Because of this, we believe that, if a component has an increasing number of inter-LP interactions with increasing system size, such a component should be partitioned into several sub-components to achieve better performance.
  • J. Wang, Z. Dong, S. Yalamanchili, and G. Riley, “Optimizing Parallel Simulation of Multicore Systems Using Domain-Specific Knowledge”, 2013 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS), May 2013.  [abstract]
    This paper presents two optimization techniques for the basic Null-message algorithm in the context of parallel simulation of multicore computer architectures. Unlike the general, application-independent optimization methods, these are application-specific optimizations that make use of system properties of the simulation application. We demonstrate in two aspects that the domain-specific knowledge offers great potential for optimization. First, it allows us to send Null-messages much less eagerly, thus greatly reducing the number of Null-messages. Second, the internal state of the simulation application allows us to make a conservative forecast of future outgoing events. This leads to the creation of an enhanced synchronization algorithm called the Forecast Null-message algorithm, which, by combining the forecast from both sides of a link, can greatly improve the simulation look-ahead. Compared with the basic Null-message algorithm, our optimizations greatly reduce the number of Null-messages and increase simulation performance significantly as a result. On a subset of the PARSEC benchmarks, a maximum speedup of about 6 is achieved with 17 LPs.

2012

  • J. Wang, J. Beu, S. Yalamanchili, and T. Conte. “Designing Configurable, Modifiable and Reusable Components for Simulation of Multicore Systems.” 3rd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS12), November 2012.   [pdf] [abstract]
    A simulation system for modern multicore architectures is composed of various component models. For such a system to be useful for research purposes, modifiability is a key quality attribute. Users, when building a simulation model, need to have the capability to adjust various aspects of a component, or even replace a component with another of the same type. Software design considerations can determine whether or not a simulation system is successful in providing such capabilities. This paper presents a few design tactics that we adopt in creating configurable, modifiable, and reusable components for Manifold, our parallel simulation framework for multicore systems. The main example component is MCP-cache, a coherence cache model. The ideas behind the tactics are general enough and should be useful to designers of similar systems.
  • W. Song, S. Mukhopadhyay, and S. Yalamanchili, “Architecture Simulation Framework for 3D ICs,” SRC TECHCON, September 2012. [pdf] [abstract]
    The practical limitation of implementing 3-dimensional integrated circuits is the increased thermal stress. The integration of liquid cooling with 3D ICs can solve the thermal problems, allowing up to 1,000 W/cm² of power density [1, 2]. Thus, a significant performance improvement can be made through 3D ICs and liquid cooling integration. However, increased transistor and power density will raise failure concerns, threatening the lifetime reliability of microprocessors [5]. Therefore, 3D ICs must be designed under the constraints of microarchitecture, energy, thermal, area, and reliability. There have been many efforts to develop accurate and fast modeling of these 3D IC physical properties. We note that the simulation models of these physical phenomena must be designed to interact with each other to correctly capture the tradeoffs among 3D IC design constraints. In this paper, we present the methodology for 3D IC simulations and present a novel simulation framework for 3D IC design space exploration.
  • W. Song, S. Mukhopadhyay, and S. Yalamanchili, “Reliability Implications of Power and Thermal Constrained Operation of Asymmetric Multicore Processors,” Dark Silicon Workshop, June 2012.
  • W. Song, S. Mukhopadhyay, A. Rodrigues and S. Yalamanchili, “Instruction-Based Energy Estimation Methodology for Asymmetric Manycore Processor Simulations,” IEEE/ICST International Conference on Simulation Tools and Techniques, March 2012.  [pdf] [abstract]
    Processor power is a complex function of device, packaging, microarchitecture, and application. Typical approaches to power simulation require detailed microarchitecture models to collect the statistical switching activity counts of processor components. In manycore simulations, the detailed core models are the main simulation speed bottleneck. In this paper, we propose an instruction-based energy estimation model for fast and scalable energy simulation. Importantly, in this approach the dynamic energy is modeled as a combination of three contributing factors: physical, microarchitectural, and workload properties. The model easily incorporates variations in physical parameters such as clock frequencies and supply voltages. When compared to commonly used cycle-level microarchitectural simulation approach with SPEC2006 benchmarks, the proposed instruction-based energy model incurred a 2.94% average error rate while achieving an average simulation time speedup of 74X for a 16-core asymmetric x86 ISA processor model with multiple clock domains operating at different frequencies.
  • M. Cho, W. Song, S. Yalamanchili, and S. Mukhopadhyay. “Thermal System Identification (TSI): A Methodology for Post-silicon Characterization and Prediction of the Transient Thermal Field in Multicore Chips.” 28th IEEE SEMI-THERM Symposium, March 2012.  [pdf] [abstract]
    This paper presents a methodology for post-silicon thermal prediction to predict the transient thermal field of a multicore package for various workloads, considering chip-to-chip variations in electrical and thermal properties. We use time-frequency duality to represent the thermal system in the frequency domain as a low-pass filter augmented with a positive feedback path for leakage-temperature interaction. This thermal system is identified through power/thermal measurements on a packaged IC and is used for post-silicon thermal prediction. The effectiveness of the proposed effort is presented considering a 64-core processor in a predictive 22nm node and SPEC2006 benchmark applications.
  • C. Kersey, A. Rodrigues, and S. Yalamanchili. “A Universal Parallel Front-End for Execution-Driven Microarchitecture Simulation.” HIPEAC Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, January 2012. [pdf]  [abstract]
    Execution driven microarchitecture simulators tend to devote a large portion of their source code to a front-end that performs instruction set level functional simulation, providing the decoded instruction stream to a back-end that performs timing simulation. In this paper we introduce the current incarnation of QSim, a universal front-end for execution driven multicore microarchitecture simulators. QSim adapts the popular and portable QEMU full-system emulator to a thread safe, instruction set neutral API, running unmodified application binaries in a lightly modified Linux operating system. QSim has been shown to support at least 512 emulated hardware threads, each running in a separate host thread.

2011

  • M. Cho, W. Song, S. Yalamanchili, and S. Mukhopadhyay, “Modeling of the Thermal Field of Many-Core System using Frequency Domain System Identification,” SRC TECHCON, Sept. 2011. [pdf] [abstract]
    This paper presents a frequency domain methodology for fast, potentially on-line, analysis of spatiotemporal thermal field of many-core chips. Our approach models the thermal system as a low pass filter where the time-varying 2D power pattern of cores is the system input and 2D spatiotemporal variation in the temperature is the system output. The proposed approach consists of two steps: (a) frequency domain system identification and (b) frequency domain thermal simulation and analysis using the system transfer function. The accuracy of the proposed approach is verified considering a 64 core processor in predictive 22nm node and using power profiles of SPEC2006 benchmarks.
  • W. Song, M. Cho, S. Yalamanchili, S. Mukhopadhyay, and A. Rodrigues, “Energy Introspector: Simulation Infrastructure for Power, Temperature, and Reliability Modeling in Manycore Processors,” SRC TECHCON, Sept. 2011. [pdf]  [abstract]
    This paper presents an architecture-independent modeling infrastructure called the Energy Introspector for estimating non-functional aspects of processors such as energy, power, temperature, area, delay, sensor, and reliability. The Energy Introspector supports processor modeling through the integration of various modeling tools. It features structural abstraction of physical and microarchitectural components, a standard library format for compatible integration of modeling tools, and synchronization of multiple phenomena models (e.g., time, power, temperature). With the Energy Introspector interface, this paper suggests an analytical processor power estimation method using instruction-based energy profiling. The analytical power estimation is approximately 4.13 times faster than the cycle-level microarchitecture simulation approach, with an average error rate of 2.5%. The speed of prediction is especially important in manycore processor modeling, where the simulation time increases drastically as the number of cores increases.

Citation

If you reference Manifold in your publications, please use the following for citation purposes.

@inproceedings{manifold-wang,
  author="J. Wang and J. Beu and R. Bheda and T. Conte and Z. Dong and C. Kersey and M. Rasquinha and G. Riley and W. Song and H. Xiao and P. Xu and S. Yalamanchili",
  booktitle="2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)",
  title="Manifold: A Parallel Simulation Framework for Multicore Systems",
  year="2014",
  month="March",
}