HMCSim


Introduction

This page is for those interested in building and using the HMC simulation framework developed in conjunction with the GC64 architecture research effort. This guide assumes basic knowledge of the Unix/Linux command line, of building software in that environment, and of the HMC devices themselves. If you are looking for an introduction to the HMC device, we suggest reading our publication from LSPP 2014.

Updates

  • 08/31/2017: Merge HMCSim-3.0 Dev to Master
    • HMCSim-3.0 developed branch merged into master
  • 03/17/2017: HMCSim-3.0 Development Progress
    • Added a visualization backend that directly outputs Tecplot visualization files for ingestion into 3D viz apps (e.g., VisIt). See the tutorial here.
    • Added power/thermal tracing tutorial here.
    • Added simplified API: tutorial here
    • Additional updates to test scripts
    • Additional tests for simplified API
  • 12/22/2016: HMCSim-3.0 Development Progress
    • Added basic thermal model alongside the power modeling
    • Fixed numerous bugs in handling 4link versus 8link device architectures
    • Added basic DRAM timing model that is configurable by the user
    • Added interface to read sub-component power configuration from a text file
    • Updated the test scripts
  • 11/08/2016: HMCSim-3.0 Development Progress
    • Several bug fixes in packet handling
    • Added power measurement tracing and initialization parameters
    • Added thermal measurement tracing
    • CMC library interfaces now also support power measurement tracing
  • 10/05/2016: HMCSim-2.0 Final Release
    • Several bug fixes in handling error codes in Gen2 packets
    • Several bug fixes in handling CMC commands
    • Added additional sample tests for HMC Mutex CMC commands
  • 04/01/2016: HMCSim-2.0 BETA
    • Several bug fixes in encoding/decoding Gen2 packets
    • Several bug fixes in resolving pre-loaded CMC libraries
    • Adding fullempty/tag-bit CMC library implementations
    • Adding spin-wait and static tree barrier tests for tag-bit CMC operations
  • 01/03/2016: HMCSim-2.0 ALPHA Update 2
    • Numerous fixes to the packet decoding logic (especially on response packets)
    • Fixes to command response codes for P2ADD8
    • Adding mutex examples for CMC
  • 12/28/2015: HMCSim-2.0 ALPHA Release!
    • Adds HMC Gen2.0 device layout and packet format support
    • Adds HMC Gen2.0 256 byte commands
    • Adds HMC Gen2.0 Atomic operation support
    • Adds “Custom Memory Cube” support (AKA, CMC support)
    • WARNING: This is an alpha release and is very volatile.  It is known to build and pass several of the included tests.  Extensive testing is ongoing.
    • User documentation on adding new CMC operations.
    • Submit bugs here: https://github.com/tactcomplabs/gc64-hmcsim/issues
  • 10/29/2015: Merged in patches from Hyesoon Kim at GeorgiaTech.  The patches include the following updates/bug fixes (Full Diff):
    • Fixes to the bank configuration (HMC_MAX_BANKS)
    • Fixes to the bank conflict resolution (better represents real HMC devices)
    • Fixes to the vault decoding in the hmc_util source
    • Fixes to the address resolution in clocking the sim
  • 10/28/2015: Branched the initial HMC-Sim version 1.0 in order to continue development toward the 2.0 specification.  Please email the development team for feature requests!  (Branch Source)

Releases

HMC-Sim Overview

The HMC simulation framework (or HMC-Sim) is designed to provide a very low-level simulation environment for any of the supported HMC 1.0 and 2.0 configurations. The 2.0 support is new and quite volatile. The simulation environment operates on individual clock boundaries in order to provide sample flow control between registered device buffers, FIFOs and link boundaries. In this manner, the tracing output can display individual stall events on minute hardware components. HMC-Sim also supports the basic ability to chain multiple HMC devices together and instantiate the HMC routing protocol using multiple CUB IDs. At this time, HMC-Sim supports all the current 1.0 packet types, including the JTAG interfaces.

The simulation framework is implemented as a standard C library with an associated API. It is designed to be embedded in larger simulation frameworks (such as the GC64 simulator), or operated using driver applications. The sample applications provided with the source code are examples of the latter. They essentially instantiate basic HMC packets for typical memory I/O (reads, writes, etc).

Retrieving the Source (3.0 Release)

Given that the HMC-Sim source is built as a stand-alone library, it is not necessary to retrieve the entire GC64 source tree. Checking out the code requires the use of Git. Note that the 3.0 version is currently master in the repository.  The code can be checked out read-only using the following command:

$> git clone https://github.com/tactcomplabs/gc64-hmcsim.git

Retrieving the Source (2.0 Release)

Given that the HMC-Sim source is built as a stand-alone library, it is not necessary to retrieve the entire GC64 source tree. Checking out the code requires the use of Git. The code can be checked out read-only using the following command:

$> git clone https://github.com/tactcomplabs/gc64-hmcsim.git

Retrieving the Source (1.0 Release)

Given that the HMC-Sim source is built as a stand-alone library, it is not necessary to retrieve the entire GC64 source tree. Checking out the code requires the use of Git. The code can be checked out read-only using the following command:

$> git clone -b hmcsim-1.0 https://github.com/tactcomplabs/gc64-hmcsim.git

Tested Platforms

  • Mac OSX 13.4.0: gcc 4.2.1; clang 3.3
  • OpenSuSE (linux) 13.1; kernel 3.11.10-25-desktop; gcc 4.8.1

Building the Source

Once you have checked out the source, you will need to build it. By default, the source is built with the GCC compiler. It is also known to build with the LLVM/Clang compiler. The project source tree contains all the necessary makefiles. If you have “gcc” in your default PATH environment variable, you may build the source as follows:

$> cd gc64-hmcsim
$> make

This should build “libhmcsim.a”.

If you desire to modify the build environment (such as modifying the compiler, the compiler options, etc), you’ll need to edit the “Makefile.inc” file. This contains all the configurable build options. If you seek to simply test the library, you shouldn’t need to modify any of these options.

Building the Sample Code

The source code contains a number of sample driver applications that represent the following memory I/O scenarios:

  • Simple : a simple test driver that can be used as an example
  • Stream : a test driver that mimics the Stream Triad memory I/O pattern
  • GUPS : a test driver that mimics the HPCC RandomAccess I/O pattern
  • HMC_Physrand : a test driver that executes a simple randomized I/O pattern
  • Decode_Physrand : the same test as HMC_Physrand, but exhibits how to decode response packets
  • Several other test directories exist, but are not yet completed

The tests can be built using the top-level makefile as follows (from the gc64-hmcsim directory):

$> make test

Executing the Tests

Each of the test drivers may accept special options depending upon the complexity of the respective code. Each of the tests builds an executable in its respective source directory. For example, the “simple” executable is built and located in:

~/gc64-hmcsim/test/simple/

Each of the executables accepts the standard “-h” option that displays the known command line options. For example, executing “simple -h” displays the following:

$> ./simple -h
usage : ./simple -bcdhlnqvx
 -b <num_banks>
 -c <capacity>
 -d <num_drams>
 -h ...print help
 -l <num_links>
 -n <num_devs>
 -q <queue_depth>
 -v <num_vaults>
 -x <xbar_depth>

Notice that the options specify the device parameters associated with one or more HMC devices. Remember, the HMC-Sim internal API logic will perform the necessary checks to ensure that the requested hardware configuration is valid. As such, if you specify an invalid configuration, the application will likely fail. An example of successful test options resembles the following:

$> ./simple -b 16 -c 4 -d 20 -l 4 -n 1 -q 64 -v 32 -x 128
SUCCESS : INITIALIZED HMCSIM
SUCCESS : SET MAXIMUM BLOCK SIZE
SUCCESS : FREE'D HMCSIM
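
When exploring several device configurations, it can help to generate the driver command lines programmatically. The following is a minimal sketch, not part of HMC-Sim: the helper names (build_command, sweep) and the swept values are hypothetical, and only the option letters come from the “-h” output above. Whether a given combination is a valid hardware configuration is still decided by the library when the driver runs.

```python
import itertools

# Option letters taken from the "./simple -h" usage text above.
FLAGS = {
    "num_banks": "-b",
    "capacity": "-c",
    "num_drams": "-d",
    "num_links": "-l",
    "num_devs": "-n",
    "queue_depth": "-q",
    "num_vaults": "-v",
    "xbar_depth": "-x",
}

def build_command(binary, **params):
    """Return an argv list for one driver invocation (hypothetical helper)."""
    argv = [binary]
    for name, value in params.items():
        argv += [FLAGS[name], str(value)]
    return argv

def sweep(binary, base, **varying):
    """Yield one argv per combination of the varying parameters."""
    names = list(varying)
    for combo in itertools.product(*(varying[n] for n in names)):
        params = dict(base)
        params.update(zip(names, combo))
        yield build_command(binary, **params)

# Baseline copied from the successful example above.
base = dict(num_banks=16, capacity=4, num_drams=20, num_links=4,
            num_devs=1, queue_depth=64, num_vaults=32, xbar_depth=128)

# The swept values are illustrative only; they are not known-good settings.
for argv in sweep("./simple", base, queue_depth=[32, 64], xbar_depth=[64, 128]):
    print(" ".join(argv))
```

To actually run each configuration, pass the generated list to subprocess.run(argv) from the directory that contains the executable and check the return code.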

A second example is the “hmc_physrand” test application. This test application includes a series of driver scripts that exercise common configurations. These scripts were also used in the tests for the aforementioned LSPP ’14 paper. The scripts are located in:

~/gc64-hmcsim/test/hmc_physrand/scripts/

The “physrand” executable is located at:

~/gc64-hmcsim/test/hmc_physrand/physrand

A simple example of executing an hmc-physrand test would resemble the following:

$> cd ~/gc64-hmcsim/test/hmc_physrand/scripts/
$> sh ./smalltest.sh
 SUCCESS : INITALIZED HMCSIM
 SUCCESS : INITIALIZED LINK 0
 SUCCESS : INITIALIZED LINK 1
 SUCCESS : INITIALIZED LINK 2
 SUCCESS : INITIALIZED LINK 3
 SUCCESS : INITALIZED MAX BLOCK SIZE
 SUCCESS : INITIALIZED TRACE HANDLERS
 SUCCESS : ZERO'D PACKETS
 SUCCESS : BEGINNING TEST EXECUTION
 ....sending packets
 ...building read request for device : 0
 ...sending packet : base addr=0x000000038b8c9000
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building write request for device : 0
 ...sending packet : base addr=0x0000001eb9856800
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building write request for device : 0
 ...sending packet : base addr=0x00000005e58f0400
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building write request for device : 0
 ...sending packet : base addr=0x000000021052a200
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building write request for device : 0
 ...sending packet : base addr=0x0000001da11b2600
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building read request for device : 0
 ...sending packet : base addr=0x0000001bfd8bce00
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building read request for device : 0
 ...sending packet : base addr=0x00000002e602a000
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building write request for device : 0
 ...sending packet : base addr=0x0000000ab6ba5600
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building read request for device : 0
 ...sending packet : base addr=0x0000000283ab3e00
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...building write request for device : 0
 ...sending packet : base addr=0x000000125fe7a000
 SUCCESS : PACKET WAS SUCCESSFULLY SENT
 ...reading responses
 STALLED : STALLED IN RECEIVING
 SIGNALING HMCSIM TO CLOCK
 ALL_SENT = 10
 ALL_RECV = 0
 ...reading responses
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 STALLED : STALLED IN RECEIVING
 SIGNALING HMCSIM TO CLOCK
 ALL_SENT = 10
 ALL_RECV = 9
 ...reading responses
 SUCCESS : RECEIVED A SUCCESSFUL PACKET RESPONSE
 STALLED : STALLED IN RECEIVING
 SIGNALING HMCSIM TO CLOCK
 ALL_SENT = 10
 ALL_RECV = 10
 SUCCESS : FREE'D HMCSIM

You’ll notice that the directory now contains an HMC-Sim trace file as well. The file (physrand.out) should resemble the following:

 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:0:1:2:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:0:2:2:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:1:0:0:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:1:1:2:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:1:2:0:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:2:0:0:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:2:1:0:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:3:0:2:0
 HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:3:1:2:0
 HMCSIM_TRACE : 0 : BANK_CONFLICT : 0:0:0:0:0x00000002e602a000
 HMCSIM_TRACE : 0 : RD64 : 0:0:0:4:0x000000038b8c9000:1
 HMCSIM_TRACE : 0 : WR64 : 0:0:0:2:0x00000002b9856800:5
 HMCSIM_TRACE : 0 : WR64 : 0:0:0:0:0x000000025fe7a000:5
 HMCSIM_TRACE : 0 : WR64 : 0:0:0:1:0x00000001e58f0400:5
 HMCSIM_TRACE : 0 : WR64 : 0:2:0:1:0x00000001a11b2600:5
 HMCSIM_TRACE : 0 : RD64 : 0:2:0:7:0x0000000283ab3e00:1
 HMCSIM_TRACE : 0 : RD64 : 0:2:0:3:0x00000003fd8bce00:1
 HMCSIM_TRACE : 0 : WR64 : 0:2:0:0:0x000000021052a200:5
 HMCSIM_TRACE : 0 : WR64 : 0:2:0:5:0x00000002b6ba5600:5
 HMCSIM_TRACE : 1 : RD64 : 0:0:0:0:0x00000002e602a000:1
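
As a quick sanity check on a trace file, the records can be tallied by event type. The sketch below assumes only what the excerpt above shows: each record has columns separated by " : ", with the event name in the third column. The function name count_events is hypothetical and is not part of HMC-Sim.

```python
from collections import Counter

def count_events(lines):
    """Count trace records per event name (third ' : '-separated column)."""
    counts = Counter()
    for line in lines:
        parts = [p.strip() for p in line.split(" : ")]
        if len(parts) < 4 or parts[0] != "HMCSIM_TRACE":
            continue  # skip anything that is not a trace record
        counts[parts[2]] += 1
    return counts

# A few records copied from the physrand.out excerpt above.
sample = [
    " HMCSIM_TRACE : 0 : XBAR_LATENCY : 0:0:1:2:0",
    " HMCSIM_TRACE : 0 : BANK_CONFLICT : 0:0:0:0:0x00000002e602a000",
    " HMCSIM_TRACE : 0 : RD64 : 0:0:0:4:0x000000038b8c9000:1",
    " HMCSIM_TRACE : 0 : WR64 : 0:0:0:2:0x00000002b9856800:5",
    " HMCSIM_TRACE : 1 : RD64 : 0:0:0:0:0x00000002e602a000:1",
]
print(count_events(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, which avoids loading a multi-gigabyte trace into memory.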

!WARNING! Executing the other test scripts may generate very large trace files (gigabytes) and require significant CPU time. The same is true for the other test drivers (GUPS, Stream, etc.).

Interpreting the Trace Files

The HMC-Sim built-in tracing mechanisms can generate large and verbose trace files. This is deliberate: the traces expose all the necessary internal data held within the framework. To help with this, we provide a tool that translates the trace files to gnuplot data scripts. This tool (hmctognuplot) is located in the following directory:

~/gc64-hmcsim/tools/

It can be built from the top-level gc64-hmcsim directory using the following command:

$> make tools

It can be executed using the following:

$> ./tools/hmctognuplot -F /path/to/tracefile.out

The tool will produce gnuplot output files for each of {bank_conflict, rd64, wr64, xbar_latency, xbar_rqst_stall}.out.

Alternatively, you can interpret the HMC-Sim trace files manually. The trace files are formatted using the following specification:

  • HMCSIM_TRACE CLOCK_TICK XBAR_LATENCY device link quad vault : a request was entered on a link that was not co-located with the destination quad
  • HMCSIM_TRACE CLOCK_TICK BANK_CONFLICT device quad vault bank address : a bank conflict occurred on the target address
  • HMCSIM_TRACE CLOCK_TICK RDsize device quad vault bank address packet_size_in_flits : read memory request
  • HMCSIM_TRACE CLOCK_TICK WRsize device quad vault bank address packet_size_in_flits : write memory request
  • HMCSIM_TRACE CLOCK_TICK XBAR_RQST_STALL device quad vault slot : crossbar request stall
  • HMCSIM_TRACE CLOCK_TICK VAULT_RQST_STALL device quad vault slot : vault FIFO/queue request stall
  • HMCSIM_TRACE CLOCK_TICK XBAR_RSP_STALL device quad vault slot : crossbar response queue/FIFO stall
  • HMCSIM_TRACE CLOCK_TICK ROUTE_RQST_STALL device src dest link slot : route request stall
  • HMCSIM_TRACE CLOCK_TICK ROUTE_RSP_STALL device src dest link slot : route response stall
  • HMCSIM_TRACE CLOCK_TICK UNDEF_STALL device quad vault slot : undefined event
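
The specification above can be applied mechanically to the memory-request and bank-conflict records. The following is a minimal sketch, not an HMC-Sim utility: parse_record is a hypothetical name, the " : " and ":" separators are inferred from the physrand.out excerpt rather than from a formal grammar, and all other event types are deliberately left out.

```python
import re

# Field names copied from the specification above.
MEM_FIELDS = ("device", "quad", "vault", "bank", "address", "packet_size_in_flits")
CONFLICT_FIELDS = ("device", "quad", "vault", "bank", "address")

def parse_record(line):
    """Parse one trace line into a dict, or return None if it is not recognised."""
    parts = [p.strip() for p in line.split(" : ")]
    if len(parts) != 4 or parts[0] != "HMCSIM_TRACE":
        return None
    tick, event, payload = int(parts[1]), parts[2], parts[3].split(":")
    if re.fullmatch(r"(RD|WR)\d+", event):
        names = MEM_FIELDS
    elif event == "BANK_CONFLICT":
        names = CONFLICT_FIELDS
    else:
        return None  # other event types are not handled in this sketch
    if len(payload) != len(names):
        return None  # field count does not match the specification
    record = {"tick": tick, "event": event}
    for name, value in zip(names, payload):
        # Addresses are printed in hexadecimal; everything else is decimal.
        record[name] = int(value, 16) if name == "address" else int(value)
    return record

# Record copied from the physrand.out excerpt.
rec = parse_record(" HMCSIM_TRACE : 0 : RD64 : 0:0:0:4:0x000000038b8c9000:1")
print(rec)
```

Returning None for unrecognised or malformed lines keeps the parser tolerant of event types it does not model, so it can be run over a full trace without raising.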

In HMC-Sim version 3.0 and beyond, we have also added support for basic power measurement and thermal tracing. Power is tracked per device component and tabulated on each clock cycle. The tracing output records the individual values from each device component as well as the total power used over the given time period. The total power values can be cleared using the new hmcsim_power_clear() function. The individual device power metrics can be configured manually using the hmcsim_power_config() functions or by reading a configuration file using the hmcsim_read_config() functions. The thermal trace data is calculated by converting the power for the given device component on the given clock cycle (measured in milliwatts) to BTUs. These interfaces are available only in HMC-Sim 3.0 and later.
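
To relate the paired POWER and BTU values, the sketch below applies the standard unit conversion of roughly 3.412142 BTU per hour per watt. This is an illustration only: whether HMC-Sim uses exactly this factor, or treats the result as a rate per hour, is an assumption, and the function name is hypothetical.

```python
import math

# Standard unit conversion: 1 watt is approximately 3.412142 BTU per hour.
# Assumption: HMC-Sim may use a different constant or interpretation.
BTU_PER_HOUR_PER_WATT = 3.412142

def milliwatts_to_btu_per_hour(milliwatts):
    """Convert a power value in milliwatts to BTU per hour."""
    return (milliwatts / 1000.0) * BTU_PER_HOUR_PER_WATT

# Example with an arbitrary, made-up component power of 250 mW.
print(milliwatts_to_btu_per_hour(250.0))
```

Comparing a converted POWER value against the matching BTU record in a trace is a simple way to confirm which factor the simulator applies.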

The power and thermal trace data can be manually interpreted as follows:

  • HMCSIM_TRACE CLOCK_TICK VAULT_RSP_SLOT_POWER device quad vault slot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK VAULT_RSP_SLOT_BTU device quad vault slot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK VAULT_RQST_SLOT_POWER device quad vault slot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK VAULT_RQST_SLOT_BTU device quad vault slot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK ROW_ACCESS_POWER address : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK ROW_ACCESS_BTU address : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK XBAR_RSP_SLOT_POWER device link slot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK XBAR_RSP_SLOT_BTU device link slot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK XBAR_RQST_SLOT_POWER device link slot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK XBAR_RQST_SLOT_BTU device link slot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK VAULT_CTRL_POWER vault : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK VAULT_CTRL_BTU vault : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK XBAR_ROUTE_EXTERN_POWER srcdev srclink srcslot destdev destlink destslot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK XBAR_ROUTE_EXTERN_BTU srcdev srclink srcslot destdev destlink destslot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK LINK_LOCAL_ROUTE_POWER device link quad vault slot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK LINK_LOCAL_ROUTE_BTU device link quad vault slot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK LINK_REMOTE_ROUTE_POWER device link quad vault slot : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK LINK_REMOTE_ROUTE_BTU device link quad vault slot : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK LINK_PHY_POWER device link : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK LINK_PHY_BTU device link : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_LINK_PHY_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_LINK_PHY_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_LINK_LOCAL_ROUTE_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_LINK_LOCAL_ROUTE_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_LINK_REMOTE_ROUTE_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_LINK_REMOTE_ROUTE_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_XBAR_RQST_SLOT_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_XBAR_RQST_SLOT_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_XBAR_RSP_SLOT_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_XBAR_RSP_SLOT_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_XBAR_ROUTE_EXTERN_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_XBAR_ROUTE_EXTERN_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_VAULT_RQST_SLOT_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_VAULT_RQST_SLOT_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_VAULT_RSP_SLOT_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_VAULT_RSP_SLOT_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_VAULT_CTRL_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_VAULT_CTRL_BTU : BTUs (float)
  • HMCSIM_TRACE CLOCK_TICK T_ROW_ACCESS_POWER : milliwatts (float)
  • HMCSIM_TRACE CLOCK_TICK T_ROW_ACCESS_BTU : BTUs (float)

References