Graph applications on Vortex

This was part of an SRC-funded exploration of RISC-V-based FPGA overlays at the CHIPS lab, PES. I worked on application-specific modifications to Vortex, a RISC-V GPU.

Establish optimal hardware configuration

Problem definition

✔ Understanding traditional GPUs and how Vortex's execution model is similar to or different from theirs

✔ Understanding how the applications (here, graph applications) map onto the programming model

✔ Finding a method to evaluate a speed-up or slow-down

Solutions

  1. GPU primer + Introduction to Vortex

  2. Vortex on Intel Devcloud - Documentation

  3. OpenCL programming

  4. Baseline analysis and results
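To make the mapping of a graph application onto a data-parallel programming model concrete, here is a minimal Python sketch of level-synchronous BFS in the two-kernel, mask-based style used by the Rodinia BFS benchmark. It is an illustration only, not this project's code: the function name, the CSR layout (`offsets`, `edges`), and the variable names are assumptions, and each loop over `tid` stands in for one kernel launch with one logical thread per node.

```python
def bfs_level_synchronous(offsets, edges, source):
    """BFS over a CSR graph; neighbors of v are edges[offsets[v]:offsets[v + 1]].

    Returns a list of hop counts from `source` (-1 for unreachable nodes).
    All names here are illustrative assumptions.
    """
    n = len(offsets) - 1
    cost = [-1] * n
    frontier = [False] * n        # nodes to expand in this level
    next_frontier = [False] * n   # nodes discovered in this level
    visited = [False] * n

    cost[source] = 0
    frontier[source] = True
    visited[source] = True

    while True:
        # "Kernel 1": one logical thread per node; only frontier nodes do work.
        for tid in range(n):
            if frontier[tid]:
                frontier[tid] = False
                for nbr in edges[offsets[tid]:offsets[tid + 1]]:
                    if not visited[nbr]:
                        cost[nbr] = cost[tid] + 1
                        next_frontier[nbr] = True

        # "Kernel 2": promote discovered nodes and decide whether to continue.
        more_work = False
        for tid in range(n):
            if next_frontier[tid]:
                frontier[tid] = True
                visited[tid] = True
                next_frontier[tid] = False
                more_work = True

        if not more_work:
            break
    return cost
```

Because only frontier nodes do any work in the first loop, threads that share a warp can take different branches, and the neighbor reads are irregular. That is one reason graph workloads are a useful stress test for a GPU's execution model.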

Application-specific hardware

Problem definition

✔ Application-specific optimizations - adding a software intrinsic

✖ Hardware modifications

Solutions

  1. Switching to non-OpenCL code on Vortex - Documentation

  2. Non-OpenCL Rodinia BFS code

  3. Kernel dump analyzer script - Static trace analysis showed that the memory operations sw (store word) and lw (load word) dominate

  4. Steps for adding an intrinsic to Vortex

  5. Used NVIDIA Nsight on an RTX A1000 GPU for performance analysis

  6. A potential solution: implementing the gang scheduler described in "A Variable Warp Size Architecture". It would need to use the PC from the IPDOM stack to group and ungroup warps, and it might be possible to make this change in the warp scheduler itself.
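The static trace analysis in item 3 could be sketched along these lines. This is not the project's actual analyzer script: it assumes the kernel dump is an objdump-style disassembly (address, encoding, mnemonic, operands), and the regular expression, function names, and sample dump are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "<hex address>:  <hex encoding>  <mnemonic>  <operands>"
INSTR_RE = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]+\s+([a-z][a-z0-9_.]*)")

# Base RISC-V integer and single-precision float load/store mnemonics.
MEMORY_MNEMONICS = {"lb", "lh", "lw", "lbu", "lhu", "sb", "sh", "sw", "flw", "fsw"}


def count_mnemonics(dump_text):
    """Count how often each instruction mnemonic appears in the dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = INSTR_RE.match(line)
        if match:  # label lines and blank lines do not match and are skipped
            counts[match.group(1)] += 1
    return counts


def memory_fraction(counts):
    """Fraction of counted instructions that are loads or stores."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    memory_ops = sum(c for op, c in counts.items() if op in MEMORY_MNEMONICS)
    return memory_ops / total


# Hypothetical four-instruction dump, used only to exercise the functions.
SAMPLE_DUMP = "\n".join([
    "80000000 <main>:",
    "80000000:\t00052583          \tlw\ta1,0(a0)",
    "80000004:\t00b52223          \tsw\ta1,4(a0)",
    "80000008:\t00158593          \taddi\ta1,a1,1",
    "8000000c:\t00b52423          \tsw\ta1,8(a0)",
])

sample_counts = count_mnemonics(SAMPLE_DUMP)
sample_fraction = memory_fraction(sample_counts)
```

A static count like this weights every instruction in the binary equally, however often it executes, so it only hints at where time goes. A dynamic profile is the natural cross-check.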