Graph applications on Vortex

This was part of an SRC-funded exploration of RISC-V-based FPGA overlays at the CHIPS lab, PES. I worked on application-specific modifications to a RISC-V GPU called Vortex.

Establish optimal hardware configuration

✔ Understanding the execution model: how traditional GPUs work and where Vortex is similar or different

✔ Understanding how the applications (here, graph applications) map onto the programming model

✔ Finding a method to evaluate a speed-up or slow-down

✔ Application-specific optimizations: adding a software intrinsic

This requires switching from OpenCL to vx code, code optimization, and trace analysis.

✖ Hardware modifications

Some options are adding intrinsics based on application analysis, extending the Vortex pipeline, or modifying the warp scheduler.

Switching to non-OpenCL code on Vortex - Documentation
Non-OpenCL Rodinia BFS code
Kernel dump analyzer script - Static trace analysis showed that the memory operations sw and lw dominate
Steps for adding an intrinsic to Vortex
Used NVIDIA Nsight on an RTX A1000 GPU for performance analysis
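
The static analysis of the kernel dump listed above can be illustrated with a minimal sketch. This is not the actual analyzer script; it assumes an objdump-style disassembly in which each instruction line reads `<address>: <hex encoding> <mnemonic> <operands>`, and the function names `count_mnemonics` and `memory_share` as well as the sample dump are hypothetical.

```python
import re
from collections import Counter

# Assumption: objdump-style lines, e.g. "80000000:  00052283  lw  t0,0(a0)".
# Label lines such as "80000000 <main>:" do not match and are skipped.
INSTR_RE = re.compile(r"^\s*[0-9a-fA-F]+:\s+[0-9a-fA-F]+\s+([a-z][\w.]*)")


def count_mnemonics(dump_text):
    """Return a Counter mapping each instruction mnemonic to its static count."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = INSTR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def memory_share(counts, mem_ops=("lw", "sw")):
    """Fraction of all counted instructions that are in mem_ops."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[op] for op in mem_ops) / total


# Made-up sample dump for illustration; the hex encodings are placeholders.
SAMPLE_DUMP = (
    "80000000 <kernel_body>:\n"
    "80000000:\t00052283\tlw\tt0,0(a0)\n"
    "80000004:\t00452303\tlw\tt1,4(a0)\n"
    "80000008:\t006283b3\tadd\tt2,t0,t1\n"
    "8000000c:\t00752423\tsw\tt2,8(a0)\n"
    "80000010:\t00752623\tsw\tt2,12(a0)\n"
    "80000014:\t00008067\tret\n"
)

if __name__ == "__main__":
    counts = count_mnemonics(SAMPLE_DUMP)
    for mnemonic, n in counts.most_common():
        print(f"{mnemonic:6s} {n}")
    print(f"lw/sw share: {memory_share(counts):.2f}")
```

Because this counts instructions in the static dump, it says how often `lw` and `sw` appear in the code, not how often they execute; a dynamic trace would be needed for the latter.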
A potential solution: implementing the gang scheduler described in "A Variable Warp Size Architecture". It would need to use the PC from the IPDOM stack to group and ungroup warps, and it might be possible to make the change in the warp scheduler itself.
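
The grouping idea can be sketched in software, under loud assumptions: this is a toy model of grouping warps by PC, not Vortex's warp scheduler and not the paper's hardware design. The function `form_gangs`, the `max_gang_size` parameter, and the example PCs are all hypothetical.

```python
from collections import defaultdict


def form_gangs(warp_pcs, max_gang_size=4):
    """Group warp ids that share a PC into gangs of at most max_gang_size.

    warp_pcs maps a warp id to its current PC (assumed here to be read
    from the top of that warp's IPDOM stack). Returns a list of
    (pc, [warp ids]) tuples, ordered by PC and then by warp id.
    """
    by_pc = defaultdict(list)
    for warp_id, pc in sorted(warp_pcs.items()):
        by_pc[pc].append(warp_id)

    gangs = []
    for pc in sorted(by_pc):
        ids = by_pc[pc]
        # Split oversized groups so no gang exceeds the assumed limit.
        for start in range(0, len(ids), max_gang_size):
            gangs.append((pc, ids[start:start + max_gang_size]))
    return gangs


if __name__ == "__main__":
    # Warps 0, 1 and 3 are at the same PC; warp 2 has diverged.
    pcs = {0: 0x80000010, 1: 0x80000010, 2: 0x80000024, 3: 0x80000010}
    for pc, members in form_gangs(pcs, max_gang_size=2):
        print(hex(pc), members)
```

Re-running the grouping each cycle is the simplest way to model ungrouping: a warp whose PC changes after a divergent branch falls out of its gang on the next pass.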

Shreenithi. Last modified: April 30, 2023. Website built with Franklin.jl and the Julia programming language. This work is licensed under a Creative Commons Attribution 4.0 International License.