This was part of an SRC-funded exploration of RISC-V-based FPGA overlays at the CHIPS lab, PES. I worked on application-specific modifications to Vortex, a RISC-V GPU.
✔ Understanding traditional GPUs and how Vortex's execution model is similar to or different from theirs
✔ Understanding how the applications (here, graph applications) map onto the programming model
✔ Finding a method to evaluate a speed-up or slow-down:
benchmark choice - Rodinia
metric choice - Cycles
baseline stats - Done
Vortex on Intel Devcloud - Documentation
✔ Application-specific optimizations - adding a software intrinsic
This requires switching from OpenCL to vx code, code optimization, and trace analysis
✖ Hardware modifications
Options include adding intrinsics based on application analysis, extending the Vortex pipeline, or modifying the warp scheduler
Switching to non-OpenCL code on Vortex - Documentation
Kernel dump analyzer script - Static trace analysis showed that the memory operations sw and lw dominate
Used NVIDIA Nsight on an RTX A1000 GPU for performance analysis
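The static analysis done by the kernel dump analyzer can be illustrated with a minimal sketch. This is not the original script: it assumes an objdump-style disassembly in which each instruction line has the form address, raw encoding, mnemonic, and the names `mnemonic_histogram` and `memory_fraction` are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout (objdump-style), e.g.:
#   80000004:  00a12623    sw  a0,12(sp)
# Label lines such as "80000000 <main>:" do not match and are skipped.
INSTR_RE = re.compile(
    r"^\s*[0-9a-f]+:\s+[0-9a-f]+\s+([a-z][a-z0-9.]*)", re.IGNORECASE
)


def mnemonic_histogram(dump_text):
    """Count how often each instruction mnemonic appears in the dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = INSTR_RE.match(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def memory_fraction(counts, memory_ops=("lw", "sw")):
    """Fraction of counted instructions that are in memory_ops."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[op] for op in memory_ops) / total


if __name__ == "__main__":
    sample = """
80000000 <main>:
80000000:  ff010113    addi  sp,sp,-16
80000004:  00a12623    sw    a0,12(sp)
80000008:  00c12783    lw    a5,12(sp)
8000000c:  00f12423    sw    a5,8(sp)
"""
    hist = mnemonic_histogram(sample)
    print(hist.most_common())
    print(memory_fraction(hist))
```

Because this counts instructions as they appear in the dump, it measures static frequency only; an instruction inside a hot loop is counted once regardless of how often it executes.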
A potential solution: implementing the gang scheduler described in "A Variable Warp Size Architecture". It would need to use the PC from the IPDOM stack to group and ungroup warps, and it might be possible to make the change within the warp scheduler itself.
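The grouping step can be sketched as a toy software model. This is neither Vortex RTL nor the exact mechanism from the paper: it only assumes that each warp exposes a current PC and that warps sharing a PC may be issued together, up to a hypothetical `max_gang_size`. The `Warp` type and `form_gangs` function are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Warp:
    warp_id: int
    pc: int  # assumed to be the PC taken from the top of the warp's IPDOM stack


def form_gangs(warps, max_gang_size=4):
    """Group warps that share a PC into gangs of at most max_gang_size.

    Returns a list of (pc, [warp_ids]) tuples ordered by PC. Warps whose
    PCs differ (for example after divergence) land in separate gangs,
    which models ungrouping.
    """
    by_pc = defaultdict(list)
    for warp in warps:
        by_pc[warp.pc].append(warp.warp_id)

    gangs = []
    for pc in sorted(by_pc):
        ids = sorted(by_pc[pc])
        # Split oversized groups so no gang exceeds the size limit.
        for start in range(0, len(ids), max_gang_size):
            gangs.append((pc, ids[start:start + max_gang_size]))
    return gangs


if __name__ == "__main__":
    warps = [Warp(0, 0x100), Warp(1, 0x100), Warp(2, 0x104), Warp(3, 0x100)]
    for pc, ids in form_gangs(warps, max_gang_size=2):
        print(hex(pc), ids)
```

Recomputing the gangs every scheduling step, as this sketch would, is simple but ignores hardware cost; a real design would more likely update gang membership incrementally when a warp diverges or reconverges.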