In reply to @treeform "hippo also does nim": in this case,

- I wanted to avoid compiling through C++ and nvcc because, for a library, it breaks custom flags: `-flag` has to become `-Xcompiler flag`.
- You force the NVIDIA/AMD compiler on the whole app and all its dependencies instead of just the GPU modules, and dealing with the path differences was also annoying in the past.
- It should also ease porting to other backends like WebGPU, Vulkan, OpenCL and Metal, whose programming models differ slightly from CUDA/HIP.
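To make the first point concrete: when nvcc drives the build, a flag meant for the host C++ compiler has to be wrapped before nvcc will forward it. Below is a minimal sketch of that rewriting. `to_nvcc_flags` is a hypothetical helper, and the list of prefixes treated as nvcc-native is an illustrative subset I'm assuming, not nvcc's full option list.

```python
# Hypothetical helper: rewrite host-compiler flags for an nvcc command line.
# Assumption: flags starting with these prefixes are understood by nvcc itself.
# This is an illustrative subset, not nvcc's complete option list.
NVCC_NATIVE_PREFIXES = ("-I", "-D", "-L", "-l", "-O", "-std=")


def to_nvcc_flags(host_flags):
    """Pass flags nvcc understands through unchanged; wrap every other
    flag in -Xcompiler so nvcc forwards it to the host C++ compiler."""
    out = []
    for flag in host_flags:
        if flag.startswith(NVCC_NATIVE_PREFIXES):
            out.append(flag)
        else:
            out.extend(["-Xcompiler", flag])
    return out


print(to_nvcc_flags(["-O3", "-march=native", "-fno-strict-aliasing", "-Iinclude"]))
# → ['-O3', '-Xcompiler', '-march=native', '-Xcompiler', '-fno-strict-aliasing', '-Iinclude']
```

A library cannot know ahead of time which flags its users will pass, so every downstream build would need a translation step like this once nvcc becomes the compiler driver.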