Differentiable GPU kernel autotuner
Autotuner
CUDA
Kernel
Differentiable
vLLM
Transfer Learning
Amazon Internship
PyTorch
SciPy
GPU
[Amazon AWS Internship 2025] I developed a robust, end-to-end GPU kernel autotuner that achieves significantly higher accuracy from little ground-truth data and enables transfer learning, reducing kernel tuning time from days to hours.
Overview
Developed a robust, end-to-end differentiable GPU kernel autotuner for LLM inference.
