Differentiable GPU kernel autotuner

Autotuner
CUDA
Kernel
Differentiable
vLLM
Transfer Learning
Amazon Internship
PyTorch
SciPy
GPU
[Amazon AWS Internship 2025] I developed a robust, end-to-end GPU kernel autotuner that achieves significantly higher accuracy with little ground-truth data and enables transfer learning, reducing kernel tuning time from days to hours.
Published

August 2025

Note

Amazon AWS Internship Summer 2025

Amazon intern workshop day! James Basa on the right.

Overview

Developed a robust, end-to-end differentiable GPU kernel autotuner for LLM inference.