LeetGPU Solutions and Notes

Notes and solutions in PyTorch, Triton, and CUDA. Runtimes shown are for a T4 GPU.

LeetGPU-5: Matrix Addition
Jan 04, 2026 Implement a program that performs element-wise addition of two matrices containing 32-bit floating point numbers on a GPU. The program should take two input matrices of equal dimensions and produce a single output matrix containing their element-wise sum.
LeetGPU-3: Matrix Transpose
Jan 03, 2026 Write a program that transposes a matrix of 32-bit floating point numbers on a GPU. The transpose of a matrix switches its rows and columns. Given a matrix of dimensions \(M \times N\), the transpose will have dimensions \(N \times M\). All matrices are stored in row-major format.
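The row-major index arithmetic behind a transpose can be sketched on the CPU. This is a plain-Python reference rather than the GPU solution from the post; the function name `transpose_row_major` and the parameters `rows` and `cols` are assumptions made for illustration. Each iteration of the inner loop does the work one GPU thread would typically be assigned.

```python
def transpose_row_major(src, rows, cols):
    """Transpose a rows x cols matrix stored as a flat row-major list.

    Hypothetical CPU reference, not the post's GPU kernel.
    """
    assert len(src) == rows * cols
    dst = [0.0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            # Input element (r, c) lives at r * cols + c.
            # It becomes output element (c, r); output rows have length `rows`.
            dst[c * rows + r] = src[r * cols + c]
    return dst


a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]  # 2 x 3, row-major
print(transpose_row_major(a, 2, 3))  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```

The output is the 3 x 2 matrix with rows (1, 4), (2, 5), (3, 6), again flattened in row-major order.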
LeetGPU-12: Simple Inference
Dec 28, 2025 Run inference on a PyTorch model. Given an input tensor and a trained torch.nn.Linear model, compute the forward pass and store the result in the output tensor.
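For reference, the forward pass of a linear layer computes \(y = x W^{T} + b\), where torch.nn.Linear stores its weight with shape (out_features, in_features). A minimal plain-Python sketch of that arithmetic follows; the helper name `linear_forward` and the nested-list layout are assumptions for illustration, not the post's solution, which would call the model directly.

```python
def linear_forward(x, weight, bias):
    """Compute y = x @ weight^T + bias with nested lists.

    x:      batch x in_features
    weight: out_features x in_features (the layout torch.nn.Linear uses)
    bias:   out_features
    Hypothetical CPU reference for checking results by hand.
    """
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


x = [[1.0, 2.0]]                               # batch of 1, in_features = 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # out_features = 3
bias = [0.5, 0.5, 0.5]
print(linear_forward(x, weight, bias))  # [[1.5, 2.5, 3.5]]
```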
LeetGPU-2: Matrix Multiplication
Dec 17, 2025 Write a program that multiplies two matrices of 32-bit floating point numbers on a GPU. Given matrix A of dimensions \(M \times N\) and matrix B of dimensions \(N \times K\), compute the product matrix C, which will have dimensions \(M \times K\). All matrices are stored in row-major format.
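As a CPU reference for the indexing, here is a minimal plain-Python sketch of the product over flat row-major buffers. The function name `matmul_row_major` and the argument order are assumptions for illustration; a naive GPU kernel would typically assign one thread to each output element (i, j) and run only the innermost loop.

```python
def matmul_row_major(a, b, m, n, k):
    """Multiply A (m x n) by B (n x k), both flat row-major lists.

    Returns C (m x k) as a flat row-major list.
    Hypothetical CPU reference, not the post's GPU kernel.
    """
    assert len(a) == m * n and len(b) == n * k
    c = [0.0] * (m * k)
    for i in range(m):
        for j in range(k):
            acc = 0.0
            for p in range(n):
                # A[i][p] is at i * n + p; B[p][j] is at p * k + j.
                acc += a[i * n + p] * b[p * k + j]
            c[i * k + j] = acc
    return c


a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]      # 2 x 3
b = [7.0, 8.0,
     9.0, 10.0,
     11.0, 12.0]         # 3 x 2
print(matmul_row_major(a, b, 2, 3, 2))  # [58.0, 64.0, 139.0, 154.0]
```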
LeetGPU-1: Vector Addition
Dec 14, 2025 Implement a program that performs element-wise addition of two vectors containing 32-bit floating point numbers on a GPU. The program should take two input vectors of equal length and produce a single output vector containing their sum.