Research Work

I have led and contributed to more than 20 research projects spanning high-performance computing (HPC), graph neural network (GNN) systems, and natural language processing (NLP), with work ranging from system-level optimization to the design of advanced model architectures. Many of these projects began as course projects I initiated and later evolved into posters and publications at venues such as SC and MLSys.

Below is a selection of my research projects in Systems and GNNs. For my other projects, please see this link.

Systems and Graph Neural Networks

Enabling Model Parallelism for Graph Neural Networks with a Scalable Sparse-Dense Matrix Multiplication Operator

PyTorch

SpMM

Distributed

Differentiable

A new distributed, differentiable PyTorch sparse–dense matrix operator that enables scalable model-parallel training of very large Graph Neural Networks.

November 2025

Differentiable GPU kernel autotuner with Transfer Learning

Autotuner

CUDA

Kernel

Differentiable

vLLM

Transfer Learning

Amazon Internship

PyTorch

SciPy

GPU

Developed a robust, end-to-end differentiable GPU kernel autotuner for vLLM that needs very little ground-truth data (n < 1000) for tuning.

August 2025

SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Knowledge Graph

SpMM

Publication

Paper

CPU

GPU

PyTorch

Distributed

DDP

FSDP

We reformulated 10 KG embedding models using sparse-dense matrix multiplication, speeding up training on CPU and GPU while making them significantly more memory efficient.

May 2025

Predicting Interactions in the Weapons of Mass Destruction Knowledge Graphs

Knowledge Graph

Publication

Book Chapter

Collaboration

Graph Databases (Neo4j)

An applied Knowledge Graph Embedding (KGE) project where I developed the Neo4j interface to facilitate efficient graph data handling and support the training of KGE models.

December 2024

A Sparse Approach for Translation-based Training of Knowledge Graph Embeddings

TransE

Knowledge Graph

SpMM

Sparse Linear Algebra

SC24 Best Poster finalist. This work accelerates knowledge-graph embedding training by replacing traditional scatter/gather operations with sparse–dense matrix multiplication, reducing memory usage and achieving significant CPU, GPU, and multi-GPU speedups.

November 2024
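To illustrate the idea behind the poster above, here is a minimal sketch of how the TransE residual h + r - t, usually computed with three embedding gathers, can instead be written as one sparse-dense matrix multiplication. It is not the project's implementation: the function name `translation_residual_spmm`, the stacked entity/relation layout, and the use of NumPy and SciPy in place of PyTorch are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def translation_residual_spmm(triples, entity_emb, relation_emb):
    """Compute h + r - t for each (head, relation, tail) triple with one SpMM.

    Hypothetical helper for illustration only.
    triples:      int array of shape (batch, 3)
    entity_emb:   float array of shape (n_entities, dim)
    relation_emb: float array of shape (n_relations, dim)
    """
    n_ent = entity_emb.shape[0]
    n_rel = relation_emb.shape[0]
    batch = triples.shape[0]
    heads, rels, tails = triples[:, 0], triples[:, 1], triples[:, 2]

    # Each triple becomes one sparse row with three nonzeros:
    # +1 at the head column, +1 at the (offset) relation column, -1 at the tail column.
    rows = np.repeat(np.arange(batch), 3)
    cols = np.stack([heads, n_ent + rels, tails], axis=1).ravel()
    vals = np.tile(np.array([1.0, 1.0, -1.0]), batch)

    # Duplicate (row, col) entries, e.g. when head == tail, are summed on construction.
    incidence = csr_matrix((vals, (rows, cols)), shape=(batch, n_ent + n_rel))

    # Stack entity and relation embeddings so that a single product covers both.
    stacked = np.vstack([entity_emb, relation_emb])
    return incidence @ stacked


rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))   # 5 entities, dimension 4
R = rng.normal(size=(2, 4))   # 2 relations, dimension 4
batch_triples = np.array([[0, 1, 3], [4, 0, 2], [1, 1, 1]])

residual = translation_residual_spmm(batch_triples, E, R)
# Gather-based reference: three separate index lookups.
reference = E[batch_triples[:, 0]] + R[batch_triples[:, 1]] - E[batch_triples[:, 2]]
```

In this formulation the three gathers collapse into one sparse product, so the forward pass can use an optimized SpMM kernel. In a framework with automatic differentiation, the backward pass of such a product is again a sparse-dense multiplication with the transposed matrix.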

iSpLib: A library for accelerating graph neural networks using auto-tuned sparse operations

PyTorch

GNN

C++

Autotuner

Code-generator

CPU

Sparse Linear Algebra

Publication

Paper

Kernel

An auto-tuned sparse matrix multiplication library for GNN training and inference.

April 2024

A Spatio-Temporal Link Prediction Pipeline using GC-LSTM for Dynamic Graphs in PyTorch

Course Project

GNN

LSTM

Spatio-Temporal Graph

PyTorch

Protein-Protein interaction

Dynamic Graph

PyTorch Temporal

GC-LSTM

Neural Architecture Design

Link Prediction

Collaboration

Developed a spatio-temporal link prediction pipeline using GC-LSTM for dynamic graphs in PyTorch. Achieved over 75% Hits@100 on the protein-protein interaction graph sequences of the DDPIN dataset.

December 2023
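For readers unfamiliar with the Hits@100 metric reported for the link prediction project above, the sketch below shows one common way to compute Hits@K: the fraction of true edges that score higher than the K-th highest-scoring negative edge. This definition is an assumption (the project may have used a different variant), and `hits_at_k` is a hypothetical helper, not code from the project.

```python
import numpy as np


def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges scored above the k-th best negative edge.

    Hypothetical helper; assumes higher scores mean "more likely an edge".
    """
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)

    # With fewer than k negatives, every positive counts as a hit.
    if len(neg_scores) < k:
        return 1.0

    # k-th highest negative score is the threshold a positive must beat.
    kth_best_negative = np.sort(neg_scores)[-k]
    return float(np.mean(pos_scores > kth_best_negative))


pos = [0.9, 0.5, 0.1]
neg = [0.8, 0.6, 0.3, 0.2]
hits_2 = hits_at_k(pos, neg, k=2)  # threshold is 0.6, so one of three positives is a hit
hits_4 = hits_at_k(pos, neg, k=4)  # threshold is 0.2, so two of three positives are hits
```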

Investigating Spatial-Temporal and Knowledge Graph Machine Learning Algorithms for Dominant Kernels & Potential Scope of Speedup

Spatio-Temporal Graph

Knowledge Graph

Profiling

CPU

Kernel

Course Project

Collaboration

This project profiles Spatio-Temporal Graph Neural Networks and Knowledge Graph Embedding algorithms to identify the functions responsible for their long training times and compares how frequently those functions occur, with the goal of optimizing performance for larger graphs or real-time analysis.

December 2023

Python Interface of FastGraph (an OpenMP-based sparse-matrix library)

Pybind11

OpenMP

Zero-copy

Sparse Linear Algebra

Developed a PyBind11 interface for FastGraph, an OpenMP-based C++ parallel sparse-matrix library designed as a high-performance alternative to NetworkX.

April 2023