Md Saidul Hoque Anik

Research Work

I have led and contributed to more than 20 research projects spanning high-performance computing (HPC), graph neural network (GNN) systems, and natural language processing (NLP), with work ranging from system-level optimization to the design of advanced model architectures. Many of these projects began as course projects I initiated and later evolved into posters and publications at venues such as SC and MLSys.

Below is a selection of my research projects in Systems and GNN. For my other projects, please see this link.

Systems and Graph Neural Networks

Enabling Model Parallelism for Graph Neural Networks with a Scalable Sparse-Dense Matrix Multiplication Operator

PyTorch
SpMM
Distributed
Differentiable

A new distributed, differentiable PyTorch sparse–dense matrix multiplication (SpMM) operator that enables scalable model-parallel training of very large Graph Neural Networks.

November 2025

Differentiable GPU kernel autotuner with Transfer Learning

Autotuner
CUDA
Kernel
Differentiable
vLLM
Transfer Learning
Amazon Internship
PyTorch
SciPy
GPU

Developed a robust, end-to-end differentiable GPU kernel autotuner for vLLM that needs very little ground-truth data (n < 1000) for tuning.

August 2025

SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Knowledge Graph
SpMM
Publication
Paper
CPU
GPU
PyTorch
Distributed
DDP
FSDP

We reformulated 10 KG embedding models in terms of sparse-dense matrix multiplication (SpMM), speeding up training on both CPU and GPU while making the models significantly more memory efficient.

May 2025

Predicting Interactions in the Weapons of Mass Destruction Knowledge Graphs

Knowledge Graph
Publication
Book Chapter
Collaboration
Graph Databases (Neo4j)

An applied Knowledge Graph Embedding (KGE) project where I developed the Neo4j interface to facilitate efficient graph data handling and support the training of KGE models.

December 2024

A Sparse Approach for Translation-based Training of Knowledge Graph Embeddings

TransE
Knowledge Graph
SpMM
Sparse Linear Algebra

SC24 Best poster finalist. This work accelerates knowledge-graph embedding training by replacing traditional scatter/gather operations with sparse–dense matrix multiplication, reducing memory usage and achieving significant CPU, GPU, and multi-GPU speedups.

November 2024
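
As a rough illustration of the idea in the entry above, the sketch below computes the TransE residual h + r - t twice: once by gathering embedding rows per triple, and once as a single sparse-times-dense product with an incidence matrix. This is a minimal pure-Python sketch under my own assumptions, not the formulation or code used in the poster; the helper names (`build_incidence`, `spmm`, `transe_residual_spmm`) and the stacked entity/relation layout are hypothetical.

```python
def transe_residual_gather(entity_emb, relation_emb, triples):
    """Baseline: gather each embedding row per triple, then combine."""
    out = []
    for h, r, t in triples:
        out.append([eh + er - et for eh, er, et in
                    zip(entity_emb[h], relation_emb[r], entity_emb[t])])
    return out


def build_incidence(triples, n_entities):
    """One sparse row per triple: +1 at head, -1 at tail, +1 at relation.

    Columns index the stacked matrix [entity_emb; relation_emb], so
    relation r lives at column n_entities + r (an assumed layout).
    """
    rows = []
    for h, r, t in triples:
        row = {}
        row[h] = row.get(h, 0.0) + 1.0
        row[t] = row.get(t, 0.0) - 1.0
        row[n_entities + r] = 1.0
        rows.append(row)
    return rows


def spmm(sparse_rows, dense):
    """Multiply a sparse matrix (list of {col: weight}) by a dense matrix."""
    dim = len(dense[0])
    out = []
    for row in sparse_rows:
        acc = [0.0] * dim
        for col, w in row.items():
            for j in range(dim):
                acc[j] += w * dense[col][j]
        out.append(acc)
    return out


def transe_residual_spmm(entity_emb, relation_emb, triples):
    """Same residual as the gather version, expressed as one SpMM."""
    stacked = entity_emb + relation_emb  # list concatenation: rows stacked
    incidence = build_incidence(triples, len(entity_emb))
    return spmm(incidence, stacked)
```

Both functions return one residual vector per triple; a real implementation would hand the incidence matrix to an optimized SpMM kernel rather than loop in Python.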

iSpLib: A library for accelerating graph neural networks using auto-tuned sparse operations

PyTorch
GNN
C++
Autotuner
Code-generator
CPU
Sparse Linear Algebra
Publication
Paper
Kernel

An auto-tuned sparse matrix multiplication library for GNN training and inference.

April 2024

A Spatio-Temporal Link Prediction Pipeline using GC-LSTM for Dynamic Graphs in PyTorch

Course Project
GNN
LSTM
Spatio-Temporal Graph
PyTorch
Protein-Protein interaction
Dynamic Graph
PyTorch Temporal
GC-LSTM
Neural Architecture Design
Link Prediction
Collaboration

Developed a spatio-temporal link prediction pipeline using GC-LSTM for dynamic graphs in PyTorch. Achieved over 75% Hits@100 on the protein-protein interaction graph sequences of the DDPIN dataset.

December 2023
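
For readers unfamiliar with the Hits@100 metric mentioned above, here is a minimal sketch of one common definition of Hits@K for link prediction: the fraction of positive edges that score higher than the K-th best negative edge. The exact evaluation protocol used in the project is not stated here, so this is an assumption, and the function name `hits_at_k` is hypothetical.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positives scoring above the k-th highest negative.

    Assumes higher scores mean "more likely to be an edge".
    """
    if not pos_scores:
        raise ValueError("pos_scores must not be empty")
    if len(neg_scores) < k:
        # Fewer than k negatives: every positive ranks within the top k.
        return 1.0
    threshold = sorted(neg_scores, reverse=True)[k - 1]
    hits = sum(1 for s in pos_scores if s > threshold)
    return hits / len(pos_scores)
```

For example, `hits_at_k(pos, neg, 100)` gives the share of true edges that outrank the 100th strongest negative candidate.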

Investigating Spatial-Temporal and Knowledge Graph Machine Learning Algorithms for Dominant Kernels & Potential Scope of Speedup

Spatio-Temporal Graph
Knowledge Graph
Profiling
CPU
Kernel
Course Project
Collaboration

This project profiles Spatio-Temporal Graph Neural Networks and Knowledge Graph Embedding algorithms to identify the functions responsible for their long training times and to compare how often those functions are called, with the goal of improving performance on larger graphs and in real-time analysis.

December 2023

Python Interface of FastGraph (an OpenMP-based sparse-matrix library)

Pybind11
OpenMP
Zero-copy
Sparse Linear Algebra

Developed a PyBind11 interface for FastGraph, an OpenMP-based C++ parallel sparse-matrix library designed as a high-performance alternative to NetworkX.

April 2023

Accelerating Graph Machine Learning using Auto-tuned Sparse Primitives for GPU

SpMM
Code-generator
CUDA
GPU
PyTorch
C++
LibTorch
Autotuner
Course Project
Collaboration

Developed a GPU-compatible auto-tuner for a sparse-dense matrix multiplication (SpMM) library, enabling GPU execution of its kernels.

April 2023

A C++ Library for Sparse Matrix Data Structure

Sparse Linear Algebra
COO
CSC
CSR
C++
Course Project

A toy project to understand how sparse matrix data structures (COO/CSR/CSC) make matrix multiplication more efficient.

December 2022
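
To make the COO and CSR formats named above concrete, the sketch below converts COO triplets to CSR and multiplies the CSR matrix by a dense matrix, touching only the stored nonzeros. It is a pure-Python illustration written for this page, not code from the C++ library itself; the function names are hypothetical.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1            # count nonzeros in each row
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]    # prefix sum gives row start offsets
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1].copy()    # next free position within each row
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data


def csr_matmul_dense(indptr, indices, data, dense):
    """Multiply a CSR matrix by a dense matrix given as a list of rows."""
    n_cols_out = len(dense[0])
    out = []
    for i in range(len(indptr) - 1):
        row = [0.0] * n_cols_out
        # Only the nonzeros of row i are visited.
        for k in range(indptr[i], indptr[i + 1]):
            v = data[k]
            src = dense[indices[k]]
            for j in range(n_cols_out):
                row[j] += v * src[j]
        out.append(row)
    return out
```

The work done by `csr_matmul_dense` scales with the number of stored nonzeros times the dense width, rather than with the full matrix size, which is the source of the efficiency gain.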

Concurrency Study of Python 2.7

Python
Concurrency
IronPython
CPython
Jython
Profiling
Course Project

A comparative study examining how multi-threaded programs perform when written in different Python implementations.

August 2017