
Md Saidul Hoque Anik

PhD Candidate at Texas A&M | ML Systems & Performance Optimization | Sparse Kernels for Graph Learning | Dual M.S. (ISE, CSE) — Indiana University Bloomington and BUET | Tenure-Track Faculty (On Sabbatical, UIU)
GitHub | LinkedIn

About Me

I am a PhD candidate in Computer Science & Engineering at Texas A&M University, focusing on high-performance and distributed machine learning systems. I specialize in optimizing sparse linear algebra for efficient ML pipelines. Previously, I interned at Amazon AWS, where I built a differentiable GPU kernel autotuner that reduced LLM kernel-tuning time from days to hours using transfer learning.

I can:

  • Build custom PyTorch GNN training pipelines backed by optimized sparse linear algebra.
  • Integrate custom C++/CUDA kernels into PyTorch with LibTorch & PyBind11 (a minimal sketch follows this list).
  • Develop distributed training systems with PyTorch Distributed and efficient disk-based data streaming.
  • Profile and optimize Python/C++ pipelines for performance bottlenecks.
  • Apply ML to systems optimization (e.g., developed a neural GPU kernel autotuner with transfer-learning support).
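
As a minimal sketch of what the LibTorch/PyBind11 integration looks like (not one of my production kernels; the toy operator, its name, and the extension name below are hypothetical), a tiny C++ operator can be compiled and bound into PyTorch via `torch.utils.cpp_extension.load_inline`, which generates the PyBind11 glue automatically:

```python
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = r"""
// Hypothetical toy operator: y = x * a + b, written against the LibTorch tensor API.
torch::Tensor scaled_add(torch::Tensor x, double a, double b) {
    return x * a + b;
}
"""

# load_inline compiles the C++ source and generates the PyBind11 bindings,
# exposing `scaled_add` as a regular Python-callable function.
ext = load_inline(name="toy_ext", cpp_sources=cpp_source, functions=["scaled_add"])

x = torch.arange(4, dtype=torch.float32)
print(ext.scaled_add(x, 2.0, 1.0))  # tensor([1., 3., 5., 7.])
```

A real kernel would replace the one-line body with hand-optimized C++/CUDA, but the binding and build workflow stays the same.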

Research Background

My PhD research centers on scalable graph learning systems (GNNs, KGEs, GraphRAG) and custom sparse linear algebra.
At MLSys 2025, I presented a generalized method for expressing KGE model training through sparse matrix multiplication to improve large-scale efficiency. This led to training speedups of up to 5.3× on CPU and 4.2× on GPU, and up to an 11.1× improvement in CUDA memory efficiency.
I also built a high-performance CPU SpMM library (ACM WebConf 2024) that accelerates PyTorch GNN training by up to 93× across Intel, AMD, and ARM CPUs; a minimal sketch of the SpMM formulation behind both lines of work follows this paragraph.
Additionally, during my 2025 summer internship at Amazon, I developed a differentiable GPU kernel autotuner that achieved up to 60% higher accuracy in predicting optimal configurations than 16 other autotuning models.
Currently, I am collaborating with Oak Ridge National Lab on a distributed, differentiable framework for large-scale KGE training across hybrid compute tiers.
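
To make the core idea concrete, here is a small, self-contained sketch (not the published library or its kernels; the graph, shapes, and values are made up) showing how GNN neighbor aggregation, and analogously the batched gathers in SpMM-based KGE training, can be phrased as a single sparse-dense matrix multiplication:

```python
# Minimal illustration: neighbor aggregation expressed as one SpMM, so a tuned
# sparse-dense matrix multiplication kernel can replace per-edge gather/scatter.
# The graph below is a hypothetical 5-node example.
import torch

num_nodes, feat_dim = 5, 8
edges = torch.tensor([[0, 1, 2, 3, 4, 1],    # source nodes
                      [1, 2, 3, 4, 0, 3]])   # destination nodes

# Sparse adjacency (destination x source) in COO format; unit values give plain sum-aggregation.
adj = torch.sparse_coo_tensor(
    torch.stack([edges[1], edges[0]]),
    torch.ones(edges.shape[1]),
    size=(num_nodes, num_nodes),
).coalesce()

features = torch.randn(num_nodes, feat_dim)

# One SpMM performs the whole aggregation step: out[v] = sum of features[u] over in-neighbors u of v.
aggregated = torch.sparse.mm(adj, features)
print(aggregated.shape)  # torch.Size([5, 8])
```

Once training is written this way, all the performance-critical work lands in one SpMM call, which is exactly where an optimized sparse kernel can be swapped in.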

Featured Research

  • SC24 Best Poster finalist (top 6): Expressed TransE knowledge graph embedding (KGE) training using a faster sparse-dense matrix multiplication kernel.
  • MLSys 2025 paper: Generalized the training of 10 KGE models using a faster sparse-dense matrix multiplication kernel.
  • Amazon AWS internship, 2025: Developed a differentiable GPU kernel autotuner (a conceptual sketch follows this list).
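
Since the autotuner itself is internship work, the following is only a generic conceptual sketch of what "differentiable autotuning" can mean, not the system I built: a neural cost model predicts kernel runtime from a configuration vector, so candidate configurations can themselves be refined by gradient descent through the model. All names, shapes, and hyperparameters below are hypothetical.

```python
import torch
import torch.nn as nn

cost_model = nn.Sequential(            # predicts (log-)runtime from a config embedding
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
for p in cost_model.parameters():      # pretend the cost model is already fit on measured
    p.requires_grad_(False)            # (config, runtime) pairs, possibly via transfer learning

config = torch.randn(1, 4, requires_grad=True)   # e.g. tile sizes / unroll factors, relaxed to reals
opt = torch.optim.Adam([config], lr=0.05)

for _ in range(200):                   # gradient-based search over the relaxed config space
    opt.zero_grad()
    predicted_runtime = cost_model(config).squeeze()
    predicted_runtime.backward()       # gradients flow back into the configuration itself
    opt.step()

# A real autotuner would project `config` back to valid discrete choices and
# verify the winner by benchmarking the actual kernel.
print(config.detach())
```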

Explore all 20+ research projects here, or check out a quick progress overview here.

Leadership Roles

I have led and coordinated various academic and technical initiatives at UIU, MIST, and BUET, including curriculum revisions, postgraduate programs, and programming contests, fostering a culture of innovation and academic excellence across departments. See all my leadership roles and contributions here.