Md Saidul Hoque Anik

Blogs and Notes

  • Getting Started with TinyCUDA

    TinyCUDA is a header-only C++17 wrapper that eliminates CUDA boilerplate: Buffer handles cudaMalloc/cudaMemcpy automatically, CUDA_CHECK catches kernel errors, and KernelProfiler records timings. It is suited to rapid prototyping on 1D buffers without full-framework overhead; it does not provide autograd or multi-dimensional tensors.
    Dec 26, 2025 Project
  • [Paper Summary] REFRAG: Rethinking RAG based Decoding

    This paper from Meta Superintelligence Labs proposes an efficient decoding framework for Retrieval-Augmented Generation (RAG) that addresses the latency and memory bottlenecks of long-context inputs. It combines compressed chunk embeddings with a reinforcement-learning-based selective expansion policy, exploiting attention sparsity. The framework substantially reduces time-to-first-token (TTFT) and extends the effective context window while maintaining perplexity and downstream task accuracy.
    Oct 24, 2025 Paper Reading
  • iSpLib: An Auto-tuned GNN Accelerator for PyTorch

    [Published in WebConf 2024] iSpLib is a PyTorch library that accelerates Graph Neural Network (GNN) training by integrating auto-tuned sparse matrix operations from FusedMM, delivering up to 93× speedup on large graphs like Reddit and OGBN-Proteins. It features plug-and-play patching for PyTorch Geometric, backpropagation support for semirings, and caching for fixed adjacency matrices, boosting models like GCN (54×), GraphSAGE (23-32×), and GIN (51×).
    Sep 14, 2024 Project