Enabling Model Parallelism for Graph Neural Networks with a Scalable Sparse-Dense Matrix Multiplication Operator
PyTorch
SpMM
Distributed
Differentiable
A new distributed, differentiable PyTorch sparse-dense matrix multiplication (SpMM) operator that enables scalable model-parallel training of very large Graph Neural Networks.
Overview
[Work in progress] We are developing a distributed, differentiable sparse-dense matrix multiplication library designed to support model parallelism for Graph Neural Networks with very large parameter counts. By spreading computation and memory across multiple devices, the framework aims to improve memory efficiency and scalability, enabling efficient training of large-scale graph models. While still in progress, the work also lays the groundwork for future applications to sparse transformer architectures and other memory-intensive neural networks.
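As a rough illustration of the building block at the heart of this work, the sketch below wraps a single-device, differentiable sparse-dense matrix multiplication in a custom `torch.autograd.Function`, built on the standard `torch.sparse.mm` kernel. The class name `SpMMFunction` and the toy graph are placeholders rather than the project's actual API, and the distributed version would additionally shard the sparse adjacency and the dense operand across devices.

```python
import torch


class SpMMFunction(torch.autograd.Function):
    """Differentiable sparse-dense matmul Y = A @ X (single-device sketch)."""

    @staticmethod
    def forward(ctx, A, X):
        # A: sparse COO matrix (n x n), X: dense feature/parameter matrix (n x d).
        ctx.save_for_backward(A)
        return torch.sparse.mm(A, X)

    @staticmethod
    def backward(ctx, grad_out):
        (A,) = ctx.saved_tensors
        # dL/dX = A^T @ dL/dY; the gradient w.r.t. the sparse A is omitted here.
        grad_X = torch.sparse.mm(A.t().coalesce(), grad_out)
        return None, grad_X


# Toy usage: a 3-node graph adjacency times a dense feature matrix.
idx = torch.tensor([[0, 1, 2, 2], [1, 0, 0, 1]])
A = torch.sparse_coo_tensor(idx, torch.ones(4), (3, 3)).coalesce()
X = torch.randn(3, 8, requires_grad=True)

Y = SpMMFunction.apply(A, X)
Y.sum().backward()
print(X.grad.shape)  # torch.Size([3, 8])
```

In a model-parallel setting, the forward and backward passes above would also exchange partial results between devices (e.g. via `torch.distributed` collectives) so that each device only holds a slice of the sparse matrix and the dense operand.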
