So no doubt modern software is ridiculously bloated, but ROCm isn't just a GPU driver. It includes all sorts of tools and libraries as well.
By comparison, if you go and download the CUDA toolkit as a single file, you get a download file that's over 4GB, so quite a bit larger than the download size you quoted. I haven't checked how much that expands to (it seems the ROCm install has a lot of redundancy given how well it compresses), but the point is, you get something that seems insanely large either way.
The biggest one just to pick on one is hipblaslt is "a library that provides general matrix-matrix operations. It has a flexible API that extends functionalities beyond a traditional BLAS library, such as adding flexibility to matrix data layouts, input types, compute types, and algorithmic implementations and heuristics." https://github.com/ROCm/hipBLASLt
There are mostly GPU kernels that by themselves aren't so big, but for every single operation x every single supported graphics architecture, eg:
My understanding is that ROCm contains all included kernels for each supported architecture, so it would have (made up):
-- matrix multiply 2048x2048 for Navi 31,
-- same for Navi 32,
-- same for Navi 33,
-- same for Navi 21,
-- same for Navi 22,
-- same for Navi 23,
-- same for Navi 24, etc.
-- matrix multiply 4096x4096 for Navi 31,
-- ...
Correct. Although, you wouldn't find Navi 22, 23 or 24 in the list because those particular architectures are not supported. Instead, you'd see Vega 10, Vega 20, Arcturus, Aldebaran, Aqua Vanjaram and sometimes Polaris.
We're working on a few different strategies to reduce the binary size. It will get worse before it gets better, but I think you can expect significant improvements in the future. There are lots of ways to slim the libraries down.
Wait, looking at that link I don't see how it avoids downloading CUDA or ROCM. Do you use MLIR to compile to GPU without using the vendor provided tooling at all?
> Of course, much of that is auto-generated header files... A large portion of it with AMD continuing to introduce new auto-generated header files with each new generation/version of a given block. These verbose header files has been AMD's alternative to creating exhaustive public documentation on their GPUs that they were once known for.