Hacker News

I almost tried to install AMD ROCm a while ago after discovering the simplicity of llamafile.

  sudo apt install rocm

  Summary:
    Upgrading: 0, Installing: 203, Removing: 0, Not Upgrading: 0
    Download size: 2,369 MB / 2,371 MB
    Space needed: 35.7 GB / 822 GB available
I don't understand how 36 GB can be justified for what amounts to a GPU driver.


So no doubt modern software is ridiculously bloated, but ROCm isn't just a GPU driver. It includes all sorts of tools and libraries as well.

By comparison, if you download the CUDA toolkit as a single file, it's over 4 GB, so quite a bit larger than the download size you quoted. I haven't checked how much that expands to (the ROCm install seems to have a lot of redundancy, given how well it compresses), but the point is, you get something insanely large either way.


I suspected that, but binaries that large still seem wrong; the whole thing is 35 times larger than my entire OS install.

Do you know what is included in ROCm that could be so big? Does it include training datasets or something?


Here are the big files in my /opt/rocm/lib, which is most of it:

  4.8G hipblaslt
  1.6G libdevice_conv_operations.a
  2.0G libdevice_gemm_operations.a
  1.4G libMIOpen.so.1.0.60200
  1.1G librocblas.so.4.2.60200
  1.6G librocsolver.so.0.2.60200
  1.4G librocsparse.so.1.0.60200
  1.5G llvm
  3.5G rocblas
  2.0G rocfft
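For the curious, a listing like this is just du output; the same thing in Python terms (the /opt/rocm/lib path is only the default install location, adjust as needed):

```python
import os

def entry_sizes(root):
    """Return {entry_name: total_bytes} for each direct child of root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _, filenames in os.walk(entry.path):
                for f in filenames:
                    fp = os.path.join(dirpath, f)
                    if not os.path.islink(fp):
                        total += os.path.getsize(fp)
            sizes[entry.name] = total
    return sizes

if __name__ == "__main__":
    root = "/opt/rocm/lib"  # assumption: default ROCm install location
    if os.path.isdir(root):
        for name, size in sorted(entry_sizes(root).items(), key=lambda kv: -kv[1])[:10]:
            print(f"{size / 2**30:5.1f}G  {name}")
```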
The biggest one, just to pick on one, is hipblaslt: "a library that provides general matrix-matrix operations. It has a flexible API that extends functionalities beyond a traditional BLAS library, such as adding flexibility to matrix data layouts, input types, compute types, and algorithmic implementations and heuristics." https://github.com/ROCm/hipBLASLt
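To make that "flexibility" concrete: a BLASLt-style API lets the caller pin down layouts, input types, and compute type, and then a heuristic selects among precompiled kernels. A toy sketch of that shape (names here are illustrative, not hipBLASLt's real API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatmulDesc:
    # Roughly the knobs a BLASLt-style library exposes:
    layout_a: str       # "row" or "col" major, possibly transposed
    layout_b: str
    in_type: str        # e.g. "f16", "bf16", "f8"
    compute_type: str   # e.g. "f32" accumulation
    arch: str           # target GPU, e.g. "gfx942"

# Tiny stand-in for the library's table of precompiled kernels.
KERNELS = {
    MatmulDesc("col", "col", "f16", "f32", "gfx942"): "TensileLibrary_..._Ailk_Bjlk_gfx942.co",
    MatmulDesc("col", "row", "f16", "f32", "gfx942"): "TensileLibrary_..._Ailk_Bljk_gfx942.co",
}

def select_kernel(desc: MatmulDesc) -> str:
    """Selection here is an exact lookup; a real library also ranks
    several candidate kernels by predicted performance."""
    try:
        return KERNELS[desc]
    except KeyError:
        raise RuntimeError(f"no precompiled kernel for {desc}")
```

Every distinct combination of those knobs needs its own compiled kernel on disk, which is where the gigabytes come from.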

These are mostly GPU kernels that by themselves aren't so big, but there's one for every single operation × every single supported graphics architecture, e.g.:

  304K TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Ailk_Bjlk_Cijk_Dijk_gfx942.co
  24K TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Ailk_Bjlk_Cijk_Dijk_gfx942.dat
  240K TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx942.co
  20K TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx942.dat
  344K TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx942.co
  24K TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx942.dat
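Those filenames encode the variant directly: my reading is that the Ailk/Alik and Bjlk/Bljk fields are the index orderings (transpose layouts) of A and B, and the suffix is the target architecture (the .dat siblings appear to carry tuning/selection metadata). A quick parse of the list above:

```python
import re

NAMES = [
    "TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Ailk_Bjlk_Cijk_Dijk_gfx942.co",
    "TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx942.co",
    "TensileLibrary_SS_SS_UA_Type_SS_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx942.co",
]

def parse(name):
    """Pull the A/B index layouts and the GPU arch out of a Tensile file name."""
    m = re.search(r"_A(\w{3})_B(\w{3})_C\w{3}_D\w{3}_(gfx\d+)\.", name)
    return {"A": m.group(1), "B": m.group(2), "arch": m.group(3)}

variants = [parse(n) for n in NAMES]
```

Same operation, same architecture, three files: the only thing varying is the layout combination.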


Ok so like four of those files literally just do matrix multiplications


"just"


Ok some of them do tensor contractions too my bad


My understanding is that ROCm contains all included kernels for each supported architecture, so it would have (made up):

  -- matrix multiply 2048x2048 for Navi 31,
  -- same for Navi 32,
  -- same for Navi 33,
  -- same for Navi 21,
  -- same for Navi 22,
  -- same for Navi 23,
  -- same for Navi 24, etc.
  -- matrix multiply 4096x4096 for Navi 31,
  -- ...
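The multiplication works out quickly. With made-up but plausible numbers (the real counts live in the Tensile kernel libraries, not here):

```python
# Made-up illustrative numbers for one library's kernel inventory.
operations = 20        # GEMM variants, contractions, ...
shape_buckets = 50     # tuned problem-size classes per operation
dtypes = 6             # f32, f16, bf16, i8, f8, f64
archs = 8              # gfx900, gfx906, gfx908, gfx90a, gfx942, ...
avg_kernel_kib = 100   # rough size of one compiled kernel

total_kernels = operations * shape_buckets * dtypes * archs
total_gib = total_kernels * avg_kernel_kib / 2**20
print(total_kernels, round(total_gib, 1))  # 48000 kernels, ~4.6 GiB from one library
```

Four multiplied factors, and a single BLAS library lands in the same size class as the listing above.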


Correct. Although, you wouldn't find Navi 22, 23 or 24 in the list because those particular architectures are not supported. Instead, you'd see Vega 10, Vega 20, Arcturus, Aldebaran, Aqua Vanjaram and sometimes Polaris.

We're working on a few different strategies to reduce the binary size. It will get worse before it gets better, but I think you can expect significant improvements in the future. There are lots of ways to slim the libraries down.


You can look us up at https://github.com/zml/zml; we fix that.


Wait, looking at that link I don't see how it avoids downloading CUDA or ROCM. Do you use MLIR to compile to GPU without using the vendor provided tooling at all?


We do use ROCm and CUDA, but we sandbox them with the model and download only the needed parts, which are about 1/10th of the size.
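The pruning idea, sketched in Python (a hypothetical manifest format, not zml's actual code; the point is that only arch-independent pieces plus the one target architecture's kernels, and only for libraries the model actually uses, need to be fetched):

```python
# Hypothetical package manifest: (component, arch, size in MiB).
MANIFEST = [
    ("rocblas-kernels", "gfx942", 700), ("rocblas-kernels", "gfx90a", 650),
    ("rocblas-kernels", "gfx1100", 600), ("hipblaslt-kernels", "gfx942", 900),
    ("hipblaslt-kernels", "gfx90a", 850), ("rocfft-kernels", "gfx942", 400),
    ("runtime", "any", 150),
]

def needed(target_arch, used_components):
    """Keep arch-independent pieces plus kernels for the one GPU we run on."""
    return [(c, a, s) for (c, a, s) in MANIFEST
            if a in ("any", target_arch) and c.split("-")[0] in used_components]

subset = needed("gfx942", {"runtime", "rocblas"})
download_mib = sum(s for _, _, s in subset)  # 850 of the 4250 MiB total
```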


GPU drivers are complete OSes that run on the GPUs now.


It's not just you; AMD manages to completely shit-up the Linux kernel with their drivers: https://www.phoronix.com/news/AMD-5-Million-Lines


> Of course, much of that is auto-generated header files... A large portion of it with AMD continuing to introduce new auto-generated header files with each new generation/version of a given block. These verbose header files have been AMD's alternative to creating the exhaustive public documentation on their GPUs that they were once known for.


There have been talks about moving those headers to a separate repo and only including the needed headers upstream. [1]

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/3636


OpenBSD, too.



