CUDA: Unified vs. Managed Memory

Posted on 23 Feb 2025 in Computing

TL;DR / skip reading and go here: https://github.com/pmekhedjian/cuda-device-properties

These are my CUDA notes, which will eventually grow into a personal reference guide to memory management in CUDA C and CUDA Fortran. Why CUDA, you might ask? Well, if you have an Nvidia GPU sitting in your desktop or server right now, you're sitting on a lot of untapped potential. Used intelligently, GPU acceleration can significantly speed up your code. The idea is simple, provided you understand the inner workings of the GPU and how to reach them through the CUDA runtime API.

I won't lie to you. To code CUDA correctly, you need a computer science major's understanding of software development and memory management, including how code defines and accesses data in memory at runtime. I'm an SRE with a physics background, so this doesn't come naturally to me either. Frankly, I suck at coding, but I'm willing to get my hands dirty and learn something new along the way.

A friend recommended I check out "unified memory" in CUDA as something that could help me more easily juggle the paradigm of host vs. device memory. In CUDA speak, the word "host" refers to anything in system memory (RAM), number-crunched by the CPU, and "device" is anything relating to the GPU's video memory (VRAM), number-crunched by the GPU cores themselves. This is my naive attempt at figuring this out, and I'm probably not going to get a lot right on the first go (so forgive me). I look forward to unpacking more as I learn.
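To make the host vs. device split concrete, here is a minimal sketch of the explicit approach in CUDA C++: allocate on each side, copy over, run a kernel, copy back. The names double_elements and run_explicit_copy are made up for illustration and do not come from any library or from my repo.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread doubles one element of the device array.
__global__ void double_elements(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: explicit host -> device -> host round trip.
// Returns true if every element came back doubled.
bool run_explicit_copy(int n) {
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);      // host: system RAM
    if (!host) return false;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *device = nullptr;                   // device: GPU VRAM
    if (cudaMalloc(&device, bytes) != cudaSuccess) {
        free(host);
        return false;
    }

    // Ferry the data to the GPU, crunch it there, ferry it back.
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);
    double_elements<<<(n + 255) / 256, 256>>>(device, n);
    cudaError_t err = cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (host[i] == 2.0f);

    cudaFree(device);
    free(host);
    return ok;
}

int main() {
    printf("explicit copy %s\n", run_explicit_copy(1 << 20) ? "ok" : "failed");
    return 0;
}
```

Two pointers, two allocations, and two copies for one array: this is the juggling that unified memory is meant to reduce.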

As a physicist with exposure to Fortran and Python, it seemed natural to dive into the part of the pool I was more familiar with, so I started looking at CUDA Fortran via the samples referenced in the Nvidia HPC SDK, and then found the writings and posts of Greg Ruetsch, who also wrote the book on it. I became enamored. The more I read, the more I realized I should be learning both CUDA C and CUDA Fortran, as the CUDA implementations in the two languages differ in a ton of ways, and because they're both useful for what I'm interested in. Though superficially similar, Fortran and C/C++ are not one-to-one in how they expose the CUDA API, and recognizing the nuances and limits of each would make me a better CUDA programmer in the long run.

Long story short, the CUDA C Programming Guide on the Nvidia website also documents a lot of useful constraints and definitions of the API from the C/C++ perspective. One major topic is unified memory (see the post Unified Memory for CUDA Beginners, linked in the references, as a starting point), which lets you define, access, and use data from both the CPU (host) and the GPU (device) through a single pointer, without having to explicitly ferry information back and forth yourself.
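As a rough sketch of what that looks like in CUDA C++, the snippet below uses cudaMallocManaged so the CPU and the GPU share one pointer. The names add_one and run_managed are made up for illustration, and the code assumes a GPU and driver that support managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread increments one element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: CPU writes, GPU updates, CPU reads, all via one pointer.
// Returns true if the CPU sees the values the GPU wrote.
bool run_managed(int n) {
    int *data = nullptr;

    // One allocation whose pointer is valid on both host and device.
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) data[i] = i;       // written by the CPU

    add_one<<<(n + 255) / 256, 256>>>(data, n);    // updated by the GPU

    // Wait for the kernel to finish before the CPU touches the data again.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; ok && i < n; ++i) ok = (data[i] == i + 1);  // read by the CPU

    cudaFree(data);
    return ok;
}

int main() {
    printf("managed memory %s\n", run_managed(1 << 20) ? "ok" : "failed");
    return 0;
}
```

There are no cudaMemcpy calls here. The data still has to move between RAM and VRAM, but the CUDA runtime and driver handle that instead of you.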

I wrote two short programs, one in CUDA C++ and one in CUDA Fortran, that report the memory-related device properties of your Nvidia GPU. Is your system using basic CUDA managed memory, or does it have full CUDA unified memory support, allowing you to seamlessly use data across the CPUs and GPUs on the system?
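For a flavor of how such a query can work, here is a minimal sketch (not the code from the repo) that reads three fields of cudaDeviceProp: managedMemory, concurrentManagedAccess, and pageableMemoryAccess. The helper memory_support_level and its labels are my own invention, and the grouping into levels is my reading of the unified memory section of the CUDA C Programming Guide, so treat it as an assumption rather than an official classification.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: collapse three cudaDeviceProp flags into one level.
//   3: GPU can access ordinary pageable (malloc'd) memory
//   2: managed memory, CPU and GPU may access it concurrently
//   1: managed memory, no concurrent access
//   0: no managed memory support
int memory_support_level(int managedMemory, int concurrentManagedAccess,
                         int pageableMemoryAccess) {
    if (pageableMemoryAccess) return 3;
    if (concurrentManagedAccess) return 2;
    if (managedMemory) return 1;
    return 0;
}

int main() {
    // Labels are illustrative, indexed by the level above.
    const char *labels[] = {
        "no managed memory support",
        "managed memory, no concurrent CPU/GPU access",
        "managed memory, concurrent CPU/GPU access",
        "full unified memory (pageable memory access)",
    };

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        int level = memory_support_level(prop.managedMemory,
                                         prop.concurrentManagedAccess,
                                         prop.pageableMemoryAccess);
        printf("Device %d (%s): %s\n", dev, prop.name, labels[level]);
        printf("  managedMemory=%d concurrentManagedAccess=%d pageableMemoryAccess=%d\n",
               prop.managedMemory, prop.concurrentManagedAccess,
               prop.pageableMemoryAccess);
    }
    return 0;
}
```

Saved under a name of your choosing, say props.cu, this should build with something like nvcc props.cu -o props.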

If you own a CUDA-capable Nvidia GPU, I invite you to clone this repo and try it out yourself. It does require that you download and install the Nvidia HPC SDK in order to use nvfortran and nvcc, but it's free and a high-quality SDK you should have in your toolbox (instructions and code in repo):

$ git clone https://github.com/pmekhedjian/cuda-device-properties

Or visit the page here: https://github.com/pmekhedjian/cuda-device-properties

I do hope this piques your interest in CUDA and GPU computing!

Next up (probably): Why heterogeneous memory management (HMM) in Linux will turn your CUDA Managed Memory GPU into a CUDA Unified Memory GPU. Is that really possible?!

References:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/

https://developer.nvidia.com/blog/unified-memory-cuda-beginners

https://developer.nvidia.com/blog/easy-introduction-cuda-fortran

https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide

https://github.com/romerojosh/CUDA-Fortran-Tutorial

https://github.com/Koushikphy/Intro-to-CUDA-Fortran