
On building PyTorch libs in Docker

Recently I've been building containerized apps written in Caffe2/PyTorch. One of them had a dependency on a third-party API with some custom PyTorch modules built via torch.utils.ffi. This is a three-step process:

  1. nvcc compiles the CUDA code and builds a shared object.
  2. PyTorch utils create an FFI object.
  3. A subclass of torch.nn.Module wraps the implementation (see the sketch after this list).
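
For context, the wrapper from step 3 can look roughly like the sketch below. The names here (_ext.my_lib, my_op_forward, MyCustomOp) are hypothetical placeholders, not the actual API of the third-party package:

import torch.nn as nn

# Hypothetical import: the package path is whatever ext_name was passed
# to create_extension() in build.py.
from _ext import my_lib


class MyCustomOp(nn.Module):
    """Thin Python wrapper around the compiled C/CUDA implementation."""

    def forward(self, input):
        output = input.new(*input.size())    # uninitialized tensor of the same type
        my_lib.my_op_forward(input, output)  # hypothetical FFI call that fills `output`
        return output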

Building each external module was organized through build.py files, each of which contained code along these lines:

import torch
import torch.utils.ffi

# CPU-only defaults; the source/header lists and ext_name are defined
# earlier in the real build.py and are elided here.
with_cuda = False

if torch.cuda.is_available():
    sources += sources_list
    headers += headers_list
    with_cuda = True
    ...

ffi = torch.utils.ffi.create_extension(
    ext_name,
    headers=headers,
    sources=sources,
    relative_to=__file__,
    with_cuda=with_cuda,
    ...
)

if __name__ == '__main__':
    ffi.build()

So a CUDA-capable build succeeds only if torch.cuda.is_available() returns True at the moment these extensions are built; otherwise the script proceeds with the default parameters and produces a CPU-only build.

When the extensions are built inside a Docker image, the NVIDIA drivers are only mounted (as a volume) when a container is started. Volumes are not available during the build process, so torch.cuda.is_available() returns False at build time and there is no workaround. This approach is therefore wrong for containerized builds. The explicit check for driver availability is also unnecessary: nvcc doesn't care whether a driver is present, as long as all the CUDA libraries are available.

A Docker-friendly and generally more transparent approach:

  1. Remove the call to torch.cuda.is_available().
  2. Expose a flag for compiling with or without CUDA in the main build script (see the sketch after this list).
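
A minimal sketch of such a build.py, assuming a --with-cuda command-line flag; the source lists and extension name below are hypothetical placeholders for whatever the real build.py uses:

import argparse

import torch.utils.ffi


def build(with_cuda):
    # Hypothetical CPU-only defaults; CUDA sources are added only on request.
    sources = ['src/my_op.c']
    headers = ['src/my_op.h']
    if with_cuda:
        sources += ['src/my_op_cuda.c']
        headers += ['src/my_op_cuda.h']

    ffi = torch.utils.ffi.create_extension(
        '_ext.my_lib',             # hypothetical extension name
        headers=headers,
        sources=sources,
        relative_to=__file__,
        with_cuda=with_cuda,       # decided by the caller, not by driver detection
    )
    ffi.build()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--with-cuda', action='store_true',
                        help='compile the CUDA sources without checking torch.cuda.is_available()')
    build(parser.parse_args().with_cuda)

The main script (or the Dockerfile's RUN step) decides whether to pass --with-cuda, and a missing CUDA toolkit makes nvcc fail immediately instead of silently falling back to a CPU-only build.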

This ensures compliance with the "fail fast" principle and saves a lot of time, especially if you have many custom modules.
