On building PyTorch libs in Docker
Recently I've been building containerized apps written in Caffe2/PyTorch. One of them had a dependency on a third-party API with some custom PyTorch modules built via `torch.utils.ffi`. This is a three-step process:
- `nvcc` compiles the CUDA code and builds a shared object.
- PyTorch utils create an FFI object.
- A subclass of `torch.nn.Module` wraps the implementation.
The build of each external module was organized through `build.py` files, each of which contained the following code:
```python
if torch.cuda.is_available():
    sources += sources_list
    headers += headers_list
    with_cuda = True

...

ffi = torch.utils.ffi.create_extension(
    ext_name,
    headers=headers,
    sources=sources,
    relative_to=__file__,
    with_cuda=with_cuda,
    ...
)

if __name__ == '__main__':
    ffi.build()
```
So a CUDA-capable build will succeed only if `torch.cuda.is_available()` returns `True` at the moment these extensions are built. Otherwise the script silently proceeds with the default parameters for a CPU-only build.
When the extensions are built inside Docker, this breaks: the drivers are mounted through a Docker volume only when the container is started, and volumes are not available during `docker build`, so there is no workaround. This approach is obviously incorrect in that setting. The explicit check for driver availability is also unnecessary: `nvcc` doesn't care whether a driver is present, as long as all the libraries are available.
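For instance, a build stage based on a CUDA toolkit image can run `nvcc` during `docker build` with no GPU or driver in sight; a minimal sketch, where the image tag, file paths, and output name are illustrative, not taken from the original project:

```dockerfile
# The devel image ships the full CUDA toolkit, including nvcc; no GPU
# driver is needed at build time -- the driver matters only at run time,
# when the container is started with GPU access.
FROM nvidia/cuda:9.0-devel-ubuntu16.04

COPY src/ /app/src/
WORKDIR /app

# Compiles fine inside `docker build`, even though no GPU is visible here.
RUN nvcc --shared -Xcompiler -fPIC -o ext.so src/ext.cu
```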
A Docker-friendly and generally more transparent approach:
- Remove the call to `torch.cuda.is_available()` from the build scripts.
- Expose the flag for compiling with/without CUDA in the main script.
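A `build.py` following these two points might look like the sketch below. It is hypothetical: the `--with-cuda` flag name, the `resolve_build_config` helper, and the file paths are mine, not from the original code. The point is that the build is driven by an explicit flag and fails fast when a CUDA build is requested but the toolchain is missing, instead of silently degrading to CPU-only.

```python
import argparse
import shutil


def resolve_build_config(with_cuda, cpu_sources, cpu_headers,
                         cuda_sources, cuda_headers):
    """Pick source/header lists based on an explicit flag.

    Fail fast: if a CUDA build is requested but nvcc is not on PATH,
    raise immediately instead of falling back to a CPU-only build.
    """
    if with_cuda and shutil.which('nvcc') is None:
        raise RuntimeError('CUDA build requested but nvcc was not found')
    sources = list(cpu_sources)
    headers = list(cpu_headers)
    if with_cuda:
        sources += cuda_sources
        headers += cuda_headers
    return sources, headers, with_cuda


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--with-cuda', action='store_true',
                        help='build the CUDA variant of the extension')
    args = parser.parse_args()
    sources, headers, with_cuda = resolve_build_config(
        args.with_cuda,
        ['src/ext.c'], ['include/ext.h'],            # hypothetical paths
        ['src/ext_cuda.c'], ['include/ext_cuda.h'])
    # torch.utils.ffi.create_extension(...) would be called here with
    # sources, headers, and with_cuda, exactly as in the original build.py.
```

No driver probing happens at all: whoever runs the build (a person or a Dockerfile `RUN` line) states the intent explicitly, and an impossible request is rejected immediately.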
This ensures compliance with the "fail fast" principle and saves a lot of time, especially if you have a lot of custom modules.