Docker workarounds

When running intensive code, e.g. PyTorch data loaders, errors may occur inside a Docker container because of its shared memory limit. You may see errors such as "Unable to open shared memory object </torch_3500_2599739126>". To work around them, add

--ipc=host

to your docker create or run command.

An example of a complete command for one of our machines:

docker create --name maskrcnn-devel-mike --runtime=nvidia --mount type=bind,source=/usr/local/data/msmith,target=/usr/local/data/msmith --mount type=bind,source=/home/vision/msmith,target=/home/vision/msmith --mount type=bind,source=/usr/local/data2/msmith,target=/usr/local/data2/msmith --privileged --ipc=host -p 8850:8850 -it nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 bash
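
To check whether shared memory is the limiting factor, it can help to look at the size of /dev/shm from inside the container. Docker gives a container 64 MB there by default, which data loader workers can exhaust quickly. The sketch below is an illustration rather than part of our setup: shm_size_mib is a hypothetical helper name, and it simply reads the filesystem size reported by df.

```shell
# Hypothetical helper: print the size of a mounted filesystem in MiB.
# Defaults to /dev/shm, where shared memory objects live on Linux.
shm_size_mib() {
    target="${1:-/dev/shm}"
    # -P forces one line per filesystem; -k reports sizes in 1 KiB blocks.
    df -Pk "$target" | awk 'NR == 2 { printf "%d\n", $2 / 1024 }'
}

# Only query /dev/shm if it exists on this system.
if [ -d /dev/shm ]; then
    shm_size_mib
fi
```

With --ipc=host the container uses the host's shared memory, so the reported size should match the host. If sharing the host IPC namespace is not desirable, Docker's --shm-size option (for example --shm-size=8g, a value chosen here only for illustration) raises the limit for a single container instead.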

GPU Benchmarking

Here are some basic numbers for our systems that may be generally useful. The benchmark ran a ResNet50 model using tensorflow-nightly as of April 22, 2019, with the batch size set to 32 or 64 depending on what the GPU can fit. Values reported are images/sec.

Float type / GPU    Quadro P6000    K40    2080 Ti
32 bit float        213             30     300
16 bit float        274             33     496

So the 2080 Ti is 10-15x faster than the K40, while the P6000 is 7-8x faster than the K40. The 2080 Ti is 1.4-1.8x faster than the P6000, but the P6000’s advantage is its 24GB of memory, which is necessary for some models.

Lastly, the P6000 can gain a few more images/sec with even larger batch sizes, but the general trends hold.
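
The speedup ratios quoted above follow directly from the measured throughput. As a quick check, the snippet below recomputes them with awk from the six images/sec values (Quadro P6000: 213 and 274, K40: 30 and 33, 2080 Ti: 300 and 496). The function name speedup is only for illustration.

```shell
# Print how many times faster throughput $1 is than throughput $2.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

echo "2080 Ti vs K40,   32 bit: $(speedup 300 30)x"    # 10.0x
echo "2080 Ti vs K40,   16 bit: $(speedup 496 33)x"    # 15.0x
echo "P6000   vs K40,   32 bit: $(speedup 213 30)x"    # 7.1x
echo "P6000   vs K40,   16 bit: $(speedup 274 33)x"    # 8.3x
echo "2080 Ti vs P6000, 32 bit: $(speedup 300 213)x"   # 1.4x
echo "2080 Ti vs P6000, 16 bit: $(speedup 496 274)x"   # 1.8x
```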

Docker commands

These commands will likely be useful for anyone doing ML and needing to use different libraries and/or not conflict with other users on the same machine. They are targeted towards our lab but may be generally useful.

1. Create container.   

  • Container name is up to you.  To make Nick’s life easier, I recommend adding your name to it so he knows who uses it. 
  • --runtime=nvidia says to allow GPU access
  • The two --mount options allow access to the home folder and the local hard drive
  • Generally speaking, the hard drive on the machine is for datasets and other large things that belong on the hard drive and not the SSD, while the home folder is for code and other small things that need to be backed up
  • On machines with additional drives, e.g. soma with /usr/local/data2, add a mount for those as well
  • Remember to replace <user> with your username! 
  • The -p argument specifies port forwarding, which is only really useful if you run web-based programs like Jupyter notebook.  Some images, such as tensorflow, specify some ports by default; others, like pytorch, don’t.
  • -it says to use an interactive session 
  • <image>, the second-to-last parameter, specifies the image to download from the internet and use, e.g. the prebuilt pytorch or tensorflow images.  This is the key part, and it depends on what you want to use; in the first example below, I’ve specified the latest version of tensorflow with GPU support for Python 3.  You can of course use other images such as a basic CUDA install (e.g. nvidia/cuda:latest) or PyTorch (pytorch/pytorch:latest)
  • The last argument is what to run when the container starts; we just run bash to let you use it interactively 

Format 

docker create --name <container name> --runtime=nvidia --mount type=bind,source=/usr/local/data/<user>,target=/usr/local/data/<user> --mount type=bind,source=/home/vision/<user>,target=/home/vision/<user> -p <port>:<port> -it <image> bash 

Examples 

docker create --name tensorflow-test --runtime=nvidia --mount type=bind,source=/usr/local/data/msmith,target=/usr/local/data/msmith --mount type=bind,source=/home/vision/msmith,target=/home/vision/msmith -p 8888:8888 -it tensorflow/tensorflow:latest-gpu-py3 bash 

or 

docker create --name pytorch-test --runtime=nvidia --mount type=bind,source=/usr/local/data/msmith,target=/usr/local/data/msmith --mount type=bind,source=/home/vision/msmith,target=/home/vision/msmith -p 8889:8889 -it pytorch/pytorch:nightly-runtime-cuda9.2-cudnn7 bash 
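
Because the username appears four times in the command, it is easy to miss one when editing by hand. One option is a small shell function that fills in the format above and prints the resulting command for review. This is only a sketch: make_create_cmd is a hypothetical helper, not something installed on our machines, and it prints the command rather than running it.

```shell
# Hypothetical helper: print (not run) a docker create command that
# follows the format above.
# Arguments: container name, username, port, image.
make_create_cmd() {
    name="$1"; user="$2"; port="$3"; image="$4"
    printf 'docker create --name %s --runtime=nvidia' "$name"
    printf ' --mount type=bind,source=/usr/local/data/%s,target=/usr/local/data/%s' "$user" "$user"
    printf ' --mount type=bind,source=/home/vision/%s,target=/home/vision/%s' "$user" "$user"
    printf ' -p %s:%s -it %s bash\n' "$port" "$port" "$image"
}

make_create_cmd tensorflow-test msmith 8888 tensorflow/tensorflow:latest-gpu-py3
```

The call above prints the same command as the tensorflow example. To create the container, paste the printed command into the terminal.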

2. Start container 

docker start <container name> 
e.g. docker start tensorflow-test 

3. “Attach” your terminal to the container, similar to byobu/tmux

docker attach <container name>
e.g. docker attach tensorflow-test

4. At this point, you can run your experiments within this container.  Note that you may have to install some software/libraries manually, at least for non-standard Python packages.  These will not interfere with the system versions, though, and if you manage to get the container into an unrecoverable state you can delete it and start over.

You are also free to use apt as normal within the container; it will not affect the underlying system.
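
Starting and attaching (steps 2 and 3) are usually done together, so they can be wrapped in one shell function. The sketch below uses a hypothetical function name, enter_container, that you would add to your own ~/.bashrc on the host; the docker start and docker attach commands themselves are the ones described above.

```shell
# Hypothetical helper: start a container if needed, then attach to it.
enter_container() {
    name="$1"
    # docker start also succeeds if the container is already running;
    # its output (the container name) is discarded.
    docker start "$name" > /dev/null && docker attach "$name"
}
```

For example, enter_container tensorflow-test. If you need a second shell in a container that is already running, docker exec -it <container name> bash opens one without sharing the attached session.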

5. You can leave the container running, similar to tmux/byobu, by detaching with Ctrl-P then Ctrl-Q.  Ctrl-D will exit bash, which stops the container; it can then be restarted later.

You may still want to use byobu or tmux within the container though.

6. Stopping a container manually:

docker stop <container name>
e.g. docker stop tensorflow-test

7. If you want to delete a container:

docker rm <container name> 
e.g. docker rm tensorflow-test 

Note that this is exactly like deleting a virtual machine, so any installed software on the container will be deleted. 
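
Since deletion cannot be undone, it can be worth listing the containers that match your name before removing anything. The sketch below assumes you followed the naming advice from step 1 and put your name in the container name; list_containers is a hypothetical helper name. docker ps -a also shows stopped containers, and the name filter matches substrings.

```shell
# Hypothetical helper: list all containers (running or stopped) whose
# name contains the given text, with their image and status.
list_containers() {
    pattern="$1"
    docker ps -a --filter "name=$pattern" \
        --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'
}
```

For example, list_containers mike would show maskrcnn-devel-mike if that container exists.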

Using remote file mounting 

If you need to access files on a remote device, e.g. the Synology, from within the container, the easiest solution is to add --privileged to the docker create command.  You can then modify /etc/fstab in the container and mount the folder without any danger of destroying the host system.  The /etc/fstab file has one entry per line; for the Synology, a line is formatted as follows:

<remote mount location> <local mount location> cifs rw,user,username=<your username>,noauto,vers=3.02 0 0 

Before you mount the folder, you may need to run 

apt-get install cifs-utils 

in the container, depending on how the image was built.  Following that, you can mount the folder using the command

mount <local mount location> 

and entering your password.  It will stay mounted until the container is stopped.

An example of /etc/fstab (should be all one line): 

//10.45.0.1/APL      /APL     cifs    rw,user,username=msmith,noauto,vers=3.02        0       0 

Note that the remote mount location varies by host because of the way the Synology is networked to the computers.  Using the above APL shared folder as an example, here is what to use on each host:

godiva: //10.45.0.1/APL 

richart: //10.46.0.1/APL 

debondt: //10.44.0.1/APL 

All other computers: //apl.cim.mcgill.ca/APL 

Any folder you can see in File Station in the web interface is mountable this way.  This includes your home folder, which is specific to you and not visible to anyone else.
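
The host-specific addresses above lend themselves to a small lookup. The following sketch prints an /etc/fstab line for a given host, share, mount point and username. The function names synology_remote and fstab_line are hypothetical, and the addresses are the ones listed above.

```shell
# Hypothetical helper: map a host name to the Synology address it should use.
synology_remote() {
    host="$1"; share="$2"
    case "$host" in
        godiva)  addr="10.45.0.1" ;;
        richart) addr="10.46.0.1" ;;
        debondt) addr="10.44.0.1" ;;
        *)       addr="apl.cim.mcgill.ca" ;;
    esac
    printf '//%s/%s\n' "$addr" "$share"
}

# Hypothetical helper: print one fstab entry in the format shown earlier.
# Arguments: host name, share name, local mount point, username.
fstab_line() {
    host="$1"; share="$2"; mountpoint="$3"; user="$4"
    printf '%s %s cifs rw,user,username=%s,noauto,vers=3.02 0 0\n' \
        "$(synology_remote "$host" "$share")" "$mountpoint" "$user"
}

fstab_line godiva APL /APL msmith
# prints: //10.45.0.1/APL /APL cifs rw,user,username=msmith,noauto,vers=3.02 0 0
```

Inside the container, the output could be appended to /etc/fstab, for instance with fstab_line godiva APL /APL msmith >> /etc/fstab, after which mount /APL works as described above. The mount point directory must exist first (mkdir -p /APL).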