I’ve got Jupyter/JupyterLab and Slurm working, but I’m not sure how to fully build out what I need, namely TensorFlow GPU for notebooks and Slurm. I have Anaconda installed on the nodes, but between Jupyter, kernels, conda, and Lmod, I’m not sure how to put it all together. Can someone point me to any references that might help (or even just advice/guidance)? I’ve never tried to set up these things for a clustered environment before. Thanks.
Do you have TensorFlow already installed on your cluster?
We have to use a Singularity container here for TensorFlow. So, we installed Jupyter in that same container, and then used that container as the backend for our OOD Jupyter app. Jupyter launches within the container environment, and the TensorFlow package (in the container) can be imported.
It’s actually an NVIDIA DeepOps POC on some OpenStack nodes, so technically TensorFlow is available via nvidia-docker. Can you post the relevant parts of your Jupyter app code/config/yaml? I might be able to translate it for nvidia-docker. Thanks…
Using Singularity, there’s only one change required to the template/script.sh.erb file.
Replace:
jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
With:
singularity exec /path/to/my/TF_container.sif jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
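Since the original question was about TensorFlow on GPUs, it may be worth noting that Singularity has an `--nv` option on `exec` that makes the host's NVIDIA driver and GPU devices available inside the container. The following is only a dry-run sketch, not the actual template: `gpu_launch_cmd` is a hypothetical helper, the container path is the placeholder from above, and the default `CONFIG_FILE` is an assumption (in the real app, Open OnDemand sets it). It prints the command rather than running it, so it can be tried on a machine without Singularity.

```shell
# Sketch only: CONTAINER and the fallback CONFIG_FILE are placeholder values.
CONTAINER="/path/to/my/TF_container.sif"
CONFIG_FILE="${CONFIG_FILE:-$PWD/config.py}"

# Hypothetical helper: print the launch command for a given image and config.
# --nv asks Singularity to expose the host's NVIDIA driver libraries and
# GPU devices inside the container.
gpu_launch_cmd() {
  printf 'singularity exec --nv %s jupyter notebook --config=%s\n' "$1" "$2"
}

# Dry run: show what would be executed on the compute node.
gpu_launch_cmd "$CONTAINER" "$CONFIG_FILE"
```

In the real template/script.sh.erb, the only difference from the line above would be the added `--nv`; whether TensorFlow then sees the GPU also depends on the job having been allocated one by Slurm.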
Thanks… that was pretty easy to get set up. I did it like this to also provide JupyterLab:
singularity exec /path/to/my/TF_container.sif \
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> \
--ip="0.0.0.0" \
--config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
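For anyone reading along: the ERB ternary above is evaluated when Open OnDemand renders the template, so the submitted job script contains either `lab` or `notebook` and no conditional at all. A rough plain-shell equivalent of that decision, useful for checking the logic outside of OOD, might look like this. `jupyter_subcommand` is a hypothetical helper name, and the assumption is that the form checkbox yields the string "1" when ticked, as in the snippet above.

```shell
# Hypothetical helper: map the form's checkbox value to a Jupyter subcommand.
# Anything other than the string "1" falls back to the classic notebook.
jupyter_subcommand() {
  if [ "$1" = "1" ]; then
    echo "lab"
  else
    echo "notebook"
  fi
}

jupyter_subcommand "1"   # prints: lab
jupyter_subcommand "0"   # prints: notebook
jupyter_subcommand ""    # prints: notebook
```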
Also, a form tweak is required for JupyterLab: JupyterLab Installed
By the way, I notice you are exec’ing a container. I’m new to Singularity, but wouldn’t that normally be an image in a multiuser environment? I’m guessing there’s something I don’t understand. Thanks…
Glad to hear it worked. Not sure I follow your Singularity question. I’m definitely not a Singularity expert but happy to explain if you can provide more context.
Never mind! I was looking at an old video that used a .simg extension when pulling from a Docker repo. I think that’s legacy terminology, now replaced by .sif.