I’m finally coming back to this. I’m fairly sure I’m close, but I still can’t make it work. I’m not clear on which piece runs where: on the OnDemand server, on the compute node, or on the compute node inside Singularity.
We have our image on a shared filesystem that is mounted on both the OnDemand server and all compute nodes, and both websockify and vncpasswd/vncserver are also on this shared filesystem. They’re not in the Singularity image at all.
Here’s my form, in /etc/ood/config/apps/bc_desktop:
title: "Amarel Desktop"
cluster: "amarel"
submit: submit/container.yml.erb
attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null
  memory_gigs:
    widget: "number_field"
    label: "Gigabytes of memory"
    value: 4
    help: |
      Number of gigabytes of memory (larger values may mean longer wait)
    min: 1
    max: 100
    step: 1
form:
  - bc_vnc_idle
  - desktop
  # - bc_account
  - bc_num_hours
  # - bc_num_slots
  - num_cores
  - memory_gigs
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started
  - reservation
And then here’s my submit/container.yml.erb:
<%
  image="/projects/community/containers/mate_desktop_v2.img"
%>
---
script:
  native:
    - "-c"
    - "<%= num_cores.blank? ? 1 : num_cores.to_i %>"
    - "--mem=<%= memory_gigs %>G"
  template: "vnc"
batch_connect:
  websockify_cmd: '/projects/community/containers/bin/websockify'
  script_wrapper: |
    cat << "CTRSCRIPT" > container.sh
    export PATH="$PATH:/projects/community/containers/bin"
    module purge
    module load singularity
    %s
    CTRSCRIPT
    module purge
    module load singularity
    export SINGULARITY_BINDPATH="/run/munge:/run/munge,/tmp/$ID:/run/user,/projects,/scratch,/cache/sw:/opt/sw,/cache,/cache/home:/home,/projectsp,/projectsn,/projectsc,/etc/slurm"
    singularity exec <%= image %> /bin/bash container.sh
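As far as I understand it (this is my reading, not something I have confirmed), OnDemand substitutes the generated VNC job script for %s, and because the heredoc delimiter is quoted, nothing between the two CTRSCRIPT lines is expanded when container.sh is written. $PATH should only be resolved later, when container.sh runs inside the container. A minimal sketch of that behaviour, where the temporary directory and the echo line are mine and only there for illustration:

```shell
#!/bin/sh
# Illustration only: shows that a quoted heredoc delimiter defers expansion.
workdir=$(mktemp -d)

# Quoted delimiter: $PATH is written literally, not expanded by this shell.
cat << "CTRSCRIPT" > "$workdir/container.sh"
export PATH="$PATH:/projects/community/containers/bin"
echo "last PATH entry: ${PATH##*:}"
CTRSCRIPT

# Count the lines that still contain the literal text $PATH.
grep -c 'PATH="\$PATH:' "$workdir/container.sh"

# Expansion happens only when the generated script is actually run.
sh "$workdir/container.sh"
```

If that is right, everything between the CTRSCRIPT lines runs inside the container, and everything after the closing CTRSCRIPT runs on the compute node outside it.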
What I get with this setup is something I don’t understand in output.log:
Setting VNC password...
ERROR : Failed to set effective UID to 0
Starting VNC server...
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
ERROR : Failed to set effective UID to 0
Cleaning up...
ERROR : Failed to set effective UID to 0
This message appears to come from Singularity, but I don’t fully understand it.
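My guess, and it is only a guess, is that the vncpasswd and vncserver wrappers call singularity while already running inside the container started by singularity exec, where raising privileges is not allowed. To check which layer each command actually runs in, I could drop something like the following into both the outer wrapper and container.sh. The function names and CONTAINER_MARKER_DIR are hypothetical, and the detection assumes Singularity sets SINGULARITY_CONTAINER and creates /.singularity.d inside the container:

```shell
#!/bin/sh
# Hypothetical helpers (names are mine): report which layer a script is running in.

# Assumption: Singularity sets SINGULARITY_CONTAINER inside a container and
# creates /.singularity.d there. CONTAINER_MARKER_DIR exists only so the
# directory check can be pointed elsewhere when testing.
in_container() {
  [ -n "${SINGULARITY_CONTAINER:-}" ] || [ -d "${CONTAINER_MARKER_DIR:-/.singularity.d}" ]
}

where_am_i() {
  if in_container; then echo "container"; else echo "host"; fi
}

# One line per layer in output.log: node name, layer, and whether singularity resolves.
echo "$(uname -n): $(where_am_i); singularity: $(command -v singularity || echo 'not found')"
```

If the line printed from container.sh says "container" and still resolves a singularity binary, that would support the nested-invocation guess.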
If I remove module purge; module load singularity from the line above %s, I get the below in output.log:
Setting VNC password...
/projects/community/containers/bin/vncpasswd: line 3: singularity: command not found
Starting VNC server...
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
If I put it back but instead remove it from below CTRSCRIPT, I get the following in output.log:
/var/lib/slurm/slurmd/job14618206/slurm_script: line 216: singularity: command not found
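Taken together, the two "command not found" cases suggest that the outer job script and container.sh each need their own module load. A small guard like this in each layer would at least make it obvious which one is missing the command. It is a sketch, and require_cmd is a hypothetical helper name, not something OnDemand or Slurm provides:

```shell
#!/bin/sh
# Hypothetical helper (name is mine): fail early with a clear message when a
# command is not on PATH, instead of a bare "command not found" later on.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "ERROR: '$1' not found on $(uname -n); PATH=$PATH" >&2
    return 1
  fi
}

# Intended use, after 'module load singularity' in each layer:
#   require_cmd singularity || exit 1
```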
I’m not totally sure what is happening. I’d appreciate any pointers you might be able to provide.