Interactive app - Desktop: cannot find command

After I submitted a desktop app to Slurm through the OOD web portal, I got an error in the output.log file:

/var/spool/slurmd/job00019/slurm_script: line 130: hostname: command not found
Setting VNC password...
/var/spool/slurmd/job00019/slurm_script: line 121: head: command not found
/var/spool/slurmd/job00019/slurm_script: line 121: head: command not found
Starting VNC server...
/var/spool/slurmd/job00019/slurm_script: line 165: seq: command not found
Cleaning up...
/var/spool/slurmd/job00019/slurm_script: line 26: awk: command not found
/usr/bin/env: perl: No such file or directory
/var/spool/slurmd/job00019/slurm_script: line 30: pkill: command not found

And when I tried to fix the environment by modifying the template file, I could not find the template file under the “/var/www/ood/apps/sys” path.
So what should I do to solve this problem?

There are a couple of different ways to set the environment during a job’s execution.

By default we use --export=NONE when we submit jobs. I wonder what the behaviour is for you when you submit jobs with --export=NONE from your shell? What is the PATH in either case?
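One quick way to answer that question is a probe script that only prints PATH (delme-path.sh is a hypothetical name, not something shipped with OOD):

```shell
#!/bin/bash
# delme-path.sh: print the PATH the job actually sees
echo "PATH=$PATH"
```

Submit it both ways (`sbatch delme-path.sh` and `sbatch --export=NONE delme-path.sh`) and compare the two output files.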

One thing I’d ask is whether you’re using srun or something like that, because it’s usually only when we try to issue an srun command that you see an error like this.

This configuration here will submit all your jobs with --export=ALL.

# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "my_cluster.my_center.edu"
  job:
    adapter: "slurm"
    copy_environment: true

Or you can try what we do, where we add export SLURM_EXPORT_ENV=ALL in the before_script itself.

I had added it before I submitted the bc_desktop job.

# /etc/ood/config/apps/bc_desktop/submit/slurm-dev-submit.yml.erb
---
batch_connect:
  before_script: |
    # Export the module function if it exists
    [[ $(type -t module) == "function"  ]] && export -f module

    # MATE acts strange in pitzer-exp and doesn't like /var/run/$(id -u)
    export XDG_RUNTIME_DIR="$TMPDIR/xdg_runtime"

    # reset SLURM_EXPORT_ENV so that things like srun & sbatch work out of the box
    export SLURM_EXPORT_ENV=ALL

    # set profile
    source /etc/profile

script:
  native:
    - "-N"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
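The export -f guard in that before_script is worth noting: it only exports module when the name is actually defined as a shell function, so the line is a no-op on nodes without a modules setup. A standalone sketch (module here is a stand-in function, not the real environment-modules command):

```shell
#!/bin/bash
# Stand-in for the real 'module' shell function provided by environment-modules
module() { echo "module called with: $*"; }

# Export the function only if it is defined as a function (same guard as the before_script)
[[ $(type -t module) == "function" ]] && export -f module

# A child bash process now inherits the function definition
bash -c 'module load gcc'   # prints: module called with: load gcc
```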



and following this instruction, I updated my cluster configuration:

# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "Cluster-DEV"
  login:
    host: "192.168.19.78."
  job:
    adapter: "slurm"
    bin: "/usr/bin/"
    conf: "/etc/slurm/slurm.conf"
    copy_environment: true // updated
  batch_connect:
    copy_environment: true //updated
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        source /etc/profile //updated
        source ~/.bash_profile //updated
        export PATH="/usr/local/turbovnc/bin:/opt/TurboVNC/bin"
        export WEBSOCKIFY_CMD="/usr/local/websockify/run"
        %s
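For context on the %s in those wrappers: OOD substitutes the generated job script body into the wrapper at that placeholder, so everything before %s runs first. A rough printf-based sketch of the idea (not OOD's actual implementation):

```shell
#!/bin/bash
# Conceptual sketch: the script_wrapper is a template and %s is replaced
# by the generated batch-connect script body before submission.
wrapper='module purge
source /etc/profile
%s'
body='echo "job body runs here"'

# Build the final script text (we only print it here; we do not execute it)
printf "$wrapper\n" "$body"
```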

but I got more errors when I tried to submit a bc_desktop job:

/usr/libexec/grepconf.sh: line 5: grep: command not found
/usr/libexec/grepconf.sh: line 5: grep: command not found
/var/spool/slurmd/job00021/slurm_script: line 131: hostname: command not found
Setting VNC password...
/var/spool/slurmd/job00021/slurm_script: line 122: head: command not found
/var/spool/slurmd/job00021/slurm_script: line 122: head: command not found
Starting VNC server...
/var/spool/slurmd/job00021/slurm_script: line 166: seq: command not found
Cleaning up...
/var/spool/slurmd/job00021/slurm_script: line 27: awk: command not found
/usr/bin/env: perl: No such file or directory
/var/spool/slurmd/job00021/slurm_script: line 31: pkill: command not found

Emmm, when I submit a normal shell job, it can be queued, started, run, and completed successfully.
I can use hostname/grep/awk and the other commands in a normal shell job.

OK, how about this. Are you a csh user with something csh-specific in your ~/.profile?

All our interactive apps use /bin/bash as the shell, so if you have something un-bash-like in a common file like ~/.profile (common to both bash and csh), then you could run into some oddities in your environment.
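As an illustration of the kind of oddity meant here: a csh builtin such as setenv in a file that bash ends up sourcing produces exactly this style of "command not found" error:

```shell
#!/bin/bash
# 'setenv' is a csh builtin; bash has no such command, so running a
# csh-flavored profile line from bash fails with "command not found"
bash -c 'setenv FOO bar' 2>&1 | grep -o 'setenv: command not found'
```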

No, I am not. I just use bash. And I added a printenv command to the script_wrapper:

SLURM_NODELIST=master
SLURM_JOB_NAME=sys/dashboard/sys/bc_desktop/form
XDG_SESSION_ID=c26
SLURMD_NODENAME=master
SLURM_TOPOLOGY_ADDR=master
HOSTNAME=master
SLURM_PRIO_PROCESS=0
SLURM_NODE_ALIASES=(null)
SLURM_EXPORT_ENV=NONE
SHELL=/bin/bash
SLURM_JOB_QOS=normal
HISTSIZE=1000
TMPDIR=/tmp
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_NNODES=1
USER=dev
SLURM_JOBID=29
SLURM_TASKS_PER_NODE=1
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/dev/.local/bin:/home/dev/bin
MAIL=/var/spool/mail/dev
SLURM_WORKING_CLUSTER=cluster:192.168.19.244:6817:9472:109
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_ID=29
SLURM_JOB_USER=dev
PWD=/home/dev/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/form/output/bf2e442b-969c-4293-ba24-ba6a283df3c2
LANG=en_US.UTF-8
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
SLURM_JOB_UID=1001
LOADEDMODULES=
SLURM_NODEID=0
SLURM_SUBMIT_DIR=/var/www/ood/apps/sys/dashboard
SLURM_TASK_PID=6824
SLURM_CPUS_ON_NODE=1
SLURM_PROCID=0
ENVIRONMENT=BATCH
HISTCONTROL=ignoredups
SLURM_JOB_NODELIST=master
HOME=/home/dev
SHLVL=2
SLURM_LOCALID=0
SLURM_GET_USER_ENV=1
SLURM_JOB_GID=1001
SLURM_JOB_CPUS_PER_NODE=1
SLURM_CLUSTER_NAME=cluster
SLURM_GTIDS=0
SLURM_SUBMIT_HOST=node02
SLURM_JOB_PARTITION=compute1
LOGNAME=dev
SLURM_JOB_NUM_NODES=1
MODULESHOME=/usr/share/Modules
LESSOPEN=||/usr/bin/lesspipe.sh %s
XDG_RUNTIME_DIR=/run/user/1001
BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
}
_=/bin/printenv

Are there any problems?

Let’s remove OOD from the situation because you seem to have a very small/new cluster.

Are you sure everything works as it should from the command line? What happens for something similar to this? Are you able to correctly submit a very simple script (with and without the --export flag) that has, say, hostname?

[johrstrom ~()]  cat delme.sh 
#!/bin/bash

hostname
[johrstrom ~()]  sbatch delme.sh 
Submitted batch job 19315661
[johrstrom ~()]  cat slurm-19315661.out 
o0646.ten.osc.edu
[johrstrom ~()]  sbatch --export=NONE delme.sh 
Submitted batch job 19315665
[johrstrom ~()]  cat slurm-19315665.out 
o0646.ten.osc.edu

Also - # starts a comment in YAML, not //. I don’t know if you added the //updated after the fact or what, but YAML will complain about that if you did.

I think I’m sure. It really works.

[root@node02 dev]# cat myjob.sh
#!/bin/bash

hostname

ip addr |grep -v inet6 |grep inet |grep eth0


[root@node02 dev]# sbatch myjob.sh
Submitted batch job 33
[root@node02 dev]# cat slurm-33.out
master
    inet 192.168.19.244/24 brd 192.168.19.255 scope global noprefixroute eth0
[root@node02 dev]# sbatch --export=NONE myjob.sh
Submitted batch job 34
[root@node02 dev]# cat slurm-34.out
master
    inet 192.168.19.244/24 brd 192.168.19.255 scope global noprefixroute eth0
[root@node02 dev]# sbatch --export=ALL myjob.sh
Submitted batch job 35
[root@node02 dev]# cat slurm-35.out
master
    inet 192.168.19.244/24 brd 192.168.19.255 scope global noprefixroute eth0
[root@node02 dev]#

OK, I can see that node02 must be your web server and that you’ve submitted from the same node. So that’s all good.

I know I’m stuck on this export thing, but I would like to confirm that when you submit
sbatch --export=NONE myjob.sh you don’t have any SLURM env variables defined that could override this, specifically SLURM_EXPORT_ENV.

I’ll keep digging on the slurm side.
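One way to confirm that is a quick check of the submitting shell itself (not any job):

```shell
#!/bin/bash
# List any SLURM_* variables already set in the submitting shell.
# SLURM_EXPORT_ENV set here would silently override sbatch's --export flag.
if env | grep '^SLURM_'; then
  echo "found SLURM_* variables; unset them before re-testing"
else
  echo "no SLURM_* variables set"
fi
```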

I don’t think it can be overridden. I modified myjob.sh and submitted it again; here are the results:

[root@node02 dev]# cat myjob.sh
#!/bin/bash

#hostname

#ip addr |grep -v inet6 |grep inet |grep eth0

printenv

[root@node02 dev]# sbatch --export=NONE myjob.sh
Submitted batch job 39
[root@node02 dev]# sbatch  myjob.sh
Submitted batch job 40
[root@node02 dev]# diff slurm-39.out slurm-40.out
3c3
< XDG_SESSION_ID=c30
---
> XDG_SESSION_ID=243
9d8
< SLURM_EXPORT_ENV=NONE
10a10
> TERM=xterm
14a15,16
> SSH_CLIENT=192.168.19.5 44014 22
> SSH_TTY=/dev/pts/0
17c19,20
< SLURM_JOBID=39
---
> LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
> SLURM_JOBID=40
19c22
< PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
---
> PATH=/home/ruby-3.0.0/bin:/home/node-v14.18.1-linux-x64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
23,24c26,27
< SLURM_JOB_ID=39
< _=/bin/printenv
---
> SLURM_JOB_ID=40
> _=/usr/bin/printenv
33c36
< SLURM_TASK_PID=9728
---
> SLURM_TASK_PID=9749
34a38
> RUBY_HOME=/home/ruby-3.0.0
42d45
< SLURM_GET_USER_ENV=1
50a54
> SSH_CONNECTION=192.168.19.5 44014 192.168.19.250 22
53a58
> DISPLAY=localhost:10.0
54a60
> NODE_HOME=/home/node-v14.18.1-linux-x64

And in slurm-39.out, which was submitted with --export=NONE, I can see that SLURM_EXPORT_ENV is NONE.

SLURM_NODELIST=master
SLURM_JOB_NAME=myjob.sh
XDG_SESSION_ID=c30
SLURMD_NODENAME=master
SLURM_TOPOLOGY_ADDR=master
HOSTNAME=master
SLURM_PRIO_PROCESS=0
SLURM_NODE_ALIASES=(null)
SLURM_EXPORT_ENV=NONE
SHELL=/bin/bash
SLURM_JOB_QOS=normal
HISTSIZE=1000
TMPDIR=/tmp
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_NNODES=1
USER=root
SLURM_JOBID=39
SLURM_TASKS_PER_NODE=1
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
MAIL=/var/spool/mail/root
SLURM_WORKING_CLUSTER=cluster:192.168.19.244:6817:9472:109
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_ID=39
_=/bin/printenv
SLURM_JOB_USER=root
PWD=/home/dev
LANG=en_US.UTF-8
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
SLURM_JOB_UID=0
LOADEDMODULES=
SLURM_NODEID=0
SLURM_SUBMIT_DIR=/home/dev
SLURM_TASK_PID=9728
SLURM_CPUS_ON_NODE=1
SLURM_PROCID=0
ENVIRONMENT=BATCH
HISTCONTROL=ignoredups
SLURM_JOB_NODELIST=master
HOME=/root
SHLVL=2
SLURM_LOCALID=0
SLURM_GET_USER_ENV=1
SLURM_JOB_GID=0
SLURM_JOB_CPUS_PER_NODE=1
SLURM_CLUSTER_NAME=cluster
SLURM_GTIDS=0
SLURM_SUBMIT_HOST=node02
SLURM_JOB_PARTITION=compute1
LOGNAME=root
SLURM_JOB_ACCOUNT=root
SLURM_JOB_NUM_NODES=1
MODULESHOME=/usr/share/Modules
LESSOPEN=||/usr/bin/lesspipe.sh %s
XDG_RUNTIME_DIR=/run/user/0
BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
}

I meant checking the environment variables of the submitting user’s shell, not the job’s.

You are testing the CLI parity as root. Can you submit jobs as your non-privileged regular user?

It’s my fault. I dropped the system $PATH from the wrapper. It should be export PATH="/usr/local/turbovnc/bin:/opt/TurboVNC/bin:$PATH"
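That explains every symptom in the logs: assigning PATH without appending the existing value drops the system directories, so hostname, grep, awk, and friends stop resolving inside the job. A minimal reproduction in a plain shell:

```shell
#!/bin/bash
# Reproduce the bug: overwriting PATH drops the system directories,
# so basic commands like grep/awk/hostname stop resolving.
saved_path=$PATH

PATH="/opt/TurboVNC/bin"              # overwrite: /usr/bin etc. are gone
command -v grep >/dev/null || echo "grep: command not found"

PATH="/opt/TurboVNC/bin:$saved_path"  # prepend instead: commands are back
command -v grep >/dev/null && echo "grep found again"
```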