After I submitted a desktop app to Slurm through the OOD web portal, I got an error in the output.log file:
/var/spool/slurmd/job00019/slurm_script: line 130: hostname: command not found
Setting VNC password...
/var/spool/slurmd/job00019/slurm_script: line 121: head: command not found
/var/spool/slurmd/job00019/slurm_script: line 121: head: command not found
Starting VNC server...
/var/spool/slurmd/job00019/slurm_script: line 165: seq: command not found
Cleaning up...
/var/spool/slurmd/job00019/slurm_script: line 26: awk: command not found
/usr/bin/env: perl: No such file or directory
/var/spool/slurmd/job00019/slurm_script: line 30: pkill: command not found
And when I tried to fix the environment by modifying the template file, I could not find the template file under the “/var/www/ood/apps/sys” path.
So what should I do to solve this problem?
There are a couple of different ways to set the environment during a job’s execution.
By default we use --export=NONE when we submit jobs. I wonder what the behaviour is for you when you submit jobs with --export=NONE from your shell? What is the PATH in either case?
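One way to compare is to print the PATH from inside a job submitted each way; a minimal sketch (the script name is illustrative):

```shell
#!/bin/bash
# path-check.sh -- print the PATH the job actually sees (illustrative name)
echo "PATH inside job: $PATH"
command -v hostname awk grep head || echo "some basic commands are missing from PATH"
```

Submit it twice, once as sbatch path-check.sh and once as sbatch --export=NONE path-check.sh, then compare both outputs with echo $PATH in your login shell.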
One thing I’d ask is whether you’re using srun or something like that, because it’s usually only when we try to issue an srun command that you see an error like this.
This configuration here will submit all your jobs with --export=ALL.
# /etc/ood/config/apps/bc_desktop/submit/slurm-dev-submit.yml.erb
---
batch_connect:
  before_script: |
    # Export the module function if it exists
    [[ $(type -t module) == "function" ]] && export -f module
    # MATE acts strange in pitzer-exp and doesn't like /var/run/$(id -u)
    export XDG_RUNTIME_DIR="$TMPDIR/xdg_runtime"
    # reset SLURM_EXPORT_ENV so that things like srun & sbatch work out of the box
    export SLURM_EXPORT_ENV=ALL
    # set profile
    source /etc/profile
script:
  native:
    - "-N"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
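If you use this, one way to confirm the before_script actually took effect is to print the relevant variables from inside the desktop job's shell; a quick sketch:

```shell
# Run inside the job (e.g. from a terminal in the desktop session)
echo "SLURM_EXPORT_ENV=${SLURM_EXPORT_ENV:-unset}"   # expected to be ALL if the before_script ran
echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR:-unset}"
type -t module || echo "module function not exported"
```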
Following this instruction, I updated my cluster configuration, but got more errors when I tried to submit a bc_desktop job:
/usr/libexec/grepconf.sh: line 5: grep: command not found
/usr/libexec/grepconf.sh: line 5: grep: command not found
/var/spool/slurmd/job00021/slurm_script: line 131: hostname: command not found
Setting VNC password...
/var/spool/slurmd/job00021/slurm_script: line 122: head: command not found
/var/spool/slurmd/job00021/slurm_script: line 122: head: command not found
Starting VNC server...
/var/spool/slurmd/job00021/slurm_script: line 166: seq: command not found
Cleaning up...
/var/spool/slurmd/job00021/slurm_script: line 27: awk: command not found
/usr/bin/env: perl: No such file or directory
/var/spool/slurmd/job00021/slurm_script: line 31: pkill: command not found
Emmm, when I submit a normal shell job, it can be queued, started, run, and completed successfully.
I can use hostname/grep/awk and the other commands in a normal shell job.
OK, how about this: are you a csh user with something csh-specific in your ~/.profile?
All our interactive apps use /bin/bash as the shell, so if you have something un-bash-like in a common file like ~/.profile (read by both bash and csh), then you could run into some oddities in your environment.
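If that turns out to be the case, one common approach is to keep ~/.profile POSIX-sh compatible, move csh syntax into ~/.cshrc, and guard anything bash-specific; a sketch (the PATH line is just an illustration):

```shell
# ~/.profile -- keep this POSIX sh compatible; csh syntax belongs in ~/.cshrc
if [ -n "${BASH_VERSION:-}" ]; then
    # bash-only settings are safe inside this guard
    export PATH="$HOME/bin:$PATH"
fi
```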
Let’s remove OOD from the situation because you seem to have a very small/new cluster.
Are you sure everything works as it should from the command line? What happens with something similar to this: are you able to correctly submit a very simple script (with and without the --export flag) that runs, say, hostname?
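For example, something as small as this (the file name is illustrative), submitted both ways:

```shell
#!/bin/bash
# myjob.sh -- minimal sanity-check job (illustrative name)
hostname
echo "PATH is: $PATH"
```

Submit it with sbatch myjob.sh and with sbatch --export=NONE myjob.sh, and check whether hostname is found in both cases.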
OK, I can see that node02 must be your web server and that you’ve submitted from the same node. So that’s all good.
I know I’m stuck on this export thing, but I would like to confirm that when you submit sbatch --export=NONE myjob.sh you don’t have any SLURM env variables defined that could override this, specifically SLURM_EXPORT_ENV.
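A quick way to check that from the submitting shell, before running sbatch:

```shell
# List any SLURM_* variables already set in the submitting shell
env | grep '^SLURM_' || echo "no SLURM_* variables set"
# If SLURM_EXPORT_ENV is set, clear it before re-testing --export=NONE
unset SLURM_EXPORT_ENV
```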