I’m trying to request a GPU with Slurm using native attributes. The job runs, but the GPUs aren’t being reserved as expected. How can I see the actual sbatch command that is run, so I can tell what is going on? My submit.yml.erb file is given below:
batch_connect:
  template: "basic"
<%-
  slurm_args = if gpu_switch == 1
    ["--gpus-per-node", "1", "--gres", "gpu:1" ]
  else
    []
  end
-%>
script:
  native:
  <%- slurm_args.each do |arg| %>
    - "<%= arg %>"
  <%- end %>
Yes, you can see the command being run in /var/log/ondemand-nginx/USER/error.log. For this error, I just knew from experience that form values are always strings (I’ve run into the same issue once or twice), so a comparison against the integer 1 never matches. Something like gpu_switch isn’t going to appear in the sbatch command itself, but you can see the values all these templates were rendered with in user_defined_context.json in the job’s session directory (the link in the card). That will show whether or not the flags are being passed correctly from the form to submit.yml.
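To make the string-versus-integer point concrete, here is a minimal sketch that renders a cut-down copy of the template with Ruby's standard ERB library, outside of Open OnDemand. The `native_args` helper and the `TEMPLATE` constant are hypothetical names for illustration only, and the sketch assumes the form delivers `gpu_switch` as the string "1":

```ruby
require "erb"
require "yaml"

# Hypothetical stand-in for submit.yml.erb. The only change from the
# original is that gpu_switch is compared to the string "1".
TEMPLATE = <<~'EOT'
  <%-
    slurm_args = if gpu_switch == "1"
      ["--gpus-per-node", "1", "--gres", "gpu:1"]
    else
      []
    end
  -%>
  script:
    native:
  <%- slurm_args.each do |arg| -%>
      - "<%= arg %>"
  <%- end -%>
EOT

# Render the template with a given gpu_switch value and return the
# parsed script.native list (empty when no flags were emitted).
def native_args(gpu_switch)
  rendered = ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(gpu_switch: gpu_switch)
  YAML.safe_load(rendered).dig("script", "native") || []
end

p native_args("1") # a string, as a form value would be: flags are emitted
p native_args(1)   # an integer never equals "1": no flags
```

The same mismatch happens in the other direction: with `gpu_switch == 1` in the template and a string coming from the form, the `else` branch is always taken and no GPU flags reach sbatch.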
I’m not seeing much useful in error.log. The user_defined_context.json is definitely helpful.
If I drop print statements into my ERB block in submit.yml.erb, I cannot find the output anywhere. Is there a way to print out info like this?
No, but if you’re in a development environment - that is, in your home directory where only you have access to this app - you can raise an error with a debug message instead, which is what I have done:
<%-
raise StandardError.new("the value of the things I'm looking for is: #{the_thing.inspect}")
-%>
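As a small illustration of why `inspect` is useful in that message, the sketch below builds the same string in plain Ruby. The `debug_message` helper is a hypothetical name, not part of Open OnDemand. Because `inspect` keeps the quotes around strings, the message shows whether a value is a String or an Integer:

```ruby
# Hypothetical helper that builds the same message as the raise above.
def debug_message(the_thing)
  "the value of the things I'm looking for is: #{the_thing.inspect}"
end

puts debug_message("1") # ends with "1" (quoted, so it is a String)
puts debug_message(1)   # ends with 1 (unquoted, so it is an Integer)
```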