Yesterday I posted a problem with the hpc-toolset-tutorial demo that I was trying to run on AWS. Today I have it running on my own machine so I am able to access it using
But I have another problem with setting up ColdFront which is apparently necessary before I can run OnDemand.
The docs say to “Go to Admin interface, Users” and then click on the hpcadmin user and make it a "superuser by checking the boxes next to Staff Status and Superuser Status - SAVE".
I am not clear on what the admin interface is. Is it the link under the Admin menu that says Coldfront Administration? If so, there are no boxes that say Staff Status and Superuser Status.
Also, if I go to User Search under the admin menu and search for the hpcadmin user, the screen it brings me to also does not have Staff Status or Superuser Status boxes.
Is the documentation out of sync with the docker images?
I don’t recall any ColdFront setup required. I’ve never had to do anything beyond the OnDemand instructions.
What’s the issue you were seeing that made you think it was necessary?
When I log in to ondemand (https://localhost:3443) as hpcadmin and try and submit a job using job composer, I get
An error occurred when submitting jobs for simulation 1: sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
On our cluster this error means the user has not yet been set up to use the cluster. So that’s why I thought maybe setup in coldfront was necessary, and also since the docs have the coldfront setup first, before the ondemand setup.
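One way to test that theory, assuming the Slurm accounting client (sacctmgr) is available on the node you exec into, is to list the associations Slurm has on record for the user. The sketch below is only an illustration: the function name show_assoc is made up, and hpcadmin is just the default taken from this thread. If it prints nothing, that would be consistent with the "Invalid account" error.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): list the Slurm associations
# recorded for a user. Defaults to hpcadmin, the user from this thread.
show_assoc() {
    user="${1:-hpcadmin}"
    # Bail out early if the Slurm accounting client is not installed here.
    if ! command -v sacctmgr >/dev/null 2>&1; then
        echo "sacctmgr not found; run this on a node with the Slurm client tools" >&2
        return 127
    fi
    # -n drops the header line, -P prints pipe-separated fields.
    sacctmgr -nP show associations user="$user" format=Cluster,Account,User,Partition
}
```

Run it as `show_assoc hpcadmin` from a shell on the ondemand or frontend container.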
What are the arguments you’re passing for account and partition?
We only use/work on those once a year in preparation for PEARC or a similar conference. I did just watch this GIF of me submitting the job with empty values, and that works.
Again - for 3 or so years, I’ve never had to go through the other tutorials, though that may have changed this year and I haven’t looked into it yet.
I am leaving those blank.
I think I had this thing working on a different machine a few weeks ago. Not sure what could be different.
FWIW, if I exec to the ood node and try and run a job from the command line I get a different error:
[hpcadmin@ondemand ~]$ srun echo hello
slurmstepd: error: setgroups: Operation not permitted
slurmstepd: error: write to unblock task 0 failed: Broken pipe
srun: error: cpn01: task 0: Exited with exit code 1