The keys all looked OK, but to be sure I moved my .ssh folder aside, created a new key, and copied it to the cluster. As a user I can once again submit jobs. That still leaves my original problem: interactive desktops still aren’t working.
I was watching the Slurm log on my cluster when I submitted an interactive desktop job. It does actually get a Slurm job ID, but output.log never shows up on the OOD server to give me any more information.
Output from error.log in ondemand-nginx for that user:
INFO "method=GET path=/pun/sys/dashboard/batch_connect/sys/bc_desktop/rcs-sc/session_contexts/new format=html controller=BatchConnect::SessionContextsController action=new status=200 duration=25.52 view=15.12"
App 16492 output: [2023-02-23 09:49:27 -0500 ] INFO "execve = [{\"SLURM_CONF\"=>\"/opt/slurm/slurm.conf\"}, \"ssh\", \"-o\", \"BatchMode=yes\", \"-o\", \"UserKnownHostsFile=/dev/null\", \"-o\", \"StrictHostKeyChecking=yes\", \"rcs-scsn.fandm.edu\", \"/usr/bin/sbatch\", \"-D\", \"/home/user/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/rcs-sc/output/a4d5d22d-c981-46c3-9ff8-a95aa952e88d\", \"-J\", \"sys/dashboard/sys/bc_desktop/rcs-sc\", \"-o\", \"/home/user/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/rcs-sc/output/a4d5d22d-c981-46c3-9ff8-a95aa952e88d/output.log\", \"-t\", \"01:00:00\", \"--export\", \"NONE\", \"-N\", \"1\", \"--parsable\"]"
App 16492 output: [2023-02-23 09:49:28 -0500 ] INFO "method=POST path=/pun/sys/dashboard/batch_connect/sys/bc_desktop/rcs-sc/session_contexts format=html controller=BatchConnect::SessionContextsController action=create status=302 duration=512.04 view=0.00 location=https://rcs-grid.fandm.edu/pun/sys/dashboard/batch_connect/sessions"
App 16492 output: [2023-02-23 09:49:28 -0500 ] INFO "execve = [{\"SLURM_CONF\"=>\"/opt/slurm/slurm.conf\"}, \"ssh\", \"-o\", \"BatchMode=yes\", \"-o\", \"UserKnownHostsFile=/dev/null\", \"-o\", \"StrictHostKeyChecking=yes\", \"rcs-scsn.fandm.edu\", \"/usr/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"59586\"]"
App 16492 output: [2023-02-23 09:49:28 -0500 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions format=html controller=BatchConnect::SessionsController action=index status=200 duration=348.24 view=19.88"
App 16492 output: [2023-02-23 09:49:38 -0500 ] INFO "execve = [{\"SLURM_CONF\"=>\"/opt/slurm/slurm.conf\"}, \"ssh\", \"-o\", \"BatchMode=yes\", \"-o\", \"UserKnownHostsFile=/dev/null\", \"-o\", \"StrictHostKeyChecking=yes\", \"rcs-scsn.fandm.edu\", \"/usr/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"59586\"]"
App 16492 output: [2023-02-23 09:49:39 -0500 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=416.56 view=8.68"
I suspect my problem is the hostname. For example, on one of our nodes the hostname is n01.cluster, and that hostname is not accessible to our OOD server. I added the set_host lines to my cluster’s config file with no luck: the job still doesn’t start and there still isn’t an output.log file.
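
For anyone following along, a set_host override in a cluster config generally looks something like the sketch below. This is an illustration based on the pattern in the Open OnDemand documentation, not a copy of my file. The file name is an assumption (guessed from the rcs-sc cluster ID in the log paths above), and the `hostname -A` pipeline is only a placeholder for whatever command yields a node name the OOD server can actually resolve.

```yaml
# Hypothetical fragment of /etc/ood/config/clusters.d/rcs-sc.yml
# (file name assumed; only the batch_connect section is shown)
v2:
  batch_connect:
    basic:
      # Assumption: the first name printed by `hostname -A` on the compute
      # node is one the OOD server can resolve; adjust for your site.
      set_host: "host=$(hostname -A | awk '{print $1}')"
    vnc:
      # Same override for VNC-based apps such as the interactive desktop.
      set_host: "host=$(hostname -A | awk '{print $1}')"
```

Note that set_host only affects the hostname the job reports back for connecting to the session, so on its own it would not explain a missing output.log.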