POC Interactive Desktop configuration

Hi,

I’m a new OOD user and I’ve been trying to use this Ansible role to configure a test cluster of two nodes. The OOD server runs on one Ubuntu 20.04 node, and the other node (also running Ubuntu) will be the compute node. I’m able to run the Ansible role, and login works. I have no clue how to actually add clusters with Ansible and run a VDI desktop. I used the .apps.yml example and changed the host to point to the compute node. What am I missing? Do I need to do anything on the compute node? I’m assuming tasks/apps.yml will configure the cluster if given the right vars. Please help.

OOD v2.0.9

cat clusters.d/my_cluster.yml

v2:
  metadata:
    title: "my_cluster"
  login:
    host: "<compute.node>"
  job:
    adapter: slurm
    bin: /usr/local
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        export PATH="/opt/TurboVNC/bin:$PATH"
        export WEBSOCKIFY_CMD="/usr/local/bin/websockify"
        %s

cat apps/bc_desktop/my_cluster.yml

title: "remote desktop"
cluster: my_cluster
submit: "submit/my_submit.yml.erb"
attributes:
  bc_queue: null
  desktop: "xfce"

cat apps/bc_desktop/submit/submit.yml.erb

{"script": {"native": ["<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>", "1"]}}

I also want to make this clear: it’s not really an issue with the Ansible role. Here is the error that I’m seeing for my specific use case. I’ve changed desktop: "xfce" to the default (mate).

App 7599 output: [2022-04-18 01:28:02 +0000 ]  INFO "execve = [{}, \"/usr/local/sbatch\", \"-D\", \"/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/ec774786-8da1-45f3-8661-02b9d529d46b\", \"-J\", \"sys/dashboard/sys/bc_desktop/my_cluster\", \"-o\", \"/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/ec774786-8da1-45f3-8661-02b9d529d46b/output.log\", \"-A\", \"vagrant\", \"-t\", \"01:00:00\", \"--export\", \"NONE\", \"-N\", \"1\", \"--parsable\"]"
App 7599 output: [2022-04-18 01:28:02 +0000 ] ERROR "ERROR: Errno::ENOENT - No such file or directory - /usr/local/sbatch"
App 7599 output: [2022-04-18 01:28:02 +0000 ]  INFO "execve = [\"git\", \"describe\", \"--always\", \"--tags\"]"
App 7599 output: [2022-04-18 01:28:02 +0000 ]  INFO "method=POST path=/pun/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/session_contexts format=html controller=BatchConnect::SessionContextsController action=create status=200 duration=53.84 view=25.23"

When I try to submit a job, I get the following:

App 3757 output: [2022-04-18 07:51:14 +0000 ] DEBUG "\e[1m\e[36mCACHE Job Load (0.0ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 1]]"
App 3757 output: [2022-04-18 07:51:14 +0000 ]  INFO "method=GET path=/pun/sys/myjobs/workflows/1 format=json controller=WorkflowsController action=show status=200 duration=24.64 view=6.73 db=1.90"
App 2833 output: [2022-04-18 07:51:14 +0000 ]  INFO "method=GET path=/pun/sys/dashboard/files/api/v1/fs/home/vagrant/ondemand/data/sys/myjobs/projects/default/1/main_job.sh format=html controller=FilesController action=fs status=200 duration=6.28 view=0.00"
App 3757 output: [2022-04-18 07:51:16 +0000 ] DEBUG "\e[1m\e[36mWorkflow Load (0.1ms)\e[0m  \e[1m\e[34mSELECT  \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m  [[\"id\", 1], [\"LIMIT\", 1]]"
App 3757 output: [2022-04-18 07:51:16 +0000 ] DEBUG "\e[1m\e[36mJob Load (0.1ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 1]]"
App 3757 output: [2022-04-18 07:51:16 +0000 ]  INFO "execve = [{}, \"/usr/local/sbatch\", \"--export\", \"NONE\", \"--parsable\"]"
App 3757 output: [2022-04-18 07:51:16 +0000 ]  INFO "method=PUT path=/pun/sys/myjobs/workflows/1/submit format=html controller=WorkflowsController action=submit status=500 error='Errno::ENOENT: No such file or directory - /usr/local/sbatch' duration=9.10 view=0.00 db=0.17"
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL ""
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL "Errno::ENOENT (No such file or directory - /usr/local/sbatch):"
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL ""
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL "config/initializers/open3_extensions.rb:4:in `capture3'\napp/models/resource_mgr_adapter.rb:36:in `qsub'\napp/models/workflow.rb:291:in `each'\napp/models/workflow.rb:291:in `submit_jobs'\napp/models/workflow.rb:259:in `submit'\napp/controllers/workflows_controller.rb:181:in `block in submit'\napp/controllers/workflows_controller.rb:176:in `submit'"

Hi.

Thanks for the post. The errors in both log entries are stating that /usr/local/sbatch is not found.

sbatch is required to submit a job if you are using slurm. Can you please confirm that sbatch is in the proper path?
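
One quick way to check this on the OnDemand server is sketched below. It assumes the Slurm client tools are installed and on the PATH of the user the dashboard runs as; find_bin_dir is a hypothetical helper name, not part of OnDemand or Slurm. The logs above show OnDemand calling /usr/local/sbatch, which is the configured bin: value joined with sbatch, so the directory printed here is a candidate for bin: under job: in clusters.d/my_cluster.yml.

```shell
# Hypothetical helper: print the directory that contains a command, or fail
# if the command is not on PATH.
find_bin_dir() {
  cmd_path="$(command -v "$1" 2>/dev/null)" || return 1
  dirname "$cmd_path"
}

if dir="$(find_bin_dir sbatch)"; then
  echo "sbatch found; candidate value for job.bin: $dir"
else
  echo "sbatch is not on PATH for $(whoami) on $(hostname)"
fi
```

If this reports that sbatch is missing, the Slurm client packages probably still need to be installed on that machine.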

Thanks,
-gerald

Would this be on the OnDemand server, or does it need to be on every node?

I configured Slurm everywhere. Now I’m able to submit jobs, but I was expecting to see the MATE desktop; it’s opening the file manager instead. What did I do wrong? I didn’t change anything else in the config above besides switching from xfce to mate.

Hi.

Please try changing the desktop value from “xfce” to “mate”. If you have mate installed, it should pick it up by doing that.

Thanks,
-gerald

As I stated in my last post, I did change the desktop value to mate, and I’m still getting the issue. It seems like the issue is the absence of a network file share.

Sorry. I missed that sentence in your post. My mistake.

Let me discuss with my colleagues, and either I or one of the others will respond.

Thanks,
-gerald

Hi.

Can you please post a screenshot as well as add the logs that are being generated?

Thanks,
-gerald

Here is what I see when I click the session ID link.
I’m seeing a bunch of these from $USER/error.log:

App 12430 output: [2022-04-20 17:03:43 +0000 ]  INFO "execve = [{}, \"/usr/local/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"3\"]"
App 12430 output: [2022-04-20 17:03:43 +0000 ]  INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=25.04 view=11.67"
App 12430 output: [2022-04-20 17:03:47 +0000 ]  INFO "execve = [{}, \"/usr/local/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"3\"]"
App 12430 output: [2022-04-20 17:03:47 +0000 ]  INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=28.79 view=13.82"
App 12430 output: [2022-04-20 17:03:48 +0000 ]  INFO "execve = [{}, \"/usr/local/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"3\"]"
App 12430 output: [2022-04-20 17:03:48 +0000 ]  INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=28.96 view=14.82"

From $USER/access.log:

unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/assets/OpenOnDemand_powered_by_RGB-cb3aad5ff5350c7994f250fb334ddcc72e343233ce99eb71fda93beddd76a847.svg HTTP/1.1" 200 5610 "https://ondemand-example/pun/sys/dashboard/files/fs/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/81a51485-3da1-4732-8894-a3ca10e8467a" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/assets/font-awesome/fa-solid-900-787d76ad6deab67ccf8bac1b584260205e114f508fc5542b612e3f75d49a34e4.woff2 HTTP/1.1" 200 76084 "https://ondemand-example/pun/sys/dashboard/assets/application-85fa591affc54648b8d4e8982e3ada6e8ed9f8e9711e33487208df71c8bd7d00.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/files/fs//home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/81a51485-3da1-4732-8894-a3ca10e8467a HTTP/1.1" 200 9497 "https://ondemand-example/pun/sys/dashboard/files/fs/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/81a51485-3da1-4732-8894-a3ca10e8467a" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/assets/font-awesome/fa-regular-400-86e496b536b26ba60cdb68df9dd9143b19a63b65e30e373b0321833aab1295d6.woff2 HTTP/1.1" 200 13576 "https://ondemand-example/pun/sys/dashboard/assets/application-85fa591affc54648b8d4e8982e3ada6e8ed9f8e9711e33487208df71c8bd7d00.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:36 +0000] "GET /pun/sys/dashboard/batch_connect/sessions.js?_=1650474125531 HTTP/1.1" 200 10471 "https://ondemand-example/pun/sys/dashboard/batch_connect/sessions" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:37 +0000] "GET /pun/sys/dashboard/batch_connect/sessions.js?_=1650473741133 HTTP/1.1" 200 10471 "https://ondemand-example/pun/sys/dashboard/batch_connect/sessions" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"

This is the page I get redirected to when I click the submission ID.

If I open in terminal from the screenshot above, I get the following:

Hi.

Thanks for the information.

Can you please take a screenshot of every step that you are doing to launch your interactive desktop?

Thanks,
-gerald

Here you go

submit

job created

open completed job

Redirected to the file manager. I was expecting the Ubuntu desktop environment in my browser via noVNC.

So, it’s completing before you can connect to it. You should see it go from the Starting phase to Running, and there would then be a connect to <the app> button.

That session ID link shows the files in the working directory of that particular job. You should be able to look at an output.log file in that folder for the output from your job. job_script_content.sh is the entrypoint, which calls the other scripts, notably script.sh; that’s what you provided in your app’s template/script.sh.erb.

My guess is the script ran and couldn’t load the right modules for the app you’re trying to build, or any number of other things.

In any case, output.log is where you want to look next.
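
To look for that file from a shell rather than the file browser, something like the following may help. The output root is taken from the sbatch -D argument in the logs earlier in this thread; latest_session_dir is a hypothetical helper, and picking the newest directory by modification time is only an assumption about which session you care about.

```shell
# Hypothetical helper: print the most recently modified subdirectory
# (with a trailing slash) under the given output root.
latest_session_dir() {
  ls -1dt "$1"/*/ 2>/dev/null | head -n 1
}

# Output root as it appears in the sbatch "-D" argument logged above.
root="$HOME/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output"
session="$(latest_session_dir "$root" || true)"

if [ -z "$session" ]; then
  echo "no session directories under $root"
elif [ -f "${session}output.log" ]; then
  # Show the end of the job's output, where failures usually appear.
  tail -n 50 "${session}output.log"
else
  echo "no output.log in $session"
fi
```

If script.sh is present but output.log is not, the job most likely never wrote into this directory at all.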

@jeff.ohrstrom It’s not showing the connect to <app> button. Once I launch the app, it goes to Queued and then Completed. I can’t find any output.log. Is there a specific location? Also, look at the picture at the link below: it seems to be looking for a network share for vagrant on the compute node. Of course, I don’t have that; I’m using PAM authentication with mod_auth_pam. https://aws1.discourse-cdn.com/business4/uploads/osc/original/2X/8/8baa937778adbc8a74ffd052ecf55fe37f2f2ea5.png

The location is the file browser image you’ve given in the 12th comment. You see a script.sh and so on. output.log is a sibling to script.sh.

@jeff.ohrstrom I don’t have that log file. script.sh does exist, but there is no output.log. Is it missing because I’m running the latest OOD package? Is the job expecting a network file system for the user? It’s trying to find a file that doesn’t exist on the compute node.

Yes. This directory you’ve linked to is expected to be on both the web server and the cluster. We write all these files for the job to execute, and the job itself writes a connection.yml file in the same location to pass information back to the web server.

In sum: yes, we expect your home directory to be a storage location that’s available to both the web server and the compute node.
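
A rough way to test that from the web server is sketched below. It assumes srun can get a node promptly on the test cluster and that ~/ondemand already exists for the user; path_visible_via and the probe file name are hypothetical. The idea is to create a marker file on the web server and then ask a compute node whether it can see the same file.

```shell
# Hypothetical helper: run `test -e PATH` through a runner command
# (for example `srun -N 1`) and return its exit status.
path_visible_via() {
  path="$1"
  shift
  "$@" test -e "$path"
}

# Only attempt the check where the Slurm client tools are installed.
if command -v srun >/dev/null 2>&1; then
  marker="$HOME/ondemand/.shared_fs_probe.$$"   # hypothetical probe file
  touch "$marker"
  if path_visible_via "$marker" srun -N 1; then
    echo "compute node sees $marker: home looks shared"
  else
    echo "compute node does NOT see $marker: home is probably not shared"
  fi
  rm -f "$marker"
fi
```

A marker file is used instead of a plain directory check because a directory with the same name could exist locally on the compute node without being shared.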