Rockawear
(Rockawear)
April 16, 2022, 5:18am
1
Hi,
I’m a new OOD user and I’ve been trying to use this Ansible role to configure a test cluster of two nodes. The OOD server runs on one Ubuntu 20.04 node, and the other node (also running Ubuntu) will be the compute node. I’m able to run the Ansible role, and login works, but I have no clue how to actually add clusters with Ansible and run a VDI desktop. I used the .apps.yml example and changed the host to point to the compute node. What am I missing? Do I need to do anything on the compute node? I’m assuming tasks/apps.yml will configure the cluster if given the right vars. Please help.
OOD v2.0.9
cat clusters.d/my_cluster.yml
v2:
  metadata:
    title: "my_cluster"
  login:
    host: "<compute.node>"
  job:
    adapter: slurm
    bin: /usr/local
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        export PATH="/opt/TurboVNC/bin:$PATH"
        export WEBSOCKIFY_CMD="/usr/local/bin/websockify"
        %s
cat apps/bc_desktop/my_cluster.yml
title: "remote desktop"
cluster: my_cluster
submit: "submit/my_submit.yml.erb"
attributes:
  bc_queue: null
  desktop: "xfce"
cat apps/bc_desktop/submit/submit.yml.erb
{"script": {"native": ["<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>", "1"]}}
Rockawear
(Rockawear)
April 18, 2022, 1:31am
2
I also want to make this clear: it’s not really an issue with the Ansible role. Here is the error that I’m seeing for my specific use case. I’ve changed desktop: "xfce" to the default (mate).
App 7599 output: [2022-04-18 01:28:02 +0000 ] INFO "execve = [{}, \"/usr/local/sbatch\", \"-D\", \"/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/ec774786-8da1-45f3-8661-02b9d529d46b\", \"-J\", \"sys/dashboard/sys/bc_desktop/my_cluster\", \"-o\", \"/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/ec774786-8da1-45f3-8661-02b9d529d46b/output.log\", \"-A\", \"vagrant\", \"-t\", \"01:00:00\", \"--export\", \"NONE\", \"-N\", \"1\", \"--parsable\"]"
App 7599 output: [2022-04-18 01:28:02 +0000 ] ERROR "ERROR: Errno::ENOENT - No such file or directory - /usr/local/sbatch"
App 7599 output: [2022-04-18 01:28:02 +0000 ] INFO "execve = [\"git\", \"describe\", \"--always\", \"--tags\"]"
App 7599 output: [2022-04-18 01:28:02 +0000 ] INFO "method=POST path=/pun/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/session_contexts format=html controller=BatchConnect::SessionContextsController action=create status=200 duration=53.84 view=25.23"
When I try to submit a job, I get the following:
App 3757 output: [2022-04-18 07:51:14 +0000 ] DEBUG "\e[1m\e[36mCACHE Job Load (0.0ms)\e[0m \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m [[\"workflow_id\", 1]]"
App 3757 output: [2022-04-18 07:51:14 +0000 ] INFO "method=GET path=/pun/sys/myjobs/workflows/1 format=json controller=WorkflowsController action=show status=200 duration=24.64 view=6.73 db=1.90"
App 2833 output: [2022-04-18 07:51:14 +0000 ] INFO "method=GET path=/pun/sys/dashboard/files/api/v1/fs/home/vagrant/ondemand/data/sys/myjobs/projects/default/1/main_job.sh format=html controller=FilesController action=fs status=200 duration=6.28 view=0.00"
App 3757 output: [2022-04-18 07:51:16 +0000 ] DEBUG "\e[1m\e[36mWorkflow Load (0.1ms)\e[0m \e[1m\e[34mSELECT \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m [[\"id\", 1], [\"LIMIT\", 1]]"
App 3757 output: [2022-04-18 07:51:16 +0000 ] DEBUG "\e[1m\e[36mJob Load (0.1ms)\e[0m \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m [[\"workflow_id\", 1]]"
App 3757 output: [2022-04-18 07:51:16 +0000 ] INFO "execve = [{}, \"/usr/local/sbatch\", \"--export\", \"NONE\", \"--parsable\"]"
App 3757 output: [2022-04-18 07:51:16 +0000 ] INFO "method=PUT path=/pun/sys/myjobs/workflows/1/submit format=html controller=WorkflowsController action=submit status=500 error='Errno::ENOENT: No such file or directory - /usr/local/sbatch' duration=9.10 view=0.00 db=0.17"
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL ""
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL "Errno::ENOENT (No such file or directory - /usr/local/sbatch):"
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL ""
App 3757 output: [2022-04-18 07:51:16 +0000 ] FATAL "config/initializers/open3_extensions.rb:4:in `capture3'\napp/models/resource_mgr_adapter.rb:36:in `qsub'\napp/models/workflow.rb:291:in `each'\napp/models/workflow.rb:291:in `submit_jobs'\napp/models/workflow.rb:259:in `submit'\napp/controllers/workflows_controller.rb:181:in `block in submit'\napp/controllers/workflows_controller.rb:176:in `submit'"
gbyrket
(Gerald Byrket)
April 19, 2022, 1:38pm
3
Hi.
Thanks for the post. The errors in both log entries are stating that /usr/local/sbatch is not found.
sbatch is required to submit a job if you are using slurm. Can you please confirm that sbatch is in the proper path?
Thanks,
-gerald
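For anyone hitting the same ENOENT: the log lines above show OnDemand running /usr/local/sbatch and /usr/local/squeue, which looks like the bin value from the cluster config followed by the command name. A rough way to check that directory on the OnDemand web host is sketched below. The check_slurm_bin name is made up for illustration, and /usr/local is only the value from the config posted in this thread, not a recommended location.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnDemand): report whether the Slurm
# client commands seen in the logs exist in the directory used as
# `bin:` in the cluster config.
check_slurm_bin() {
    bin_dir="$1"
    status=0
    for cmd in sbatch squeue; do
        if [ -x "$bin_dir/$cmd" ]; then
            echo "ok: $bin_dir/$cmd"
        else
            echo "missing: $bin_dir/$cmd"
            status=1
        fi
    done
    return "$status"
}

# Example: check the directory from the config in the first post.
# `command -v sbatch` prints where sbatch actually lives, if it is on PATH.
check_slurm_bin /usr/local || command -v sbatch || echo "sbatch is not on PATH either"
```

If the commands turn out to live somewhere else (for example wherever `command -v sbatch` points), that directory is presumably what bin should be set to.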
Rockawear
(Rockawear)
April 19, 2022, 1:55pm
4
Would this be on the OnDemand server, or does it need to be on every node?
Rockawear
(Rockawear)
April 20, 2022, 4:44am
5
I configured Slurm everywhere. Now I’m able to submit jobs, but I was expecting to see the MATE desktop; it’s opening the file manager instead. What did I do wrong? I didn’t change anything else from the config above besides switching from xfce to mate.
gbyrket
(Gerald Byrket)
April 20, 2022, 1:09pm
6
Hi.
Please try changing the desktop value from “xfce” to “mate”. If you have mate installed, it should pick it up by doing that.
Thanks,
-gerald
Rockawear
(Rockawear)
April 20, 2022, 1:56pm
7
As I stated in my last post, I did change the desktop value to mate. I’m still getting the issue. It seems like the issue is the absence of a network file share.
gbyrket
(Gerald Byrket)
April 20, 2022, 2:45pm
8
Sorry. I missed that sentence in your post. My mistake.
Let me discuss with my colleagues, and either I or one of the others will respond.
Thanks,
-gerald
gbyrket
(Gerald Byrket)
April 20, 2022, 4:02pm
9
Hi.
Can you please post a screenshot as well as add the logs that are being generated?
Thanks,
-gerald
Rockawear
(Rockawear)
April 20, 2022, 5:06pm
10
Here is what I see when I click the session ID link.
I’m seeing a bunch of these from $USER/error.log:
App 12430 output: [2022-04-20 17:03:43 +0000 ] INFO "execve = [{}, \"/usr/local/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"3\"]"
App 12430 output: [2022-04-20 17:03:43 +0000 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=25.04 view=11.67"
App 12430 output: [2022-04-20 17:03:47 +0000 ] INFO "execve = [{}, \"/usr/local/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"3\"]"
App 12430 output: [2022-04-20 17:03:47 +0000 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=28.79 view=13.82"
App 12430 output: [2022-04-20 17:03:48 +0000 ] INFO "execve = [{}, \"/usr/local/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"3\"]"
App 12430 output: [2022-04-20 17:03:48 +0000 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=28.96 view=14.82"
from $USER/access.log
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/assets/OpenOnDemand_powered_by_RGB-cb3aad5ff5350c7994f250fb334ddcc72e343233ce99eb71fda93beddd76a847.svg HTTP/1.1" 200 5610 "https://ondemand-example/pun/sys/dashboard/files/fs/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/81a51485-3da1-4732-8894-a3ca10e8467a" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/assets/font-awesome/fa-solid-900-787d76ad6deab67ccf8bac1b584260205e114f508fc5542b612e3f75d49a34e4.woff2 HTTP/1.1" 200 76084 "https://ondemand-example/pun/sys/dashboard/assets/application-85fa591affc54648b8d4e8982e3ada6e8ed9f8e9711e33487208df71c8bd7d00.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/files/fs//home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/81a51485-3da1-4732-8894-a3ca10e8467a HTTP/1.1" 200 9497 "https://ondemand-example/pun/sys/dashboard/files/fs/home/vagrant/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/81a51485-3da1-4732-8894-a3ca10e8467a" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:33 +0000] "GET /pun/sys/dashboard/assets/font-awesome/fa-regular-400-86e496b536b26ba60cdb68df9dd9143b19a63b65e30e373b0321833aab1295d6.woff2 HTTP/1.1" 200 13576 "https://ondemand-example/pun/sys/dashboard/assets/application-85fa591affc54648b8d4e8982e3ada6e8ed9f8e9711e33487208df71c8bd7d00.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:36 +0000] "GET /pun/sys/dashboard/batch_connect/sessions.js?_=1650474125531 HTTP/1.1" 200 10471 "https://ondemand-example/pun/sys/dashboard/batch_connect/sessions" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
unix: - vagrant [20/Apr/2022:17:02:37 +0000] "GET /pun/sys/dashboard/batch_connect/sessions.js?_=1650473741133 HTTP/1.1" 200 10471 "https://ondemand-example/pun/sys/dashboard/batch_connect/sessions" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" "192.168.58.1"
This is the page I get redirected to when I click the submission ID.
If I open a terminal from the screenshot above, I get the following:
gbyrket
(Gerald Byrket)
April 20, 2022, 5:41pm
11
Hi.
Thanks for the information.
Can you please take a screenshot of every step that you are doing to launch your interactive desktop?
Thanks,
-gerald
Rockawear
(Rockawear)
April 20, 2022, 6:28pm
12
Here you go
submit
job created
open completed job
Redirected to the file manager. I was expecting the Ubuntu desktop environment in my browser via noVNC.
So, it’s completing before you can connect to it. You should see it go from the Starting phase to Running, and there would then be a connect to <the app> button.
That session ID link shows the files in the working directory of that particular job. You should be able to look at an output.log file in that folder for the output from your job. job_script_content.sh is the entrypoint, which calls the other scripts, notably script.sh; that’s what you provided in your app’s template/script.sh.erb.
My guess is the script ran and couldn’t load the right modules for the app you’re trying to build, or any number of other things.
In any case, output.log is where you want to look next.
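As a rough illustration of where to look: the log lines earlier in the thread put session directories under ~/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output/<session-id>/. The sketch below prints the newest session directory's output.log if it exists. The latest_session_log function is a made-up helper, and the path is only the one from this thread's logs, so adjust it for a different app or cluster name.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnDemand): find the most recently
# modified session directory under ROOT and print the path of its
# output.log, or explain what is missing.
latest_session_log() {
    root="$1"
    # List session directories newest first and keep the first one.
    newest=$(ls -1td "$root"/*/ 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no session directories under $root"
        return 1
    fi
    newest=${newest%/}
    if [ -f "$newest/output.log" ]; then
        echo "$newest/output.log"
    else
        echo "no output.log in $newest"
        return 1
    fi
}

# Path pattern copied from the log lines in this thread.
root="$HOME/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/my_cluster/output"
if log=$(latest_session_log "$root"); then
    tail -n 50 "$log"
else
    echo "$log"
fi
```

If the helper reports that output.log is missing while script.sh is present, that suggests the job never wrote back to this directory from the compute node.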
Rockawear
(Rockawear)
April 25, 2022, 5:26am
14
@jeff.ohrstrom It’s not showing the connect to <app> button. Once I launch the app, it goes to Queued, then Completed. I can’t find any output.log. Is there a specific location? Also, look at the picture linked below: it seems like it’s looking for a network share for vagrant on the compute node. Of course, I do not have that; I’m using PAM authentication with mod_auth_pam. https://us1.discourse-cdn.com/flex015/uploads/osc/original/2X/8/8baa937778adbc8a74ffd052ecf55fe37f2f2ea5.png
The location is the file browser image you’ve given in the 12th comment. You see a script.sh and so on. output.log is a sibling to script.sh.
Rockawear
(Rockawear)
April 25, 2022, 7:57pm
16
@jeff.ohrstrom I don’t have that log file. script.sh does exist, but there is no output.log. Is it missing because I’m running the latest OOD package? Is the job expecting a network file system for the user? It’s trying to find a file that doesn’t exist on the compute node.
Yes. This directory you’ve linked to is expected to be on both the web server and the cluster. We write all these files for the job to execute, and the job itself writes a connection.yml file in the same location to pass information back to the web server.
In sum: yes, we expect your home directory to be a storage location that’s available to both the web server and the compute node.
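One rough way to test that requirement is to write a marker file on the web server and then look for it from a compute node. The sketch below does this through Slurm's srun by default. The check_shared_dir name and the marker file name are invented for this example, and it assumes srun works for the logged-in user from the web host.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnDemand): create a marker file in
# DIR on this host, then run `test -f` on it through a runner command.
# The default runner is `srun -N1`, which runs the check on one
# compute node; any other command prefix can be passed instead.
check_shared_dir() {
    dir="$1"
    shift
    [ "$#" -gt 0 ] || set -- srun -N1
    marker="$dir/.ood_shared_check.$$"
    : > "$marker" || return 2
    if "$@" test -f "$marker"; then
        echo "shared: $dir is visible through '$*'"
        rc=0
    else
        echo "not shared: $dir is not visible through '$*'"
        rc=1
    fi
    rm -f "$marker"
    return "$rc"
}

# Example (run on the OnDemand web host as the logged-in user):
#   check_shared_dir "$HOME/ondemand/data"
```

A "not shared" result would match the symptom in this thread: the job goes from Queued to Completed and no output.log or connection.yml ever appears on the web server side.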
system
(system)
Closed
October 22, 2022, 8:30pm
18
This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.