I am running some small tests, deploying OnDemand to an EC2 instance. The installation went well and I have a localhost cluster that I can log in to via the “Clusters” dropdown as a user.
Cluster specification (permissions are -rw-r--r--. 1 root root 196 Jan 16 18:32 linux_host.yml)
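The file follows the standard linux_host adapter shape — roughly along these lines (illustrative sketch only, not my exact file; hostnames and options are placeholders):

```yaml
# clusters.d/linux_host.yml — illustrative sketch, not the exact file
v2:
  metadata:
    title: "Localhost"
  login:
    host: "localhost"
  job:
    adapter: "linux_host"
    submit_host: "localhost"
    ssh_hosts:
      - localhost
    site_timeout: 7200
    debug: true
    strict_host_checking: false
    tmux_bin: /usr/bin/tmux
```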
When I attempt to launch the app, it fails with “This app requires clusters that do not exist or you do not have access to.” I can confirm that the ‘jlaura’ user can ls and cat down the /var/www/ood/apps/dev/jlaura/ path. I also ran yamllint on all of the files to check for syntax issues. Finally, I can access the cluster via the “Clusters” dropdown. Are there any other places I should be looking?
I have to admit the linux_host adapter is super difficult to debug. I see you already have the debug flag set. When it’s set, you’ll get two shell scripts (in your HOME directory, I believe, though I’m not 100% sure) that you can use to replicate the submission.
You may be hitting this spot. The other spot I see start_with? in that file has a protection against nil, so I doubt that’s the error.
You seem to have some issue parsing the script.sh.erb?
This may be your issue: passing <% script %> here to the wrapper. When you navigate to the job’s directory (in ~/ondemand/data/sys/dashboard/batch_connect/....), what does script.sh.erb look like? Is it empty?
Can you tell me what you’re trying to do here, either by allowing the user to specify script (I don’t see it in the form) or you have the variable script defined somewhere?
@jeff.ohrstrom Thanks for the assist. I was able to get the linux_host adapter working late yesterday, and I can get the script to fire. I had to update to the following:
Ultimately, we are trying to have a consistent OnDemand UI where a user can spin up an ephemeral AWS cluster (using HTCondor or AWS ParallelCluster) and get a shell. The first step here was testing whether we could use Open OnDemand and still access the AWS CLI. That is a success, and it means that we should be able to provision the ephemeral resources. The next step is to learn how we might (or might not) be able to get a remote shell on the ephemeral head node from within the browser / OnDemand ecosystem. Any thoughts on that?
Sounds like you want a login only cluster? I.e., you can’t schedule jobs on it (no need for a batch connect application), but it’ll appear in the Clusters menu to shell into.
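A login-only cluster file is just the v2 metadata plus a login block, with no job section — something like this (hostname is a placeholder):

```yaml
# clusters.d/ephemeral.yml — login-only cluster (placeholder hostname)
v2:
  metadata:
    title: "Ephemeral AWS Cluster"
  login:
    host: "head-node.example.com"
```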
Yes, I think so. Is there a way to dynamically populate that list for a specific user, without having to create the cluster.yml and then restart the service?
As an example, I tried modifying the template/script.sh to the following (this is super janky):
#!/bin/bash
# Define the remote machine and user
REMOTE_MACHINE="localhost"
REMOTE_USER="${USER}" # Use the current user
# Start the SSH session and execute bash
echo "Connecting to ${REMOTE_MACHINE}..."
ssh -tt "${REMOTE_USER}@${REMOTE_MACHINE}" <<EOF
echo "You are now on ${REMOTE_MACHINE}."
exec bash
EOF
Which keeps the app running and I get the nice “Connect to AppName” button in the OnDemand UI. The URL for that button 404s. Is there a way to set that URL in the submission script?
I think this is covered in the app setup here and the follow-on page about setting up the reverse proxy.
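Roughly, the relevant ood_portal.yml settings are the reverse-proxy ones (the host_regex value below is a placeholder you’d scope to your own compute hosts):

```yaml
# /etc/ood/config/ood_portal.yml — reverse proxy fragment (placeholder regex)
host_regex: '[\w.-]+\.example\.com'
node_uri: '/node'
rnode_uri: '/rnode'
```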
Dynamically create? Maybe? Avoid the cluster.yml file, no.
Maybe pun_pre_hook_root_cmd? It’s a hook you can run as root before the PUN starts up. But note that any files you supply to the clusters.d directory are available to all users, so you’d have to chown & chmod the file so it’s only visible to that user.
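As a rough sketch of the helper that hook could call (the function name, file layout, and the assumption that you can resolve the user’s head-node hostname are all mine, not from OOD):

```shell
#!/usr/bin/env bash
# Hypothetical helper for a pun_pre_hook_root_cmd script (runs as root before
# a user's PUN starts). How the hook receives the username varies by OOD
# version, so check your install's docs for the actual invocation.
set -euo pipefail

# Write a per-user, login-only cluster file and restrict it to that user.
# Args: username, head-node hostname, clusters.d directory.
generate_cluster_file() {
  local user="$1" head_node="$2" dir="$3"
  local file="${dir}/ephemeral_${user}.yml"
  cat > "$file" <<EOF
v2:
  metadata:
    title: "Ephemeral cluster (${user})"
  login:
    host: "${head_node}"
EOF
  # Make the file visible only to this user so other PUNs don't pick it up.
  # (chown needs root on the real OOD host; suppressed here if not permitted.)
  chown "${user}:" "$file" 2>/dev/null || true
  chmod 0600 "$file"
  echo "$file"
}
```

You’d point this at /etc/ood/config/clusters.d on the OOD host and have your provisioning flow (or the hook itself) look up the head-node hostname per user.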
Also, you can bounce the PUN through a command. So if you have cloud-init-type stuff going on, or however this gets created, you can use the nginx_stage command to bounce a user’s PUN (i.e., without having them bounce their own PUN).
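On a typical install the command lives under /opt/ood/nginx_stage/sbin; subcommands and flags vary by OOD version, so treat this as a sketch and check `nginx_stage --help`:

```shell
# Sketch: restart a specific user's PUN as root (flags may differ by version)
sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx -u jlaura -s stop
sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u jlaura
```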
The way the AWS sample project for OOD on AWS ( GitHub - aws-samples/open-on-demand-on-aws ) handles reconfiguring OOD after changes to the cluster infrastructure is by attaching an EventBridge rule to cluster creation that runs an SSM Run Command shell script on the OOD instance to update the configuration files and restart OOD.
Is it a hard requirement that the cluster be fully dynamic and brought up via a button in the OOD interface?
Soft yes (soliciting suggestions). We are exploring options where we do not need to give users access to the AWS web console / CLI. Right now we have a Service Catalog product, but launching it requires that users have access to the console. We use OOD for things like Jupyter notebooks and are exploring whether we can also use it to provision the Service Catalog product and give the user in-browser shell access.
You can stop (not terminate) PCluster head nodes, and as long as no running jobs complete while the head node is stopped, it doesn’t impact PCluster at all. You only pay for the block storage while it’s stopped, and then the button just needs to send a start-instance command instead of trying to bring up a full cluster and all of its supporting infrastructure.
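Assuming you have the head node’s instance ID handy (the ID below is a placeholder), the button only needs to wrap something like:

```shell
# Sketch: pause/resume the ParallelCluster head node (placeholder instance ID)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```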
I think we discovered the stop-but-not-terminate option accidentally one day, and we built a prototype of this functionality, but we haven’t had reason to use it just yet.