Setting up OnDemand with shared cluster home directory

Greetings all. We are evaluating Open OnDemand at the moment; thank you for developing a great product.

My question deals with using a shared home directory on the OnDemand server.

We have shared home directories across our clusters. What I want to achieve is greater ease of use: I want users to land in the shared /home directory when they log in, or at least when they use the Job Composer.

Playing around with this, I symlinked /home on the OnDemand server to the shared home directory. I don’t know whether that is best practice, and after doing it I run into issues launching the Job Composer:

A problem occurred while initializing your data for this app.
At your own risk you can still open the app or you can just go back to the dashboard
Share this with the developer of your app: The setup production script is supposed to be idempotent and is run each time the user opens the app through the dashboard.
Exception: OodApp::SetupScriptFailed
Per user setup failed for script at /var/www/ood/apps/sys/myjobs/./bin/setup-production for user my.username with output: Exception occurred: database is locked

Another consideration is that each time a new user logs in to the login node of our cluster(s), a public/private key pair is created. Since the filesystem is shared across the cluster, every node, including the login node, shares the same /home/user/.ssh folder, so we get passwordless SSH between the nodes after the very first time a user logs in.
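For context, the first-login key setup is just the standard self-authorized key pattern; something along these lines (key type and paths are illustrative, not our exact script):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys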

That’s another reason I’d like to share the home directory: users could take advantage of this setup to get passwordless SSH to the cluster nodes from the OnDemand server.

What’s the recommended way, if any, to do this? Is this achievable? Curious as to how other sites are doing this and if I am on the right track. Thank you for your time.

Hi! and welcome!

I’m not sure I follow you.

If you’ve mounted the shared home directory to that machine, why not just mount it directly to home?

The best practice is to mount the NFS (I assume your shared home directory is some sort of network file system?) on the OOD server the same way it’s mounted across the rest of your site. The Open OnDemand server’s file structure should look just like everywhere else, especially the users’ $HOME directories.
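In other words, the shared export just gets mounted straight at /home on the OOD host, something like this (server name and export path here are placeholders):

mount -t nfs4 nfs-server.example.org:/export/home /home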

What you want to achieve is very much in line with how most sites run this.

As to the DB being locked: somehow you have two processes opening it? Or your previous OOD session hasn’t completely died and still has file descriptors open on that file? Not sure how that can happen. You can try restarting your web server from the help menu at the top right.

Thank you for the swift reply. Sorry for the lack of clarity in my original post; I have it mounted directly as /home now.

Hmmm… I have tried restarting the web server from the help menu and I’m still getting the locked-database error. I also restarted the OnDemand server just in case, and I’m still seeing the locked database error with my own account as well as with another account on a fresh login. I’ve been digging around and haven’t found the cause yet; perhaps you can point me in the right direction. Are there any database logs I can check to see where the issue might be coming from?

If nothing else, I can try setting up from scratch again; I haven’t gotten very far with the setup. Thank you again.

No problem at all! Yeah, I just didn’t follow your mounting scheme is all.

Your DB lock issue is probably an NFS issue then, especially if you get it immediately with two different logins. Here are the NFSv4 options we use; I’d say local_lock=none is the ticket:

rw,vers=4.0,rsize=65536,wsize=65536,local_lock=none
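In an /etc/fstab entry that looks roughly like this (server name and export path are placeholders):

nfs-server.example.org:/export/home  /home  nfs  rw,vers=4.0,rsize=65536,wsize=65536,local_lock=none  0 0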

There could be logs in /var/log/ondemand-nginx/$USER/[error,access].log, but they likely won’t give more information than we already have. You can try lsof 2>/dev/null | grep "production.sqlite3" to see whether that turns up anything strange, specifically the lowest PID. If that PID isn’t real, or is somehow zombied, then that’s our problem: the zombie process has a lock on the DB and can’t release it.
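If the grep does return something, you can check whether the owning process is still alive with something like this (the PID here is made up; use the lowest one from the lsof output):

lsof 2>/dev/null | grep "production.sqlite3"
ps -fp 12345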

But this seems unlikely given that you’ve restarted the ondemand server itself. I think the NFSv4 options are our best bet.

Hello Jeff, and thank you - aha - I think we’ve got it. The lsof command wasn’t coming back with anything, and I was also having the same issue with the local_lock=none option. After flipping back to a local /home and seeing that it worked, all signs pointed to NFS, like you said. This time I edited /etc/fstab and added ‘nolock’ at the end instead of local_lock=none. I logged in and got the error described here: User gets no such table: schema_migrations error in job composer

That’s where the Job Composer failed again and spat out ‘no such table: schema_migrations’.
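For reference, the /etc/fstab change was just swapping the lock option on the /home entry; with placeholder server and export names (and your other options possibly differing) it was roughly:

nfs-server.example.org:/export/home  /home  nfs  rw,rsize=65536,wsize=65536,nolock  0 0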

I went into the two users’ folders I was having trouble with earlier, removed the production.sqlite3 file, and restarted the web server, and it worked! I’ll keep an eye out in case something breaks, but for now I think this issue is resolved. Cheers
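For anyone who hits the same thing, the Job Composer keeps its per-user database under each user’s home directory; the usual location is something like this (the exact path may vary by OOD version), after which you restart the web server from the Help menu:

rm ~/ondemand/data/sys/myjobs/production.sqlite3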
