Input/output error with cluster access and interactive apps

Hi all,

We haven't seen this before with OnDemand and are wondering if someone can point us in the right direction. One of our users reported this issue this morning while trying to open an interactive app (JupyterLab), and we have reproduced it on our own accounts and on a test account.

The error we get when launching JupyterLab is below. It also happens with our other interactive app, Jupyter Notebook, and affects all users.

#<Errno::EIO: Input/output error @ rb_sysopen - /home/testuser/ondemand/data/sys/dashboard/batch_connect/sys/jupyterlab/context.json>

/var/www/ood/apps/sys/dashboard/app/controllers/batch_connect/session_contexts_controller.rb:12:in `read'
/var/www/ood/apps/sys/dashboard/app/controllers/batch_connect/session_contexts_controller.rb:12:in `read'
/var/www/ood/apps/sys/dashboard/app/controllers/batch_connect/session_contexts_controller.rb:12:in `new'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/abstract_controller/base.rb:194:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal/rendering.rb:30:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/abstract_controller/callbacks.rb:42:in `block in process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:132:in `run_callbacks'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/abstract_controller/callbacks.rb:41:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal/rescue.rb:22:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `block in instrument'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `instrument'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal/instrumentation.rb:32:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal/params_wrapper.rb:256:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/abstract_controller/base.rb:134:in `process'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionview-5.2.4.2/lib/action_view/rendering.rb:32:in `process'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal.rb:191:in `dispatch'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_controller/metal.rb:252:in `dispatch'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:52:in `dispatch'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:34:in `serve'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:52:in `block in serve'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:35:in `each'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:35:in `serve'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:840:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/tempfile_reaper.rb:15:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/etag.rb:27:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/conditional_get.rb:27:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/head.rb:12:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/http/content_security_policy.rb:18:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/session/abstract/id.rb:266:in `context'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/session/abstract/id.rb:260:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/cookies.rb:670:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:98:in `run_callbacks'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/callbacks.rb:26:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/debug_exceptions.rb:61:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/lograge-0.11.2/lib/lograge/rails_ext/rack/logger.rb:15:in `call_app'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/railties-5.2.4.2/lib/rails/rack/logger.rb:26:in `block in call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:71:in `block in tagged'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:28:in `tagged'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:71:in `tagged'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/railties-5.2.4.2/lib/rails/rack/logger.rb:26:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/request_store-1.5.0/lib/request_store/middleware.rb:19:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/request_id.rb:27:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/method_override.rb:24:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/runtime.rb:22:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/activesupport-5.2.4.2/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/executor.rb:14:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/static.rb:127:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/rack-2.2.2/lib/rack/sendfile.rb:110:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.11/gems/railties-5.2.4.2/lib/rails/engine.rb:524:in `call'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:97:in `process_request'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:149:in `accept_and_process_next_request'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:110:in `main_loop'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler.rb:415:in `block (3 levels) in start_threads'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'

Nothing has changed on the server. A few days ago we temporarily lost access to the NFS mount that our cluster /home directory is exported from, but it was remounted. I don't know whether that is what kicked off the issue. Occasionally, if I retry the interactive app, the form appears and I seem to be able to start a session, but then the session disappears.
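Since the trouble started after the NFS mount was lost and remounted, one thing that may be worth ruling out is how the mount came back. Per the nfs(5) man page, a mount using the "soft" option makes the client return an error to the calling application (typically EIO) once its retransmissions are exhausted, instead of retrying indefinitely as a "hard" mount does. Whether that applies here is an assumption, not something confirmed in this thread. Below is a minimal Ruby sketch that picks the NFS entries out of /proc/mounts-style text and flags any mounted "soft"; the helper name nfs_mounts is hypothetical.

```ruby
# Hypothetical helper (not part of OnDemand): pick NFS entries out of
# /proc/mounts-style text and note whether each was mounted with "soft".
# Each line has the form: device mountpoint fstype options dump pass
def nfs_mounts(mounts_text)
  mounts_text.each_line
             .map(&:split)
             # Match only real NFS client mounts; skips e.g. the "nfsd" pseudo-filesystem.
             .select { |fields| fields.size >= 4 && %w[nfs nfs4].include?(fields[2]) }
             .map do |device, mountpoint, fstype, options|
               { device: device, mountpoint: mountpoint, fstype: fstype,
                 soft: options.split(",").include?("soft") }
             end
end
```

On the OnDemand server it could be called as nfs_mounts(File.read("/proc/mounts")). A "soft: true" entry for /home would not prove the cause, but it would make intermittent EIO on a flaky link more plausible.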

Separately, when opening the shell access app we occasionally get:

Load key "/home/testuser/.ssh/id_rsa": Input/output error
Load key "/home/testuser/.ssh/id_ecdsa": Input/output error

This is happening on all user accounts as well. If I close the shell access app and retry, sometimes it works immediately with the SSH key and sometimes it fails again with the error above. Our login node is listed in /etc/ssh/ssh_known_hosts on the OnDemand server.

When I cat these "affected" files on the NFS mount as root on the OnDemand server, I also sometimes get input/output errors, e.g.:

cat: id_rsa: Input/output error
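Because the failures are intermittent, it may help to measure how often a read fails rather than retrying by hand. The dashboard trace shows the error raised while opening context.json for reading (the "read" frames at the top), so a minimal Ruby sketch can repeat a plain File.read on one path and tally successes against Errno::EIO. The helper name count_eio_failures and the default of 20 attempts are hypothetical choices. Other errors, such as a missing file, are deliberately not rescued so they are not mistaken for EIO.

```ruby
# Hypothetical helper: read one file repeatedly and count how many attempts
# succeed versus fail with Errno::EIO (the error seen in the dashboard trace).
# Any other exception (e.g. Errno::ENOENT) is left to propagate.
def count_eio_failures(path, attempts = 20)
  tally = { ok: 0, eio: 0 }
  attempts.times do
    begin
      File.read(path)
      tally[:ok] += 1
    rescue Errno::EIO
      tally[:eio] += 1
    end
  end
  tally
end
```

For example, p count_eio_failures("/home/testuser/.ssh/id_rsa") run as root on the OnDemand server. Repeated reads of the same file may be answered from the NFS client's page cache, so a run with no failures does not by itself show that the mount is healthy.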

This seems to point to an NFS issue of some kind, but we aren't sure where to start. We'd really appreciate any suggestions you can send our way.

You do have storage issues. What they are is hard to say, but from googling "Input/output error linux" and from my own experience with it, it's not fun.

Maybe you have actual bad blocks on the NFS storage, or maybe you're just having trouble syncing. You'll probably find relevant messages in dmesg, but it looks like you'll need to reach out to your hardware folks to see what's going on with that storage device. A restart of the machine might even clear it all up?

@tdockendorf do you have any advice for input/output errors?

No advice beyond what's already been suggested. Checking "dmesg -T" is where I would start, and I would also check on the NFS server side, if this is a Linux NFS server, to see whether there are underlying issues on that end too.
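To go through "dmesg -T" output on both machines a bit faster, a minimal Ruby sketch can keep only the lines that mention NFS or I/O errors, so their timestamps can be lined up against the times users hit the failures. The helper name nfs_related_lines and the two patterns are assumptions; kernel message wording varies between versions, so the patterns may need widening.

```ruby
# Hypothetical filter for `dmesg -T` output: keep lines mentioning NFS or
# kernel I/O errors (e.g. "nfs: server ... not responding, still trying").
NFS_PATTERNS = [/nfs/i, %r{I/O error}i].freeze

def nfs_related_lines(dmesg_text)
  dmesg_text.each_line
            .map(&:chomp)
            .select { |line| NFS_PATTERNS.any? { |re| re.match?(line) } }
end
```

For example: puts nfs_related_lines(IO.popen(["dmesg", "-T"], &:read)). Reading the kernel ring buffer may require root, depending on the kernel.dmesg_restrict setting.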