VNC connection hangs

To automate deployment of our OOD server, we have configured two servers for OnDemand: (A) one configured manually, and (B) another configured with the Salt auto-configuration tool.

Server B works for all other applications except bc_matlab and bc_desktop. Both servers point to the same scheduler and can see jobs submitted from either server. The same bc_desktop job works on server A but not on server B: the VNC session is running on the compute node, but on server B the page just hangs with a "connecting" message.

Can anyone suggest a good way to debug this issue?

Hi Cherry. Thanks for your post and your question. This is definitely a tough one. That being the case, I'm going to ask what may seem like some basic questions, so please bear with me.

  1. Have you compared the two installs to ensure server B is set up properly?
  2. Have you checked the logs on server B for errors?
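For question 1, one way to compare the installs is to copy each server's /etc/ood/config into a local directory and diff the two trees. This is a minimal Python sketch that assumes both copies already exist locally; the function name `diff_trees` and the directory names in the usage note are hypothetical.

```python
import filecmp
import os
from typing import Dict, List


def diff_trees(left: str, right: str) -> Dict[str, List[str]]:
    """Recursively compare two directory trees.

    Returns relative paths found only in `left`, only in `right`,
    or in both but with different contents.
    """
    result: Dict[str, List[str]] = {"only_left": [], "only_right": [], "differ": []}

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        result["only_left"] += [os.path.join(prefix, n) for n in cmp.left_only]
        result["only_right"] += [os.path.join(prefix, n) for n in cmp.right_only]
        result["differ"] += [os.path.join(prefix, n) for n in cmp.diff_files]
        # Descend into subdirectories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    # Sort for stable, readable output.
    for paths in result.values():
        paths.sort()
    return result
```

For example, `diff_trees("serverA-etc-ood", "serverB-etc-ood")` lists files that exist on only one server or differ between them. Note that `filecmp.dircmp` compares shallowly: files with identical type, size, and modification time are treated as equal without reading their contents.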

If you can, please send us the logs from server B from when you try to launch the failing apps.

We’ll work through this.
Thanks again,
-gerald

Thanks Gerald,

  1. Server B was built with exactly the same steps as the manual build; the only difference is that its OOD version is 2.0.18 (server A is 2.0.23).
  2. The following error appears when I click the Launch button for the interactive desktop:
    App 17497 output: [2022-03-30 13:34:24 -0400 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=150.93 view=26.47"
    App 17497 output: [2022-03-30 13:34:26 -0400 ] FATAL ""
    App 17497 output: [2022-03-30 13:34:26 -0400 ] FATAL "ActionController::RoutingError (No route matches [GET] "/noVNC-1.1.0/vendor/pako/lib/utils/common.js"):"
    App 17497 output: [2022-03-30 13:34:26 -0400 ] FATAL ""
    App 17497 output: [2022-03-30 13:34:26 -0400 ] FATAL "actionpack (5.2.6) lib/action_dispatch/middleware/debug_exceptions.rb:65:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'\nlograge (0.11.2) lib/lograge/rails_ext/rack/logger.rb:15:in `call_app'\nrailties (5.2.6) lib/rails/rack/logger.rb:26:in `block in call'\nactivesupport (5.2.6) lib/active_support/tagged_logging.rb:71:in `block in tagged'\nactivesupport (5.2.6) lib/active_support/tagged_logging.rb:28:in `tagged'\nactivesupport (5.2.6) lib/active_support/tagged_logging.rb:71:in `tagged'\nrailties (5.2.6) lib/rails/rack/logger.rb:26:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/remote_ip.rb:81:in `call'\nrequest_store (1.5.0) lib/request_store/middleware.rb:19:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/request_id.rb:27:in `call'\nrack (2.2.3) lib/rack/method_override.rb:24:in `call'\nrack (2.2.3) lib/rack/runtime.rb:22:in `call'\nactivesupport (5.2.6) lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/executor.rb:14:in `call'\nrack (2.2.3) lib/rack/sendfile.rb:110:in `call'\nrailties (5.2.6) lib/rails/engine.rb:524:in `call'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:107:in `process_request'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:149:in `accept_and_process_next_request'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:110:in `main_loop'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler.rb:416:in `block (3 levels) in start_threads'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'"
    App 17497 output: [2022-03-30 13:34:27 -0400 ] INFO "method=GET path=/pun/sys/dashboard/404 format=/ controller=ErrorsController action=not_found status=404 duration=117.16 view=8.91"
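When the Passenger output is long, a small filter can pull out just the unmatched routes. This is a hypothetical helper, not part of OnDemand; it accepts straight or curly quotes around the path, since copied logs sometimes have their quotes converted.

```python
import re
from typing import Iterable, List, Tuple

# Matches Rails messages such as: No route matches [GET] "/some/path"
# Straight (") or curly (\u201c \u201d) quotes are accepted around the path.
ROUTING_ERROR = re.compile(r'No route matches \[([A-Z]+)\] ["\u201c]([^"\u201d]+)["\u201d]')


def routing_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (HTTP method, path) for every RoutingError found in the log lines."""
    found = []
    for line in lines:
        match = ROUTING_ERROR.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

Pass it an open file, e.g. `with open(path) as f: print(routing_errors(f))`, where `path` is whichever log holds the `App ... output:` lines on your server.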

Let me know if you need other information.

Cherry

Hi Cherry.

Thanks for the response. To get to an apples-to-apples comparison, please upgrade server B to 2.0.23.

Thanks,
-gerald

It seems like you're upgrading, so you'll probably want to finish that first.

In my ondemand-nginx/$USER/access.log, these calls have the /pun/sys/dashboard prefix:

unix: - - [30/Mar/2022:14:10:41 -0400] "GET /pun/sys/dashboard/noVNC-1.1.0/vendor/pako/lib/utils/common.js HTTP/1.1" 200 1062 "https://ondemand-test.osc.edu/pun/sys/dashboard/noVNC-1.1.0/vendor/pako/lib/zlib/inflate.js" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0" "131.187.45.7"

Are you sure both servers have the same settings, specifically any public or URI-related configuration?
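To make that comparison on both servers, one option is to scan each server's ondemand-nginx/$USER/access.log for noVNC requests and report the path, the status code, and whether the path carries the /pun/sys/dashboard prefix seen in the working log line quoted above. This is a hedged sketch: the regex assumes the request field looks like that line (`"GET <path> HTTP/1.1" <status>`), and `novnc_requests` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

# The request field of an access-log line, e.g. "GET /path HTTP/1.1" 200
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')
DASHBOARD_PREFIX = "/pun/sys/dashboard/"


def novnc_requests(lines: Iterable[str], prefix: str = DASHBOARD_PREFIX) -> List[Tuple[str, int, bool]]:
    """Return (path, status, has_prefix) for each noVNC request in the log."""
    results = []
    for line in lines:
        # search() stops at the first match, i.e. the request field,
        # so a noVNC URL that only appears in the referer is ignored.
        match = REQUEST.search(line)
        if match and "noVNC" in match.group("path"):
            path = match.group("path")
            results.append((path, int(match.group("status")), path.startswith(prefix)))
    return results
```

Running it against both servers' logs for the same launch attempt makes any difference in prefix or status easy to spot.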

After upgrading server B to 2.0.23, the dashboard failed to start with this error:
Could not spawn process for application /var/www/ood/apps/sys/dashboard: The application encountered the following error: undefined method `kubernetes?' for #<OodCore::Cluster:0x000000000323ebc0> (NoMethodError)
Error ID: 41a390e2
Error details saved to: /tmp/passenger-error-iOuqMi.html

Are you installing through the RPM? If so, can you remove the old ondemand-gem package?
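If you want to script that check, here is a sketch that filters `rpm -qa` output for gem packages that do not match the OnDemand version you just installed. It assumes the gem packages have names starting with `ondemand-gem` and embed the OnDemand version; confirm that against your own `rpm -qa` output. `stale_gem_packages` is a hypothetical helper.

```python
import re
from typing import Iterable, List


def stale_gem_packages(packages: Iterable[str], current_version: str) -> List[str]:
    """Return ondemand-gem* package names that do not contain current_version.

    Assumption: gem package names start with "ondemand-gem" and include
    the OnDemand version they belong to.
    """
    # Match the version only as a whole token, so "2.0.2" does not match "2.0.23".
    version_re = re.compile(r"(?<![\d.])" + re.escape(current_version) + r"(?!\d)")
    stale = []
    for name in packages:
        name = name.strip()
        if name.startswith("ondemand-gem") and not version_re.search(name):
            stale.append(name)
    return stale
```

For example, feed it `subprocess.run(["rpm", "-qa", "ondemand-gem*"], capture_output=True, text=True).stdout.splitlines()` with `current_version="2.0.23"`, and review the result before removing anything.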

I think I found the root cause: I had customized /var/www/ood/apps/sys/dashboard/config/initializers by adding a few files, and it looks like the dashboard doesn't like that.

Putting those files under /etc/ood/config/apps/dashboard/initializers instead seems to work.
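A minimal sketch of that move, assuming you know which initializer files are your own additions (the stock files shipped with the dashboard must stay where they are). `migrate_initializers` is a hypothetical helper; it refuses to overwrite a file that already exists in the destination.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

APP_INITIALIZERS = Path("/var/www/ood/apps/sys/dashboard/config/initializers")
CONFIG_INITIALIZERS = Path("/etc/ood/config/apps/dashboard/initializers")


def migrate_initializers(
    names: Iterable[str],
    src: Path = APP_INITIALIZERS,
    dst: Path = CONFIG_INITIALIZERS,
) -> List[Path]:
    """Move the named custom initializer files from src to dst.

    Only the listed files are moved; everything else in src is left alone.
    """
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        target = dst / name
        if target.exists():
            # Do not silently clobber an initializer already in /etc.
            raise FileExistsError(target)
        shutil.move(str(src / name), str(target))
        moved.append(target)
    return moved
```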

And the upgrade automatically resolved the bc_desktop issue.

Could you let me know the best practice for putting things under /var/www/ood/apps versus /etc/ood/config/apps?

Thanks for your time.


Always put configuration in /etc/ood/config, because it persists across RPM updates. Anything in /var/www/ood/apps will be overwritten during updates.
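For a Salt-driven build like server B, one way to enforce this rule is to check every file path your states manage before applying them. The sketch assumes you have already collected those target paths into a list (extracting them from Salt is not shown); `paths_lost_on_update` is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Contents here are replaced by RPM updates.
OVERWRITTEN_ROOT = PurePosixPath("/var/www/ood/apps")


def paths_lost_on_update(paths: Iterable[str]) -> List[str]:
    """Return managed paths under /var/www/ood/apps that belong in /etc/ood/config."""
    flagged = []
    for p in paths:
        pp = PurePosixPath(p)
        # Compare whole path components, so /var/www/ood/apps_backup is not flagged.
        if pp == OVERWRITTEN_ROOT or OVERWRITTEN_ROOT in pp.parents:
            flagged.append(p)
    return flagged
```

Failing the Salt run when this list is non-empty keeps custom files out of the directory that updates overwrite.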