OnDemand support for LSF 10.1 and Enhanced lsb query

Hi,

I'm using OnDemand v1.8 (I previously tried v2.0) and I get communication issues when I add this configuration to lsf.conf:
LSB_QUERY_PORT=6883
LSB_QUERY_ENH=y
These lsf.conf entries let mbatchd use parallel processes to handle job requests when the job count is high (2,000+).
I replicated the issue by adding those lines to a development deployment that was working fine (same OS, scheduler, and OnDemand version), and it failed there too. It worked again once the lines were removed and LSF was restarted.

Thanks,

Roberto P.

Hi @periv4,

Welcome to the board and thanks for your post.

I will work to see what I can find on this for you.

Thanks,
-gerald

Also Roberto,

To make sure I understand the scope of the issue: is this happening on v2.0 but not v1.8, or on both?

Thanks,
-gerald

Hi Roberto.

After consulting with my colleagues, we have the following information for you.

  1. We do not use mbatchd with our LSF scheduler. We believe that may be the root of the problem you are experiencing.
  2. What are you trying to accomplish by setting the variables LSB_QUERY_PORT=6883 and LSB_QUERY_ENH=y?
  3. You can check the logs in /var/log/ondemand-nginx/<YOUR_USERNAME>; they might be informative. Feel free to paste the error log output into this thread.
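A minimal Ruby sketch for pulling the warnings and errors out of those logs. It assumes the per-user file is named error.log (an assumption; use whichever files you find in that folder) and that each line carries a level tag after the bracketed timestamp, as in the log output posted later in this thread. The helper name problem_lines is hypothetical.

```ruby
# Levels treated as worth a closer look.
LEVELS = %w[WARN ERROR FATAL].freeze

# Hypothetical helper: keep lines whose level tag (the first uppercase word
# after a "]" and whitespace, i.e. right after the timestamp) is in LEVELS.
def problem_lines(lines)
  lines.select do |line|
    level = line[/\]\s+([A-Z]+)\b/, 1]
    LEVELS.include?(level)
  end
end

if __FILE__ == $PROGRAM_NAME
  # Assumed default path; pass a different log file as the first argument.
  user = ENV.fetch("USER", "nobody")
  path = ARGV.first || "/var/log/ondemand-nginx/#{user}/error.log"
  if File.readable?(path)
    puts problem_lines(File.foreach(path))
  else
    warn "cannot read #{path}"
  end
end
```

Run it as the affected user (or with a path argument) and paste the matching lines here.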

I hope this helps.

Thanks,
-gerald

Hi Gerald,

The version I'm testing in development is 2.0; it communicates with a master node running LSF 10.1 for job processing. When the LSB_QUERY variables are set, jobs and desktops launched from the OnDemand server fail. Those variables are used to handle a large number of jobs (above 1,000) in an LSF environment. Once they are removed, everything works normally and I can launch desktops and applications.

As for mbatchd, it is the LSF master batch daemon (the job manager on the master host). My guess is that mbatchd normally uses port 6881 for communication between the master and the nodes, and when LSB_QUERY_ENH is enabled it uses 6881 and, in this case, 6883 in parallel.
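One way to test that port hypothesis from the OnDemand host is a minimal Ruby sketch that tries TCP connections to 6881 and 6883 on the LSF master. The master host name is passed on the command line, and port_open? is a hypothetical helper. A successful connect only shows that something is listening on the port; it does not prove mbatchd answers queries there.

```ruby
require "socket"

# Hypothetical helper: true if a TCP connection to host:port succeeds within
# `timeout` seconds; the socket is closed as soon as the block returns.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError # connection refused, timed out, or host not resolved
  false
end

if __FILE__ == $PROGRAM_NAME && ARGV.any?
  host = ARGV.first # your LSF master host name
  [6881, 6883].each do |port|
    status = port_open?(host, port) ? "open" : "closed or unreachable"
    puts "#{host}:#{port} #{status}"
  end
end
```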

Right now the logs don't show much information, except a CSRF error after I reverted LSF back to normal:
App 9040 output: [2022-01-26 13:36:07 -0500 ] INFO "execve = ["git", "describe", "--always", "--tags"]"
App 9040 output: [2022-01-26 13:36:07 -0500 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sys/bc_desktop/devhpc/session_contexts/new format=html controller=BatchConnect::SessionContextsController action=new status=200 duration=25.57 view=14.99"
App 9040 output: [2022-01-26 13:36:15 -0500 ] WARN "Can't verify CSRF token authenticity."
App 9040 output: [2022-01-26 13:36:15 -0500 ] INFO "method=POST path=/pun/sys/dashboard/batch_connect/sys/bc_desktop/devhpc/session_contexts format=html controller=BatchConnect::SessionContextsController action=create status=422 error='ActionController::InvalidAuthenticityToken: ActionController::InvalidAuthenticityToken' duration=1.13 view=0.00"
App 9040 output: [2022-01-26 13:36:15 -0500 ] FATAL ""
App 9040 output: [2022-01-26 13:36:15 -0500 ] FATAL "ActionController::InvalidAuthenticityToken (ActionController::InvalidAuthenticityToken):"
App 9040 output: [2022-01-26 13:36:15 -0500 ] FATAL ""

I followed the GitHub issue "HTTP 422 Error / InvalidAuthenticityToken" (OSC/ondemand #1193), but it didn't work.

Roberto P.

Also, I'm not sure it was a good idea, but I was able to get past the CSRF error by commenting out the body of verify_authenticity_token (around line 242) in:
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.20/gems/actionpack-5.2.6/lib/action_controller/metal/request_forgery_protection.rb
  def verify_authenticity_token # :doc:
    mark_for_same_origin_verification!

    #if !verified_request?
    #  if logger && log_warning_on_csrf_failure
    #    if valid_request_origin?
    #      logger.warn "Can't verify CSRF token authenticity."
    #    else
    #      logger.warn "HTTP Origin header (#{request.origin}) didn't match request.base_url (#{request.base_url})"
    #    end
    #  end
    #  handle_unverified_request
    #end
  end

This is not production; it's a development system for a proof of concept (POC) on an isolated network.

Roberto P.

I was able to fix the LSB_QUERY_ENH issue. I found that the regular port for LSF calls is 6881, defined in lsf.conf as "LSB_MBD_PORT=6881", while with "LSB_QUERY_ENH=y" LSF was using 6883, which confused the application. After some network tracing and troubleshooting, I set LSB_QUERY_PORT to 6881 instead of 6883. Both LSB_QUERY_PORT and LSB_QUERY_ENH are needed in lsf.conf for parallel job processing.
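To review these settings on another cluster, a minimal Ruby sketch can read lsf.conf and report LSB_MBD_PORT, LSB_QUERY_PORT, LSB_QUERY_ENH, and whether the two ports match. The helper names are hypothetical, the default path assumes lsf.conf lives in $LSF_ENVDIR, and the ports_match flag only mirrors what fixed things in this thread; it is not a documented LSF rule.

```ruby
# Hypothetical helper: parse KEY=VALUE lines from lsf.conf, skipping blank
# lines and # comments and stripping optional surrounding double quotes.
def parse_lsf_conf(text)
  text.each_line.each_with_object({}) do |raw, conf|
    line = raw.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    next if value.nil?

    conf[key.strip] = value.strip.delete_prefix('"').delete_suffix('"')
  end
end

# Hypothetical helper: summarize the query-related settings from this thread.
def query_settings(conf)
  mbd = conf["LSB_MBD_PORT"]
  query = conf["LSB_QUERY_PORT"]
  {
    mbd_port: mbd,
    query_port: query,
    query_enh: conf["LSB_QUERY_ENH"].to_s.casecmp?("y"),
    # True when the query port equals the mbatchd port, the setup that
    # worked here (not an LSF requirement).
    ports_match: !query.nil? && query == mbd
  }
end

if __FILE__ == $PROGRAM_NAME
  # Assumed default location: $LSF_ENVDIR/lsf.conf (pass a path to override).
  path = ARGV.first || File.join(ENV.fetch("LSF_ENVDIR", "/etc"), "lsf.conf")
  if File.readable?(path)
    query_settings(parse_lsf_conf(File.read(path))).each { |k, v| puts "#{k}: #{v.inspect}" }
  else
    warn "cannot read #{path}"
  end
end
```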

After that, the OnDemand application started to work fine.

Roberto P.