Active Jobs with SSGE 8.1.9 : Request for jobs failed due to body parsing error

Hi,

I am seeing what looks like a bug in the “Active Jobs” view: it returns an error message and the job list seems incomplete. The behavior is the same with OOD 1.6.20 and 1.7.6.

I remember that our POC with a Slurm configuration allowed us to kill jobs. Tell me if I’m wrong. If not, can you help me?

thanks

jean-marie

On the same “Active Jobs” subject, how can I add a “nbcores” column?
I have seen that you can add a filter.rb in the “/etc/ood/config/apps/activejobs/initializers” directory, but I don’t know how to write the filter.

thanks

jean-marie

Hello,

Could you tell me where the OOD logs are located?

For the moment, I have identified these:

  • /var/log/ondemand-nginx/error.log

  • /var/log/ondemand-nginx/user/error.log

  • /var/log/ood/…error.log

  • /var/www/ood/apps/sys/dashboard/log/production.log with nothing inside

It would be helpful to know where to look in case of trouble.

thanks for your help.

jean-marie

Hey, sorry for the delayed reply. Yes, those are the only error logs we have. Specifically, you’d see this being thrown in /var/log/ondemand-nginx/$USER/error.log, your own error log.
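A quick way to see which of the log files named in this thread exist and actually contain anything is sketched below. The helper names `candidate_logs` and `log_status` are made up for illustration, and the path list only repeats locations mentioned above; it is not an authoritative list of OOD log files.

```ruby
# Log locations named in this thread; the per-user path takes the login name.
def candidate_logs(user)
  [
    '/var/log/ondemand-nginx/error.log',
    "/var/log/ondemand-nginx/#{user}/error.log",
    '/var/www/ood/apps/sys/dashboard/log/production.log'
  ]
end

# Classify a path so that missing, unreadable, or empty logs stand out.
def log_status(path)
  return :missing unless File.exist?(path)
  return :unreadable unless File.readable?(path)
  return :empty if File.zero?(path)

  :ok
end

if __FILE__ == $PROGRAM_NAME
  candidate_logs(ENV.fetch('USER', 'unknown')).each do |path|
    puts format('%-12s %s', log_status(path), path)
  end
end
```

Note that a file inside a directory you cannot enter is reported as missing, so run this as a user who is allowed to read the logs.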

It seems you’re having trouble parsing the output. To be clear, this is SGE that’s not working, right?

If it’s SGE that is not working, what’s the output of qstat -r -xml -u jms on the OOD machine? Is it valid XML all the way through? Could you copy it here (obfuscating whatever you like)? Maybe we can replicate the problem with it and see if there’s something wrong with our parser.
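One way to answer the “is it valid XML all the way through” question is to run the text through an XML parser and look at the first error. The sketch below uses Ruby’s REXML library; `xml_error` is a made-up helper name and the two sample strings are invented, so this only checks well-formedness and says nothing about how OOD itself parses the output.

```ruby
require 'rexml/document'

# Returns nil when the text parses as well-formed XML, otherwise the
# first line of the parser's error message.
def xml_error(text)
  REXML::Document.new(text)
  nil
rescue REXML::ParseException => e
  e.message.lines.first.to_s.strip
end

# Invented samples: the second one never closes <queue_info>.
good = "<job_info><queue_info><job_list state='running'/></queue_info></job_info>"
bad  = '<job_info><queue_info></job_info>'

puts xml_error(good).inspect
puts xml_error(bad).inspect
```

To check real output, save the result of qstat -r -xml -u $USER to a file and pass `File.read(path)` to `xml_error`.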

It would seem the parser is getting stuck somewhere. You’re failing right here, though what the actual stack is, who knows.

I went digging for that string “Request for jobs failed due to body parsing error” and could not find it anywhere, in our source code, SGE (what ships with debian) or libdrmaa. Do you have libdrmaa by the way? I believe we think of it as nice but not necessary. Maybe it is more necessary than we think?

If you put a puts e.backtrace right before line 108, just before the raise JobAdapterError, e.message, the stack would show up in your ondemand-nginx $USER logs. If this is a modifiable system, we may have to resort to that. I would not normally suggest it, but if it’s a test or throwaway system that you can wipe and reinstall later, then maybe that’s an avenue of investigation for us.

In 1.7.6 it’s
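The idea of printing the backtrace just before the error is re-raised can be sketched as follows. This is not the actual sge.rb code: `JobAdapterError` here is a local stand-in class, and `with_backtrace` is a hypothetical wrapper used only to show the pattern. It writes to STDERR by default, whereas the suggestion above uses a plain puts.

```ruby
require 'stringio'

# Local stand-in for the adapter's error class (assumption for this sketch).
class JobAdapterError < StandardError; end

# Runs the block; on failure, prints the original exception and its stack
# to `io` before replacing it with the wrapper error, which would otherwise
# hide where the failure really happened.
def with_backtrace(io = $stderr)
  yield
rescue StandardError => e
  io.puts "#{e.class}: #{e.message}"
  io.puts Array(e.backtrace)
  raise JobAdapterError, e.message
end

# Usage: the original ArgumentError and its stack are captured in the buffer.
buffer = StringIO.new
begin
  with_backtrace(buffer) { Integer('not a number') }
rescue JobAdapterError => e
  puts "wrapped error: #{e.message}"
end
puts buffer.string
```

Without the two `io.puts` lines, only the wrapped message survives and the original stack is lost, which is the situation described above.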
/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.6/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb

In 1.6 it’s
/var/www/ood/apps/sys/activejobs/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb

Hi, @jeff.ohrstrom

Don’t worry about the delay. It’s nice to have your questions and hypotheses in return.

To sum up:

  • I have an OOD v1.7.6 setup on a test cluster running CentOS 7.7 and Son of SGE 8.1.9, with PAM auth
  • In parallel, I’m preparing OOD 1.7.6 on our production HPC cluster (CentOS 7.7, Son of SGE 8.1.6, PAM auth). I hope you will release the final 1.7 version soon.

On both, I have set “OOD_BC_SSH_TO_COMPUTE_NODE=0” and enabled “fix_sge_procs.rb” in the dashboard initializers.

First, the “Request for jobs failed due to body parsing error” issue is closed. It was caused by my test of the “linux_host” feature; removing its cluster.d yml file stopped the message.

As for the second point, the incomplete “Active Jobs” output: no progress for the moment.

here is the view of our ssge_8.1.6 config

here is the view of our ssge_8.1.9 config

it’s the same thing

Concerning the output of qstat -r -xml -u $USER:

Here is the 8.1.6 output:

<?xml version='1.0'?>

<job_info xmlns:xsd="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat.xsd">
<queue_info>
<job_list state="running">
<JB_job_number>261077</JB_job_number>
<JAT_prio>1.54559</JAT_prio>
<JB_name>BASIC</JB_name>
<JB_owner>jms</JB_owner>
<state>r</state>
<JAT_start_time>2020-01-29T15:48:28</JAT_start_time>
<queue_name>int.q@node-036.cluster.org</queue_name>
<slots>1</slots>
<full_job_name>BASIC</full_job_name>
<hard_request name="h_rt" resource_contribution="0.000000">2592000</hard_request>
<hard_req_queue>int.q</hard_req_queue>
<binding>NONE</binding>
</job_list>
<job_list state="running">
<JB_job_number>261767</JB_job_number>
<JAT_prio>1.54601</JAT_prio>
<JB_name>FLUENT</JB_name>
<JB_owner>jms</JB_owner>
<state>r</state>
<JAT_start_time>2020-02-06T17:38:13</JAT_start_time>
<queue_name>all.q@node-026.cluster.org</queue_name>
<slots>32</slots>
<full_job_name>FLUENT</full_job_name>
<requested_pe name="openmpi_exclusif_32">32</requested_pe>
<granted_pe name="openmpi_exclusif_32">32</granted_pe>
<hard_request name="lic_flue_acfd_solver" resource_contribution="0.000000">1</hard_request>
<hard_request name="lic_flue_para_max" resource_contribution="0.000000">32</hard_request>
<hard_request name="mem_free" resource_contribution="0.000000">1G</hard_request>
<hard_request name="mem_dispo" resource_contribution="0.000000">1</hard_request>
<hard_request name="swap_used" resource_contribution="0.000000">1G</hard_request>
<hard_request name="vnode" resource_contribution="0.000000">0</hard_request>
<hard_request name="h_rt" resource_contribution="0.000000">39600</hard_request>
<hard_request name="short_node" resource_contribution="0.000000">0</hard_request>
<hard_request name="urgent" resource_contribution="0.000000">0</hard_request>
<hard_request name="frontal" resource_contribution="0.000000">0</hard_request>
<soft_request name="highspeeddisk">0</soft_request>
<soft_request name="memoire">63</soft_request>
<soft_request name="nv_type">K2200|K2000</soft_request>
<hard_req_queue>all.q@@DELL_32_7</hard_req_queue>
<binding>NONE</binding>
</job_list>
</queue_info>
<job_info>
</job_info>
</job_info>

And here is the 8.1.9 output:

<?xml version='1.0'?>

<job_info xmlns:xsd="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat.xsd">
<queue_info>
<job_list state="running">
<JB_job_number>147</JB_job_number>
<JAT_prio>0.50500</JAT_prio>
<JB_name>BASIC</JB_name>
<JB_owner>jms</JB_owner>
<state>r</state>
<JAT_start_time>2020-02-07T17:16:26</JAT_start_time>
<queue_name>int.q@node-001</queue_name>
<slots>1</slots>
<full_job_name>BASIC</full_job_name>
<hard_request name="h_rt" resource_contribution="0.000000">2592000</hard_request>
<hard_req_queue>int.q</hard_req_queue>
<binding>NONE</binding>
</job_list>
<job_list state="running">
<JB_job_number>148</JB_job_number>
<JAT_prio>0.60500</JAT_prio>
<JB_name>PARAVIEW</JB_name>
<JB_owner>jms</JB_owner>
<state>r</state>
<JAT_start_time>2020-02-07T17:17:26</JAT_start_time>
<queue_name>all.q@node-002</queue_name>
<slots>8</slots>
<full_job_name>PARAVIEW</full_job_name>
<requested_pe name="mpi">8</requested_pe>
<granted_pe name="mpi">8</granted_pe>
<hard_request name="mem_free" resource_contribution="0.000000">1G</hard_request>
<hard_request name="swap_used" resource_contribution="0.000000">1G</hard_request>
<hard_req_queue>all.q@@VM_4</hard_req_queue>
<binding>NONE</binding>
</job_list>
</queue_info>
<job_info>
</job_info>
</job_info>
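For the “nbcores” question, the per-job core count can be read from this kind of output. The sketch below assumes that the `slots` element declared in the qstat.xsd posted later in this thread carries the number of cores; `slots_by_job` is a made-up helper, the sample is a trimmed copy of the 8.1.9 jobs above, and none of this is the Active Jobs column API.

```ruby
require 'rexml/document'

# Trimmed sample modelled on the 8.1.9 output; element names follow qstat.xsd.
SAMPLE_QSTAT = <<~XML
  <?xml version='1.0'?>
  <job_info>
    <queue_info>
      <job_list state="running">
        <JB_job_number>147</JB_job_number>
        <JB_name>BASIC</JB_name>
        <slots>1</slots>
      </job_list>
      <job_list state="running">
        <JB_job_number>148</JB_job_number>
        <JB_name>PARAVIEW</JB_name>
        <slots>8</slots>
      </job_list>
    </queue_info>
  </job_info>
XML

# Maps each job number to its slot count (0 when <slots> is absent).
def slots_by_job(xml_text)
  doc = REXML::Document.new(xml_text)
  result = {}
  REXML::XPath.each(doc, '//job_list') do |job|
    id = job.elements['JB_job_number']&.text
    next if id.nil?

    result[id] = job.elements['slots']&.text.to_i
  end
  result
end

slots_by_job(SAMPLE_QSTAT).each do |id, slots|
  puts "job #{id}: #{slots} slot(s)"
end
```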

And finally, the result of adding “e.backtrace” to the sge.rb file.

With libdrmaa on SSGE 8.1.9, here is the extract from ondemand-nginx/jms/error.log:

App 30930 output: [2020-02-07 17:42:58 +0100 ]  INFO "method=GET path=/pun/sys/dashboard/apps/show/activejobs format=html controller=AppsController action=show status=302 duration=3428.26 view=0.00 location=https://caravanshow.ddns.net/pun/sys/activejobs"

App 33512 output: Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/activejobs/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/activejobs/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
App 33512 output: [2020-02-07 17:43:00 +0100 ] INFO “method=GET path=/pun/sys/activejobs/ format=html controller=JobsController action=index status=200 duration=63.87 view=63.01”
App 33512 output: [2020-02-07 17:43:01 +0100 ] INFO “method=GET path=/pun/sys/activejobs/jobs.json format=json controller=JobsController action=index status=200 duration=177.09 view=0.00”

Without libdrmaa on SSGE 8.1.9, here is the extract from ondemand-nginx/jms/error.log:

App 27700 output: [2020-02-07 17:37:51 +0100 ] INFO “method=GET path=/pun/sys/dashboard/apps/show/activejobs format=html controller=AppsController action=show status=302 duration=3279.10 view=0.00 location=https://caravanshow.ddns.net/pun/sys/activejobs”
App 30263 output: Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/activejobs/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/activejobs/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
App 30263 output: [2020-02-07 17:37:53 +0100 ] INFO “method=GET path=/pun/sys/activejobs/ format=html controller=JobsController action=index status=200 duration=40.29 view=39.55”
App 30263 output: [2020-02-07 17:37:54 +0100 ] INFO “method=GET path=/pun/sys/activejobs/jobs.json format=json controller=JobsController action=index status=200 duration=107.63 view=0.00”

That’s all from me for the moment. I hope it’s helpful for you.

jean-marie

If it is of interest to you, I can send you the files describing the format of the qstat command output, located in the $SGE_ROOT/util/resources/schemas/qstat directory:

  • qstat.xsd
  • message.xsd
  • detailed_job_info.xsd

Have a nice weekend

jean-marie

No, that’s fine! I was just interested in the XML because I’d assumed you had an issue with parsing the XML output.

I’ll look into adding new columns to the active jobs application early next week.

As an aside, I take it you’re interested in the linux host adapter in 1.7. Very cool! Obviously you’re having issues with it, so feel free to open a ticket with your cluster.yml if you continue to have issues with it and you are planning to use it in production.

Thanks, you have a nice weekend too!

Hi, Jeff

Here are the files related to the qstat XML format.

Thanks for having a look at them to solve the “Active Jobs” bug.

jean-marie

(Attachment qstat.tar.gz is missing)

here is the qstat.xsd file

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat_cb.xsd"
           xmlns="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat_cb.xsd"
           elementFormDefault="qualified">
<xs:element name="job_info">
	<xs:complexType>
		<xs:annotation>
			<xs:documentation>
			This schema describes most of the qstat outputs. There are extra
			schema defintions for "qstat -j" and and "qstat -j job".
			</xs:documentation>
		</xs:annotation>
		<xs:sequence>
			<xs:element name="queue_info" type="queue_info_t" minOccurs="0" maxOccurs="unbounded"/>
			<xs:element name="job_info" type="job_info_t" minOccurs="0" maxOccurs="unbounded"/>
			<xs:element name="cluster_queue_summary" type="cqueue_summary_t" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
</xs:element>

<xs:complexType name="cqueue_summary_t">
	<xs:sequence>
		<xs:element name="name" type="xs:string"/>
    	<xs:element name="load" type="xs:float" minOccurs="0"/>
		<xs:element name="used" type="xs:unsignedInt"/>
		<xs:element name="resv" type="xs:unsignedInt"/>
		<xs:element name="available" type="xs:unsignedInt"/>
		<xs:element name="total" type="xs:unsignedInt"/>
		<xs:element name="temp_disabled" type="xs:unsignedInt"/>
		<xs:element name="manual_intervention" type="xs:unsignedInt"/>
		<xs:element name="suspend_manual" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="suspend_threshold" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="suspend_on_subordinate" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="suspend_calendar" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="unknown" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="load_alarm" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="disabled_manual" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="disabled_calendar" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="ambiguous" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="orphaned" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="error" type="xs:unsignedInt" minOccurs="0"/>
	</xs:sequence>
</xs:complexType>

<xs:complexType name="queue_info_t">
	<xs:sequence minOccurs="0" maxOccurs="unbounded">
            <!-- Queue-List looks like a mistake, but too late to change -->
		<xs:element name="Queue-List" type="queue_list_t" minOccurs="0"/>
		<xs:element name="job_list" type="job_list_t" minOccurs="0" maxOccurs="unbounded"/>
	</xs:sequence>
</xs:complexType>

<xs:complexType name="queue_list_t">
	<xs:sequence minOccurs="0">
		<xs:element name="name" type="xs:string" minOccurs="0"/>
		<xs:element name="qtype" type="xs:string" minOccurs="0"/>
		<xs:element name="slots_resv" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="slots_used" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="slots_total" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="load_avg" type="xs:float" minOccurs="0"/>
		<xs:element name="arch" type="xs:string" minOccurs="0"/>
		<xs:element name="state" type="xs:string" minOccurs="0"/>
		<xs:element name="message" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="resource" type="resource_t" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="job_list" type="job_list_t" minOccurs="0" maxOccurs="unbounded"/>
	</xs:sequence>
</xs:complexType>

<xs:complexType name="job_info_t">
	<xs:sequence>
		<xs:element name="job_list" type="job_list_t" minOccurs="0" maxOccurs="unbounded"/>
	</xs:sequence>
</xs:complexType>

<xs:complexType name="job_list_t">
	<xs:sequence>
		<xs:element name="JB_job_number" type="xs:unsignedInt"/>
		<xs:element name="JAT_prio" type="xs:float"/>
		<xs:element name="JAT_ntix" type="xs:float" minOccurs="0"/>

		<xs:element name="JB_nppri" type="xs:float" minOccurs="0"/>
		<xs:element name="JB_nurg" type="xs:float" minOccurs="0"/>
		<xs:element name="JB_urg" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JB_rrcontr" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JB_wtcontr" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JB_dlcontr" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JB_priority" type="xs:unsignedInt" minOccurs="0"/>

		<xs:element name="JB_name" type="xs:string"/>
		<xs:element name="JB_owner" type="xs:string"/>
		<xs:element name="JB_project" type="xs:string" minOccurs="0"/>
		<xs:element name="JB_department" type="xs:string" minOccurs="0"/>

		<xs:element name="state" type="xs:string"/>

		<xs:element name="JB_submission_time" type="xs:dateTime" minOccurs="0"/>
		<xs:element name="JAT_start_time" type="xs:dateTime" minOccurs="0"/>
		<xs:element name="JB_deadline" type="xs:dateTime" minOccurs="0"/>

		<xs:element name="cpu_usage" type="xs:float" minOccurs="0"/>
		<xs:element name="mem_usage" type="xs:float" minOccurs="0"/>
		<xs:element name="io_usage" type="xs:float" minOccurs="0"/>

		<xs:element name="tickets" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JB_override_tickets" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JB_jobshare" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="otickets" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="ftickets" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="stickets" type="xs:unsignedInt" minOccurs="0"/>
		<xs:element name="JAT_share" type="xs:float" minOccurs="0"/>

		<xs:element name="queue_name" type="xs:string" minOccurs="0"/>
		<xs:element name="master" type="xs:string" minOccurs="0"/>

		<xs:element name="slots" type="xs:unsignedInt"/>
		<xs:element name="tasks" type="xs:string" minOccurs="0"/>

		<xs:element name="requested_pe" type="requested_pe_t" minOccurs="0"/>
		<xs:element name="granted_pe" type="granted_pe_t" minOccurs="0"/>
		<xs:element name="JB_checkpoint_name" type="xs:string" minOccurs="0"/>
		<xs:element name="hard_request" type="request_t" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="def_hard_request" type="request_t" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="soft_request" type="request_t" minOccurs="0" maxOccurs="unbounded"/>

		<xs:element name="hard_req_queue" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="soft_req_queue" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="master_hard_req_queue" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="predecessor_jobs_req" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
		<xs:element name="predecessor_jobs" type="xs:unsignedInt" minOccurs="0" maxOccurs="unbounded"/>

     <xs:element name="binding" type="xs:string" minOccurs="0"/>
	</xs:sequence>
	<xs:attribute name="state" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="request_t">
	<xs:attribute name="name" type="xs:string" use="required"/>
	<xs:attribute name="resource_contribution" type="xs:float"/>
</xs:complexType>

<xs:complexType name="granted_pe_t">
	<xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>

<xs:complexType name="requested_pe_t">
	<xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>

<xs:complexType name="resource_t">
	<xs:attribute name="name" type="xs:string" use="required"/>
	<xs:attribute name="type" type="xs:string" use="required"/>
</xs:complexType>

</xs:schema>
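To see which child elements a type such as job_list_t declares as mandatory (no minOccurs="0"), the schema itself can be walked with REXML. This is only an illustrative sketch: `required_children` is a made-up helper, the embedded schema is a shortened excerpt of the file above, and the code reads the declarations without performing XSD validation.

```ruby
require 'rexml/document'

# Shortened excerpt of job_list_t from the qstat.xsd above.
SAMPLE_XSD = <<~XSD
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="job_list_t">
      <xs:sequence>
        <xs:element name="JB_job_number" type="xs:unsignedInt"/>
        <xs:element name="JAT_ntix" type="xs:float" minOccurs="0"/>
        <xs:element name="state" type="xs:string"/>
        <xs:element name="slots" type="xs:unsignedInt"/>
        <xs:element name="binding" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="state" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:schema>
XSD

# Lists the element names inside the named complexType that are not
# marked minOccurs="0", i.e. the ones every instance should contain.
def required_children(xsd_text, type_name)
  doc = REXML::Document.new(xsd_text)
  type = doc.root.elements.to_a.find do |e|
    e.name == 'complexType' && e.attributes['name'] == type_name
  end
  return [] unless type

  names = []
  type.each_recursive do |e|
    next unless e.name == 'element'

    names << e.attributes['name'] unless e.attributes['minOccurs'] == '0'
  end
  names
end

puts required_children(SAMPLE_XSD, 'job_list_t').join(', ')
```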

the message.xsd file

<?xml version="1.0" encoding="UTF-8" ?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/message.xsd"
           xmlns="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/message.xsd">
<xs:element name="message">
	<xs:complexType>
		<xs:annotation>
			<xs:documentation>qstat -j output</xs:documentation>
		</xs:annotation>
		<xs:sequence>
			<xs:element name="qmaster_response">
				<xs:complexType>
					<xs:sequence>
						<xs:element name="SME_global_message_list" type="gmessage_list_t" minOccurs="0" maxOccurs="1"/>
					</xs:sequence>
				</xs:complexType>
			</xs:element>
		</xs:sequence>
	</xs:complexType>
</xs:element>

<xs:complexType name="gmessage_list_t">
	<xs:annotation>
		<xs:documentation></xs:documentation>
	</xs:annotation>
	<xs:sequence>
		<xs:element name="element" minOccurs="0" maxOccurs="unbounded">
			<xs:complexType>
				<xs:sequence>
					<xs:element name="MES_message_number" type="xs:positiveInteger"/>
					<xs:element name="MES_message" type="xs:string"/>
				</xs:sequence>
			</xs:complexType>
		</xs:element>
	</xs:sequence>
</xs:complexType>

</xs:schema>

the last one : detailed_job_info.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          targetNamespace="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/detailed_job_info.xsd"
           xmlns="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/detailed_job_info.xsd"
           elementFormDefault="qualified">
<xs:element name="detailed_job_info">
	<xs:complexType>
		<xs:annotation>
			<xs:documentation>qstat -j number output</xs:documentation>
		</xs:annotation>
		<xs:sequence >
			<xs:element name="djob_info" type="djob_info_t"/>
			<xs:element name="messages" type="message_t"/>
		</xs:sequence>
	</xs:complexType>
</xs:element>

<xs:complexType name="djob_info_t">
	<xs:sequence minOccurs="0" maxOccurs="unbounded">
	       <xs:element name="qmaster_response" type="qmaster_response_t"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="qmaster_response_t">
	<xs:sequence >
                    <xs:element name="JB_job_number" type="xs:unsignedInt"/>
                    <xs:element name="JB_ar" type="xs:unsignedInt"/>
                    <xs:element name="JB_exec_file" type="xs:string"/>
                    <xs:element name="JB_submission_time" type="xs:dateTime" minOccurs="0"/>
                    <xs:element name="JB_owner" type="xs:string"/>
                    <xs:element name="JB_uid" type="xs:unsignedInt"/>
                    <xs:element name="JB_group" type="xs:string"/>
                    <xs:element name="JB_gid" type="xs:unsignedInt"/>
                    <xs:element name="JB_account" type="xs:string"/>
                    <xs:element name="JB_merge_stderr" type="xs:boolean"/>
                    <xs:element name="JB_mail_list" type="JB_mail_t"/>
                    <xs:element name="JB_project" type="xs:string"/>
                    <xs:element name="JB_notify" type="xs:boolean"/>
                    <xs:element name="JB_job_name" type="xs:string"/>
                    <xs:element name="JB_stdout_path_list" type="JB_stdout_path_list_t"/>
                    <xs:element name="JB_jobshare" type="xs:unsignedInt"/>
                    <xs:element name="JB_hard_resource_list" type="JB_hard_resource_list_t"/>
                    <xs:element name="JB_soft_resource_list" type="JB_soft_resource_list_t"/>
                    <xs:element name="JB_hard_queue_list" type="JB_hard_queue_list_t"/>
                    <xs:element name="JB_soft_queue_list" type="JB_soft_queue_list_t"/>
                    <xs:element name="JB_shell_list" type="JB_shell_list_t"/>
                    <xs:element name="JB_env_list" type="JB_env_list_t"/>
                    <xs:element name="JB_job_args" type="JB_job_args_t"/>
                    <xs:element name="JB_script_file" type="xs:string"/>
                    <xs:element name="JB_ja_tasks" type="JB_ja_task_t"/>
                    <xs:element name="JB_context" type="JB_context_t"/>
                    <xs:element name="JB_cwd" type="xs:string"/>
                    <xs:element name="JB_stderr_path_list" type="JB_stderr_path_list_t"/>
                    <xs:element name="JB_jid_predecessor_list" type="JB_jid_predecessor_list_t"/>
                    <xs:element name="JB_jid_successor_list" type="JB_jid_successor_list_t"/>
                    <xs:element name="JB_deadline" type="xs:dateTime"/>
                    <xs:element name="JB_execution_time" type="xs:unsignedInt"/>
                    <xs:element name="JB_checkpoint_name" type="xs:string"/>
                    <!-- checkpoint_attr is (common/symbols.h) bit-or of:
                      CHECKPOINT_AT_MINIMUM_INTERVAL 0x01
                      CHECKPOINT_AT_SHUTDOWN         0x02
                      CHECKPOINT_SUSPEND             0x04
                      NO_CHECKPOINT                  0x08
                      CHECKPOINT_AT_AUTO_RES         0x10
                     -->
                    <xs:element name="JB_checkpoint_attr" type="xs:unsignedInt"/>
                    <xs:element name="JB_checkpoint_interval" type="xs:unsignedInt"/>
                    <xs:element name="JB_directive_prefix" type="xs:string"/>
                    <xs:element name="JB_reserve" type="xs:boolean"/>
                    <xs:element name="JB_mail_options" type="xs:unsignedInt" />
                    <xs:element name="JB_stdin_path_list" type="JB_stdin_path_list_t"/>
                    <xs:element name="JB_priority" type="xs:unsignedInt"/>
                    <xs:element name="JB_restart" type="xs:unsignedInt"/>
                    <xs:element name="JB_verify" type="xs:unsignedInt"/>
                    <xs:element name="JB_master_hard_queue_list" type="JB_master_hard_queue_list_t"/>
                    <xs:element name="JB_script_size" type="xs:unsignedInt"/>
                    <xs:element name="JB_pe" type="xs:string"/>
                    <xs:element name="JB_pe_range" type="JB_pe_range_t"/>
                    <xs:element name="JB_jid_request_list" type="JB_jid_request_list_t"/>
                    <xs:element name="JB_verify_suitable_queues" type="xs:unsignedInt"/>
                    <xs:element name="JB_soft_wallclock_gmt" type="xs:unsignedInt"/>
                    <xs:element name="JB_hard_wallclock_gmt" type="xs:unsignedInt"/>
                    <xs:element name="JB_override_tickets" type="xs:unsignedInt"/>
                    <xs:element name="JB_version" type="xs:unsignedInt"/>
                    <xs:element name="JB_ja_structure" type="JB_ja_structure_t"/>
                    <xs:element name="JB_type" type="xs:unsignedInt"/>
                    <xs:element name="JB_binding" type="JB_binding_t" minOccurs="0"/>
                    <xs:element name="JB_ja_task_concurrency" type="xs:unsignedInt"/>
	</xs:sequence>
</xs:complexType>

    <!-- =========== Data Types ============ -->
    
    <!-- JB_mail_t -->
    <xs:complexType name="JB_mail_t">
	<xs:sequence>
		<xs:element name="element" type="JB_mail_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_mail_element_t">
	<xs:sequence>
		<xs:element name="MR_user" type="xs:string"/>
		<xs:element name="MR_host" type="xs:string"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_stdout_path_list_t -->
    <xs:complexType name="JB_stdout_path_list_t">
	<xs:sequence >
		<xs:element name="path_list" type="JB_stdout_path_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_stdout_path_list_element_t">
	<xs:sequence >
		<xs:element name="PN_path" type="xs:string"/>
		<xs:element name="PN_host" type="xs:string"/>
		<xs:element name="PN_file_host" type="xs:string" minOccurs="0" />
		<xs:element name="PN_file_staging" type="xs:boolean"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_hard_resource_list_t -->
    <xs:complexType name="JB_hard_resource_list_t">
	<xs:sequence>
		<xs:element name="qstat_l_requests" type="JB_hard_resource_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_hard_resource_list_element_t">
	<xs:sequence>
		<xs:element name="CE_name" type="xs:string"/>
                    <xs:element name="CE_valtype" type="xs:unsignedInt"/>
                    <xs:element name="CE_stringval" type="xs:unsignedInt"/>
                    <xs:element name="CE_doubleval" type="xs:double"/>
                    <xs:element name="CE_relop" type="xs:unsignedInt"/>
                    <xs:element name="CE_consumable" type="xs:boolean"/>
                    <xs:element name="CE_dominant" type="xs:unsignedInt"/>
                    <xs:element name="CE_pj_doubleval" type="xs:double"/>
                    <xs:element name="CE_pj_dominant" type="xs:unsignedInt"/>
                    <xs:element name="CE_requestable" type="xs:unsignedInt"/>
                    <xs:element name="CE_tagged" type="xs:unsignedInt"/>                        
            </xs:sequence>
</xs:complexType>
    <!-- JB_soft_resource_list_t -->
    <xs:complexType name="JB_soft_resource_list_t">
	<xs:sequence>
		<xs:element name="qstat_l_requests" type="JB_soft_resource_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_soft_resource_list_element_t">
	<xs:sequence>
		<xs:element name="CE_name" type="xs:string"/>
                    <xs:element name="CE_valtype" type="xs:unsignedInt"/>
                    <xs:element name="CE_stringval" type="xs:unsignedInt"/>
                    <xs:element name="CE_doubleval" type="xs:double"/>
                    <xs:element name="CE_relop" type="xs:unsignedInt"/>
                    <xs:element name="CE_consumable" type="xs:boolean"/>
                    <xs:element name="CE_dominant" type="xs:unsignedInt"/>
                    <xs:element name="CE_pj_doubleval" type="xs:double"/>
                    <xs:element name="CE_pj_dominant" type="xs:unsignedInt"/>
                    <xs:element name="CE_requestable" type="xs:unsignedInt"/>
                    <xs:element name="CE_tagged" type="xs:unsignedInt"/>                        
            </xs:sequence>
</xs:complexType>
    <!-- JB_hard_queue_list_t -->
    <xs:complexType name="JB_hard_queue_list_t">
	<xs:sequence>
		<xs:element name="destin_ident_list" type="JB_hard_queue_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_hard_queue_list_element_t">
	<xs:sequence>
		<xs:element name="QR_name" type="xs:string"/>                        
            </xs:sequence>
</xs:complexType>
    <!-- JB_soft_queue_list_t -->
    <xs:complexType name="JB_soft_queue_list_t">
	<xs:sequence>
		<xs:element name="destin_ident_list" type="JB_soft_queue_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_soft_queue_list_element_t">
	<xs:sequence>
		<xs:element name="QR_name" type="xs:string"/>                        
            </xs:sequence>
</xs:complexType>
    <!-- JB_shell_list_t -->
    <xs:complexType name="JB_shell_list_t">
	<xs:sequence >
		<xs:element name="path_list" type="JB_shell_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_shell_list_element_t">
	<xs:sequence >
		<xs:element name="PN_path" type="xs:string"/>
		<xs:element name="PN_host" type="xs:string"/>
		<xs:element name="PN_file_host" type="xs:string"/>
		<xs:element name="PN_file_staging" type="xs:boolean"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_env_list_t -->
    <xs:complexType name="JB_env_list_t">
	<xs:sequence>
		<xs:element name="element" type="JB_env_element_t"/>
	</xs:sequence>
</xs:complexType>
<xs:complexType name="JB_env_element_t">
	<xs:sequence>
		<xs:element name="VA_variable" type="xs:string"/>
		<xs:element name="VA_value" type="xs:string"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_job_args_t -->
    <xs:complexType name="JB_job_args_t">
	<xs:sequence>
		<xs:element name="element" type="JB_job_args_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_job_args_element_t">
	<xs:sequence>
		<xs:element name="ST_name" type="xs:string"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_ja_task_t -->
    <xs:complexType name="JB_ja_task_t">
	<xs:sequence>
		<xs:element name="ulong_sublist" type="JB_ja_task_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_ja_task_element_t">
	<xs:sequence>
		<xs:element name="JAT_status" type="xs:unsignedInt"/>
                    <xs:element name="JAT_task_number" type="xs:unsignedInt"/>
                    <xs:element name="JAT_scaled_usage_list" type="JAT_scaled_usage_list_t" minOccurs="0"/>
            </xs:sequence>
</xs:complexType>
    <xs:complexType name="JAT_scaled_usage_list_t">
	<xs:sequence>
		<xs:element name="element" type="JAT_scaled_usage_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JAT_scaled_usage_list_element_t">
	<xs:sequence>
		<xs:element name="UA_name" type="xs:string"/>
		<xs:element name="UA_value" type="xs:float"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_context_t -->
    <xs:complexType name="JB_context_t">
	<xs:sequence >
		<xs:element name="context_list" type="JB_context_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_context_element_t">
	<xs:sequence>
		<xs:element name="VA_variable" type="xs:string"/>
		<xs:element name="VA_value" type="xs:string"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_stderr_path_list_t -->
    <xs:complexType name="JB_stderr_path_list_t">
	<xs:sequence >
		<xs:element name="path_list" type="JB_stderr_path_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_stderr_path_list_element_t">
	<xs:sequence >
		<xs:element name="PN_path" type="xs:string"/>
		<xs:element name="PN_host" type="xs:string"/>
		<xs:element name="PN_file_host" type="xs:string" minOccurs="0" />
		<xs:element name="PN_file_staging" type="xs:boolean"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_jid_predecessor_list_t -->
    <xs:complexType name="JB_jid_predecessor_list_t">
	<xs:sequence >
		<xs:element name="job_predecessors" type="JB_jid_predecessor_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_jid_predecessor_list_element_t">
	<xs:sequence>
		<xs:element name="JRE_job_number" type="xs:unsignedInt"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_jid_successor_list_t -->
    <xs:complexType name="JB_jid_successor_list_t">
	<xs:sequence >
		<xs:element name="ulong_sublist" type="JB_jid_successor_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_jid_successor_list_element_t">
	<xs:sequence>
		<xs:element name="JRE_job_number" type="xs:unsignedInt"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_stdin_path_list_t -->
    <xs:complexType name="JB_stdin_path_list_t">
	<xs:sequence>
		<xs:element name="path_list" type="JB_stdin_path_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_stdin_path_list_element_t">
	<xs:sequence >
		<xs:element name="PN_path" type="xs:string"/>
		<xs:element name="PN_host" type="xs:string"/>
		<xs:element name="PN_file_host" type="xs:string" minOccurs="0" />
		<xs:element name="PN_file_staging" type="xs:boolean"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_master_hard_queue_list_t -->
    <xs:complexType name="JB_master_hard_queue_list_t">
	<xs:sequence>
		<xs:element name="destin_ident_list" type="JB_master_hard_queue_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_master_hard_queue_list_element_t">
	<xs:sequence>
		<xs:element name="QR_name" type="xs:string"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_pe_range_t -->
    <xs:complexType name="JB_pe_range_t">
	<xs:sequence>
		<xs:element name="ranges" type="JB_pe_range_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_pe_range_element_t">
	<xs:sequence >
		<xs:element name="RN_min" type="xs:unsignedInt"/>
		<xs:element name="RN_max" type="xs:unsignedInt"/>
		<xs:element name="RN_step" type="xs:unsignedInt"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_jid_request_list_t -->
    <xs:complexType name="JB_jid_request_list_t">
	<xs:sequence >
		<xs:element name="element" type="JB_jid_request_list_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_jid_request_list_element_t">
	<xs:sequence>
		<xs:element name="JRE_job_number" type="xs:unsignedInt"/>
		<xs:element name="JRE_job_name" type="xs:string"/>
	</xs:sequence>
</xs:complexType>
    <!-- JB_ja_structure_t -->
    <xs:complexType name="JB_ja_structure_t">
	<xs:sequence>
		<xs:element name="task_id_range" type="JB_ja_structure_element_t"/>
	</xs:sequence>
</xs:complexType>
    <xs:complexType name="JB_ja_structure_element_t">
	<xs:sequence >
		<xs:element name="RN_min" type="xs:unsignedInt"/>
		<xs:element name="RN_max" type="xs:unsignedInt"/>
		<xs:element name="RN_step" type="xs:unsignedInt"/>
	</xs:sequence>
</xs:complexType>
     <!-- JB_binding -->
     <xs:complexType name="JB_binding_t">
        <xs:sequence>
           <xs:element name="binding" type="BT_binding_element_t"/>
        </xs:sequence>
     </xs:complexType>
     <xs:complexType name="BT_binding_element_t">
        <xs:sequence >
           <xs:element name="BN_strategy" type="xs:string"/>
  <!-- fixme:  should be uint or string "slots" -->
           <xs:element name="BN_parameter_n" type="xs:unsignedInt"/>
           <xs:element name="BN_parameter_socket_offset" type="xs:unsignedInt"/>
           <xs:element name="BN_parameter_core_offset" type="xs:unsignedInt"/>
           <xs:element name="BN_parameter_striding_first_core" type="xs:unsignedInt"/>
           <xs:element name="BN_parameter_striding_last_core" type="xs:unsignedInt"/>
           <xs:element name="BN_parameter_striding_step_size" type="xs:unsignedInt"/>
        </xs:sequence>
     </xs:complexType>
    <!-- message_t -->	
<xs:complexType name="message_t">
     <xs:annotation>
        <xs:documentation>qstat -j output</xs:documentation>
     </xs:annotation>
     <xs:sequence>
        <xs:element name="qmaster_response">
           <xs:complexType>
              <xs:sequence>
                 <xs:element name="SME_global_message_list" type="gmessage_list_t" minOccurs="0" maxOccurs="1"/>
              </xs:sequence>
           </xs:complexType> 
        </xs:element>
     </xs:sequence>
  </xs:complexType>   

<xs:complexType name="gmessage_list_t">
     <xs:sequence>
        <xs:element name="element" minOccurs="0" maxOccurs="unbounded">
           <xs:complexType>
              <xs:sequence>
                 <xs:element name="MES_message_number" type="xs:positiveInteger"/>
                 <xs:element name="MES_message" type="xs:string"/>
              </xs:sequence>
           </xs:complexType>
        </xs:element>
     </xs:sequence>
  </xs:complexType>
</xs:schema>
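
For anyone following along, here is a minimal sketch of reading the global messages out of XML shaped like the message_t and gmessage_list_t types at the end of the schema. It is written in Python with the standard library only and is not the parser OOD uses. The `messages` root element and the sample message texts are assumptions made up for illustration; only the element names inside (`qmaster_response`, `SME_global_message_list`, `element`, `MES_message_number`, `MES_message`) come from the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root name and message texts are invented,
# the inner element names follow message_t / gmessage_list_t.
SAMPLE = """\
<messages>
  <qmaster_response>
    <SME_global_message_list>
      <element>
        <MES_message_number>1</MES_message_number>
        <MES_message>first example message</MES_message>
      </element>
      <element>
        <MES_message_number>2</MES_message_number>
        <MES_message>second example message</MES_message>
      </element>
    </SME_global_message_list>
  </qmaster_response>
</messages>
"""


def global_messages(xml_text):
    """Return a list of (number, text) pairs from SME_global_message_list.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the input
    is not well-formed XML, which is a cheap first check on qstat output.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iterfind(".//SME_global_message_list/element"):
        number = int(el.findtext("MES_message_number"))
        text = el.findtext("MES_message", default="")
        pairs.append((number, text))
    return pairs


if __name__ == "__main__":
    for number, text in global_messages(SAMPLE):
        print(number, text)
```

Running the same function on real qstat output would at least tell you whether the XML is well-formed all the way through, since a parse error is raised at the first broken spot.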

@jms27000 Thanks for the information.

I’m sorry, but redefining columns in Active Jobs isn’t available. Those columns are hard-coded in the view HTML and in the JavaScript that updates it. I’ve created a ticket to allow this sort of functionality, though I can’t say when it’ll be available. The filter.rb you were looking at provides additional menu dropdowns to filter which rows are shown in the table, like how we filter on your primary group so you can see your group’s jobs. That doesn’t affect the columns, just the rows (jobs) returned.

Hi, Jeff

Do you think you will be able to fix the incomplete Active Jobs view for SGE-like schedulers?
Is there anything I can do to help?

jean-marie

Thanks for the offer! I’ve got the ticket open and it should be applicable to all schedulers. At this point it’s just a matter of finding the time, then designing and coding it. Feel free to submit a PR if you’re able!

Looking at the code, it looks like our JSON API has all this data. But while trying to figure out the right API call to make, I found that it seems buggy. For you the call would be /pun/sys/activejobs/json?cluster=SGC_HPC, but again, what’s returned is not what you’d expect. It looks like this topic could generate both a feature request and a bug report.

Hi, @jeff.ohrstrom,
Do you have any news on this topic? We are close to putting OOD into production, and the incomplete “Active Jobs” view with SSGE remains a blocking issue for us.

thanks a lot

jean-marie

Hi, everybody

@jeff.ohrstrom: which SGE provider and version did you use as the reference for the qstat.xsd model implemented in OOD? With that answer, I will be able to compare your reference format to the SSGE 8.1.6/9 one. If there are differences, I will try to work around them with a wrapper.

thanks in advance.

jean-marie
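
A comparison like the one described above could be sketched as follows. This is only an illustration in Python with the standard library: it lists the named element declarations per complex type in two XSD documents and reports what is missing, extra, or declared with a different type. The two inline schemas are hypothetical stand-ins, not the real OOD reference or the real SSGE qstat.xsd, and the function names `declared_elements` and `schema_diff` are made up for this example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical stand-ins for the two qstat.xsd files being compared.
REFERENCE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="JB_pe_range_element_t">
    <xs:sequence>
      <xs:element name="RN_min" type="xs:unsignedInt"/>
      <xs:element name="RN_max" type="xs:unsignedInt"/>
      <xs:element name="RN_step" type="xs:unsignedInt"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

CANDIDATE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="JB_pe_range_element_t">
    <xs:sequence>
      <xs:element name="RN_min" type="xs:unsignedInt"/>
      <xs:element name="RN_max" type="xs:string"/>
      <xs:element name="RN_extra" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def declared_elements(xsd_text):
    """Map (complexType name, element name) to the declared type."""
    root = ET.fromstring(xsd_text)
    found = {}
    for ctype in root.iter(XS + "complexType"):
        owner = ctype.get("name")
        if owner is None:
            continue  # anonymous inline types are covered by their named parent
        for el in ctype.iter(XS + "element"):
            if el.get("name") is not None:
                found[(owner, el.get("name"))] = el.get("type")
    return found


def schema_diff(reference, candidate):
    """Report declarations that are missing, extra, or retyped in candidate."""
    ref = declared_elements(reference)
    cand = declared_elements(candidate)
    return {
        "missing": sorted(ref.keys() - cand.keys()),
        "extra": sorted(cand.keys() - ref.keys()),
        "retyped": sorted(k for k in ref.keys() & cand.keys() if ref[k] != cand[k]),
    }


if __name__ == "__main__":
    print(schema_diff(REFERENCE_XSD, CANDIDATE_XSD))
```

A diff like this only compares the schema declarations, so it would still be worth checking actual qstat output from both sides before writing a wrapper.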

OK, I think I see what the issue is. First, in the view, there’s no way to add the columns for CPU just yet. I’ve opened that feature ticket, but have no idea when/if it may come.

I’m also now seeing that the JSON API isn’t broken, it just isn’t what I expected (it’s HTML within JSON). What it returns is the expanded HTML panel where you can see CPUs, the panel that is shown when the arrow on the left-hand side is toggled.

I see you don’t have this toggle, and that’s an issue indeed. It looks like SGE is not enabled to get this extended view.

I’ve opened another ticket to have this feature enabled for SGE.

Here are the output examples we test against. I can’t tell what version it is. Though again, this is an issue with the view in active jobs, not necessarily the way we’re interacting with qsub/qstat.

Thanks for the news, @jeff.ohrstrom.

On the first point, my idea was to add nb_cores to the Active Jobs view, because it’s a scheduler parameter and it’s useful to sort the active jobs on it. This point is not mandatory for us.

The mandatory point is in fact the ability to kill jobs in the extended view with an SSGE scheduler. I hope you’ll find a quick answer on this topic.

Looking forward to your good news.

jean-marie

This feature to delete jobs in active jobs was added in 1.7 which will be released in the next few weeks.

Active Jobs now shows a button to delete a job if it’s your job and it’s not complete, regardless of the scheduler used.

I know there are other features you’ve requested, configurable columns and the SGE extended panel. I have tickets open for both, though I don’t know when they’ll get put in.

Hi, Jeff. Thanks a lot to the team. I’m looking forward to this new OOD release :grinning: