DUG Insight User Manual

Managing Jobs via the Cluster Monitor

The Cluster Monitor window displays information about cluster jobs and lets you modify them.

1. Job table

The Job table summarizes the jobs currently on the cluster using the following columns:

Prio: Current job priority.
R: Number of tasks that are running. A (*) indicates that the job is I/O intensive.
Q: Number of tasks that are queued.
D: Number of tasks that are dependent and waiting for other tasks to complete.
H: Number of tasks that are on hold.
IO: Number of tasks that are queued for I/O-related reasons. A (*) in this column indicates that the job is I/O intensive but is queued for other reasons.
E: Number of tasks that ended because of an error.

These statuses also appear in the Status column of the Task details table.

Note: I/O (input/output, i.e. reading from and writing to disk) is a scarce resource shared across the file systems. Jobs that place a particularly heavy load on network disks are marked as I/O intensive, and only a limited number of them can run concurrently.
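To illustrate how the Job table columns relate to task statuses, the sketch below models a single Job table row. It is a hypothetical Python model and not Insight code. The `Task` class, the status codes, and the `io_limit` cap in `start_or_wait` are assumptions made for the example, and the real scheduler's rules may differ.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical status codes matching the Job table columns (not Insight's API).
STATUS_COLUMNS = ("R", "Q", "D", "H", "IO", "E")


@dataclass
class Task:
    task_id: int
    status: str  # one of STATUS_COLUMNS


def job_row(tasks):
    """Count one job's tasks per status, as the Job table columns do."""
    counts = Counter(t.status for t in tasks)
    return {col: counts.get(col, 0) for col in STATUS_COLUMNS}


def start_or_wait(io_intensive, running_io_jobs, io_limit):
    """Assumed gate: an I/O-intensive task waits in IO once the cap is reached."""
    if io_intensive and running_io_jobs >= io_limit:
        return "IO"
    return "R"


tasks = [Task(1, "R"), Task(2, "R"), Task(3, "Q"), Task(4, "E")]
print(job_row(tasks))  # {'R': 2, 'Q': 1, 'D': 0, 'H': 0, 'IO': 0, 'E': 1}
print(start_or_wait(io_intensive=True, running_io_jobs=4, io_limit=4))  # IO
```

In this model, only I/O-intensive tasks are held back by the cap. Other tasks start as usual.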

2. Task details table

The Task details table lists the tasks of the jobs selected in the Job table, along with their status.

Note: Click the double-arrow icon beside Task details to display more information on the selected task.

The Task details table displays the following details:

Job ID
Task ID
Name
Status
Priority
Partition
Run Time

3. Job Modification Options

Use the following options to make simple changes to jobs on the cluster. If you select specific tasks, the modification applies only to those tasks.

Set Priority: Set priority for Queueing or Dependent jobs.
Set Partitions: Set partitions for non-running jobs.
Hold: Place jobs in Held status. Held jobs will not run until they are released. Only applicable to Queueing jobs.

WARNING: The Requeue Hold, Requeue, Release, and Cancel Jobs options discard any work that the affected tasks have already completed.

Requeue Hold: Stops a running job and places it in Held status. Any work already done is lost. Only applicable to Running jobs.
Requeue: Stops a running job and immediately re-queues it. Any work already done is lost. Only applicable to Running jobs.
Release: Return jobs in Error or Held status to Queueing status.
Cancel Jobs: Delete selected jobs from the cluster. Any work already done is lost.
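The status rules above can be summarized as a table of accepted statuses and outcomes. The sketch below is a hypothetical Python model and is not part of Insight. The `Status` enum, the `RULES` table, and `apply_action` are illustrative names. It also assumes that a selected task whose status an option does not accept is simply left unchanged. The model shows which statuses each option accepts, the status a task ends up in, and which options discard completed work according to the warning above.

```python
from enum import Enum


class Status(Enum):
    QUEUEING = "Queueing"
    DEPENDENT = "Dependent"
    RUNNING = "Running"
    HELD = "Held"
    ERROR = "Error"


ALL = frozenset(Status)

# Hypothetical rule table: action -> (accepted statuses, new status, discards work).
# A new status of None leaves the status unchanged.
RULES = {
    "Set Priority": (frozenset({Status.QUEUEING, Status.DEPENDENT}), None, False),
    "Set Partitions": (ALL - {Status.RUNNING}, None, False),
    "Hold": (frozenset({Status.QUEUEING}), Status.HELD, False),
    "Requeue Hold": (frozenset({Status.RUNNING}), Status.HELD, True),
    "Requeue": (frozenset({Status.RUNNING}), Status.QUEUEING, True),
    "Release": (frozenset({Status.ERROR, Status.HELD}), Status.QUEUEING, True),
    "Cancel Jobs": (ALL, None, True),
}


def apply_action(action, selected):
    """Apply an action to the selected tasks, given as {task_id: Status}.

    Returns (statuses, skipped, lost): tasks whose status the action does not
    accept are assumed to be left unchanged and are listed in `skipped`;
    `lost` lists tasks whose completed work is discarded.
    """
    accepted, new_status, discards_work = RULES[action]
    statuses, skipped, lost = {}, [], []
    for task_id, status in selected.items():
        if status not in accepted:
            statuses[task_id] = status
            skipped.append(task_id)
            continue
        if discards_work:
            lost.append(task_id)
        if action == "Cancel Jobs":
            continue  # cancelled tasks are deleted, so they get no status
        statuses[task_id] = new_status if new_status is not None else status
    return statuses, skipped, lost


selected = {1: Status.RUNNING, 2: Status.QUEUEING, 3: Status.HELD}
statuses, skipped, lost = apply_action("Requeue", selected)
print(skipped, lost)  # [2, 3] [1]
```

Only task 1 is running, so in this model Requeue affects only that task and discards its work.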

4. Logs Panel

The Logs panel shows log entries from the selected tasks. Entries from all selected jobs are merged into a single list sorted by time.

Optionally, use the following filters to refine the logs:

Error: Show only the errors that the tasks encountered.
Warning: Show only the warning messages generated while the tasks ran.
Search bar: Find log entries that match the search terms or phrases.
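The merge-and-filter behavior of the Logs panel can be pictured as follows. This is a minimal sketch under stated assumptions, not Insight's implementation. It assumes each task's log is already in time order, represents entries with a hypothetical `LogEntry` tuple, and treats the search bar as a case-insensitive substring match. Python's `heapq.merge` combines the already-sorted per-task logs without re-sorting everything.

```python
import heapq
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: float  # assumed: seconds since the epoch
    task_id: int
    level: str        # hypothetical levels: "INFO", "WARNING", "ERROR"
    message: str


def merged_view(task_logs: Iterable[List[LogEntry]],
                level: Optional[str] = None,
                search: str = "") -> List[LogEntry]:
    """Merge per-task logs (each already time-ordered) into one time-sorted
    list, keeping only entries that match the level filter and search text."""
    merged = heapq.merge(*task_logs, key=lambda e: e.timestamp)
    needle = search.lower()
    return [e for e in merged
            if (level is None or e.level == level)
            and needle in e.message.lower()]


task_a = [LogEntry(1.0, 1, "INFO", "started"), LogEntry(5.0, 1, "ERROR", "disk full")]
task_b = [LogEntry(2.0, 2, "WARNING", "slow read"), LogEntry(3.0, 2, "INFO", "done")]
print([e.timestamp for e in merged_view([task_a, task_b])])  # [1.0, 2.0, 3.0, 5.0]
print(merged_view([task_a, task_b], level="ERROR")[0].message)  # disk full
```

Applying both filters together, as in `level="WARNING", search="read"`, narrows the view to entries that satisfy both conditions.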

To report a problem, send the logs to DUG as follows:

Select Send Logs to DUG. A log reporting window opens.

Select the Tell DUG about this problem check box to send this problem report to us.
- Select the Send information about my session check box to send us the logs for the session you are running.
- If you would like to be contacted, select the Allow DUG to contact me about this problem check box and enter your email address. We will respond to your report as soon as we can.
- Enter any other relevant information in the text box. Additional comments and details help us resolve your problem faster.
Click OK.