

Useful hpc6 commands


bjobs
Displays a user's running and waiting jobs.

bjobs -w
Wide output giving the full names of the nodes and jobs. This can, for example, be piped to sort (bjobs -w | sort) to list the jobs in order of submission time. An alias can be created with alias myqw="bjobs -w | sort" so that the sorted output is shown by entering myqw.

bjobs -l
Full information about the jobs. Options such as -l will work for many of the commands in the LSF queuing system.

bjobs -u all
Displays jobs on the cluster for all users.

bkill JOBID
Removes your job with number JOBID from the queue.

btop JOBID
Moves a job to the top of your queued jobs.

bhosts
Shows the status of all the nodes in the cluster.

lsload -l
Shows the load on each node in the cluster, including how much memory and scratch space (tmp2) is free.

bslots -l
Displays the available slots that a job can backfill into, depending on its runtime.

bhist -C DAY1,DAY2
Shows all jobs that finished between DAY1 and DAY2 of the current month.

bqueues -l
Shows full information about the queues, including each user's current priority in the fairshare system.
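
For example (the job ID here is made up), a typical session might combine several of these commands:

    bjobs -w | sort      # list your jobs in submission order
    bjobs -l 12345       # full details of job 12345
    btop 12345           # move it to the top of your pending jobs
    bkill 12345          # or remove it from the queue
    lsload -l | less     # page through free memory and scratch space on each node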



Acknowledgements: Thanks to Dr S. Erhardt for producing a list of useful hpc6 commands on which this was based.

Scripts



The scripts used to submit jobs can be listed with ls /usr/local/bin. Running a script without a filename should print the arguments it requires. An optional final argument gives the required scratch space in Gigabytes; if it is omitted, a default estimate of 30 Gigabytes is used.
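
As a rough illustration only (the script and input file names below are invented; run the actual script without arguments to see its usage), a submission might look like:

    ls /usr/local/bin            # list the available submit scripts
    submitscript                 # hypothetical script name: prints the arguments it needs
    submitscript input.com 50    # hypothetical: submit input.com requesting 50 Gigabytes of scratch space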

Queues



There are 53 nodes, each with 12 processors. Four of these nodes have around 1000 Gigabytes of disk space and 96 Gigabytes of memory, while the rest have 500 Gigabytes of disk space and 48 Gigabytes of memory. Given the requirements specified in the submit script, the queuing system will attempt to start the job on the most appropriate node as quickly as possible.

There are two active queues: normalq and shortq. Jobs with a runtime estimate of 24 hours or less will automatically run in the shortq. Longer jobs run in the normalq, where there is currently a limit of 216 processors per user in total. One node (compute-00-29) only accepts jobs from the shortq, and the GPU node is currently reserved for Gromacs calculations. The system will automatically delete jobs that exceed their runtime estimate.
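
If a job were submitted directly with bsub rather than through the local scripts (an assumption; the scripts described above are the normal route), a request for a whole node in the shortq might look roughly like the sketch below, where the job name, command and memory figure are placeholders:

    bsub -q shortq -n 12 -W 24:00 -R "rusage[mem=4000]" -J myjob ./run_calculation

Here -n asks for 12 slots (one node), -W gives a 24-hour runtime limit to stay within the shortq boundary, and -R reserves memory, with the value a placeholder whose exact meaning depends on the site's LSF configuration.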

The queues are set up to use fairshare. Users have a priority given by

100 / ((historic_run_time + run_time) * 2 + (1 + job_slots) * 3).

Here one hour of historic run time decays to 0.1 hours after a week.
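
As an illustrative calculation with invented figures: a user with 2 hours of (decayed) historic run time, 1 hour of current run time and 12 occupied job slots would have a priority of

    100 / ((2 + 1) * 2 + (1 + 12) * 3) = 100 / (6 + 39) = 100 / 45, roughly 2.2,

so the priority falls as a user accumulates run time and occupies more slots.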