Tutorial_old

Gives a brief introduction on how to use resources in the HPC environment.

Tutorial

LIONS2 uses the Sun Grid Engine (SGE) batch queuing software on both the Solaris and Red Hat Linux systems in its HPC environment. One of the main advantages of SGE is that all LIONS2 clients are capable of participating in the LIONS2 SGE grid. This means you do not have to log into a sol-login box to prepare an SGE job; you can also prepare it on your own Solaris workstation and submit it from there.

[NOTE] To help with the transition from the Platform LSF product used in the original LIONS environment, notes like this one compare the former LSF command with the equivalent SGE command.

SGE allows transparent use of the OpenAFS filesystem, which is used extensively in LIONS. To do so, SGE makes a copy of your Kerberos tickets and uses the copy for authentication on the machine where your job runs. If you set up a 'renewable ticket', you can submit a job that runs for up to 30 days.

[NOTE] LSF under the original LIONS did something similar, but it required you to enter your password when a job was submitted. SGE uses the actual Kerberos tickets to do the same thing, so no password is required.

To set up a renewable ticket for SGE, issue the kinit command at the shell prompt with the '-r' switch, followed by the number of days to renew your tickets, like so:

kinit -r 30d

This example creates a 30-day renewable ticket. Please see the kinit manual page for more details.

To make sure you've got your renewable ticket, issue the klist command. You should then see something like the following:

Ticket cache: FILE:/tmp/krb5cc_21
Default principal: tony@LIONS.ODU.EDU

Valid starting     Expires            Service principal
12/20/04 17:22:10  12/21/04 03:22:10  krbtgt/LIONS.ODU.EDU@LIONS.ODU.EDU
        renew until 01/19/05 17:22:08
12/20/04 17:22:11  12/21/04 03:22:10  afs@LIONS.ODU.EDU
        renew until 12/20/04 17:22:11


Kerberos 4 ticket cache: /tmp/tkt21
klist: You have no tickets cached

Note that SGE is set up so that if you submit a job or a command without renewable tickets, you will be warned with the following message:

get_cred stderr: WARNING: non-renewable tickets - you may want to resubmit with 'kinit -r'
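
The check described above can also be made before submitting. The following is a minimal sketch in Bourne shell, not part of the LIONS2 setup: the function name has_renewable_tgt is hypothetical, and it assumes klist prints a "renew until" line directly under the krbtgt entry, as in the sample output shown earlier.

```shell
#!/bin/sh
# has_renewable_tgt (hypothetical helper): read klist output on stdin and
# succeed (exit status 0) only if the krbtgt entry is immediately followed
# by a "renew until" line, as in the sample klist output in this tutorial.
has_renewable_tgt() {
    awk '
        /krbtgt\//                { seen_tgt = 1; next }  # remember the TGT line
        seen_tgt && /renew until/ { found = 1 }           # renewable lifetime follows it
                                  { seen_tgt = 0 }        # any other line resets the flag
        END                       { exit(found ? 0 : 1) }
    '
}

# Possible use on a LIONS2 client (not executed here):
#   klist | has_renewable_tgt || kinit -r 30d
```

The helper only inspects text, so it works on saved klist output as well as on a live pipe.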

Submitting a Job to SGE

[NOTE] Unlike LSF, SGE was designed to let the user submit jobs in a number of ways. The two main ways that LIONS has enabled are as follows:

  • qsub - to submit a batch job for background execution
  • qrsh - to submit a job to run in the foreground, i.e., interactively.

Please note that the SGE 'qrsh' command is equivalent to the LSF 'bsub -I' command.

To run an SGE job, you usually first prepare an SGE job script file containing the SGE keyword statements, shell commands, etc. needed to run the job. The following is an example of such a job script. It prints the date, runs the 'sleep' command for 10 seconds, and prints the date again:

#!/bin/csh
#
#$ -cwd
#$ -j y
#$ -S /bin/csh
#
date
sleep 10
date

Please note that SGE command options are written as the shell script comment character (#) followed by the dollar sign ($), followed by a keyword and its value, if any. In this example, the SGE command options are as follows:

  • -cwd means to execute the job from the current working directory.
  • -j y means to merge the standard error stream into the standard output stream instead of having two separate error and output streams.
  • -S /bin/csh specifies the interpreting shell for this job to be the C Shell.
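
To see which options a job script embeds, the "#$" lines can be pulled out before submitting. The following is a small sketch in Bourne shell; the function name list_sge_options is hypothetical and is not an SGE command.

```shell
#!/bin/sh
# list_sge_options (hypothetical helper): print the SGE options embedded in
# a job script, one per line, by selecting lines that begin with "#$" and
# stripping that prefix. Plain comments ("#") and the "#!" line are ignored.
list_sge_options() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Possible use (not executed here):
#   list_sge_options sleep.sh
```

Run against the example script above, this would list -cwd, -j y and -S /bin/csh.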

[NOTE] By default, the LSF 'bsub' command could take either a script or a binary file as its argument. SGE, by contrast, was originally written to handle job scripts only. A recent change in the product lets you specify the '-b y' switch to indicate that the job being submitted is a binary program rather than a job script. SGE still does not handle binary files particularly well, so we recommend that you write a job script as documented here when you use the 'qsub' command. (The 'qsub' command defaults to '-b n'.) The 'qrsh' command, on the other hand, assumes '-b y' by default. Please see the 'qsub' and 'qrsh' manual pages for more information.

In addition to the SGE command options, the job script file will usually contain shell commands to initialize the environment for the job, invoke the program for the job and perform any post-job processing needed.

Note: While you can write job scripts in either Bourne Shell (sh) or Korn Shell (ksh), we recommend writing new scripts in 'csh', because it is similar to your default interactive shell, 'tcsh'. Csh scripts must be used when submitting batch jobs to the GNU/Linux environment.


How to Submit a Job (qsub)

The qsub command is used to submit jobs to SGE. Once you have prepared a job script, you submit it with qsub. For example, if the job script shown earlier is saved as "sleep.sh", you would submit it to the SGE queue 'normal' by entering:

qsub -q normal sleep.sh

SGE will respond to you with something like:

Your job 16 ("sleep.sh") has been submitted

where 16 is the job number (SGE job identifier) generated by qsub.
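
If the job number is needed later, for example for qdel or qhold, it can be captured from the reply. The following is a minimal sketch in Bourne shell; the function name job_id_from_qsub is hypothetical, and it assumes the reply has exactly the form shown above.

```shell
#!/bin/sh
# job_id_from_qsub (hypothetical helper): read qsub's reply on stdin and
# print the job number. Assumes the reply looks like:
#   Your job 16 ("sleep.sh") has been submitted
# Lines that do not match this form produce no output.
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Possible use on a LIONS2 client (not executed here):
#   jobid=`qsub -q normal sleep.sh | job_id_from_qsub`
```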

The next example is a job script named "gaussian.sh" that runs the application Gaussian 98 with the file testDFT.com as input. It assumes that testDFT.com and "gaussian.sh" are both in the directory from which you submit the job.

#!/bin/csh
#
#$ -cwd
#$ -j y
#
g98 < testDFT.com

For more information on the qsub command, enter man qsub when logged onto any LIONS2 client.
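
When several Gaussian inputs need the same job script, the script can be generated rather than edited by hand. The following is only a sketch in Bourne shell: the function name make_g98_job is hypothetical, and it simply reproduces the "gaussian.sh" layout shown above for a given input file.

```shell
#!/bin/sh
# make_g98_job (hypothetical helper): print a csh job script that runs g98
# on the input file named by the first argument. The layout copies the
# gaussian.sh example in this tutorial. Assumes the file name contains no
# whitespace or shell metacharacters.
make_g98_job() {
    input=$1
    cat <<EOF
#!/bin/csh
#
#\$ -cwd
#\$ -j y
#
g98 < $input
EOF
}

# Possible use (not executed here):
#   make_g98_job testDFT.com > gaussian.sh
```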

How to Determine the Status of a Job (qstat)

The qstat command displays information about the SGE queues and the jobs in the queues, either running or waiting to run. The command:

qstat

shows you information about your jobs in the queues. These jobs can either be running or waiting to run.

qstat -f

shows you information about the SGE queues and all jobs in the queues.

For more information on the qstat command, enter man qstat when logged onto any LIONS2 client.
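
For a quick summary of how many of your jobs are running or waiting, the qstat listing can be tallied. The following is a sketch in Bourne shell under assumptions that this tutorial does not confirm: that each job line of the plain qstat listing starts with the numeric job ID and carries the job state in the fifth column. Check the layout on your own client before relying on it. The function name count_jobs_by_state is hypothetical.

```shell
#!/bin/sh
# count_jobs_by_state (hypothetical helper): read a qstat listing on stdin
# and print each job state with the number of jobs in that state.
# ASSUMPTION: job lines begin with a numeric job ID and the state is the
# fifth whitespace-separated field; header lines are skipped because their
# first field is not a number.
count_jobs_by_state() {
    awk '
        $1 ~ /^[0-9]+$/ { count[$5]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Possible use on a LIONS2 client (not executed here):
#   qstat | count_jobs_by_state
```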

How to Cancel a Job (qdel)

The qdel command is used to cancel an SGE job, either while it is waiting to execute or while it is running.

qdel jobid

Cancels a job by its SGE job identifier (see the qsub command). The qstat command displays the job identifiers for your jobs.

For more information on the qdel command, enter man qdel when logged onto any LIONS2 client.

How to Change the Status of a Job (qhold, qrls)

The qhold and qrls commands are used to place a "hold" on an SGE job and to release a previous "hold" on an SGE job.

qhold jobid

Places a "user hold" on a job by its SGE job identifier (see the qsub command). The qstat command displays the job identifiers for your jobs. A "user hold" causes SGE to skip over the job when it is considering the next job to be scheduled for execution.

qrls jobid

Releases a previous "user hold" on a job by its SGE job identifier. The job then becomes eligible for selection when SGE is considering the next job to be scheduled for execution.

For more information on the qhold and qrls commands, enter man qhold or man qrls when logged onto any LIONS2 client.

Other SGE Commands

In addition to the commands described above, SGE provides the following additional commands. Each one has a man page describing it.

  • qalter - changes the characteristics of an SGE job that is waiting to run.
  • qconf - displays, adds, changes, or deletes SGE system parameters. General users can only display information about SGE.
  • qconf -sql - displays the available SGE queues.
  • qmon - an X Window System interface for SGE.