0.0 AN INTRODUCTION TO THE LIONS2 ENVIRONMENT
1.0 GENERAL QUESTIONS ABOUT LIONS2

Welcome to the LIONS2 FAQ! The intent of this document is to announce
the new UNIX LIONS2 environment to Old Dominion University faculty, staff and
students, and to provide a place to document frequently asked
questions and their answers for future reference.
LIONS2 is the next generation of
the Old Dominion University central UNIX and GNU/Linux-based computing
services. Whereas the original LIONS environment (also referred to as LIONS1
in this document) used DCE and DFS for centralized account management
and home-directories, LIONS2 will be built using Kerberos 5, OpenLDAP
and OpenAFS for those services.
In keeping with new University standards, access to the LIONS2
environment will be set up using your Old Dominion University MIDAS
ID and password.
We in OCCS UNIX Support sincerely hope readers find the information in this document
useful. We also hope readers will pose new questions, so that
the document stays current and relevant. If you notice a factual error,
have a new question, or find any part of this FAQ confusing, please do not
hesitate to contact OCCS UNIX Support with a link for reference.
One additional note: the layout and contents of this FAQ are based upon a similar document originally written for MIT by J. Maynard Gelinas. His original document is located here, and we thank him for allowing us to do this. Thanks again!!
Bottom line, the lessons that we
learned during the deployment of the LIONS1 environment are being used to make
the LIONS2 environment more flexible and easier to use.
MIDAS is a log-in ID and password management system that stores user information and communicates that information to each of the University's networked resources. This allows the user to log in to each of those resources with the same user ID and password.
After creating your MIDAS ID and password from this site, you will use that ID and password to access other systems as soon as they become linked to MIDAS. The University Portal will be the first application to use the MIDAS ID and password. Other applications that will eventually use the MIDAS ID and password are:
| » Novell (LAN) | » LIONS2 (!) |
| » Wireless LAN | » Open Jack Access |
| » Blackboard | » Authenticated SMTP |
| » Lotus Notes | |
When you create a MIDAS ID and password, you will also create a security profile - a group of questions/answers that will make it possible for you to reset your MIDAS password on-line even if you do not know the current password. Guidelines for creating your MIDAS ID and password are provided on the ID creation page.
AFS FAQ (a little dated, but still complete)
MIT Kerberos 5 is used as the primary authentication method. OpenAFS, a
distributed network filesystem, is used to serve home directories,
application binaries, and project space. OpenLDAP is used to serve
Username to UID/GID mappings, Full Names, and /path/to/shell (it's an
/etc/passwd map without any passwords, served across the network).
The primary Kerberos, OpenAFS database, and OpenLDAP servers are all
located in the OCCS machine room in the new
Engineering and Computational Sciences building. These are the "top level"
servers which hold master copies of each database. Each fileserver has
internal hardware SCSI RAID, with about 128GB of disk space available
on the main file server and 250GB of disk space available on each of the two
auxiliary file servers.
If a Kerberos, LDAP, or OpenAFS database server fails, very little
happens:
the clients just fail over to another server and go on their
merry way. However, if a fileserver fails, every Read/Write volume
served from that fileserver goes offline. Basically, this means that
every home-directory volume served from that fileserver will be
unavailable until that server comes back online or we restore all the
affected volumes from tape to a new fileserver. The SCSI RAID feature
will help protect against disk failures on these systems. A single disk
failure will not take down a file server.
Yes and no. Data volumes aren't tied to any individual fileserver, so the
very act of restoring each volume to any live fileserver makes that
volume available to the entire OpenAFS cell instantly. Note that for a RAID
array to lose data the system must either experience two disk failures
at once, or the host has to damage the filesystem(s) stored on that
RAID beyond repair. The first case is extremely rare; the second is less rare for any one filesystem, but extremely rare for all
filesystems stored on said RAID. However, OCCS has planned for quick
disaster recovery in the event of total failure of any and all hardware
associated with a fileserver.
Unfortunately, NFS has its own set of problems which
make it unsuitable for serving large amounts of data and large numbers
of home-directories across a WAN. Note that we actually are using LDAP
to pass out Username to UID/GID mappings, full names, shell location,
etc; a network served "/etc/passwd" map without any encrypted
passwords, authentication being handled by Kerberos.
Yes. OpenAFS mitigates the issue somewhat by caching reads on the local workstation,
but OpenAFS writes will always be slower than writing to your local disk.
Even ancient workstations (older UltraSPARCs, and Pentium II and below, which lack
IDE DMA) will still find network writes to be slower than writing to a local disk.
However, our Network group has put a lot of effort into keeping our network up to
date with the most modern hardware. Almost every workstation has its own 100Mb Full
Duplex connection directly to a Cisco switch, while most fileservers have a 1Gb Full
Duplex SX fiber connection to each switch. In practice this means that any arbitrary
workstation has a theoretical maximum OpenAFS write transfer rate of ~3MB/sec or so
after subtracting layer 2 and 3 network plus OpenAFS protocol overhead. On the
fileserver side it's roughly five to seven times that, or between 15MB/sec and
21MB/sec (about 2/3rds the transfer rate of IDE ATA100 with DMA). So, a file
server's network pipe should saturate given about five to seven users maxing out
their network pipes at the same time, either reading new files (files not locally
cached), or committing writes.
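The saturation estimate above is plain division. Here is a minimal sketch using the approximate figures from this answer (about 3MB/sec per workstation, 15MB/sec to 21MB/sec per fileserver). The function name is made up for illustration, and real throughput will vary.

```shell
#!/bin/sh
# users_to_saturate SERVER_MB_PER_SEC CLIENT_MB_PER_SEC
# Prints how many clients, each transferring at full speed, it takes to
# fill the fileserver's network pipe (integer division, rounds down).
users_to_saturate() {
    echo $(( $1 / $2 ))
}

# Approximate figures quoted in this FAQ, not measurements.
echo "low estimate:  $(users_to_saturate 15 3) users"
echo "high estimate: $(users_to_saturate 21 3) users"
```

With the numbers above this prints five and seven users, which matches the "five to seven users" figure in the text.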
Under DCE/DFS, cache sizes were restricted to about 100 megabytes. For OpenAFS,
we have increased the default cache sizes on each workstation and server to about one
gigabyte of data. This will speed up disk reads, but not disk writes, as writes will
be buffered back to the AFS servers.
Only if you have the explicit permission of that workstation's primary user. (Please
note that the sol-login machines are off-limits for such use.) If you need a large
compute job which requires several systems to finish in a reasonable time frame,
please use the Sun Grid Engine batch system, described in a very rough tutorial located
here.
One change you will notice right off is that file paths are different.
Under DCE/DFS, all software
resides in the directory /dfs. Under OpenAFS,
it resides under /afs/lions.odu.edu. This change
is due to the fact that OpenAFS was designed as a wide-area networking
protocol. (Some sites distribute software over "anonymous" AFS
connections, like /afs/sitename.edu/pub/software
for example.) One result of this is that program paths can become quite long. Currently,
under DCE/DFS, most user software is located in /dfs/usr/@sys/bin.
To shrink this down somewhat (and make it easier to remember!), we've set up
symbolic links on each workstation/server to place the user software in /usr/pubsw, which
actually expands out to /afs/lions.odu.edu/@sys/pubsw.
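The "@sys" component in these paths is a placeholder that the AFS client replaces with its own system type name when it resolves the path, so each platform picks up binaries built for it. The sketch below only imitates that substitution with sed. The function name is hypothetical, and "sun4x_59" is just an example value, not necessarily what LIONS2 machines report.

```shell
#!/bin/sh
# expand_sys PATH SYSNAME
# Prints PATH with every "@sys" replaced by SYSNAME, mimicking what the
# AFS client does when it resolves the path.
expand_sys() {
    printf '%s\n' "$1" | sed "s/@sys/$2/g"
}

# "sun4x_59" is only an example value; run "fs sysname" on a LIONS2
# client to see what that machine actually uses.
expand_sys /afs/lions.odu.edu/@sys/pubsw sun4x_59
```

Run as written, this prints /afs/lions.odu.edu/sun4x_59/pubsw.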
Some of the terminology changes from DCE/DFS to Kerberos/OpenAFS. The following is a
chart of the differences:
| Terminology | DCE/DFS: | OpenAFS: |
|---|---|---|
| Authentication Name: | What DFS calls a credential... | ...Kerberos refers to it as a ticket and OpenAFS calls it a token |
| File "Object" Name: | What DFS calls a Fileset... | ...OpenAFS calls a Volume |
| Fileset/Volume Lengths: | DFS fileset names can be no longer than 102 characters in length | AFS volume names can be no longer than 22 characters in length |
| Definition of ACLs: | One of rwxcid: Files/Dirs: r = read, w = write, x = execute, c = control; Dir Only: i = insert, d = delete | One of rlidwka: Files: r = read, w = write, k = lock; Dirs: l = lookup, i = insert, d = delete, a = administer |
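Since DCE/DFS letters such as x and c have no OpenAFS counterpart, an old ACL string cannot simply be reused. A small sketch that checks whether a string uses only the seven OpenAFS letters from the chart; the function name is an invention for this example, and it does not check the word shorthands (such as "read" or "all") that fs setacl also accepts.

```shell
#!/bin/sh
# valid_afs_rights STRING
# Succeeds if STRING is non-empty and made up only of the seven OpenAFS
# permission letters from the chart (r l i d w k a).
valid_afs_rights() {
    case "$1" in
        ""|*[!rlidwka]*) return 1 ;;
        *) return 0 ;;
    esac
}

for rights in rl rlidwk rwx; do
    if valid_afs_rights "$rights"; then
        echo "$rights: valid OpenAFS rights"
    else
        echo "$rights: not valid OpenAFS rights"
    fi
done
```

The first two strings pass; "rwx" is rejected because x is a DCE/DFS letter only.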
Another change is that the various commands that you will be using will be different.
The following is a list of the commands that will be changing from the old
environment to the new environment, and how they will be changing:
| Command | DCE/DFS: | OpenAFS: |
|---|---|---|
| Refreshing Credentials (DCE) or Tickets and Tokens (AFS): | kinit <username> | afs_login <username> |
| Destroying Credentials (DCE) or Tickets and Tokens (AFS): | kdestroy | kdestroy, then unlog |
| Showing ACLs: | lsacl [-io\|-ic] <dir/file path> -or- dcecp -c acl show <dir/file path> | fs listacl -path <dir/file path> |
| Setting ACLs: | aclmod [-hVvDFxcfdOR] [-ic\|-io] <acl-entry-list> <file>+ -or- dcecp -c acl modify <dir/file path> '-add {<access list entries>}' -or- dcecp -c acl modify <dir/file path> '-change {<access list entries>}' | fs setacl -dir <directory>+ -acl <access list entries>+ [-clear] [-negative] |
| Listing fileset/volume info: | fts lsft [{-path {<filename> \| <directory name>} \| -fileset {<name> \| <ID>}}] | fs examine [-path <dir/file path>+] -or- vos examine -id <volume name or ID> [-extended] |
| Listing FLDB/VLDB info: | fts lsfldb [-fileset {<name> \| <ID>}] [-server <machine>] [-aggregate <name>] | vos listvldb [-name <volume name or ID>] [-server <machine>] [-partition <name>] |
| Listing quota info: | fts lsquota [{-path {<filename> \| <directory_name>}... \| -fileset {<name> \| <ID>}...}] | fs listquota [-path <dir/file path>+] |
| Setting fileset/volume quotas: | fts setquota {-fileset {<name> \| <ID>} -size <quota (KB)> | fs setquota [-path <dir/file path>] -max <max quota in kbytes> |
| Creating filesets/volumes: | fts create -ftname <name> -server <machine> -aggregate <name>, then fts setquota {-fileset {<name> \| <ID>} -size <quota (KB)> | vos create -server <machine name> -partition <partition name> -name <volume name> [-maxquota <initial quota (KB)>] |
Another visible change is that the Solaris graphical client login will change from
Sun's dtlogin over to the GNOME Display Manager - this change is due to
the lack of good authentication for Kerberos and OpenAFS in dtlogin.
In LIONS2, we have replaced Platform Computing's LSF batch subsystem with
Sun Grid Engine. Sun Grid Engine (or SGE for short)
is an open source project that allows one to build distributed computing
solutions. The following chart lists the LSF commands and their SGE
equivalents.
| Command | Platform LSF: | Sun Grid Engine: |
|---|---|---|
| Job Submission: | bsub [-q <queuename>] <command> | qsub [-q <queuename>] <shellscript> -or- qsub [-q <queuename>] -b y <binarycommand> -or- qrsh [-q <queuename>] <binarycommand> |
| Job Status: | bjobs -u all | qstat |
| Queue Status: | bqueues | qstat -f |
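As an illustration of the job submission row, the sketch below turns the arguments of a simple bsub call into the matching "qsub -b y" command line and prints it. It is a dry run only: nothing is submitted, only the -q option is handled, and the function name, queue name and program name are placeholders invented for the example.

```shell
#!/bin/sh
# lsf_to_sge [-q QUEUE] COMMAND [ARGS...]
# Takes the arguments you would have given to "bsub" and prints the
# matching "qsub -b y" command line. Nothing is submitted.
lsf_to_sge() {
    queue_opt=""
    if [ "${1:-}" = "-q" ]; then
        queue_opt=" -q $2"
        shift 2
    fi
    echo "qsub${queue_opt} -b y $*"
}

# "short" and "./myprog" are placeholders, not real LIONS2 queue or program names.
lsf_to_sge -q short ./myprog input.dat
lsf_to_sge ./myprog
```

Any other bsub options would need to be translated by hand against the SGE manual pages.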
For LIONS2, we have purchased the latest Sun hardware to replace our aging
systems. Three Sun 280Rs will replace the existing sol-login boxes, and a
SunFire 2900 with eight UltraSPARC IV processors and 64 gigabytes of memory
will join the Helios E10000 for High Performance Computing, along with a new
32-node cluster of Sun Opteron-based V20Z systems.
To get your LIONS2 account and have your files copied over to LIONS2, here's what you'll need to do:
Once your LIONS2 account is established, please copy your files over to LIONS2 using the instructions in the next section.
If you feel uncomfortable doing this yourself, please let us
know and we can copy your files for you. This may take a
while due to the load of requests, so please bear with us.
| Service | original LIONS | LIONS2 |
|---|---|---|
| Web Services | www.lions.odu.edu | www.lions2.odu.edu |
| FTP Services | ftp.lions.odu.edu | ftp.lions2.odu.edu |
| E-mail Services | pop.lions.odu.edu | imap.lions2.odu.edu |
| Streaming Video Services | real.lions.odu.edu | real.lions2.odu.edu |
| Solaris Login Services | sol-login.lions.odu.edu | sol-login.lions2.odu.edu |
Before you begin, make sure your new home directory has enough
space to hold your data. To find out, log in to your LIONS1 account by issuing
an 'ssh' command to the machine rangiroa.lions.odu.edu, and then issue
the following command:
fts lsquota $HOME
The output should look something like this (note that the numbers will be different):
Fileset Name     Quota    Used  %Used  Aggregate
home.tdamato    524288  204472    39%  72% = 25179974/34710192 (LFS)
Next, log into your LIONS2 account via sol-login.lions2.odu.edu and issue this command:
fs listquota $HOME
Volume Name      Quota    Used  %Used  Partition
home.tdamato    524288   10485     2%  36%
In this case, both home directories have a quota of 524288 kilobytes (512 megabytes). "home.tdamato" on LIONS1 is using 204472 kilobytes (about 200 megabytes) of storage, and "home.tdamato" on LIONS2 is using only 10485 kilobytes (about 10 megabytes) of storage. This means that there are 513803 kilobytes of storage left on LIONS2, more than enough to hold my LIONS1 data. If you do not have enough quota on the new system to hold the data you wish to copy over, please contact us and we can make arrangements for you.
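The check described here is a subtraction followed by a comparison. A minimal sketch using the numbers from the sample output; the function names are made up for illustration, and you would substitute the figures from your own quota listings.

```shell
#!/bin/sh
# quota_left QUOTA_KB USED_KB : prints the free space in kilobytes.
quota_left() {
    echo $(( $1 - $2 ))
}

# fits NEEDED_KB FREE_KB : succeeds if the data fits in the free space.
fits() {
    [ "$1" -le "$2" ]
}

# Numbers taken from the sample output in this section.
lions1_used=204472
lions2_free=$(quota_left 524288 10485)

echo "free on LIONS2: ${lions2_free} KB"
if fits "$lions1_used" "$lions2_free"; then
    echo "LIONS1 data fits"
else
    echo "not enough quota; contact OCCS UNIX Support"
fi
```

For the sample account this reports 513803 KB free and confirms that the LIONS1 data fits.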
Once you have confirmed that you have enough space to copy your files
over, again log into your LIONS1 account via rangiroa.lions.odu.edu
and, at the shell prompt, issue the
following command:
( cd $HOME; /usr/bin/tar cpf - . ) | ssh sol-login.lions2.odu.edu "/usr/bin/tar xpf -"
In a nutshell, this command (1) creates a tar archive of your home
directory and sends the output to a UNIX pipe, (2) logs into one of the
LIONS2 sol-login machines using the "ssh" command, and then (3) runs the "tar"
command on that machine to read input from the UNIX pipe and 'untar' it
into your home directory. It sounds more complicated than it really
is.
NOTE! If your LIONS1 and LIONS2 account names are different,
you'll need to add the account name of your LIONS2 account and an 'at sign'
between the 'ssh' command and 'sol-login', like so:
( cd $HOME; /usr/bin/tar cpf - . ) | ssh l2acct@sol-login.lions2.odu.edu "/usr/bin/tar xpf -"
Again, note the command is the same, but you've now added your LIONS2 account name (indicated by l2acct) and an 'at sign' before the 'sol-login' name. You'll now get a prompt for your LIONS2 account password.
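The two variants of the copy command differ only in the host argument given to ssh. The sketch below prints the command line for either case so you can check it before running it yourself. It is a dry run that copies nothing, and the function name is hypothetical.

```shell
#!/bin/sh
# copy_cmd [LIONS2_ACCOUNT]
# Prints the copy pipeline from this section. With an argument, the
# account name and "@" are placed in front of the host name.
copy_cmd() {
    target="sol-login.lions2.odu.edu"
    if [ -n "${1:-}" ]; then
        target="$1@$target"
    fi
    printf '%s\n' "( cd \$HOME; /usr/bin/tar cpf - . ) | ssh $target \"/usr/bin/tar xpf -\""
}

copy_cmd          # same account name on both systems
copy_cmd l2acct   # "l2acct" stands in for your LIONS2 account name
```

Printing the command first is a deliberate choice: the real pipeline writes into your LIONS2 home directory, so it is worth reading the line once before pasting it at the shell prompt.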
Please note a couple of cautions once you've copied your files
over to LIONS2:
Again, if there are any questions about this process, please do not hesitate to
contact us!
Kerberos works by handing out a master authentication "ticket", called the Ticket Granting Ticket, or TGT, on login, which is then used to generate "service tickets" for various other kinds of network services, for example, OpenAFS. This TGT is presented to the OpenAFS servers to obtain a "token". Both the ticket and the token grant access to all of a user's authorized resources in LIONS2.

This master ticket has a Time To Live (TTL) set by default to ten hours. Once the ticket expires, authentication fails, and the user is forced to re-authenticate. See the Ticket Expiration entry in the Kerberos Frequently Asked Questions page for additional details. Since the TGT has a TTL of ten hours, it's likely that a "permission denied" message when trying to access your home-directory, even after login, is due to your ticket having expired. The simplest fix is to log in in the morning and log off in the evening, instead of leaving your session running overnight.
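The ten-hour lifetime is easy to reason about as clock arithmetic. A small sketch, assuming times are given as seconds since the epoch; the function name and the example times are invented for illustration, and the real expiry time is whatever klist reports for your ticket.

```shell
#!/bin/sh
TTL=36000   # ten hours in seconds, the default lifetime quoted above

# seconds_left ISSUED NOW : both arguments are seconds since the epoch.
# Prints how long the ticket has left; zero or less means it has expired.
seconds_left() {
    echo $(( $1 + $TTL - $2 ))
}

# Example: a ticket issued at t=0, checked after 8 hours and after 11 hours.
echo "after 8h:  $(seconds_left 0 28800) seconds left"
echo "after 11h: $(seconds_left 0 39600) seconds left"
```

After eight hours two hours (7200 seconds) remain; after eleven hours the result is negative, which is the situation that produces "permission denied" in a session left running overnight.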
No. Kerberos works this way by design. Users can re-authenticate with command line tools, so if your TGT expires and you have a local shell available it's easy enough to generate a new ticket and AFS token.
Yes. But it's a VERY BAD idea to implement. We use this Kerberos feature to pass administrative level authentication out to our slave servers for cron jobs and such. However, those keyfiles are kept strictly on critical servers with carefully chosen filesystem permissions, on which no one but OCCS UNIX administrative staff has access. Think about it: if OCCS provided you with a keyfile for your Kerberos principal you would have to store that file on some workstation in one of its local filesystems such as /scratch. And if anyone EVER gained access to that file they could become YOU on every LIONS2 system at any time they chose. OCCS simply cannot do this, no matter how convenient to the LIONS2 community, for very good security reasons.
In a local shell type either:
$ kinit
password: (type in your password)
$ aklog
$
Or:
$ afs_login
password: (type in your password)
$
kinit creates a new TGT after authentication; aklog then uses this
ticket to obtain an AFS authentication token from one of our AFS database
servers. The "afs_login" command combines both functions for you, so you
only have to remember one command. Once you have the new TGT and AFS token
you're good to go, without having logged out and back in.
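If you would rather not be prompted for a password when your ticket is still good, the two steps can be wrapped in a check. This is only a sketch: it assumes the klist on your machine supports the -s option (MIT Kerberos does), and the function name is made up.

```shell
#!/bin/sh
# reauth_if_needed
# "klist -s" prints nothing and only sets its exit status: zero while the
# credentials cache holds a valid ticket, non-zero otherwise.
reauth_if_needed() {
    if klist -s; then
        echo "ticket still valid, nothing to do"
    else
        # Same two steps as above: new TGT first, then a fresh AFS token.
        kinit && aklog
    fi
}
```

After loading the function into your shell, type reauth_if_needed at the prompt. It asks for your password only when the ticket has actually expired.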
Sure!
In a local shell type:
$ kinit
password: (type in your password)
$ aklog
$
Or:
$ afs_login
password: (type in your password)
$
Done!
See: man klist
$ klist
Ticket cache: FILE:/tmp/krb5cc_1120_bZ8551
Default principal: tony@LIONS.ODU.EDU

Valid starting     Expires            Service principal
01/27/05 16:41:20  01/28/05 02:41:20  krbtgt/LIONS.ODU.EDU@LIONS.ODU.EDU
01/27/05 16:41:26  01/28/05 02:41:20  afs@LIONS.ODU.EDU

Kerberos 4 ticket cache: /tmp/tkt1120
klist: You have no tickets cached
The simple answer: use Sun Grid Engine.
The long answer: Many of the LIONS2 clients, and especially the sol-login boxes,
have access to the Sun Grid Engine software. A special routine has been coded into SGE
which copies your tickets over to the machine that will run the job. If given
renewable tickets, a coroutine will watch over the running job and automatically
renew the tickets and get fresh tokens before the tickets expire. If you do not have
renewable tickets when issuing a 'qsub' or 'qrsh' command, SGE will remind
you with the following non-fatal warning message:
get_cred stderr: WARNING: non-renewable tickets - you may want to resubmit with 'kinit -r'
To run your job under SGE, issue the following commands:
$ afs_login -r 30d
Password for yourid@LIONS.ODU.EDU: (type your password)
$ qsub -cwd yourjob.csh
Your job 285 ("yourjob.csh") has been submitted.
$
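For readers who have not written a batch script before, here is a rough sketch of what a job script might contain. It is a Bourne shell script rather than the csh script named in the example, and it is not the actual yourjob.csh. The embedded -S, -cwd and -j options are standard SGE options, but whether the LIONS2 queues need or accept them exactly as shown is an assumption.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -j y

# Lines starting with "#$" are read by qsub as if the options had been
# given on its command line; a plain shell treats them as comments.
#   -S /bin/sh  run the job with the Bourne shell
#   -cwd        start the job in the directory it was submitted from
#   -j y        merge error output into the normal output file

job_banner() {
    echo "job started in $(pwd)"
}

job_banner
# The real work of the job would follow here.
```

Because the options live inside the script, it could be submitted with a plain "qsub" followed by the script name, after obtaining renewable tickets as shown above.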
Please do not hesitate to send an e-mail with "LIONS2" in the subject to
our LIONS WREQ help
system. We want to make sure your questions and concerns are addressed
and that the transition from LIONS1 to LIONS2 is as smooth as possible.
$Log: LIONS2_FAQ.html,v $
Revision 1.14 2005/05/10 13:08:20 tony: mention 'rangiroa' to copy files
Revision 1.13 2005/04/26 14:49:42 tony: added chart comparing LSF and SGE commands
Revision 1.12 2005/04/12 19:28:39 tony: updated site for GDM
Revision 1.11 2005/04/11 17:43:03 tony: added email/web warnings to copyfile section
Revision 1.10 2005/04/11 14:34:33 tony: clean up a little and add notes refering to SGE
Revision 1.9 2005/04/07 21:10:46 tony: rearrange switchover questions and add 'How do I login/ftp/etc into LIONS2'
Revision 1.8 2005/04/06 20:26:07 tony: added 2.2 and renumbered
Revision 1.7 2005/02/28 21:25:02 tony: updated copy procedure to add info about different LIONS1 vs LIONS2 names
Revision 1.6 2005/02/10 20:13:14 tony: added some cleanups and more questions
Revision 1.5 2005/01/27 20:22:40 tony: change references from help to unixhelp and add question 2.3; update what's going on with the switchover
Revision 1.4 2004/12/28 10:01:36 tony: *** empty log message ***
Revision 1.3 2004/12/02 11:37:41 tony: *** empty log message ***
Revision 1.2 2004/11/08 13:37:40 tony: *** empty log message ***
Revision 1.1 2004/10/07 15:10:03 tony: Initial revision