LIONS2 Frequently Asked Questions

0.0 AN INTRODUCTION TO THE LIONS2 ENVIRONMENT
1.0 GENERAL QUESTIONS ABOUT LIONS2:
2.0 QUESTIONS ABOUT THE SWITCHOVER:
3.0 KERBEROS TICKET AND AFS TOKEN QUESTIONS:
9.9 What if I have questions that are not answered here?

0.0 AN INTRODUCTION TO THE LIONS2 ENVIRONMENT

Welcome to the LIONS2 FAQ! This document announces the new UNIX LIONS2 environment to Old Dominion University faculty, staff, and students, and provides a place to record frequently asked questions and their answers for future reference.

LIONS2 is the next generation of the Old Dominion University central UNIX and GNU/Linux-based computing services. Whereas the original LIONS environment (also referred to as LIONS1 in this document) used DCE and DFS for centralized account management and home-directories, LIONS2 will be built using Kerberos 5, OpenLDAP, and OpenAFS for those services.

In keeping with new University standards, access to the LIONS2 environment will be set up using your Old Dominion University MIDAS (Monarch IDentification and Authorization System) user ID. MIDAS is a log-in ID and password management system that "allows the user to have one account ID and password for accessing some computing resources at the University."

We in OCCS UNIX Support sincerely hope readers find the information in this document useful. We also hope readers will pose new questions so that the document stays current and relevant. If you notice a factual error, have a new question, or find any part of this FAQ confusing, please do not hesitate to contact OCCS UNIX Support, and include a link to the section in question.

One additional note - the layout of this FAQ and contents are based upon a similar document originally written for MIT by J. Maynard Gelinas. His original document is located here, and we thank him for allowing us to do this - Thanks again!!

0.1 Why the switch from DCE/DFS to Kerberos/OpenAFS?

Bottom line, the lessons that we learned during the deployment of the LIONS1 environment are being used to make the LIONS2 environment more flexible and easier to use.

0.2 Why MIDAS?

MIDAS is a log-in ID and password management system that stores user information and communicates that information to each of the University's networked resources. This allows the user to log in to each of those resources with the same user ID and password.

After creating your MIDAS ID and password from this site, you will use that ID and password to access other systems as soon as they become linked to MIDAS. The University Portal will be the first application to use the MIDAS ID and password. Other applications that will eventually use the MIDAS ID and password are:

 » Novell (LAN)
 » LIONS2 (!)
 » Wireless LAN
 » Open Jack Access
 » Blackboard
 » Authenticated SMTP
 » Lotus Notes

When you create a MIDAS ID and password, you will also create a security profile - a group of questions/answers that will make it possible for you to reset your MIDAS password on-line even if you do not know the current password. Guidelines for creating your MIDAS ID and password are provided on the ID creation page.


1.0 GENERAL QUESTIONS ABOUT LIONS2:

1.1 What are Kerberos, OpenAFS, and LDAP?

Kerberos FAQ

AFS FAQ (a little dated, but still complete)

LDAP FAQ


1.2 How are Kerberos, OpenAFS, and LDAP used with LIONS2?

MIT Kerberos 5 is used as the primary authentication method. OpenAFS, a distributed network filesystem, is used to serve home directories, application binaries, and project space. OpenLDAP is used to serve Username to UID/GID mappings, Full Names, and /path/to/shell (it's an /etc/passwd map without any passwords, served across the network).

1.3 Where are the servers located?

The primary Kerberos, OpenAFS database, and OpenLDAP servers are all located in the OCCS machine room in the new Engineering and Computational Sciences building. These are the "top level" servers which hold master copies of each database. Each fileserver has internal hardware SCSI RAID connected to the server host, with about 128GB of disk space available on the main file server and 250GB available on each of the two auxiliary file servers.

1.4 What happens if a server fails?

If a Kerberos, LDAP, or OpenAFS database server fails, very little happens: the clients just fail over to another server and go on their merry way. However, if a fileserver fails, every Read/Write volume served from that fileserver goes offline. Basically, this means that every home-directory volume served from that fileserver will be unavailable until that server comes back online or we restore all the affected volumes from tape to a new fileserver. The SCSI RAID feature will help protect against disk failures on these systems. A single disk failure will not take down a file server.

1.5 Doesn't this create a single point of failure for every client?

Yes and no. Data volumes aren't tied to any individual fileserver, so the very act of restoring each volume to any live fileserver makes that volume available to the entire OpenAFS cell instantly. Note that for a RAID array to lose data, either the system must experience two disk failures at once, or the host must damage the filesystem(s) stored on that RAID beyond repair. The first case is extremely rare; the second is less rare for any one filesystem but extremely rare for all filesystems stored on said RAID. However, OCCS has planned for quick disaster recovery in the event of total failure of any and all hardware associated with a fileserver.

1.6 Couldn't you have used NIS or LDAP with NFS instead?

Unfortunately, NFS has its own set of problems which make it unsuitable for serving large amounts of data and large numbers of home-directories across a WAN. Note that we actually are using LDAP to pass out Username to UID/GID mappings, full names, shell location, etc.: a network-served "/etc/passwd" map without any encrypted passwords, with authentication handled by Kerberos.

1.7 How is OpenAFS better?

NFS with NIS or LDAP has none of these features, with the possible exception of the last one.

1.8 Isn't serving home-directories across a network much slower than local disks?

Yes. OpenAFS mitigates the issue somewhat by caching reads on the local workstation, but OpenAFS writes will always be slower than writing to your local disk. Even ancient workstations (older UltraSPARCs, and Pentium II and below, which lack IDE DMA) will still find network writes to be slower than writing to a local disk. However, our Network group has put a lot of effort into keeping our network up to date with the most modern hardware. Almost every workstation has its own 100Mb Full Duplex connection directly to a Cisco switch, while most fileservers have a 1Gb Full Duplex SX fiber connection to each switch. In practice this means that any arbitrary workstation has a theoretical maximum OpenAFS write transfer rate of ~3MB/sec or so after subtracting layer 2 and 3 network plus OpenAFS protocol overhead. On the fileserver side it's roughly five to seven times that, or between 15MB/sec and 21MB/sec (about 2/3rds the transfer rate of IDE ATA100 with DMA). So, a file server's network pipe should saturate given about five to seven users maxing out their network pipes at the same time, either reading new files (files not locally cached) or committing writes.
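
The saturation estimate above is simple division. As a rough sketch using only the figures quoted in this answer (about 3MB/sec per workstation, 15MB/sec to 21MB/sec per fileserver), the following shell function works out how many workstations transferring at full speed would fill a fileserver's network pipe. The function name is made up for illustration, and the inputs are estimates rather than measurements.

```shell
#!/bin/sh
# clients_to_saturate SERVER_MB_PER_SEC CLIENT_MB_PER_SEC
# Prints how many clients, each transferring at CLIENT_MB_PER_SEC,
# it takes to fill a server link of SERVER_MB_PER_SEC (integer division).
clients_to_saturate() {
    echo $(( $1 / $2 ))
}

# Figures quoted in this answer (estimates, not measurements).
clients_to_saturate 15 3    # low end of the fileserver estimate
clients_to_saturate 21 3    # high end of the fileserver estimate
```

The two calls print 5 and 7, matching the "five to seven users" figure.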

Under DCE/DFS, cache sizes were restricted to about 100 megabytes. For OpenAFS, we have increased the default cache sizes on each workstation and server to about one gigabyte of data. This will speed up disk reads, but not disk writes, as writes will be buffered back to the AFS servers.

1.9 If I can log into any LIONS2 host throughout Old Dominion University, does this mean I can run compute intensive jobs on any host I like?

Only if you have the explicit permission of that workstation's primary user. (Please note that the sol-login machines are off-limits for such use.) If you need a large compute job which requires several systems to finish in a reasonable time frame, please use the Sun Grid Engine batch system, described in a very rough tutorial located here.

1.10 What are the differences between DCE/DFS and Kerberos/OpenAFS?

File paths will be different.

One change you will notice right off is that file paths are different. Under DCE/DFS, all software resides in the directory /dfs. Under OpenAFS, it resides under /afs/lions.odu.edu. This change is due to the fact that OpenAFS was designed as a wide-area networking protocol. (Some sites distribute software over "anonymous" AFS connections, like /afs/sitename.edu/pub/software for example.) One result of this is that program paths can become quite long - currently under DCE/DFS, most user software is located in /dfs/usr/@sys/bin. To shrink this down somewhat (and make it easier to remember!), we've set up symbolic links on each workstation/server to place the user software in /usr/pubsw - this actually expands out to /afs/lions.odu.edu/@sys/pubsw.
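
The "@sys" component in these paths is not a literal directory name: the AFS client replaces it with the local machine's system type name, so each platform picks up its own binaries. The sketch below mimics that substitution by hand. The function name is hypothetical, and "sun4x_58" is only an example value; on an OpenAFS client, "fs sysname" reports the real one.

```shell
#!/bin/sh
# expand_sys PATH SYSNAME
# Prints PATH with every "@sys" replaced by SYSNAME, mimicking by hand
# what the AFS client does when it resolves such a path.
expand_sys() {
    printf '%s\n' "$1" | sed "s/@sys/$2/g"
}

# "sun4x_58" is only an example system type name.
expand_sys /afs/lions.odu.edu/@sys/pubsw sun4x_58
```

With the example value, this prints /afs/lions.odu.edu/sun4x_58/pubsw.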

Terminology will change.

Some of the terminology changes from DCE/DFS to Kerberos/OpenAFS. The following is a chart of the differences:

Terminology            DCE/DFS:                         OpenAFS:

Authentication Name:   What DFS calls a credential...   ...Kerberos refers to it as a ticket
                                                        and OpenAFS calls it a token

File "Object" Name:    What DFS calls a Fileset...      ...OpenAFS calls a Volume

Fileset/Volume         DFS fileset names can be no      AFS volume names can be no
Lengths:               longer than 102 characters       longer than 22 characters
                       in length                        in length

Definition of ACLs:    One of rwxcid:                   One of rlidwka:

                       Files/Dirs:                      Files:
                       r = read                         r = read
                       w = write                        w = write
                       x = execute                      k = lock
                       c = control

                       Dir Only:                        Dirs:
                       i = insert                       l = lookup
                       d = delete                       i = insert
                                                        d = delete
                                                        a = administer

Differences between commands

Another change is that the various commands that you will be using will be different. The following is a list of the commands that will be changing from the old environment to the new environment, and how they will be changing:

Refreshing Credentials (DCE) or Tickets and Tokens (AFS):
  DCE/DFS:  kinit <username>
  OpenAFS:  afs_login <username>

Destroying Credentials (DCE) or Tickets and Tokens (AFS):
  DCE/DFS:  kdestroy
  OpenAFS:  kdestroy
            unlog

Showing ACLs:
  DCE/DFS:  lsacl [-io|-ic] <dir/file path>
            -or-
            dcecp -c acl show <dir/file path>
  OpenAFS:  fs listacl -path <dir/file path>

Setting ACLs:
  DCE/DFS:  aclmod [-hVvDFxcfdOR] [-ic|-io] <acl-entry-list> <file>+
            -or-
            dcecp -c acl modify <dir/file path> '-add {<access list entries>}'
            -or-
            dcecp -c acl modify <dir/file path> '-change {<access list entries>}'
  OpenAFS:  fs setacl -dir <directory>+ -acl <access list entries>+ [-clear] [-negative]

Listing fileset/volume info:
  DCE/DFS:  fts lsft [{-path {<filename> | <directory name>} | -fileset {<name> | <ID>}}]
  OpenAFS:  fs examine [-path <dir/file path>+]
            -or-
            vos examine -id <volume name or ID> [-extended]

Listing FLDB/VLDB info:
  DCE/DFS:  fts lsfldb [-fileset {<name> | <ID>}] [-server <machine>] [-aggregate <name>]
  OpenAFS:  vos listvldb [-name <volume name or ID>] [-server <machine>] [-partition <name>]

Listing quota info:
  DCE/DFS:  fts lsquota [{-path {<filename> | <directory_name>}... | -fileset {<name> | <ID>}...}]
  OpenAFS:  fs listquota [-path <dir/file path>+]

Setting fileset/volume quotas:
  DCE/DFS:  fts setquota {-fileset {<name> | <ID>} -size <quota (KB)>
  OpenAFS:  fs setquota [-path <dir/file path>] -max <max quota in kbytes>

Creating filesets/volumes:
  DCE/DFS:  fts create -ftname <name> -server <machine> -aggregate <name>
            fts setquota {-fileset {<name> | <ID>} -size <quota (KB)>
  OpenAFS:  vos create -server <machine name> -partition <partition name> -name <volume name> [-maxquota <initial quota (KB)>]

Graphical Login screen

Another visible change is that the Solaris graphical client login will change from Sun's dtlogin over to the GNOME Display Manager - this change is due to the lack of good authentication for Kerberos and OpenAFS in dtlogin.

Sun Grid Engine

In LIONS2, we have replaced Platform Corporation's LSF batch subsystem with Sun Grid Engine. Sun Grid Engine (or SGE for short) is an open source project that allows one to build distributed computing solutions. The following list gives the LSF commands and their SGE equivalents.

Job Submission:
  Platform LSF:     bsub [-q <queuename>] <command>
  Sun Grid Engine:  qsub [-q <queuename>] <shellscript>
                    -or-
                    qsub [-q <queuename>] -b y <binarycommand>
                    -or-
                    qrsh [-q <queuename>] <binarycommand>

Job Status:
  Platform LSF:     bjobs -u all
  Sun Grid Engine:  qstat

Queue Status:
  Platform LSF:     bqueues
  Sun Grid Engine:  qstat -f

Please see the section below on how to run your jobs under SGE.

Newer Hardware

For LIONS2, we have purchased the latest Sun hardware to replace our aging systems. Three Sun 280R's will replace the existing sol-login boxes. A SunFire 2900 with eight UltraSPARC IV processors and 64 gigabytes of memory will join Helios E10000 for High Performance Computing, as will a new Sun Opteron-based V20Z 32-node cluster.

2.0 QUESTIONS ABOUT THE SWITCHOVER:

2.1 How are you handling the switch from DCE/DFS over to Kerberos/OpenAFS?

We were planning to phase in LIONS2 before the end of 2004, but unforeseen circumstances have forced us to slow the migration to the end of 2004 and the beginning of 2005. To assist us in identifying those with active accounts who need to migrate from DCE/DFS to OpenAFS (and to avoid copying accounts which are no longer in use), please sign up for a MIDAS account as soon as possible. In addition, please bookmark

http://www.lions.odu.edu/LIONS2_FAQ.html

in your web browser, as well as

http://www.lions2.odu.edu/

so you can get up-to-date news about changes in migration schedules and the like. The new LIONS2 servers are already set up, and we have enlisted a group of "early adopters" to wring out the kinks in the new systems. If you wish to join, please obtain your MIDAS account NOW and contact us via "unixhelp" at odu.edu. Once the migration begins, please log in and help us make sure your existing applications will function.

2.2 How do I get a LIONS2 account?

To get your LIONS2 account and have your files copied over to LIONS2, here's what you'll need to do:

  1. Log into MIDAS and activate your MIDAS account (if you've not already done so.) Please go to this page:

    http://www.lions2.odu.edu:8080/lions/documentation/account_request

  2. Once your MIDAS account is activated, activate your LIONS2 account.

  3. Wait about 15 minutes for the LIONS2 account to activate. If it still says "In Progress" after 30 minutes, try logging into sol-login.lions2.odu.edu or ftp.lions2.odu.edu with your MIDAS user id and password. If that works, you're okay and can ignore the status shown in MIDAS. However, if after one hour you still can't log in, please let us know and we will see what happened.

Once your LIONS2 account is established, please copy your files over to LIONS2 using the instructions in the next section.

If you feel uncomfortable doing this yourself, please let us know and we can copy your files for you. This may take a while due to the load of requests, so please bear with us.

2.3 How do I login/ftp/etc into LIONS2?

We have set up LIONS2 as a parallel environment to the original LIONS. In fact, most of the system names have their LIONS2 counterparts:

Service                    original LIONS             LIONS2
Web Services               www.lions.odu.edu          www.lions2.odu.edu
FTP Services               ftp.lions.odu.edu          ftp.lions2.odu.edu
Email                      pop.lions.odu.edu          imap.lions2.odu.edu
Streaming Video Services   real.lions.odu.edu         real.lions2.odu.edu
Solaris Login Services     sol-login.lions.odu.edu    sol-login.lions2.odu.edu

Please see the appropriate sections of the LIONS2 web pages for more information.

2.4 What will happen to 'www.lions.odu.edu', 'ftp.lions.odu.edu', 'sol-login.lions.odu.edu' after the conversion?

We will not abandon those names. When we 'retire' the older servers (announcements will be made in advance), we will move the old names over to the newer servers. However, we will not get rid of the new names, so you may continue to use the 'lions2.odu.edu' as well as the original 'lions.odu.edu' names after the switchover.

2.5 How may I copy my files from my original LIONS account over to my LIONS2 account?

First ask yourself: "Do I really need to copy everything over?" If not, please delete what you do not want from your home directory prior to this process. Next, if after reading over this process you do not understand it or have any questions, please contact us and we can walk you through it.

Before you begin, you will need to make sure your new home directory has enough space to hold your data. To check this, log in to your LIONS1 account by issuing an 'ssh' command to the machine rangiroa.lions.odu.edu, and then issue the following command:

fts lsquota $HOME

The output should look something like this (note that the numbers will be different):

Fileset Name          Quota    Used  Used   Aggregate
home.tdamato         524288  204472   39%    72% = 25179974/34710192 (LFS)

Next, log into your LIONS2 account via sol-login.lions2.odu.edu and issue this command:

fs listquota $HOME

Your output should look something like this (again, the numbers will be different):
Volume Name                   Quota      Used %Used   Partition
home.tdamato                 524288     10485    2%         36%

In this case, both home directories have a quota of 524288 kilobytes (512 megabytes). "home.tdamato" on LIONS1 is using 204472 kilobytes (about 200 megabytes) of storage, and "home.tdamato" on LIONS2 is using only 10485 kilobytes (about 10 megabytes) of storage. This means that there are 513803 kilobytes of storage left on LIONS2, more than enough to hold my LIONS1 data. If you do not have enough quota on the new system to hold the data you wish to copy over, please contact us and we can make arrangements for you.
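
The comparison can also be done mechanically. The following is a minimal sketch, assuming the quota output always has one header line followed by one data line with the name, quota, and used columns in the order shown above. The function names are made up for illustration, and the sample text is copied from the two listings in this answer.

```shell
#!/bin/sh
# free_kb: read quota output on stdin (a header line, then one data line
# with the name, quota and used columns) and print quota minus used, in KB.
free_kb() {
    awk 'NR == 2 { print $2 - $3 }'
}

# used_kb: same input, prints the "Used" column in KB.
used_kb() {
    awk 'NR == 2 { print $3 }'
}

# Sample output copied from the examples above.
lions1='Fileset Name          Quota    Used  Used   Aggregate
home.tdamato         524288  204472   39%    72% = 25179974/34710192 (LFS)'
lions2='Volume Name                   Quota      Used %Used   Partition
home.tdamato                 524288     10485    2%         36%'

need=$(printf '%s\n' "$lions1" | used_kb)
have=$(printf '%s\n' "$lions2" | free_kb)
echo "LIONS1 data: $need KB, LIONS2 free: $have KB"
if [ "$need" -le "$have" ]; then
    echo "Enough room to copy."
else
    echo "Not enough room; ask for more quota first."
fi
```

On the real systems the same functions could be fed live output, for example "fts lsquota $HOME | used_kb" on LIONS1 and "fs listquota $HOME | free_kb" on LIONS2.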

Once you have confirmed that you have enough space to copy your files over, again log into your LIONS1 account via rangiroa.lions.odu.edu and, at the shell prompt, issue the following command:

( cd $HOME; /usr/bin/tar cpf - . ) | ssh sol-login.lions2.odu.edu "/usr/bin/tar xpf -"

In a nutshell, this command is (1) creating a tar archive of your home directory, sending the output to a UNIX pipe, (2) logging into one of the LIONS2 sol-login using the "ssh" command, and then (3) running the "tar" command on that machine to read input from the UNIX pipe and 'untar' the output into your home directory. It sounds more complicated than it really is.

NOTE! If your LIONS1 and LIONS2 account names are different, you'll need to add the name of your LIONS2 account between the 'ssh' command and 'sol-login', like so:

( cd $HOME; /usr/bin/tar cpf - . ) | ssh l2acct@sol-login.lions2.odu.edu "/usr/bin/tar xpf -"

Again, note the command is the same, but you've now added your LIONS2 account name (indicated by l2acct) and an 'at sign' before the 'sol-login' name. You'll now get a prompt for your LIONS2 account password.
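
If you would like to see how the tar pipe behaves before running it against your real home directory, here is a small local sketch of the same technique. It uses two scratch directories and a second subshell in place of the "ssh" step, so nothing leaves your machine. The directory contents are made up for the demonstration, and it assumes the "mktemp" utility is available.

```shell
#!/bin/sh
# Local stand-in for the copy command: the same tar-to-tar pipe, but with
# a second subshell in place of "ssh sol-login.lions2.odu.edu".
src=$(mktemp -d)    # plays the part of the LIONS1 home directory
dst=$(mktemp -d)    # plays the part of the LIONS2 home directory

mkdir "$src/Mail"
echo "hello" > "$src/Mail/note.txt"

# Left side archives $src to standard output; right side unpacks the
# archive it reads from standard input into $dst.
( cd "$src" && tar cpf - . ) | ( cd "$dst" && tar xpf - )

# The file arrived with its directory structure intact.
copied=$(cat "$dst/Mail/note.txt")
echo "$copied"

rm -rf "$src" "$dst"    # clean up the scratch directories
```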

Please note a couple of cautions once you've copied your files over to LIONS2:

  1. If you've used Pine under the original LIONS, you probably have a 'Mail' directory which consists of your old email. Email under LIONS2 has changed and will be unable to use that directory as it stands. To make sure that you can still receive email, log into a LIONS2 box such as sol-login.lions2.odu.edu and issue the following command:

    chk_mailbox

    If you see a message similar to the following:

    ERROR: /afs/lions.odu.edu/home/m/myhomedir/Mail/inbox is NOT a directory!
    ERROR: Please rename /afs/lions.odu.edu/home/m/myhomedir/Mail/inbox and run this script again!


    Then you will need to rename your 'Mail' directory to 'Mail.old' and rerun the 'chk_mailbox' script to set up a proper 'Mail' directory. Please see the following page for more information:

    http://www.lions.odu.edu:8080/lions/documentation/services/email

  2. If you have web pages which you were maintaining under your old account, to make them viewable under LIONS2 you will need to issue the 'mkpublic_html' command - this will set up the proper permissions to allow the web server to access your pages. Please see the following for more information:

    http://www.lions.odu.edu:8080/lions/documentation/services/webserv

Again, if there are any questions about this process, please do not hesitate to contact us!

3.0 KERBEROS TICKET AND AFS TOKEN QUESTIONS:

3.1 Sometimes, some number of hours after login, my machine says "permission denied" when I try to access my home-directory. What's going on?

Kerberos works by handing out a master authentication "ticket", called the Ticket Granting Ticket, or TGT, on login, which is then used to generate "service tickets" for various other kinds of network services, for example, OpenAFS. This TGT is presented to the OpenAFS servers to obtain a "token". Both the ticket and the token grant access to all of a user's authorized resources in LIONS2. This master ticket has a Time To Live (TTL) set by default to ten hours. Once the ticket expires, authentication fails, and the user is forced to re-authenticate. See the Ticket Expiration in the Kerberos Frequently Asked Questions page for additional details. Since the TGT has a TTL of ten hours, a "permission denied" message when trying to access your home-directory, even after login, is likely due to your ticket having expired. The simplest fix is to log in in the morning and log off in the evening, instead of leaving your session running overnight.

3.2 Isn't there a way around this?

No. Kerberos works this way by design. Users can re-authenticate with command line tools, so if your TGT expires and you have a local shell available it's easy enough to generate a new ticket and AFS token.

3.3 Isn't it possible to extract a Kerberos principal into a keyfile for automatic authentication?

Yes. But it's a VERY BAD idea to implement. We use this Kerberos feature to pass administrative level authentication out to our slave servers for cron jobs and such. However, those keyfiles are kept strictly on critical servers with carefully chosen filesystem permissions, on which no one but OCCS UNIX administrative staff has access. Think about it: if OCCS provided you with a keyfile for your Kerberos principal you would have to store that file on some workstation in one of its local filesystems such as /scratch. And if anyone EVER gained access to that file they could become YOU on every LIONS2 system at any time they chose. OCCS simply cannot do this, no matter how convenient to the LIONS2 community, for very good security reasons.

3.4 I can't read my home-directory because my ticket expired! What am I supposed to do?

In a local shell type either:

       $ kinit
         password: (type in your password)
       $ aklog
       $

Or:

       $ afs_login
         password: (type in your password)
       $

kinit creates a new TGT after authentication; aklog then uses this ticket to obtain an AFS authentication token from one of our AFS database servers. The "afs_login" command combines both functions for you, so you only have to remember one command. Once you have the new TGT and AFS token you're good to go, without having logged out and back in.
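
For those who prefer to re-authenticate only when needed, the two-step sequence can be wrapped in a small shell function. This is a sketch, not a tool provided on LIONS2: the name "reauth" is hypothetical, and it assumes the installed klist is the MIT Kerberos 5 version, whose -s option reports ticket validity through its exit status.

```shell
#!/bin/sh
# reauth: get a new TGT only if the current one is missing or expired,
# then obtain a fresh AFS token from it.
reauth() {
    # "klist -s" prints nothing and returns non-zero when the ticket
    # cache is unreadable or its tickets have expired.
    if ! klist -s; then
        kinit || return 1    # prompts for your password
    fi
    aklog
}
```

After defining the function (for example in your shell startup file), typing "reauth" at the prompt does the rest.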

3.5 Could you please repeat how to re-authenticate a live session with kinit and aklog or afs_login?

Sure!

In a local shell type:

   $ kinit 
     password: (type in your password)
   $ aklog
   $

Or:

   $ afs_login 
     password: (type in your password)
   $

Done!

3.6 How do I find out when my tickets will expire?

See: man klist

   $ klist

Ticket cache: FILE:/tmp/krb5cc_1120_bZ8551
Default principal: tony@LIONS.ODU.EDU

Valid starting     Expires            Service principal
01/27/05 16:41:20  01/28/05 02:41:20  krbtgt/LIONS.ODU.EDU@LIONS.ODU.EDU
01/27/05 16:41:26  01/28/05 02:41:20  afs@LIONS.ODU.EDU


Kerberos 4 ticket cache: /tmp/tkt1120
klist: You have no tickets cached
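
If all you want is the expiration time of your TGT, the line for the "krbtgt" service principal is the one to look at. The sketch below picks it out with awk. The function name is made up, and it assumes klist prints dates in the two-field date and time layout shown above; other klist versions may format them differently.

```shell
#!/bin/sh
# tgt_expires: read klist output on stdin and print the "Expires" date and
# time of the ticket-granting ticket (the krbtgt/... service principal).
tgt_expires() {
    awk '$5 ~ /^krbtgt\// { print $3, $4 }'
}

# Sample line copied from the output above.
sample='01/27/05 16:41:20  01/28/05 02:41:20  krbtgt/LIONS.ODU.EDU@LIONS.ODU.EDU'
printf '%s\n' "$sample" | tgt_expires
```

For the sample line this prints "01/28/05 02:41:20". On a live session the same function could be used as "klist | tgt_expires".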

3.7 If my AFS token expires over time how am I supposed to run long compute jobs?

The simple answer: use Sun Grid Engine.

The long answer: Many of the LIONS2 clients, and especially the sol-login boxes, have access to the Sun Grid Engine software. A special routine has been coded in SGE which will copy your Kerberos tickets over to the machine which will run the job. If given renewable tickets, a coroutine will watch over the running job and automatically renew the tickets and get fresh tokens before they expire. If not given renewable tickets when issuing a 'qsub' or 'qrsh' command, SGE will remind you with the following non-fatal warning message:

get_cred stderr: WARNING: non-renewable tickets - you may want to resubmit with 'kinit -r'

To run your job under SGE, issue the following commands:

	$ afs_login -r 30d
	Password for yourid@LIONS.ODU.EDU: (type your password)
	$ qsub -cwd yourjob.csh
	Your job 285 ("yourjob.csh") has been submitted.
	$

9.9 What if I have questions that are not answered here?

Please do not hesitate to send an e-mail with "LIONS2" in the subject to our LIONS WREQ help system. We want to make sure your questions and concerns are addressed and that the transition from LIONS1 to LIONS2 is as smooth as possible.


#
# $Log: LIONS2_FAQ.html,v $
# Revision 1.14  2005/05/10 13:08:20  tony
# mention 'rangiroa' to copy files
#
# Revision 1.13  2005/04/26 14:49:42  tony
# added chart comparing LSF and SGE commands
#
# Revision 1.12  2005/04/12 19:28:39  tony
# updated site for GDM
#
# Revision 1.11  2005/04/11 17:43:03  tony
# added email/web warnings to copyfile section
#
# Revision 1.10  2005/04/11 14:34:33  tony
# clean up a little and add notes refering to SGE
#
# Revision 1.9  2005/04/07 21:10:46  tony
# rearrange switchover questions and add 'How do I login/ftp/etc into LIONS2'
#
# Revision 1.8  2005/04/06 20:26:07  tony
# added 2.2 and renumbered
#
# Revision 1.7  2005/02/28 21:25:02  tony
# updated copy procedure to add info about different LIONS1 vs LIONS2 names
#
# Revision 1.6  2005/02/10 20:13:14  tony
# added some cleanups and more questions
#
# Revision 1.5  2005/01/27 20:22:40  tony
# change references from help to unixhelp and add question 2.3
# update what's going on with the switchover
#
# Revision 1.4  2004/12/28 10:01:36  tony
# *** empty log message ***
#
# Revision 1.3  2004/12/02 11:37:41  tony
# *** empty log message ***
#
# Revision 1.2  2004/11/08 13:37:40  tony
# *** empty log message ***
#
# Revision 1.1  2004/10/07 15:10:03  tony
# Initial revision
#
#