wiki:HowAthenaWorks

This is a loosely adapted version of an e-mail geofft sent to someone who e-mailed SIPB asking, more or less, how Athena works. Feel free to edit and wikify it. There are some links to other sources at the bottom.


Athena cluster machines boot off of local disk, not NFS. The system is a commodity operating system (currently Ubuntu; previously RHEL 4 and Solaris 10, with a lineage going back to 4.3BSD or so) that's been customized to integrate with the Athena network services, to allow serial reuse of public machines (cleaning them up in some fashion when a user logs out), and, where applicable, to present a consistent UI/UX across multiple base operating systems. I assume the original motivation for local-disk booting was the speed of local disk, plus the nonexistence of serious network-booting mechanisms at the time. Today speed is still a little relevant (within the last several years we stopped using a configuration called "system packs", where large portions of the OS not needed for booting were kept on the network), but more important is that we can use the unmodified Ubuntu etc. boot process and only need to provide the Athena customizations as additional Ubuntu packages. We do use netbooting for installations and certain upgrades, though, so as not to require running around with a CD.

The login infrastructure depends on Kerberos for authentication, Hesiod for account information, and AFS for networked home directories. All of these components are now part of commodity OSes (although Kerberos and Hesiod were originally developed for Athena, and AFS for CMU's similar Andrew Project), so we just need to configure those subsystems: the login process uses Kerberos (e.g., pam_krb5 for Linux-PAM), the standard library uses Hesiod for user accounts and the like (e.g., nss_hesiod for GNU libc), and Hesiod supplies paths in AFS for users' home directories. These days Kerberos is extremely popular as a networked authentication system, and AFS is arguably the most full-featured networked filesystem even if it is less popular than NFS, but the functionality of Hesiod is more commonly provided by LDAP (in fact we are planning to switch Athena workstations to LDAP). If you want to set up something small, there are plenty of howtos on the web about Kerberos with LDAP, and Kerberos with AFS, and you can use the straightforward pam_krb5 and nss_ldap configurations.
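
As a rough sketch of what that looks like on a Linux client (the realm, Hesiod domain, and options shown are illustrative, and the exact files vary by distribution and by pam_krb5 implementation, so don't take this as Athena's literal configuration):

 # /etc/krb5.conf -- which Kerberos realm to authenticate against
 [libdefaults]
     default_realm = ATHENA.MIT.EDU

 # /etc/pam.d/common-auth (Debian/Ubuntu) -- try Kerberos first,
 # fall back to local passwords for local accounts
 auth  sufficient  pam_krb5.so minimum_uid=1000
 auth  required    pam_unix.so try_first_pass

 # /etc/hesiod.conf -- nss_hesiod builds DNS names from these
 lhs=.ns
 rhs=.athena.mit.edu

 # /etc/nsswitch.conf -- consult Hesiod after local files
 passwd: files hesiod
 group:  files hesiod

A Hesiod passwd entry is just a DNS TXT record containing a passwd(5)-style line whose home-directory field points into AFS; e.g. "dig +short TXT jruser.passwd.ns.athena.mit.edu" (jruser being a made-up username) would return something like "jruser:*:12345:101:Jane R User:/afs/athena.mit.edu/user/j/r/jruser:/bin/bash".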

To a large extent, Athena-specific client software that doesn't change much -- the printing utilities, the Moira clients, a couple of other apps like discuss, and so forth -- is simply provided as standard Ubuntu packages and installed everywhere. We do have an auto-upgrade process on each public cluster machine so we can deploy updates within a few hours. Packaging the software this way also has the advantage that users who do a lot of disconnected operation, e.g. laptop users, can install it piecemeal and use it with their local accounts.
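
For example, on a personal Ubuntu machine you can point apt at the Debathena repository and pull in just the client packages; the sources.list line and metapackage name below are from memory and may be out of date (you will also need the repository's signing key), so check the Debathena installation instructions for the current details:

 # /etc/apt/sources.list.d/debathena.list (substitute your Ubuntu release)
 deb http://debathena.mit.edu/apt precise debathena debathena-config debathena-system

 apt-get update
 apt-get install debathena-clients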

However, a lot of software, including courseware and third-party software, is deployed in "lockers", a simple abstraction of symbolic links from /mit/* to longer paths in AFS. The "add" command adds to your PATH a directory in a locker for the appropriate architecture/OS (something like /mit/acro/arch/i386_ubuntu1004/bin for the "acro" locker, for Acrobat Reader), so you can get at software by just typing e.g. "add acro"; you can make software available to other users either from your home directory or by requesting a locker. The mapping of lockers to full paths is maintained as another table in Hesiod and served up by an automounter running at /mit/. For a small setup, you can essentially just compile and install software with e.g. --prefix=/afs/mycell.example.com/software; if you have multiple architectures you will also need --exec-prefix and some way to disambiguate between architectures, as the "add" command does.
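
To make that concrete, here is roughly what using a locker looks like from a shell, plus the small-setup alternative; the Hesiod record shown is an illustrative sketch of the "filsys" format, not a verbatim current entry:

 # attach the locker and put its binaries for this platform on your PATH
 add acro
 which acroread          # now resolves somewhere under /mit/acro/arch/.../bin

 # the /mit/acro mapping is a Hesiod "filsys" entry, i.e. another DNS
 # TXT record, naming a path in AFS and the mount point:
 dig +short TXT acro.filsys.ns.athena.mit.edu
 # => "AFS /afs/athena.mit.edu/software/acro w /mit/acro"

 # small setup without Hesiod or an automounter: just install into AFS
 ./configure --prefix=/afs/mycell.example.com/software/acro \
             --exec-prefix=/afs/mycell.example.com/software/acro/arch/x86_64_linux
 make && make install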

As far as printing is concerned, the current setup is a single CUPS server, printers.mit.edu (well, actually several behind a hardware load balancer), and clients configured to "BrowsePoll" against that server. We have custom lpr/lpq/etc. commands that themselves end up calling the CUPS versions of those commands, partly to translate syntax from our previous printing system (LPRng) and partly to run lpq, lprm, etc. against the server behind the load balancer that actually owns the print job. For a smaller setup you can just run a single CUPS server, set /etc/cups/client.conf to point all clients to that server (which will break non-networked printers as a side effect), and use CUPS' own lpr/lpq/etc. -- this is what CSAIL does, for instance, since they only have a couple dozen printers in one building.
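
For the small-setup version, the client side is a one-line change (the server name here is made up):

 # /etc/cups/client.conf on each client -- send everything to the
 # central CUPS server instead of a local cupsd
 ServerName cups.example.com

 # or, to keep a local cupsd that learns the central server's queues
 # (CUPS 1.x-style browsing, which is what BrowsePoll refers to),
 # in /etc/cups/cupsd.conf:
 BrowsePoll cups.example.com

 # after which the stock CUPS commands behave as expected:
 lpr -P somequeue paper.pdf
 lpq -P somequeue
 lprm -P somequeue 1234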

If you're interested in historical context behind the design, I'm a fan of the 20-year-old paper "Berkeley UNIX on 1000 Workstations: Athena Changes to 4.3BSD"; although operating systems have become much more modular and lots of Athena software, such as Kerberos, has been adopted in the world at large, a surprising amount of the paper is still relevant.

 http://geofft.mit.edu/p/berkeley-unix-on-1000-workstations.pdf

Other resources you might find helpful are a list by Greg Hudson, the previous Athena release engineer, of what services Athena offered (as of 2007, when IS&T was starting to design what would eventually merge into Debathena):

 http://diswww.mit.edu/menelaus/release-team/5846

and a somewhat older writeup from him on "How Athena Works":

 http://web.mit.edu/ghudson/info/athena