source: trunk/doc/maintenance @ 22219

Revision 22219, 23.6 KB checked in by ghudson, 19 years ago
Add notes about mount entries for /etc/mnttab on Solaris build machines.

This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

  Mailing lists
  Permissions
  Build machines
  The wash process
  Imake templates
  Release notes
  Release cycles
  Patch releases
  Third-party pull-ups for patch releases
  Rel-eng machines
  Cluster information

Mailing lists

Here are descriptions of the mailing lists related to the source tree:

  * source-developers

    For discussion of the policy and day-to-day maintenance of the
    repository.  This is a public list, and there is a public discuss
    archive on menelaus.

  * source-reviewers

    For review of changes to be checked into the repository.  To be a
    member of this mailing list, you must have read access to the
    non-public parts of the source tree, but you do not need to be a
    staff member.  There is a non-public discuss archive on menelaus.

  * source-commits

    This mailing list receives commit logs for all commits to the
    repository.  This is a public mailing list.  There is a public
    discuss archive on menelaus.

  * source-diffs

    This mailing list receives commit logs with diffs for all commits
    to the repository.  To be on this mailing list, you must have read
    access to the non-public parts of the source tree.  There is no
    discuss archive for this list.

  * source-wash

    This mailing list receives mail when the wash process blows out.
    This is a public mailing list.  There is no discuss archive for
    this list.

  * rel-eng

    The release engineering mailing list.  Mail about patch releases
    and other release details goes here.  There is a public archive on
    menelaus.

  * release-team

    The mailing list for the release team, which sets policy for
    releases.  There is a public archive on menelaus, with the name
    "release-77".

Permissions

Following are descriptions of the various groups found on the ACLs of
the source tree:

  * read:source
    read:staff

    These two groups have identical permissions in the repository, but
    read:source contains artificial constructs (the builder user and
    service principals) while read:staff contains people.  In the
    future, highly restricted source could have access for read:source
    and not read:staff.

    Both of these groups have read access to non-public areas of the
    source tree.

  * write:staff

    Contains developers with commit access to the source tree.  This
    group has write access to the repository, but not to the
    checked-out copy of the mainline (/mit/source).

  * write:update

    Contains the service principal responsible for updating
    /mit/source.  This group has write access to /mit/source but not
    to the repository.

  * adm:source

    This group has administrative access to the repository and to
    /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/ in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".

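As an illustration of how the groups above map onto AFS rights, here is a dry-run sketch that only prints an "fs setacl" command for one directory.  It is not the CVSROOT permissions script mentioned above.  The function name source_acl_cmd and its repository|wd and public|private arguments are hypothetical, and the rights given to each group are an assumption based on the descriptions in this section, written with the standard AFS shorthands (read = rl, write = rlidwk, all = rlidwka, l = lookup only).

```sh
#!/bin/sh
# Hypothetical dry-run helper (NOT the CVSROOT permissions script):
# print, but do not run, an "fs setacl" command giving DIR the ACL that
# the "Permissions" section describes.  The rights chosen per group are
# an assumption; read/write/all/l are standard AFS rights shorthands.
#
# Usage: source_acl_cmd repository|wd public|private DIR
source_acl_cmd() {
  kind=$1
  visibility=$2
  dir=$3

  # Groups with the same access in the repository and in /mit/source.
  acl="read:source read read:staff read adm:source all"

  case $kind in
    repository) acl="$acl write:staff write" ;;   # committers
    wd)         acl="$acl write:update write" ;;  # update principal only
    *) echo "source_acl_cmd: unknown kind: $kind" >&2; return 1 ;;
  esac

  case $visibility in
    public)  acl="$acl system:anyuser read" ;;
    private) acl="$acl system:anyuser l" ;;       # list (lookup) only
    *) echo "source_acl_cmd: unknown visibility: $visibility" >&2; return 1 ;;
  esac

  echo "fs setacl -dir $dir -acl $acl"
}
```

For example, "source_acl_cmd wd private DIR" prints the command for a non-public directory DIR in the working tree; the output is meant to be reviewed before anything is run by hand.
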
Build machines

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we mount AFS inside the chrooted environments and make a symlink from
/afs to the place AFS is mounted.  Each build machine has two such
environments, one in /rel (for the release build) and one in /rel/wash
(for the wash).  The second environment has to be located within the
first, of course, so that AFS can be visible from both.

To set up a build machine, follow these instructions after installing:

  * Set the root password.
  * Put "builder rl" in /etc/athena/access.
  * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
    PUBLIC and AUTOUPDATE to false.
  * On Solaris, add a line "/afs - /rel/afs lofs - yes -" to
    /etc/vfstab, and similarly for /rel/wash/afs.  Mount /rel/afs and
    /rel/wash/afs.
  * On Solaris, add "/etc/mnttab - /rel/etc/mnttab lofs - yes -"
    to /etc/vfstab, and similarly for /rel/wash/etc/mnttab.  Mount
    /rel/etc/mnttab and /rel/wash/etc/mnttab.
  * On Linux, add a line "/afs /rel/afs none bind" to /etc/fstab, and
    similarly for /rel/wash/afs.
  * Run "/mit/source/packs/build/ /rel X.Y", where X.Y is
    the full release this build is for.
  * Run "/mit/source/packs/build/ /rel/wash".
  * Make a symlink from /rel/.srvd to the AFS srvd volume, if you're
    at that stage.
  * On Solaris, ensure that procfs is mounted on /rel/proc and
    /rel/wash/proc.  (A host of system tools fail if procfs is not
    mounted in the chroot environment.)  Add lines to /etc/vfstab to
    make this happen at boot.
  * On Solaris, install the Sun compiler locally.  Run:
      cd /afs/
      pkgadd -R /rel -a /usr/athena/lib/update/noask \
        `cat ../installed-packages`
    and follow the directions in
    /afs/  Repeat for /rel/wash.

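Because each mount-table entry above has to be added once for /rel and once for /rel/wash, it can help to generate the lines rather than type them.  The sketch below is hypothetical (the function name chroot_mount_lines is made up).  The AFS, mnttab, and bind-mount lines are the ones given above; the Solaris procfs line is an assumed vfstab entry modeled on the stock "/proc - /proc proc - no -" line, with mount-at-boot set to yes.  The function only prints text, so the output can be checked before it is appended to /etc/vfstab or /etc/fstab.

```sh
#!/bin/sh
# Hypothetical helper: print the mount-table lines one chroot ROOT
# (/rel or /rel/wash) needs.  Nothing is written to /etc; review the
# output, then append it to /etc/vfstab (Solaris) or /etc/fstab (Linux).
#
# Usage: chroot_mount_lines solaris|linux ROOT
chroot_mount_lines() {
  platform=$1
  root=$2
  case $platform in
    solaris)
      # vfstab fields: device, fsck device, mount point, type,
      # fsck pass, mount at boot, options.
      echo "/afs - $root/afs lofs - yes -"
      echo "/etc/mnttab - $root/etc/mnttab lofs - yes -"
      # Assumed procfs entry, modeled on the stock /proc line.
      echo "/proc - $root/proc proc - yes -"
      ;;
    linux)
      echo "/afs $root/afs none bind"
      ;;
    *)
      echo "chroot_mount_lines: unknown platform: $platform" >&2
      return 1
      ;;
  esac
}
```

For example, "for root in /rel /rel/wash; do chroot_mount_lines solaris "$root"; done" prints six vfstab lines, three per chroot.
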
Right now we have an issue doing a complete build of the source tree
from scratch, because programs which use gdk-pixbuf-csource at build
time (like gnome-panel) require /etc/athena/gtk-2.0/gdk-pixbuf.loaders
to be set up.  Since we lack machinery to deal with that kind of
problem, the workaround is to run the build at least as far as
third/gtk2 and then run, from within the chrooted environment:

  mkdir -p /etc/athena/gtk-2.0
  gdk-pixbuf-query-loaders > /etc/athena/gtk-2.0/gdk-pixbuf.loaders
  gtk-query-immodules-2.0 > /etc/athena/gtk-2.0/gtk.immodules

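Because this workaround is manual, a from-scratch build can fail later in a confusing way if the step is skipped.  A hypothetical pre-flight check (the function name gtk_loaders_ready is an assumption, not part of the build system) can be run from outside the chroot to confirm that both generated files exist and are non-empty under a given chroot root before the build is resumed.

```sh
#!/bin/sh
# Hypothetical pre-flight check: succeed only if both generated GTK
# files exist and are non-empty under chroot ROOT (e.g. /rel).
#
# Usage: gtk_loaders_ready ROOT
gtk_loaders_ready() {
  dir=$1/etc/athena/gtk-2.0
  for f in gdk-pixbuf.loaders gtk.immodules; do
    # -s is true only for a file that exists and has a size > 0.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}
```

A non-zero exit means the three commands above still need to be run inside that chroot.
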
The wash process

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

  * Each night at midnight, a machine performs a cvs update of the
    checked-out tree in /afs/  If the
    cvs update fails, the update script sends mail to
    This machine is on read:source and
    write:update.

  * Each night at 4:30am, a machine of each architecture performs a
    build of the tree in /rel/wash/build, using the /rel/wash chroot
    environment.  If the build fails, the wash script copies the log
    of the failed build into AFS and sends mail to
    with the last few lines of the log.

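The failure report in the second step amounts to mailing the tail of the build log.  The following is a minimal sketch of that idea, not the actual wash script: the function name wash_failure_body, the 20-line default, and the wording of the summary line are assumptions.  It prints a message body to standard output so that it can be piped to whatever mailer the wash uses.

```sh
#!/bin/sh
# Hypothetical sketch (not the real wash script): print a mail body
# summarizing a failed build.  LINES defaults to 20, an arbitrary choice.
#
# Usage: wash_failure_body LOGFILE [LINES]
wash_failure_body() {
  log=$1
  lines=${2:-20}
  # uname -n is the POSIX way to get the node (host) name.
  echo "Wash build on $(uname -n) failed; last $lines lines of $log:"
  echo
  tail -n "$lines" "$log"
}
```
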
Source for the wash scripts lives in /afs/
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.

To set up the source update on a machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in write:update.

To set up the wash on a build machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in read:source.

  * Ensure that
    /afs/ exists
    and that rcmd.machinename has write access to it.

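The group-membership checks in both lists can be scripted.  A hypothetical helper, sketched below, reads the output of "pts membership" on standard input and succeeds if a given group is listed.  The function name pts_has_group is made up, and it assumes only that each group name appears on a line of its own (possibly indented), so any header line is ignored.

```sh
#!/bin/sh
# Hypothetical helper: succeed if GROUP appears as a line of its own in
# "pts membership" output read from standard input.
#
# Usage: pts membership rcmd.MACHINE -cell CELL | pts_has_group GROUP
pts_has_group() {
  # Assigning $1 = $1 makes awk rebuild the record, trimming leading and
  # trailing blanks, so an indented group name compares equal to GROUP.
  awk -v g="$1" '{ $1 = $1 } $0 == g { found = 1 } END { exit !found }'
}
```

For example, piping "pts membership rcmd.MACHINE -cell CELL" into "pts_has_group write:update" checks the last step of the first list; MACHINE and CELL are placeholders for the real machine name and dev cell.
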
Imake templates

We don't like imake, but we have two sets of imake templates:

  * packs/build/config

    These templates are the legacy Athena build system.  They are no
    longer used by any software in the release; we install them in
    case someone wants to build some very old software.

  * packs/build/xconfig

    These templates are used for building software which uses X-style
    Imakefiles.  They may need periodic updating as new versions of X
    are released.  These templates are full of hacks, mostly
    because the imake model isn't really adequate for dealing with
    third-party software and local site customizations.

Release notes

There are two kinds of release notes, the system release notes and the
user release notes.  The system release notes are more comprehensive,
assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions towards the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

  * Crash and burn

    This phase is for rel-eng internal testing.  The release engineer
    needs to make sure that the current source base works well enough
    for testers to use it and find bugs.  For crash and burn to begin,
    the operating system support person for each platform must provide
    a way to install or update a machine to the new version of the
    operating system for that platform.

    Each platform needs a build tree and system packs volume.  The
    build tree should be mounted in
    /afs/<version>/build/<sysname>.  The
    system packs volume should be mounted in
    /afs/<sysname>/srvd-<version>.

    Each platform needs a new-release build machine to generate system
    packs to test.  Set it up according to the directions in "Build
    machines" above.

    To do a full build for release testing:

    # Get tickets as builder and ssh to the wash machine
    rm -rf /rel/.srvd/* /rel/.srvd/.??*
    rm -rf /rel/build/* /rel/build/.??*
    chroot /rel sh /mit/source-X.Y/packs/build/ -l &

    (It can be useful to run the ssh to the build machine inside a
    screen session, so that you can detach from it instead of keeping
    your terminal open until the build is finished.)

    The crash and burn machines should be identified and used to test
    the update (and install, if possible).  System packs may be
    regenerated at will.  The system packs volume does not need any
    replication.

    Before the transition from crash and burn to alpha, the release
    engineer should do a sanity check on the new packs by comparing a
    file listing of the new packs to a file listing of the previous
    release's packs.  The release engineer should also check the list
    of configuration files for each platform (in
    packs/update/platform/*/configfiles) and make sure that any
    configuration files which have changed are listed as changed in
    the version script.  Finally, the release should be checked to
    make sure it won't overflow partitions on any client machines.

    A note on the wash: it is not especially important that the wash
    be running during the release cycle, but currently the wash can
    run on the new release build machine without interfering with the
    build functions of the machine.  So after updating the wash
    machine to the new OS for new release builds, the release engineer
    can set up the wash right away.

  * Alpha

    The alpha phase is for internal testing by the release team.
    System packs may still be regenerated at will, but the system
    packs volume (and os volume) should be read-only so it can be
    updated by a vos release.  Changes to the packs do not need to be
    propagated in patch releases; testers are expected to be able to
    ensure consistency by forcing repeat updates or reinstalling their
    machines.

    System release notes should be prepared during this phase.

    Before the transition from alpha to beta, doc/third-party should
    be checked to see if miscellaneous third-party files (the ones not
    under the "third" hierarchy) should be updated.

  * Beta

    The beta phase involves outside testers.  System packs and os
    volumes should be replicated on multiple servers, and permissions
    should be set to avoid accidental changes (traditionally this
    means giving write access to system:packs, a normally empty
    group).  Changes to the packs must be propagated by patch
    releases.

    User release notes should be prepared during this phase.  Ideally,
    no new features should be committed to the source tree during the
    beta phase.

    For the transition from beta to early:

    - Prepare a release branch with a name of the form athena-8_1.
      Tag it with athena-8_1-early.

    - Create a volume with a mountpoint of the form
      /afs/ and check out a tree on the
      branch there.  Set the permissions by doing an fs copyacl from
      an older source tree before the checkout, and run
      CVSROOT/ after the checkout.  Copy over the
      .rconf file from the src-current directory.  Have a filsys entry
      of the form source-8.1 created for the new tree.

    - Attach and lock the branch source tree on each build machine.

    - Do a final full build of the release from the branch source
      tree.

  * Early

    The early release involves more outside testers and some cluster
    machines.  The release should be considered ready for public
    consumption.

    The release branch should be tagged with a name of the form
    athena-8_1-early.

  * Release

    The release branch should be tagged with a name of the form
    athena-8_1-release.

    Once the release has gone public, the current-release machines
    should be updated to the release and set up as the build machines
    for the now-current release.  Remove the /build and /.srvd
    symlinks on the new-release build machines, and make sure the wash
    is running on them if you didn't do so back in the crash and burn
    phase.

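The crash and burn sanity check that compares file listings of the new and previous packs can be done with standard tools.  The sketch below is one way to do it, with hypothetical function names (packs_listing, packs_diff).  It lists every path under two packs trees in a fixed (C-locale) order and prints only the differences, prefixed with "-" for paths found only in the old packs and "+" for paths found only in the new packs.  It does not cover the configuration-file or partition-size checks.

```sh
#!/bin/sh
# Hypothetical sketch of the crash-and-burn packs comparison.
# Print every path under DIR, relative to DIR, in C-locale order.
packs_listing() {
  (cd "$1" && find . -print | LC_ALL=C sort)
}

# Usage: packs_diff OLD_PACKS NEW_PACKS
packs_diff() {
  old_list=$(mktemp) || return 1
  new_list=$(mktemp) || { rm -f "$old_list"; return 1; }
  packs_listing "$1" > "$old_list"
  packs_listing "$2" > "$new_list"
  # comm needs both inputs sorted in the same collation order.
  LC_ALL=C comm -23 "$old_list" "$new_list" | sed 's/^/- /'  # only in old
  LC_ALL=C comm -13 "$old_list" "$new_list" | sed 's/^/+ /'  # only in new
  rm -f "$old_list" "$new_list"
}
```

Running it against the previous release's packs and the new srvd volume, and reading the output by eye, covers the file-listing part of the check.
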
One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

  consult
  games
  gnu
  graphics
  outland
  sipb
  tcl
  watchmaker
  windowmanagers
  /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps its own list.

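The exact layout of the compatibility symlinks is not spelled out here.  Assuming the common AFS convention that a locker's arch directory holds one subdirectory per sysname, a compatibility link is a symlink from the new sysname to the old one, so that clients on the new OS find the old binaries.  The sketch below makes that assumption explicit; the function name add_compat_link and its arguments are hypothetical, and it refuses to replace anything that already exists.

```sh
#!/bin/sh
# Hypothetical sketch: in ARCHDIR, make NEWSYS a symlink to OLDSYS.
# ASSUMES one subdirectory per sysname under ARCHDIR.
# Leaves an existing NEWSYS alone; fails if OLDSYS is missing.
#
# Usage: add_compat_link ARCHDIR NEWSYS OLDSYS
add_compat_link() {
  archdir=$1
  newsys=$2
  oldsys=$3
  if [ -e "$archdir/$newsys" ] || [ -L "$archdir/$newsys" ]; then
    echo "$archdir/$newsys already exists; leaving it alone"
    return 0
  fi
  if [ ! -d "$archdir/$oldsys" ]; then
    echo "$archdir/$oldsys does not exist" >&2
    return 1
  fi
  # A relative target keeps the link valid wherever the locker is mounted.
  ln -s "$oldsys" "$archdir/$newsys"
}
```

A loop over the lockers listed above would call it once per locker arch directory, with the real old and new sysnames substituted.
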
Patch releases

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps to performing a patch
release are:

  * Check in the changes on the mainline (if they apply) and on the
    release branch and update the relevant sections of the source tree
    in /mit/source-<version>.

  * If the update needs to do anything other than track against the
    system packs, you must prepare a version script which deals with
    any transition issues, specifies whether to track the OS volume,
    specifies whether to deal with a kernel update, and specifies
    which if any configuration files need to be updated.  See the
    update script (packs/update/ for details.  See
    packs/build/update/os/*/configfiles for a list of configuration
    files for a given platform.  The version script should be checked
    in on the mainline and on the release branch.

  * Do the remainder of the steps as "builder" on the build machine.
    Probably the best way is to get Kerberos tickets as "builder" and
    ssh to the build machine.

  * Make sure to add symlinks under the /build tree for any files you
    have added.  Note that you probably added a version script if the
    update needs to do anything other than track against the system
    packs.

  * In the build tree, bump the version number in packs/build/version
    (the symlink should be broken for this file to avoid having to
    change it in the source tree).

  * If you are going to need to update binaries that users run from
    the packs, go into the packs and move (don't copy) them into a
    .deleted directory at the root of the packs.  This is especially
    important for binaries like emacs and dash which people run for
    long periods of time, to avoid making the running processes dump
    core when the packs are released.

  * Update the read-write volume of the packs to reflect the changes
    you've made.  You can use the script to build and install
    specific packages, or you can use the script to build the
    package and then install specific files (cutting and pasting from
    the output of "gmake -n install DESTDIR=/srvd" is the safest way);
    updating as few files as possible is preferable.  Remember to
    install the version script.

  * Use the script to build and install packs/build/finish.
    This will fix ownerships and update the track lists and the like.

  * It's a good idea to test the update from the read-write packs by
    symlinking the read-write packs to /srvd on a test machine and
    taking the update.  Note that when the machine comes back up with
    the new version, it will probably re-attach the read-write packs,
    so you may have to re-make the symlink if you want to test stuff
    that's on the packs.

  * At some non-offensive time, release the packs in the dev cell.

  * Send mail to rel-eng saying that the patch release went out, and
    what was in it.  (You can find many example pieces of mail in the
    discuss archive.)  Include instructions explaining how to
    propagate the release to the athena cell.

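Moving a file within the packs volume keeps its existing contents intact under a new name, so processes still running the old binary are unaffected when a new copy is installed at the original path; copying it aside and then overwriting the original would change the file underneath them.  A hypothetical helper for that step is sketched below (the name move_to_deleted is made up, and RELPATH is a path relative to the packs root).  It keeps the file's relative path under .deleted and refuses to overwrite an entry left there by an earlier patch release.

```sh
#!/bin/sh
# Hypothetical sketch: move PACKS_ROOT/RELPATH to
# PACKS_ROOT/.deleted/RELPATH, creating parent directories as needed.
# A rename (not a copy) keeps the old contents for running processes.
#
# Usage: move_to_deleted PACKS_ROOT RELPATH
move_to_deleted() {
  root=$1
  rel=$2
  src=$root/$rel
  dst=$root/.deleted/$rel
  if [ ! -e "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  # Do not clobber a file moved aside by an earlier patch release.
  if [ -e "$dst" ]; then
    echo "already present: $dst" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")" && mv "$src" "$dst"
}
```
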
Third-party pull-ups for patch releases

In CVS, unmodified imported files have the default branch set to
1.1.1.  When a new version is imported, such files need no merging;
the new version on the vendor branch automatically becomes the current
version of the file.  This optimization reduces storage requirements
and makes the merge step of an import faster and less error-prone, at
the cost of rendering a third-party module inconsistent between an
import and a merge.

Due to an apparent bug in CVS (as of version 1.11.2), a commit to a
branch may reset the default branch of an unmodified imported file as
if the commit were to the trunk.  The practical effect for us is that
pulling up versions of third-party packages to a release branch
results in many files being erroneously shifted from the unmodified
category to the modified category.

To account for this problem as well as other corner cases, use the
following procedure to pull up third-party packages for a patch
release:

  cvs co -r athena-X_Y third/module
  cd third/module
  cvs update -d
  cvs update -j athena-X_Y -j HEAD
  cvs ci
  cd /afs/
  find . -name "*,v" -print0 | xargs -0 sh /tmp/

Where /tmp/ is:

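  #!/bin/sh
  # For each RCS file named on the command line, make the vendor branch
  # the default branch again if the file looks like an unmodified import
  # whose default branch was reset by the CVS bug described above.
  for f; do
    hdr=`rlog -h "$f"`
    # Unmodified import: head is still 1.1, no default branch is set,
    # and the "vendor" tag names branch 1.1.1.
    if printf '%s\n' "$hdr" | grep -q '^head: 1\.1$' && \
       printf '%s\n' "$hdr" | grep -q '^branch:$' && \
       printf '%s\n' "$hdr" | grep -q 'vendor: 1\.1\.1$'; then
      rcs -bvendor "$f"
    fi
  done

The find -print0 and xargs -0 flags are not available on the native
Solaris versions of find and xargs, so the final step may be best
performed under Linux.
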
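Before running the script over a whole repository, the three header tests can be checked against saved "rlog -h" output without modifying any RCS files.  The function below is a hypothetical refactoring of those tests (the name is_unmodified_import is made up): it reads an rlog header on standard input and succeeds when the script would switch the file back to the vendor branch.

```sh
#!/bin/sh
# Hypothetical refactoring of the tests in the script above: read
# "rlog -h" output on stdin and succeed if the file looks like an
# unmodified import whose default branch was reset (head 1.1, no
# default branch, "vendor" tag on branch 1.1.1).
#
# Usage: rlog -h FILE,v | is_unmodified_import && echo "would fix"
is_unmodified_import() {
  hdr=$(cat)
  printf '%s\n' "$hdr" | grep -q '^head: 1\.1$' &&
    printf '%s\n' "$hdr" | grep -q '^branch:$' &&
    printf '%s\n' "$hdr" | grep -q 'vendor: 1\.1\.1$'
}
```

Separately, where the local find supports the POSIX "-exec utility {} +" form, "find . -name '*,v' -exec sh SCRIPT {} +" (SCRIPT standing for the script above) batches file names much as xargs does without needing -print0 or -0; whether a particular Solaris find supports that form should be checked first.
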
Rel-eng machines

The machine running the wash update is

There are three rel-eng machines for each platform:

  * A current release build machine, for doing incremental updates to
    the last public release.  This machine may also be used by
    developers for building software.

  * A new release build machine, for building and doing incremental
    updates to releases which are still in testing.  This machine also
    performs the wash.  This machine may also be used by developers
    for building software, or if they want a snapshot of the new
    system packs to build things against.

  * A crash and burn machine, usually located in the release
    engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                       Sun       Linux

Current release build  maytag    kenmore
New release build      downy     snuggle
Crash and burn         pyramids  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

  * Washing machines: kenmore, whirlpool, ge, maytag
  * Laundry detergents: fab, calgon, era, cheer, woolite,
    tide, ultra-tide, purex
  * Bleaches: clorox, ajax
  * Fabric softeners: downy, final-touch, snuggle, bounce
  * Heavy machinery: steam-shovel, pile-driver, dump-truck,
    wrecking-ball, crane
  * Construction kits: lego, capsela, technics, k-nex, playdoh,
    construx
  * Construction materials: rebar, two-by-four, plywood,
    sheetrock
  * Heavy machinery companies: caterpillar, daewoo, john-deere,
    sumitomo
  * Buildings: empire-state, prudential, chrysler

Cluster information

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

       Label: syslib     Data: dev-PLATFORMsys-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

       Label: syslib     Data: athena-PLATFORMsys-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.

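Since the record contents follow mechanically from the phase, platform, and release number, they can be generated rather than typed.  The hypothetical helper below (syslib_record is a made-up name) prints the Data field of the syslib record that this section says to add to PHASE-PLATFORM.  It assumes XY is simply X.Y with the dot removed, and it does not talk to any cluster database; removing the 't' from earlier records remains a separate step.

```sh
#!/bin/sh
# Hypothetical helper: print the Data field of the syslib record to add
# to cluster PHASE-PLATFORM for release X.Y.
# ASSUMES XY is X.Y with the dot removed (e.g. 9.4 -> 94).
#
# Usage: syslib_record PHASE PLATFORM X.Y
syslib_record() {
  phase=$1
  platform=$2
  version=$3
  xy=$(printf '%s\n' "$version" | tr -d '.')
  case $phase in
    crash|alpha|beta)
      # Testing phases: dev cell packs; the trailing 't' marks a testing
      # release (console message only, no automatic update).
      echo "dev-${platform}sys-$xy $version t" ;;
    early|public)
      echo "athena-${platform}sys-$xy $version" ;;
    *)
      echo "syslib_record: unknown phase: $phase" >&2
      return 1 ;;
  esac
}
```

For example, "syslib_record beta linux 9.4" prints "dev-linuxsys-94 9.4 t"; the platform name and version number there are only illustrative.
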