source: trunk/doc/maintenance @ 22210

Revision 22210, 23.4 KB checked in by ghudson, 19 years ago
Update to use lofs and bind mounts for AFS instead of symlinks. (We currently do this on Solaris; on Linux, this is largely untested, and the fstab syntax may be slightly wrong. But I think it's right. I've definitely tested bind mounts of /afs and had them work.)

This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

  Mailing lists
  Permissions
  Build machines
  The wash process
  Imake templates
  Release notes
  Release cycles
  Patch releases
  Third-party pull-ups for patch releases
  Rel-eng machines
  Cluster information

Mailing lists

Here are descriptions of the mailing lists related to the source tree:

  * source-developers

    For discussion of the policy and day-to-day maintenance of the
    repository.  This is a public list, and there is a public discuss
    archive on menelaus.

  * source-reviewers

    For review of changes to be checked into the repository.  To be a
    member of this mailing list, you must have read access to the
    non-public parts of the source tree, but you do not need to be a
    staff member.  There is a non-public discuss archive on menelaus.

  * source-commits

    This mailing list receives commit logs for all commits to the
    repository.  This is a public mailing list.  There is a public
    discuss archive on menelaus.

  * source-diffs

    This mailing list receives commit logs with diffs for all commits
    to the repository.  To be on this mailing list, you must have read
    access to the non-public parts of the source tree.  There is no
    discuss archive for this list.

  * source-wash

    This mailing list receives mail when the wash process blows out.
    This is a public mailing list.  There is no discuss archive for
    this list.

  * rel-eng

    The release engineering mailing list.  Mail goes here about patch
    releases and other release details.  There is a public archive on
    menelaus.

  * release-team

    The mailing list for the release team, which sets policy for
    releases.  There is a public archive on menelaus, with the name
    "release-77".

Permissions

Following are descriptions of the various groups found on the ACLs of
the source tree:

  * read:source
    read:staff

    These two groups have identical permissions in the repository, but
    read:source contains artificial constructs (the builder user and
    service principals) while read:staff contains people.  In the
    future, highly restricted source could have access for read:source
    and not read:staff.

    Both of these groups have read access to non-public areas of the
    source tree.

  * write:staff

    Contains developers with commit access to the source tree.  This
    group has write access to the repository, but not to the
    checked-out copy of the mainline (/mit/source).

  * write:update

    Contains the service principal responsible for updating
    /mit/source.  This group has write access to /mit/source but not
    to the repository.

  * adm:source

    This group has administrative access to the repository and to
    /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/ in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".
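
The group descriptions above amount to a small lookup table.  The
following is a hypothetical shell helper (it is not the CVSROOT
permissions script mentioned above) that prints the rights each
group's own ACL entry should carry.  It assumes the fs setacl
shorthands "read" (rl), "write" (rlidwk), and "all" (rlidwka), and it
reads "administrative access" as "all"; both are our interpretation.

```sh
#!/bin/sh
# Hypothetical helper: print the AFS rights a group's own ACL entry is
# expected to carry, per the "Permissions" descriptions.
# Assumption: "administrative access" is read as the "all" shorthand.
# Usage: expected_rights GROUP TREE AREA
#   TREE is "repository" or "wd" (the /mit/source working directory);
#   AREA is "public" or "private".
expected_rights() {
    group=$1 tree=$2 area=$3
    case $group in
        read:source|read:staff)
            echo read ;;
        write:staff)
            # Commit access: writes to the repository only.
            if [ "$tree" = repository ]; then echo write; else echo none; fi ;;
        write:update)
            # The update principal: writes to /mit/source only.
            if [ "$tree" = wd ]; then echo write; else echo none; fi ;;
        adm:source)
            echo all ;;
        system:anyuser)
            # Read on public areas, lookup ("l") on everything else.
            if [ "$area" = public ]; then echo read; else echo l; fi ;;
        *)
            # Anything else (including system:authuser) gets nothing by default.
            echo none ;;
    esac
}
```

Comparing this output by hand against "fs listacl" on a directory is
one way to spot an ACL that has drifted.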

Build machines

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we make /afs visible inside each chrooted environment with a loopback
mount (lofs on Solaris, a bind mount on Linux).  Each build machine
has two such environments, one in /rel (for the release build) and one
in /rel/wash (for the wash).  The second environment has to be located
within the first, of course, so that AFS can be visible from both.

To set up a build machine, follow these instructions after installing:

  * Set the root password.
  * Put "builder rl" in /etc/athena/access.
  * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
    PUBLIC and AUTOUPDATE to false.
  * On Solaris, add a line "/afs - /rel/afs lofs - yes -" to
    /etc/vfstab, and similarly for /rel/wash/afs.  Mount /rel/afs and
    /rel/wash/afs.
  * On Linux, add a line "/afs /rel/afs none bind" to /etc/fstab, and
    similarly for /rel/wash/afs.
  * Run "/mit/source/packs/build/ /rel X.Y", where X.Y is
    the full release this build is for.
  * Run "/mit/source/packs/build/ /rel/wash".
  * Make a symlink from /rel/.srvd to the AFS srvd volume, if you're
    at that stage.
  * On Solaris, ensure that procfs is mounted on /rel/proc and
    /rel/wash/proc.  (A host of system tools fail if procfs is not
    mounted in the chroot environment.)  Add lines to /etc/vfstab to
    make this happen at boot.
  * On Solaris, install the Sun compiler locally.  Run:
      cd /afs/
      pkgadd -R /rel -a /usr/athena/lib/update/noask \
        `cat ../installed-packages`
    and follow the directions in
    /afs/  Repeat for /rel/wash.
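
The mount-table lines from the steps above can be generated for both
chroots at once.  This is a minimal sketch, assuming the entry formats
quoted above; the commit note for this revision warns that the Linux
fstab line is largely untested (fstab(5) treats omitted dump and pass
fields as 0).  The procfs vfstab line and the helper name
chroot_mount_entries are assumptions, not part of the documented
procedure.

```sh
#!/bin/sh
# Print the mount-table lines that make /afs (and, on Solaris, procfs)
# visible inside each build chroot.  Review the output before appending
# it to /etc/vfstab or /etc/fstab.
# Usage: chroot_mount_entries solaris|linux CHROOT...
chroot_mount_entries() {
    os=$1
    shift
    for root in "$@"; do
        case $os in
            solaris)
                # vfstab fields: device, fsck device, mount point, type,
                # fsck pass, mount at boot, options
                echo "/afs - $root/afs lofs - yes -"
                # Assumed procfs entry for the chroot (see the procfs step).
                echo "/proc - $root/proc proc - yes -"
                ;;
            linux)
                # fstab fields: source, mount point, type, options
                echo "/afs $root/afs none bind"
                ;;
            *)
                echo "unknown OS: $os" >&2
                return 1
                ;;
        esac
    done
}
```

For example, "chroot_mount_entries solaris /rel /rel/wash" prints four
lines, an AFS and a procfs entry for each environment.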

Right now we have an issue doing a complete build of the source tree
from scratch, because programs which use gdk-pixbuf-csource at build
time (like gnome-panel) require /etc/athena/gtk-2.0/gdk-pixbuf.loaders
to be set up.  Since we lack machinery to deal with that kind of
problem, the workaround is to run the build at least as far as
third/gtk2 and then run, from within the chrooted environment:

  mkdir -p /etc/athena/gtk-2.0
  gdk-pixbuf-query-loaders > /etc/athena/gtk-2.0/gdk-pixbuf.loaders
  gtk-query-immodules-2.0 > /etc/athena/gtk-2.0/gtk.immodules

The wash process

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

  * Each night at midnight, a machine performs a cvs update of the
    checked-out tree in /afs/  If the
    cvs update fails, the update script sends mail to
    This machine is on read:source and
    write:update.

  * Each night at 4:30am, a machine of each architecture performs a
    build of the tree in /rel/wash/build, using the /rel/wash chroot
    environment.  If the build fails, the wash script copies the log
    of the failed build into AFS and sends mail to
    with the last few lines of the log.
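
Both wash steps follow the same pattern: run a command with its output
captured in a log and, on failure, report the tail of that log.  Here
is a minimal sketch of that pattern; run_step is a hypothetical name,
and the real wash scripts in AFS may well differ.

```sh
#!/bin/sh
# Run a step with stdout and stderr sent to LOGFILE.  On failure, print
# a short report ending with the last NLINES lines of the log and
# return the step's exit status; on success, print nothing.
# Usage: run_step LOGFILE NLINES COMMAND [ARG...]
run_step() {
    log=$1 nlines=$2
    shift 2
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "Step failed (exit status $status): $*"
        echo "Last $nlines lines of $log:"
        tail -n "$nlines" "$log"
    fi
    return "$status"
}
```

A caller would capture the report and, if run_step returns nonzero,
mail it to the source-wash list.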

Source for the wash scripts lives in /afs/
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.
193  * Ensure that it is in the set of machines installed onto by
194    /afs/, and run that script to install
195    the wash scripts onto that machine.
197  * Set up the cron job on the machine according to
198    /afs/
200  * Ensure that the machine has a host key.
202  * Ensure that rcmd.machinename has a PTS identity in the dev cell.
204  * Ensure that rcmd.machinename is in write:update.
206To set up the wash on a build machine:
208  * Ensure that it is in the set of machines installed onto by
209    /afs/, and run that script to install
210    the wash scripts onto that machine.
212  * Set up the cron job on the machine according to
213    /afs/
215  * Ensure that the machine has a host key.
217  * Ensure that rcmd.machinename has a PTS identity in the dev cell.
219  * Ensure that rcmd.machinename is in read:source.
221  * Ensure that
222    /afs/ exists
223    and that rcmd.machinename has write access to it.
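
For illustration only, crontab entries matching the schedule described
above (midnight for the source update, 4:30am for the build) would
look like the following.  The authoritative cron setup is the AFS file
cited in the steps above; the script names here are hypothetical, and
the two entries belong on different machines.

```
# minute hour day-of-month month day-of-week command
# On the source update machine: cvs update of the checked-out tree.
0 0 * * * /usr/local/bin/wash-update
# On each wash build machine: rebuild in the /rel/wash chroot.
30 4 * * * /usr/local/bin/wash-build
```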

Imake templates

We don't like imake, but we have two sets of imake templates:

  * packs/build/config

    These templates are the legacy Athena build system.  They are no
    longer used by any software in the release; we install them in
    case someone wants to build some very old software.

  * packs/build/xconfig

    These templates are used for building software which uses X-style
    Imakefiles.  They may need periodic updating as new versions of X
    are released.  These templates are full of a lot of hacks, mostly
    because the imake model isn't really adequate for dealing with
    third-party software and local site customizations.

Release notes

There are two kinds of release notes, the system release notes and the
user release notes.  The system release notes are more comprehensive
and assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions towards the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

  * Crash and burn

    This phase is for rel-eng internal testing.  The release engineer
    needs to make sure that the current source base works well enough
    for testers to use it and find bugs.  For crash and burn to begin,
    the operating system support person for each platform must provide
    a way to install or update a machine to the new version of the
    operating system for that platform.

    Each platform needs a build tree and system packs volume.  The
    build tree should be mounted in
    /afs/<version>/build/<sysname>.  The
    system packs volume should be mounted in
    /afs/<sysname>/srvd-<version>.

    Each platform needs a new-release build machine to generate system
    packs to test.  Set it up according to the directions in "Build
    machines" above.

    To do a full build for release testing:

      # Get tickets as builder and ssh to the wash machine
      rm -rf /rel/.srvd/* /rel/.srvd/.??*
      rm -rf /rel/build/* /rel/build/.??*
      chroot /rel sh /mit/source-X.Y/packs/build/ -l &

    (It can be useful to run the ssh to the build machine inside a
    screen session, so that you can detach and do not have to stay
    logged in until the build is finished.)

    The crash and burn machines should be identified and used to test
    the update (and install, if possible).  System packs may be
    regenerated at will.  The system packs volume does not need any
    replication.

    Before the transition from crash and burn to alpha, the release
    engineer should do a sanity check on the new packs by comparing a
    file listing of the new packs to a file listing of the previous
    release's packs.  The release engineer should also check the list
    of configuration files for each platform (in
    packs/update/platform/*/configfiles) and make sure that any
    configuration files which have changed are listed as changed in
    the version script.  Finally, the release should be checked to
    make sure it won't overflow partitions on any client machines.

    A note on the wash: it is not especially important that the wash
    be running during the release cycle, but currently the wash can
    run on the new release build machine without interfering with the
    build functions of the machine.  So after updating the wash
    machine to the new OS for new release builds, the release engineer
    can set up the wash right away.

  * Alpha

    The alpha phase is for internal testing by the release team.
    System packs may still be regenerated at will, but the system
    packs volume (and os volume) should be read-only so it can be
    updated by a vos release.  Changes to the packs do not need to be
    propagated in patch releases; testers are expected to be able to
    ensure consistency by forcing repeat updates or reinstalling their
    machines.

    System release notes should be prepared during this phase.

    Before the transition from alpha to beta, doc/third-party should
    be checked to see if miscellaneous third-party files (the ones not
    under the "third" hierarchy) should be updated.

  * Beta

    The beta phase involves outside testers.  System packs and os
    volumes should be replicated on multiple servers, and permissions
    should be set to avoid accidental changes (traditionally this
    means giving write access to system:packs, a normally empty
    group).  Changes to the packs must be propagated by patch
    releases.

    User release notes should be prepared during this phase.  Ideally,
    no new features should be committed to the source tree during the
    beta phase.

    For the transition from beta to early:

    - Prepare a release branch with a name of the form athena-8_1.
      Tag it with athena-8_1-early.

    - Create a volume with a mountpoint of the form
      /afs/ and check out a tree on the
      branch there.  Set the permissions by doing an fs copyacl from
      an older source tree before the checkout, and run
      CVSROOT/ after the checkout.  Copy over the
      .rconf file from the src-current directory.  Have a filsys entry
      of the form source-8.1 created for the new tree.

    - Attach and lock the branch source tree on each build machine.

    - Do a final full build of the release from the branch source
      tree.

  * Early

    The early release involves more outside testers and some cluster
    machines.  The release should be considered ready for public
    consumption.

    The release branch should be tagged with a name of the form
    athena-8_1-early.

  * Release

    The release branch should be tagged with a name of the form
    athena-8_1-release.

    Once the release has gone public, the current-release machines
    should be updated to the release and set up as the build machines
    for the now-current release.  Remove the /build and /.srvd
    symlinks on the new-release build machines, and make sure the wash
    is running on them if you didn't do so back in the crash and burn
    phase.
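
The crash-and-burn sanity check of comparing file listings of the old
and new packs can be scripted.  This is a minimal sketch, assuming both
packs trees are reachable as ordinary directories and that mktemp(1)
is available; compare_packs and list_tree are hypothetical names.  It
reports only paths present in one tree and not the other, not changes
in file contents.

```sh
#!/bin/sh
# List every path under a tree, relative to its root, in byte order so
# that comm(1) can compare two listings.
list_tree() {
    (cd "$1" && find . -print | LC_ALL=C sort)
}

# Report paths that exist only in the old packs or only in the new ones.
# Usage: compare_packs OLD_PACKS NEW_PACKS
compare_packs() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || { rm -f "$old_list"; return 1; }
    list_tree "$1" >"$old_list"
    list_tree "$2" >"$new_list"
    echo "Only in $1:"
    LC_ALL=C comm -23 "$old_list" "$new_list"
    echo "Only in $2:"
    LC_ALL=C comm -13 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

Sorting and comparing under the same C locale keeps comm from
complaining about unsorted input when file names contain mixed case or
punctuation.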

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

  consult
  games
  gnu
  graphics
  outland
  sipb
  tcl
  watchmaker
  windowmanagers
  /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps their own list.

Patch releases

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps for performing a patch
release are:

  * Check in the changes on the mainline (if they apply) and on the
    release branch and update the relevant sections of the source tree
    in /mit/source-<version>.

  * If the update needs to do anything other than track against the
    system packs, you must prepare a version script which deals with
    any transition issues, specifies whether to track the OS volume,
    specifies whether to deal with a kernel update, and specifies
    which, if any, configuration files need to be updated.  See the
    update script (packs/update/ for details.  See
    packs/build/update/os/*/configfiles for a list of configuration
    files for a given platform.  The version script should be checked
    in on the mainline and on the release branch.

  * Do the remainder of the steps as "builder" on the build machine.
    Probably the best way is to get Kerberos tickets as "builder" and
    ssh to the build machine.

  * Make sure to add symlinks under the /build tree for any files you
    have added.  Note that you probably added a version script if the
    update needs to do anything other than track against the system
    packs.

  * In the build tree, bump the version number in packs/build/version
    (the symlink should be broken for this file to avoid having to
    change it in the source tree).

  * If you are going to need to update binaries that users run from
    the packs, go into the packs and move (don't copy) them into a
    .deleted directory at the root of the packs.  This is especially
    important for binaries like emacs and dash which people run for
    long periods of time, to avoid making the running processes dump
    core when the packs are released.

  * Update the read-write volume of the packs to reflect the changes
    you've made.  You can use the script to build and install
    specific packages, or you can use the script to build the
    package and then install specific files (cutting and pasting from
    the output of "gmake -n install DESTDIR=/srvd" is the safest way);
    updating as few files as possible is preferable.  Remember to
    install the version script.

  * Use the script to build and install packs/build/finish.
    This will fix ownerships and update the track lists and the like.

  * It's a good idea to test the update from the read-write packs by
    symlinking the read-write packs to /srvd on a test machine and
    taking the update.  Note that when the machine comes back up with
    the new version, it will probably re-attach the read-write packs,
    so you may have to re-make the symlink if you want to test stuff
    that's on the packs.

  * At some non-offensive time, release the packs in the dev cell.

  * Send mail to rel-eng saying that the patch release went out, and
    what was in it.  (You can find many example pieces of mail in the
    discuss archive.)  Include instructions explaining how to
    propagate the release to the athena cell.
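
The "move, don't copy" step can be wrapped so that each binary lands in
.deleted under its original relative path.  This is a minimal sketch;
retire_binaries is a hypothetical name, and the packs root and paths in
any invocation are up to you.  Our reading of the rationale above is
that a rename leaves the original file intact under a new name, so
processes already running it are not left paging from a file that has
been overwritten.

```sh
#!/bin/sh
# Move (not copy) files from the packs into PACKS/.deleted, keeping each
# file's path relative to the packs root.
# Usage: retire_binaries PACKS RELPATH...
retire_binaries() {
    packs=$1
    shift
    for rel in "$@"; do
        dest=$packs/.deleted/$(dirname "$rel")
        mkdir -p "$dest" || return 1
        # mv within one volume is a rename; the file itself is untouched.
        mv "$packs/$rel" "$dest/" || return 1
    done
}
```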

Third-party pull-ups for patch releases

In CVS, unmodified imported files have the default branch set to
1.1.1.  When a new version is imported, such files need no merging;
the new version on the vendor branch automatically becomes the current
version of the file.  This optimization reduces storage requirements
and makes the merge step of an import faster and less error-prone, at
the cost of rendering a third-party module inconsistent between an
import and a merge.

Due to an apparent bug in CVS (as of version 1.11.2), a commit to a
branch may reset the default branch of an unmodified imported file as
if the commit were to the trunk.  The practical effect for us is that
pulling up versions of third-party packages to a release branch
results in many files being erroneously shifted from the unmodified
category to the modified category.

To account for this problem as well as other corner cases, use the
following procedure to pull up third-party packages for a patch
release:

  cvs co -r athena-X_Y third/module
  cd third/module
  cvs update -d
  cvs update -j athena-X_Y -j HEAD
  cvs ci
  cd /afs/
  find . -name "*,v" -print0 | xargs -0 sh /tmp/

Where /tmp/ is:
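
  #!/bin/sh
  # For each RCS file given, restore the vendor branch as the default
  # branch if the file looks like an unmodified import: head is 1.1,
  # no default branch is set, and the "vendor" tag points at 1.1.1.
  for f; do
    hdr=$(rlog -h "$f") || continue
    if printf '%s\n' "$hdr" | grep -q '^head: 1\.1$' &&
       printf '%s\n' "$hdr" | grep -q '^branch:$' &&
       printf '%s\n' "$hdr" | grep -q 'vendor: 1\.1\.1$'; then
      rcs -bvendor "$f"
    fi
  done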

The find -print0 and xargs -0 flags are not available on the native
Solaris versions of find and xargs, so the final step may be best
performed under Linux.
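
Where the installed find supports the POSIX "-exec COMMAND {} +" form
(check the local find(1) manual page; we have not verified this for
every Solaris release), the final step can avoid -print0 and xargs -0
entirely.  This is a minimal sketch; run_on_rcs_files is a
hypothetical name, and the script argument stands in for the elided
/tmp script above.

```sh
#!/bin/sh
# Run SCRIPT on every RCS file (*,v) under DIR.  With "-exec ... {} +",
# find passes many pathnames per invocation and hands them over as
# separate arguments, so names containing spaces survive intact.
# Usage: run_on_rcs_files DIR SCRIPT
run_on_rcs_files() {
    find "$1" -name '*,v' -exec sh "$2" {} +
}
```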

Rel-eng machines

The machine running the wash update is

There are three rel-eng machines for each platform:

  * A current release build machine, for doing incremental updates to
    the last public release.  This machine may also be used by
    developers for building software.

  * A new release build machine, for building and doing incremental
    updates to releases which are still in testing.  This machine also
    performs the wash.  This machine may also be used by developers
    for building software, or if they want a snapshot of the new
    system packs to build things against.

  * A crash and burn machine, usually located in the release
    engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                       Sun       Linux
Current release build  maytag    kenmore
New release build      downy     snuggle
Crash and burn         pyramids  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

  * Washing machines: kenmore, whirlpool, ge, maytag
  * Laundry detergents: fab, calgon, era, cheer, woolite,
    tide, ultra-tide, purex
  * Bleaches: clorox, ajax
  * Fabric softeners: downy, final-touch, snuggle, bounce
  * Heavy machinery: steam-shovel, pile-driver, dump-truck,
    wrecking-ball, crane
  * Construction kits: lego, capsela, technics, k-nex, playdoh,
    construx
  * Construction materials: rebar, two-by-four, plywood,
    sheetrock
  * Heavy machinery companies: caterpillar, daewoo, john-deere,
    sumitomo
  * Buildings: empire-state, prudential, chrysler

Cluster information

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

       Label: syslib     Data: dev-PLATFORMsys-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

       Label: syslib     Data: athena-PLATFORMsys-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.
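
The naming rules above are mechanical enough to compute the syslib
data field a given cluster should gain when the release enters a
phase.  This is a minimal sketch; syslib_data is a hypothetical name,
and machtype names such as sun4 and linux are used only as examples.
Removing the "t" from records already in the testing clusters is a
separate step that this helper does not cover.

```sh
#!/bin/sh
# Print the syslib data field to add to PHASE-PLATFORM for release X.Y.
# Testing phases (crash, alpha, beta) point at the dev cell packs and
# carry a trailing "t"; early and public point at the athena cell packs.
# Usage: syslib_data PHASE PLATFORM X.Y
syslib_data() {
    phase=$1 platform=$2 version=$3
    # Filsys names use XY, i.e. the version with the dot removed.
    xy=$(printf '%s' "$version" | tr -d .)
    case $phase in
        crash|alpha|beta)
            echo "dev-${platform}sys-$xy $version t" ;;
        early|public)
            echo "athena-${platform}sys-$xy $version" ;;
        *)
            echo "unknown phase: $phase" >&2
            return 1 ;;
    esac
}
```

For example, "syslib_data beta sun4 9.4" prints "dev-sun4sys-94 9.4 t".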