This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

  Mailing lists
  Permissions
  Build machines
  The wash process
  Imake templates
  Release notes
  Release cycles
  Patch releases
  Third-party pull-ups for patch releases
  Rel-eng machines
  Cluster information

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

  * source-developers

    For discussion of the policy and day-to-day maintenance of the
    repository.  This is a public list, and there is a public discuss
    archive on menelaus.

  * source-reviewers

    For review of changes to be checked into the repository.  To be a
    member of this mailing list, you must have read access to the
    non-public parts of the source tree, but you do not need to be a
    staff member.  There is a non-public discuss archive on menelaus.

  * source-commits

    This mailing list receives commit logs for all commits to the
    repository.  This is a public mailing list.  There is a public
    discuss archive on menelaus.

  * source-diffs

    This mailing list receives commit logs with diffs for all commits
    to the repository.  To be on this mailing list, you must have read
    access to the non-public parts of the source tree.  There is no
    discuss archive for this list.

  * source-wash

    This mailing list receives mail when the wash process blows out.
    This is a public mailing list.  There is no discuss archive for
    this list.

  * rel-eng

    The release engineering mailing list.  Mail goes here about patch
    releases and other release details.  There is a public archive on
    menelaus.

  * release-team

    The mailing list for the release team, which sets policy for
    releases.  There is a public archive on menelaus, with the name
    "release-77".

Permissions
-----------

Following are descriptions of the various groups found on the ACLs of
the source tree:

  * read:source
    read:staff

    These two groups have identical permissions in the repository, but
    read:source contains artificial constructs (the builder user and
    service principals) while read:staff contains people.  In the
    future, highly restricted source could have access for read:source
    and not read:staff.

    Both of these groups have read access to non-public areas of the
    source tree.

  * write:staff

    Contains developers with commit access to the source tree.  This
    group has write access to the repository, but not to the
    checked-out copy of the mainline (/mit/source).

  * write:update

    Contains the service principal responsible for updating
    /mit/source.  This group has write access to /mit/source but not
    to the repository.

  * adm:source

    This group has administrative access to the repository and to
    /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).
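
You can inspect the membership of these groups with the standard AFS
pts command, for example:

  pts membership read:staff -cell dev.mit.edu
  pts membership write:staff -cell dev.mit.edu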

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".
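
For example (paths as above; run with tickets that have the needed
access):

  # At the top of the repository:
  sh CVSROOT/afs-protections.sh repository

  # At the top of the checked-out mainline:
  cd /mit/source
  sh /afs/dev.mit.edu/source/repository/CVSROOT/afs-protections.sh wd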

Build machines
--------------

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we loopback-mount /afs inside each chrooted environment (lofs on
Solaris, a bind mount on Linux), so that /afs is visible within the
chroot.  Each build machine has two such environments, one in /rel
(for the release build) and one in /rel/wash (for the wash).  The
second environment has to be located within the first, of course, so
that AFS can be visible from both.

To set up a build machine, follow these instructions after installing:

  * Set the root password.
  * Put "builder rl" in /etc/athena/access.
  * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
    PUBLIC and AUTOUPDATE to false.
  * On Solaris, add a line "/afs - /rel/afs lofs - yes -" to
    /etc/vfstab, and similarly for /rel/wash/afs.  Mount /rel/afs and
    /rel/wash/afs.  (See the sketch after this list.)
  * On Linux, add a line "/afs /rel/afs none bind" to /etc/fstab, and
    similarly for /rel/wash/afs.
  * Run "/mit/source/packs/build/makeroot.sh /rel X.Y", where X.Y is
    the full release this build is for.
  * Run "/mit/source/packs/build/makeroot.sh /rel/wash".
  * Make a symlink from /rel/.srvd to the AFS srvd volume, if you're
    at that stage.
  * On Solaris, ensure that procfs is mounted on /rel/proc and
    /rel/wash/proc.  (A host of system tools fail if procfs is not
    mounted in the chroot environment.)  Add lines to /etc/vfstab to
    make this happen at boot.
  * On Solaris, install the Sun compiler locally.  Run:
      cd /afs/dev.mit.edu/reference/sunpro8/packages
      pkgadd -R /rel -a /usr/athena/lib/update/noask \
        `cat ../installed-packages`
    and follow the directions in
    /afs/dev.mit.edu/reference/sunpro8/README.  Repeat for /rel/wash.
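
As a concrete sketch, the mount-table entries described above would
look something like this (field layout assumed from standard vfstab
and fstab conventions; double-check against the OS documentation):

  # Solaris /etc/vfstab
  /afs   -  /rel/afs        lofs  -  yes  -
  /afs   -  /rel/wash/afs   lofs  -  yes  -
  /proc  -  /rel/proc       proc  -  yes  -
  /proc  -  /rel/wash/proc  proc  -  yes  -

  # Linux /etc/fstab
  /afs  /rel/afs       none  bind  0  0
  /afs  /rel/wash/afs  none  bind  0  0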

Right now a complete build of the source tree from scratch has a
bootstrapping problem: programs which use gdk-pixbuf-csource at build
time (like gnome-panel) require /etc/athena/gtk-2.0/gdk-pixbuf.loaders
to be set up.  Since we lack machinery to deal with that kind of
problem, the workaround is to run the build at least as far as
third/gtk2 and then run, from within the chrooted environment:

  mkdir -p /etc/athena/gtk-2.0
  gdk-pixbuf-query-loaders > /etc/athena/gtk-2.0/gdk-pixbuf.loaders
  gtk-query-immodules-2.0 > /etc/athena/gtk-2.0/gtk.immodules

The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

  * Each night at midnight, a machine performs a cvs update of the
    checked-out tree in /afs/dev.mit.edu/source/src-current.  If the
    cvs update fails, the update script sends mail to
    source-wash@mit.edu.  This machine is in read:source and
    write:update.

  * Each night at 4:30am, a machine of each architecture performs a
    build of the tree in /rel/wash/build, using the /rel/wash chroot
    environment.  If the build fails, the wash script copies the log
    of the failed build into AFS and sends mail to source-wash@mit.edu
    with the last few lines of the log.

Source for the wash scripts lives in /afs/dev.mit.edu/service/wash;
the scripts are installed in /usr/local on the wash machines.  Logs of
the start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.

To set up the source update on a machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.  (A hypothetical crontab
    sketch appears after the second list below.)

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in write:update.

To set up the wash on a build machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.  (See the crontab sketch
    after this list.)

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in read:source.

  * Ensure that
    /afs/dev.mit.edu/service/wash/status/machinename.mit.edu exists
    and that rcmd.machinename has write access to it.
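
The crontab entries would look something like this (times from the
description above; the script names here are invented for
illustration, so take the real ones from the README):

  # Source update machine: update src-current each night at midnight.
  0 0 * * * /usr/local/bin/wash-update

  # Build machines: start the wash build at 4:30am.
  30 4 * * * /usr/local/bin/wash-build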

Imake templates
---------------

We don't like imake, but we have two sets of imake templates:

  * packs/build/config

    These templates are the legacy Athena build system.  They are no
    longer used by any software in the release; we install them in
    case someone wants to build some very old software.

  * packs/build/xconfig

    These templates are used for building software which uses X-style
    Imakefiles.  They may need periodic updating as new versions of X
    are released.  These templates are full of hacks, mostly because
    the imake model isn't really adequate for dealing with third-party
    software and local site customizations.

Release notes
-------------

There are two kinds of release notes: the system release notes and the
user release notes.  The system release notes are more comprehensive,
assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions available towards the beginning of the release
cycle.  The best way to make sure this happens is to maintain the
system release notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

  * Crash and burn

    This phase is for rel-eng internal testing.  The release engineer
    needs to make sure that the current source base works well enough
    for testers to use it and find bugs.  For crash and burn to begin,
    the operating system support person for each platform must provide
    a way to install or update a machine to the new version of the
    operating system for that platform.

    Each platform needs a build tree and system packs volume.  The
    build tree should be mounted in
    /afs/dev.mit.edu/project/release/<version>/build/<sysname>.  The
    system packs volume should be mounted in
    /afs/dev.mit.edu/system/<sysname>/srvd-<version>.

    Each platform needs a new-release build machine to generate system
    packs to test.  Set it up according to the directions in "Build
    machines" above.

    To do a full build for release testing:

    # Get tickets as builder and ssh to the build machine
    rm -rf /rel/.srvd/* /rel/.srvd/.??*
    rm -rf /rel/build/* /rel/build/.??*
    chroot /rel sh /mit/source-X.Y/packs/build/build.sh -l &

    (It can be useful to run the ssh to the build machine inside a
    screen session so you don't have to log out of the build machine
    until the build is finished.)

    The crash and burn machines should be identified and used to test
    the update (and install, if possible).  System packs may be
    regenerated at will.  The system packs volume does not need any
    replication.

    Before the transition from crash and burn to alpha, the release
    engineer should do a sanity check on the new packs by comparing a
    file listing of the new packs to a file listing of the previous
    release's packs.  The release engineer should also check the list
    of configuration files for each platform (in
    packs/update/platform/*/configfiles) and make sure that any
    configuration files which have changed are listed as changed in
    the version script.  Finally, the release should be checked to
    make sure it won't overflow partitions on any client machines.
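
    A minimal sketch of the listing comparison (volume paths per the
    mountpoints above; "OLD" stands for the previous release):

      cd /afs/dev.mit.edu/system/<sysname>
      find srvd-X.Y -print | sort > /tmp/new-packs.list
      find srvd-OLD -print | sort > /tmp/old-packs.list
      diff /tmp/old-packs.list /tmp/new-packs.list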

    A note on the wash: it is not especially important that the wash
    be running during the release cycle, but currently the wash can
    run on the new release build machine without interfering with the
    build functions of the machine.  So after updating the wash
    machine to the new OS for new release builds, the release engineer
    can set up the wash right away.

  * Alpha

    The alpha phase is for internal testing by the release team.
    System packs may still be regenerated at will, but the system
    packs volume (and OS volume) should be read-only so it can be
    updated by a vos release.  Changes to the packs do not need to be
    propagated in patch releases; testers are expected to be able to
    ensure consistency by forcing repeat updates or reinstalling their
    machines.

    System release notes should be prepared during this phase.

    Before the transition from alpha to beta, doc/third-party should
    be checked to see if miscellaneous third-party files (the ones not
    under the "third" hierarchy) should be updated.

  * Beta

    The beta phase involves outside testers.  System packs and OS
    volumes should be replicated on multiple servers, and permissions
    should be set to avoid accidental changes (traditionally this
    means giving write access to system:packs, a normally empty
    group).  Changes to the packs must be propagated by patch
    releases.

    User release notes should be prepared during this phase.  Ideally,
    no new features should be committed to the source tree during the
    beta phase.

    For the transition from beta to early:

    - Prepare a release branch with a name of the form athena-8_1.
      Tag it with athena-8_1-early.

    - Create a volume with a mountpoint of the form
      /afs/dev.mit.edu/source/src-8.1 and check out a tree on the
      branch there.  Set the permissions by doing an fs copyacl from
      an older source tree before the checkout, and run
      CVSROOT/afs-protections.sh after the checkout.  Copy over the
      .rconf file from the src-current directory.  Have a filsys entry
      of the form source-8.1 created for the new tree.  (A sketch of
      these steps appears after this list.)

    - Attach and lock the branch source tree on each build machine.

    - Do a final full build of the release from the branch source
      tree.
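
    A hypothetical sketch of the branch tree setup for release 8.1
    (the checkout module list and the volume creation are elided; the
    fs and cvs invocations are illustrative, not exact):

      cd /afs/dev.mit.edu/source
      fs copyacl src-current src-8.1
      cd src-8.1
      cvs -d /afs/dev.mit.edu/source/repository co -r athena-8_1 <modules>
      cp ../src-current/.rconf .
      sh /afs/dev.mit.edu/source/repository/CVSROOT/afs-protections.sh wd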

  * Early

    The early release involves more outside testers and some cluster
    machines.  The release should be considered ready for public
    consumption.

    The release branch should be tagged with a name of the form
    athena-8_1-early.

  * Release

    The release branch should be tagged with a name of the form
    athena-8_1-release.

    Once the release has gone public, the current-release machines
    should be updated to the release and set up as the build machines
    for the now-current release.  Remove the /build and /.srvd
    symlinks on the new-release build machines, and make sure the wash
    is running on them if you didn't do so back in the crash and burn
    phase.

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

  consult
  games
  gnu
  graphics
  outland
  sipb
  tcl
  watchmaker
  windowmanagers
  /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps its own list.

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps for performing a patch
release are:

  * Check in the changes on the mainline (if they apply) and on the
    release branch and update the relevant sections of the source tree
    in /mit/source-<version>.

  * If the update needs to do anything other than track against the
    system packs, you must prepare a version script which deals with
    any transition issues, specifies whether to track the OS volume,
    specifies whether to deal with a kernel update, and specifies
    which if any configuration files need to be updated.  See the
    update script (packs/update/do-update.sh) for details.  See
    packs/build/update/os/*/configfiles for a list of configuration
    files for a given platform.  The version script should be checked
    in on the mainline and on the release branch.

  * Do the remainder of the steps as "builder" on the build machine.
    Probably the best way is to get Kerberos tickets as "builder" and
    ssh to the build machine.

  * Make sure to add symlinks under the /build tree for any files you
    have added.  Note that you probably added a version script if the
    update needs to do anything other than track against the system
    packs.

  * In the build tree, bump the version number in packs/build/version
    (the symlink should be broken for this file to avoid having to
    change it in the source tree).

  * If you are going to need to update binaries that users run from
    the packs, go into the packs and move (don't copy) them into a
    .deleted directory at the root of the packs.  This is especially
    important for binaries like emacs and dash which people run for
    long periods of time, to avoid making the running processes dump
    core when the packs are released.
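
    For example (the path of the binary on the packs is illustrative):

      mkdir -p /srvd/.deleted
      mv /srvd/usr/athena/bin/emacs /srvd/.deleted/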

  * Update the read-write volume of the packs to reflect the changes
    you've made.  You can use the build.sh script to build and install
    specific packages, or you can use the do.sh script to build the
    package and then install specific files (cutting and pasting from
    the output of "gmake -n install DESTDIR=/srvd" is the safest way);
    updating the fewest files is preferable.  Remember to install the
    version script.

  * Use the build.sh script to build and install packs/build/finish.
    This will fix ownerships and update the track lists and the like.

  * It's a good idea to test the update from the read-write packs by
    symlinking the read-write packs to /srvd on a test machine and
    taking the update.  Note that when the machine comes back up with
    the new version, it will probably re-attach the regular read-only
    packs, so you may have to re-make the symlink if you want to test
    stuff that's on the packs.
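
    For example, on the test machine (the read-write packs path here
    is illustrative; use the actual read-write mountpoint):

      ln -sf /afs/dev.mit.edu/system/<sysname>/srvd-X.Y.rw /srvd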

  * At some non-offensive time, release the packs in the dev cell.

  * Send mail to rel-eng saying that the patch release went out, and
    what was in it.  (You can find many example pieces of mail in the
    discuss archive.)  Include instructions explaining how to
    propagate the release to the athena cell.

Third-party pull-ups for patch releases
---------------------------------------

In CVS, unmodified imported files have the default branch set to
1.1.1.  When a new version is imported, such files need no merging;
the new version on the vendor branch automatically becomes the current
version of the file.  This optimization reduces storage requirements
and makes the merge step of an import faster and less error-prone, at
the cost of rendering a third-party module inconsistent between an
import and a merge.

Due to an apparent bug in CVS (as of version 1.11.2), a commit to a
branch may reset the default branch of an unmodified imported file as
if the commit were to the trunk.  The practical effect for us is that
pulling up versions of third-party packages to a release branch
results in many files being erroneously shifted from the unmodified
category to the modified category.

To account for this problem as well as other corner cases, use the
following procedure to pull up third-party packages for a patch
release:

  cvs co -r athena-X_Y third/module
  cd third/module
  cvs update -d                      # pick up any new directories
  cvs update -j athena-X_Y -j HEAD   # merge trunk changes onto the branch
  cvs ci
  cd /afs/dev.mit.edu/source/repository/third/module
  find . -name "*,v" -print0 | xargs -0 sh /tmp/vend.sh

Where /tmp/vend.sh is:

  #!/bin/sh
  # For each RCS file which was never modified on the trunk (head at
  # 1.1), has had its default branch reset, and has a vendor branch
  # numbered 1.1.1, restore the default branch to the vendor branch.
  for f; do
    if rlog -h "$f" | grep -q '^head: 1\.1$' && \
       rlog -h "$f" | grep -q '^branch:$' && \
       rlog -h "$f" | grep -q 'vendor: 1\.1\.1$'; then
      rcs -bvendor "$f"
    fi
  done

The find -print0 and xargs -0 flags are not available on the native
Solaris versions of find and xargs, so the final step may be best
performed under Linux.

Rel-eng machines
----------------

The machine running the wash update is equal-rites.mit.edu.

There are three rel-eng machines for each platform:

  * A current release build machine, for doing incremental updates to
    the last public release.  This machine may also be used by
    developers for building software.

  * A new release build machine, for building and doing incremental
    updates to releases which are still in testing.  This machine also
    performs the wash.  This machine may also be used by developers
    for building software, or if they want a snapshot of the new
    system packs to build things against.

  * A crash and burn machine, usually located in the release
    engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                       Sun       Linux

Current release build  maytag    kenmore
New release build      downy     snuggle
Crash and burn         pyramids  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

  * Washing machines: kenmore, whirlpool, ge, maytag
  * Laundry detergents: fab, calgon, era, cheer, woolite,
    tide, ultra-tide, purex
  * Bleaches: clorox, ajax
  * Fabric softeners: downy, final-touch, snuggle, bounce
  * Heavy machinery: steam-shovel, pile-driver, dump-truck,
    wrecking-ball, crane
  * Construction kits: lego, capsela, technics, k-nex, playdoh,
    construx
  * Construction materials: rebar, two-by-four, plywood,
    sheetrock
  * Heavy machinery companies: caterpillar, daewoo, john-deere,
    sumitomo
  * Buildings: empire-state, prudential, chrysler

Clusters
--------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each named PHASE-PLATFORM,
where PHASE is a phase of the release cycle (crash, alpha, beta,
early, public) and PLATFORM is the machtype name of the platform.
There are two filsys entries for each platform and release pointing to
the athena cell and dev cell system packs for the release; they have
the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where X and Y
are the major and minor numbers of the release.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

       Label: syslib     Data: dev-PLATFORMsys-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.
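
For example, a crash-linux cluster record for a hypothetical 9.1
release would be:

       Label: syslib     Data: dev-linuxsys-91 9.1 t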

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records of the
form:

       Label: syslib     Data: athena-PLATFORMsys-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.