source: trunk/doc/maintenance @ 21193

Revision 21193, 22.9 KB checked in by ghudson, 20 years ago (diff)
Document that Solaris build machines should have procfs mounted in the chroot areas.
This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

  Mailing lists
  Permissions
  Build machines
  The wash process
  Imake templates
  Release notes
  Release cycles
  Patch releases
  Third-party pull-ups for patch releases
  Rel-eng machines
  Cluster information

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

  * source-developers

    For discussion of the policy and day-to-day maintenance of the
    repository.  This is a public list, and there is a public discuss
    archive on menelaus.

  * source-reviewers

    For review of changes to be checked into the repository.  To be a
    member of this mailing list, you must have read access to the
    non-public parts of the source tree, but you do not need to be a
    staff member.  There is a non-public discuss archive on menelaus.

  * source-commits

    This mailing list receives commit logs for all commits to the
    repository.  This is a public mailing list.  There is a public
    discuss archive on menelaus.

  * source-diffs

    This mailing list receives commit logs with diffs for all commits
    to the repository.  To be on this mailing list, you must have read
    access to the non-public parts of the source tree.  There is no
    discuss archive for this list.

  * source-wash

    This mailing list receives mail when the wash process blows out.
    This is a public mailing list.  There is no discuss archive for
    this list.

  * rel-eng

    The release engineering mailing list.  Mail goes here about patch
    releases and other release details.  There is a public archive on
    menelaus.

  * release-team

    The mailing list for the release team, which sets policy for
    releases.  There is a public archive on menelaus, with the name
    "release-77".

Permissions
-----------

Following are descriptions of the various groups found on the acls of
the source tree:

  * read:source
    read:staff

    These two groups have identical permissions in the repository, but
    read:source contains artificial constructs (the builder user and
    service principals) while read:staff contains people.  In the
    future, highly restricted source could have access for read:source
    and not read:staff.

    Both of these groups have read access to non-public areas of the
    source tree.

  * write:staff

    Contains developers with commit access to the source tree.  This
    group has write access to the repository, but not to the
    checked-out copy of the mainline (/mit/source).

  * write:update

    Contains the service principal responsible for updating
    /mit/source.  This group has write access to /mit/source but not
    to the repository.

  * adm:source

    This group has administrative access to the repository and to
    /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".
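The script takes the tree type as its only argument; a typical pair of
invocations (using the repository path that appears elsewhere in this
file) would look like:

```
# Check permissions on the repository itself:
cd /afs/dev.mit.edu/source/repository
sh CVSROOT/afs-protections.sh repository

# Check permissions on the checked-out mainline:
cd /mit/source
sh /afs/dev.mit.edu/source/repository/CVSROOT/afs-protections.sh wd
```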

Build machines
--------------

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we mount AFS inside the chrooted environments and make a symlink from
/afs to the place AFS is mounted.  Each build machine has two such
environments, one in /rel (for the release build) and one in /rel/wash
(for the wash).  The second environment has to be located within the
first, of course, so that AFS can be visible from both.

To set up a build machine, follow these instructions after installing:

  * Set the root password.
  * Put "builder rl" in /etc/athena/access.
  * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
    AFSADJUST, PUBLIC, and AUTOUPDATE to false.
  * Create /rel/wash/afs and parents.
  * Edit /usr/vice/etc/cacheinfo and change the AFS mountpoint from
    "/afs" to "/rel/wash/afs".
  * Reboot.  (Remote access daemons should work without AFS, more or
    less.)
  * Create symlinks /afs -> rel/wash/afs and /rel/afs -> wash/afs.
  * Run "/mit/source/packs/build/makeroot.sh /rel X.Y", where X.Y is
    the full release this build is for.
  * Run "/mit/source/packs/build/makeroot.sh /rel/wash".
  * Make a symlink from /rel/.srvd to the AFS srvd volume, if you're
    at that stage.
  * On Solaris, ensure that procfs is mounted on /rel/proc and
    /rel/wash/proc.  (A host of system tools fail if procfs is not
    mounted in the chroot environment.)  Add lines to /etc/vfstab to
    make this happen at boot.
  * On Solaris, install the Sun compiler locally.  Run:
      pkgadd -R /rel -d /afs/dev.mit.edu/reference/sunpro8/packages \
        -a /usr/athena/lib/update/noask `cat ../installed-packages`
    and follow the directions in
    /afs/dev.mit.edu/reference/sunpro8/README.  Repeat for /rel/wash.
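The relative symlinks matter: they let /afs resolve to the real AFS
mount point from the host, from the /rel chroot, and from the
/rel/wash chroot alike.  A quick sketch of the layout in a scratch
directory (the paths stand in for the real root filesystem):

```shell
# Recreate the build-machine symlink layout under a scratch root and
# verify that each view of AFS reaches the same directory.
root=$(mktemp -d)
mkdir -p "$root/rel/wash/afs"          # AFS is actually mounted here
ln -s rel/wash/afs "$root/afs"         # /afs -> rel/wash/afs
ln -s wash/afs "$root/rel/afs"         # /rel/afs -> wash/afs
touch "$root/rel/wash/afs/marker"      # stand-in for AFS content
ls "$root/afs/marker" "$root/rel/afs/marker"
```

Because both links are relative, each one resolves within whatever
root the process sees, chrooted or not.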

The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

  * Each night at midnight, a machine performs a cvs update of the
    checked-out tree in /afs/dev.mit.edu/source/src-current.  If the
    cvs update fails, the update script sends mail to
    source-wash@mit.edu.  This machine is in read:source and
    write:update.

  * Each night at 4:30am, a machine of each architecture performs a
    build of the tree in /rel/wash/build, using the /rel/wash chroot
    environment.  If the build fails, the wash script copies the log
    of the failed build into AFS and sends mail to source-wash@mit.edu
    with the last few lines of the log.

Source for the wash scripts lives in /afs/dev.mit.edu/service/wash.
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.
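On the machines themselves, the schedule above amounts to cron entries
along these lines (the script paths here are hypothetical; the real
entries are specified in /afs/dev.mit.edu/service/wash/README):

```
0 0 * * *    /usr/local/etc/wash-update   # midnight: cvs update of src-current
30 4 * * *   /usr/local/etc/wash-build    # 4:30am: chrooted build of the tree
```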

To set up the source update on a machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in write:update.

To set up the wash on a build machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in read:source.

  * Ensure that
    /afs/dev.mit.edu/service/wash/status/machinename.mit.edu exists
    and that rcmd.machinename has write access to it.

Imake templates
---------------

We don't like imake, but we have two sets of imake templates:

  * packs/build/config

    These templates are the legacy Athena build system.  They are no
    longer used by any software in the release; we install them in
    case someone wants to build some very old software.

  * packs/build/xconfig

    These templates are used for building software which uses X-style
    Imakefiles.  They may need periodic updating as new versions of X
    are released.  These templates contain a lot of hacks, mostly
    because the imake model isn't really adequate for dealing with
    third-party software and local site customizations.

Release notes
-------------

There are two kinds of release notes: the system release notes and the
user release notes.  The system release notes are more comprehensive,
assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions appearing towards the beginning of the release
cycle.  The best way to make sure this happens is to maintain the
system release notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

  * Crash and burn

    This phase is for rel-eng internal testing.  The release engineer
    needs to make sure that the current source base works well enough
    for testers to use it and find bugs.  For crash and burn to begin,
    the operating system support person for each platform must provide
    a way to install or update a machine to the new version of the
    operating system for that platform.

    Each platform needs a build tree and a system packs volume.  The
    build tree should be mounted in
    /afs/dev.mit.edu/project/release/<version>/build/<sysname>.  The
    system packs volume should be mounted in
    /afs/dev.mit.edu/system/<sysname>/srvd-<version>.

    Each platform needs a new-release build machine to generate system
    packs to test.  Set it up according to the directions in "Build
    machines" above.

    To do a full build for release testing:

    # Get tickets as builder and ssh to the wash machine
    rm -rf /rel/.srvd/* /rel/.srvd/.??*
    rm -rf /rel/build/* /rel/build/.??*
    chroot /rel sh /mit/source-X.Y/packs/build/build.sh -l &

    (It can be useful to run the ssh to the build machine inside a
    screen session, so that the build is not tied to your login
    session on the build machine.)

    The crash and burn machines should be identified and used to test
    the update (and install, if possible).  System packs may be
    regenerated at will.  The system packs volume does not need any
    replication.

    Before the transition from crash and burn to alpha, the release
    engineer should do a sanity check on the new packs by comparing a
    file listing of the new packs to a file listing of the previous
    release's packs.  The release engineer should also check the list
    of configuration files for each platform (in
    packs/update/platform/*/configfiles) and make sure that any
    configuration files which have changed are listed as changed in
    the version script.  Finally, the release should be checked to
    make sure it won't overflow partitions on any client machines.

    A note on the wash: it is not especially important that the wash
    be running during the release cycle, but currently the wash can
    run on the new-release build machine without interfering with the
    build functions of the machine.  So after updating the wash
    machine to the new OS for new-release builds, the release engineer
    can set up the wash right away.
  * Alpha

    The alpha phase is for internal testing by the release team.
    System packs may still be regenerated at will, but the system
    packs volume (and os volume) should be read-only so it can be
    updated by a vos release.  Changes to the packs do not need to be
    propagated in patch releases; testers are expected to be able to
    ensure consistency by forcing repeat updates or reinstalling their
    machines.

    System release notes should be prepared during this phase.

    Before the transition from alpha to beta, doc/third-party should
    be checked to see if miscellaneous third-party files (the ones not
    under the "third" hierarchy) should be updated.

  * Beta

    The beta phase involves outside testers.  System packs and os
    volumes should be replicated on multiple servers, and permissions
    should be set to avoid accidental changes (traditionally this
    means giving write access to system:packs, a normally empty
    group).  Changes to the packs must be propagated by patch
    releases.

    User release notes should be prepared during this phase.  Ideally,
    no new features should be committed to the source tree during the
    beta phase.

    For the transition from beta to early:

    - Prepare a release branch with a name of the form athena-8_1.
      Tag it with athena-8_1-early.

    - Create a volume with a mountpoint of the form
      /afs/dev.mit.edu/source/src-8.1 and check out a tree on the
      branch there.  Set the permissions by doing an fs copyacl from
      an older source tree before the checkout, and run
      CVSROOT/afs-protections.sh after the checkout.  Copy over the
      .rconf file from the src-current directory.  Have a filsys entry
      of the form source-8.1 created for the new tree.

    - Attach and lock the branch source tree on each build machine.

    - Do a final full build of the release from the branch source
      tree.
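    For an 8.1 branch, the first two steps might look roughly like the
    following sketch (the "source" module name, the server, and the
    partition are illustrative; the volume and filsys names follow the
    forms given above):

```
# Cut the release branch and tag it:
cvs rtag -b athena-8_1 source
cvs rtag -r athena-8_1 athena-8_1-early source

# Create and mount the branch source volume:
vos create <server> <partition> src.81
fs mkmount /afs/dev.mit.edu/source/src-8.1 src.81
fs copyacl -fromdir /afs/dev.mit.edu/source/src-current \
           -todir /afs/dev.mit.edu/source/src-8.1

# Check out the branch into the new tree:
cd /afs/dev.mit.edu/source/src-8.1
cvs co -r athena-8_1 source
```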

  * Early

    The early release involves more outside testers and some cluster
    machines.  The release should be considered ready for public
    consumption.

    The release branch should be tagged with a name of the form
    athena-8_1-early.

  * Release

    The release branch should be tagged with a name of the form
    athena-8_1-release.

    Once the release has gone public, the current-release machines
    should be updated to the release and set up as the build machines
    for the now-current release.  Remove the /build and /.srvd
    symlinks on the new-release build machines, and make sure the wash
    is running on them if you didn't do so back in the crash and burn
    phase.

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be updated, and the popular software
lockers need to be updated as well.  Here is a reasonable list of
popular lockers to update in addition to the glue ones:

  consult
  games
  gnu
  graphics
  outland
  sipb
  tcl
  watchmaker
  windowmanagers
  /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps their own list.

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps in performing a patch
release are:

  * Check in the changes on the mainline (if they apply) and on the
    release branch, and update the relevant sections of the source
    tree in /mit/source-<version>.

  * If the update needs to do anything other than track against the
    system packs, you must prepare a version script which deals with
    any transition issues, specifies whether to track the OS volume,
    specifies whether to deal with a kernel update, and specifies
    which, if any, configuration files need to be updated.  See the
    update script (packs/update/do-update.sh) for details.  See
    packs/build/update/os/*/configfiles for a list of configuration
    files for a given platform.  The version script should be checked
    in on the mainline and on the release branch.

  * Do the remainder of the steps as "builder" on the build machine.
    Probably the best way is to get Kerberos tickets as "builder" and
    ssh to the build machine.

  * Make sure to add symlinks under the /build tree for any files you
    have added.  Note that you probably added a build script if the
    update needs to do anything other than track against the system
    packs.

  * In the build tree, bump the version number in packs/build/version
    (the symlink should be broken for this file to avoid having to
    change it in the source tree).
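If packs/build/version holds a plain dotted version string (an
assumption; check the real file's format first), the bump amounts to
incrementing the last component, sketched here against a stand-in
file:

```shell
# Bump the last component of an X.Y.Z version string in a temp file
# standing in for packs/build/version.
f=$(mktemp)
echo "8.4.2" > "$f"
awk -F. '{ printf "%d.%d.%d\n", $1, $2, $3 + 1 }' "$f" > "$f.new"
mv "$f.new" "$f"
cat "$f"    # 8.4.2 becomes 8.4.3
```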

  * If you are going to need to update binaries that users run from
    the packs, go into the packs and move (don't copy) them into a
    .deleted directory at the root of the packs.  This is especially
    important for binaries like emacs and dash which people run for
    long periods of time, to avoid making the running processes dump
    core when the packs are released.
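The reason for moving rather than copying is that mv leaves the old
file's inode intact, so a process still executing the old binary keeps
a valid image; only the directory entry changes.  A sketch in a
scratch tree (the paths are illustrative):

```shell
# Move (don't copy) the old binary aside, then install the new one.
packs=$(mktemp -d)                      # stand-in for the packs root
mkdir -p "$packs/bin" "$packs/.deleted"
echo "old emacs" > "$packs/bin/emacs"   # the binary users are running
mv "$packs/bin/emacs" "$packs/.deleted/emacs"   # old inode survives
echo "new emacs" > "$packs/bin/emacs"   # the replacement
cat "$packs/.deleted/emacs" "$packs/bin/emacs"
```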

  * Update the read-write volume of the packs to reflect the changes
    you've made.  You can use the build.sh script to build and install
    specific packages, or you can use the do.sh script to build the
    package and then install specific files (cutting and pasting from
    the output of "gmake -n install DESTDIR=/srvd" is the safest way);
    updating the fewest files possible is preferable.  Remember to
    install the version script.

  * Use the build.sh script to build and install packs/build/finish.
    This will fix ownerships and update the track lists and the like.

  * It's a good idea to test the update from the read-write packs by
    symlinking the read-write packs to /srvd on a test machine and
    taking the update.  Note that when the machine comes back up with
    the new version, it will probably re-attach the read-write packs,
    so you may have to re-make the symlink if you want to test stuff
    that's on the packs.

  * At some non-disruptive time, release the packs in the dev cell.

  * Send mail to rel-eng saying that the patch release went out, and
    what was in it.  (You can find many example pieces of mail in the
    discuss archive.)  Include instructions explaining how to
    propagate the release to the athena cell.

Third-party pull-ups for patch releases
---------------------------------------

In CVS, unmodified imported files have the default branch set to
1.1.1.  When a new version is imported, such files need no merging;
the new version on the vendor branch automatically becomes the current
version of the file.  This optimization reduces storage requirements
and makes the merge step of an import faster and less error-prone, at
the cost of rendering a third-party module inconsistent between an
import and a merge.

Due to an apparent bug in CVS (as of version 1.11.2), a commit to a
branch may reset the default branch of an unmodified imported file as
if the commit were to the trunk.  The practical effect for us is that
pulling up versions of third-party packages to a release branch
results in many files being erroneously shifted from the unmodified
category to the modified category.

To account for this problem as well as other corner cases, use the
following procedure to pull up third-party packages for a patch
release:

  cvs co -r athena-X_Y third/module
  cd third/module
  cvs update -d
  cvs update -j athena-X_Y -j HEAD
  cvs ci
  cd /afs/dev.mit.edu/source/repository/third/module
  find . -name "*,v" -print0 | xargs -0 sh /tmp/vend.sh

Where /tmp/vend.sh is:

  #!/bin/sh
  # Reset the default branch to the vendor branch for each RCS file
  # that is an unmodified import: head revision 1.1, no default
  # branch currently set, and a vendor branch numbered 1.1.1.
  for f; do
    if rlog -h "$f" | grep -q '^head: 1\.1$' && \
       rlog -h "$f" | grep -q '^branch:$' && \
       rlog -h "$f" | grep -q 'vendor: 1\.1\.1$'; then
      rcs -bvendor "$f"
    fi
  done

The find -print0 and xargs -0 flags are not available on the native
Solaris versions of find and xargs, so the final step may be best
performed under Linux.

Rel-eng machines
----------------

The machine running the wash update is equal-rites.mit.edu.

There are three rel-eng machines for each platform:

  * A current-release build machine, for doing incremental updates to
    the last public release.  This machine may also be used by
    developers for building software.

  * A new-release build machine, for building and doing incremental
    updates to releases which are still in testing.  This machine also
    performs the wash.  This machine may also be used by developers
    for building software, or if they want a snapshot of the new
    system packs to build things against.

  * A crash and burn machine, usually located in the release
    engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                       Sun       Linux

Current release build  maytag    kenmore
New release build      downy     snuggle
Crash and burn         pyramids  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

  * Washing machines: kenmore, whirlpool, ge, maytag
  * Laundry detergents: fab, calgon, era, cheer, woolite,
    tide, ultra-tide, purex
  * Bleaches: clorox, ajax
  * Fabric softeners: downy, final-touch, snuggle, bounce
  * Heavy machinery: steam-shovel, pile-driver, dump-truck,
    wrecking-ball, crane
  * Construction kits: lego, capsela, technics, k-nex, playdoh,
    construx
  * Construction materials: rebar, two-by-four, plywood,
    sheetrock
  * Heavy machinery companies: caterpillar, daewoo, john-deere,
    sumitomo
  * Buildings: empire-state, prudential, chrysler

Cluster information
-------------------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena-cell and dev-cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

       Label: syslib     Data: dev-PLATFORMsys-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

       Label: syslib     Data: athena-PLATFORMsys-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.
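A syslib data record is just whitespace-separated fields; splitting
one shows the filsys name, the release version, and the optional 't'
testing flag.  The record below is illustrative, with "sun4" standing
in for a real machtype:

```shell
# Split a syslib cluster record into its three fields.
record="dev-sun4sys-84 8.4 t"
set -- $record
filsys=$1 version=$2 testing=${3:-}
echo "filsys=$filsys version=$version testing=${testing:-no}"
```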