source: trunk/doc/maintenance @ 20449

Revision 20449, 22.5 KB checked in by ghudson, 20 years ago
Begin updating to reflect current reality. Eliminate SGI references, expect release notes one phase later, and remove some (but by no means all) references to aspects of the srvd-oriented build system.
This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

  Mailing lists
  Permissions
  Build machines
  The wash process
  Imake templates
  Release notes
  Release cycles
  Patch releases
  Third-party pullups for patch releases
  Rel-eng machines
  Clusters

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

  * source-developers

    For discussion of the policy and day-to-day maintenance of the
    repository.  This is a public list, and there is a public discuss
    archive on menelaus.

  * source-reviewers

    For review of changes to be checked into the repository.  To be a
    member of this mailing list, you must have read access to the
    non-public parts of the source tree, but you do not need to be a
    staff member.  There is a non-public discuss archive on menelaus.

  * source-commits

    This mailing list receives commit logs for all commits to the
    repository.  This is a public mailing list.  There is a public
    discuss archive on menelaus.

  * source-diffs

    This mailing list receives commit logs with diffs for all commits
    to the repository.  To be on this mailing list, you must have read
    access to the non-public parts of the source tree.  There is no
    discuss archive for this list.

  * source-wash

    This mailing list receives mail when the wash process blows out.
    This is a public mailing list.  There is no discuss archive for
    this list.

  * rel-eng

    The release engineering mailing list.  Mail goes here about patch
    releases and other release details.  There is a public archive on
    menelaus.

  * release-team

    The mailing list for the release team, which sets policy for
    releases.  There is a public archive on menelaus, with the name
    "release-77".

Permissions
-----------

Following are descriptions of the various groups found on the ACLs of
the source tree:

  * read:source
    read:staff

    These two groups have identical permissions in the repository, but
    read:source contains artificial constructs (the builder user and
    service principals) while read:staff contains people.  In the
    future, highly restricted source could have access for read:source
    and not read:staff.

    Both of these groups have read access to non-public areas of the
    source tree.

  * write:staff

    Contains developers with commit access to the source tree.  This
    group has write access to the repository, but not to the
    checked-out copy of the mainline (/mit/source).

  * write:update

    Contains the service principal responsible for updating
    /mit/source.  This group has write access to /mit/source but not
    to the repository.

  * adm:source

    This group has administrative access to the repository and to
    /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".

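For instance, invocations might look like the following (a sketch;
the repository path is the one used elsewhere in this file, and the
script is assumed to be run with sh):

```shell
# Check ACLs on the repository itself, from its top level.
cd /afs/dev.mit.edu/source/repository
sh CVSROOT/afs-protections.sh repository

# Check ACLs on the checked-out mainline.  The script lives in the
# repository's CVSROOT, not in the working directory.
cd /mit/source
sh /afs/dev.mit.edu/source/repository/CVSROOT/afs-protections.sh wd
```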
Build machines
--------------

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we mount AFS inside the chrooted environments and make a symlink from
/afs to the place AFS is mounted.  Each build machine has two such
environments, one in /rel (for the release build) and one in /rel/wash
(for the wash).  The second environment has to be located within the
first, of course, so that AFS can be visible from both.

To set up a build machine, follow these instructions after installing:

  * Set the root password.
  * Put "builder rl" in /etc/athena/access.
  * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
    AFSADJUST, PUBLIC, and AUTOUPDATE to false.
  * Create /rel/wash/afs and parents.
  * Edit /usr/vice/etc/cacheinfo and change the AFS mountpoint from
    "/afs" to "/rel/wash/afs".
  * Reboot.  (Remote access daemons should work without AFS, more or
    less.)
  * Create symlinks /afs -> rel/wash/afs and /rel/afs -> wash/afs.
  * Run "/mit/source/packs/build/makeroot.sh /rel X.Y", where X.Y is
    the full release this build is for.
  * Run "/mit/source/packs/build/makeroot.sh /rel/wash".
  * Make a symlink from /rel/.srvd to the AFS srvd volume, and from
    /rel/build to the AFS build volume.  (These steps can be omitted
    if the release cycle hasn't progressed far enough for those
    volumes to exist.)

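The filesystem steps above can be sketched as the following command
sequence (a sketch only; X.Y and <sysname> are placeholders, and the
volume mountpoints follow the layout described under "Release cycles"
below):

```shell
# Relocate the AFS mountpoint into the inner chroot environment.
mkdir -p /rel/wash/afs
# (Edit /usr/vice/etc/cacheinfo: change "/afs" to "/rel/wash/afs",
# then reboot so the new mountpoint takes effect.)

# Make AFS visible at /afs on the host and inside /rel.
ln -s rel/wash/afs /afs
ln -s wash/afs /rel/afs

# Populate both chroot environments.
/mit/source/packs/build/makeroot.sh /rel X.Y
/mit/source/packs/build/makeroot.sh /rel/wash

# Attach the AFS srvd and build volumes (once they exist).
ln -s /afs/dev.mit.edu/system/<sysname>/srvd-X.Y /rel/.srvd
ln -s /afs/dev.mit.edu/project/release/X.Y/build/<sysname> /rel/build
```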
The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

  * Each night at midnight, a machine performs a cvs update of the
    checked-out tree in /afs/dev.mit.edu/source/src-current.  If the
    cvs update fails, the update script sends mail to
    source-wash@mit.edu.  This machine is in read:source and
    write:update.

  * Each night at 4:30am, a machine of each architecture performs a
    build of the tree in /rel/wash/build, using the /rel/wash chroot
    environment.  If the build fails, the wash script copies the log
    of the failed build into AFS and sends mail to source-wash@mit.edu
    with the last few lines of the log.

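The schedule above translates into cron jobs on the participating
machines.  The authoritative entries live in
/afs/dev.mit.edu/service/wash/README; the script names below are
hypothetical placeholders:

```
# Hypothetical crontab entries; see the wash README for the real ones.
# Midnight: cvs update of /afs/dev.mit.edu/source/src-current.
0 0 * * *    /usr/local/bin/wash-update
# 4:30am: chrooted build of /rel/wash/build on each build machine.
30 4 * * *   /usr/local/bin/wash-build
```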
Source for the wash scripts lives in /afs/dev.mit.edu/service/wash.
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.

To set up the source update on a machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in write:update.

To set up the wash on a build machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in read:source.

  * Ensure that
    /afs/dev.mit.edu/service/wash/status/machinename.mit.edu exists
    and that rcmd.machinename has write access to it.

Imake templates
---------------

We don't like imake, but we have two sets of imake templates:

  * packs/build/config

    These templates are the legacy Athena build system.  They are no
    longer used by any software in the release; we install them in
    case someone wants to build some very old software.

  * packs/build/xconfig

    These templates are used for building software which uses X-style
    Imakefiles.  They may need periodic updating as new versions of X
    are released.  These templates contain a number of hacks, mostly
    because the imake model isn't really adequate for dealing with
    third-party software and local site customizations.

Release notes
-------------

There are two kinds of release notes, the system release notes and the
user release notes.  The system release notes are more comprehensive,
assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions towards the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

  * Crash and burn

    This phase is for rel-eng internal testing.  The release engineer
    needs to make sure that the current source base works well enough
    for testers to use it and find bugs.  For crash and burn to begin,
    the operating system support person for each platform must provide
    a way to install or update a machine to the new version of the
    operating system for that platform.

    Each platform needs a build tree and system packs volume.  The
    build tree should be mounted in
    /afs/dev.mit.edu/project/release/<version>/build/<sysname>.  The
    system packs volume should be mounted in
    /afs/dev.mit.edu/system/<sysname>/srvd-<version>.

    Each platform needs a new-release build machine to generate system
    packs to test.  Set it up according to the directions in "Build
    machines" above.

    To do a full build for release testing:

    # Get tickets as builder and ssh to the new-release build machine
    rm -rf /rel/.srvd/* /rel/.srvd/.??*
    rm -rf /rel/build/* /rel/build/.??*
    chroot /rel sh /mit/source-X.Y/packs/build/build.sh -l &

    (It can be useful to run the ssh to the build machine inside a
    screen session so you don't have to log out of the build machine
    until the build is finished.)

    The crash and burn machines should be identified and used to test
    the update (and install, if possible).  System packs may be
    regenerated at will.  The system packs volume does not need any
    replication.

    Before the transition from crash and burn to alpha, the release
    engineer should do a sanity check on the new packs by comparing a
    file listing of the new packs to a file listing of the previous
    release's packs.  The release engineer should also check the list
    of configuration files for each platform (in
    packs/update/platform/*/configfiles) and make sure that any
    configuration files which have changed are listed as changed in
    the version script.  Finally, the release should be checked to
    make sure it won't overflow partitions on any client machines.

    A note on the wash: it is not especially important that the wash
    be running during the release cycle, but currently the wash can
    run on the new release build machine without interfering with the
    build functions of the machine.  So after updating the wash
    machine to the new OS for new release builds, the release engineer
    can set up the wash right away.

  * Alpha

    The alpha phase is for internal testing by the release team.
    System packs may still be regenerated at will, but the system
    packs volume (and os volume) should be read-only so it can be
    updated by a vos release.  Changes to the packs do not need to be
    propagated in patch releases; testers are expected to be able to
    ensure consistency by forcing repeat updates or reinstalling their
    machines.

    System release notes should be prepared during this phase.

    Before the transition from alpha to beta, doc/third-party should
    be checked to see if miscellaneous third-party files (the ones not
    under the "third" hierarchy) should be updated.

  * Beta

    The beta phase involves outside testers.  System packs and os
    volumes should be replicated on multiple servers, and permissions
    should be set to avoid accidental changes (traditionally this
    means giving write access to system:packs, a normally empty
    group).  Changes to the packs must be propagated by patch
    releases.

    User release notes should be prepared during this phase.  Ideally,
    no new features should be committed to the source tree during the
    beta phase.

    For the transition from beta to early:

    - Prepare a release branch with a name of the form athena-8_1.
      Tag it with athena-8_1-early.

    - Create a volume with a mountpoint of the form
      /afs/dev.mit.edu/source/src-8.1 and check out a tree on the
      branch there.  Set the permissions by doing an fs copyacl from
      an older source tree before the checkout, and run
      CVSROOT/afs-protections.sh after the checkout.  Copy over the
      .rconf file from the src-current directory.  Have a filsys entry
      of the form source-8.1 created for the new tree.

    - Attach and lock the branch source tree on each build machine.

    - Do a final full build of the release from the branch source
      tree.

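The branching and tagging steps above might look like the following
for a hypothetical 8.1 release (a sketch; the module name "source" and
exact invocations are illustrative, not authoritative):

```shell
# Create the release branch from the mainline, then place the early
# tag on the branch.
cvs rtag -b athena-8_1 source
cvs rtag -r athena-8_1 athena-8_1-early source

# Check out a tree on the branch in the new source volume.
cd /afs/dev.mit.edu/source/src-8.1
cvs co -r athena-8_1 source
```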
  * Early

    The early release involves more outside testers and some cluster
    machines.  The release should be considered ready for public
    consumption.

    The release branch should be tagged with a name of the form
    athena-8_1-early.

  * Release

    The release branch should be tagged with a name of the form
    athena-8_1-release.

    Once the release has gone public, the current-release machines
    should be updated to the release and set up as the build machines
    for the now-current release.  Remove the /build and /.srvd
    symlinks on the new-release build machines, and make sure the wash
    is running on them if you didn't do so back in the crash and burn
    phase.

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

  consult
  games
  gnu
  graphics
  outland
  sipb
  tcl
  watchmaker
  windowmanagers
  /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps their own list.

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps in performing a patch
release are:

  * Check in the changes on the mainline (if they apply) and on the
    release branch, and update the relevant sections of the source
    tree in /mit/source-<version>.

  * If the update needs to do anything other than track against the
    system packs, you must prepare a version script which deals with
    any transition issues, specifies whether to track the OS volume,
    specifies whether to deal with a kernel update, and specifies
    which, if any, configuration files need to be updated.  See the
    update script (packs/update/do-update.sh) for details.  See
    packs/build/update/os/*/configfiles for a list of configuration
    files for a given platform.  The version script should be checked
    in on the mainline and on the release branch.

  * Do the remainder of the steps as "builder" on the build machine.
    Probably the best way is to get Kerberos tickets as "builder" and
    ssh to the build machine.

  * Make sure to add symlinks under the /build tree for any files you
    have added.  Note that you probably added a build script if the
    update needs to do anything other than track against the system
    packs.

  * In the build tree, bump the version number in packs/build/version
    (the symlink should be broken for this file to avoid having to
    change it in the source tree).

  * If you are going to need to update binaries that users run from
    the packs, go into the packs and move (don't copy) them into a
    .deleted directory at the root of the packs.  This is especially
    important for binaries like emacs and dash which people run for
    long periods of time, to avoid making the running processes dump
    core when the packs are released.

  * Update the read-write volume of the packs to reflect the changes
    you've made.  You can use the build.sh script to build and install
    specific packages, or you can use the do.sh script to build the
    package and then install specific files (cutting and pasting from
    the output of "gmake -n install DESTDIR=/srvd" is the safest way);
    updating the fewest number of files is preferable.  Remember to
    install the version script.

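The dry-run trick mentioned above can be sketched as follows (an
illustration only; run it from the package's build directory):

```shell
# Preview exactly which files an install would touch, without
# modifying anything.
gmake -n install DESTDIR=/srvd

# Then install only the files that actually changed onto the
# read-write packs, keeping the set of updated files as small as
# possible.
```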
  * Use the build.sh script to build and install packs/build/finish.
    This will fix ownerships and update the track lists and the like.

  * It's a good idea to test the update from the read-write packs by
    symlinking the read-write packs to /srvd on a test machine and
    taking the update.  Note that when the machine comes back up with
    the new version, it will probably re-attach the read-write packs,
    so you may have to re-make the symlink if you want to test stuff
    that's on the packs.

  * At some non-offensive time, release the packs in the dev cell.

  * Send mail to rel-eng saying that the patch release went out, and
    what was in it.  (You can find many example pieces of mail in the
    discuss archive.)  Include instructions explaining how to
    propagate the release to the athena cell.

Third-party pull-ups for patch releases
---------------------------------------

In CVS, unmodified imported files have the default branch set to
1.1.1.  When a new version is imported, such files need no merging;
the new version on the vendor branch automatically becomes the current
version of the file.  This optimization reduces storage requirements
and makes the merge step of an import faster and less error-prone, at
the cost of rendering a third-party module inconsistent between an
import and a merge.

Due to an apparent bug in CVS (as of version 1.11.2), a commit to a
branch may reset the default branch of an unmodified imported file as
if the commit were to the trunk.  The practical effect for us is that
pulling up versions of third-party packages to a release branch
results in many files being erroneously shifted from the unmodified
category to the modified category.

To account for this problem as well as other corner cases, use the
following procedure to pull up third-party packages for a patch
release:

  cvs co -r athena-X_Y third/module
  cd third/module
  cvs update -d
  cvs update -j athena-X_Y -j HEAD
  cvs ci
  cd /afs/dev.mit.edu/source/repository/third/module
  find . -name "*,v" -print0 | xargs -0 sh /tmp/vend.sh

Where /tmp/vend.sh is:

  #!/bin/sh

  # For each RCS file whose head revision is 1.1, with no default
  # branch set and a vendor branch at 1.1.1, restore the vendor
  # branch as the default branch.
  for f; do
    if rlog -h "$f" | grep -q '^head: 1\.1$' && \
       rlog -h "$f" | grep -q '^branch:$' && \
       rlog -h "$f" | grep -q 'vendor: 1\.1\.1$'; then
      rcs -bvendor "$f"
    fi
  done

The find -print0 and xargs -0 flags are not available on the native
Solaris versions of find and xargs, so the final step may be best
performed under Linux.

Rel-eng machines
----------------

The machine running the wash update is equal-rites.mit.edu.

There are three rel-eng machines for each platform:

  * A current release build machine, for doing incremental updates to
    the last public release.  This machine may also be used by
    developers for building software.

  * A new release build machine, for building and doing incremental
    updates to releases which are still in testing.  This machine also
    performs the wash.  This machine may also be used by developers
    for building software, or if they want a snapshot of the new
    system packs to build things against.

  * A crash and burn machine, usually located in the release
    engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                       Sun       Linux

Current release build  maytag    kenmore
New release build      downy     snuggle
Crash and burn         pyramids  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

  * Washing machines: kenmore, whirlpool, ge, maytag
  * Laundry detergents: fab, calgon, era, cheer, woolite,
    tide, ultra-tide, purex
  * Bleaches: clorox, ajax
  * Fabric softeners: downy, final-touch, snuggle, bounce
  * Heavy machinery: steam-shovel, pile-driver, dump-truck,
    wrecking-ball, crane
  * Construction kits: lego, capsela, technics, k-nex, playdoh,
    construx
  * Construction materials: rebar, two-by-four, plywood,
    sheetrock
  * Heavy machinery companies: caterpillar, daewoo, john-deere,
    sumitomo
  * Buildings: empire-state, prudential, chrysler

Clusters
--------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

       Label: syslib     Data: dev-PLATFORMsys-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

       Label: syslib     Data: athena-PLATFORMsys-XY X.Y
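
As a concrete example, for a hypothetical 9.4 release on the linux
platform (illustrative values only), the records would be:

```
# In beta-linux during beta (testing flag set):
Label: syslib     Data: dev-linuxsys-94 9.4 t

# In public-linux at the public release:
Label: syslib     Data: athena-linuxsys-94 9.4
```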