source: trunk/doc/maintenance @ 22219

Revision 22219, 23.6 KB checked in by ghudson, 19 years ago
Add notes about mount entries for /etc/mnttab on Solaris build machines.

This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

  Mailing lists
  Permissions
  Build machines
  The wash process
  Imake templates
  Release notes
  Release cycles
  Patch releases
  Third-party pull-ups for patch releases
  Rel-eng machines
  Clusters

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

  * source-developers

    For discussion of the policy and day-to-day maintenance of the
    repository.  This is a public list, and there is a public discuss
    archive on menelaus.

  * source-reviewers

    For review of changes to be checked into the repository.  To be a
    member of this mailing list, you must have read access to the
    non-public parts of the source tree, but you do not need to be a
    staff member.  There is a non-public discuss archive on menelaus.

  * source-commits

    This mailing list receives commit logs for all commits to the
    repository.  This is a public mailing list.  There is a public
    discuss archive on menelaus.

  * source-diffs

    This mailing list receives commit logs with diffs for all commits
    to the repository.  To be on this mailing list, you must have read
    access to the non-public parts of the source tree.  There is no
    discuss archive for this list.

  * source-wash

    This mailing list receives mail when the wash process blows out.
    This is a public mailing list.  There is no discuss archive for
    this list.

  * rel-eng

    The release engineering mailing list.  Mail goes here about patch
    releases and other release details.  There is a public archive on
    menelaus.

  * release-team

    The mailing list for the release team, which sets policy for
    releases.  There is a public archive on menelaus, with the name
    "release-77".

Permissions
-----------

Following are descriptions of the various groups found on the ACLs of
the source tree:

  * read:source
    read:staff

    These two groups have identical permissions in the repository, but
    read:source contains artificial constructs (the builder user and
    service principals) while read:staff contains people.  In the
    future, highly restricted source could have access for read:source
    and not read:staff.

    Both of these groups have read access to non-public areas of the
    source tree.

  * write:staff

    Contains developers with commit access to the source tree.  This
    group has write access to the repository, but not to the
    checked-out copy of the mainline (/mit/source).

  * write:update

    Contains the service principal responsible for updating
    /mit/source.  This group has write access to /mit/source but not
    to the repository.

  * adm:source

    This group has administrative access to the repository and to
    /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd" respectively.

Build machines
--------------

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we mount AFS inside the chrooted environments and make a symlink from
/afs to the place AFS is mounted.  Each build machine has two such
environments, one in /rel (for the release build) and one in /rel/wash
(for the wash).  The second environment has to be located within the
first, of course, so that AFS can be visible from both.

To set up a build machine, follow these instructions after installing:

  * Set the root password.
  * Put "builder rl" in /etc/athena/access.
  * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
    PUBLIC and AUTOUPDATE to false.
  * On Solaris, add a line "/afs - /rel/afs lofs - yes -" to
    /etc/vfstab, and similarly for /rel/wash/afs.  Mount /rel/afs and
    /rel/wash/afs.
  * On Solaris, add "/etc/mnttab - /rel/etc/mnttab lofs - yes -"
    to /etc/vfstab, and similarly for /rel/wash/etc/mnttab.  Mount
    /rel/etc/mnttab and /rel/wash/etc/mnttab.
  * On Linux, add a line "/afs /rel/afs none bind" to /etc/fstab, and
    similarly for /rel/wash/afs.
  * Run "/mit/source/packs/build/makeroot.sh /rel X.Y", where X.Y is
    the full release this build is for.
  * Run "/mit/source/packs/build/makeroot.sh /rel/wash".
  * Make a symlink from /rel/.srvd to the AFS srvd volume, if you're
    at that stage.
  * On Solaris, ensure that procfs is mounted on /rel/proc and
    /rel/wash/proc.  (A host of system tools fail if procfs is not
    mounted in the chroot environment.)  Add lines to /etc/vfstab to
    make this happen at boot.
  * On Solaris, install the Sun compiler locally.  Run:
      cd /afs/dev.mit.edu/reference/sunpro8/packages
      pkgadd -R /rel -a /usr/athena/lib/update/noask \
        `cat ../installed-packages`
    and follow the directions in
    /afs/dev.mit.edu/reference/sunpro8/README.  Repeat for /rel/wash.
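The mount-table lines in the steps above follow a fixed pattern for the two chroots, so they can be generated rather than typed.  This is a minimal sketch, not part of packs/build: the function names mount_entries and add_missing are hypothetical, it assumes a POSIX sh and a grep that supports -q, -x and -F (on Solaris this may mean /usr/xpg4/bin/grep), and it does not cover the procfs entries, whose form is not given here.

```shell
#!/bin/sh
# mount_entries OS: print the mount-table lines described above for the
# /rel and /rel/wash chroots.  OS is the output of "uname -s".
# procfs entries are deliberately not generated.
mount_entries() {
  for root in /rel /rel/wash; do
    case $1 in
      SunOS)   # /etc/vfstab entries (loopback mounts)
        echo "/afs - $root/afs lofs - yes -"
        echo "/etc/mnttab - $root/etc/mnttab lofs - yes -"
        ;;
      Linux)   # /etc/fstab entries (bind mounts)
        echo "/afs $root/afs none bind"
        ;;
      *)
        echo "unsupported OS: $1" >&2
        return 1
        ;;
    esac
  done
}

# add_missing FILE: append each line read from stdin to FILE unless FILE
# already contains that exact line, so the setup can be rerun safely.
add_missing() {
  while IFS= read -r line; do
    grep -qxF -- "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
  done
}
```

For example, on a Linux build machine, as root:

  mount_entries "$(uname -s)" | add_missing /etc/fstab

Appending only missing lines keeps a rerun from duplicating entries; the mounts themselves still have to be done by hand as described above.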

Right now we have an issue doing a complete build of the source tree
from scratch, because programs which use gdk-pixbuf-csource at build
time (like gnome-panel) require /etc/athena/gtk-2.0/gdk-pixbuf.loaders
to be set up.  Since we lack machinery to deal with that kind of
problem, the workaround is to run the build at least as far as
third/gtk2 and then run, from within the chrooted environment:

  mkdir -p /etc/athena/gtk-2.0
  gdk-pixbuf-query-loaders > /etc/athena/gtk-2.0/gdk-pixbuf.loaders
  gtk-query-immodules-2.0 > /etc/athena/gtk-2.0/gtk.immodules
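After running the workaround, it may be worth confirming from outside the chroot that both generated files are in place before resuming the build.  A minimal sketch, assuming the chroot root is passed as an argument (for example /rel or /rel/wash); the function name gtk_config_ok is hypothetical, and a non-empty file is only a rough sign that the query commands succeeded.

```shell
#!/bin/sh
# gtk_config_ok ROOT: succeed only if both files produced by the
# workaround exist and are non-empty under the chroot rooted at ROOT.
gtk_config_ok() {
  dir=$1/etc/athena/gtk-2.0
  for f in gdk-pixbuf.loaders gtk.immodules; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
}
```

For example, "gtk_config_ok /rel/wash" before restarting a wash build that has already passed third/gtk2.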

The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

  * Each night at midnight, a machine performs a cvs update of the
    checked-out tree in /afs/dev.mit.edu/source/src-current.  If the
    cvs update fails, the update script sends mail to
    source-wash@mit.edu.  This machine is on read:source and
    write:update.

  * Each night at 4:30am, a machine of each architecture performs a
    build of the tree in /rel/wash/build, using the /rel/wash chroot
    environment.  If the build fails, the wash script copies the log
    of the failed build into AFS and sends mail to source-wash@mit.edu
    with the last few lines of the log.

Source for the wash scripts lives in /afs/dev.mit.edu/service/wash.
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.
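The build half of the wash described above comes down to: record the start time, run the build with its output captured, record the end time, and report the tail of the log on failure.  The following is a hypothetical sketch of that shape only, not the script in /afs/dev.mit.edu/service/wash; the name wash_run and its arguments are assumptions, and it prints the tail of the log where the real script mails it to source-wash@mit.edu.

```shell
#!/bin/sh
# wash_run STATUS LOG COMMAND...: append start/end times to STATUS, run
# COMMAND with all of its output in LOG, and on failure print the last
# lines of LOG (the real wash mails them instead) and return nonzero.
wash_run() {
  status=$1
  log=$2
  shift 2
  echo "start $(date)" >> "$status"
  if "$@" > "$log" 2>&1; then
    echo "end $(date) ok" >> "$status"
    return 0
  fi
  echo "end $(date) failed" >> "$status"
  tail -n 20 "$log"
  return 1
}
```

Here STATUS would be the machine's file under /afs/dev.mit.edu/service/wash/status, and putting LOG in AFS keeps the log of a failed build available after the run.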

To set up the source update on a machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in write:update.

To set up the wash on a build machine:

  * Ensure that it is in the set of machines installed onto by
    /afs/dev.mit.edu/service/wash/inst, and run that script to install
    the wash scripts onto that machine.

  * Set up the cron job on the machine according to
    /afs/dev.mit.edu/service/wash/README.

  * Ensure that the machine has a host key.

  * Ensure that rcmd.machinename has a PTS identity in the dev cell.

  * Ensure that rcmd.machinename is in read:source.

  * Ensure that
    /afs/dev.mit.edu/service/wash/status/machinename.mit.edu exists
    and that rcmd.machinename has write access to it.
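The group-membership steps above can be checked by filtering a group listing.  This sketch assumes the listing format printed by the AFS pts membership command (one header line followed by one indented member per line); the function name in_group is hypothetical, and the format assumption should be checked against the local pts before relying on it.

```shell
#!/bin/sh
# in_group NAME: read a group listing on stdin and succeed if NAME is
# listed as a member.  Drops the first (header) line and leading
# whitespace, then looks for an exact, whole-line match.
in_group() {
  sed 1d | sed 's/^[[:space:]]*//' | grep -qxF -- "$1"
}
```

For example:

  pts membership write:update -cell dev.mit.edu | in_group rcmd.machinename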

Imake templates
---------------

We don't like imake, but we have two sets of imake templates:

  * packs/build/config

    These templates are the legacy Athena build system.  They are no
    longer used by any software in the release; we install them in
    case someone wants to build some very old software.

  * packs/build/xconfig

    These templates are used for building software which uses X-style
    Imakefiles.  They may need periodic updating as new versions of X
    are released.  These templates are full of hacks, mostly
    because the imake model isn't really adequate for dealing with
    third-party software and local site customizations.

Release notes
-------------

There are two kinds of release notes, the system release notes and the
user release notes.  The system release notes are more comprehensive
and assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions towards the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

  * Crash and burn

    This phase is for rel-eng internal testing.  The release engineer
    needs to make sure that the current source base works well enough
    for testers to use it and find bugs.  For crash and burn to begin,
    the operating system support person for each platform must provide
    a way to install or update a machine to the new version of the
    operating system for that platform.

    Each platform needs a build tree and system packs volume.  The
    build tree should be mounted in
    /afs/dev.mit.edu/project/release/<version>/build/<sysname>.  The
    system packs volume should be mounted in
    /afs/dev.mit.edu/system/<sysname>/srvd-<version>.

    Each platform needs a new-release build machine to generate system
    packs to test.  Set it up according to the directions in "Build
    machines" above.

    To do a full build for release testing:

    # Get tickets as builder and ssh to the new-release build machine
    rm -rf /rel/.srvd/* /rel/.srvd/.??*
    rm -rf /rel/build/* /rel/build/.??*
    chroot /rel sh /mit/source-X.Y/packs/build/build.sh -l &

    (It can be useful to run the ssh to the build machine inside a
    screen session, so that you can detach from the session, rather
    than logging out of the build machine, while the build runs.)

    The crash and burn machines should be identified and used to test
    the update (and install, if possible).  System packs may be
    regenerated at will.  The system packs volume does not need any
    replication.

    Before the transition from crash and burn to alpha, the release
    engineer should do a sanity check on the new packs by comparing a
    file listing of the new packs to a file listing of the previous
    release's packs.  The release engineer should also check the list
    of configuration files for each platform (in
    packs/update/platform/*/configfiles) and make sure that any
    configuration files which have changed are listed as changed in
    the version script.  Finally, the release should be checked to
    make sure it won't overflow partitions on any client machines.

    A note on the wash: it is not especially important that the wash
    be running during the release cycle, but currently the wash can
    run on the new release build machine without interfering with the
    build functions of the machine.  So after updating the wash
    machine to the new OS for new release builds, the release engineer
    can set up the wash right away.

  * Alpha

    The alpha phase is for internal testing by the release team.
    System packs may still be regenerated at will, but the system
    packs volume (and os volume) should be read-only so it can be
    updated by a vos release.  Changes to the packs do not need to be
    propagated in patch releases; testers are expected to be able to
    ensure consistency by forcing repeat updates or reinstalling their
    machines.

    System release notes should be prepared during this phase.

    Before the transition from alpha to beta, doc/third-party should
    be checked to see if miscellaneous third-party files (the ones not
    under the "third" hierarchy) should be updated.

  * Beta

    The beta phase involves outside testers.  System packs and os
    volumes should be replicated on multiple servers, and permissions
    should be set to avoid accidental changes (traditionally this
    means giving write access to system:packs, a normally empty
    group).  Changes to the packs must be propagated by patch
    releases.

    User release notes should be prepared during this phase.  Ideally,
    no new features should be committed to the source tree during the
    beta phase.

    For the transition from beta to early:

    - Prepare a release branch with a name of the form athena-8_1.
      Tag it with athena-8_1-early.

    - Create a volume with a mountpoint of the form
      /afs/dev.mit.edu/source/src-8.1 and check out a tree on the
      branch there.  Set the permissions by doing an fs copyacl from
      an older source tree before the checkout, and run
      CVSROOT/afs-protections.sh after the checkout.  Copy over the
      .rconf file from the src-current directory.  Have a filsys entry
      of the form source-8.1 created for the new tree.

    - Attach and lock the branch source tree on each build machine.

    - Do a final full build of the release from the branch source
      tree.

  * Early

    The early release involves more outside testers and some cluster
    machines.  The release should be considered ready for public
    consumption.

    The release branch should be tagged with a name of the form
    athena-8_1-early.

  * Release

    The release branch should be tagged with a name of the form
    athena-8_1-release.

    Once the release has gone public, the current-release machines
    should be updated to the release and set up as the build machines
    for the now-current release.  Remove the /build and /.srvd
    symlinks on the new-release build machines, and make sure the wash
    is running on them if you didn't do so back in the crash and burn
    phase.
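The crash-and-burn-to-alpha sanity check compares a file listing of the new packs with one of the previous release's packs.  One way to produce that comparison is sketched below, assuming both packs trees are readable from the current machine; compare_packs is a hypothetical name, and it reports only added and removed paths, not changed contents or sizes.

```shell
#!/bin/sh
# compare_packs OLD NEW: list paths that exist in only one of two packs
# trees.  Output lines are "- PATH" (only in OLD) or "+ PATH" (only in NEW).
compare_packs() {
  tmp=${TMPDIR:-/tmp}/compare_packs.$$
  # Listings relative to each tree's root, sorted in a fixed collation so
  # that comm sees identically ordered input.
  (cd "$1" && find . -print | LC_ALL=C sort) > "$tmp.old" &&
  (cd "$2" && find . -print | LC_ALL=C sort) > "$tmp.new" || {
    rm -f "$tmp.old" "$tmp.new"
    return 1
  }
  LC_ALL=C comm -23 "$tmp.old" "$tmp.new" | sed 's/^/- /'
  LC_ALL=C comm -13 "$tmp.old" "$tmp.new" | sed 's/^/+ /'
  rm -f "$tmp.old" "$tmp.new"
}
```

If both releases' packs are mounted following the convention above, a call might look like:

  compare_packs /afs/dev.mit.edu/system/<sysname>/srvd-<old version> \
    /afs/dev.mit.edu/system/<sysname>/srvd-<new version>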

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

  consult
  games
  gnu
  graphics
  outland
  sipb
  tcl
  watchmaker
  windowmanagers
  /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps their own list.
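Before announcing an OS upgrade, it can help to list which lockers still lack the compatibility entry.  A minimal sketch, assuming the lockers are attached under /mit (overridable through an MIT environment variable, a hypothetical convention) and that the compatibility entry sits directly in each locker's arch directory under the new sysname; missing_compat is a hypothetical name, and the function only reports, it does not create symlinks.

```shell
#!/bin/sh
# missing_compat SYSNAME LOCKER...: print each locker whose arch directory
# has no entry named SYSNAME.  Bare locker names are looked up under
# ${MIT:-/mit}; absolute paths are used as given.
missing_compat() {
  sysname=$1
  shift
  for locker in "$@"; do
    case $locker in
      /*) dir=$locker ;;
      *)  dir=${MIT:-/mit}/$locker ;;
    esac
    # -e is false for a dangling symlink, so also accept -h.
    if [ ! -e "$dir/arch/$sysname" ] && [ ! -h "$dir/arch/$sysname" ]; then
      echo "$locker"
    fi
  done
}
```

For example, with NEWSYS standing for the new platform's sysname:

  missing_compat NEWSYS consult games gnu graphics outland sipb tcl \
    watchmaker windowmanagers /afs/sipb/project/tcsh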

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps to performing a patch
release are:

  * Check in the changes on the mainline (if they apply) and on the
    release branch and update the relevant sections of the source tree
    in /mit/source-<version>.

  * If the update needs to do anything other than track against the
    system packs, you must prepare a version script which deals with
    any transition issues, specifies whether to track the OS volume,
    specifies whether to deal with a kernel update, and specifies
    which if any configuration files need to be updated.  See the
    update script (packs/update/do-update.sh) for details.  See
    packs/build/update/os/*/configfiles for a list of configuration
    files for a given platform.  The version script should be checked
    in on the mainline and on the release branch.

  * Do the remainder of the steps as "builder" on the build machine.
    Probably the best way is to get Kerberos tickets as "builder" and
    ssh to the build machine.

  * Make sure to add symlinks under the /build tree for any files you
    have added.  Note that you probably added a version script if the
    update needs to do anything other than track against the system
    packs.

  * In the build tree, bump the version number in packs/build/version
    (the symlink should be broken for this file to avoid having to
    change it in the source tree).

  * If you are going to need to update binaries that users run from
    the packs, go into the packs and move (don't copy) them into a
    .deleted directory at the root of the packs.  This is especially
    important for binaries like emacs and dash which people run for
    long periods of time, to avoid making the running processes dump
    core when the packs are released.

  * Update the read-write volume of the packs to reflect the changes
    you've made.  You can use the build.sh script to build and install
    specific packages, or you can use the do.sh script to build the
    package and then install specific files (cutting and pasting from
    the output of "gmake -n install DESTDIR=/srvd" is the safest way);
    updating as few files as possible is preferable.  Remember to
    install the version script.

  * Use the build.sh script to build and install packs/build/finish.
    This will fix ownerships and update the track lists and the like.

  * It's a good idea to test the update from the read-write packs by
    symlinking the read-write packs to /srvd on a test machine and
    taking the update.  Note that when the machine comes back up with
    the new version, it will probably re-attach the regular (read-only)
    packs, so you may have to re-make the symlink if you want to test
    stuff that's on the packs.

  * At a time when it will not disrupt users, release the packs in
    the dev cell.

  * Send mail to rel-eng saying that the patch release went out, and
    what was in it.  (You can find many example pieces of mail in the
    discuss archive.)  Include instructions explaining how to
    propagate the release to the athena cell.
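Two of the manual steps above, breaking the packs/build/version symlink and moving binaries into .deleted, are easy to get subtly wrong by hand.  The helpers below are a hypothetical sketch, not part of packs/build: break_link replaces a symlink with a copy of its target, and retire moves (never copies) files into PACKS/.deleted.  Keeping each file's relative directory under .deleted is an assumption made here to avoid name collisions; the text above only says the files go into a .deleted directory at the root of the packs.

```shell
#!/bin/sh
# break_link FILE: replace the symlink FILE with a private copy of its
# target, so FILE can be edited without touching the source tree.
break_link() {
  [ -h "$1" ] || return 0          # already a regular file; nothing to do
  cp "$1" "$1.tmp.$$" && rm "$1" && mv "$1.tmp.$$" "$1"
}

# retire PACKS PATH...: move each PATH (relative to PACKS) into
# PACKS/.deleted, keeping its relative directory.  mv, not cp, as the
# step above requires.
retire() {
  packs=$1
  shift
  for rel in "$@"; do
    dest=$packs/.deleted/$rel
    mkdir -p "$(dirname "$dest")" && mv "$packs/$rel" "$dest" || return 1
  done
}
```

For example, "break_link packs/build/version" from the top of the build tree, and "retire RW_PACKS bin/emacs", where RW_PACKS stands for the read-write packs and bin/emacs for the binary's path within them.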

Third-party pull-ups for patch releases
---------------------------------------

In CVS, unmodified imported files have the default branch set to
1.1.1.  When a new version is imported, such files need no merging;
the new version on the vendor branch automatically becomes the current
version of the file.  This optimization reduces storage requirements
and makes the merge step of an import faster and less error-prone, at
the cost of rendering a third-party module inconsistent between an
import and a merge.

Due to an apparent bug in CVS (as of version 1.11.2), a commit to a
branch may reset the default branch of an unmodified imported file as
if the commit were to the trunk.  The practical effect for us is that
pulling up versions of third-party packages to a release branch
results in many files being erroneously shifted from the unmodified
category to the modified category.

To account for this problem as well as other corner cases, use the
following procedure to pull up third-party packages for a patch
release:

  cvs co -r athena-X_Y third/module
  cd third/module
  cvs update -d
  cvs update -j athena-X_Y -j HEAD
  cvs ci
  cd /afs/dev.mit.edu/source/repository/third/module
  find . -name "*,v" -print0 | xargs -0 sh /tmp/vend.sh

Where /tmp/vend.sh is:

  #!/bin/sh

  for f; do
    if rlog -h "$f" | grep -q '^head: 1\.1$' && \
       rlog -h "$f" | grep -q '^branch:$' && \
       rlog -h "$f" | grep -q 'vendor: 1\.1\.1$'; then
      rcs -bvendor "$f"
    fi
  done

The find -print0 and xargs -0 flags are not available on the native
Solaris versions of find and xargs, so the final step may be best
performed under Linux.
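Where the local find implements the POSIX "-exec ... {} +" form, the last step of the procedure can avoid -print0 and xargs -0 altogether: find then passes the file names to the script directly, in batches, without any whitespace splitting.  A minimal sketch; run_vend is a hypothetical name, and the script path must be absolute because the function changes directory first.

```shell
#!/bin/sh
# run_vend DIR SCRIPT: run "sh SCRIPT" on every RCS ",v" file under DIR,
# passing file names in batches via find -exec ... {} +.
# SCRIPT must be an absolute path, since the subshell cds into DIR.
run_vend() {
  (cd "$1" && find . -name '*,v' -exec sh "$2" {} +)
}
```

For example:

  run_vend /afs/dev.mit.edu/source/repository/third/module /tmp/vend.sh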

Rel-eng machines
----------------

The machine running the wash update is equal-rites.mit.edu.

There are three rel-eng machines for each platform:

  * A current release build machine, for doing incremental updates to
    the last public release.  This machine may also be used by
    developers for building software.

  * A new release build machine, for building and doing incremental
    updates to releases which are still in testing.  This machine also
    performs the wash.  This machine may also be used by developers
    for building software, or if they want a snapshot of the new
    system packs to build things against.

  * A crash and burn machine, usually located in the release
    engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                       Sun       Linux

Current release build  maytag    kenmore
New release build      downy     snuggle
Crash and burn         pyramids  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

  * Washing machines: kenmore, whirlpool, ge, maytag
  * Laundry detergents: fab, calgon, era, cheer, woolite,
    tide, ultra-tide, purex
  * Bleaches: clorox, ajax
  * Fabric softeners: downy, final-touch, snuggle, bounce
  * Heavy machinery: steam-shovel, pile-driver, dump-truck,
    wrecking-ball, crane
  * Construction kits: lego, capsela, technics, k-nex, playdoh,
    construx
  * Construction materials: rebar, two-by-four, plywood,
    sheetrock
  * Heavy machinery companies: caterpillar, daewoo, john-deere,
    sumitomo
  * Buildings: empire-state, prudential, chrysler

Clusters
--------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

       Label: syslib     Data: dev-PLATFORMsys-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

       Label: syslib     Data: athena-PLATFORMsys-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.
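The syslib records above follow mechanically from the phase, the platform's machtype name, and the release number.  This sketch prints the record to add to PHASE-PLATFORM when a phase begins, assuming XY is formed by dropping the dot from X.Y; syslib_data is a hypothetical name, and the sketch does not cover removing the 't' from the crash, alpha, and beta records at the early phase.

```shell
#!/bin/sh
# syslib_data PHASE PLATFORM X.Y: print "CLUSTER: syslib DATA" for the
# record added to cluster PHASE-PLATFORM when that phase begins.
syslib_data() {
  phase=$1 platform=$2 version=$3
  xy=$(echo "$version" | tr -d .)      # X.Y -> XY
  case $phase in
    crash|alpha|beta)                  # testing phases: dev cell, manual ("t")
      data="dev-${platform}sys-$xy $version t" ;;
    early|public)                      # athena cell, taken by AUTOUPDATE machines
      data="athena-${platform}sys-$xy $version" ;;
    *)
      echo "unknown phase: $phase" >&2
      return 1 ;;
  esac
  echo "$phase-$platform: syslib $data"
}
```

With "linux" as a hypothetical machtype name, "syslib_data beta linux 9.4" prints "beta-linux: syslib dev-linuxsys-94 9.4 t".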