This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

        Mailing lists
        Permissions
        Build machines
        The wash process
        Imake templates
        Release notes
        Release cycles
        Patch releases
        Rel-eng machines
        Clusters

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

        * source-developers

                For discussion of the policy and day-to-day
                maintenance of the repository.  This is a public list,
                and there is a public discuss archive on menelaus.

        * source-reviewers

                For review of changes to be checked into the
                repository.  To be a member of this mailing list, you
                must have read access to the non-public parts of the
                source tree, but you do not need to be a staff member.
                There is a non-public discuss archive on menelaus.

        * source-commits

                This mailing list receives commit logs for all
                commits to the repository.  This is a public mailing
                list.  There is a public discuss archive on menelaus.

        * source-diffs

                This mailing list receives commit logs with diffs for
                all commits to the repository.  To be on this mailing
                list, you must have read access to the non-public
                parts of the source tree.  There is no discuss archive
                for this list.

        * source-wash

                This mailing list receives mail when the wash process
                fails.  This is a public mailing list.  There is no
                discuss archive for this list.

        * rel-eng

                The release engineering mailing list.  Mail goes here
                about patch releases and other release details.  There
                is a public archive on menelaus.

        * release-team

                The mailing list for the release team, which sets
                policy for releases.  There is a public archive on
                menelaus (currently, it has the name "release-77").

Permissions
-----------

Following are descriptions of the various groups found on the acls of
the source tree:

        * read:source
          read:staff

                These two groups have identical permissions in the
                repository, but read:source contains artificial
                constructs (the builder user and service principals)
                while read:staff contains people.  In the future,
                highly restricted source could have access for
                read:source and not read:staff.

                Both of these groups have read access to non-public
                areas of the source tree.

        * write:staff

                Contains developers with commit access to the source
                tree.  This group has write access to the repository,
                but not to the checked-out copy of the mainline
                (/mit/source).

        * write:update

                Contains the service principal responsible for
                updating /mit/source.  This group has write access to
                /mit/source but not to the repository.

        * adm:source

                This group has administrative access to the repository
                and to /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".
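
A sketch of the expected invocations (the repository path here is an
assumption):

        # From the top level of the repository:
        cd /afs/dev.mit.edu/source/repository
        sh CVSROOT/afs-protections.sh repository

        # From the top level of a checked-out working directory:
        cd /mit/source
        sh CVSROOT/afs-protections.sh wd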

Build machines
--------------

We do release builds in a chrooted environment to avoid damaging the
machines we are building on.  So that builds can have access to AFS,
we mount AFS inside the chrooted environments and make a symlink from
/afs to the place AFS is mounted.  Each build machine has two such
environments, one in /rel (for the release build) and one in /rel/wash
(for the wash).  The second environment has to be located within the
first, of course, so that AFS can be visible from both.

To set up a build machine, follow these instructions after installing
(a consolidated command sketch follows the list):

        * Set the root password.
        * Put "builder rl" in /etc/athena/access.
        * In /etc/athena/rc.conf, set SSHD and ACCESSON to true.  Set
          AFSADJUST, PUBLIC, and AUTOUPDATE to false.
        * Create /rel/wash/afs and parents.
        * Edit /usr/vice/etc/cacheinfo and change the AFS mountpoint
          from "/afs" to "/rel/wash/afs".
        * Reboot.  (Remote access daemons should work without AFS,
          more or less.)
        * Create symlinks /afs -> rel/wash/afs and /rel/afs ->
          wash/afs.
        * Run "/mit/source/packs/build/makeroot.sh /rel X.Y", where
          X.Y is the full release this build is for.
        * Run "/mit/source/packs/build/makeroot.sh /rel/wash".
        * Make a symlink from /rel/.srvd to the AFS srvd volume, and
          from /rel/build to the AFS build volume.  (These steps can
          be omitted if the release cycle hasn't progressed far
          enough for those volumes to exist.)
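
A consolidated sketch of the steps above, assuming a hypothetical
release 9.0 (the volume paths follow the conventions described under
"Release cycles" below):

        mkdir -p /rel/wash/afs
        # ... edit /usr/vice/etc/cacheinfo to mount AFS on
        # /rel/wash/afs, set up /etc/athena/access and rc.conf as
        # described above, and reboot; then:
        ln -s rel/wash/afs /afs
        ln -s wash/afs /rel/afs
        /mit/source/packs/build/makeroot.sh /rel 9.0
        /mit/source/packs/build/makeroot.sh /rel/wash
        ln -s /afs/dev.mit.edu/system/<sysname>/srvd-9.0 /rel/.srvd
        ln -s /afs/dev.mit.edu/project/release/9.0/build/<sysname> /rel/build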

The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

        * Each night at midnight, a machine performs a cvs update of
          the checked-out tree in /afs/dev.mit.edu/source/src-current.
          If the cvs update fails, the update script sends mail to
          source-wash@mit.edu.  This machine is on read:source and
          write:update.

        * Each night at 4:30am, a machine of each architecture
          performs a build of the tree into /var/srvd.new, using the
          build directory /var/build.  If the build fails, the wash
          script copies the log of the failed build into AFS and sends
          mail to source-wash@mit.edu with the last few lines of the
          log.  If the build succeeds, the wash script moves
          /var/srvd.new to /var/srvd, so that /var/srvd is always the
          last successful build of the source tree.

        * Each Sunday at 1:00am, the wash machines make a copy of
          their last successful builds into a "srvd-current" directory
          in AFS.  The copy is done without system:administrator
          privileges, so the file permissions on srvd-current are all
          wrong, but the current srvd is useful for development work.

The source for the wash scripts lives in /afs/dev.mit.edu/service/wash;
the scripts are installed in /usr/local on the wash machines.  Logs of
the start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.
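
A hypothetical crontab sketch matching the schedule above (the script
names are assumptions; the README in the wash service directory has
the real entries):

        # source update machine: cvs update at midnight
        0 0 * * *       /usr/local/bin/wash-update
        # wash build machines: rebuild at 4:30am
        30 4 * * *      /usr/local/bin/wash-build
        # wash build machines: copy the last good build out on Sundays
        0 1 * * 0       /usr/local/bin/wash-copy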

To set up the source update on a machine (a sketch of the PTS commands
follows the list):

        * Ensure that it is in the set of machines installed onto by
          /afs/dev.mit.edu/service/wash/inst, and run that script to
          install the wash scripts onto that machine.

        * Set up the cron job on the machine according to
          /afs/dev.mit.edu/service/wash/README.

        * Ensure that the machine has a host key or rcmd srvtab.

        * Ensure that rcmd.machinename has a PTS identity in the dev
          cell.

        * Ensure that rcmd.machinename is in write:update.
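
A sketch of the PTS steps, using standard AFS pts commands (the
hostname is a placeholder):

        pts createuser rcmd.machinename -cell dev.mit.edu
        pts adduser rcmd.machinename write:update -cell dev.mit.edu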

To set up the wash on a build machine (a sketch of the PTS and fs
commands follows the list):

        * Ensure that it is in the set of machines installed onto by
          /afs/dev.mit.edu/service/wash/inst, and run that script to
          install the wash scripts onto that machine.

        * Set up cron jobs on the machine according to
          /afs/dev.mit.edu/service/wash/README.

        * Ensure that the machine has a host key or rcmd srvtab.

        * Ensure that rcmd.machinename has a PTS identity in the dev
          cell.

        * Ensure that rcmd.machinename is in read:source.

        * Ensure that
          /afs/dev.mit.edu/service/wash/status/machinename.mit.edu
          exists and that rcmd.machinename has write access to it.

        * Ensure that /afs/dev.mit.edu/system/systemtype/srvd-current
          exists as a separate volume with adequate quota, and that
          rcmd.machinename has write access to it.
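
A sketch of the access steps, using standard AFS commands (hostname
and paths are placeholders):

        pts adduser rcmd.machinename read:source -cell dev.mit.edu
        fs setacl /afs/dev.mit.edu/service/wash/status/machinename.mit.edu \
                rcmd.machinename write
        fs setacl /afs/dev.mit.edu/system/systemtype/srvd-current \
                rcmd.machinename write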

Imake templates
---------------

We don't like imake, but we have two sets of imake templates:

        * packs/build/config

                These templates are the legacy Athena build system.
                They are no longer used by any software in the
                release; we install them in case someone wants to
                build some very old software.

        * packs/build/xconfig

                These templates are used for building software which
                uses X-style Imakefiles.  They may need periodic
                updating as new versions of X are released.  These
                templates are full of hacks, mostly because the imake
                model isn't really adequate for dealing with
                third-party software and local site customizations.

Release notes
-------------

There are two kinds of release notes, the system release notes and the
user release notes.  The system release notes are more comprehensive
and assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions towards the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

        * Crash and burn

          This phase is for rel-eng internal testing.  The release
          engineer needs to make sure that the current source base
          works well enough for testers to use it and find bugs.  For
          crash and burn to begin, the operating system support person
          for each platform must provide a way to install or update a
          machine to the new version of the operating system for that
          platform.

          Each platform needs a build tree and system packs volume.
          The build tree should be mounted in
          /afs/dev.mit.edu/project/release/<version>/build/<sysname>.
          The system packs volume should be mounted in
          /afs/dev.mit.edu/system/<sysname>/srvd-<version>.

          Each platform needs a new-release build machine to generate
          system packs to test.  Set it up according to the directions
          in "Build machines" above.

          To do a full build for release testing:

                # Get tickets as builder and ssh to the wash machine
                rm -rf /rel/.srvd/* /rel/.srvd/.??*
                rm -rf /rel/build/* /rel/build/.??*
                chroot /rel sh /mit/source-X.Y/packs/build/build.sh -l &

          (It can be useful to run the ssh to the build machine inside
          a screen session so you don't have to stay logged into the
          build machine until the build is finished.)

          The crash and burn machines should be identified and used to
          test the update (and install, if possible).  System packs
          may be regenerated at will.  The system packs volume does
          not need any replication.

          System release notes should be prepared during this phase.

          Before the transition from crash and burn to alpha, the
          release engineer should do a sanity check on the new packs
          by comparing a file listing of the new packs to a file
          listing of the previous release's packs.  The release
          engineer should also check the list of configuration files
          for each platform (in packs/update/platform/*/configfiles)
          and make sure that any configuration files which have
          changed are listed as changed in the version script.
          Finally, the release should be checked to make sure it won't
          overflow partitions on any client machines; currently, SGIs
          are not a problem (because they have one big partition) and
          the most restrictive sizes on Solaris clients are 27713K and
          51903K of usable space for the root and /usr partitions.

          A note on the wash: it is not especially important that the
          wash be running during the release cycle, but currently the
          wash can run on the new-release build machine without
          interfering with the build functions of the machine.  So
          after updating the wash machine to the new OS for
          new-release builds, the release engineer can set up the wash
          right away.

        * Alpha

          The alpha phase is for internal testing by the release team.
          System packs may still be regenerated at will, but the
          system packs volume (and os volume) should be read-only so
          it can be updated by a vos release.  Changes to the packs do
          not need to be propagated in patch releases; testers are
          expected to be able to ensure consistency by forcing repeat
          updates or reinstalling their machines.

          User release notes should be prepared during this phase.

          Before the transition from alpha to beta, doc/third-party
          should be checked to see if miscellaneous third-party files
          (the ones not under the "third" hierarchy) should be
          updated.

        * Beta

          The beta phase involves outside testers.  System packs and
          os volumes should be replicated on multiple servers, and
          permissions should be set to avoid accidental changes
          (traditionally this means giving write access to
          system:packs, a normally empty group).  Changes to the packs
          must be propagated by patch releases.

          User release notes should be essentially finished by the end
          of this phase.  System release notes may continue to be
          updated as bug fixes occur.  Ideally, no new features should
          be committed to the source tree during the beta phase.

          For the transition from beta to early (a command sketch
          follows this list):

                - Prepare a release branch with a name of the form
                  athena-8_1.  Tag it with athena-8_1-early.

                - Create a volume with a mountpoint of the form
                  /afs/dev.mit.edu/source/src-8.1 and check out a tree
                  on the branch there.  Set the permissions by doing
                  an fs copyacl from an older source tree before the
                  checkout, and run CVSROOT/afs-protections.sh after
                  the checkout.  Copy over the .rconf file from the
                  src-current directory.  Have a filsys entry of the
                  form source-8.1 created for the new tree.

                - Attach and lock the branch source tree on each build
                  machine.

                - Do a final full build of the release from the branch
                  source tree.

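          A hypothetical sketch of the cvs commands for the branch and
          tag steps ("source" as the module name is an assumption):

                cvs rtag -b athena-8_1 source
                cvs rtag -r athena-8_1 athena-8_1-early source
                cvs checkout -r athena-8_1 -d src-8.1 source
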
        * Early

          The early release involves more outside testers and some
          cluster machines.  The release should be considered ready
          for public consumption.

          The release branch should be tagged with a name of the form
          athena-8_1-early.

        * Release

          The release branch should be tagged with a name of the form
          athena-8_1-release.

          Once the release has gone public, the current-release
          machines should be updated to the release and set up as the
          build machines for the now-current release.  Remove the
          /rel/build and /rel/.srvd symlinks on the new-release build
          machines, and make sure the wash is running on them if you
          didn't do so back in the crash and burn phase.

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

        consult
        games
        gnu
        graphics
        outland
        sipb
        tcl
        watchmaker
        windowmanagers
        /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps its own list.

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps for performing a patch
release are:

        * Check in the changes on the mainline (if they apply) and on
          the release branch and update the relevant sections of the
          source tree in /mit/source-<version>.

        * If the update needs to do anything other than track against
          the system packs, you must prepare a version script which
          deals with any transition issues, specifies whether to track
          the OS volume, specifies whether to deal with a kernel
          update, and specifies which configuration files, if any,
          need to be updated.  See the update script
          (packs/update/do-update.sh) for details.  See
          packs/build/update/os/*/configfiles for a list of
          configuration files for a given platform.  The version
          script should be checked in on the mainline and on the
          release branch.

        * Do the remainder of the steps as "builder" on the build
          machine.  Probably the best way is to get Kerberos tickets
          as "builder" and ssh to the build machine.

        * Make sure to add symlinks under the /build tree for any
          files you have added.  Note that you probably added a build
          script if the update needs to do anything other than track
          against the system packs.

        * In the build tree, bump the version number in
          packs/build/version (the symlink should be broken for this
          file to avoid having to change it in the source tree).

        * If you are going to need to update binaries that users run
          from the packs, go into the packs and move (don't copy) them
          into a .deleted directory at the root of the packs.  This is
          especially important for binaries like emacs and dash which
          people run for long periods of time, to avoid making the
          running processes dump core when the packs are released.
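
          An example sketch (the path of the binary within the packs
          is an assumption):

                mkdir -p /srvd/.deleted
                mv /srvd/usr/athena/bin/emacs /srvd/.deleted/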

        * Update the read-write volume of the packs to reflect the
          changes you've made.  You can use the build.sh script to
          build and install specific packages, or you can use the
          do.sh script to build the package and then install specific
          files (cutting and pasting from the output of "gmake -n
          install DESTDIR=/srvd" is the safest way); updating as few
          files as possible is preferable.  Remember to install the
          version script.
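
          For the by-hand route, a sketch (run from the package's
          build directory):

                # preview the install commands without running them
                gmake -n install DESTDIR=/srvd
                # then cut and paste only the commands for the files
                # you need to update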

        * Use the build.sh script to build and install
          packs/build/finish.  This will fix ownerships and update the
          track lists and the like.

        * It's a good idea to test the update from the read-write
          packs by symlinking the read-write packs to /srvd on a test
          machine and taking the update.  Note that when the machine
          comes back up with the new version, it will probably
          re-attach the read-write packs, so you may have to re-make
          the symlink if you want to test stuff that's on the packs.
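
          A sketch of the test-machine setup (the read-write packs
          path is a placeholder):

                rm /srvd
                ln -s /path/to/read-write/packs /srvd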

        * At some non-offensive time, release the packs in the dev
          cell.
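
          A sketch, assuming the packs volume name (any replicated
          AFS volume is released the same way):

                vos release <packs-volume> -cell dev.mit.edu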

        * Send mail to rel-eng saying that the patch release went out,
          and what was in it.  (You can find many example pieces of
          mail in the discuss archive.)  Include instructions
          explaining how to propagate the release to the athena cell.

Rel-eng machines
----------------

The machine running the wash update is equal-rites.mit.edu.

There are three rel-eng machines for each platform:

        * A current release build machine, for doing incremental
          updates to the last public release.  This machine may also
          be used by developers for building software.

        * A new release build machine, for building and doing
          incremental updates to releases which are still in testing.
          This machine also performs the wash.  This machine may also
          be used by developers for building software, or if they want
          a snapshot of the new system packs to build things against.

        * A crash and burn machine, usually located in the release
          engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                        Sun                  O2          Linux

Current release build   downy                bounce      snuggle
New release build       maytag               whirlpool   kenmore
Crash and burn          the-colour-of-magic  reaper-man  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

        * Washing machines: kenmore, whirlpool, ge, maytag
        * Laundry detergents: fab, calgon, era, cheer, woolite,
                tide, ultra-tide, purex
        * Bleaches: clorox, ajax
        * Fabric softeners: downy, final-touch, snuggle, bounce
        * Heavy machinery: steam-shovel, pile-driver, dump-truck,
                wrecking-ball, crane
        * Construction kits: lego, capsela, technics, k-nex, playdoh,
                construx
        * Construction materials: rebar, two-by-four, plywood,
                sheetrock
        * Heavy machinery companies: caterpillar, daewoo, john-deere,
                sumitomo
        * Buildings: empire-state, prudential, chrysler

Clusters
--------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.  For the SGI,
we currently also have athena-sgi-inst-XY and dev-sgi-inst-XY.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

        Label: syslib           Data: dev-PLATFORMsys-XY X.Y t
(SGI)   Label: instlib          Data: dev-sgi-inst-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.
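
For example, a hypothetical 9.0 Linux beta record would be:

        Label: syslib           Data: dev-linuxsys-90 9.0 t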

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

        Label: syslib           Data: athena-PLATFORMsys-XY X.Y
(SGI)   Label: instlib          Data: athena-sgi-inst-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.