source: trunk/doc/maintenance @ 12069

Revision 12069, 17.2 KB checked in by ghudson, 26 years ago (diff)
Update the description of the wash.
This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

        Mailing lists
        Permissions
        The wash process
        Imake templates
        Release notes
        Release cycles
        Patch releases
        Rel-eng machines
        Cluster information

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

        * source-developers

                For discussion of the policy and day-to-day
                maintenance of the repository.  This is a public list,
                and there is a public discuss archive on menelaus.

        * source-reviewers

                For review of changes to be checked into the
                repository.  To be a member of this mailing list, you
                must have read access to the non-public parts of the
                source tree, but you do not need to be a staff member.
                There is a non-public discuss archive on menelaus.

        * source-commits

                This mailing list receives commit logs for all
                commits to the repository.  This is a public mailing
                list.  There is a public discuss archive on menelaus.

        * source-diffs

                This mailing list receives commit logs with diffs for
                all commits to the repository.  To be on this mailing
                list, you must have read access to the non-public
                parts of the source tree.  There is no discuss archive
                for this list.

        * source-wash

                This mailing list receives mail when the wash process
                blows out.  This is a public mailing list.  There is
                no discuss archive for this list.

        * rel-eng

                The release engineering mailing list.  Mail goes here
                about patch releases and other release details.  There
                is a public archive on menelaus.

        * release-team

                The mailing list for the release team, which sets
                policy for releases.  There is a public archive on
                menelaus (currently, it has the name "release-77").

Permissions
-----------

Following are descriptions of the various groups found on the acls of
the source tree:

        * read:source
          read:staff

                These two groups have identical permissions in the
                repository, but read:source contains artificial
                constructs (the builder user and service principals)
                while read:staff contains people.  In the future,
                highly restricted source could have access for
                read:source and not read:staff.

                Both of these groups have read access to non-public
                areas of the source tree.

        * write:staff

                Contains developers with commit access to the source
                tree.  This group has write access to the repository,
                but not to the checked-out copy of the mainline
                (/mit/source).

        * write:update

                Contains the service principal responsible for
                updating /mit/source.  This group has write access to
                /mit/source but not to the repository.

        * adm:source

                This group has administrative access to the repository
                and to /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".
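
The split of write access described above (developers commit to the
repository, a service principal updates /mit/source, and neither can
do the other's job) can be restated as a small lookup table.  This is
only an illustrative sketch in Python; the table layout and function
name are hypothetical and are not taken from afs-protections.sh, and
treating "administrative access" as including write access is an
assumption.

```python
# Write access as described in the text above.  Group names come from
# the document; the dictionary layout and function name are illustrative.
REPOSITORY = "repository"
WORKING_COPY = "/mit/source"

WRITE_ACCESS = {
    "write:staff": {REPOSITORY},
    "write:update": {WORKING_COPY},
    # Assumption: "administrative access" is treated as including write.
    "adm:source": {REPOSITORY, WORKING_COPY},
}


def groups_with_write(area):
    """Return the sorted list of groups that can write to the given area."""
    return sorted(g for g, areas in WRITE_ACCESS.items() if area in areas)


if __name__ == "__main__":
    for area in (REPOSITORY, WORKING_COPY):
        print(area, groups_with_write(area))
```

Read access and the system:anyuser and system:authuser cases are left
out, since the text does not pin them down per area.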

The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

        * Each night at midnight, a machine (currently small-gods)
          performs a cvs update of the checked-out tree in
          /afs/dev.mit.edu/source/src-current.  If the cvs update
          fails, the update script sends mail to source-wash@mit.edu.
          This machine is on read:source and write:update.

        * Each night at 4:30am, a machine of each architecture
          (currently whirlpool, kenmore, and maytag) performs a build
          of the tree into /var/srvd.new, using the build directory
          /var/build.  If the build fails, the wash script copies the
          log of the failed build into AFS and sends mail to
          source-wash@mit.edu with the last few lines of the log.  If
          the build succeeds, the wash script moves /var/srvd.new to
          /var/srvd, so that /var/srvd is always the last successful
          build of the source tree.

        * Each Sunday at 1:00am, the wash machines make a copy of
          their last successful builds into a "srvd-current" directory
          in AFS.  The copy is done without system:administrator
          privileges, so the file permissions on srvd-current are all
          wrong, but the current srvd is useful for development work.

Source for the wash scripts lives in /afs/dev.mit.edu/service/wash.
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.
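
The nightly build step described above (build into a scratch
directory, promote it only on success, send mail with the tail of the
log on failure) can be sketched in Python as follows.  This is an
illustration only, not the actual wash script; the function name, its
arguments, and the callables passed in for building and for sending
mail are all hypothetical.  Copying the failed log into AFS is left
out.

```python
import os
import shutil
import tempfile


def wash_once(build, new_dir, live_dir, notify, tail_lines=20):
    """Build into new_dir; promote it to live_dir only if the build succeeds.

    build(new_dir) must return (ok, log_text).  notify(message) stands in
    for sending mail to source-wash.  Returns True on success.
    """
    shutil.rmtree(new_dir, ignore_errors=True)
    os.makedirs(new_dir)
    ok, log_text = build(new_dir)
    if not ok:
        tail = "\n".join(log_text.splitlines()[-tail_lines:])
        notify("wash build failed; last lines of log:\n" + tail)
        return False
    # Replace the previous successful build with the new one.
    shutil.rmtree(live_dir, ignore_errors=True)
    os.rename(new_dir, live_dir)
    return True


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    new_dir = os.path.join(root, "srvd.new")
    live_dir = os.path.join(root, "srvd")
    mails = []

    def good_build(dest):
        with open(os.path.join(dest, "marker"), "w") as f:
            f.write("build 1\n")
        return True, "ok\n"

    def bad_build(dest):
        return False, "compiling...\nerror: something broke\n"

    wash_once(good_build, new_dir, live_dir, mails.append)
    wash_once(bad_build, new_dir, live_dir, mails.append)
    # live_dir still holds the output of the last successful build.
    print(os.listdir(live_dir), len(mails))
    shutil.rmtree(root)
```

The point of the two-directory arrangement is that a failed build
never touches the live directory, so the weekly copy to srvd-current
always has a complete tree to work from.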

Imake templates
---------------

We don't like imake, but we maintain two sets of imake templates:

        * packs/build/config

                These templates are the legacy Athena build system.
                They are specific to software in the athena hierarchy,
                and one glorious day in the future they will no longer
                be necessary.

                For these templates, you should define TOPDIR to the
                top-level source directory.

        * packs/build/xconfig

                These templates are used for building software which
                uses X-style Imakefiles.  They may need periodic
                updating as new versions of X are released.  These
                templates are full of a lot of hacks, mostly because
                the imake model isn't really adequate for dealing with
                third-party software and local site customizations.

                For these templates, you should define TOPDIR to "."
                and SRCDIR to the top-level source directory.

Release notes
-------------

There are two kinds of release notes, the system release notes and the
user release notes.  The system release notes are more comprehensive
and assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions towards the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about all user-visible changes in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

        * Crash and burn

          This phase is for rel-eng internal testing.  The crash and
          burn machines should be identified and used to test the
          install and update.  System packs may be generated at will
          by taking snapshots from the wash machine.  The system packs
          volume does not need any replication.

          System release notes should be prepared during this phase.

          Before the transition from crash and burn to alpha, the
          release engineer should do a sanity check on the new packs
          by comparing a file listing of the new packs to a file
          listing of the previous release's packs.  The release
          engineer should also check the list of configuration files
          for each platform (in packs/update/platform/*/configfiles)
          and make sure that any configuration files which have
          changed are listed as changed in the version script.
          Finally, the release should be checked to make sure it won't
          overflow partitions on any client machines; currently, SGIs
          are not a problem (because they have one big partition) and
          the most restrictive sizes on Solaris clients are 27713K and
          51903K of useable space for the root and /usr partitions.

        * Alpha

          The alpha phase is for internal testing by the release team.
          System packs may still be regenerated at will by taking
          snapshots, but the system packs volume (and os volume)
          should be read-only so it can be updated by a vos release.
          Changes to the packs do not need to be propagated in patch
          releases; testers are expected to be able to ensure
          consistency by forcing repeat updates or reinstalling their
          machines.

          A draft of the system release notes should be ready by the
          beginning of this phase.  User release notes should be
          prepared during this phase.

          Before the transition from alpha to beta, doc/third-party
          should be checked to see if miscellaneous third-party files
          (the ones not under the "third" hierarchy) should be
          updated.

        * Beta

          The beta phase involves outside testers.  System packs and
          os volumes should be replicated on multiple servers, and
          permissions should be set to avoid accidental changes
          (traditionally this means giving write access to
          system:packs, a normally empty group).  Changes to the packs
          must be propagated by patch releases.

          User release notes should be essentially finished by the end
          of this phase.  System release notes may continue to be
          updated as bug fixes occur.  Ideally, no new features should
          be committed to the source tree during the beta phase.

          At the end of the beta phase, a release branch should
          be created with a name of the form athena-8_1, and tagged
          with athena-8_1-early.  A checked-out tree should be made in
          /afs/dev.mit.edu/source for the release branch, with a name
          of the form src-8.1.  It should have a locker with a name of
          the form source-8.1.  A final full build of the system packs
          should be done from the release branch, with the build tree
          located in /afs/dev.mit.edu/project/release.  The new
          release build machines should be set up for incremental
          changes to the new release at this point (which means
          turning off the wash).

        * Early

          The early release involves more outside testers and some
          cluster machines.  The release should be considered ready
          for public consumption.

          The release branch should be tagged with a name of the form
          athena-8_1-early.

        * Release

          The release branch should be tagged with a name of the form
          athena-8_1-release.
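
The sanity checks listed under "Crash and burn" above (comparing file
listings of the new and previous packs, and checking that the release
fits on client partitions) can be sketched in Python as follows.  The
two Solaris limits are the figures quoted in the text; the function
names and the way per-partition usage is supplied are hypothetical,
and how that usage is measured is left to the caller.

```python
# Most restrictive usable space on Solaris clients, in kilobytes,
# as quoted in the text above.
SOLARIS_LIMITS_KB = {"/": 27713, "/usr": 51903}


def compare_listings(old_paths, new_paths):
    """Return (added, removed) paths between two pack file listings."""
    old, new = set(old_paths), set(new_paths)
    return sorted(new - old), sorted(old - new)


def overflowing_partitions(usage_kb, limits_kb=SOLARIS_LIMITS_KB):
    """Return {partition: excess_kb} for partitions whose usage exceeds the limit."""
    return {
        part: used - limits_kb[part]
        for part, used in usage_kb.items()
        if part in limits_kb and used > limits_kb[part]
    }


if __name__ == "__main__":
    added, removed = compare_listings(
        ["bin/ls", "bin/old-tool"], ["bin/ls", "bin/new-tool"]
    )
    print("added:", added, "removed:", removed)
    # Hypothetical usage figures: /usr is 97K over its limit here.
    print(overflowing_partitions({"/": 26000, "/usr": 52000}))
```

Files that appear in only one listing are the ones worth a second
look before moving to alpha.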

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs definitely need to be hit, and the
popular software lockers need to be hit as well.  Here is a reasonable
list of popular lockers to get in addition to the glue ones:

        consult
        games
        gnu
        graphics
        outland
        sipb
        tcl
        watchmaker
        windowmanagers
        /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps their own list.

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps to performing a patch
release are:

        * Check in the changes on the mainline (if they apply) and on
          the release branch and update the relevant sections of the
          source tree in /afs/dev.mit.edu/source.

        * If the update needs to do anything other than track against
          the system packs, you must prepare a version script which
          deals with any transition issues, specifies whether to track
          the OS volume, specifies whether to deal with a kernel
          update, and specifies which if any configuration files need
          to be updated.  See the update script
          (packs/update/do-update.sh) for details.  See
          packs/build/update/platform/*/configfiles for a list of
          configuration files for a given platform.  The version
          script should be checked in on the mainline and on the
          release branch.

        * Make sure to add symlinks in the build tree for any files
          you have added.  Note that you probably added a build script
          if the update needs to do anything other than track against
          the system packs.

        * In the build tree, bump the version number in
          packs/build/version (the symlink should be broken for this
          file to avoid having to change it in the source tree).

        * If you are going to need to update binaries that users run
          from the packs, go into the packs and move (don't copy) them
          into a .deleted directory at the root of the packs.  This is
          especially important for binaries like emacs and dash which
          people run for long periods of time, to avoid making the
          running processes dump core when the packs are released.

        * Update the read-write volume of the packs to reflect the
          changes you've made.  You can use the build.sh script to
          build and install specific packages, or you can use the
          do.sh script to build the package and then install specific
          files (cutting and pasting from the output of "make -n
          install DESTDIR=/srvd" is the safest way); updating the
          fewest number of files is preferable.  Remember to install
          the version script.

        * Use the build.sh script to build and install
          packs/build/finish.  This will fix ownerships and update the
          track lists and the like.

        * It's a good idea to test the update from the read-write
          packs by symlinking the read-write packs to /srvd on a test
          machine and taking the update.  Note that when the machine
          comes back up with the new version, it will probably
          re-attach the read-write packs, so you may have to re-make
          the symlink if you want to test stuff that's on the packs.

        * At some non-offensive time, release the packs in the dev
          cell.

        * Send mail to rel-eng saying that the patch release went out,
          and what was in it.  (You can find many example pieces of
          mail in the discuss archive.)  Include instructions
          explaining how to propagate the release to the athena cell.
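
The "move, don't copy" step for in-use binaries can be sketched in
Python as follows.  The function name and the flattened naming scheme
inside .deleted are hypothetical.  The sketch relies on os.rename,
which moves the existing file rather than creating a new copy;
presumably that is what keeps long-running processes such as emacs
working, but the text above does not spell out the mechanism.

```python
import os
import tempfile


def retire_binary(packs_root, relative_path):
    """Move packs_root/relative_path into packs_root/.deleted; return the new path."""
    source = os.path.join(packs_root, relative_path)
    graveyard = os.path.join(packs_root, ".deleted")
    os.makedirs(graveyard, exist_ok=True)
    # Flatten the path so that files with the same basename in different
    # directories do not collide (an illustrative naming choice).
    target = os.path.join(graveyard, relative_path.replace(os.sep, "_"))
    os.rename(source, target)  # a move within the same tree, not a copy
    return target


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "bin"))
    with open(os.path.join(root, "bin", "emacs"), "w") as f:
        f.write("old binary\n")
    moved = retire_binary(root, os.path.join("bin", "emacs"))
    print(os.path.relpath(moved, root))
    # The original path is now free for the updated binary to be installed.
    print(os.path.exists(os.path.join(root, "bin", "emacs")))
```

A real script would also need to decide what to do when a file of the
same name is already sitting in .deleted from an earlier patch
release.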

Rel-eng machines
----------------

There are three rel-eng machines for each platform:

        * A current release build machine, for doing incremental
          updates to the last public release.  This machine may also
          be used by developers for building software.

        * A new release build machine, for building and doing
          incremental updates to releases which are still in testing.
          Before a new release goes into testing, this machine should
          perform the wash.  This machine may also be used by
          developers for building software, or if they want a snapshot
          of the new system packs to build things against.

        * A crash and burn machine, usually located in the release
          engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                                Sun             Indy            O2

Current release build           downy           snuggle         bounce
New release build               whirlpool       kenmore         maytag
Crash and burn                  sourcery        pyramids        reaper-man

For reference, here are some names that fit various laundry and
construction naming schemes:

        * Washing machines: kenmore, whirlpool, ge, maytag
        * Laundry detergents: fab, calgon, era, cheer, woolite,
                tide, ultra-tide
        * Bleaches: clorox, ajax
        * Fabric softeners: downy, final-touch, snuggle, bounce
        * Heavy machinery: steam-shovel, pile-driver, dump-truck,
                wrecking-ball, crane
        * Construction kits: lego, capsela, technics, k-nex, playdoh,
                construx
        * Construction materials: rebar, two-by-four, plywood,
                sheetrock
        * Heavy machinery companies: caterpillar, daewoo, john-deere,
                sumitomo
        * Buildings: empire-state, prudential, chrysler

Clusters
--------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.  For the SGI,
we currently also have athena-sgi-inst-XY and dev-sgi-inst-XY.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

        Label: syslib           Data: dev-PLATFORMsys-XY X.Y t
(SGI)   Label: instlib          Data: dev-sgi-inst-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

        Label: syslib           Data: athena-PLATFORMsys-XY X.Y
(SGI)   Label: instlib          Data: athena-sgi-inst-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.
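
The naming rules above can be collected into a short Python sketch
that produces the records to add to a phase's own cluster.  The
function names are hypothetical, "sgi" is assumed to be the machtype
name for the SGI platform, and "sun4" in the example is only a
stand-in platform name.  The sketch does not cover the separate step
of removing the 't' from the crash, alpha, and beta clusters at the
early and public phases.

```python
TESTING_PHASES = ("crash", "alpha", "beta")
RELEASE_PHASES = ("early", "public")


def cluster_name(phase, platform):
    """Cluster names have the form PHASE-PLATFORM."""
    return f"{phase}-{platform}"


def cluster_records(phase, platform, major, minor):
    """Return (label, data) records to add to the PHASE-PLATFORM cluster."""
    if phase in TESTING_PHASES:
        cell, suffix = "dev", " t"      # testing release, taken manually
    elif phase in RELEASE_PHASES:
        cell, suffix = "athena", ""     # real release, no 't' flag
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    version = f"{major}.{minor}"
    xy = f"{major}{minor}"
    records = [("syslib", f"{cell}-{platform}sys-{xy} {version}{suffix}")]
    if platform == "sgi":
        # The SGI additionally gets an instlib record.
        records.append(("instlib", f"{cell}-sgi-inst-{xy} {version}{suffix}"))
    return records


if __name__ == "__main__":
    print(cluster_name("beta", "sgi"))
    for label, data in cluster_records("beta", "sgi", 8, 1):
        print(f"Label: {label:<16} Data: {data}")
    for label, data in cluster_records("public", "sun4", 8, 1):
        print(f"Label: {label:<16} Data: {data}")
```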