This file contains notes about the care and feeding of the Athena
source repository.  It is intended primarily for the administrators of
the source tree, not for developers (except perhaps for the first
section, "Mailing lists").  See the file "procedures" in this
directory for information about procedures relevant to developers.

The areas covered in this file are:

        Mailing lists
        Permissions
        The wash process
        Imake templates
        Release notes
        Release cycles
        Patch releases
        Rel-eng machines
        Clusters

Mailing lists
-------------

Here are descriptions of the mailing lists related to the source tree:

        * source-developers

                For discussion of the policy and day-to-day
                maintenance of the repository.  This is a public list,
                and there is a public discuss archive on menelaus.

        * source-reviewers

                For review of changes to be checked into the
                repository.  To be a member of this mailing list, you
                must have read access to the non-public parts of the
                source tree, but you do not need to be a staff member.
                There is a non-public discuss archive on menelaus.

        * source-commits

                This mailing list receives commit logs for all
                commits to the repository.  This is a public mailing
                list.  There is a public discuss archive on menelaus.

        * source-diffs

                This mailing list receives commit logs with diffs for
                all commits to the repository.  To be on this mailing
                list, you must have read access to the non-public
                parts of the source tree.  There is no discuss archive
                for this list.

        * source-wash

                This mailing list receives mail when the wash process
                blows out.  This is a public mailing list.  There is
                no discuss archive for this list.

        * rel-eng

                The release engineering mailing list.  Mail goes here
                about patch releases and other release details.  There
                is a public archive on menelaus.

        * release-team

                The mailing list for the release team, which sets
                policy for releases.  There is a public archive on
                menelaus (currently, it has the name "release-77").

Permissions
-----------

Following are descriptions of the various groups found on the acls of
the source tree:

        * read:source
          read:staff

                These two groups have identical permissions in the
                repository, but read:source contains artificial
                constructs (the builder user and service principals)
                while read:staff contains people.  In the future,
                highly restricted source could have access for
                read:source and not read:staff.

                Both of these groups have read access to non-public
                areas of the source tree.

        * write:staff

                Contains developers with commit access to the source
                tree.  This group has write access to the repository,
                but not to the checked-out copy of the mainline
                (/mit/source).

        * write:update

                Contains the service principal responsible for
                updating /mit/source.  This group has write access to
                /mit/source but not to the repository.

        * adm:source

                This group has administrative access to the repository
                and to /mit/source.

system:anyuser has read access to public areas of the source tree and
list access to the rest.  system:authuser occasionally has read access
to areas that system:anyuser does not (synctree is the only current
example).

The script CVSROOT/afs-protections.sh in the repository makes sure the
permissions are correct in the repository or in a working directory.
Run it from the top level of the repository or of /mit/source, giving
it the argument "repository" or "wd".
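
For example (a sketch; the path to the repository here is an
assumption):

        # From the top level of the checked-out mainline:
        cd /mit/source
        sh /afs/dev.mit.edu/source/repository/CVSROOT/afs-protections.sh wd

        # From the top level of the repository:
        cd /afs/dev.mit.edu/source/repository
        sh CVSROOT/afs-protections.sh repository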

The wash process
----------------

The wash process is a nightly rebuild of the source repository from
scratch, intended to alert the source tree maintainers when someone
checks in a change which causes the source tree to stop building.  The
general architecture of the wash process is:

        * Each night at midnight, a machine performs a cvs update of
          the checked-out tree in /afs/dev.mit.edu/source/src-current.
          If the cvs update fails, the update script sends mail to
          source-wash@mit.edu.  This machine is in read:source and
          write:update.

        * Each night at 4:30am, a machine of each architecture
          performs a build of the tree into /var/srvd.new, using the
          build directory /var/build.  If the build fails, the wash
          script copies the log of the failed build into AFS and sends
          mail to source-wash@mit.edu with the last few lines of the
          log.  If the build succeeds, the wash script moves
          /var/srvd.new to /var/srvd, so that /var/srvd is always the
          last successful build of the source tree.

        * Each Sunday at 1:00am, the wash machines make a copy of
          their last successful builds into a "srvd-current" directory
          in AFS.  The copy is done without system:administrator
          privileges, so the file permissions on srvd-current are all
          wrong, but the current srvd is useful for development work.

Source for the wash scripts lives in /afs/dev.mit.edu/service/wash.
They are installed in /usr/local on the wash machines.  Logs of the
start and end times of the wash processes on each machine live in
/afs/dev.mit.edu/service/wash/status/`hostname`.  See "Rel-eng
machines" below to find out which machines take part in the wash
process.

To set up the source update on a machine:

        * Ensure that it is in the set of machines installed onto by
          /afs/dev.mit.edu/service/wash/inst, and run that script to
          install the wash scripts onto that machine.

        * Set up the cron job on the machine according to
          /afs/dev.mit.edu/service/wash/README.

        * Ensure that the machine has a host key or rcmd srvtab.

        * Ensure that rcmd.machinename has a PTS identity in the dev
          cell.

        * Ensure that rcmd.machinename is in write:update.

To set up the wash on a machine:

        * Ensure that it is in the set of machines installed onto by
          /afs/dev.mit.edu/service/wash/inst, and run that script to
          install the wash scripts onto that machine.

        * Set up cron jobs on the machine according to
          /afs/dev.mit.edu/service/wash/README (see the crontab sketch
          after this list).

        * Ensure that the machine has a host key or rcmd srvtab.

        * Ensure that rcmd.machinename has a PTS identity in the dev
          cell.

        * Ensure that rcmd.machinename is in read:source.

        * Ensure that
          /afs/dev.mit.edu/service/wash/status/machinename.mit.edu
          exists and that rcmd.machinename has write access to it.

        * Ensure that /afs/dev.mit.edu/system/systemtype/srvd-current
          exists as a separate volume with adequate quota, and that
          rcmd.machinename has write access to it.
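
As a sketch, crontab entries matching the schedule described above
might look like the following; the script names and paths here are
assumptions, so take the authoritative entries from the README:

        # On the update machine: cvs update of src-current at midnight.
        0 0 * * *       /usr/local/etc/washupdate
        # On each wash machine: build at 4:30am, and copy the last
        # successful build into srvd-current on Sunday at 1:00am.
        30 4 * * *      /usr/local/etc/wash
        0 1 * * 0       /usr/local/etc/wash-copy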

Imake templates
---------------

We don't like imake, but we have two sets of imake templates:

        * packs/build/config

                These templates are the legacy Athena build system.
                They are no longer used by any software in the
                release; we install them in case someone wants to
                build some very old software.

        * packs/build/xconfig

                These templates are used for building software which
                uses X-style Imakefiles.  They may need periodic
                updating as new versions of X are released.  These
                templates are full of hacks, mostly because the imake
                model isn't really adequate for dealing with
                third-party software and local site customizations.

Release notes
-------------

There are two kinds of release notes: the system release notes and the
user release notes.  The system release notes are more comprehensive,
assume a higher level of technical knowledge, and are used in the
construction of the user release notes.  It is the job of the release
engineer to produce a set of system release notes for every release,
with early versions toward the beginning of the release cycle.  The
best way to make sure this happens is to maintain the system release
notes throughout the entire development cycle.

Thus, it is the job of the release engineer to watch the checkins to
the source tree and enter a note about each user-visible change in the
system release notes, which live in /afs/dev.mit.edu/project/relnotes.
Highly visible changes should appear near the beginning of the file,
and less visible changes should appear towards the end.  Changes to
particular subsystems should be grouped together when possible.

Release cycles
--------------

Release cycles have five phases: crash and burn, alpha, beta, early,
and the public release.  The release team has a set of criteria for
entering and exiting each phase, which won't be covered here.  The
following guidelines should help the release go smoothly:

        * Crash and burn

          This phase is for rel-eng internal testing.  The release
          engineer needs to make sure that the current source base
          works well enough for testers to use it and find bugs.  For
          crash and burn to begin, the operating system support person
          for each platform must provide a way to install or update a
          machine to the new version of the operating system for that
          platform.

          Each platform needs a build tree and system packs volume.
          The build tree should be mounted in
          /afs/dev.mit.edu/project/release/<version>/build/<sysname>.
          The system packs volume should be mounted in
          /afs/dev.mit.edu/system/<sysname>/srvd-<version>.

          Each platform needs a new-release build machine to generate
          system packs to test.  For an existing platform, this is
          generally the wash machine.  The wash machine needs to be
          updated to the new operating system (a reinstall is
          sometimes the simplest way).  Release build machines are set
          up as follows (see the rc.conf sketch after this list):

                - /etc/athena/access contains "builder rl"
                - In /etc/athena/rc.conf:
                        SSHD is true
                        ACCESSON is true
                        RVDCLIENT is false
                        AUTOUPDATE is false
                - /build is a symlink to the build tree
                - /os is a symlink to the rw of the os volume
                - /install is a symlink to the rw of the os volume
                - /srvd is a symlink to the ro of the packs
                - /.srvd is a symlink to the rw of the packs
                - For Solaris, /usr/gcc is a symlink to /.srvd/usr/gcc
                - The source locker is attached and locked.
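
          As a sketch, the relevant rc.conf fragment might look like
          this (the variable syntax is an assumption; leave the other
          settings alone):

                SSHD=true
                ACCESSON=true
                RVDCLIENT=false
                AUTOUPDATE=false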

          Doing a full build for release testing is then simple:

                # Get tickets as builder and ssh to the wash machine
                rm -rf /.srvd/* /.srvd/.??* /build/* /build/.??*
                sh /build/packs/build/build.sh -l &

          (It can be useful to run the ssh to the build machine inside
          a screen session so you don't have to log out of the build
          machine until the build is finished.)

          The crash and burn machines should be identified and used to
          test the update (and install, if possible).  System packs
          may be regenerated at will.  The system packs volume does
          not need any replication.

          System release notes should be prepared during this phase.

          Before the transition from crash and burn to alpha, the
          release engineer should do a sanity check on the new packs
          by comparing a file listing of the new packs to a file
          listing of the previous release's packs.  The release
          engineer should also check the list of configuration files
          for each platform (in packs/update/platform/*/configfiles)
          and make sure that any configuration files which have
          changed are listed as changed in the version script.
          Finally, the release should be checked to make sure it won't
          overflow partitions on any client machines; currently, SGIs
          are not a problem (because they have one big partition) and
          the most restrictive sizes on Solaris clients are 27713K and
          51903K of usable space for the root and /usr partitions.
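
          The listing comparison can be as simple as the following
          sketch (the sysname and version numbers are hypothetical):

                cd /afs/dev.mit.edu/system/sun4x_57
                (cd srvd-8.0 && find . | sort) > /tmp/packs-8.0.list
                (cd srvd-8.1 && find . | sort) > /tmp/packs-8.1.list
                diff /tmp/packs-8.0.list /tmp/packs-8.1.list | more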

          A note on the wash: it is not especially important that the
          wash be running during the release cycle, but currently the
          wash can run on the new release build machine without
          interfering with the build functions of the machine.  So
          after updating the wash machine to the new OS for new
          release builds, the release engineer can set up the wash
          right away.

        * Alpha

          The alpha phase is for internal testing by the release team.
          System packs may still be regenerated at will, but the
          system packs volume (and os volume) should be read-only so
          it can be updated by a vos release.  Changes to the packs do
          not need to be propagated in patch releases; testers are
          expected to be able to ensure consistency by forcing repeat
          updates or reinstalling their machines.

          User release notes should be prepared during this phase.

          Before the transition from alpha to beta, doc/third-party
          should be checked to see if miscellaneous third-party files
          (the ones not under the "third" hierarchy) should be
          updated.

        * Beta

          The beta phase involves outside testers.  System packs and
          os volumes should be replicated on multiple servers, and
          permissions should be set to avoid accidental changes
          (traditionally this means giving write access to
          system:packs, a normally empty group).  Changes to the packs
          must be propagated by patch releases.

          User release notes should be essentially finished by the end
          of this phase.  System release notes may continue to be
          updated as bug fixes occur.  Ideally, no new features should
          be committed to the source tree during the beta phase.

          For the transition from beta to early (see the command
          sketch after this list):

                - Prepare a release branch with a name of the form
                  athena-8_1.  Tag it with athena-8_1-early.

                - Create a volume with a mountpoint of the form
                  /afs/dev.mit.edu/source/src-8.1 and check out a tree
                  on the branch there.  Set the permissions by doing
                  an fs copyacl from an older source tree before the
                  checkout, and run CVSROOT/afs-protections.sh after
                  the checkout.  Copy over the .rconf file from the
                  src-current directory.  Have a filsys entry of the
                  form source-8.1 created for the new tree.

                - Attach and lock the branch source tree on each build
                  machine.

                - Do a final full build of the release from the branch
                  source tree.
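
          As a sketch, the branch and tag steps might look like this
          (the module name "source" is an assumption):

                cvs rtag -b athena-8_1 source
                cvs rtag -r athena-8_1 athena-8_1-early source
                cd /afs/dev.mit.edu/source/src-8.1
                cvs checkout -r athena-8_1 source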

        * Early

          The early release involves more outside testers and some
          cluster machines.  The release should be considered ready
          for public consumption.

          The release branch should be tagged with a name of the form
          athena-8_1-early.

        * Release

          The release branch should be tagged with a name of the form
          athena-8_1-release.

          Once the release has gone public, the current-release
          machines should be updated to the release and set up as the
          build machines for the now-current release.  Remove the
          /build and /.srvd symlinks on the new-release build
          machines, and make sure the wash is running on them if you
          didn't do so back in the crash and burn phase.

One thing that needs to happen externally during a release cycle, if
there is an OS upgrade involved, is the addition of compatibility
symlinks under the arch directories of various lockers.  All of the
lockers listed in packs/glue/specs, as well as tellme, mkserv, and
andrew, definitely need to be hit, and the popular software lockers
need to be hit as well.  Here is a reasonable list of popular lockers
to get in addition to the glue ones:

        consult
        games
        gnu
        graphics
        outland
        sipb
        tcl
        watchmaker
        windowmanagers
        /afs/sipb/project/tcsh

In addition, the third-party software lockers need to be updated; the
third-party software group keeps their own list.

Patch releases
--------------

Once a release has hit beta test, all changes to the release must be
propagated through patch releases.  The steps to performing a patch
release are:

        * Check in the changes on the mainline (if they apply) and on
          the release branch, and update the relevant sections of the
          source tree in /mit/source-<version>.

        * If the update needs to do anything other than track against
          the system packs, you must prepare a version script which
          deals with any transition issues, specifies whether to track
          the OS volume, specifies whether to deal with a kernel
          update, and specifies which, if any, configuration files
          need to be updated.  See the update script
          (packs/update/do-update.sh) for details.  See
          packs/build/update/os/*/configfiles for a list of
          configuration files for a given platform.  The version
          script should be checked in on the mainline and on the
          release branch.

        * Do the remainder of the steps as "builder" on the build
          machine.  Probably the best way is to get Kerberos tickets
          as "builder" and ssh to the build machine.

        * Make sure to add symlinks under the /build tree for any
          files you have added.  Note that you probably added a
          version script if the update needs to do anything other
          than track against the system packs.

        * In the build tree, bump the version number in
          packs/build/version (the symlink should be broken for this
          file to avoid having to change it in the source tree).
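
          A sketch of the bump (the format of the version file is an
          assumption; follow whatever it already contains):

                cd /build/packs/build
                rm version                      # break the symlink
                echo "Athena 8.1.1" > version   # write the new version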

        * If you are going to need to update binaries that users run
          from the packs, go into the packs and move (don't copy) them
          into a .deleted directory at the root of the packs.  This is
          especially important for binaries like emacs and dash which
          people run for long periods of time, to avoid making the
          running processes dump core when the packs are released.
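
          For example (a sketch; the particular binary is
          hypothetical, and /.srvd is the read-write packs on a build
          machine):

                cd /.srvd
                mkdir -p .deleted
                mv usr/athena/bin/emacs .deleted/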

        * Update the read-write volume of the packs to reflect the
          changes you've made.  You can use the build.sh script to
          build and install specific packages, or you can use the
          do.sh script to build the package and then install specific
          files (cutting and pasting from the output of "gmake -n
          install DESTDIR=/srvd" is the safest way); updating the
          fewest files possible is preferable.  Remember to install
          the version script.
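
          For example, to preview the install commands for a single
          package (the package path is hypothetical):

                cd /build/athena/bin/olc
                gmake -n install DESTDIR=/srvd

          Then cut and paste only the commands for the files you mean
          to update.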

        * Use the build.sh script to build and install
          packs/build/finish.  This will fix ownerships and update the
          track lists and the like.

        * It's a good idea to test the update from the read-write
          packs by symlinking the read-write packs to /srvd on a test
          machine and taking the update.  Note that when the machine
          comes back up with the new version, it will probably
          re-attach the read-write packs, so you may have to re-make
          the symlink if you want to test stuff that's on the packs.

        * At some non-offensive time, release the packs in the dev
          cell.
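
          A sketch of the release step (the volume name is
          hypothetical; use the actual name of the packs volume):

                vos release dev.srvd.sun4x_57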

        * Send mail to rel-eng saying that the patch release went out,
          and what was in it.  (You can find many example pieces of
          mail in the discuss archive.)  Include instructions
          explaining how to propagate the release to the athena cell.

Rel-eng machines
----------------

The machine running the wash update is equal-rites.mit.edu.

There are three rel-eng machines for each platform:

        * A current release build machine, for doing incremental
          updates to the last public release.  This machine may also
          be used by developers for building software.

        * A new release build machine, for building and doing
          incremental updates to releases which are still in testing.
          This machine also performs the wash.  It may also be used by
          developers for building software, or for getting a snapshot
          of the new system packs to build things against.

        * A crash and burn machine, usually located in the release
          engineer's office for easy physical access.

Here is a list of the rel-eng machines for each platform:

                        Sun                  SGI         Linux

Current release build   downy                bounce      snuggle
New release build       maytag               whirlpool   kenmore
Crash and burn          the-colour-of-magic  reaper-man  men-at-arms

For reference, here are some names that fit various laundry and
construction naming schemes:

        * Washing machines: kenmore, whirlpool, ge, maytag
        * Laundry detergents: fab, calgon, era, cheer, woolite,
                tide, ultra-tide, purex
        * Bleaches: clorox, ajax
        * Fabric softeners: downy, final-touch, snuggle, bounce
        * Heavy machinery: steam-shovel, pile-driver, dump-truck,
                wrecking-ball, crane
        * Construction kits: lego, capsela, technics, k-nex, playdoh,
                construx
        * Construction materials: rebar, two-by-four, plywood,
                sheetrock
        * Heavy machinery companies: caterpillar, daewoo, john-deere,
                sumitomo
        * Buildings: empire-state, prudential, chrysler

Clusters
--------

The getcluster(8) man page explains how clients interpret cluster
information.  This section documents the clusters related to the
release cycle, and how they should be managed.

There are five clusters for each platform, each of the form
PHASE-PLATFORM, where PHASE is a phase of the release cycle (crash,
alpha, beta, early, public) and PLATFORM is the machtype name of the
platform.  There are two filsys entries for each platform and release
pointing to the athena cell and dev cell system packs for the release;
they have the form athena-PLATFORMsys-XY and dev-PLATFORMsys-XY, where
X and Y are the major and minor numbers of the release.  For the SGI,
we currently also have athena-sgi-inst-XY and dev-sgi-inst-XY.

At the crash and burn, alpha, and beta phases of the release cycle,
the appropriate cluster (PHASE-PLATFORM) should be updated to include
data records of the form:

        Label: syslib           Data: dev-PLATFORMsys-XY X.Y t
(SGI)   Label: instlib          Data: dev-sgi-inst-XY X.Y t

This change will cause console messages to appear on the appropriate
machines informing their maintainers of a new testing release which
they can take manually.
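
For example, a hypothetical 8.1 beta on Linux would add to the
beta-linux cluster:

        Label: syslib           Data: dev-linuxsys-81 8.1 t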

At the early and public phases of the release cycle, the 't' should be
removed from the new syslib records in the crash, alpha, and beta
clusters, and the appropriate cluster (early-PLATFORM or
public-PLATFORM) should be updated to include data records:

        Label: syslib           Data: athena-PLATFORMsys-XY X.Y
(SGI)   Label: instlib          Data: athena-sgi-inst-XY X.Y

This change will cause AUTOUPDATE machines in the appropriate cluster
(as well as the crash, alpha, and beta clusters) to take the new
release; console messages will appear on non-AUTOUPDATE machines.