Ticket #303 (closed defect: fixed)
Figure out Ubuntu release upgrades for clusters
Reported by: | broder | Owned by: | amb |
---|---|---|---|
Priority: | blocker | Milestone: | Summer 2010 (Lucid Deploy) |
Component: | -- | Keywords: | karmic |
Cc: | Fixed in version: | ||
Upstream bug: |
Description
We know (or at least hope) that we'll be switching to Ubuntu Karmic sometime during IAP 2010. What we don't know is how we want to trigger the clusters to take that update.
Should they reinstall themselves? Or should they run do-release-upgrade?
We should figure out and test the mechanism for this well before it needs to happen, i.e., before IAP.
Change History
comment:2 in reply to: ↑ 1 Changed 15 years ago by jweiss
If we're going to reinstall, we need to be careful that we haven't allowed anyone to write data to the local disk of their private machine that they want to be "just like the cluster". Actually, the chroot probably does this for us, so never mind. We would still need to think about how to upgrade -workstation machines, though, wouldn't we?
comment:3 Changed 15 years ago by geofft
Actually, schroot bind-mounts /home through, so /home remains writable; someone on -c help had a private -cluster system and was apparently regularly using a directory he'd created in /home. (That may be its own bug.) While part of me wants to blow away their /home so that they stop using -cluster, I realize we can't actually destroy data like that...
I'd assumed -workstation machines would be upgraded by someone explicitly running the Ubuntu updater, but we do claim to manage updates for them, so we should probably figure out a way to either safely trigger do-release-upgrade automatically, or tell people to run the Ubuntu updater once we finish gdm-config and such.
comment:5 Changed 14 years ago by jdreed
While it's easy to throw a hackboot package out there, I'd like us to do something a bit more clever, perhaps using a new clusterinfo key. For example, we could deploy everything, and then set the magic clusterinfo key for, say, a few clusters so we can see what happens. Then upgrade the rest of campus. This avoids 300 machines simultaneously (even desync'd) hosing various servers, and keeps us from completely shooting ourselves in the foot if something breaks. (It's a lot easier to reinstall, say, 2-225 by hand than every single machine on campus.)
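A minimal Python sketch of that staged rollout, for illustration only. Everything here is hypothetical: the ticket doesn't name the clusterinfo key (`upgrade_enabled` is made up), and hashing the hostname is just one way to spread machines across a desync window, not necessarily how Athena's desync works.

```python
import hashlib
from typing import Mapping, Optional

# Hypothetical clusterinfo key; the ticket does not say what it would be called.
ROLLOUT_KEY = "upgrade_enabled"


def desync_delay(hostname: str, window_seconds: int) -> int:
    """Stable per-host delay in [0, window_seconds), derived from the hostname.

    Each machine lands in a fixed, roughly uniform slot of the window, so a
    few hundred machines don't all hit the servers at once.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def upgrade_delay(hostname: str, clusterinfo: Mapping[str, str],
                  window_seconds: int) -> Optional[int]:
    """Seconds to wait before upgrading, or None if this machine's cluster
    has not been opted in by setting ROLLOUT_KEY in its clusterinfo."""
    if ROLLOUT_KEY not in clusterinfo:
        return None
    return desync_delay(hostname, window_seconds)
```

Turning the key on for one cluster (say, 2-225) would upgrade only those machines; everything else keeps returning None until the key is set for it too.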
comment:7 Changed 14 years ago by jdreed
At release-team, we decided that we should add a clusterinfo key for "current supported release", and modify auto-update to do the upgrade based on the key. We'll need to change the desync interval.
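For illustration, the core check auto-update would need is a numeric comparison of the installed release against the supported one; comparing the version strings directly gets 9.04 vs. 10.04 wrong. This sketch assumes the key carries an Ubuntu version number and that the installed version comes from something like `lsb_release -rs`; the function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn an Ubuntu version string like '10.04' into (10, 4)."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, supported: str) -> bool:
    """True if the installed release is older than the supported one.

    Compares numerically: as plain strings, '9.04' sorts after '10.04'.
    """
    return parse_version(installed) < parse_version(supported)
```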
comment:8 Changed 14 years ago by jdreed
Unless someone convinces me otherwise, I'm going to use the four-field clusterinfo format, e.g.:
ubuntu_release jaunty 9.04
ubuntu_release lucid 10.04 t
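A sketch of parsing those lines, assuming the optional fourth field `t` marks a release as testing-only and that a machine targets the newest release it is eligible for. The names (`Release`, `parse_releases`, `target_release`, `include_testing`) are hypothetical; this is not the code in r24797.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Release(NamedTuple):
    codename: str
    version: Tuple[int, ...]
    testing: bool  # assumed meaning of the optional 't' field


def parse_releases(text: str) -> List[Release]:
    """Parse 'ubuntu_release <codename> <version> [t]' lines, skipping others."""
    releases = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "ubuntu_release":
            continue  # other clusterinfo keys or malformed lines
        codename, version = fields[1], fields[2]
        testing = len(fields) >= 4 and fields[3] == "t"
        version_tuple = tuple(int(p) for p in version.split("."))
        releases.append(Release(codename, version_tuple, testing))
    return releases


def target_release(releases: Iterable[Release],
                   include_testing: bool = False) -> Optional[Release]:
    """Newest release this machine should run; testing-only entries are
    skipped unless include_testing is set."""
    eligible = [r for r in releases if include_testing or not r.testing]
    return max(eligible, key=lambda r: r.version, default=None)
```

With the two example lines, a normal machine would target jaunty and a testing machine would target lucid.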
comment:9 Changed 14 years ago by jdreed
The clusterinfo is mostly in place, but the proposed machines lack the Jaunty line. I have sent mail to ops.
Additionally, a first draft of the update mechanism is in r24797. Feedback would be appreciated.
I haven't yet decided whether to run it separately from cron, from auto-update, or some other way.
comment:10 Changed 14 years ago by jdreed
- Status changed from assigned to development
I successfully upgraded a machine with this, and I believe jhamrick did too.
The next step is to send mail asking to remove the 't' flag from the lucid cluster data, and push it out to beta-linux (and send mail to testers). There is one intrepid cluster machine in beta-linux. amb will create a "jaunty" subdirectory under /net-install so it will work.
comment:11 Changed 14 years ago by jdreed
- Status changed from development to proposed
We don't in fact care about the intrepid machine. This is now in proposed, and will be going to production later today.
comment:12 Changed 14 years ago by jdreed
- Status changed from proposed to closed
- Resolution set to fixed
We figured them out.
I'm sure there's been discussion of this, but I don't see it on the ticket.
Anyway, I favor machines reinstalling themselves because it reduces the effective difference between cluster machines; we've already seen a couple of heisenbugs involving machines installed at different times. I think this should be done by debathena-auto-update sketching on grub.conf and doing a notify-reboot-required.
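A rough Python sketch of that reinstall trigger, assuming GRUB legacy menu syntax (`title`/`kernel`/`initrd`) and an installer kernel at placeholder paths. `mark_reboot_required` only creates the `/var/run/reboot-required` flag that update-notifier watches; the real notify-reboot-required script may do more. All function names and paths are hypothetical.

```python
from pathlib import Path


def reinstall_stanza(kernel: str, initrd: str, title: str = "Reinstall") -> str:
    """Build a GRUB legacy menu entry that boots an installer.

    The kernel and initrd paths are hypothetical placeholders.
    """
    return f"title {title}\nkernel {kernel}\ninitrd {initrd}\n"


def prepend_stanza(menu: str, stanza: str) -> str:
    """Insert stanza before the first 'title' entry so it becomes entry 0.

    Global settings such as 'default' and 'timeout' stay at the top; with
    'default 0' the new entry is what boots next.
    """
    if menu and not menu.endswith("\n"):
        menu += "\n"
    lines = menu.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.lstrip().startswith("title"):
            return "".join(lines[:i]) + stanza + "".join(lines[i:])
    # No existing entries: just append.
    return menu + stanza


def mark_reboot_required(run_dir: str = "/var/run") -> Path:
    """Create the reboot-required flag file that update-notifier looks for."""
    flag = Path(run_dir) / "reboot-required"
    flag.touch()
    return flag
```

Keeping the global directives above the new entry means the rest of the menu file is left untouched.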