Ticket #1305 (closed enhancement: fixed)
auto-upgrade should support re-installation
Reported by: | jdreed | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | Current Semester |
Component: | -- | Keywords: | |
Cc: | | Fixed in version: | debathena-auto-update 1.43 |
Upstream bug: | | | |
Description
auto-upgrade takes great pains to ensure it can't auto-upgrade to the current version. However, I think we want bulk reinstall functionality, for, e.g., this summer, to pick up a new disk layout.
My current naive plan is to add support for a new flag in clusterinfo, 'r' (for 'reinstall'). I'm going to test this in -bleeding.
Change History
comment:1 Changed 12 years ago by jdreed
- Milestone changed from The Distant Future to Current Semester
comment:3 Changed 12 years ago by jdreed
Yeah, uh, we don't actually want a flag on version data, because then machines will reinstall endlessly. Instead, we support a new clusterinfo flag, "reinstall_at", whose value is a time in seconds since the epoch. If the last modification time of /var/log/athena-install.log is earlier than that value, the machine will be reinstalled at the current release.
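For illustration, the comparison described in this comment could be sketched as follows. This is a minimal sketch, not the actual auto-upgrade code: the function name `needs_reinstall` is hypothetical, and the sketch assumes the log file exists.

```python
import os

# Path named in the ticket; its mtime marks when the machine was installed.
INSTALL_LOG = "/var/log/athena-install.log"


def needs_reinstall(reinstall_at, log_path=INSTALL_LOG):
    """Return True if the install log was last modified before reinstall_at.

    reinstall_at is a time in seconds since the epoch, as carried by the
    "reinstall_at" clusterinfo value. The function name is hypothetical.
    """
    installed_at = os.stat(log_path).st_mtime
    # Strictly "less than": a machine installed exactly at the cutoff is left alone.
    return installed_at < reinstall_at
```

Because the comparison is against a fixed point in time rather than a flag, a machine that has been reinstalled gets a fresh log file with a newer mtime and stops matching, which is what avoids the endless-reinstall problem.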
comment:4 Changed 12 years ago by jdreed
- Status changed from new to committed
- Fixed in version set to debathena-auto-update 1.43
comment:5 follow-up: ↓ 6 Changed 12 years ago by jweiss
Are there any cases where the install can fail before it starts, but still update the (timestamp on) the log file? Are we worried about the race condition where machines start a small update before the magic timestamp, but finish afterwards? Have we thought about what other screw cases might exist?
comment:6 in reply to: ↑ 5 Changed 12 years ago by jdreed
Replying to jweiss:
Are there any cases where the install can fail before it starts, but still update the (timestamp on) the log file?
I'm not sure what you mean by "fail before it starts", but that log file is created the instant the Debathena installer is launched in the postinstall. If the postinstall fails at any point, the machine comes up with the "Call hotline" greeter. Reinstallation is no more likely to fail than a release upgrade, as they are fundamentally the same operation.
Are we worried about the race condition where machines start a small update before the magic timestamp, but finish afterwards?
I'm not sure what you mean by "small update". Nothing updates the timestamp other than the PXE installation. auto-upgrade runs independently of auto-update. There could be a "race condition" in that a machine is installed (or re-installed by hotline) between when we set the timestamp in Moira, and when it propagates to Hesiod, but we can fix that by setting the timestamp to take into account the next scheduled DCM and the TTL. We could also update the value later.
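One reading of "take into account the next scheduled DCM and the TTL" is to push the cutoff out to the point where every client is expected to see the new value. A rough sketch under that assumption follows; the function and parameter names are hypothetical, and the real DCM schedule and Hesiod TTL would have to be supplied by whoever sets the value.

```python
def earliest_safe_reinstall_at(next_dcm_epoch, hesiod_ttl_seconds):
    """Return a cutoff no earlier than when the new value should be visible.

    Assumption: the value reaches Hesiod at the next DCM run, and stale
    cached answers expire at most one TTL after that.
    """
    return next_dcm_epoch + hesiod_ttl_seconds


# Example with made-up numbers: DCM runs at epoch 1300000000, TTL is one hour.
cutoff = earliest_safe_reinstall_at(1300000000, 3600)
```

Any machine installed before this cutoff, including one installed while the value was still propagating, would then compare as older than "reinstall_at".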
Have we thought about what other screw cases might exist?
Yes. No new ones were introduced with this. If there are no new releases, then we check whether the machine should be reinstalled. We ensure that the "reinstall_at" value is numeric. stat can only return a numeric value or the empty string; we test for the latter and force it to the number 0. The numeric comparison cannot fail at this point, barring severe internal RAM corruption. If the machine was installed before the timestamp, it will reinstall the current release and bypass the sanity check that prevents "upgrades" to the current release. If not, nothing happens. I have cleaned up the logic a bit.
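The checks listed in this reply can be sketched roughly as below. This is an illustrative sketch, not the shipped script: all function names are hypothetical, and skipping the reinstall when "reinstall_at" is not numeric is an assumption, since the ticket only says the value is checked.

```python
import re


def parse_reinstall_at(raw):
    """Return the clusterinfo value as an int, or None if it is not numeric."""
    raw = raw.strip()
    return int(raw) if re.fullmatch(r"[0-9]+", raw) else None


def parse_stat_mtime(stat_output):
    """stat yields either a number or the empty string; treat empty as 0."""
    stat_output = stat_output.strip()
    return int(stat_output) if stat_output else 0


def should_reinstall(new_release_available, reinstall_at_raw, stat_output):
    """Decide whether to reinstall the current release."""
    if new_release_available:
        return False  # the normal release-upgrade path applies instead
    reinstall_at = parse_reinstall_at(reinstall_at_raw)
    if reinstall_at is None:
        return False  # assumption: a non-numeric value means "do nothing"
    # Both sides are ints here, so the comparison itself cannot fail.
    return parse_stat_mtime(stat_output) < reinstall_at
```

Forcing an empty stat result to 0 means a machine with no install log compares as older than any positive "reinstall_at" and is therefore reinstalled.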