Ticket #783 (closed enhancement: fixed)
We need a recovery hook
| Reported by: | jdreed | Owned by: | jdreed |
|---|---|---|---|
| Priority: | normal | Milestone: | The Distant Future |
| Component: | -- | Keywords: | |
| Cc: | | Fixed in version: | |
| Upstream bug: | | | |
Description
We need a somewhat reliable hook for when machines explode. This can be something like:
- a script in AFS that gets sourced at boot time (and thus any failure is as simple as "reboot the machine")
- a script in AFS that gets sourced by cron periodically
- a script in AFS that gets sourced by auto-update (to fix things prior to an update).
This could potentially also minimize the need for us to do stupid version-specific things in maintainer scripts when we screw up.
Change History
comment:2 Changed 14 years ago by jdreed
Arguably, with a public root password, we'll always have an authenticity problem. If a user is going to go to the trouble of hijacking DNS and performing a MITM attack, then it would be trivial to replace any CA in the machine's keychain.
comment:3 Changed 14 years ago by mitchb
I don't see how that's at all the same issue. To use the root
password, you first have to either trick someone into running your
code or physically go compromise the machine. We're discussing
not getting duped into running an attacker's code here, in an
automated fashion, on an entire cluster.
comment:4 Changed 14 years ago by jdreed
Note that if we go the https route (compared to, say, http and a script signed with the Debathena PGP key), we should use a server with an Equifax cert, since that will work even if debathena-ssl-certificates somehow explodes.
comment:5 Changed 14 years ago by geofft
This seems way too complicated. We are depending on the recovery mode package to not be borked, so why not just stuff an extra copy of the CA into this package and run wget --ca-certificate?
(Honestly, if we want to be really sure about this, we'd statically compile a program against libcurl.)
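A rough sketch of the wget suggestion above, assuming the recovery package ships its own copy of the CA. The CA path, the function name, and the example URL in the usage note are hypothetical; this is not the code that was later committed.

```shell
#!/bin/sh
# Hypothetical location of the extra CA copy shipped in the recovery package.
CA_FILE=${CA_FILE:-/usr/share/debathena-recovery/mitCA.crt}

# fetch_with_bundled_ca URL OUTFILE
# Fetch URL over https, pointing wget at the bundled CA file.
# Returns 2 without touching the network if the bundled CA is unusable;
# otherwise returns wget's own exit status.
fetch_with_bundled_ca() {
    url=$1
    out=$2
    if [ ! -r "$CA_FILE" ]; then
        echo "bundled CA $CA_FILE is missing or unreadable" >&2
        return 2
    fi
    # --ca-certificate names the CA file wget uses to verify the server.
    wget -q --ca-certificate="$CA_FILE" -O "$out" "$url"
}
```

It could be called as `fetch_with_bundled_ca https://example.invalid/hook /var/run/hook` (placeholder URL), checking the exit status before doing anything with the output file.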
comment:6 Changed 14 years ago by jdreed
So, I think the right thing to do here is probably to have auto-update pull something from demeter via https (verified against the MITCA) and run it. Here are the potential failure modes I see:
- demeter's cert expires: it's important enough infrastructure that this is unlikely to happen, and in an emergency, we can get a new one (or is mitcert@ still a single point of failure?)
- auto-update stops running: the alternative is a periodic cron job, and if cron is somehow broken, we lose regardless
- ssl-certificates explodes: any time we update ssl-certificates, a prerequisite for it getting into proposed is that we re-test this update-recovery method.
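The first failure mode above (an expiring cert) is the kind of thing a small periodic check could catch early. A minimal sketch, assuming a PEM copy of the certificate is available locally; the function name and the 30-day window in the usage note are made up for illustration.

```shell
#!/bin/sh
# cert_expires_within CERT_FILE DAYS
# Returns 0 if the certificate expires within DAYS days, or if openssl
# cannot read it at all (treated as "needs attention").
# Returns 1 if the certificate is still valid DAYS days from now.
cert_expires_within() {
    cert=$1
    days=$2
    # "openssl x509 -checkend N" exits 0 only when the certificate
    # will NOT expire within the next N seconds.
    if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `cert_expires_within /path/to/server.crt 30 && echo "renew soon"` would flag a cert with less than a month left.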
comment:7 Changed 14 years ago by jdreed
- Owner set to jdreed
- Status changed from new to accepted
Here's a first pass:
Index: mitCA.crt
===================================================================
--- mitCA.crt (revision 0)
+++ mitCA.crt (revision 0)
@@ -0,0 +1,21 @@
+-----BEGIN CERTIFICATE-----
+MIIDZTCCAs6gAwIBAgIBATANBgkqhkiG9w0BAQUFADB7MQswCQYDVQQGEwJVUzEW
+MBQGA1UECBMNTWFzc2FjaHVzZXR0czEuMCwGA1UEChMlTWFzc2FjaHVzZXR0cyBJ
+bnN0aXR1dGUgb2YgVGVjaG5vbG9neTEkMCIGA1UECxMbTUlUIENlcnRpZmljYXRp
+b24gQXV0aG9yaXR5MB4XDTA2MDQwODE2NTAwNFoXDTI2MDgwMTE2NTAwNFowezEL
+MAkGA1UEBhMCVVMxFjAUBgNVBAgTDU1hc3NhY2h1c2V0dHMxLjAsBgNVBAoTJU1h
+c3NhY2h1c2V0dHMgSW5zdGl0dXRlIG9mIFRlY2hub2xvZ3kxJDAiBgNVBAsTG01J
+VCBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTCBnzANBgkqhkiG9w0BAQEFAAOBjQAw
+gYkCgYEA09Dr51G1M3Wm2KOE6gJwXM+cIOALA4uORm4VJeF39mvEcN3UFgvMEYgx
+OAvufFkkV+mNzXX4UmPdMwzwT5+1/JGuMoWMGnVjGZiGHpIjsofz9cmmopdo8uyy
+Gq2z9e0J6sznvLRkUBXmVwAaesbe/uEwWFpdq7u0HBHsZMHTpFUCAwEAAaOB+DCB
+9TAdBgNVHQ4EFgQUU/WjDwZdZdiKj1JtafrrVS29iwwwgaUGA1UdIwSBnTCBmoAU
+U/WjDwZdZdiKj1JtafrrVS29iwyhf6R9MHsxCzAJBgNVBAYTAlVTMRYwFAYDVQQI
+Ew1NYXNzYWNodXNldHRzMS4wLAYDVQQKEyVNYXNzYWNodXNldHRzIEluc3RpdHV0
+ZSBvZiBUZWNobm9sb2d5MSQwIgYDVQQLExtNSVQgQ2VydGlmaWNhdGlvbiBBdXRo
+b3JpdHmCAQEwDAYDVR0TBAUwAwEB/zALBgNVHQ8EBAMCAQYwEQYJYIZIAYb4QgEB
+BAQDAgEGMA0GCSqGSIb3DQEBBQUAA4GBAMTjXyVdM89JlPTzoe3o5CIvUP6TrWMN
+Bm3/mSX5pXeZWbWLtdVfUgQ9mW6UBYXaQSUPmz9C09ZNBH8N3vOoDS5/jD8MMcV/
+U/rOAIb4v2bMRKpPweSINGm72Pv/Pg15t1sRcnatBK94orekYvfJa3PiPU/3pfge
+RYhCd9zByXr2
+-----END CERTIFICATE-----
Index: athena-auto-update
===================================================================
--- athena-auto-update (revision 24997)
+++ athena-auto-update (working copy)
@@ -142,6 +142,33 @@
 # Tell apt not to expect user input during package installation.
 export DEBIAN_FRONTEND=noninteractive
 
+UPDATE_HOOK_URL=https://athena10.mit.edu/debathena-update-hook.sh
+UPDATE_HOOK_SUM=https://athena10.mit.edu/debathena-update-hook-sha256sum
+MITCA=/usr/share/debathena-auto-update/mitCA.crt
+UPDATE_HOOK=/var/run/debathena-update-hook.sh
+
+rm -f $UPDATE_HOOK
+if curl -sf -o $UPDATE_HOOK --cacert $MITCA $UPDATE_HOOK_URL; then
+  chmod 500 $UPDATE_HOOK
+  SHA256SUM=$(curl -sf --cacert $MITCA $UPDATE_HOOK_SUM)
+  rv=$?
+  if [ $rv = 0 ]; then
+    LOCALSUM=$(sha256sum $UPDATE_HOOK | awk '{print $1}')
+    if [ "$SHA256SUM" = "$LOCALSUM" ]; then
+      if ! $UPDATE_HOOK; then
+        complain "update hook returned non-zero status"
+        exit
+      fi
+    else
+      complain "bad update hook checksum ($SHA256SUM != $LOCALSUM)"
+      exit
+    fi
+  else
+    complain "Failed to retrieve $UPDATE_HOOK_SUM (curl returned $rv)"
+    exit
+  fi
+fi
+
 # Configure any unconfigured packages (Trac #407)
 if ! v dpkg --configure -a; then
   complain "Failed to configure unconfigured packages."
Index: changelog
===================================================================
--- changelog (revision 25005)
+++ changelog (working copy)
@@ -1,3 +1,10 @@
+debathena-auto-update (1.23) UNRELEASED; urgency=low
+
+  * Add support for an update hook to recover from catastrophes
+    (Trac #783)
+
+ -- Jonathan Reed <jdreed@mit.edu>  Mon, 07 Mar 2011 21:45:58 -0500
+
 debathena-auto-update (1.22.2) unstable; urgency=low
 
   * Use the correct version notation when removing obsolete conffiles
Index: debathena-auto-update.install
===================================================================
--- debathena-auto-update.install (revision 24997)
+++ debathena-auto-update.install (working copy)
@@ -2,3 +2,4 @@
 debian/athena-auto-update.8 usr/share/man/man8
 debian/athena-auto-upgrade usr/sbin
 debian/athena-auto-upgrade.8 usr/share/man/man8
+debian/mitCA.crt usr/share/debathena-auto-update
comment:8 follow-up: ↓ 9 Changed 14 years ago by kaduk
As Anders noted on zephyr,
debathena / trac-#783 / andersk  21:49  (Anders Kaseorg)
    Please do shell-quote things at some point.
debathena / trac-#783 / jdreed  21:50  (This zephyr does not necessarily refl
    The URLs? Yeah, I realized that right after I updated Trac
though I would be inclined to put double quotes around dollar-expansions as well.
I assume that we trust the script to be running with a sane umask and /var/run to not have dumb permissions.
Is there some Debian policy about scripts having or not having .sh extensions?
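For illustration, the quoting and umask points above might look like the sketch below. The function names and the mktemp template are hypothetical, and this is not the committed version of the hook code; it only shows double-quoted expansions and a checksum gate in front of execution.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
# An empty EXPECTED_HEX never matches, so a failed checksum download
# cannot accidentally validate an empty or missing file.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# run_verified_hook FILE EXPECTED_HEX
# Runs FILE only after its checksum matches; returns the hook's status.
run_verified_hook() {
    hook=$1
    expected=$2
    if ! verify_sha256 "$hook" "$expected"; then
        echo "bad update hook checksum for $hook" >&2
        return 1
    fi
    chmod 500 "$hook" && "$hook"
}

# One way to avoid depending on the caller's umask or on a fixed,
# predictable file name (template name is an assumption):
#   umask 077
#   hook=$(mktemp /var/run/debathena-update-hook.XXXXXX)
```

Every expansion is double-quoted, so a path or checksum containing whitespace or glob characters is passed through as a single word.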
comment:9 in reply to: ↑ 8 Changed 14 years ago by amu
Replying to kaduk:
> though I would be inclined to put double quotes around dollar-expansions as well.
Yeah, that's generally wise.
> Is there some Debian policy about scripts having or not having .sh extensions?
Per policy 10.4 (http://www.debian.org/doc/debian-policy/ch-files.html#s-scripts), scripts in PATH should not have .sh extensions. Elsewhere, it's pretty much a matter of taste (and avoiding gratuitous differences from upstream, not that that's an issue in this case).
comment:10 Changed 13 years ago by jdreed
- Status changed from accepted to committed
OK, this is now committed. The script will not run on -workstation by default. Right now, the update_hook sends a zephyr to -c debathena-update-hook, for testing purposes.
Before auto-update 1.23 is moved to proposed, the file should be removed, and the ACLs on that directory cleared up so that only debathena-root and ops can write to it.
We should probably also take this opportunity to repoint athena10's docroot at the AFS cell
comment:12 Changed 13 years ago by jdreed
- Status changed from development to proposed
Moving to proposed now. I'm going to delete the hook once I see that w20-575-{1,7} have taken it.
comment:13 Changed 13 years ago by jdreed
- Status changed from proposed to closed
- Resolution set to fixed
I notice that all of the options Jon proposes involve AFS, which
seems high on the list of things that could be borked in an update
accident. A wget/curl would be a much more reliable thing in that
you don't need much of a system to do it, but has the authenticity
problem. https fetch of something from demeter?