Ticket #536 (closed defect: fixed)

Opened 12 years ago

Last modified 11 years ago

Stop using kexec as soon as possible

Reported by: jdreed
Owned by:
Priority: normal
Milestone: Summer 2010 (Lucid Deploy)
Component: --
Keywords:
Cc:
Fixed in version:
Upstream bug:

Description

Kexec makes it _impossible_ to actually reboot a machine, because even if you unload kexec and reboot, /etc/init.d/kexec-load will reload it for you.

We should go check on why we had to use kexec in the first place, and see whether we still need to do so in Karmic/Lucid.

Change History

comment:1 Changed 12 years ago by geofft

There were two reasons, as I recall, that we used kexec:

  1. reactivate used to reboot machines at logout way more frequently than it does now, so we wanted to skip the time penalty of going through the BIOS/hardware boot process just to reactivate a machine for the next login. This is less of a problem now; besides, if we want this, we can explicitly load a kernel via kexec in reactivate, just before going through the reboot process. If we disable unconditionally loading a kernel, I believe the kexec initscript will still use a loaded kernel.
  2. OptiPlex 745s would hang and be unable to reboot when the kernel told them to reboot. This was previously tracked as Jira:ATN-52, which points to a bunch of bugs (LP:115011, LP:115906, LP:114854) against Linux 2.6.20, as well as an lkml post adding a blacklist for the 745. That patch seems to have been taken in some form as df2edcf for 2.6.23-rc1, and now matches all 745s rather than just that specific board. We're sufficiently past 2.6.23 that there's a good chance the kernel can now reboot 745s successfully, so it may no longer be necessary to use kexec for this.
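
For illustration, the explicit load described in the first point could look roughly like the sketch below. It only builds the argument list for `kexec -l` and prints it rather than running it. The function name, kernel path, initrd path, and command line are all assumptions made for the example; none of them come from reactivate itself.

```python
import shlex


def build_kexec_load_command(kernel, initrd, cmdline):
    """Return the argv that stages a kernel with kexec-tools.

    All three arguments are hypothetical values supplied by the caller;
    this ticket does not show what reactivate actually passes.
    """
    return [
        "kexec",
        "-l", kernel,            # stage (load) the kernel without booting it
        "--initrd=" + initrd,    # initrd to use with the staged kernel
        "--append=" + cmdline,   # kernel command line for the next boot
    ]


if __name__ == "__main__":
    argv = build_kexec_load_command(
        "/vmlinuz", "/initrd.img", "root=/dev/sda1 ro quiet"
    )
    # Print a shell-quoted form; actually loading a kernel requires root.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Staging the kernel only at the point where reactivate has decided to reboot would keep an ordinary reboot going through the BIOS, which is the behavior this ticket is asking for.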

comment:2 Changed 11 years ago by broder

  • Status changed from new to proposed

Fixed in r24420 and r24421 and uploaded to -proposed.

We should test

  1. That reactivate kexecs if it reboots
  2. That Dell 745s can reboot without kexec

comment:3 Changed 11 years ago by broder

If we actually want to kill off kexec on current installs, we need to do something more clever.

Jaunty used to enable kexec by default. Then Ubuntu disabled it (LP:251242). But since they disabled it while we had /etc/default/kexec diverted, kexec got turned off in /etc/default/kexec.debathena, not /etc/default/kexec.debathena-orig. Once we undid that diversion, kexec was re-enabled.

The next time the clusters get reinstalled, this problem won't happen, because the new kexec-tools will get installed without the diversion in place. So if we're OK with waiting to fix this until the Lucid install, then we're done. Otherwise, we should go back to diverting /etc/default/kexec, but explicitly turn *off* kexec.
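
As a rough sketch of the "explicitly turn off" step, the function below rewrites the text of a defaults file so that `LOAD_KEXEC` is false. It works on a string, so the diversion itself and the file I/O are left out. The function name and the sample file contents are assumptions for illustration; only the `LOAD_KEXEC` variable is taken from this ticket.

```python
import re


def disable_kexec_load(config_text):
    """Return config_text with LOAD_KEXEC forced to false.

    Commented-out lines are left alone. If no LOAD_KEXEC assignment
    exists, one is appended at the end.
    """
    pattern = re.compile(r"^LOAD_KEXEC=.*$", re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub("LOAD_KEXEC=false", config_text)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + "LOAD_KEXEC=false\n"


if __name__ == "__main__":
    # Hypothetical file contents, not copied from a real install.
    sample = "# Load a kexec kernel (true/false)\nLOAD_KEXEC=true\n"
    print(disable_kexec_load(sample), end="")
```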

comment:4 Changed 11 years ago by broder

In the meantime, I've manually set LOAD_KEXEC=false on lola-granola, and verified that:

  1. reactivate kexecs when it reboots, but drops "single" from the cmdline
  2. Rebooting from the "Actions" gdm menu does not reboot via kexec.
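
One way to spot-check this kind of thing, which is not necessarily how it was verified here, is to look at whether a kexec kernel is currently staged. The sketch below assumes the kernel exposes `/sys/kernel/kexec_loaded` (it reads `1` when a kernel is loaded); the function name is made up for the example.

```python
def kexec_kernel_loaded(path="/sys/kernel/kexec_loaded"):
    """Return True if the sysfs flag at `path` reports a staged kexec kernel.

    A missing or unreadable file is treated as "not loaded", which is an
    assumption of this sketch rather than something the kernel guarantees.
    """
    try:
        with open(path) as flag_file:
            return flag_file.read().strip() == "1"
    except OSError:
        return False


if __name__ == "__main__":
    print("kexec kernel staged:", kexec_kernel_loaded())
```

With LOAD_KEXEC=false, this would be expected to report False right before a reboot from the gdm menu, and True only after reactivate has staged a kernel itself.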

comment:5 Changed 11 years ago by broder

  • Status changed from proposed to closed
  • Resolution set to fixed

I've moved the fixes to reactivate and cluster-login-config into production, so I'm going to go ahead and close this ticket.

If we decide it's important to fix the current cluster machines before they get reinstalled this summer, we can open a separate ticket for that.
