Ticket #328 (new enhancement)

Opened 11 years ago

Last modified 11 years ago

Investigate power management in clusters

Now that we have an operating system that at least pretends to support ACPI, we should look at whether it's possible to have the machines in the clusters sleep. There are some issues with this, of course. The most significant is that the machines need to take updates. Hotline is OK with this, provided there's a good visual way to differentiate a sleeping machine from a dead or powered-off machine (e.g., LED color, as with the monitors).

Perhaps sleeping could be limited to low-volume hours. Or we could start a pilot in one cluster.

Change History

comment:1 Changed 11 years ago by wdc

Inasmuch as the BIOS bug (now resolved) was related to the CPU going into the C3 idle state, I'll mention that the Dell 760s already do active power saving all the time.
If it gets to be too difficult to do an actual system power down, be consoled that some systems are already doing better power management without being put to sleep.

comment:2 Changed 11 years ago by jweiss

Note that sleeping a machine will presumably make it fail its Nagios checks. Hotline may be less okay with that than they realize at the moment. Still, it should be possible to sleep them off-hours (when Hotline doesn't care about Nagios), assuming we can automate a way to wake them up on a schedule. If we do this, we should think about making the code sufficiently modular that it can be stolen by the TSM project to allow your machine to sleep at night yet still get backed up.
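
One possible way to automate the scheduled wake-up is Wake-on-LAN. This is an assumption on my part: nothing in this ticket confirms that the cluster NICs and BIOS have it enabled, or that broadcast traffic reaches the cluster subnets. As a rough sketch, a small standalone helper (the function names, MAC address, and broadcast address below are all hypothetical) could be run from cron on some always-on host, and would be easy for another project to reuse:

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for the given MAC address.

    The packet is 6 bytes of 0xFF followed by the 6-byte MAC repeated
    16 times (102 bytes total).
    """
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast.

    The broadcast address and port are assumptions; port 9 (discard) is a
    common convention, but the right values depend on the local network.
    """
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address, for illustration only.
    send_wake("00:11:22:33:44:55")
```

Keeping the packet construction separate from the sending makes the logic testable without touching the network, and leaves the schedule itself (cron, or whatever we end up using) outside the code.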

