Ticket #294 (closed defect: fixed)

Opened 12 years ago

Last modified 12 years ago

Installer should force NTP sync and hwclock sync, and have a working ntpd afterwards as well

Reported by: jdreed
Owned by: amb
Priority: normal
Milestone: Fall 2009 Release
Component: --
Keywords:
Cc:
Fixed in version:
Upstream bug:

Description

After today's reinstall, about half of the cluster workstations were left with clocks that were off by 4 hours, due to confusion about whether the hardware clock should use UTC or not. I've repaired them all by hand, but it would be great if the installer forced a synchronization with time.mit.edu at some point.

Change History

comment:1 Changed 12 years ago by broder

It looks like /etc/network/if-up.d/ntpdate should handle running ntpdate whenever networking comes up. I don't know why that's not triggering.

comment:2 Changed 12 years ago by amb

  • Owner set to amb
  • Status changed from new to accepted

I'm fairly sure this is done. The installer already gets the correct time from the network early in the install process, but without /etc/adjtime on the installed system, the hardware clock is read after boot as being local time; install-debathena.sh now takes care of this. (That said, I'm baffled why *all* installed systems weren't thus broken. Maybe the "set time on network start" code sorta-kinda-works-sometimes?)
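As an illustration of the UTC-versus-local question above, here is a minimal Python sketch that reads the clock mode out of the contents of an adjtime file. It assumes the usual hwclock layout, in which the third line is either UTC or LOCAL. The function name hwclock_mode is hypothetical and is not part of install-debathena.sh. What the system assumes when the file is missing is left out of the sketch, which simply returns None in that case.

```python
def hwclock_mode(adjtime_text):
    """Return 'UTC' or 'LOCAL' from the contents of an adjtime file.

    Returns None when the text is missing, too short, or the third
    line is not one of the two recognised values.
    """
    if adjtime_text is None:
        return None
    lines = adjtime_text.splitlines()
    if len(lines) < 3:
        return None
    mode = lines[2].strip().upper()
    return mode if mode in ("UTC", "LOCAL") else None
```

A caller could read /etc/adjtime if it exists, pass the text in, and treat a None result as "mode not recorded", which is the situation described in this comment.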

comment:3 Changed 12 years ago by rbasch

I believe the problem at start-up is that DNS lookups fail from the time the network is started (rcS.d/S40networking) until named is started (rc2.d/S15bind9). Because we add an eth0 stanza in /etc/network/interfaces and mark it "auto", ifup configures it, invoking the scripts in /etc/network/if-up.d. The first script is 000resolvconf, which overwrites the nameserver information for the interface with the DNS settings from interfaces, which we are currently not setting. So I think the result is that no nameserver information is set at this point, and this remains the case until bind9 is started in rc2. Thus /etc/network/if-up.d/ntpdate will fail, because TIME.MIT.EDU cannot be resolved. (The bind9 if-up.d script does not help here, as it will only kick an already-running named.)

This is also the likely cause of ntpd not having any associations on many cluster machines, as discussed recently in Zephyr; there is a race condition between the named and ntpd start-ups, so ntpd may also fail to resolve TIME.MIT.EDU.
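The failure mode described above (a time sync attempted before any nameserver is usable) can be sketched in Python as a resolution check with retries. This is only an illustration of the race, not the fix adopted in this ticket. The function name wait_for_dns and its parameters are hypothetical; the resolve and sleep arguments exist so the behaviour can be exercised without a network.

```python
import socket
import time


def wait_for_dns(host, attempts=5, delay=1.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve host up to `attempts` times.

    Returns the number of the attempt that succeeded, or None if
    every attempt raised socket.gaierror (name resolution failure).
    """
    for attempt in range(1, attempts + 1):
        try:
            resolve(host, 123)  # 123 is the NTP port
            return attempt
        except socket.gaierror:
            # No usable nameserver yet; pause before the next try.
            if attempt < attempts:
                sleep(delay)
    return None
```

A start-up script could call wait_for_dns with the time server's name and skip or postpone the sync when it returns None, rather than failing silently as ntpdate appears to do here.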

Probably the best solution is to include a dns-nameservers line for the MIT nameservers in the /etc/network/interfaces eth0 stanza. resolvconf will configure these when the network starts, and the bind9 start-up will later reconfigure with nameserver 127.0.0.1. A quick test rebooting with this setting on a cluster machine with unsynced time seemed to DTRT.
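For illustration, the proposed change to the eth0 stanza can be sketched as a small Python helper that adds a dns-nameservers option to a given interface stanza. The helper name add_dns_nameservers is hypothetical, and the addresses in the usage note are documentation placeholders, not the MIT nameservers. The stanza detection is deliberately simplified and is not a full parser for the interfaces format.

```python
# Keywords that begin a new stanza in /etc/network/interfaces
# (simplified; assumed sufficient for a typical installer-written file).
STANZA_KEYWORDS = ("iface", "auto", "allow-", "mapping", "source")


def add_dns_nameservers(interfaces_text, iface, nameservers):
    """Return interfaces_text with a dns-nameservers option placed
    directly under each `iface <iface> ...` line.

    Any existing dns-nameservers line in a matching stanza is replaced.
    """
    option = "    dns-nameservers " + " ".join(nameservers)
    out = []
    in_stanza = False
    for line in interfaces_text.splitlines():
        words = line.split()
        starts_stanza = bool(words) and words[0].startswith(STANZA_KEYWORDS)
        if starts_stanza:
            in_stanza = (words[0] == "iface" and len(words) > 1
                         and words[1] == iface)
        if in_stanza and words and words[0] == "dns-nameservers":
            continue  # drop the old option; the new one replaces it
        out.append(line)
        if starts_stanza and in_stanza:
            out.append(option)
    return "\n".join(out) + "\n"
```

Calling add_dns_nameservers(text, "eth0", ["192.0.2.1", "192.0.2.2"]) on a file containing "iface eth0 inet dhcp" would place the option line directly beneath it, which is the setting resolvconf picks up when the interface comes up.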

comment:4 Changed 12 years ago by amb

  • Summary changed from Installer should force NTP sync and hwclock sync to Installer should force NTP sync and hwclock sync, and have a working ntpd afterwards as well

> Probably the best solution is to include a dns-nameservers line for the MIT nameservers in the /etc/network/interfaces eth0 stanza.

Done and deployed. It seems to work; I'll close this after a bit more testing.

comment:5 Changed 12 years ago by broder

  • Status changed from accepted to closed
  • Resolution set to fixed

This fix has been verified to work. The fact that old machines are still borked is a separate issue, tracked in #332.
