Ticket #165 (closed defect: wontfix)
HPET clock is broken on Dell 760 Hardware
Reported by: | wdc | Owned by: | |
---|---|---|---|
Priority: | blocker | Milestone: | Summer 2009 Deployment |
Component: | -- | Keywords: | |
Cc: | Fixed in version: | ||
Upstream bug: | LP:348694 Kernel:13053 |
Description (last modified by wdc) (diff)
When you boot from older Intrepid media, you see a spew of kernel oops messages. The apparent work-around for this is to power cycle the machine.
A one-line change to print the error message only once is incorporated into the Intrepid update kernel, so this symptom only affects systems lacking the update kernel.
The warning is reporting a fault in the HPET clock which *IS* real.
The system takes a very long time to boot, and logging in is also slow. The work-around is to boot with the option acpi=off.
When you boot Jaunty, even from media as recent as 3/24, you get a blank screen unless you use the boot option acpi=off.
We need to find out where the fault lies and get it fixed.
Attachments
Change History
comment:2 Changed 16 years ago by wdc
- Description modified (diff)
- Summary changed from 2.6.26 kernel does not get along with Dell 760 Hardware to HPET clock is broken on Dell 760 Hardware
Revised description to reflect current understanding of the problem.
Changed 16 years ago by wdc
- attachment 760-dmesg-oops.log added
dmesg output from debathena-cluster system with normal code path. acpi enabled, slow response. Notice the oops.
Changed 16 years ago by wdc
- attachment 760-dmesg-apic-off.log added
Same system booted with "acpi=off"
comment:3 Changed 16 years ago by wdc
Note! Attachment named "760-dmesg-apic-off.log" is named incorrectly.
It is "acpi=off" NOT apic that was set.
Changed 16 years ago by wdc
- attachment 760-dmesg-pit.log added
dmesg when clocksource=pit. Oddly, it complains of an hpet problem!
comment:4 Changed 16 years ago by wdc
With hpet=disable /proc/interrupts looks like this:
wdc-ubuntu-test% more /proc/interrupts
CPU0 CPU1
0: 41858 43789 IO-APIC-edge timer
1: 1 1 IO-APIC-edge i8042
7: 0 0 IO-APIC-edge parport0
8: 1 1 IO-APIC-edge rtc0
9: 0 0 IO-APIC-fasteoi acpi
12: 2 2 IO-APIC-edge i8042
16: 806 730 IO-APIC-fasteoi uhci_hcd:usb1, HDA Intel
17: 708 608 IO-APIC-fasteoi uhci_hcd:usb2, uhci_hcd:usb6
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb7
22: 2 1 IO-APIC-fasteoi uhci_hcd:usb3, ehci_hcd:usb4
23: 0 0 IO-APIC-fasteoi uhci_hcd:usb5, ehci_hcd:usb8
219: 5348 5124 PCI-MSI-edge eth0
220: 11135 9240 PCI-MSI-edge ahci
NMI: 0 0 Non-maskable interrupts
LOC: 11293 16932 Local timer interrupts
RES: 13002 13790 Rescheduling interrupts
CAL: 305 195 function call interrupts
TLB: 105 87 TLB shootdowns
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
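For comparing listings like the one above across boot options, a small parser can help. This is only a sketch: the function name parse_interrupts and the SAMPLE text (a few rows copied from the listing above, with column spacing assumed) are illustrative and not part of any existing tool.

```python
# Sketch: parse /proc/interrupts-style text into a dictionary.
# SAMPLE holds a few rows from the hpet=disable listing; spacing is assumed.
SAMPLE = """\
           CPU0       CPU1
  0:      41858      43789   IO-APIC-edge      timer
  8:          1          1   IO-APIC-edge      rtc0
LOC:      11293      16932   Local timer interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Return {label: (per_cpu_counts, description)} for each row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        # Split on the first colon only; descriptions may contain colons.
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


table = parse_interrupts(SAMPLE)
for label in ("0", "LOC"):
    counts, desc = table[label]
    print(label, counts, desc)
```

On a live system the same function could be given the contents of /proc/interrupts instead of SAMPLE.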
comment:5 Changed 16 years ago by wdc
With acpi=off /proc/interrupts looks like this:
wdc-ubuntu-test% cat /proc/interrupts
CPU0
0: 70 IO-APIC-edge timer
1: 2 IO-APIC-edge i8042
2: 0 XT-PIC-XT cascade
4: 2 IO-APIC-edge
8: 2 IO-APIC-edge rtc0
12: 4 IO-APIC-edge i8042
16: 1467 IO-APIC-fasteoi uhci_hcd:usb1, HDA Intel
17: 485 IO-APIC-fasteoi uhci_hcd:usb2, uhci_hcd:usb6
18: 0 IO-APIC-fasteoi uhci_hcd:usb7
22: 3 IO-APIC-fasteoi uhci_hcd:usb3, ehci_hcd:usb4
23: 0 IO-APIC-fasteoi uhci_hcd:usb5, ehci_hcd:usb8
219: 5859 PCI-MSI-edge eth0
220: 17928 PCI-MSI-edge ahci
NMI: 0 Non-maskable interrupts
LOC: 31190 Local timer interrupts
RES: 0 Rescheduling interrupts
CAL: 0 function call interrupts
TLB: 0 TLB shootdowns
SPU: 0 Spurious interrupts
ERR: 0
MIS: 0
comment:6 Changed 16 years ago by wdc
Further testing under Jaunty gives new insight:
Setting "hpet=disable" results in a working Jaunty system.
With the default boot options, what's happening is that the kernel never sees the clock interrupts. You can simulate interrupts by hitting the power switch until it configures the keyboard, then hitting a key until it starts gdm and configures the mouse. After that, the system runs just fine as long as you wiggle the mouse to create interrupts.
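One way to check for the "kernel never sees the clock interrupts" condition is to take two snapshots of /proc/interrupts a few seconds apart and see whether the timer rows advanced. The sketch below assumes that the IRQ 0 "timer" row and the LOC row are the relevant counters, which has not been verified on the 760. The names timer_counts and timer_stalled are made up for illustration, and the "after" counts in the demo are invented.

```python
from itertools import takewhile

# Assumption: these two rows are the ones that reflect clock interrupts.
TIMER_LABELS = ("0", "LOC")


def timer_counts(text):
    """Sum the per-CPU counts of the timer rows in /proc/interrupts text."""
    totals = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() in TIMER_LABELS:
            # Only the leading numeric fields are counts; the rest is description.
            leading = takewhile(str.isdigit, rest.split())
            totals[label.strip()] = sum(int(n) for n in leading)
    return totals


def timer_stalled(before, after):
    """True if none of the timer rows advanced between the two snapshots."""
    b, a = timer_counts(before), timer_counts(after)
    return all(a.get(k, 0) == b.get(k, 0) for k in TIMER_LABELS)


# Demo with made-up snapshots (not measurements from the ticket).
before = "CPU0\n0: 70 IO-APIC-edge timer\nLOC: 31190 Local timer interrupts\n"
after = "CPU0\n0: 70 IO-APIC-edge timer\nLOC: 31950 Local timer interrupts\n"
print(timer_stalled(before, before))
print(timer_stalled(before, after))
```

On the affected machine, the two snapshots would be read from /proc/interrupts with a short sleep in between.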
comment:8 Changed 16 years ago by wdc
We have more insight:
The root cause seems to be a bad interaction with the enablement of "C States" in the "Performance" BIOS menu. This option is only available for high-end processors, and appears to have been turned on for MIT machines because we wanted the maximum energy savings.
Clearing this bit is a work-around for the problem.
comment:9 Changed 16 years ago by wdc
In case people are not following the LP bug, there is a significant status update here.
- BZ 13053 has been opened with kernel.org.
http://bugzilla.kernel.org/show_bug.cgi?id=13053
- Setting the boot option acpi_skip_timer_override seems right now to be the smallest-footprint work-around for the problem.
- Because of what acpi_skip_timer_override does (it makes the kernel ignore the IRQ0/pin2 interrupt override reported by the BIOS), we are asking, "Is this a BIOS bug?"
For now, setting acpi_skip_timer_override allows the C States BIOS option to do its work and lets interrupts reach the CPU, so that everything runs, even with the HPET clock.
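When checking a fleet of machines, it may be useful to see which of the boot options tried in this ticket are present on a given kernel command line. The following is a minimal sketch: the function name options_in_cmdline and the example command line are hypothetical, and the option list is simply the set of options mentioned in this ticket.

```python
# Boot options tried in this ticket (not all of them are work-arounds).
TICKET_OPTIONS = (
    "acpi=off",
    "hpet=disable",
    "acpi_skip_timer_override",
    "clocksource=pit",
)


def options_in_cmdline(cmdline):
    """Return the ticket's boot options found in a kernel command line."""
    params = set(cmdline.split())
    return [opt for opt in TICKET_OPTIONS if opt in params]


# Hypothetical command line; on a live system read /proc/cmdline instead.
example = "root=/dev/sda1 ro quiet splash acpi_skip_timer_override"
print(options_in_cmdline(example))
```

Matching whole whitespace-separated parameters avoids false hits on substrings of other options.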
comment:11 Changed 16 years ago by broder
- Upstream bug set to https://bugs.launchpad.net/bugs/348694, http://bugzilla.kernel.org/show_bug.cgi?id=13053
comment:12 Changed 16 years ago by broder
- Upstream bug changed from https://bugs.launchpad.net/bugs/348694, http://bugzilla.kernel.org/show_bug.cgi?id=13053 to LP:348694 Kernel:13053
comment:13 Changed 16 years ago by geofft
- Status changed from new to closed
- Resolution set to wontfix
We won't see any progress on this by the first summer deployment date (Friday), but Hotline has decided to just disable the "C States" in the BIOS until we hear more.
comment:14 Changed 15 years ago by wdc
On June 17, 2009 the updated (rev A03) BIOS became publicly available at:
comment:15 Changed 15 years ago by wdc
I have installed the A03 BIOS on my test system and confirmed that the Jaunty Live CD functions properly even when C States is enabled. With this sanity-check test, we now have a bona fide fix in BIOS rev A03.
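To find machines that still need the BIOS update, the installed revision can be compared against A03. This sketch assumes the revisions follow a letter-plus-number pattern (A01, A02, A03, ...) that sorts in release order, which the ticket does not confirm. The function name bios_at_least is illustrative.

```python
import re

# Assumed revision format: one capital letter followed by digits, e.g. "A03".
_REVISION = re.compile(r"([A-Z])(\d+)")


def bios_at_least(version, fixed="A03"):
    """Return True/False for a parsable revision, None if it cannot be parsed."""
    have = _REVISION.fullmatch(version.strip())
    want = _REVISION.fullmatch(fixed)
    if have is None or want is None:
        return None
    have_key = (have.group(1), int(have.group(2)))
    want_key = (want.group(1), int(want.group(2)))
    return have_key >= want_key


for rev in ("A01", "A03", "A10"):
    print(rev, bios_at_least(rev))
```

On Linux, the installed revision can typically be read from /sys/class/dmi/id/bios_version and passed to the function.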
This turns out to be more than a one-line kernel change.
The kernel change makes the endless spew stop, but does not fix HPET.
The manifestation of the problem is more serious under Jaunty.
If you choose, "Try ubuntu..." from the live CD, you get a blank screen.
Under both Intrepid and Jaunty, you get much better operation if you set "acpi=off".
I have opened LP 348694
https://bugs.launchpad.net/bugs/348694
on this issue.