Ticket #905 (closed enhancement: ignored)
Investigate fglrx
Reported by: | jdreed | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | Current Semester |
Component: | -- | Keywords: | |
Cc: | | Fixed in version: | |
Upstream bug: | | | |
Description
Investigate whether fglrx is a good idea and maintainable in the cluster environment. In particular, we care about robustness across kernel upgrades (ideally the module is rebuilt by DKMS rather than by some sketchy ATI script).
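One way to make "robust across kernel upgrades" checkable is to ask DKMS whether the module has been built and installed for a given kernel. The Python below is a minimal sketch, not an existing tool. It assumes `dkms status` prints lines of the form `fglrx, <version>, <kernel>, <arch>: installed` (older DKMS) or `fglrx/<version>, <kernel>, <arch>: installed` (newer DKMS); the helper names `dkms_module_state` and `fglrx_ready_for_running_kernel` are hypothetical.

```python
import os
import subprocess
from typing import Optional


def dkms_module_state(status_output: str, module: str, kernel: str) -> Optional[str]:
    """Return the state `dkms status` reports for `module` on `kernel`, or None."""
    for line in status_output.splitlines():
        # Everything before the first colon identifies the build; the rest is its state.
        head, sep, state = line.partition(":")
        if not sep:
            continue
        fields = [f.strip() for f in head.split(",")]
        # Assumed newer format "name/version, kernel, arch": split name from version.
        if "/" in fields[0]:
            name, version = fields[0].split("/", 1)
            fields = [name, version] + fields[1:]
        # fields is now [name, version, kernel, arch]; lines without a kernel are skipped.
        if len(fields) >= 3 and fields[0] == module and fields[2] == kernel:
            return state.strip()
    return None


def fglrx_ready_for_running_kernel() -> bool:
    """Hypothetical check: has DKMS installed fglrx for the kernel we are running?"""
    output = subprocess.run(
        ["dkms", "status"], capture_output=True, text=True, check=True
    ).stdout
    state = dkms_module_state(output, "fglrx", os.uname().release)
    # Use startswith in case DKMS appends a note after "installed".
    return state is not None and state.startswith("installed")
```

Because `dkms_module_state` takes the kernel release as a parameter, the same check could be pointed at a newly installed kernel before rebooting into it, which is the case a kernel upgrade actually exercises.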
Change History
comment:2 Changed 13 years ago by geofft
Saw this today on thornhump, a Dell 790 using Natty's fglrx:
{{{
[623243.846343] [fglrx] ASIC hang happened
[623243.846346] Pid: 3989, comm: Xorg Tainted: P W 2.6.38-10-generic #46-Ubuntu
[623243.846348] Call Trace:
[623243.846371] [<ffffffffa053cd4e>] ? KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
[623243.846388] [<ffffffffa054a14c>] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
[623243.846419] [<ffffffffa05cb619>] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
[623243.846448] [<ffffffffa05cb5cc>] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x6c/0xb0 [fglrx]
[623243.846477] [<ffffffffa05c9c74>] ? _ZN15ExecutableUnits10CPRingIdleE15idle_WaitMethod12_QS_CP_RING_+0xe4/0x1a0 [fglrx]
[623243.846510] [<ffffffffa05d49c7>] ? _ZN11AsicCypress21initializeMicroEngineEv+0x147/0x160 [fglrx]
[623243.846539] [<ffffffffa05c9b3b>] ? _ZN15ExecutableUnits7PM4idleE15idle_WaitMethod+0x4b/0x90 [fglrx]
[623243.846567] [<ffffffffa05c9826>] ? _ZN15ExecutableUnits9assertPM4Eb+0x56/0x70 [fglrx]
[623243.846596] [<ffffffffa05d1920>] ? _ZN8AsicR6009assertPM4Eb+0x40/0x70 [fglrx]
[623243.846622] [<ffffffffa05a6e4a>] ? CMMQS_Initialize_WA+0x14a/0x170 [fglrx]
[623243.846641] [<ffffffffa05671ad>] ? firegl_cmmqs_init+0x56d/0xa80 [fglrx]
[623243.846656] [<ffffffffa05422b2>] ? firegl_addmap+0x4b2/0x870 [fglrx]
[623243.846675] [<ffffffffa0566688>] ? firegl_cmmqs_createdriver+0x48/0x130 [fglrx]
[623243.846679] [<ffffffff81279db9>] ? security_capable+0x29/0x30
[623243.846697] [<ffffffffa0566640>] ? firegl_cmmqs_createdriver+0x0/0x130 [fglrx]
[623243.846712] [<ffffffffa0545d9a>] ? firegl_ioctl+0x1ea/0x250 [fglrx]
[623243.846723] [<ffffffffa0536d7e>] ? ip_firegl_unlocked_ioctl+0xe/0x20 [fglrx]
[623243.846728] [<ffffffff811764cf>] ? do_vfs_ioctl+0x8f/0x360
[623243.846731] [<ffffffff81164e73>] ? vfs_write+0x123/0x180
[623243.846734] [<ffffffff81176831>] ? sys_ioctl+0x91/0xa0
[623243.846737] [<ffffffff8100c002>] ? system_call_fastpath+0x16/0x1b
}}}
The preceding warnings seem to be:
{{{
[623034.464127] [fglrx] ACPI is disabled on this system
[623034.563758] WARNING: at /build/buildd/linux-2.6.38/drivers/pci/msi.c:685 pci_enable_msi_block+0xc8/0xe0()
[623063.874197] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[623063.874214] WARNING: at /build/buildd/linux-2.6.38/drivers/gpu/drm/radeon/radeon_fence.c:248 radeon_fence_wait+0x36f/0x3e0 [radeon]()
[623063.874217] GPU lockup (waiting for 0x00402B25 last fence id 0x00402B24)
}}}
"ACPI is disabled on this system" is suspicious: I think this is our only system that took the noacpi hack. I wonder whether the hang still happens without that hack in place.
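Testing that hypothesis means comparing kernel logs from machines with and without the hack. The Python below is a minimal sketch under stated assumptions: the input is dmesg-style text like the excerpts above, the two message strings are copied verbatim from those excerpts, and the function names `find_fglrx_events` and `hang_on_acpi_disabled_boot` are hypothetical.

```python
import re
from typing import Dict, List

# Message fragments copied verbatim from the fglrx log excerpts in this ticket.
SIGNATURES = {
    "acpi_disabled": "[fglrx] ACPI is disabled on this system",
    "asic_hang": "[fglrx] ASIC hang happened",
}

# A dmesg timestamp such as "[623243.846343]" (possibly space-padded) before a message.
_STAMP = r"\[\s*(\d+\.\d+)\]\s*"


def find_fglrx_events(log_text: str) -> Dict[str, List[float]]:
    """Map each signature name to the timestamps (seconds since boot) where it appears."""
    events: Dict[str, List[float]] = {}
    for name, message in SIGNATURES.items():
        pattern = _STAMP + re.escape(message)
        events[name] = [float(m.group(1)) for m in re.finditer(pattern, log_text)]
    return events


def hang_on_acpi_disabled_boot(log_text: str) -> bool:
    """True if one boot's log shows both the ACPI-disabled notice and an ASIC hang."""
    events = find_fglrx_events(log_text)
    return bool(events["acpi_disabled"]) and bool(events["asic_hang"])
```

Run over each host's log for a single boot, a hang on any host where `acpi_disabled` is empty would suggest the noacpi hack is not the cause.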
comment:3 Changed 12 years ago by jdreed
- Status changed from new to closed
- Resolution set to ignored
I think we've sufficiently stopped caring. We have demonstrated that fglrx "just installs" these days. If we get to the point where we need it on cluster, we should enable it, but this ticket is sufficiently vague as to not be useful.
Not quite sure what you're asking, but fglrx on Natty at least uses DKMS.