Ticket #1020 (accepted defect) — at Version 9
aptitude sometimes spins forever when in --download-only mode
Reported by: | jdreed | Owned by: | jdreed |
---|---|---|---|
Priority: | high | Milestone: | Upstream Utopia |
Component: | -- | Keywords: | |
Cc: | | Fixed in version: | |
Upstream bug: | LP:975793 DebianBug:629266 |
Description (last modified by jdreed)
It may be relevant that when the problem does occur, it's always in the second invocation, when there aren't actually any files to download.
Change History
comment:2 Changed 13 years ago by jdreed
I forgot that granola is running sshd. Backtracing the wedged aptitude, which is
5661 ? Sl 64:06 aptitude --quiet --assume-yes --download-only dist-upgrade
#0 0x00007fbe3d2d981d in __libc_waitpid (pid=<value optimized out>, stat_loc=<value optimized out>, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:41
#1 0x00007fbe3e7b9c63 in ExecWait(int, char const*, bool) () from /usr/lib/libapt-pkg.so.4.10
#2 0x00007fbe3e83c8be in pkgDPkgPM::RunScriptsWithPkgs(char const*) () from /usr/lib/libapt-pkg.so.4.10
#3 0x00007fbe3e844b05 in pkgDPkgPM::Go(int) () from /usr/lib/libapt-pkg.so.4.10
#4 0x00007fbe3e7d6f85 in pkgPackageManager::DoInstallPostFork(int) () from /usr/lib/libapt-pkg.so.4.10
comment:3 Changed 13 years ago by jdreed
Er, sorry, there's also:
31328 ? S 0:00 /bin/sh -c /usr/sbin/dpkg-preconfigure --apt || true
31329 ? R 0:00 /usr/bin/perl -w /usr/sbin/dpkg-preconfigure --apt
Looks like dpkg-preconfigure is being repeatedly called and failing. Over the past minute, I've seen at least 10 processes similar to the ones above. There's only ever one set in ps output, but they're appearing, terminating, and respawning, AFAICT. They do so fast enough that I can't even attach gdb in time. Anyone debugging should run "ps auxww" several times in a row, grepping for dpkg, and you'll see them.
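For anyone who would rather not rerun ps by hand, here is a rough sketch of the same check as a loop. It samples the process table repeatedly and counts how many distinct PIDs matched, which makes the respawning visible as a number. The function names (sample_ps, count_distinct_pids), the sample count, and the interval are made up for illustration and are not part of auto-update; fractional sleep assumes GNU sleep.

```shell
#!/bin/bash
# Sketch only: helper names and default values are hypothetical.

# count_distinct_pids PATTERN
# Reads "PID COMMAND..." lines on stdin and prints how many distinct PIDs
# had a command line matching PATTERN (an awk regular expression).
count_distinct_pids() {
    local pattern=$1
    awk -v pat="$pattern" '$0 ~ pat && !seen[$1]++ { n++ } END { print n + 0 }'
}

# sample_ps SAMPLES INTERVAL
# Prints the process table (PID and full command line) SAMPLES times,
# sleeping INTERVAL seconds between snapshots.
sample_ps() {
    local samples=$1 interval=$2 i=0
    while [ "$i" -lt "$samples" ]; do
        ps -eo pid=,args=
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example invocation (50 snapshots, 0.1 s apart):
#   sample_ps 50 0.1 | count_distinct_pids '[d]pkg-preconfigure'
```

The pattern is written as '[d]pkg-preconfigure' so that the awk process doing the counting does not match its own command line. A count that keeps climbing across runs while only one set of processes is ever visible at a time would be consistent with the respawning described above.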
comment:4 Changed 13 years ago by jdreed
- Summary changed from "cron job to ensure auto-update doesn't get wedged" to "auto-update sits at 'Writing extended state info'"
comment:5 Changed 13 years ago by jdreed
- Priority changed from high to blocker
- Milestone changed from Fall 2011 to Natty Release
This is actually a release blocker, since the machines can't be fixed without intervention, and neither I nor hotline will be visiting every single cluster machine again. If we don't have a solution tomorrow, I propose we push out the release anyway, with the following addition to auto-update that gets dropped into cron.hourly:
#!/bin/bash
UPD_START=$(stat -c "%Y" /var/run/athena-nologin 2>/dev/null)
[ -z "$UPD_START" ] && exit 0
NOW=$(date +"%s")
ELAPSED=$(expr $NOW - $UPD_START)
if [ $ELAPSED -gt 3600 ]; then
    pkill -f athena-auto-update # (or maybe just reboot?)
fi
exit 0
comment:6 Changed 13 years ago by jdreed
Er, maybe add
[ "$(machtype -L)" = "debathena-cluster" ] || exit 0
at the top there, depending on whether we pkill or reboot. (Or maybe regardless?)
I tested killing the proc on w20-575-2 when it was wedged, and rebooting is fine, since the "aptitude install" stage of auto-update will get things going again on the next invocation.
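Putting comments 5 and 6 together, the cron.hourly job might look roughly like the sketch below. It is not the script that shipped in auto-update. The function names are invented for illustration, the one-hour limit and the pkill choice are taken from the proposal above, and stat -c assumes GNU coreutils. The age check is split into small functions so it can be exercised against a scratch file without touching /var/run/athena-nologin.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not from auto-update.

# elapsed_since_mtime FILE NOW
# Prints the number of seconds between FILE's modification time and NOW
# (both as seconds since the epoch). Returns 1 if FILE cannot be stat'ed.
elapsed_since_mtime() {
    local start
    start=$(stat -c '%Y' "$1" 2>/dev/null) || return 1
    echo $(( $2 - start ))
}

# is_wedged FILE NOW LIMIT
# Succeeds only if FILE exists and is more than LIMIT seconds old at NOW.
is_wedged() {
    local elapsed
    elapsed=$(elapsed_since_mtime "$1" "$2") || return 1
    [ "$elapsed" -gt "$3" ]
}

main() {
    # Only act on cluster machines (guard from comment 6).
    [ "$(machtype -L)" = "debathena-cluster" ] || exit 0
    # No nologin file means no update is in progress, so is_wedged fails
    # and nothing is killed.
    if is_wedged /var/run/athena-nologin "$(date +%s)" 3600; then
        pkill -f athena-auto-update
    fi
    exit 0
}

# Uncomment when installing as a cron job:
# main
```

Whether main should pkill or reboot is still the open question from comment 5; only the body of the if would change.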
comment:7 Changed 13 years ago by jdreed
- Owner set to jdreed
- Status changed from new to accepted
Geoff identified the code that breaks, but we still don't know why it gets called.
A horrible hack was committed and pushed out in auto-update 1.31.
comment:8 Changed 13 years ago by jdreed
Fixed less stupidly and more functionally in auto-update 1.32, which just got pushed out. Keeping this open until we have a fix for the actual bug. Geoff notes that this is DebianBug:629266, and I concur.
auto-update is now wedged on granola in a similar state.
In each case, it fails inside "aptitude --quiet --assume-yes --download-only dist-upgrade" at
Writing extended state information....
In this case, it merely wants to upgrade gdm-config, which shouldn't be a hard transaction. This possibly points to an internal error in aptitude, especially since we're just asking it to download, which is not a hard operation.
See /mit/jdreed/Public/granola-update.log for what it looks like right now.