Ticket #493 (closed defect: fixed)

Opened 12 years ago

Last modified 12 years ago

/proc/mounts! It's over sixty-five thousand!

Reported by: geofft
Owned by:
Priority: blocker
Milestone:
Component: --
Keywords:
Cc:
Fixed in version:
Upstream bug:

Description

I logged in to a cluster machine today. /proc/mounts had 65571 entries in it, including 63356 instances of what appear to be bind-mounts of /media. The xterm that I launch from .startup.X appeared only after about two minutes of a black screen. GNOME took a while longer, and I noticed this issue because stracing gnome-terminal indicated it was trying to read /proc/mounts and that was taking a long time. I think this is what people are reporting when they say some cluster machines take them several minutes to log in...

See /mit/geofft/Public/proc-mounts-uniq-c for the output of cat /proc/mounts | uniq -c and /mit/geofft/Public/debathena-over-65000 for the relevant kernel logs wherein I pressed alt-sysrq-T and -W a bunch.
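For anyone checking another machine, a minimal sketch of a per-mount-point summary follows. The helper name mount_histogram is hypothetical and not part of debathena. Because uniq only collapses adjacent identical lines, this version sorts first, which is an extra step compared with the command above.

```shell
# Summarize a mount table: how many times each mount point appears,
# most frequent first. Reads /proc/mounts by default; a file path can be
# passed instead (useful for saved copies).
# mount_histogram is a hypothetical helper name, not part of debathena.
mount_histogram() {
    # Field 2 of each /proc/mounts line is the mount point.
    awk '{ print $2 }' "${1:-/proc/mounts}" | sort | uniq -c | sort -rn
}
```

Running `mount_histogram | head` on an affected machine should put /media at the top of the list.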

Change History

comment:1 Changed 12 years ago by jdreed

It's not quite clear when it starts doubling and when it starts simply adding an extra /media line.

/mit/jdreed/Public/debathena contains copies of /proc/mounts from both inside and outside a chroot.

However, on the previous machine, I saw /proc/mounts double from 4k to 8k to 16k after each login, so it's unclear at what point that begins happening.
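To get a feel for how quickly this grows, here is a rough sketch that assumes the table doubles on every login. That is an idealization: as noted above, it is unclear when doubling actually happens. The function name sessions_to_reach is made up for illustration.

```shell
# Count how many doublings it takes for a mount count to reach a target.
# Assumes pure doubling per login, which is a simplification.
# sessions_to_reach is a hypothetical name used only for this sketch.
sessions_to_reach() {
    count=$1
    target=$2
    n=0
    while [ "$count" -lt "$target" ]; do
        count=$((count * 2))
        n=$((n + 1))
    done
    echo "$n"
}
```

Under that assumption, `sessions_to_reach 4096 65536` prints 4, so a machine at 4k entries would be past 65k after four more logins.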

comment:2 Changed 12 years ago by jdreed

Unmounting /media in /usr/lib/debathena-reactivate/reactivate "fixes" the problem.

Rather than preparing /media in the init script, should we move this:

        # Enable subtree operations on /media by making it a mount point,
        # then share it.
        if ! mountpoint -q /media; then
            mount --bind /media /media
            mount --make-shared /media
        fi

into snapshot-run? Then in "reactivate", we can just unmount /media (keeping in mind we may need to do so multiple times). And "mountpoint -q" says /media is not a mountpoint when it's bind-mounted, so I don't know what a good way to check is, other than "grep -i media /proc/mounts | wc -l".

I like this idea better in general, since all preparations for the chroot happen before it's run, not at boot time.
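A rough sketch of the "check, then unmount as many times as needed" idea follows. Both function names are hypothetical and nothing here is taken from debathena-reactivate. It matches the mount-point field exactly rather than grepping for "media", so entries such as /media/usb are not counted. It also assumes nothing is mounted underneath /media; anything there would have to be unmounted first.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print how many entries in a mount table have exactly the given mount point.
# Reads /proc/mounts unless a file is given as the second argument.
count_mounts() {
    awk -v mp="$1" '$2 == mp { n++ } END { print n + 0 }' "${2:-/proc/mounts}"
}

# Unmount a stacked mount point repeatedly until no entries remain.
# Needs root. Gives up at the first umount failure (for example "busy")
# instead of looping forever.
umount_all() {
    while [ "$(count_mounts "$1")" -gt 0 ]; do
        umount "$1" || return 1
    done
}
```

In "reactivate" this would be called as `umount_all /media`, before the bind-mount is set up again for the next session.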

comment:3 Changed 12 years ago by jdreed

At release-team, it was suggested that putting a tmpfs on /media instead of bind-mounting it to itself would be an improvement. It is not.

We should just umount everything under /media (and /media itself) at logout time, and re-mount and re-share it, and move on.

comment:4 Changed 12 years ago by kcarnold

quickstation-2 just took 2+ minutes to log in, fans roaring. It was this problem. (See /mit/kcarnold/Public/mounts-uniq-c, but it's like geofft's.)

comment:5 Changed 12 years ago by rbasch

The reason the /media mounts double is in fact that the mountpoint test in the init script (which is invoked at the end of a session) doesn't work for a bind-mount. So the bind-mount is repeated after each session, and, since we set the mountpoint as shared, that is also propagated to the peer.

I addressed this in r24332 and r24333, by parsing "mount" output in the init script to determine whether the bind-mount has been done; this is now in -proposed (debathena-reactivate 2.0.8). Affected machines will need to be rebooted to clear out their mount tables.
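For readers without access to those revisions, the shape of such a guard can be sketched as below. This is not the code from r24332 or r24333: it reads /proc/mounts directly rather than parsing "mount" output, and the function names is_mounted and ensure_shared_bind are invented for the example.

```shell
# Hypothetical helpers; not the actual debathena-reactivate change.

# Succeed if the mount table has at least one entry for the mount point.
# Reads /proc/mounts unless a file is given as the second argument.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Bind-mount a directory onto itself and mark it shared, but only if the
# mount table does not already list it. Needs root.
ensure_shared_bind() {
    if ! is_mounted "$1"; then
        mount --bind "$1" "$1" && mount --make-shared "$1"
    fi
}
```

With a check like this, calling `ensure_shared_bind /media` at the end of every session is a no-op once the bind-mount exists, so the table no longer grows.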

comment:6 Changed 12 years ago by broder

  • Status changed from new to proposed

I updated the postinst to call /usr/share/update-notifier/notify-reboot-required, which should trigger a reboot to clean up the mount tables.

comment:7 Changed 12 years ago by jdreed

  • Status changed from proposed to closed
  • Resolution set to fixed

Looks like this made it into production already.
