Ticket #133 (new enhancement)

Opened 13 years ago

Last modified 8 years ago

debathena-zephyr-config should kill off zhm initscript

Reported by: broder
Owned by:
Priority: normal
Milestone: Current Semester
Component: --
Keywords:
Cc:
Fixed in version:
Upstream bug:

Description

zhm doesn't handle a missing network very well. Rather than hacking an if-up.d script into place, Karl suggests that we disable his initscript entirely on systems pre-dating squeeze and provide if-up.d and if-down.d scripts that start and stop zhm.

We should be sure to get the patch to Karl, as well, so he can include it in squeeze.
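
For illustration only, here is a rough sketch of what such a hook could look like. It is not Karl's patch and not anything shipped in debathena-zephyr-config. It assumes a single script installed under both /etc/network/if-up.d/ and /etc/network/if-down.d/ (the install name is hypothetical), relies on the IFACE and MODE variables that ifupdown exports to hook scripts, and assumes /etc/init.d/zhm remains on disk with only its boot-time rc links disabled. The helper name zhm_hook_action is made up for this example.

```shell
#!/bin/sh
# Hypothetical hook, assumed to be installed as both
# /etc/network/if-up.d/zhm and /etc/network/if-down.d/zhm.
# ifupdown exports IFACE (interface name) and MODE ("start" or "stop").

# zhm_hook_action IFACE MODE
# Prints the action to take: "start", "stop", or "skip".
zhm_hook_action() {
    iface=$1
    mode=$2
    # Loopback coming up does not mean we have usable networking.
    if [ "$iface" = "lo" ]; then
        echo skip
        return
    fi
    case "$mode" in
        start) echo start ;;
        stop)  echo stop ;;
        *)     echo skip ;;
    esac
}

# Only act when invoked by ifupdown, which always sets IFACE.
if [ -n "${IFACE:-}" ]; then
    case "$(zhm_hook_action "$IFACE" "${MODE:-}")" in
        start) /etc/init.d/zhm start ;;
        stop)  /etc/init.d/zhm stop ;;
    esac
fi
```

One caveat with this sketch: on a machine with more than one real interface, it would stop zhm as soon as any one of them goes down, so a real version would probably need to check whether another interface is still up.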

Change History

comment:1 Changed 12 years ago by broder

  • Priority changed from major to minor
  • Component set to --
  • Milestone set to The Distant Future

comment:2 Changed 11 years ago by jdreed

  • Milestone changed from The Distant Future to Fall 2010

We should maybe look at this sooner rather than later? I ended up with mmanley's "two zhms" problem today on Lucid (-workstation). (See the 3/17 zlogs, currently debathena.172) It was fine on Friday, but it did take some updates over the weekend and rebooted. Of course, now that I've added some logging, I can't reproduce it. But there is at least one other workstation in N42 experiencing these transient symptoms.

I'll also note that the "restart" option to the initscript causes zhm to be restarted with the "-N" option, which does not exist according to the man page. So the initscript is vaguely broken anyway:

jdreed@INFINITE-LOOP:~$ sudo /etc/init.d/zhm stop
Stopping zephyr host manager: zhm.-
jdreed@INFINITE-LOOP:~$ ps -ef | grep -i zhm
jdreed    2555  2035  0 10:02 pts/0    00:00:00 grep -i zhm
jdreed@INFINITE-LOOP:~$ sudo /etc/init.d/zhm start
Starting zephyr host manager: zhm.
jdreed@INFINITE-LOOP:~$ ps -ef | grep -i zhm
root      2562     1  0 10:02 ?        00:00:00 /usr/sbin/zhm -f
jdreed    2565  2035  0 10:02 pts/0    00:00:00 grep -i zhm
jdreed@INFINITE-LOOP:~$ sudo /etc/init.d/zhm restart
Restarting zephyr host manager: zhm.
jdreed@INFINITE-LOOP:~$ ps -ef | grep -i zhm
root      2573     1  0 10:02 ?        00:00:00 /usr/sbin/zhm -N -f
jdreed    2575  2035  0 10:02 pts/0    00:00:00 grep -i zhm
jdreed@INFINITE-LOOP:~$

comment:3 Changed 11 years ago by jdreed

  • Priority changed from minor to major

I encountered this on a cluster machine too, which is a first. There were two zhm processes, 2018 and 2019. 2018 was the one referenced in /var/run/zhm.pid. So there's clearly a race condition here, and I think I blame start-stop-daemon, but I'm not sure.

We should pursue this upstream, but as a short-term fix, how about a "sleep 1" in /etc/network/if-up.d/debathena-zephyr-config, before it restarts zhm?
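
For anyone trying to catch the "two zhms" state on another machine, a small diagnostic along these lines might help. It is only a sketch: the helper name stray_pids is made up, and it assumes the pid file lives at /var/run/zhm.pid as on the machine above. It lists every running zhm PID that does not match the one recorded in the pid file.

```shell
#!/bin/sh
# stray_pids RECORDED_PID PID...
# Prints each PID from the list that differs from RECORDED_PID,
# one per line. Prints nothing if only the recorded process exists.
stray_pids() {
    recorded=$1
    shift
    for pid in "$@"; do
        [ "$pid" = "$recorded" ] || echo "$pid"
    done
}

# Intended use on a live system (left commented out in this sketch):
#   stray_pids "$(cat /var/run/zhm.pid)" $(pgrep -x zhm)
```

With the PIDs from the cluster machine, stray_pids 2018 2018 2019 would report 2019 as the extra process.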

comment:4 Changed 11 years ago by jdreed

See also #746

comment:5 Changed 10 years ago by geofft

  • Status changed from new to accepted
  • Owner set to geofft

comment:6 Changed 9 years ago by jdreed

So, uh, what's the upstream status of this? It references doing stuff for Squeeze, which was released a while ago.

comment:7 Changed 8 years ago by geofft

  • Status changed from accepted to new
  • Owner geofft deleted