Ticket #208 (closed defect: fixed)

Opened 15 years ago

Last modified 15 years ago

graphical login doesn't deal well with being unplugged

Reported by: geofft
Owned by:
Priority: normal
Milestone: Fall 2009 Release
Component: --
Keywords:
Cc:
Fixed in version:
Upstream bug:

Description

There are a couple of problems with machines that don't have network access and try to do network login.

First, there's no clear error explaining what's wrong when the machine isn't connected and you try to log in. One possible solution to this is a PAM module that tries to access the network and displays a fatal message if it can't, although we'd have to be very sure it doesn't have any false positives. (For instance, if some but not all of the Kerberos servers go down, we shouldn't deny login.)
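
A minimal sketch of the "no false positives" requirement, assuming the check is a plain TCP connection attempt to the Kerberos port (88) on each server: the network counts as usable as soon as any one server answers, so a partial outage never blocks login. The function name, the host names in the usage line, and the injectable `connect` parameter are illustrative assumptions, not part of any existing module.

```python
import socket


def any_reachable(hosts, port=88, timeout=2.0, connect=socket.create_connection):
    """Return True as soon as one host accepts a TCP connection.

    Only if every host fails do we report the network as unavailable,
    so a single dead server does not produce a false "no network" error.
    """
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # This host is unreachable; try the next one before giving up.
            continue
        conn.close()
        return True
    return False


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(any_reachable(["kdc1.example.org", "kdc2.example.org"]))
```

A real check would also need to distinguish "no network" from "network up but captive or unregistered", which this sketch does not attempt.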

Second, if the machine goes offline, it's possible for an AFS access to time out and make the AFS client sad until "fs checks" is run. Other services like zhm can also become unhappy. Again, we could hack a PAM module to address this; there might be cleaner solutions. An Xsession.d script, for instance, is slightly cleaner.

aseering on testers@:

Hey,

The DebAthena computer adjacent to M12-182-4 (it doesn't have a label
and I can't log in to check) is currently sad. Its Ethernet cable was
unplugged when I walked up to it. I plugged it back in, and tried to
log in; the login hung while trying to render my applications bar. I
killed X (ctrl-alt-bksp); the machine is now sitting at a text console.

mitchb's reply:

You didn't try rebooting it? If the network cable has been out for
a length of time, a whole bunch of things on the machine are going
to have noticed (among them, AFS, zhm, syslogd, aptitude, etc.), and while
they may recover given time, assuming that the machine will immediately
be fine upon reinserting the cable is generally not accurate.

Change History

comment:1 Changed 15 years ago by mitchb

AFS will not remain 'sad until "fs checks" is run'. The cache manager
will periodically poll servers it thinks are down (by default, every
three minutes). You can run 'fs checks' if you don't want to wait for
that. Basically, my reply to Adam was getting at the point that "if it
was disconnected for a while, it won't work immediately (and this isn't
a change); you can reboot if you don't want to wait".

But certainly agreed that a diagnostic error would be a good idea. On
Athena 9, there was a "bad" diagnostic error - "Workstation failed to
activate successfully" or something like that generally meant that
something was wrong with net. Or "no clusterinfo for (hostname you
know has clusterinfo)".
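
For the 'fs checks' step mentioned above, a login-time hook could trigger the probe rather than waiting for the periodic poll. The sketch below assumes the OpenAFS `fs` binary is on the PATH and uses the full subcommand name `checkservers`; the function name and the injectable `run` parameter are illustrative assumptions.

```python
import subprocess


def recheck_afs_servers(run=subprocess.run):
    """Ask the AFS cache manager to re-probe servers it has marked down.

    Returns (ran_ok, output): whether the command exited with status 0,
    and whatever it printed. The output is passed through uninterpreted.
    """
    result = run(["fs", "checkservers"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    try:
        print(recheck_afs_servers())
    except FileNotFoundError:
        # No AFS client installed on this machine.
        print("fs command not found")
```

This only shortens the wait for AFS; other services that noticed the outage would still need their own recovery.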

comment:2 Changed 15 years ago by xavid

This also affects the live CD, so it'd be great if the solution also had helpful messages for "you're on MITNET but haven't registered this MAC address" and "you're on some wacky network that wants you to open up Firefox before it'll give you proper DNS".

comment:3 Changed 15 years ago by jdreed

  • Milestone set to Fall Release

comment:4 Changed 15 years ago by geofft

One possible solution to this is a PAM module that tries to access the network and displays a fatal message if it can't, although we'd have to be very sure it doesn't have any false positives.

I'm not 100% sure about this, but I believe if our PAM stack catches the "authinfo_unavail" return from pam_krb5 separately from other failures, we can be informed of when Kerberos auth fails because there's no network (or the Kerberos realm is down) as opposed to because the username or password is wrong. Then we can use pam_echo to display an error to gdm or ssh. See scripts.mit.edu:/etc/pam.d/sshd for an example of a PAM stack that has some conditionals on the return value of a PAM module and uses pam_echo to display useful messages.

This would let us, first, display an error exactly when Kerberos auth failed for lack of network (rather than guessing about why login failed), and second, avoid writing another PAM module.

comment:5 Changed 15 years ago by broder

I came up with the necessary PAM configuration change last night to do this, both for pre- and post-Intrepid systems.

However, the pam_echo error message is displayed in the pam-message greeter box, not the pam-error one, so we need to un-hide that and make it look good. We also need text for the error message.

Here are the two PAM patches, for reference:

dr-wily:~/sipb/src/athena/debathena/config broder$ svn diff
Index: pam-config/debian/transform_common-auth.debathena
===================================================================
--- pam-config/debian/transform_common-auth.debathena	(revision 23984)
+++ pam-config/debian/transform_common-auth.debathena	(working copy)
@@ -1,2 +1,2 @@
 #!/usr/bin/perl -0p
-s/^(auth[ \t]+)(required|requisite)( ? ?)([ \t]+)(pam_unix\.so([ \t]+.*)?)\n/$1sufficient$4$5\n$1required$3$4pam_krb5.so minimum_uid=1 use_first_pass\n/m or die;
+s/^(auth[ \t]+)(required|requisite)( ? ?)([ \t]+)(pam_unix\.so([ \t]+.*)?)\n/$1sufficient$4$5\n$1\[success=done authinfo_unavail=ignore default=die\]$3$4pam_krb5.so minimum_uid=1 use_first_pass\n$1\[default=die\]$3$4pam_echo.so file=\/etc\/issue.net.no_network\n/m or die;
Index: libpam-krb5-config/debian/libpam-krb5-config.pam-config
===================================================================
--- libpam-krb5-config/debian/libpam-krb5-config.pam-config	(revision 23984)
+++ libpam-krb5-config/debian/libpam-krb5-config.pam-config	(working copy)
@@ -3,9 +3,11 @@
 Priority: 128
 Auth-Type: Primary
 Auth-Initial:
-	[success=end default=ignore]	pam_krb5.so minimum_uid=1
+	[success=end authinfo_unavail=ignore default=1]	pam_krb5.so minimum_uid=1
+	[default=die] pam_echo file=/etc/issue.net.no_network
 Auth:
-	[success=end default=ignore]	pam_krb5.so minimum_uid=1 use_first_pass
+	[success=end authinfo_unavail=ignore default=1]	pam_krb5.so minimum_uid=1 use_first_pass
+	[default=die] pam_echo file=/etc/issue.net.no_network
 Account-Type: Primary
 Account:
 	[success=end default=ignore]	pam_krb5.so minimum_uid=1

comment:6 Changed 15 years ago by broder

Oh - I should note that the change to the debathena-pam-config configuration (which applies to Hardy and Debian machines) did not directly translate "required" into the more verbose syntax, because I would need to skip the pam_echo. If people are concerned that I'm assuming pam_krb5 is at the bottom of the stack, I can probably come up with a complicated chain of modules to do it more correctly, but I think it's a fair assumption.

I also added the missing .so suffix to the pam_echo lines in libpam-krb5-config in my working copy.
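
To make the control values in the Hardy/Debian patch from comment:5 easier to follow, here is a deliberately simplified model of how a PAM stack with bracketed controls is walked. It is not libpam: it covers only the ok, done, ignore, bad, die and jump actions, and it assumes pam_unix's "sufficient" expands to `[success=done new_authtok_reqd=done default=ignore]`. The `run_stack` function and the `AUTH_STACK` name are illustrative assumptions.

```python
def run_stack(stack, results):
    """Walk a simplified PAM stack.

    stack   -- list of (module, controls); controls maps a return value
               (or "default") to "ok", "done", "ignore", "bad", "die",
               or an int N meaning "skip the next N modules".
    results -- maps each module name to the value it returns.
    Returns (granted, modules_run).
    """
    verdict = None  # None: no opinion yet, True: positive, False: negative
    ran = []
    i = 0
    while i < len(stack):
        module, controls = stack[i]
        ran.append(module)
        action = controls.get(results[module], controls["default"])
        i += 1
        if isinstance(action, int):
            i += action  # jump over the next N modules
        elif action in ("ok", "done"):
            if verdict is None:
                verdict = True  # a success never overrides an earlier failure
            if action == "done" and verdict:
                break
        elif action in ("bad", "die"):
            verdict = False
            if action == "die":
                break
        # "ignore": the module contributes nothing
    return verdict is True, ran


# The stack produced by the transform_common-auth.debathena patch (modelled).
AUTH_STACK = [
    ("pam_unix", {"success": "done", "new_authtok_reqd": "done", "default": "ignore"}),
    ("pam_krb5", {"success": "done", "authinfo_unavail": "ignore", "default": "die"}),
    ("pam_echo", {"default": "die"}),
]

if __name__ == "__main__":
    for krb5_result in ("success", "auth_err", "authinfo_unavail"):
        results = {"pam_unix": "user_unknown", "pam_krb5": krb5_result, "pam_echo": "success"}
        print(krb5_result, run_stack(AUTH_STACK, results))
```

In this model, pam_echo is reached only when pam_krb5 returns authinfo_unavail, and the login is still denied afterwards; a wrong password stops at pam_krb5 without showing the message.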

comment:7 follow-up: ↓ 8 Changed 15 years ago by broder

  • Status changed from new to development

I fixed debathena-pam-config and libpam-krb5-config to add these pam_echo lines in r23992, and I've uploaded the packages to development for preliminary testing.

Unfortunately, because the /etc/issue.net.no_network file has to go somewhere, pam-config and libpam-krb5-config now depend on each other, and circular dependencies make the baby aptitude cry. Specifically, it will only update pam-config, and then require another round of updates to update libpam-krb5-config. This seems distinctly suboptimal.

The most common solution to circular dependencies is rolling the two separate binary packages into a single package, which seems like it might work well here. In fact, I'm somewhat curious as to why libpam-krb5-config was created as a separate package in the first place, and if we still think that's a good idea.

If nobody has strong feelings on the matter, I'll plan to roll libpam-krb5-config into debathena-pam-config and have the latter Replace the former.

comment:8 in reply to: ↑ 7 ; follow-up: ↓ 9 Changed 15 years ago by price

Replying to broder:

The most common solution to circular dependencies is rolling the two
separate binary packages into a single package, which seems like it
might work well here. In fact, I'm somewhat curious as to why
libpam-krb5-config was created as a separate package in the first place,
and if we still think that's a good idea.

It was created separately because it fills in what ultimately needed to be a piece of libpam-krb5 upstream, not Debathena-specific. Only in Jaunty did an upstream version appear. It now looks more like something that belongs in a package like debathena-*-config.

This latest change sounds appropriate for any system, so you might try sending it to Russ and/or Steve Langasek for upstream.

Greg

comment:9 in reply to: ↑ 8 Changed 15 years ago by broder

Replying to price:

This latest change sounds appropriate for any system, so you might try sending it to Russ and/or Steve Langasek for upstream.

Sure, fair enough; I'll submit a patch at some point. This won't eliminate the need for our own configuration, though: the file that comes with the upstream libpam-krb5 still has minimum_uid=1000 set in order to avoid conflicting with local accounts, which won't work for us.

comment:10 Changed 15 years ago by broder

So price and I finally tracked down a bug in pam-auth-update that was leading to prompts on upgrades of my debathena-pam-config package. I've worked around the bug in r24102 and uploaded that to development - before I move it into proposed, it would be nice to know if it actually does anything useful.

comment:11 Changed 15 years ago by broder

  • Status changed from development to proposed

I've moved this change into -proposed so that we can do a better job of testing it.

In particular, I'm curious as to whether this change is sufficient to cause gdm to display the pam_echo message instead of simply erroring out with some kind of "username does not exist" error. /bin/login will display the message; su will just error out.

Before we get there, though, I'm not sure that the pam_echo message will be displayed at all, because our gdm theme hides the item with the pam-message id, which I think is where this would get displayed. Can I just move that to right after our pam-error block and give it the same <pos /> value?

It's easy to test if the message works without worrying about whether gdm gets that far by only unplugging the network cable after the password prompt shows up.

comment:12 Changed 15 years ago by broder

  • Status changed from proposed to closed
  • Resolution set to fixed

I moved the PAM config change into production. I'm pretty sure I'm OK with AFS possibly being slow the first time you log in after you've just plugged the network connection back in, so I think we're done here.
