Ticket #136 (closed enhancement: fixed)

Opened 14 years ago

Last modified 12 years ago

Can we use pam-afs-session?

Reported by: broder Owned by: broder
Priority: low Milestone: The Distant Future
Component: -- Keywords:
Cc: Fixed in version:
Upstream bug:


It looks like we might be able to use Russ Allbery's libpam-afs-session package to replace libpam-athena-locker. An aklog_homedir option was added early in development, which would solve the problem with libpam-openafs-session that libpam-athena-locker was written to solve.

libpam-afs-session was not made available in Dapper or Etch, but all other releases we support should have one that's new enough to work.

   linerva / pag / andersk  20:42  (Anders Kaseorg)
       Does that work when $HOME is a symlink to AFS?

This should work, since aklog_homedir just shells out to "aklog -p $HOME", but we should test it, as well as the module in general.

It would be nice to have one fewer pieces of software that we're maintaining just for Debathena.


0001-Reset-the-SIGCHLD-handler-while-running-aklog.patch Download (2.8 KB) - added by broder 13 years ago.

Change History

comment:1 follow-up: ↓ 2 Changed 14 years ago by nelhage

One other concern about libpam-openafs-session is that it doesn't appear to have a similar hack to our sigaction() hack to work around gdm's SIGCHLD handler. We should also make sure it works under gdm.

comment:2 in reply to: ↑ 1 Changed 14 years ago by price

Replying to nelhage:

One other concern about libpam-openafs-session is that it doesn't appear
to have a similar hack to our sigaction() hack to work around gdm's SIGCHLD handler.
We should also make sure it works under gdm.

If this turns out to be a problem with Russ's module, we can probably work
with him to get it fixed for his too. If he's aiming for people to be able
to log in and get their AFS homedirs, it seems like he'd want it to work
under gdm.

comment:3 Changed 14 years ago by broder

It turns out that we can use this.

We need to add

auth    required        pam_afs_session.so

after pam_krb5.so in common-auth, and

session required        pam_afs_session.so

somewhere also after pam_krb5.so in common-session.

We also need to add

        aklog_homedir = true

to krb5.conf

Jury's still out on the SIGCHLD issue.

comment:4 Changed 14 years ago by broder

And after configuring pam_afs_session on cluster-test, I was still able to login through GDM. What is supposed to be the issue here? Presumably if GDM was leaving a SIGCHLD handler around, the login process would hang completely - that's definitely not happening.

comment:5 Changed 14 years ago by broder

So the issue was more complex than I understood it to be (this is what you get for letting me try to decipher issues in C, guys). Turns out it's a race.

pam_afs_session forks and execs aklog, then the parent process uses waitpid() to pause until the aklog returns.

If GDM (or whatever is calling into PAM) sets a SIGCHLD handler that's not SIG_DFL, and aklog returns before waitpid is called, then waitpid either returns with ECHILD or hangs forever waiting for other children.

The relevant code snippets from pam_athena_locker are:

    struct sigaction act, oldact;
    /* Override gdm's SIGCHLD handler that makes waitpid() return -1.
       Maybe this leads to some race condition if gdm used that at the time? */
    memset(&act, 0, sizeof(act));
    act.sa_handler = SIG_DFL;
    sigaction(SIGCHLD, &act, &oldact);

before the fork, and

    sigaction(SIGCHLD, &oldact, NULL);

after the waitpid.

Because of both this bug, and the fact that Dapper and Etch don't have a libpam-afs-session at all, we'll have to build our own packages of the latest version if we want to switch to this.

The next step, then, is to come up with a patch which does the equivalent of the code above (possibly with a little more error handling?) and get it to Russ.

If you want to test that your patch eliminates the race condition, set the aklog command to /bin/true and sleep for a few seconds before calling waitpid.

comment:6 Changed 13 years ago by jdreed

  • Component set to --
  • Milestone set to The Distant Future

comment:7 Changed 13 years ago by broder

  • Status changed from new to accepted
  • Owner set to broder

Here, have a patch.

When I wrote up the race condition before, I said that a custom SIGCHLD handler could cause waitpid() to either return with ECHILD immediately or hang forever waiting for other children

I get how the first one can happen, but I don't understand what scenario would result in hanging. Can anybody explain this so I can include it in my writeup? Otherwise, I'll just send the patch to Russ as-is.

Also, pam-afs-session supports reading options from the PAM arguments as well as from /etc/krb5.conf, so we can just do

auth    required        pam_afs_session.so aklog_homedir

in common-auth, and

session required        pam_afs_session.so aklog_homedir

in common-session

comment:8 Changed 13 years ago by broder

I've attempted to reproduce this SIGCHLD bug on both Jaunty and Karmic's GDMs with the following patch:

diff --git a/tokens.c b/tokens.c
index 7a523fc..e356e6f 100644
--- a/tokens.c
+++ b/tokens.c
@@ -164,6 +164,7 @@ pamafs_run_aklog(pam_handle_t *pamh, struct pam_args *args, struct 
+    sleep(10);
     if (waitpid(child, &res, 0) && WIFEXITED(res) && WEXITSTATUS(res) == 0)
         return PAM_SUCCESS;

which should cause pam_afs_session to not waitpid() until after the child aklog process has already returned.

In both cases, the login was delayed for 10 seconds, but it did not error out.

How do I reproduce this bug? If nobody knows how to make it happen, I'm going to assume it doesn't actually exist anymore and move forward with replacing libpam-athena-locker with libpam-afs-session.

comment:9 Changed 12 years ago by broder

  • Status changed from accepted to development

Switched to libpam-afs-session in r24443. Currently uploaded to -development, pending upgrade testing.

I checked that the versions of libpam-afs-session in all releases we support have the options we need, so we don't need to backport that ourselves.

comment:10 Changed 12 years ago by broder

  • Status changed from development to proposed

Having verified that the upgrade path works, I've moved this into -proposed.

(I ran into an issue on mega-man, but that was because libpam-athena-locker had been marked as manually installed, which I'm pretty sure was my fault)

comment:11 Changed 12 years ago by broder

This seems to somehow break on cluster machines, in spite of working on non-cluster workstations. I suspect that something strange is going on with the login chroots, and my running theory involves the fact that schroot runs through the PAM stack itself, but I haven't worked out the details yet.

Until we fix that, this should NOT move to production.

comment:12 Changed 12 years ago by broder

Ok, the issue with cluster logins is now fixed in -proposed

comment:13 Changed 12 years ago by broder

  • Status changed from proposed to closed
  • Resolution set to fixed

Moved to production.

Note: See TracTickets for help on using tickets.