Ticket #136 (closed enhancement: fixed)
Can we use pam-afs-session?
Reported by: | broder | Owned by: | broder |
---|---|---|---|
Priority: | low | Milestone: | The Distant Future |
Component: | -- | Keywords: | |
Cc: | Fixed in version: | ||
Upstream bug: |
Description
It looks like we might be able to use Russ Allbery's libpam-afs-session package to replace libpam-athena-locker. An aklog_homedir option was added early in development, which would solve the problem with libpam-openafs-session that libpam-athena-locker was written to solve.
libpam-afs-session was not made available in Dapper or Etch, but all other releases we support should have one that's new enough to work.
linerva / pag / andersk 20:42 (Anders Kaseorg) Does that work when $HOME is a symlink to AFS?
This should work, since aklog_homedir just shells out to "aklog -p $HOME", but we should test it, as well as the module in general.
It would be nice to have one fewer pieces of software that we're maintaining just for Debathena.
Attachments
Change History
comment:2 in reply to: ↑ 1 Changed 15 years ago by price
Replying to nelhage:
One other concern about libpam-openafs-session is that it doesn't appear
to have a similar hack to our sigaction() hack to work around gdm's SIGCHLD handler.
We should also make sure it works under gdm.
If this turns out to be a problem with Russ's module, we can probably work
with him to get it fixed for his too. If he's aiming for people to be able
to log in and get their AFS homedirs, it seems like he'd want it to work
under gdm.
comment:3 Changed 15 years ago by broder
It turns out that we can use this.
We need to add
auth required pam_afs_session.so
after pam_krb5.so in common-auth, and
session required pam_afs_session.so
somewhere also after pam_krb5.so in common-session.
We also need to add
[appdefaults] aklog_homedir = true
to krb5.conf
Jury's still out on the SIGCHLD issue.
comment:4 Changed 15 years ago by broder
And after configuring pam_afs_session on cluster-test, I was still able to login through GDM. What is supposed to be the issue here? Presumably if GDM was leaving a SIGCHLD handler around, the login process would hang completely - that's definitely not happening.
comment:5 Changed 15 years ago by broder
So the issue was more complex than I understood it to be (this is what you get for letting me try to decipher issues in C, guys). Turns out it's a race.
pam_afs_session forks and execs aklog, then the parent process uses waitpid() to pause until the aklog returns.
If GDM (or whatever is calling into PAM) sets a SIGCHLD handler that's not SIG_DFL, and aklog returns before waitpid is called, then waitpid either returns with ECHILD or hangs forever waiting for other children.
The relevant code snippets from pam_athena_locker are:
struct sigaction act, oldact; /* Override gdm's SIGCHLD handler that makes waitpid() return -1. Maybe this leads to some race condition if gdm used that at the time? */ memset(&act, 0, sizeof(act)); act.sa_handler = SIG_DFL; sigaction(SIGCHLD, &act, &oldact);
before the fork, and
sigaction(SIGCHLD, &oldact, NULL);
after the waitpid.
Because of both this bug, and the fact that Dapper and Etch don't have a libpam-afs-session at all, we'll have to build our own packages of the latest version if we want to switch to this.
The next step, then, is to come up with a patch which does the equivalent of the code above (possibly with a little more error handling?) and get it to Russ.
If you want to test that your patch eliminates the race condition, set the aklog command to /bin/true and sleep for a few seconds before calling waitpid.
Changed 14 years ago by broder
- attachment 0001-Reset-the-SIGCHLD-handler-while-running-aklog.patch added
comment:7 Changed 14 years ago by broder
- Status changed from new to accepted
- Owner set to broder
Here, have a patch.
When I wrote up the race condition before, I said that a custom SIGCHLD handler could cause waitpid() to either return with ECHILD immediately or hang forever waiting for other children
I get how the first one can happen, but I don't understand what scenario would result in hanging. Can anybody explain this so I can include it in my writeup? Otherwise, I'll just send the patch to Russ as-is.
Also, pam-afs-session supports reading options from the PAM arguments as well as from /etc/krb5.conf, so we can just do
auth required pam_afs_session.so aklog_homedir
in common-auth, and
session required pam_afs_session.so aklog_homedir
in common-session
comment:8 Changed 14 years ago by broder
I've attempted to reproduce this SIGCHLD bug on both Jaunty and Karmic's GDMs with the following patch:
diff --git a/tokens.c b/tokens.c index 7a523fc..e356e6f 100644 --- a/tokens.c +++ b/tokens.c @@ -164,6 +164,7 @@ pamafs_run_aklog(pam_handle_t *pamh, struct pam_args *args, struct } free(argv); pamafs_free_envlist(env); + sleep(10); if (waitpid(child, &res, 0) && WIFEXITED(res) && WEXITSTATUS(res) == 0) return PAM_SUCCESS; else
which should cause pam_afs_session to not waitpid() until after the child aklog process has already returned.
In both cases, the login was delayed for 10 seconds, but it did not error out.
How do I reproduce this bug? If nobody knows how to make it happen, I'm going to assume it doesn't actually exist anymore and move forward with replacing libpam-athena-locker with libpam-afs-session.
comment:9 Changed 14 years ago by broder
- Status changed from accepted to development
Switched to libpam-afs-session in r24443. Currently uploaded to -development, pending upgrade testing.
I checked that the versions of libpam-afs-session in all releases we support have the options we need, so we don't need to backport that ourselves.
comment:10 Changed 14 years ago by broder
- Status changed from development to proposed
Having verified that the upgrade path works, I've moved this into -proposed.
(I ran into an issue on mega-man, but that was because libpam-athena-locker had been marked as manually installed, which I'm pretty sure was my fault)
comment:11 Changed 14 years ago by broder
This seems to somehow break on cluster machines, in spite of working on non-cluster workstations. I suspect that something strange is going on with the login chroots, and my running theory involves the fact that schroot runs through the PAM stack itself, but I haven't worked out the details yet.
Until we fix that, this should NOT move to production.
comment:12 Changed 14 years ago by broder
Ok, the issue with cluster logins is now fixed in -proposed
comment:13 Changed 14 years ago by broder
- Status changed from proposed to closed
- Resolution set to fixed
Moved to production.
One other concern about libpam-openafs-session is that it doesn't appear to have a similar hack to our sigaction() hack to work around gdm's SIGCHLD handler. We should also make sure it works under gdm.