Ticket #928 (closed defect: fixed)

Opened 10 years ago

Last modified 10 years ago

cluster logins don't get tokens (and fail) on natty

Reported by: kaduk
Owned by:
Priority: blocker
Milestone: Natty Beta
Component: --
Keywords:
Cc:
Fixed in version:
Upstream bug:

Description

In a stock install of cluster on natty, logins fail.
This seems to be largely because when we run the session in the schroot, schroot's pam stack gets run, and it gives the user a new PAG (and keyring entry), but no tokens. Without tokens, access to the homedir fails, so the login bails pretty quickly.

Russ said that this may be because KRB5CCNAME is not in the PAM environment. However, the suggested possible workaround of adding always_aklog to the pam_afs_session arguments did not help. This may be because schroot is not fully running the session stack?

It will probably be useful to instrument a cluster login on lucid and see what has changed, so as to have a better sense of where to poke at to fix things.

Change History

comment:1 Changed 10 years ago by jdreed

  • Priority changed from normal to blocker
  • Milestone changed from The Distant Future to Natty Beta

comment:2 Changed 10 years ago by kaduk

I got to poke at a lucid cluster machine today; it also gets a new keyring entry and PAG in the chroot. However, KRB5CCNAME is passed into the PAM environment for schroot's PAM stack, which allows pam_afs_session to actually obtain tokens. (I am slightly confused that I only get athena-cell tokens for the 'sipbtest' user, since the 'always_aklog' option is passed to pam_afs_session in pam.d/common-session, and I want to say that this had gotten me tokens for all cells listed in .xlog on lola-granola. Could be the newer pam_afs_session, I suppose.)

So, we need to figure out why pam_krb5 (?) is not introducing KRB5CCNAME into the pam environment, though I fear this may be a "feature" of an updated PAM.

comment:3 Changed 10 years ago by kaduk

Following a tip from Russ, libpam-afs-session_2.4-1 seems to provide a workaround, by always checking the full environment for KRB5CCNAME rather than just checking the PAM environment.
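
The lookup order described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the module's actual code: the function name find_ccache is hypothetical, and the two dicts are assumed stand-ins for the PAM environment and the full process environment.

```python
def find_ccache(pam_env, process_env):
    """Return the KRB5CCNAME value to use, or None if it is not set anywhere.

    pam_env and process_env are plain dicts standing in for the PAM
    environment and the process environment (assumptions for this sketch).
    """
    # Prefer the PAM environment, which is where the value is normally expected.
    value = pam_env.get("KRB5CCNAME")
    if value:
        return value
    # Fall back to the full process environment, as the workaround does.
    return process_env.get("KRB5CCNAME") or None
```

With an empty PAM environment and KRB5CCNAME present only in the process environment, the fallback branch is what finds the ticket cache.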

comment:4 follow-up: ↓ 5 Changed 10 years ago by kaduk

I was being silly -- the

D(2): pam_putenv: set KRB5CCNAME=FILE:/tmp/krb5cc_20922_JIjcZi

line I pasted to zephyr that shows KRB5CCNAME being set in the PAM environment on a lucid cluster machine is clearly debugging output from schroot, where schroot itself is setting up a pam environment. In natty's schroot (1.4.17), this is a "minimal environment" which includes just HOME, LOGNAME, PATH, SHELL, and USER; the version 1.4.0 in lucid must have been less minimal, no matter the lack of a changelog entry.
So, we can blame the schroot version for the source of the problem, but will probably still need to use the workaround of inserting KRB5CCNAME into the PAM environment ourselves.
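
To make the effect of the "minimal environment" concrete, here is a rough Python sketch. It is not schroot's code: the function names minimal_environment and with_ccache are hypothetical, and dicts are assumed stand-ins for the real environments. Only the list of five variable names comes from the observation above.

```python
# The variables kept in the minimal environment, per the observation above.
MINIMAL_KEYS = ("HOME", "LOGNAME", "PATH", "SHELL", "USER")


def minimal_environment(full_env):
    """Keep only the minimal set of variables; everything else is dropped."""
    return {key: full_env[key] for key in MINIMAL_KEYS if key in full_env}


def with_ccache(full_env):
    """Minimal environment plus KRB5CCNAME, i.e. the workaround of
    inserting the variable ourselves (hypothetical helper)."""
    env = minimal_environment(full_env)
    if "KRB5CCNAME" in full_env:
        env["KRB5CCNAME"] = full_env["KRB5CCNAME"]
    return env
```

Under these assumptions, minimal_environment drops KRB5CCNAME even when the caller has it set, which matches the missing tokens; with_ccache carries it across.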

comment:5 in reply to: ↑ 4 Changed 10 years ago by jdreed

Replying to kaduk:

So, we can blame the schroot version for the source of the problem, but will probably still need to use the workaround of inserting KRB5CCNAME into the PAM environment ourselves.

I naively tried this using /etc/security/pam_env.conf and could not convince PAM to copy KRB5CCNAME into the environment. I did succeed in setting KRBNAME=$KRB5CCNAME in snapshot-run, and then setting KRB5CCNAME=$KRBNAME in pam_env.conf, but I still don't end up with tokens. Not sure what I'm missing. And the "debug" arg for pam_env appears to be full of lies. Adding an explicit aklog in Xsession.debathena-orig works, of course, but is wrong.

comment:6 Changed 10 years ago by jdreed

The consensus on zephyr is that we should just debathenify libpam-afs-session and add the code to look in the environment for KRB5CCNAME. Before I put effort into that, I'd love it if someone double-checked to make sure we can't do this with pam_env.

comment:7 Changed 10 years ago by jdreed

  • Status changed from new to committed

Committed a hack for this in r25223. Barring a better idea in bounded time, we should move forward with this. It's unclear what we should conditionalize on (if anything). Certainly this will work on Lucid as well. I'm not interested in hearing about distributions other than Lucid and Natty, since we don't support cluster on them.

comment:8 Changed 10 years ago by jdreed

We should figure out whose fault this is and file bugs against the packages. Testing on maverick will give us a better idea of when this broke.

comment:9 Changed 10 years ago by jdreed

  • Status changed from committed to development

reactivate 2.0.22 ->dev

comment:10 Changed 10 years ago by jdreed

  • Status changed from development to proposed

comment:11 Changed 10 years ago by jdreed

  • Status changed from proposed to closed
  • Resolution set to fixed

Remainder of this tracked as #982
