Ticket #1195 (new defect)

Opened 9 years ago

Last modified 8 years ago

ssh to athena.dialup.mit.edu fails when keytab obtained doesn't match ssh machine

Reported by: kchen
Owned by:
Priority: normal
Milestone: The Distant Future
Component: linerva
Keywords: transition
Cc:
Fixed in version:
Upstream bug: https://bugzilla.mindrot.org/show_bug.cgi?id=1008

Description

(geofft's writeup on debathena@… on April 17, 2012)

Comcast's residential DNS IPs (75.75.75.75 and 75.75.76.76) are anycast addresses for multiple machines. Because *.dialup.mit.edu is load-balanced at the DNS level, there's no guarantee that two successive client lookups (from a stub resolver) will hit the same server. The two lookups are thus more likely than not to give you two different results, because the first DNS server you hit is happily caching your result while you talk to the second DNS server.

This means that if you open an SSH connection to athena.dialup.mit.edu and ssh then asks the Kerberos libraries to get tickets for athena.dialup.mit.edu, the Kerberos libraries are likely to canonicalize the name to a different server than the one the existing SSH connection reached and give you a service ticket for the wrong dialup, leading to various random but probable failures setting up a Kerberos connection.
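As a rough illustration of the mismatch described above, here is a minimal Python sketch that models the two independent lookups. The pool hostnames, the `cached_resolver` helper, and the idea that each anycast DNS server has cached a different pool member are assumptions made for the example; it does not touch real DNS or Kerberos.

```python
# Hypothetical pool members; the real dialup hostnames are not given in this ticket.
POOL = ["dialup-a.example.edu", "dialup-b.example.edu", "dialup-c.example.edu"]
NAME = "athena.dialup.mit.edu"


def cached_resolver(answer):
    """Model one anycast DNS server that has cached a single answer for the pool name."""
    def resolve(name):
        return answer
    return resolve


def connect_and_get_ticket(name, resolve_for_ssh, resolve_for_krb5):
    """Model the two lookups: one for the TCP connection, one for the service principal."""
    connected_host = resolve_for_ssh(name)
    ticket_host = resolve_for_krb5(name)
    return connected_host, "host/" + ticket_host


def ticket_matches(connected_host, principal):
    """The service ticket is only usable if it names the host we actually reached."""
    return principal == "host/" + connected_host


# Two anycast servers that happen to have cached different pool members (assumption).
server_1 = cached_resolver(POOL[0])
server_2 = cached_resolver(POOL[1])

same_server = connect_and_get_ticket(NAME, server_1, server_1)
different_servers = connect_and_get_ticket(NAME, server_1, server_2)

print(ticket_matches(*same_server))        # True: both lookups hit the same cache
print(ticket_matches(*different_servers))  # False: ticket is for the wrong dialup
```

When both lookups land on the same server the principal matches the connected host; when they land on different servers it does not, which is the failure this ticket describes.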

Combined with the underlying issue behind Trac #315 where a failed keyex aborts the connection, as opposed to falling back to another keyex method (like checking the server's RSA key), this manifests as athena.dialup giving "Connection closed" messages more often than not from a Debathena box on a Comcast residential connection if the user has active tickets.

I recommend we give some thought to one or more of

1) giving up on GSSAPIKeyExchange (#315 is almost excuse enough, but combining it with #787 and this issue makes it something of a real problem), at least until we can teach SSH to do keyex fallback

2) changing athena.dialup from DNS-level load balancing to IP-level, by assigning a distinct IP to athena.dialup and having every athena.dialup host include the host/athena.dialup.mit.edu key in its keytab in addition to its own (and turning off the GSSAPIStrictAcceptorCheck); this is basically the configuration of the scripts.mit.edu pool, except that ssh connections aren't actually load-balanced. This would also have some UI benefits for non-GSSAPI users, since they would only get a host key prompt/warning for the single IP once.

Change History

comment:1 Changed 9 years ago by kchen

  • Component changed from development to linerva

comment:2 Changed 9 years ago by kchen

  • Summary changed from "Kerberized ssh to athena.dialup.mit.edu fails when keytab obtained doesn't match ssh machine" to "ssh to athena.dialup.mit.edu fails when keytab obtained doesn't match ssh machine"

comment:3 Changed 9 years ago by geofft

Here's what appears to be the upstream bug for this issue: https://bugzilla.mindrot.org/show_bug.cgi?id=1008

There are a couple of patches there, all with caveats.

comment:4 Changed 9 years ago by adehnert

  • Upstream bug set to https://bugzilla.mindrot.org/show_bug.cgi?id=1008

comment:5 Changed 9 years ago by adehnert

  • Keywords transition added

comment:6 Changed 8 years ago by geofft

That upstream patch has already been included in distros for a while; it's just a matter of adding GSSAPITrustDNS yes to debathena-ssh-client-config. Given that we don't set rdns = false in our krb5.conf (the default is true), and given that GSSAPI on Debathena means Kerberos, it doesn't seem particularly harmful to make SSH itself do the canonicalization since the Kerberos library will do so, anyway.

Does making this change sound good to everyone? (For what it's worth, remctl also hard-codes the moral equivalent of GSSAPITrustDNS yes.)

comment:7 Changed 8 years ago by andersk

I’m very skeptical about the idea of loosening a default security setting, no matter what arguments you have that other different commands may already have analogously loose default settings. Is this even still an issue now that NetworkManager does DNS caching in Precise and later?

comment:8 follow-up: ↓ 12 Changed 8 years ago by kchen

This is no longer an issue for me personally because of Precise and its default caching resolver, and because my new router also forces a caching resolver on me. There is an edge case that probably doesn't matter in practice, though: the issue could still come up when the 30-second TTL expires between the two lookups.
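To make the TTL edge case concrete, here is a small Python sketch of a caching resolver with an injectable clock. The `TtlCache` class, the rotating upstream, and the pool hostnames are all hypothetical; only the 30-second TTL comes from this ticket.

```python
import itertools

# Hypothetical pool members; the upstream simply rotates through them.
POOL = ["dialup-a.example.edu", "dialup-b.example.edu"]


class TtlCache:
    """Minimal caching resolver: reuse an answer until its TTL has expired."""

    def __init__(self, upstream, ttl_seconds, clock):
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # name -> (answer, expiry_time)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]
        answer = self._upstream(name)
        self._entries[name] = (answer, now + self._ttl)
        return answer


def demo():
    """Return the answers seen at t=0, t=29.9 and t=30.1 seconds."""
    rotation = itertools.cycle(POOL)
    now = [0.0]  # fake clock so the example is deterministic
    cache = TtlCache(lambda name: next(rotation), 30, lambda: now[0])

    first = cache.resolve("athena.dialup.mit.edu")
    now[0] = 29.9
    second = cache.resolve("athena.dialup.mit.edu")  # still cached
    now[0] = 30.1
    third = cache.resolve("athena.dialup.mit.edu")   # TTL expired, new answer
    return first, second, third


print(demo())
```

Lookups made inside one TTL window agree, but a pair of lookups that straddles the expiry (29.9 s and 30.1 s here) can return different pool members even behind a caching resolver.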

comment:9 Changed 8 years ago by geofft

> I’m very skeptical about the idea of loosening a default security setting, no matter what arguments you have that other different commands may already have analogously loose default settings.

The argument is that the _same_ command has an analogously loose default setting -- hostnames already get canonicalized by the GSSAPI/Kerberos layer, and the only reason GSSAPITrustDNS doesn't default to yes is that you might want to turn off canonicalization at the GSSAPI layer and wouldn't expect SSH to then go and turn it back on for you.

I'm happy to revert this proposed change as soon as MIT Kerberos stops defaulting rdns to true, or as soon as debathena-kerberos-config overrides it.

comment:10 Changed 8 years ago by ghudson

Setting rdns=false does not substantially improve the security of Kerberos. With rdns=false, we still do forward resolution, which allows an attacker to spoof the result using a CNAME record.
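The point about forward resolution can be sketched in Python as follows. This is a simplified model of host canonicalization, not the actual MIT krb5 code: the `canonicalize` function, the injected `forward` and `reverse` callables, and the example hostnames and address are all hypothetical.

```python
def canonicalize(hostname, forward, reverse, rdns=True):
    """Simplified model: forward lookup first, then an optional reverse lookup.

    forward(name) -> (canonical_name, address)
    reverse(address) -> name
    """
    canonical_name, address = forward(hostname)
    if rdns:
        canonical_name = reverse(address)
    return "host/" + canonical_name.lower()


def spoofed_forward(name):
    """A forged answer: the attacker supplies a CNAME pointing at a host they control."""
    return "evil.example.net", "192.0.2.66"


def honest_reverse(address):
    """Reverse lookup for the hypothetical address above."""
    return "evil.example.net"


# Even with rdns disabled, the spoofed canonical name ends up in the principal.
principal = canonicalize("athena.dialup.mit.edu", spoofed_forward, honest_reverse, rdns=False)
print(principal)  # host/evil.example.net
```

In this model, turning off the reverse step only removes the second lookup; the principal is still derived from whatever the forward lookup returned.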

I don't really foresee MIT krb5 changing the rdns default. We hate the reverse resolution step because of the way it affects the usability of new deployments, but we think changing the default would cause significant problems for some existing deployments.

Our long-term plan for this involves two significant changes. First, we want to make the KDC able to perform canonicalization of host-based service principals using its own NSS configuration (which could involve a local resolver backed by a securely updated copy of the zone file). Second, we want the KDC to be able to tell the client at AS-REP time that it supports canonicalization; the client would then refrain from doing any NSS-based canonicalization of service principals when making TGS requests with those credentials. I don't think Debathena will be able to take advantage of this for quite a while, though.

comment:11 Changed 8 years ago by geofft

comment:12 in reply to: ↑ 8 Changed 8 years ago by adehnert

Replying to kchen:

> This is no longer an issue for me personally because of Precise and its default caching resolver, and because my new router also forces a caching resolver on me. There is an edge case that probably doesn't matter in practice, though: the issue could still come up when the 30-second TTL expires between the two lookups.

Presumably the failure mode in this case is "rare mysterious error message that ~always goes away with a second try", rather than anything more persistent than that? (I guess the failure mode for this bug is ~always "mysterious error that goes away with half a dozen tries", so maybe that's not a huge improvement.)

Has there been any discussion of Geoff's option (2) with Ops?
