Ticket #1352 (closed defect: fixed)

Opened 8 years ago

Last modified 7 years ago

AFS slow on athena.dialup.mit.edu?

Reported by: kchen
Owned by:
Priority: normal
Milestone: The Distant Future
Component: linerva
Keywords: transition
Cc:
Fixed in version:
Upstream bug:

Description (last modified by andersk)

zcrypt is slow on athena.dialup.mit.edu, both inside and outside BarnOwl. (Fixed.)

AFS seems generally slower on athena.dialup.mit.edu than on linerva. (See comments.)

Change History

comment:1 Changed 8 years ago by kchen

  • Component changed from -- to linerva

comment:2 Changed 8 years ago by achernya

Steven and I investigated zcrypt and found that gpg compresses the input data in addition to encrypting it. It's possible that the dialups perform differently from Linerva on encryption and compression. See below for a manual decryption of a message sent by zcrypt:

$ gpg --list-packets < /tmp/achernya/gpg-foo
:symkey enc packet: version 4, cipher 7, s2k 3, hash 2
        salt d599691305fd96ef, count 65536 (96)
gpg: AES encrypted data
gpg: gpg-agent is not available in this session
:encrypted data packet:
        length: 62
        mdc_method: 2
gpg: encrypted with 1 passphrase
:compressed packet: algo=1
:literal data packet:
        mode b (62), created 1370240911, name="",
        raw data: unknown length

We could modify zcrypt to pass "--compress-algo none" if that's the slowness pain point, although I haven't thought very hard about the security implications of disabling compression.

That said, we should profile zcrypt further to determine whether the slowness comes from AFS, from forking, or from gpg itself (encryption or compression).

However, if the slowness is in gpg encryption, there's not much we can do: it looks like AES-NI is supported neither by the dialups nor by 64-bit gpg, even though the code has been committed for 32-bit gpg.
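One way to separate these candidates is to time a few commands that each isolate one cost, starting with a process that does nothing (pure fork/exec and startup). The following is a minimal sketch under stated assumptions, not the tooling anyone on this ticket used: `time_command` and `median_seconds` are hypothetical helper names, and the candidate commands listed in the comments would need to be adapted to how zcrypt actually invokes gpg.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5, stdin_data=b""):
    """Run argv `runs` times; return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            input=stdin_data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # we only care about elapsed time, not exit status
        )
        samples.append(time.perf_counter() - start)
    return samples


def median_seconds(argv, runs=5, stdin_data=b""):
    """Median is less sensitive than the mean to one cold-cache run."""
    return statistics.median(time_command(argv, runs, stdin_data))


if __name__ == "__main__":
    # Baseline: a process that starts and exits immediately.
    baseline = [sys.executable, "-c", "pass"]
    print("baseline startup: %.3f s" % median_seconds(baseline))

    # Assumption: to compare costs, time further commands the same way, e.g.
    #   - the zcrypt wrapper as found on PATH (wrapper and AFS overhead),
    #   - the gpg binary directly with its default settings,
    #   - the same gpg command plus "--compress-algo none".
    # If all three are slow by a similar amount, gpg's own work is unlikely
    # to be the bottleneck.
```

Running each candidate on both linerva and a dialup and comparing medians should show whether the difference follows gpg or whatever runs before it.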

Version 0, edited 8 years ago by achernya

comment:3 Changed 8 years ago by andersk

If you believe that zcrypt is even slightly compute-bound, then you are confused. The real problem is the layers and layers of wrapper scripts in the barnowl locker, mostly the stuff that sets up perl (which zcrypt doesn’t even use). perl -MPAR -e '1' takes several seconds.

comment:4 Changed 8 years ago by andersk

(Oh, and this commit in 1.10dev will save most of the remaining fraction of a second.)

comment:5 Changed 8 years ago by kchen

It seems the real issue behind this ticket may be that athena.dialup.mit.edu, or AFS on athena.dialup.mit.edu, is slow. (See -c consult from June 14, 2013.)

With my normal PATH and MANPATH set, running "time man screen > /dev/null" takes 0.1 to 0.2 seconds on linerva and about 24 seconds on ten-thousand-dollar-bill.mit.edu. If I unset PATH and MANPATH, it's much more reasonable, at 0.5 to 0.6 seconds on ten-thousand-dollar-bill.

With the environment variables set:
[~ dr-wily]> strace man screen |& grep -E '(/mit|/afs)' | wc
    538    3423   46364

[~ ten-thousand-dollar-bill]> strace man screen |& grep -E '(/mit|/afs)' | wc
    766    4879   65398

Without them set, the line counts drop to 95 on linerva and 100 on ten-thousand-dollar-bill.
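The first column of the wc output above is the number of strace lines that mention /mit or /afs. A rough sketch of the same count in Python, extended to break the matches down by system call and to count failed lookups, might look like the following. The function names and the sample trace are hypothetical, and the paths in the sample are made up for illustration; a real trace could be saved with strace's -o option and read from that file instead.

```python
import re
from collections import Counter

# Same pattern as the grep in the transcript above.
AFS_PATH = re.compile(r"(/mit|/afs)")
# strace lines normally start with the syscall name followed by "(".
SYSCALL = re.compile(r"^(\w+)\(")


def count_afs_lines(trace_text):
    """Number of trace lines mentioning /mit or /afs (wc's first column)."""
    return sum(1 for line in trace_text.splitlines() if AFS_PATH.search(line))


def afs_syscalls(trace_text):
    """Count matching lines per syscall name (stat, open, ...)."""
    counts = Counter()
    for line in trace_text.splitlines():
        if AFS_PATH.search(line):
            match = SYSCALL.match(line)
            counts[match.group(1) if match else "?"] += 1
    return counts


def count_failed_afs_lookups(trace_text):
    """Matching lines that failed with ENOENT, i.e. wasted path probes."""
    return sum(
        1
        for line in trace_text.splitlines()
        if AFS_PATH.search(line) and "ENOENT" in line
    )


# Hypothetical sample; these paths are illustrative, not from the ticket.
SAMPLE_TRACE = '''\
execve("/usr/bin/man", ["man", "screen"], 0x7ffc /* 30 vars */) = 0
stat("/mit/example-a/man", 0x7ffd1000) = -1 ENOENT (No such file or directory)
stat("/afs/athena.mit.edu/contrib/example-b/man", {st_mode=S_IFDIR|0755, ...}) = 0
open("/usr/share/man/man1/screen.1.gz", O_RDONLY) = 3
open("/mit/example-b/man/man1/screen.1", O_RDONLY) = -1 ENOENT (No such file or directory)
'''

if __name__ == "__main__":
    print("lines touching AFS:", count_afs_lines(SAMPLE_TRACE))
    print("by syscall:", dict(afs_syscalls(SAMPLE_TRACE)))
    print("failed lookups:", count_failed_afs_lookups(SAMPLE_TRACE))
```

A high share of failed lookups would suggest that most of the time goes to probing locker directories on the search path rather than to reading the page that is eventually found.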

comment:6 Changed 8 years ago by kchen

One other thing recently noted on -c consult is that it takes me at least 30 seconds to log in to athena.dialup.mit.edu and get a prompt, while for people with fewer lockers in .environment it takes only 3 seconds. On linerva, with my standard .environment, it takes just about 2 seconds.

comment:7 Changed 7 years ago by andersk

  • Description modified
  • Summary changed from zcrypt slow on athena.dialup.mit.edu to AFS slow on athena.dialup.mit.edu?

With some locker script twiddling and the release of BarnOwl 1.9rc2, zcrypt is now about equally fast (100 ms) on Linerva and athena.dialup. Updating the ticket to be about the reported general AFS slowness.

comment:8 follow-up: ↓ 9 Changed 7 years ago by kchen

  • Status changed from new to closed
  • Resolution set to fixed

Ops did something (I don't have details) that may have fixed the issue on August 19, and things have continued to look fine since then. [help.mit.edu #2406178]

comment:9 in reply to: ↑ 8 Changed 7 years ago by jweiss

Replying to kchen:

> Ops did something (I don't have details) that may have fixed the issue on August 19, and things have continued to look fine since then. [help.mit.edu #2406178]

On half of them, I simply restarted the AFS client. On the other half, I removed the -memcache option to afsd. Having looked at things, I believe the latter was more effective long term, and I plan to convert all of the dialups to a disk-based cache over winter break / IAP.
