Improve GCM performance on aarch32
Categories
(NSS :: Libraries, task, P2)
Tracking
(firefox-esr60 wontfix, firefox-esr68 wontfix, firefox67 wontfix, firefox67.0.1 wontfix, firefox68 wontfix, firefox69 wontfix, firefox71 wontfix, firefox72 wontfix, firefox73 fixed)
People
(Reporter: m_kato, Assigned: m_kato)
References
(Blocks 1 open bug)
Details
(Whiteboard: [geckoview:p2] [bcs:p2])
Attachments
(2 files)
Bug 1559012 is for aarch64, but we can improve GCM on aarch32 as well.
Comment 1•6 years ago
Makoto - Can I volunteer you for this one? :) (Feel free to un-assign if it's overload).
Comment 2•6 years ago
esr68=affected because we might want to uplift this optimization to Fennec ESR 68.1.
Assignee
Comment 3•6 years ago
Assignee
Comment 4•6 years ago
Optimize GCM performance with ARM's NEON, using the technique from https://conradoplg.cryptoland.net/files/2010/12/gcm14.pdf.
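For readers unfamiliar with the approach, here is a minimal Python sketch of the core idea as I understand the linked paper: NEON only offers an 8-bit by 8-bit polynomial (carry-less) multiply, so the 64-bit products that GHASH needs have to be assembled from 8-bit partial products. This is a plain schoolbook illustration, not the paper's optimized arrangement and not the code in the attached patch; the function names `clmul8`, `clmul64_from_bytes` and `clmul64_reference` are hypothetical.

```python
def clmul8(a: int, b: int) -> int:
    """Carry-less (polynomial over GF(2)) product of two 8-bit values."""
    assert 0 <= a < 256 and 0 <= b < 256
    result = 0
    for i in range(8):
        if (b >> i) & 1:
            result ^= a << i  # XOR instead of add: no carries propagate
    return result


def clmul64_from_bytes(a: int, b: int) -> int:
    """Carry-less 64x64 -> 128-bit product built only from 8-bit products."""
    result = 0
    for i in range(8):
        a_byte = (a >> (8 * i)) & 0xFF
        for j in range(8):
            b_byte = (b >> (8 * j)) & 0xFF
            # Each partial product is shifted to its byte position and XORed in.
            result ^= clmul8(a_byte, b_byte) << (8 * (i + j))
    return result


def clmul64_reference(a: int, b: int) -> int:
    """Bit-by-bit reference, used only to check the byte-wise version."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result
```

The byte-wise version agrees with the bit-by-bit reference because carry-less multiplication is linear over XOR; a vectorized implementation gets its speed from computing many of the 8-bit products in one instruction.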
Comment 5•6 years ago
I'm curious about the attached patch - can I give this a try and see if it helps? Is it waiting on anything?
I've re-profiled speedof.me with WebRender, and GCM_DecryptUpdate is very prominent:
https://perfht.ml/2pBJsVr
Assignee
Updated•6 years ago
Assignee
Comment 6•6 years ago
I need to rebase the old patch for review.
Assignee
Comment 7•6 years ago
(In reply to Andrew Creskey from comment #5)
> I'm curious about the attached patch - can I give this a try and see if it helps? Is it waiting on anything?
> I've re-profiled speedof.me with WebRender, and GCM_DecryptUpdate is very prominent:
> https://perfht.ml/2pBJsVr
Well, this fix should substantially improve the performance of gcmHash_Update, which accounts for 22% of this profile. As for rijndael_encryptBlock128, we have already landed an improvement for ARMv8 CPUs (Moto G5, Pixel 2, etc.), although it hasn't been merged into Fenix Nightly yet.
Updated•6 years ago
Comment 8•6 years ago
Thank you Makoto.
I applied this patch to an arm32 PGO build of geckoview_example.
Unfortunately I couldn't pick up any performance improvements, but all I have to test with are the speedof.me tests.
I think ideally we would have more isolated GCM performance tests.
Assignee
Comment 9•6 years ago
(In reply to Andrew Creskey from comment #8)
> Thank you Makoto.
> I applied this patch to an arm32 PGO build of geckoview_example.
> Unfortunately I couldn't pick up any performance improvements, but all I have to test with are the speedof.me tests.
> I think ideally we would have more isolated GCM performance tests.
With this fix, GCM is 1.5-2.0 times faster than the original, even on non-ARMv8 hardware. Since we don't turn on WebRender for 32-bit devices, the profile spends a lot of time on filter processing (bug 961759).
# mode       in    symmkey  opreps  cxreps  context  op         time(sec)  thrgput
aes_gcm_e    69Mb  256      1M      0       0.000    10000.000  10.000     6Mb      <-- with this fix
aes_gcm_e    37Mb  256      654T    0       0.000    10000.000  10.000     3Mb
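As a quick sanity check of the "1.5-2.0 times" figure, the ratios can be computed directly from the two bltest rows above. This is only a sketch: it assumes the second row is the unpatched build and that both rows use the same units, and the `speedup` helper is made up for illustration.

```python
# Figures copied from the "in" and "thrgput" columns of the two rows (in Mb).
# Assumption: the row without the marker is the original, unpatched build.
with_fix = {"in_mb": 69, "thrgput_mb": 6}
original = {"in_mb": 37, "thrgput_mb": 3}


def speedup(new: float, old: float) -> float:
    """Ratio of the patched figure to the original figure."""
    return new / old


in_ratio = speedup(with_fix["in_mb"], original["in_mb"])
thrgput_ratio = speedup(with_fix["thrgput_mb"], original["thrgput_mb"])

print(f"data processed in the 10 s run: {in_ratio:.2f}x")
print(f"reported throughput (rounded by bltest): {thrgput_ratio:.2f}x")
```

The throughput column is rounded to whole megabytes, so the ratio taken from the "in" column is probably the more informative of the two.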
Comment 10•6 years ago
Thank you for collecting those throughput results, m_kato, they look great.
I did also turn on WebRender for that test, but, like I said, it's not a well-defined workflow and there's a lot of noise.
I have a bit of a backlog, but let me needinfo myself to run this change on a real set of pages.
Our current automated pageload tests don't necessarily use the actual cipher suite that a client/server would agree on.
Comment 11•6 years ago
Makoto, I'm curious -- how are you building your measured throughput tests?
Is this in-tree in gecko?
In Bug 1591725 we're looking at a more aggressive build optimization than the current -Oz. It occurred to me that if your tests are built in a standalone app with a different build configuration then we would get different results. AFAIK -Oz may disable some vectorization.
Assignee
Comment 12•6 years ago
(In reply to Andrew Creskey from comment #11)
> Makoto, I'm curious -- how are you building your measured throughput tests?
> Is this in-tree in gecko?
> In Bug 1591725 we're looking at a more aggressive build optimization than the current -Oz. It occurred to me that if your tests are built in a standalone app with a different build configuration then we would get different results. AFAIK -Oz may disable some vectorization.
These results come from bltest in the NSS tree (https://searchfox.org/mozilla-central/source/security/nss/cmd/bltest). The algorithm is implemented for ARMv7 following the paper in comment #4, and it doesn't depend on the compiler's automatic vectorization.
Updated•6 years ago
Assignee
Comment 13•6 years ago
Since I don't have commit permission for NSS, could you land this? (I guess that the sheriffs don't have permission for NSS either.)
Comment 14•6 years ago
Will take in NSS 3.49. Leaving my needinfo until after I branch.
Comment 15•6 years ago
Updated•6 years ago
Updated•6 years ago
Updated•6 years ago
Comment 16•6 years ago
I get this build error with the new nss-3.49 release on ARM, and after a bit of tinkering I traced it to the patch discussed in this thread. Here's the error:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: warning: wildcard match appears in both version 'NSSprivate_3.11' and 'NSSprivate_3.16' in script
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: error: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm-arm32-neon.o: multiple definition of 'gcm_HashMult_hw'
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm.o: previous definition here
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: error: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm-arm32-neon.o: multiple definition of 'gcm_HashWrite_hw'
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm.o: previous definition here
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: error: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm-arm32-neon.o: multiple definition of 'gcm_HashInit_hw'
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm.o: previous definition here
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: error: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm-arm32-neon.o: multiple definition of 'gcm_HashZeroX_hw'
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/9.2.0/../../../../armv7a-unknown-linux-gnueabihf/bin/ld: Linux2.6_arm_armv7a-unknown-linux-gnueabihf-gcc_glibc_PTH_OPT.OBJ/Linux_SINGLE_SHLIB/gcm.o: previous definition here
collect2: error: ld returned 1 exit status
I'm going to attach the full build log in a moment.
Should I open a new bug for this?
Comment 17•6 years ago