Closed Bug 1649696 Opened 5 years ago Closed 2 years ago

Linux sandbox disables indirect branch prediction on x64/x86, resulting in a significant slowdown

Categories

(Core :: Security: Process Sandboxing, defect, P5)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
123 Branch
Fission Milestone Future
Tracking Status
firefox123 --- fixed

People

(Reporter: lth, Assigned: jld)

References

(Depends on 2 open bugs)

Details

(Keywords: perf-alert)

Attachments

(1 file)

There's a long saga at bug 1649109, the upshot of which is that seccomp by default disables indirect branch prediction as part of Spectre mitigation, resulting in a significant drop in performance for programs that use indirect calls. (This would mean every virtual and indirect call in C++; in my context it also means all indirect calls in WebAssembly.)

We can undo this slowdown by adding SECCOMP_FILTER_FLAG_SPEC_ALLOW to the seccomp flags (see bug 1649109 comment 17 for a small patch). Presumably we need to discuss whether we should do that, as this disables some Spectre mitigations. I observe that neither Windows nor macOS appears to have this mitigation enabled and that Firefox therefore runs without the mitigation on those platforms. There's probably no good reason why Linux should be different, but that's just my own perspective.

Once we have Fission, I think we should definitely disable these mitigations (chromium seems to). Until then, I expect they buy us some modest (theoretical, at this point) mitigation value so my inclination is that we shouldn't take action to override this default behavior.

(FWIW, I filed bug 1649853 to ask whether, by any chance, similar mitigations are being enabled on Windows.)

Chromium turns off all mitigations that were conditional on seccomp but also turns on STIBP explicitly; i.e., they turn off all mitigations except STIBP (which currently means just SSBD), and require site isolation even for that. Their bug for this is https://crbug.com/1029470.

As I understand it, the issue that STIBP protects against is that the branch prediction state is per physical core, and our processes could spy on unrelated tasks that happen to be on the core's other thread. Fission wouldn't help with that, and re-enabling SAB (with real parallelism) would increase the potential for exploitation if I understand correctly.

What we'd need would be a way to prevent sharing cores with other processes, and I think macOS may have something like that, but it didn't exist on Linux as of https://crbug.com/1029470#c32.

(For future reference, if we use SECCOMP_FILTER_FLAG_SPEC_ALLOW, we'll need a fallback for kernels that don't support it: the flag was added in 4.17, but the seccomp system call was added in 3.17, almost 4 years earlier. Also, if we need to use a prctl in some capacity, those are per-thread, so it would need to be applied in early startup and preferably before exec (there are some issues with injected libraries creating threads before main). Otherwise we'd need to repurpose this exciting code that currently exists to support pre-3.17 kernels, which I was hoping to get rid of someday.)

So this is weird: I can reproduce the result from bug 1649109 comment #0 (using sandbox vs. no sandbox rather than the JS shell), on an i9-7940X with kernel 5.7.6, but it still reproduces even if I boot with nosmt=force or spectre_v2_user=off, both of which disable the use of STIBP and the latter also disables IBPB, at least according to the documentation and what's reported in sysfs. So either the cause isn't STIBP, or there's a kernel bug such that it's still enabled with seccomp even when it shouldn't be.

However, setting SECCOMP_FILTER_FLAG_SPEC_ALLOW does remove the performance difference. I haven't tried manually turning on individual mitigations via prctl yet.
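
For anyone repeating this experiment, a minimal sketch of reading the Spectre v2 status that the kernel reports in sysfs. The file path is the standard one; the function names are made up for illustration, and the sample line in the comment is illustrative only, since the exact wording and separators differ between kernel versions.

```python
import re
from pathlib import Path

# Standard sysfs file holding the kernel's one-line Spectre v2 status.
SPECTRE_V2_PATH = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def parse_vulnerability_line(line):
    """Split a status line into {name: value}; items without a value map to True.

    Illustrative input (real output varies by kernel and CPU):
    "Mitigation: Retpolines, IBPB: conditional, IBRS_FW, STIBP: disabled"
    """
    fields = {}
    # Items are separated by ", " on some kernels and "; " on others.
    for item in re.split(r"[,;]\s*", line.strip()):
        if not item:
            continue
        name, sep, value = item.partition(": ")
        fields[name] = value if sep else True
    return fields


def read_spectre_v2_status(path=SPECTRE_V2_PATH):
    """Return the parsed status, or None if the file cannot be read."""
    try:
        return parse_vulnerability_line(path.read_text())
    except OSError:
        return None
```

Calling `read_spectre_v2_status()` after booting with `nosmt=force` or `spectre_v2_user=off` and looking at the `STIBP` and `IBPB` entries is one way to confirm what the kernel claims to be doing.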

IIUC, cross-process-cross-hyperthread variant 2 attacks are in the same theoretically-possible-but-practically-extremely-hard category as, e.g., cross-process cache eviction attacks (which browsers also do nothing about). But it's useful to know Chrome seems to have STIBP enabled. It sounds like, after Fission ships, we'll need to look into this a bit more to see if that's still actually the case and how practical STIBP attacks actually are.

Severity: -- → S2
Priority: -- → P2

Presumably we need to discuss whether we should do that

I'm going to mark this P5 until there's a decision on what to do from the JS side. If you know what you want, we can try to implement it on the sandboxing side.

Flags: needinfo?(luke)
Priority: P2 → P5

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #3)

What we'd need would be a way to prevent sharing cores with other processes, and I think macOS may have something like that

It does and we use it; see bug 1546544.

See Also: → CVE-2019-9815

The decision to do anything is a ways off (I don't think we should do anything before Fission), so I'll clear the ni? for now and we can keep this on the backlog.

Flags: needinfo?(luke)
Depends on: fission
Fission Milestone: --- → Future
Depends on: 1707955
QA Whiteboard: qa-not-actionable

Luke, we shipped Fission, and this bug is in the rather weird S2 but P5 state. Can you provide an update?

Flags: needinfo?(mail)
Flags: needinfo?(mail) → needinfo?(sdetar)

Jan, do you have any thoughts on this bug?

Flags: needinfo?(sdetar) → needinfo?(jdemooij)

Linux appears to have changed its default in commit 2f46993d83ff4abb310ef7b4beced56ba96f0d9d, which shipped in 5.16, and no longer enables SSBD or STIBP automatically when seccomp is used (but an administrator or distro can still choose that mode, and per-thread opt-in via prctl is always possible). Empirically, I can no longer reproduce the performance difference from bug 1649109.

The Linux commit message has a lengthy discussion of their rationale. I'm not completely following the comments there about pid namespaces; we don't currently use them (bug 1151624), but we do block most of the things that a pid could be used for, in particular reading /proc/{victim_pid}/maps to get the target's address space layout, so we might already have reasonable mitigations.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #11)

Linux appears to have changed its default in commit 2f46993d83ff4abb310ef7b4beced56ba96f0d9d, which shipped in 5.16, and no longer enables SSBD or STIBP automatically when seccomp is used (but an administrator or distro can still choose that mode, and per-thread opt-in via prctl is always possible). Empirically, I can no longer reproduce the performance difference from bug 1649109.

Great, thanks for looking into this!

I think for now we should keep this bug open and blocked on bug 1707955. Maybe once that bug is fixed we can disable these mitigations on older kernels as well. For now it seems reasonable to rely on the kernel's default, also because we're shipping Fission and we don't have similar mitigations on other platforms as far as I know.

Flags: needinfo?(jdemooij)

Lowering severity because of comment 11.

Severity: S2 → S3

Now that bug 1837602 is resolved, do you think it makes sense to fix this for older kernels?

Flags: needinfo?(jdemooij)

(In reply to Ryan Hunt [:rhunt] from comment #14)

Now that bug 1837602 is resolved, do you think it makes sense to fix this for older kernels?

Yeah, I think it makes sense to fix this performance cliff with older kernels now that we've disabled Spectre mitigations. Ubuntu 22.04 LTS still ships 5.15 AFAICT so this still affects a lot of users.

jld, what do you think?

Flags: needinfo?(jdemooij) → needinfo?(jld)

(In reply to Jan de Mooij [:jandem] from comment #15)

Ubuntu 22.04 LTS still ships 5.15 AFAICT so this still affects a lot of users.

jld, what do you think?

I did a little investigating, and unfortunately the key word there is “Ubuntu”: they moved everyone to using Firefox as a Snap package, and Snap applies its own seccomp-bpf sandbox, which doesn't use SECCOMP_FILTER_FLAG_SPEC_ALLOW. This sets the speculations STORE_BYPASS and INDIRECT_BRANCH to “force disabled” status, meaning that the mitigations are applied and can't be turned back off by that process or its descendants. (This is also the case for Flatpak.)

However, Ubuntu LTS does upgrade kernels to newer branches, for hardware support, at least until the next LTS is released. In the past the situation was somewhat complicated and it was difficult to map from Ubuntu version to kernel version, but Ubuntu's current documentation about this indicates that desktop installs (but not server) now default to switching to new kernel versions as part of regular OS updates. So, a typical 22.04 LTS desktop system should be on 6.2.x by now, even if it was originally installed from the first 22.04.0 release, if I understand correctly.

But this might be an issue for Ubuntu 20.04, which is on kernel 5.15, and looks like it will remain there for the rest of its release cycle (until 2025), and defaults to using a normal .deb package (not Snap) for Firefox.

Also, I'm not sure how important this is, and I've only tested it in a VM so far, but it looks like the mitigations now slow down only the call external microbenchmark from the original bug report and not call internal; I don't know if there have been optimizations on the SpiderMonkey side that could have caused that.
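
The "force disabled" status described above can be checked from inside a process with `prctl(PR_GET_SPECULATION_CTRL, ...)`. The following is a minimal sketch, not code from Firefox: the constants are copied from `<linux/prctl.h>`, the helper names are hypothetical, and the ctypes call assumes a Linux libc.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_GET_SPECULATION_CTRL = 52
PR_SPEC_STORE_BYPASS = 0      # SSBD
PR_SPEC_INDIRECT_BRANCH = 1   # STIBP/IBPB

PR_SPEC_NOT_AFFECTED = 0
PR_SPEC_PRCTL = 1 << 0          # per-task control via prctl is available
PR_SPEC_ENABLE = 1 << 1         # speculation enabled (mitigation off)
PR_SPEC_DISABLE = 1 << 2        # speculation disabled (mitigation on)
PR_SPEC_FORCE_DISABLE = 1 << 3  # mitigation on and cannot be undone


def describe_spec_ctrl(value):
    """Turn a PR_GET_SPECULATION_CTRL result into a short label."""
    if value == PR_SPEC_NOT_AFFECTED:
        return "not affected"
    if value & PR_SPEC_FORCE_DISABLE:
        return "force disabled"
    if value & PR_SPEC_DISABLE:
        return "disabled"
    if value & PR_SPEC_ENABLE:
        return "enabled"
    return "unknown"


def get_spec_ctrl(which):
    """Query the calling thread's state for one speculation feature (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(a) for a in (PR_GET_SPECULATION_CTRL, which, 0, 0, 0)]
    ret = libc.prctl(*args)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret
```

For example, `describe_spec_ctrl(get_spec_ctrl(PR_SPEC_STORE_BYPASS))` run inside a Snap or Flatpak sandbox on an affected kernel would be expected to report "force disabled", per the description above. The same information also appears in the `Speculation_Store_Bypass` and `SpeculationIndirectBranch` lines of `/proc/self/status` on kernels that expose them.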

Flags: needinfo?(jld)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #16)

Thanks for looking into this!

Interesting that you're seeing the perf difference primarily on the call external benchmark. Both of them use indirect calls, so I'd expect both to see the slowdown. We did land an optimization a while ago that splits a WebAssembly indirect call into a branch going to one of two different machine-level indirect calls (one restores state for external calls, the other doesn't). That is probably the cause, but the reason why isn't obvious to me.

I do see that Debian 11 (the previous stable release from 2021) is listed as using the 5.10 kernel series. Not sure if that's significant or not though. Fedora seems to follow the kernel releases pretty closely, so they seem less likely to be behind.

Lars had a pretty simple patch in bug 1649109 comment 17 that he reported would fix the issue. Do you think it's worth trying to land that? If it's not a simple fix, I understand it might not be worth it as data seems to show this has become less severe.

Flags: needinfo?(jld)

(In reply to Ryan Hunt [:rhunt] from comment #17)

Lars had a pretty simple patch in bug 1649109 comment 17 that he reported would fix the issue. Do you think it's worth trying to land that? If it's not a simple fix, I understand it might not be worth it as data seems to show this has become less severe.

It would need to check for the relevant error (probably EINVAL) and fall back to calling seccomp without that flag, in order not to break kernels from 3.17 until 4.17, which have the seccomp system call but not the flag; otherwise that could be done.

(Leaving ni? to look at this more next week.)

(Edit, 2024-01-10: fixed kernel version numbers.)
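
The fallback described in this comment can be sketched as follows. This is only an illustration of the retry logic, not the patch that landed (which lives in Firefox's C++ sandbox code): the `seccomp` argument is a hypothetical callable standing in for the real system call, so the logic can be exercised without installing a filter, and `old_kernel` is a made-up simulation of a kernel that predates the flag.

```python
import errno

# Values from <linux/seccomp.h>.
SECCOMP_SET_MODE_FILTER = 1
SECCOMP_FILTER_FLAG_SPEC_ALLOW = 1 << 2


def install_filter(seccomp, prog, base_flags=0):
    """Install `prog`, opting out of the Spectre mitigations when possible.

    `seccomp` is a callable (operation, flags, prog) that raises OSError
    on failure. Returns the flags the kernel actually accepted.
    """
    flags = base_flags | SECCOMP_FILTER_FLAG_SPEC_ALLOW
    try:
        seccomp(SECCOMP_SET_MODE_FILTER, flags, prog)
        return flags
    except OSError as e:
        # Unknown flags are rejected with EINVAL; anything else is a real error.
        if e.errno != errno.EINVAL:
            raise
    # Retry without the opt-out. If EINVAL had another cause (for example an
    # invalid filter program), this call fails again and the error propagates.
    seccomp(SECCOMP_SET_MODE_FILTER, base_flags, prog)
    return base_flags


def old_kernel(operation, flags, prog):
    """Simulated kernel that has seccomp() but not SPEC_ALLOW."""
    if flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW:
        raise OSError(errno.EINVAL, "unknown seccomp flag")
```

With the simulated kernel, `install_filter(old_kernel, b"")` falls back and returns `0`; with a callable that accepts every flag it returns `SECCOMP_FILTER_FLAG_SPEC_ALLOW`.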

Linux 4.17 applied some Spectre mitigations (SSBD and STIBP) by default
when seccomp-bpf is used, but later Linux 5.16 turned them off with
the rationale that, essentially: the attacks aren't really practical,
there are similar or worse attacks that it doesn't stop, and the
performance cost is significant (STIBP seems to make indirect
branches unpredictable). Given that we support Linux distributions in
the affected range (e.g., Ubuntu 20.04 LTS, but not 22.04), and the
performance hit is very noticeable at least in microbenchmarks, this
patch opts out of the mitigations.

Note that, if a seccomp policy is applied by some external sandbox which
doesn't use this opt-out (e.g., if using Snap or Flatpak), and the
kernel is in that range, these Spectre mitigations will still be applied
and we can't turn them off.
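
As a rough illustration of "the affected range" in the commit message above, here is a small sketch that checks a kernel release string against the two versions it names (4.17 and 5.16). The helper names are hypothetical, and this is only a heuristic: distributions backport changes, and an administrator can select either behavior at boot, so the version number alone does not settle what a given system does.

```python
import re

FIRST_AFFECTED = (4, 17)    # mitigations applied by default under seccomp
FIRST_UNAFFECTED = (5, 16)  # default changed back


def parse_release(release):
    """Extract (major, minor) from a string such as '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def in_affected_range(release):
    """True if seccomp would imply SSBD/STIBP by default on this version."""
    return FIRST_AFFECTED <= parse_release(release) < FIRST_UNAFFECTED
```

On a running system the input would typically come from `platform.release()`.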

Assignee: nobody → jld
Status: NEW → ASSIGNED

So, I did look at this some more; see above. Testing it (again) on Ubuntu 20.04 in a VM on a Threadripper and looking more closely at the numbers, I saw a 1.5x slowdown in call internal and 3x in call external; the original bug report on an Intel CPU, and what I recall also seeing on an Intel CPU when I tested a while ago, was about 2x on both. I don't know how interesting that is, but I thought I'd mention it for the record.

Flags: needinfo?(jld)
Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ebdd80f675b6 Use `SECCOMP_FILTER_FLAG_SPEC_ALLOW` where available. r=gcp
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch

(In reply to Pulsebot from comment #21)

Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ebdd80f675b6
Use SECCOMP_FILTER_FLAG_SPEC_ALLOW where available. r=gcp

We're seeing a number of performance improvements in CI!

== Change summary for alert #41031 (as of Tue, 16 Jan 2024 18:31:43 GMT) ==

Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|--|--|--|--|--|
| 11% | perf_reftest_singletons many-custom-props.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 342.56 -> 304.58 |
| 11% | perf_reftest_singletons slow-selector-2.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 0.90 -> 0.80 |
| 11% | perf_reftest_singletons slow-selector-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 0.90 -> 0.80 |
| 11% | perf_reftest coalesce-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 84.17 -> 75.26 |
| 10% | perf_reftest_singletons display-none-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 0.59 -> 0.53 |
| ... | ... | ... | ... | ... |
| 2% | offscreencanvas_webcodecs_main_2d_h264 offscreencanvas_webcodecs_main_2d_h264 Mean time across 100 frames: | linux1804-64-qr | e10s fission stylo webgl-ipc webrender | 12.41 -> 12.15 |

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=41031

(In reply to Pulsebot from comment #21)

Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ebdd80f675b6
Use SECCOMP_FILTER_FLAG_SPEC_ALLOW where available. r=gcp

== Change summary for alert #41033 (as of Tue, 16 Jan 2024 13:35:24 GMT) ==

Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|--|--|--|--|--|--|
| 11% | youtube FirstVisualChange | linux1804-64-shippable-qr | fission warm webrender | 179.97 -> 159.61 | Before/After |
| 9% | reddit-billgates-ama.members ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 193.45 -> 175.15 | Before/After |
| 9% | addkAR1 time_duration | linux1804-64-shippable-qr | fission webrender | 50,859.48 -> 46,530.08 | Before/After |
| 8% | youtube PerceptualSpeedIndex | linux1804-64-shippable-qr | fission warm webrender | 1,067.16 -> 982.94 | Before/After |
| 7% | youtube loadtime | linux1804-64-shippable-qr | fission warm webrender | 1,274.17 -> 1,184.06 | Before/After |
| ... | ... | ... | ... | ... | ... |
| 2% | nytimes SpeedIndex | linux1804-64-shippable-qr | fission warm webrender | 1,093.93 -> 1,070.30 | Before/After |

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=41033

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #4)

So this is weird: I can reproduce the result from bug 1649109 […] but it still reproduces even if I boot with nosmt=force or spectre_v2_user=off, both of which disable the use of STIBP and the latter also disables IBPB […].

However, setting SECCOMP_FILTER_FLAG_SPEC_ALLOW does remove the performance difference. I haven't tried manually turning on individual mitigations via prctl yet.

I don't think I ever tested them separately, and I just did. STIBP (PR_SPEC_INDIRECT_BRANCH) actually doesn't change the microbenchmarks of indirect branches; it's SSBD (PR_SPEC_STORE_BYPASS) that reproduces the 2x/3x slowdowns.

The first part of that, counterintuitive as it sounds, makes sense: what I can easily test on right now is an AMD CPU that supports STIBP always-on mode (PDF), as well as an Intel CPU that supports Enhanced IBRS (a superset of always-on STIBP, essentially), so the PR_SPEC_INDIRECT_BRANCH mitigation is a no-op because it's already applied. On older CPUs that might cause a performance hit, however.

The second part is a little weirder, but maybe if I saw the generated code for the benchmark it might make more sense; nonetheless, speculative store bypass seems to be of concern only for same-process attacks (like jitcode in a non-Fission process) if I understand correctly, so I don't think we need to care about it.

I don't yet know which of these two is responsible for the performance effects reported in the last few comments, and I could try to find out, but I'm not sure if it matters given that we've already considered using STIBP and decided not to (bug 1841843).
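
For the record, one way to test the two mitigations separately, as described in this comment, is to enable a single one via `prctl(PR_SET_SPECULATION_CTRL, ...)` and then exec the benchmark, since the setting is per-thread and is kept across fork and exec. This is a sketch under those assumptions, not the exact procedure used here: the constants come from `<linux/prctl.h>`, and the helper names and the `MITIGATIONS` table are made up for illustration.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_STORE_BYPASS = 0
PR_SPEC_INDIRECT_BRANCH = 1
PR_SPEC_DISABLE = 1 << 2  # disable the speculation feature, i.e. mitigate

# Hypothetical short names, following the usage in this bug.
MITIGATIONS = {"ssbd": PR_SPEC_STORE_BYPASS, "stibp": PR_SPEC_INDIRECT_BRANCH}


def apply_mitigation(name):
    """Turn on one mitigation for the calling thread (Linux only)."""
    which = MITIGATIONS[name]
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(a)
            for a in (PR_SET_SPECULATION_CTRL, which, PR_SPEC_DISABLE, 0, 0)]
    if libc.prctl(*args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def run_with_mitigation(name, argv):
    """Apply one mitigation, then replace this process with the benchmark."""
    apply_mitigation(name)
    os.execvp(argv[0], argv)
```

Running the same microbenchmark once under `run_with_mitigation("ssbd", ...)` and once under `run_with_mitigation("stibp", ...)` would then show which of the two accounts for the slowdown on a given CPU. The prctl can fail (for example if the kernel is configured so that per-task control is unavailable), in which case the sketch raises `OSError` rather than running the benchmark unmitigated.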

See Also: → 1931745