Closed Bug 1510491 Opened 7 years ago Closed 7 years ago

Don't eagerly hash debug shaders

Categories

(Core :: Graphics: WebRender, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(5 files, 2 obsolete files)

Right now we eagerly compile all shaders. This means compiling a DEBUG_OVERDRAW variant of a bunch of shaders, which is not work that needs to happen on startup. There's probably other fat that can be trimmed as well. I think the main cost here is probably first-run startup, when none of the shader binaries are cached. Though it may hurt us on regular startup as well, and certainly costs memory.
Priority: -- → P3
The eager compilation should only happen if we have a cached binary I believe, so this shouldn't happen for DEBUG_OVERDRAW shaders (unless the user has previously enabled the debug overlay, and within the 60? frame startup period). If that's correct, then I think we can skip blocking on this.
Flags: needinfo?(bobbyholley)
You're right that after bug 1510490 we don't compile them. We still do build a hash of all of the program sources, which may not be great for startup, but we shouldn't worry about it until we measure that to be a problem. I'll file a tracking bug for startup time and unblock here.
Flags: needinfo?(bobbyholley)
Summary: Don't eagerly compile debug shaders → Don't eagerly hash debug shaders
This may or may not be significant in practice, we should measure if startup time becomes an issue.
Blocks: wr-startup
No longer blocks: stage-wr-trains
Bug 1510493 had a significant impact on startup time, per bug 1514738 comment 5. That suggests that shader hashing is a non-trivial amount of work during startup, and that we shouldn't do it gratuitously. So I think we want a whitelist. We don't need to be ultra-precise with it, i.e. it can be a superset of the shaders that actually get used in startup for a given configuration. We can use a simple debug assertion to ensure this stays true. Note that this may not improve ts_paint in the same way, in the case that GPU process startup is no longer the long pole. But it's still likely to improve startup on systems with fewer cores and/or under load.
It doesn't really matter, but the program cache is technically optional, so if the device is to have a reliable in_startup() accessor we should track this state separately.
Depends on D15049
Huh, so it's almost within the noise, but this...seems to ever-so-slight regress ts_paint by 4ms or so [1]? That's really weird. I suppose the next thing to do would be to record the amount of time spent creating programs with and without this patch. [1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=6b4a345b50e94310091b01e8db32b3028ca58f89&newProject=try&newRevision=84b5eac9fc43715d97307be26ae3a1fe69338dc9&framework=1&filter=ts_&showOnlyComparable=1
(In reply to Bobby Holley (:bholley) from comment #9) > I suppose the next thing to do would be to record the amount of time spent > creating programs with and without this patch. Without: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6fb8d0d446674046f52c23a0f5a3546f5f70d82d With: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ebbe897d1fcee34f09331dbbcc699b5ad45f8bd
Alright, so the logging in comment 11 does show us spending ~50ms creating programs without these patches, versus ~10ms with the patches. That's pretty significant, and my only explanation for the ~4ms ts_paint regression is some kind of timing effect. I do think eliminating this startup work is probably important, because it could totally show up if the system isn't able to run GPU process initialization in parallel with whatever currently constitutes the long pole. That said, I agree that the whitelist is annoying, and last night I thought of an even better solution. We can just precompute the hashes for the program sources at compile-time in build.rs (along with the existing code that bakes the sources into the binary). That will mostly eliminate this startup overhead for all shaders, and then we don't need to be picky about which ones we create Program instances for. I'll hack that up on monday.
Assignee: nobody → bobbyholley
I don't think we do ourselves any favors by just parsing a blank shader in this situation.
I've got a new approach [1] which moves most of the hashing work to build time and drops the overhead under 1ms. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee4940ad9c699cde371121fb87e31c27f2585b64
Attachment #9032593 - Attachment is obsolete: true
Attachment #9032594 - Attachment is obsolete: true
Note that this does still seem to move ts_paint about 4ms in the wrong direction, but I'm pretty certain that's a scheduling artifact, and not representative of what we'll see in the wild. These patches take 50ms of avoidable work off the startup path for the GPU process, which is something we should obviously do.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: