Closed Bug 1516117 Opened 7 years ago Closed 11 months ago

Optimize locking in ChannelEventQueue::CompleteResume

Categories

(Core :: Networking: HTTP, enhancement, P2)

enhancement
Points:
2

Tracking

()

RESOLVED FIXED
134 Branch
Tracking Status
firefox66 --- wontfix
firefox134 --- fixed

People

(Reporter: mayhemer, Assigned: omansfeld)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-next])

Attachments

(1 file)

This shows heavily as a blocker (~1ms on a high perf hardware) for big number of resources, e.g. on https://www.facebook.com/DwayneJohnson/. We enter and release the lock 3 times. This can be easily reduced to 1 enter and leave instead. Ideally, rewrite the code as lock-less, but I'm afraid that is not easily doable.
self-time here can go up to 5ms in few cases, most of them are 1.5+ms, actually.
Question is if this is the locking. It more seems like the events being processed from inside cause the delay.
Blocks: 1516121

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

A perf profile would also be useful here.
P3 for now.

Blocks: necko-perf
Priority: P2 → P3
Severity: normal → S3
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-next]

Increased this to P2, since it's an easy change.

Severity: S3 → N/A
Priority: P3 → P2
Assignee: nobody → omansfeld

Just pushed a WIP. I think the locks I removed were relatively straight forward, but I'm not sure about FlushQueue(), because there are lots of locks in there and also stuff happening in between that doesn't need locking (see https://searchfox.org/mozilla-central/source/netwerk/ipc/ChannelEventQueue.cpp#39,46,56,81,86,103,108,113,120,125).

So in my mind it didn't make sense to put that whole function under a single lock, because then we are stopping other threads from accessing member vars for a prolonged time. I'm not sure about the performance tradeoffs here (or how to test them). That's why I called mMutex.Unlock() before calling FlushQueue() in MaybeFlushQueue() , but I'm also not sure if that's the correct pattern for unlocking before scope ends.

Any input is appreciated!

Hey Oskar,

Just pushed a WIP. I think the locks I removed were relatively straight forward, but I'm not sure about FlushQueue(), because there are lots of locks in there and also stuff happening in between that doesn't need locking (see https://searchfox.org/mozilla-central/source/netwerk/ipc/ChannelEventQueue.cpp#39,46,56,81,86,103,108,113,120,125).
So in my mind it didn't make sense to put that whole function under a single lock, because then we are stopping other threads from accessing member vars for a prolonged time. I'm not sure about the performance tradeoffs here (or how to test them). That's why I called mMutex.Unlock() before calling FlushQueue() in MaybeFlushQueue() , but I'm also not sure if that's the correct pattern for unlocking before scope ends.

The current code (w/o your patch) minimizes critical section duration, reducing the time that other threads are blocked. However, the large number of locks introduces a higher lock acquisition overhead. An optimal solution would be to balance the duration of critical sections and the number of locks used. We need to adjust the code so that critical sections avoid executing any long-running tasks particularly the runnables in the event queue. Hence, I think we can get rid of mMutex.Unlock() as well and further optimize FlushQueue. Apart from this I dont see any other code paths that could block for considerable duration. Kindly share if you come across more such paths.

Please run the code with the thread sanitizers in the try server. Let me know if you need more info on that.

Flags: needinfo?(omansfeld)
Attachment #9434567 - Attachment description: WIP: Bug 1516117 - Removed unnecessary locks in ChannelEventQueue::CompleteResume/MaybeFlushQueue/EndForcedQueueing → WIP: Bug 1516117 - Optimized locking in ChannelEventQueue::CompleteResume/MaybeFlushQueue/EndForcedQueueing/FlushQueue()
Attachment #9434567 - Attachment description: WIP: Bug 1516117 - Optimized locking in ChannelEventQueue::CompleteResume/MaybeFlushQueue/EndForcedQueueing/FlushQueue() → Bug 1516117 - Optimized locking in ChannelEventQueue::CompleteResume/MaybeFlushQueue/EndForcedQueueing/FlushQueue() r=#necko

Thanks for your input Sunil, I just put an updated patch up for review. Link to a build with thread sanitizers is also at Phabricator.

Flags: needinfo?(omansfeld)
Status: NEW → ASSIGNED
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/2e8cc75512de Optimized locking in ChannelEventQueue::CompleteResume/MaybeFlushQueue/EndForcedQueueing/FlushQueue() r=necko-reviewers,kershaw,valentin
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 134 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: