res_musiconhold: Appropriately lock channel during start. #1104
base: master
Conversation
This relates to asterisk#829. This doesn't fully solve the Ops issue, but it solves the specific crash there. Further PRs to follow.

In the specific crash the generator was still under construction when moh was being stopped, which then proceeded to close the stream whilst it was still in use.

Signed-off-by: Jaco Kroon <[email protected]>
cherry-pick-to: 20
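To make the failure mode concrete, here is a rough sketch of the race (illustrative only: `moh_alloc` stands in for whichever generator alloc callback is in play, and the real code paths in res_musiconhold.c are more involved):

```c
/* Thread A: starting MOH on the channel */
state = moh_alloc(chan);   /* generator still under construction */

/* Thread B: stopping MOH on the same channel, concurrently */
ast_moh_stop(chan);        /* tears the generator down and closes the
                            * stream that thread A is still initialising */
```

Without the channel lock serializing these two paths, B can tear down state that A is still constructing.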
```c
ast_channel_lock(chan);
state = ast_channel_music_state(chan);
```
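For context, the pattern this hunk introduces looks roughly like the following (a sketch, not the literal res_musiconhold.c code; the function name and the elided setup are illustrative):

```c
static int moh_files_start(struct ast_channel *chan)
{
	struct moh_files_state *state;

	ast_channel_lock(chan);
	state = ast_channel_music_state(chan);
	if (!state) {
		state = ast_calloc(1, sizeof(*state));
		if (!state) {
			ast_channel_unlock(chan);
			return -1;
		}
		ast_channel_music_state_set(chan, state);
	}
	/* ... touch members of state only while the channel is held ... */
	ast_channel_unlock(chan);

	/* Slower setup (opening files, etc.) can then run unlocked,
	 * which is where the granularity question below comes in. */
	return 0;
}
```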
Can we just do this (and the unlock) one time at the top of the function? Or do we need to serialize all access to `state`?
I suspect there are potentially some long-running sub-tasks, which is why I went a bit more granular.
Long-running sub-tasks like what?
Good question. You know better than I do. I didn't dig into this much further beyond this specific PR; life happened and I deviated from this. My gut was that the construction of some of these generators could be time-consuming, and that this could have an adverse impact on the channel structure.

But yeah, if any of those do take overly long, the channel gets no audio anyway, and it would most likely be signalling stuff that just waits for a "long" duration, and that's probably OK if those get blocked for a few hundred ms.

Based on that, sure, I think I can resubmit with "lock at top, release at bottom" and we can see what happens?
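For comparison, the "lock at top, release at bottom" variant would collapse the locking to something like this (sketch only, same illustrative function name as above):

```c
static int moh_files_start(struct ast_channel *chan)
{
	struct moh_files_state *state;

	ast_channel_lock(chan);   /* one lock for the whole function */
	state = ast_channel_music_state(chan);
	/* ... all state setup, including any slower sub-tasks,
	 * runs with the channel held ... */
	ast_channel_unlock(chan); /* one unlock on every exit path */

	return 0;
}
```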
If memory serves, when this originally came up I did an audit of this file, and I think that the channel lock is serializing access to members of the `moh_files_state` structure, so the way it is here in this PR is probably the safest approach.
It would be helpful to go through and note which functions are being called with the channel lock already held, but that may be out of scope for this PR.
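If that audit is done later, one lightweight convention would be a doxygen note on each helper recording its locking precondition (hypothetical example; `moh_post_start` is a made-up name):

```c
/*!
 * \brief Begin playback for a files-based MOH class.
 * \note The channel must be locked by the caller.
 */
static int moh_post_start(struct ast_channel *chan, const char *mclass);
```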
> It would be helpful to go through and note which functions are being called with the channel lock already held, but that may be out of scope for this PR.

I went ahead and did this and pushed an additional commit that you can throw away or merge, up to you.

After your change, the only place in the module where the music state is manipulated without the channel lock held is in `local_ast_moh_cleanup`, which is only called during channel destruction (but `ast_moh_cleanup()` is currently API, so anyone could call it).

I think locking around state manipulation in `local_ast_moh_cleanup` is technically the right thing, and the channel lock is still available at the time it is called, so it should be safe. I've added another change to this PR that does that.
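In other words, the extra change amounts to something along these lines in `local_ast_moh_cleanup` (a sketch; the real cleanup body releases more resources than shown):

```c
static void local_ast_moh_cleanup(struct ast_channel *chan)
{
	struct moh_files_state *state;

	ast_channel_lock(chan);
	state = ast_channel_music_state(chan);
	if (state) {
		ast_channel_music_state_set(chan, NULL);
		/* ... release any stream/format resources held by state ... */
		ast_free(state);
	}
	ast_channel_unlock(chan);
}
```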
Attention! This pull request may contain issues that could prevent it from being accepted. Please review the checklist below and take the recommended action. If you believe any of these are not applicable, just add a comment and let us know.
Documentation:
Workflow PRCheck failed
Hi everyone. We appreciate the work that is going into resolving #829. I wanted to share some very coarse numbers with everybody. We have a script (which we unfortunately can't share, since it interfaces with our application, not Asterisk) that can reproduce #829, and we have been running it against various revisions to see how long it takes before reproduction:

- 20.9.3 (no patches applied)
- 20.9.3 (with just 4f5ecca)
- 20.9.3 (with all commits on this branch up to 36f842b)

Obviously more runs would be desirable in all cases, but I am not sure if there's value in letting the script run for longer. So it seems like there is still something going on. If there's any additional diagnostic information that we can share (other than what we shared in #829), please let us know.
Same details I requested in my email discussion with you, with details on the exact code version (commit # ideally, or Asterisk release version; a list of additional patches would also be helpful). You're welcome to email those to me as previously if you feel there is sensitive information in them.