res_musiconhold: Appropriately lock channel during start. #1104


Open · wants to merge 3 commits into master

Conversation

@jkroonza (Contributor) commented Jan 31, 2025

This relates to #829

This doesn't fully solve the OP's issue, but it solves the specific crash there. Further PRs to follow.

In the specific crash, the generator was still under construction when MOH was being stopped, which then proceeded to close the stream whilst it was still in use.

This relates to asterisk#829

This doesn't fully solve the OP's issue, but it solves the specific crash
there.  Further PRs to follow.

In the specific crash the generator was still under construction when
MOH was being stopped, which then proceeded to close the stream whilst
it was still in use.

Signed-off-by: Jaco Kroon <[email protected]>
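
For readers following along, here is a minimal sketch of the locking pattern the fix applies; it is illustrative, not the literal diff. ast_channel_lock()/ast_channel_unlock(), the ast_channel_music_state() accessors, and ast_calloc() are Asterisk's real APIs, while the function name and control flow are stand-ins:

```c
#include "asterisk.h"
#include "asterisk/channel.h"

/* Illustrative sketch only (assumes it lives inside res_musiconhold.c,
 * where struct moh_files_state is visible): take the channel lock
 * around every touch of the music state so a concurrent MOH stop
 * cannot free or close a stream that is still being constructed. */
static void moh_start_sketch(struct ast_channel *chan)
{
	struct moh_files_state *state;

	ast_channel_lock(chan);
	state = ast_channel_music_state(chan);
	if (!state) {
		state = ast_calloc(1, sizeof(*state));
		ast_channel_music_state_set(chan, state);
	}
	ast_channel_unlock(chan);

	/* Potentially slow generator construction happens outside the
	 * lock; each later access to the state re-takes it. */
}
```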
@jkroonza (Contributor, Author) commented:

cherry-pick-to: 20
cherry-pick-to: 21
cherry-pick-to: 22

asterisk deleted a comment from github-actions bot · Feb 1, 2025
Comment on lines +1765 to +1766:

    ast_channel_lock(chan);
    state = ast_channel_music_state(chan);

Contributor:

Can we just do this (and the unlock) one time at the top of the function? Or do we need to serialize all access to state?

Contributor (Author):

I suspect there are potentially some long-running sub-tasks, which is why I went a bit more granular.

Member:

Long-running sub-tasks like what?

Contributor (Author):

Good question. You know better than me. I didn't dig into this much further beyond this specific PR; life happened and I deviated from this. My gut was that the construction of some of these generators could be time-consuming, and that that could have an adverse impact on the channel structure.

But yeah, if any of those do take overly long, the channel gets no audio anyway, and it would most likely be signalling stuff that just waits for a "long" duration; that's probably OK if those get blocked for a few hundred ms.

Based on that, sure, I think I can resubmit with "lock at top, release at bottom" and we see what happens?
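
For context, the two locking shapes being weighed look roughly like this; both functions are hypothetical stand-ins for the real start path in res_musiconhold.c:

```c
/* (a) "Lock at top, release at bottom": simple, but holds the channel
 * lock across generator construction, so anything else needing the
 * channel lock (e.g. signalling) blocks for that whole duration. */
static void start_coarse(struct ast_channel *chan)
{
	ast_channel_lock(chan);
	/* ... read/update music state, build generator, open stream ... */
	ast_channel_unlock(chan);
}

/* (b) Granular, as in this PR: hold the lock only around each access
 * to the music state, leaving long-running construction unlocked. */
static void start_granular(struct ast_channel *chan)
{
	ast_channel_lock(chan);
	/* ... read the music state ... */
	ast_channel_unlock(chan);

	/* ... long-running construction without the lock ... */

	ast_channel_lock(chan);
	/* ... publish the fully constructed state ... */
	ast_channel_unlock(chan);
}
```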

Contributor:

If memory serves, when this originally came up I did an audit of this file and I think that the channel lock is serializing access to members of the moh_files_state structure, so the way it is here in this PR is probably the safest approach.

It would be helpful to go through and note which functions are being called with the channel lock already held, but that may be out of scope for this PR.

Contributor:

> It would be helpful to go through and note which functions are being called with the channel lock already held, but that may be out of scope for this PR.

I went ahead and did this and pushed an additional commit that you can throw away or merge - up to you.

After your change, the only place in the module where the music state is manipulated without the channel lock held is local_ast_moh_cleanup, which is only called during channel destruction (but ast_moh_cleanup() is currently public API, so anyone could call it).

I think locking around the state manipulation in local_ast_moh_cleanup is technically the right thing, and the channel lock is still available at the time it is called, so it should be safe; I've added another change to this PR that does that.
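
A sketch of what that additional change might look like; the real local_ast_moh_cleanup() in res_musiconhold.c also drops the MOH class reference and frees format buffers, which is elided here:

```c
static void local_ast_moh_cleanup(struct ast_channel *chan)
{
	struct moh_files_state *state;

	/* Detach the music state under the channel lock so no concurrent
	 * reader can observe a half-torn-down state. */
	ast_channel_lock(chan);
	state = ast_channel_music_state(chan);
	ast_channel_music_state_set(chan, NULL);
	ast_channel_unlock(chan);

	if (state) {
		/* ... drop the class reference, free buffers ... */
		ast_free(state);
	}
}
```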

github-actions bot commented Apr 8, 2025

Attention! This pull request may contain issues that could prevent it from being accepted. Please review the checklist below and take the recommended action. If you believe any of these are not applicable, just add a comment and let us know.

  • There is more than 1 commit in this PR and no PR comment with a multiple-commits: special header. Please squash the commits down into 1 or see the Code Contribution documentation for how to add the multiple-commits: header.


github-actions bot added the has-pr-checklist label · Apr 8, 2025

github-actions bot commented Apr 8, 2025

Workflow PRCheck failed
master-ari1: FAILED TEST: rest_api/bridges/error
master-ari2: FAILED TEST: rest_api/websocket_requests/bad_requests
master-ari2: FAILED TEST: rest_api/websocket_requests/parallel_requests
master-ari2: FAILED TEST: rest_api/websocket_requests/single_requests
master-local_iax2_mgr: FAILED TEST: manager/config/basic
master-local_iax2_mgr: FAILED TEST: manager/config/no_preserve_effective_context
master-local_iax2_mgr: FAILED TEST: manager/config/restricted
master-pjs2: FAILED TEST: channels/pjsip/diversion/diversion_caller_id
master-pjs5: FAILED TEST: channels/pjsip/stir_shaken/stir_shaken_anon_callerid

@ashtonmoomoo commented:

Hi everyone. We appreciate the work that is going into resolving #829.

I wanted to share some very coarse numbers with everybody. We have a script (which we unfortunately can't share, since it interfaces with our application, not Asterisk) that can reproduce #829, and we have been running it against various revisions to see how long it takes before reproduction.

20.9.3 (no patches applied)

| Run | Calls before crash |
| --- | --- |
| 1 | 54 |
| 2 | 292 |
| 3 | 208 |
| 4 | 300 |

20.9.3 (with just 4f5ecca)

| Run | Calls before crash |
| --- | --- |
| 1 | 77 |
| 2 | 486 |
| 3 | 233 |
| 4 | 257 |

20.9.3 (with all commits on this branch up to 36f842b)

| Run | Calls before crash |
| --- | --- |
| 1 | 349 |
| 2 | 29 |
| 3 | 567 |
| 4 | 133 |

Obviously more runs would be desirable in all cases, but I am not sure there's value in letting the script run for longer.

So it seems like there is still something going on. If there's any additional diagnostic information that we can share (other than what we shared in #829), please let us know.

@jkroonza (Contributor, Author) commented Apr 9, 2025

The same details I requested in our email discussion would help: the exact code version (ideally a commit hash, or the Asterisk release version) and the list of additional patches applied. You're welcome to email those to me as before if you feel there is sensitive information in them.

Labels: has-pr-checklist, pr-submit-tests-failed