-
Notifications
You must be signed in to change notification settings - Fork 1k
[bug]: Long running channel iterator (in ChanSpy) blocks CEL HANGUP events of unrelated channels #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
Comments
wdoekes
added a commit
to ossobv/asterisk
that referenced
this issue
May 24, 2023
Because of upcoming planned changes/refactoring to app_chanspy, it is convenient to pass some of the many arguments as a struct. This changeset adds a channel_spy_context struct to pass around. Related: asterisk#68
wdoekes
added a commit
to ossobv/asterisk
that referenced
this issue
May 24, 2023
Because of upcoming planned changes/refactoring to app_chanspy, it is convenient to pass some of the many arguments as a struct. This changeset adds a channel_spy_context struct to pass around. Related: asterisk#68
wdoekes
added a commit
to ossobv/asterisk
that referenced
this issue
May 24, 2023
…tion This moves the guts of common_exec into channel_spy_consume_iterator. This makes refactoring/changing the code easier because there are fewer function local variables to consider. Related: asterisk#68
wdoekes
added a commit
to ossobv/asterisk
that referenced
this issue
May 24, 2023
Refactor channel spying so it never holds on to a channel iterator. Instead, we recreate the iterator when needed and skip channels that we've seen already, creating the illusion of using an iterator. This change was needed because the iterator caused the yet-unseen channels in the iterator to be referenced by the iterator. This reference ensured that the channel does not get destroyed. (Which is good, because the iterator needs valid channels to work on.) But, hanging on to a channel reference for longer than a short while conflicts with CEL logging. The CEL hangup logging is activated by the destruction of the channel. During chanspy activity, a bunch of channels would stay in limbo. First when the chanspying was done would those channels get their CEL hangup event logged. The fix here is to hang on to channel iterators for only a short while. An alternative fix that makes CEL hangup logging independent of channel destruction was deemed more invasive. This patch makes chanspy channel selection slightly more resource intensive. But that's a small price to pay for correct CEL hangup logging. Fixes: asterisk#68
its maybe related to some code logic? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Severity
Major
Versions
master,20,19,18,16
Components/Modules
chan_spy, channels
Operating Environment
Right now, testing with branch 18, but this should be on all versions.
Frequency of Occurrence
Frequent
Issue Description
The
ChanSpy()
channel iterator holds references to channels for the duration of the listening/spying. These referenced channels will get destroyed first when the channel iteration resumes (when done listening to that one channel). This is a problem for the CEL HANGUP event, because all channels still in the iterator will not get any HANGUP reporting until this resumption: anyone relying on CEL HANGUP time for call duration statistics will get wrong values.A.k.a. CEL HANGUP times can be wrong (late) and appear "batched".
Components
ChanSpy -- iterates over all channels using (e.g. ast_channel_iterator_by_name_new) -- keeps the iterator until the spying is done:
asterisk/apps/app_chanspy.c
Lines 956 to 964 in 1a7866b
asterisk/apps/app_chanspy.c
Lines 994 to 999 in 1a7866b
asterisk/apps/app_chanspy.c
Line 1146 in 1a7866b
asterisk/apps/app_chanspy.c
Line 1184 in 1a7866b
CEL hangup events -- are generated because of a publish by the ast_channel destructor:
asterisk/main/channel.c
Lines 2206 to 2213 in 1a7866b
asterisk/main/stasis_channels.c
Lines 948 to 950 in 1a7866b
asterisk/main/cel.c
Lines 910 to 921 in 1a7866b
To get accurate HANGUP timings from CEL, channel destruction should never be delayed for more than a short while.
But because some components (ChanSpy) can hold references to channels that are hung up, destruction can be delayed for an indetermined amount of time.
This results in inaccurate CEL HANGUP timestamps: those who rely on CEL HANGUP time for call duration statistics will get wrong values
Scenario
At t=0, we might have three channels:
At t=1 we start spying on channels using ChanSpy. The channels (hash) container is filtered and turned into a (list) container (
multi_container
()) holding a snapshot of then active channels that match the filter: themulti_iterator
(). This list/iterator is returned to the application (chanspy):__ao2_link()
->rb_ao2_new_node()
->"Container node creation"
)At t=1, we traverse the iterator. We might decide to skip channelA and start spying on channelB:
/* Bump ref of returned object */ ("Next iterator object.")
, replace last_node (NULL), last_node = channelA, return channelA/* Bump ref of returned object */ ("Next iterator object.")
, replace last_node (channelA, -1 ref), last_node = channelBchannel_spy
)At this point, the references look like this:
Now, at t=2, channelA and channelC hang up:
First at t=3, when either the spying channel hangs up, or the spyee hangs up, do we exit
channel_spy
and do we resume looking at the multi_iterator:This causes the channelC destroying event to fire at t=3 instead of t=2. And that means that the reported CEL time for the hangup event is wrong.
Solutions
Someone should make a design decision on this. I would say the least impactful is to make the chanspy iterator short-lived, by copying the IDs of the channels and iterating over the channel IDs instead.
But if we can make the HANGUP event fire sooner by moving the AST_FLAG_DEAD+publish to something earlier, that works for me too.
Cheers,
Walter Doekes
OSSO B.V.
(*) The channel iteration:
ast_channel_iterator_by_name_new()
->ast_channel_callback()
->ao2_callback_data()
->__ao2_callback_data()
->internal_ao2_traverse()
asterisk/main/astobj2_container.c
Lines 260 to 273 in 1a7866b
asterisk/main/astobj2_container.c
Line 356 in 1a7866b
(an ao2_container list is created to hold the filtered channels)
next_channel()
->ast_channel_iterator_next()
->ao2_iterator_next()
->__ao2_iterator_next()
:asterisk/main/astobj2_container.c
Lines 603 to 604 in 1a7866b
asterisk/apps/app_chanspy.c
Lines 994 to 999 in 1a7866b
(channels are popped from the ao2_container list, but not as immediate as we would like)
Relevant log output
Asterisk Issue Guidelines
The text was updated successfully, but these errors were encountered: