Limiting Routing Delay Time

roco · October 15, 2023, 10:33pm

Hi, is there a way to cap the routing delay of the first attempt? For example, if after 5 seconds the call is still pending because of routing delay, I would like to just kill that session.
Recently I had an issue related to this: the DNS servers used by my dedicated services were down for about 30 minutes. During that time, my terminations that use hostnames instead of IP addresses could not be resolved, and SEMS built up a long queue of calls waiting to be resolved. Some calls had a routing delay of 500 seconds, and this queue of thousands of calls pending resolution crashed Kamailio (the proxy), and all signaling was lost. As a result, Kamailio never learned about the DNS resolution problems on those servers and never deactivated those nodes, because technically they were up and running.
That's why I'm asking whether there is a way to limit the routing delay of the first attempts, so I can keep issues like this under control.
Maybe through Kamailio? Or is there something I can just set on the SEMS nodes?
Thank you
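Nothing in this thread says how SEMS bounds its DNS lookups internally. As a generic illustration of what "limiting the routing delay" of a lookup could mean, here is a minimal Python sketch (not SEMS or Yeti code) that puts a hard deadline on hostname resolution and treats an overrun like an unresolvable destination, so the call fails fast. `resolve_with_deadline`, `slow_resolver` and `fast_resolver` are hypothetical names, port 5060 is an assumption, and 192.0.2.x are documentation addresses.

```python
import asyncio
import socket
from typing import Awaitable, Callable, List, Optional

# Hypothetical type: any coroutine that maps a hostname to a list of IPs.
Resolver = Callable[[str], Awaitable[List[str]]]


async def system_resolver(host: str) -> List[str]:
    """Resolve with the OS resolver; getaddrinfo runs in a worker thread."""
    loop = asyncio.get_running_loop()
    # Port 5060 is an assumption (default SIP port); only the IPs are used.
    infos = await loop.getaddrinfo(host, 5060, proto=socket.IPPROTO_UDP)
    return sorted({info[4][0] for info in infos})


async def resolve_with_deadline(host: str, timeout_s: float,
                                resolver: Resolver = system_resolver) -> Optional[List[str]]:
    """Return the addresses, or None if the lookup fails or exceeds timeout_s.

    None is meant to be handled like an unresolvable destination.
    Note: with system_resolver, cancelling stops the *wait*, not the worker
    thread; the OS lookup keeps running until its own timeout expires.
    """
    try:
        return await asyncio.wait_for(resolver(host), timeout=timeout_s)
    except (asyncio.TimeoutError, socket.gaierror):
        return None


async def slow_resolver(host: str) -> List[str]:
    await asyncio.sleep(10)  # simulates a reachable but very slow DNS server
    return ["192.0.2.10"]


async def fast_resolver(host: str) -> List[str]:
    return ["192.0.2.20"]  # simulates a healthy DNS server


async def main() -> None:
    print(await resolve_with_deadline("sip.example.com", 0.1, slow_resolver))  # None
    print(await resolve_with_deadline("sip.example.com", 0.1, fast_resolver))  # ['192.0.2.20']


if __name__ == "__main__":
    asyncio.run(main())
```

The design choice here is to make the deadline per lookup and much shorter than typical resolver defaults, trading an occasional false "unresolvable" for a bounded routing delay.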

dmitry.s · October 17, 2023, 8:40am

For example, if after 5 seconds the call is still pending because of routing delay, I would like to just kill that session.

A 5 s delay is not a normal value. Something is configured incorrectly in your system.

and all signaling was lost.

What does that mean? Could you provide a pcap trace for such calls?

roco · October 17, 2023, 3:16pm

Try configuring a termination with a nonexistent hostname, for example sip.hotnmanenotexistent.com. That would cause the routing delay, right? I would like to limit this delay so I have control over it. Thank you

dmitry.s · October 17, 2023, 3:22pm

No, it will not cause a routing delay. It will produce the "Unresolvable destination" internal disconnect code without any delay.

Could you reproduce the issue yourself and provide logs and traces?

roco · October 17, 2023, 4:06pm

OK, here's another example: a real hostname that, because of a DNS problem, takes several seconds to resolve. That is what happened to me, and I would like to set a timeout on DNS resolution. I don't have SIP traces because I wasn't recording anything at that time. I can try to reproduce the error, but in the meantime it's clear that yeti waits for DNS to be resolved without a timeout, and that creates routing delay.

dmitry.s · October 17, 2023, 4:31pm

but in the meantime it's clear that yeti waits for DNS to be resolved without a timeout

Have you checked this, or is it just your idea? Instead of generating ideas about problems, could you provide some proof, like pcap traces and debug logs? This is the third time I've asked for logs in this topic, but you are still generating hypotheses we have to check. I don't have much time to reproduce your issues without any details.

roco · October 17, 2023, 4:42pm

OK, I'll do that.

roco · October 17, 2023, 8:43pm

I created a gateway with a real hostname that has wrong NS servers, to try to replicate the routing delay issue. The routing delay increased to 7.6 seconds, and then I got the "Unresolvable destination" message. That is the case of an unresolvable NS. I can't replicate a slow DNS resolver because I don't know how. If you multiply this by thousands of calls, I'm asking whether this can cause issues and, if the resolver is slow to reply, whether SEMS just waits for the resolver's reply.
With direct IP terminations, the routing delay is below 0.005 seconds.

dmitry.s · October 17, 2023, 9:00pm

So this is not true.

If you multiply this by thousands of calls, I'm asking whether this can cause issues

Why do I have to multiply this?
Could you explain what is wrong with the behavior you showed? Are you expecting no delay to be introduced by DNS resolution?

roco · October 17, 2023, 9:17pm

The problem is this: if I use a termination with a hostname, and that hostname can't be resolved quickly because the name servers are not working properly, SEMS will wait for DNS to be resolved without a timeout. And if you send thousands of calls to that termination, the queue will increase without any way to manage it. That's what happened to me; it's not a hypothesis, it crashed my Kamailio because of that. I was only trying to report it to you.

Probably I can't explain it with the right words. No problem, I will use only IP addresses; that will certainly fix the issue.

dmitry.s · October 17, 2023, 9:58pm

This is not true; even your screenshot shows a timeout there: 12 s.

And if you send thousands of calls to that termination, the queue will increase without any way to manage it.

Calls are processed asynchronously, so a delay in one should not affect other calls.
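To illustrate what asynchronous processing means here, a minimal Python sketch (not SEMS code): one simulated call waits on a slow lookup while 999 others wait on fast ones, all concurrently. `asyncio.sleep` stands in for a non-blocking DNS lookup, and `handle_call` and `simulate` are hypothetical names; the delays are arbitrary.

```python
import asyncio
import time
from typing import Dict


async def handle_call(call_id: int, dns_delay_s: float,
                      start: float, done: Dict[int, float]) -> None:
    # asyncio.sleep stands in for a non-blocking DNS lookup of the given duration.
    await asyncio.sleep(dns_delay_s)
    done[call_id] = time.monotonic() - start  # completion time since start


async def simulate(n_fast: int = 999, slow_delay_s: float = 0.5,
                   fast_delay_s: float = 0.01) -> Dict[int, float]:
    """Call 0 hits a slow hostname; the others hit a fast one, all concurrently."""
    start = time.monotonic()
    done: Dict[int, float] = {}
    calls = [handle_call(0, slow_delay_s, start, done)]
    calls += [handle_call(i, fast_delay_s, start, done) for i in range(1, n_fast + 1)]
    await asyncio.gather(*calls)
    return done


if __name__ == "__main__":
    done = asyncio.run(simulate())
    fast_worst = max(t for cid, t in done.items() if cid != 0)
    print(f"slowest fast call: {fast_worst:.3f}s, slow call: {done[0]:.3f}s")
```

The fast calls finish long before the slow one, which is what "a delay in one should not affect other calls" means. Concurrency does not stop pending calls from accumulating, though: by Little's law, calls in flight ≈ arrival rate × waiting time, so a hypothetical 10 calls/s each held for 500 s means about 5,000 calls pending at once, each holding state until it ends.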

I was only trying to report it to you.

From my point of view, you don't want to spend time reproducing the issue and are asking me to do it instead. That might be OK if I saw some proof, like logs and pcaps, but I don't want to spend time checking strange ideas like "SEMS will wait for DNS to be resolved without a timeout", "all signaling was lost" or "the queue will increase without any way to manage it".

Probably I can't explain it with the right words.

Don't provide your explanations; provide logs and traces where we can see the incorrect behavior, and it will be a helpful report.

That's what happened to me; it's not a hypothesis, it crashed my Kamailio because of that.

Or maybe your Kamailio is just not configured properly? Kamailio should not crash in any situation, even when there is no response from a node.

roco · October 17, 2023, 10:14pm

This is an example of an unreachable DNS server, set up and tested to give you an example. The problem I'm talking about occurs when DNS is reachable but slow to resolve, and that is an issue I'm not able to replicate.

Check my other topics and what I've done to report things to you.

Asynchronous processing also means that multiple calls send INVITEs to a single hostname.

I can't extract logs of this behavior in any way. You are the software developers; I just tried to give you all the elements needed to replicate it yourselves.

It can crash if you have thousands and thousands of calls waiting for a reply from a termination, with delays that grew up to 500 seconds.

Anyway, you can close this topic. Thank you for your support.

roco · October 20, 2023, 9:54pm

There are two problems here:

- it continues rerouting even if the first route has sent 18x Session Progress/Ringing
- there is no ring timeout set
This trace continues for 230 seconds, waiting for a 200 response.

So how can I fix it?

I've just set a 60-second ringing timeout on the terminations. But what if a termination ends the call after 30 seconds of ringing? Will it still continue rerouting to other terminations?
About the ringing timeout: is there a parameter I can set directly on SEMS, instead of setting it on the gateway via the web panel?
Thanks

dmitry.s · October 21, 2023, 9:18am

Rerouting continues because the first route responded with 480; that is expected behavior.

- there is no ring timeout set

You can set it.

This trace continues for 230 seconds, waiting for a 200 response.

The call originator may send a CANCEL at any time if they don't want to wait.

About the ringing timeout: is there a parameter I can set directly on SEMS, instead of setting it on the gateway via the web panel?

no

How is this trace related to the topic? "Signaling was lost", "long queue of thousands of calls pending": where can we see that?

roco · October 21, 2023, 9:28am

Signaling was lost because Kamailio crashed for some seconds, so no CANCEL was received, and the queue kept growing because there was no ringing timeout. That caused Kamailio to crash again and again.

roco · October 21, 2023, 9:29am

Or that's what I think happened.

dmitry.s · October 21, 2023, 9:36am

There are no yeti-related issues in your image. So, as I said before, it looks like your Kamailio is not configured properly.

roco · October 21, 2023, 9:52am

No, there aren't, but you should consider setting the ringing timeout to a default value for every newly created termination.

roco · October 21, 2023, 10:30am

I'm really not sure that was the problem from a few days ago. I think the problem I described at the beginning is different, but I'm continuing to investigate.

roco · October 23, 2023, 1:40pm

After I set the ringing timeout to 60 seconds, it still reroutes when the first termination times out.
For example, the first termination rang for 50 seconds and then replied 480, so SEMS reset the timeout counters and sent an INVITE to another termination for another 60 seconds.
I would expect the call to be cancelled 60 seconds after the first 18x from the first termination, without attempting other terminations. Is there a way to fix this?
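In the per-termination behavior described above, each attempt starts its own 60 s counter. What is asked for instead is one ring budget per call, counted from the first 18x received on any attempt. A minimal Python sketch of that arithmetic, assuming times in seconds from call start; `ring_budget_left` and `should_reroute` are hypothetical helpers, not Yeti settings or APIs, and nothing in this thread says Yeti supports this mode.

```python
from typing import Optional


def ring_budget_left(now_s: float, first_18x_s: Optional[float],
                     ring_timeout_s: float) -> float:
    """Ring time still allowed for the next attempt under one shared budget.

    now_s          -- seconds since call start when the next attempt would begin
    first_18x_s    -- seconds since call start of the first 18x on ANY attempt,
                      or None if no route has rung yet
    ring_timeout_s -- total ringing allowed for the whole call
    """
    if first_18x_s is None:
        return ring_timeout_s  # nobody has rung yet: full budget
    return max(0.0, ring_timeout_s - (now_s - first_18x_s))


def should_reroute(now_s: float, first_18x_s: Optional[float],
                   ring_timeout_s: float) -> bool:
    """Reroute only while some of the shared ring budget remains."""
    return ring_budget_left(now_s, first_18x_s, ring_timeout_s) > 0.0


if __name__ == "__main__":
    # Example from the post: 18x at ~0 s, 480 after 50 s of ringing, 60 s ring timeout.
    print(ring_budget_left(50.0, 0.0, 60.0))  # 10.0: next route may ring 10 s, not 60 s
    print(should_reroute(65.0, 0.0, 60.0))    # False: budget spent, cancel instead
```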

Another thing: is it possible to stop rerouting after a 480? Some switches behave that way. Maybe you could add a toggle. Thanks

Update:
I've stopped rerouting on 480 and 408 codes by enabling stop hunting for them in the codes list.

For the ringing timeout, I'm waiting for a reply from you. Thanks
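The stop-hunting behavior from the update can be modeled as a simple loop: try routes in order and stop either on a success or on a code marked as "stop hunting". A minimal Python sketch, assuming each attempt yields a single final response code; `hunt`, `send_invite` and the gateway names are hypothetical and do not reflect Yeti's internals.

```python
from typing import Callable, Iterable, Optional, Tuple

# Codes that end hunting, mirroring the 480/408 list from the update above.
STOP_HUNTING_CODES = frozenset({408, 480})


def hunt(routes: Iterable[str], send_invite: Callable[[str], int],
         stop_codes: frozenset = STOP_HUNTING_CODES) -> Optional[Tuple[str, int]]:
    """Try routes in order and return (route, final_code) of the last attempt.

    send_invite is a hypothetical callback returning the final SIP response
    code for one attempt. Hunting ends on a 2xx or on any code in stop_codes.
    Returns None if there are no routes.
    """
    last: Optional[Tuple[str, int]] = None
    for route in routes:
        code = send_invite(route)
        last = (route, code)
        if 200 <= code < 300 or code in stop_codes:
            break  # answered, or a code that must not be rerouted
    return last


if __name__ == "__main__":
    answers = {"gw1": 503, "gw2": 480, "gw3": 200}
    tried = []

    def fake_invite(route: str) -> int:
        tried.append(route)
        return answers[route]

    print(hunt(["gw1", "gw2", "gw3"], fake_invite), tried)
    # ('gw2', 480) ['gw1', 'gw2']: gw3 is never tried because 480 stops hunting
```

With an empty `stop_codes`, the same call would continue to gw3 and return `('gw3', 200)`, which matches the default rerouting-on-480 behavior described earlier in the thread.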