Skip to content

lwIP-v2: new patch to randomize tcp source ports #5906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 5, 2019

Conversation

d-a-v
Copy link
Collaborator

@d-a-v d-a-v commented Mar 22, 2019

@d-a-v
Copy link
Collaborator Author

d-a-v commented Mar 22, 2019

Following the links above is instructive.
To summarize:

  • currently and from a long time lwIP tcp client connections always uses the same tcp source port number right after boot
  • this port number is increased everytime a new one is needed (= new tcp client connection)
    (to be noted, linux has the same increasing behavior)
  • when connecting to the same server (right after boot), the triplet (esp-ip-address, source port, destination port) are the same, and may hit remote server list of sockets in time-wait-state (previous connection unproperly closed from the same esp). Consequently the new connection fails when it happens.
  • this is happening only when debugging (esp reboots often, in less time than time-wait expiration), so the nasty effect is amplified especially when bugs are being chased
  • efforts had been done when espressif's lwIP implementation wasn't open source, with WiFiClient::setLocalPortStart() Ephemeral source port not random #632 but it must be explicitely called with a different random number at every reboot. Efficient but not ideal.

This PR uses espressif firmware's r_rand() everytime a new local source port is needed. A different source port number is now showed by tcpdump right after boot. Source port range and duplication is verified everytime in lwIP's src/core/tcp.c:tcp_new_port(). It is implemented as a local patch for upstream lwIP so it is valid not only with WiFiClient but also with @me-no-dev's Async libraries (they don't use WiFiClient).

WiFiClient::setLocalPortStart() is still usable with the same effects as before.

@TD-er
Copy link
Contributor

TD-er commented Mar 30, 2019

I guess this can also lead to issues when a connection attempt is retried, when the ESP misses the acknowledgement of a successful connection attempt?

@d-a-v
Copy link
Collaborator Author

d-a-v commented Mar 30, 2019

I guess this can also lead to issues when a connection attempt is retried, when the ESP misses the acknowledgement of a successful connection attempt?

If I understand the question, before of after this patch, nothing is changed about TCP connection attempts.
A connecting request is uniquely identified by a quadruplet (source ip & port, destination ip & port). They remain constant during all the time of this connection, and missed ACKs for SYN are retransmitted on timeout some times (like any other ACKs) before giving up.
Source ports are meaningless (except for identifying an incoming packet and their uniqueness among all tcp connections on an host) and supposed to be random. This PR only changes the way the source port is worked out for every new connection.

@TD-er
Copy link
Contributor

TD-er commented Mar 30, 2019

I understand that it does so on a new connection.
I was thinking about whether the current situation could also lead to the issues you try to fix with this patch.
For example, what if the ESP tries to make a connection and it assumes the connection is not successful, for some reason (e.g. missing ack packets)
In the current situation, could it then also lead to these situations?

The use case I have in mind is that I've seen several issues when connecting to a MQTT broker.
It may retry almost indefinitely, but it may succeed when you stop trying for a while and make a new attempt. Another "fix" (which introduces other issues) was to use a new client ID when connecting to the broker.

@d-a-v
Copy link
Collaborator Author

d-a-v commented Mar 30, 2019

It may retry almost indefinitely, but it may succeed when you stop trying for a while and make a new attempt.

Who is, or how is it retrying,

  • your sketch with a new WiFiClient for each retry ?
    before or after this PR, a new source port is used
  • Let the same WiFiClient continuously try (so the timeout is too long) ?
    This behaviour is not addressed nor changed by this PR
  • a reboot ?
    After a reboot, before this PR, the same source port was used (leading to an issue described in OP's links and second message)

but it may succeed when you stop trying for a while and make a new attempt

That looks like a firewall policy. They check for SYN frequency and block the originating IP address for a short (=lucky) or long (=hair-pulling debugging session) period of time. It's an anti-DOS protection.

@TD-er
Copy link
Contributor

TD-er commented Mar 31, 2019

Hmm, I may have to look at the Domoticz Raspberry Pi image then to see if it uses some kind of filtering like that. Never thought of that option.
Not sure if it is making a new instance of WiFiClient on each reconnect attempt, but I think it does.

I will look into the firewall option.
If that does rise new questions/issues I will open a new issue and stop polluting your PR :)

@d-a-v d-a-v merged commit a1796f4 into esp8266:master Apr 5, 2019
@d-a-v d-a-v deleted the lwiprandomport branch April 5, 2019 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants