-
Notifications
You must be signed in to change notification settings - Fork 578
Issue with driver not honoring the server keep-alive timeout settings #515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@alexey-milovidov Is this something you can review ? |
@amitsharma10, great work! Would you mind to change target to develop branch? I can do more test there and make it part of the upcoming |
@zhicwu I have updated the target to develop branch, please go ahead with review. |
@zhicwu Please let me know if you need any help with testing/review. |
Thank you @amitsharma10 and apologize for the late response. I'm good for the change. Unit test should be added in general, but I think we're good as it's been tested in your environment for a while. Moreover, I'll start to create a new branch to refactor the code by removing unnecessary dependencies, replacing httpclient with light4j, and maybe multi-protocol support(http, grpc and native). |
It seems it does not solve the issues I am still getting: |
Definitely not the message you want to see when you wake up in the morning ;) This is probably the most critical issue needs to be fixed. I'm working on performance/stress test and I'll look into this today. |
Would you mind sharing your test case? This error doesn’t seem to be caused by the code in this PR |
I was kind of hoping this PR can fix or mitigate the @den-crane, you may give this a shot and see if it works at your end. I'll see if I can find a way to reproduce the issue and add unit test accordingly. |
me too. Another phenomenon is that my server's RECV-Q is often very large and i change the httpclientbuilder with #515 changed code |
@gj-zhang , did you try 0.2.5-SNAPSHOT which contains a workaround(see PR #540)? It seems working at my end. Anyway, I'll try to implement tests to further validate the fix. |
with reuse strategy and without retry |
hi, do you resolve this problem? |
The pull request is WIP as we should retry request only when the sql is idempotent. I'm working on a loose parser using JavaCC(with tailored grammar from the ANTLR4 version), which can be used to check idempotency of given query. I'll try to make it in a day or two. On other hand, I noticed a similar error happens randomly in batch inserting during regression test, which won't be fixed by the PR. Anyway, meantime(before 0.2.5 release), if your work is just about query(instead of mutation/DDL), I'd suggest you to try the snapshot build, which works well in the past weeks at my end. |
We faced an issue where the client was getting broken pipe error while sending a request to insert data into clickhouse. We figured that the server had <keep_alive_timeout>3</keep_alive_timeout> and client (driver) was trying to keep the connection alive for 30 seconds (default).
Ideally, client shouldn't try to use it's own settings for keep-alive when server responds with a different value. If server doesn't respond with a value of timeout, TCP defaults should be used.
The chances of getting this issue are more when data is written as an InputStream as HttpClient can't retry that request due to stream processing.
After we fixed and deployed this change locally, we haven't received any broke pipe issues.