-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Using i2c slave and OTA together makes ESP8266 crash. #6875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The only problem I see is the long delay due to the 9600 baud serial clock you're using. Boost that to 115.2K and your exceptions may magically disappear. I suspect you're spending too much time with those two slow print statements, and it's crashing because of that. Personally, I'd set a flag in the callback after I'd received the data, and then use the flag to print it out from inside loop() or some other less-critical function. Asynchronous callbacks have looser restrictions that ISRs do, but that depends on WiFi traffic. When you're in the middle of an OTA download, that's a heck of a lot of traffic. BOOM! edit: even worse is the 12KHz I2C clock, as that's spending a long time inside the Slave section. I don't see a simple way around that in your setup, as it probably won't run above 50KHz. If it's caused merely because the ISR in the slave code is fired, then maybe the OTA code should disable the other interrupts while critical sections are in process. It's better to lose an I2C transaction than your update payload. The OTA update ends with a reset, so you'd lose it in any event. I don't know if there's a way to un-hook the I2C interrupts from the sketch side once you've recognized an OTA is in process. |
The Serial.prints are there only because they are so in the example code I used to create the MVCE. The speed is 9600 as I didnt bother to figure out how to set the Mac terminal serial speed so I could capture the stack trace with a screen command. in the "Production" code I have the same crash problem and there I only assigns a value to a volatile declared variable in the receiveEvent. I just tried in the slave code to remove all Serial.print and compile it for 160MHz but the result was the same, crash, so it seems like it is like You print in the last sentence. As You also write it's no problems losing the i2c communication during a OTA as it will reset anyway. It has been working with 2.5.x so the question is if the problem nows lies in changes in the i2c or the OTA code. Anyway, The i2c seems a bit more stable now. I have a reset in the "production" code that reboots the slave when loosing connection and when on 2.5.x it was rebooting on average once a day, now with 2.6.x i would say it's on average once a week so thats a big step forward. If it was possible in the OTA to disable other interrupts or maybe have some way to do something like wire.end() I suppose this problem could be solved. Unfortuneately I'm confident enough to write that code. Edit: typos |
@carpelux You're confident enough, as you're already tweaking the I2C core file. :-) detachInterrupt(4);
detachInterrupt(5); Here's what the finished insert looks like, in case the line numbers changed between 2.6.2 and the current git: //this needs to be called in the loop()
void ArduinoOTAClass::handle() {
if (_state == OTA_RUNUPDATE) {
detachInterrupt(4);
detachInterrupt(5);
_runUpdate();
_state = OTA_IDLE;
} That would be cleaner if you extern'd two variables that match your SCL and SDA pins, but it's a fast test with only one file to tweak. If the OTA fails, you'll need to re-init the I2C slave routine with Wire.begin(SDA_PIN, SCL_PIN, I2C_SLAVE); again as the interrupts are now toast. |
Thanks a lot. A quick test showed that it works fine now. I will get back with the result as soon I'm back, but it looks really promising. |
I made a mistake in the insert above, it should have been detachInterrupt(digitalPinToInterrupt(interruptPin));, so: detachInterrupt(digitalPinToInterrupt(4));
detachInterrupt(digitalPinToInterrupt(5)); I'm surprised it worked with the original syntax. Chalk it up to a newbie mistake and using Google to find examples. ;-) Google finds a LOT of bad examples. A cleaner solution would be to replace in your slave code: extern const int SDA_PIN = 4;
extern const int SCL_PIN = 5;
//#define SDA_PIN 4
//#define SCL_PIN 5 and then change the tweak in ArduinoOTA.cpp (in case you change the port pins): detachInterrupt(digitalPinToInterrupt(SDA_PIN));
detachInterrupt(digitalPinToInterrupt(SCL_PIN)); I checked out the slave code already and it compiles and runs, just about to check your whole test case. By the way, the master side can be compiled at 80MHz CPU, it's the slave side that needs 160MHz. |
That simpler solution just above worked, but may have blown up in your face down the road. There's a more definitive fix (PR #6898) in place which should be out for the 2.6.3 release. Since there were so many small changes to the code, I want to re-run a bunch of tests this weekend before blessing it. |
After a week away I got back home late at night yesterday. It is impressive work that has been done while I was away. I downloaded the core_esp8266_si2c.cpp from the PR and set it to work in my production code. It has been running fine now for about 10 hours, including a few OTA updates. A big thanks for all effort @Tech-TX and to all other involved in this great project! |
It's definitely a group effort, as I suck at C++ and I'm only barely functional at C. All of the heavy lifting has been done by the other two gentlemen. I'm doing the grunt work (it's what I'm good at) and trying to break things. 😇 I'm pretty good at breaking things. edit: which PR are you using? The first one worked, but may still have issues down the road if the compiler optimizes incorrectly. #6898 is the one that's going into the release code, as I haven't been able to break it yet. I should finish my testing within a couple more hours. |
Yes, It's #6898 is the one that I'm using and it still works perfect so I am pretty sure my problem is solved. Once more, big thanks to You all! |
Excellent! For my stress test (where I showed the logic analyzer capture above) I was hitting the bus pretty heavily, and I did around 100 OTA uploads with no errors on this latest PR. Before the fix OTA uploads always died at around 5 to 10% completion since parts of the slave code were in Flash. If your slave is controlling something delicate, I'd verify each received slave packet with a CRC or something so that you don't send your (whatever) off the rails when the I2C packet(s) get mangled during OTA updates. |
Closing via #6898 |
@carpelux you might want to play with the latest git from today with the new core_esp8266_si2c.cpp file. We've just tweaked the clock generation to go smoothly down below where you're running it. You won't need to jiggle the core file to get a 12KHz bus clock any longer. 😄 It's not perfect, but it's within 1% of the programmed bus frequency down at the range you're using it. It gets more choppy on frequency accuracy above 100KHz because we're bit-banging the clock from a loop timer, but it's about 15 times more accurate than it was previously at the high end. For your program just enter your desired speed in Wire.setClock(freq) after Wire.begin. 12000 should mimic what you were doing before. The 160MHz compile recommendation for the slave hasn't changed, as it's limited by how fast we can respond to interrupts. With a slow bus clock (below ~36KHz) you can compile the slave for 80MHz CPU. |
Basic Infos
Platform
Settings in IDE
Problem Description
I am interfacing a ESP8266 as a slave trough I2C with a old legacy system as master in order to read parameters from the system. It is a slow I2C communication, 10 KHz and I have been running with the ESP8266 for about one year with core versions starting from 2.5.0 an up.
It has been working resonable good, with some glitches in the communication where it was hanging once in a while. I solved that hanging by simply reboot.
After 2.6.0 the communication seems to be more stable but there is a new issue. When I start to do a OTA update the ESP chrashes as soon there is some data on the SDA line.
In order to do debugging I set up two wemos D1 mini's. They were connected to each other like this:
GPIO 4 (pin D2) <----> GPIO 4 (pin D2)
GPIO 5 (pin D1) <----> GPIO 5 (pin D1)
Ground <----> Ground
I tried to use the ESP826 Wire examples: master_writer and slave_receiver combined with BasicOTA but couldn't get them running properly. However, If I used 160MHz clock it worked.
I then did a change in core_esp8266_si2c.cpp to the Twi::setClock function adding some longer delays and thereby managed to get the slowest clockspeed down to about 12KHz in order to mimic the behaviour of the legacy system.
After that I also could run the sketches on 80MHz.
The line 108 in core_esp8266_si2c.cpp that is causing Exception 0: Illegal instruction says:
I hope I supplied all relevant inf and that it wasn't to long-winded.
MCVE Sketch
master_writer.ino :
slave_receiver.ino
core_esp8266_si2c.cpp
Debug Messages
Decoded stack trace:
The text was updated successfully, but these errors were encountered: