Digi WiFi S6b becomes unstable #347

Closed
neilh10 opened this issue Jan 5, 2021 · 21 comments

Comments

@neilh10
Contributor

neilh10 commented Jan 5, 2021

I'm using the Digi Xbee XB2B-WFWT/XB2B-WFUT, or WiFi S6B hybrid, based off a 0.27.0 baseline.

After running for some time it becomes unstable: after sleeping and being woken up, it doesn't connect to the local WiFi network.
Occasionally it seems to stop responding to +++ commands.
Sometimes these problems take two days to show, and other times it's within an hour.

It was working on the 0.25.0 baseline. I have introduced changes to reduce and then remove the polling of the MetaData in case this was causing the issue, but it still shows.

I'm documenting this in case anybody else is seeing anything similar and this could be a discussion point.

It is a relatively "soft" issue, as it doesn't show straight away and there are a lot of complex elements in the network. It has been happening against two different WiFi gateways I have. It is POSTing data to the MMW.
After a Mayfly reset, it reliably connects to the local WiFi network and gets the NIST time successfully; it's only later that it isn't able to connect again.
It has been happening across three different Mayfly test systems (each with a WiFi S6B) that I was putting on a 0.27.0 stability test.
I have introduced changes to make sure the WiFi S6B hybrid is HARD reset when it shows this, but that hasn't appeared to make any difference.
I have moved to the 0.27.5 baseline to be compatible with the latest release, but there isn't much added functionality between 0.27.0 and 0.27.5.

I was thinking my next step is to use the "LTE Bee Adapter Rev 1b" board, modified to power down when reset is active.

@SRGDamia1
Contributor

I've seen this happening, but I haven't had time to troubleshoot it yet.

@neilh10
Contributor Author

neilh10 commented Jan 5, 2021

Ok thanks - good to know (negative logic) - that I'm on to something :( I'm focusing on my SDI-12/LT500 issues 1st.

@neilh10
Contributor Author

neilh10 commented Feb 16, 2021

Every time the S6B wakes, the interface code forces the S6B to do a write to its persistent store.
Some persistent stores (e.g. EEPROM technologies) have limited write endurance - that is, read many times, write a few times.
I asked a question about this
https://www.digi.com/support/forum/77803/wifi-s6b-wr-use-sparingly?show=78105#a78105

The WiFi S6B manual, in the "AT Commands" section about WR (Write), says
"Use the WR command sparingly to preserve flash"
Ref Pg197
XBee Wi-Fi RF Module S6B User Guide 90002180 RevU 2019Aug

I'm using a library https://github.com/vshymanskyy/TinyGSM
and when it comes out of sleep it sets a few parameters ATAP0 ATGT64 ATCT64 followed by an ATWR .

What does "Use the WR command sparingly to preserve flash" mean?
If the values of a register are already set, is the ATWR smart enough to see that and not make changes in the flash?

many thanks

The answer came today from "mvut, Veteran of the Digi Community":
It means that you should only issue the WR if the values you are setting have not already been written to flash. IE, you should read the values and ONLY if they are different should you set the value and write it.
So I'm just identifying this as part of the issue as to why the Digi WiFi S6b hybrid has become unstable, and for my purposes essentially unusable as a radio.
I have used the Digi WiFi in various versions since 2010, and have not seen this before. I had been hoping to use this configuration for a local monitoring station of a wifi portal, but have delayed it until I can make it work reliably.
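
To put a rough scale on "sparingly", here is a small worked calculation, assuming one WR is issued on every wake. The function names are hypothetical, and the write budget passed to `daysToReachBudget` is a placeholder: nothing in this thread or the quoted manual text gives an endurance figure for the S6B.

```cpp
// Number of wake cycles per day (and therefore WR commands, if one is issued
// on every wake) for a given sleep interval in minutes.
constexpr long writesPerDay(long intervalMinutes) {
    return (24L * 60L) / intervalMinutes;
}

// Whole days until a given write budget is used up.
// ASSUMPTION: writeBudget is a placeholder supplied by the caller; it is not
// a documented endurance figure for the S6B.
constexpr long daysToReachBudget(long writeBudget, long intervalMinutes) {
    return writeBudget / writesPerDay(intervalMinutes);
}
```

For example, a 10-minute logging interval gives `writesPerDay(10)` = 144 writes per day, and a 2-minute interval gives 720.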

@SRGDamia1
Contributor

I had never noticed that warning before in the XBee manual. That needs to be fixed in TinyGSM.

@SRGDamia1
Contributor

I've modified TinyGSM to read most parameters from the XBee before attempting to write so it does not have to write to flash when no actual change has been made. It's not perfect, though. For some fields (like passwords) the current value cannot be read back from the XBee so we have no choice but to write it each time.
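
A minimal sketch of the read-before-write pattern, not the actual TinyGSM code. `FakeXBee`, `setIfDifferent`, and `applyWakeSettings` are hypothetical names used only for illustration; the register values (AP 0, GT 64, CT 64) are the ones mentioned earlier in this thread. The fake modem only counts how often a WR would have been issued.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an XBee in command mode; it only tracks register
// values and counts flash writes.
struct FakeXBee {
    std::map<std::string, std::string> registers;  // current register values
    int flashWrites = 0;                           // number of WR commands issued

    std::string read(const std::string& cmd) const {
        auto it = registers.find(cmd);
        return it == registers.end() ? std::string() : it->second;
    }
    void set(const std::string& cmd, const std::string& value) { registers[cmd] = value; }
    void writeToFlash() { ++flashWrites; }  // stands in for ATWR
};

// Set a register only when the stored value differs; returns true if it changed.
bool setIfDifferent(FakeXBee& modem, const std::string& cmd, const std::string& value) {
    if (modem.read(cmd) == value) return false;
    modem.set(cmd, value);
    return true;
}

// Apply the wake-up settings and issue a single WR only if something changed.
bool applyWakeSettings(FakeXBee& modem) {
    bool changed = false;
    changed |= setIfDifferent(modem, "AP", "0");
    changed |= setIfDifferent(modem, "GT", "64");
    changed |= setIfDifferent(modem, "CT", "64");
    if (changed) modem.writeToFlash();
    return changed;
}
```

Calling `applyWakeSettings` twice on the same `FakeXBee` leaves `flashWrites` at 1: the second call finds nothing to change and skips the WR.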

@neilh10
Contributor Author

neilh10 commented May 15, 2021

Hey thanks. Will check it out.

@neilh10
Contributor Author

neilh10 commented Jul 2, 2021

Hi @SRGDamia1 could you point at what you did in TinyGSM - I can't find any updates, and I'm still having issues. Thanks

@SRGDamia1
Contributor

I created the function changeSettingIfNeeded (https://github.com/vshymanskyy/TinyGSM/blob/a5a2ce34538955bb43d89df09fad4e30242be9c1/src/TinyGsmClientXBee.h#L1510) and use that everywhere to check if there are actually changes to be made before writing anything to flash. Unfortunately, it won't work for everything. Some settings, like the wifi password, cannot be read back from the XBee. So there's no way of knowing if you're making a change or not, and you have to play safe and write the "new" value to flash. ModularSensors does check if the internet is connected before trying to modify the connection settings (and password), but if the XBee cannot connect fast enough, the password will end up being re-written to flash. There should not be any more flash writes for the IP address if it doesn't change, or for any other non-password-like settings.
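
To illustrate the password case: since the value cannot be read back, one option is to wait a bounded time for the module to join the network and rewrite the password only if it never does. This is a sketch under assumptions, not the ModularSensors implementation; `FakeWifiModem`, `ensureCredentials`, and the poll counts are made up for the example.

```cpp
#include <string>

// Hypothetical model of a module whose password cannot be read back.
struct FakeWifiModem {
    int pollsUntilAssociated;  // polls needed before the module reports it has joined
    int passwordWrites = 0;    // how many times the password went to flash

    explicit FakeWifiModem(int polls) : pollsUntilAssociated(polls) {}

    // Each call simulates one status poll of the module.
    bool isAssociated() {
        if (pollsUntilAssociated > 0) {
            --pollsUntilAssociated;
            return false;
        }
        return true;
    }
    void writePassword(const std::string&) { ++passwordWrites; }
};

// Rewrite the password only if the module fails to join within maxPolls polls.
// Returns true if a flash write was made.
bool ensureCredentials(FakeWifiModem& modem, const std::string& password, int maxPolls) {
    for (int i = 0; i < maxPolls; ++i) {
        if (modem.isAssociated()) return false;  // stored credentials evidently work
    }
    modem.writePassword(password);
    return true;
}
```

The sketch also shows the weakness described above: a module that needs more polls than `maxPolls` allows gets its password rewritten even though the stored one was fine.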

@SRGDamia1
Contributor

Unfortunately, if your S6B's flash has already become unstable because of excessive writing, no changes here will help with that. I doubt anything at all could be done to remedy it.

@neilh10
Contributor Author

neilh10 commented Jul 2, 2021

Thanks for the info. OK, that's what I thought, it is using changeSettingIfNeeded().
Hmm, flash is interesting for multiple writes; it is usually write wear, but it could be something else after running for a period.

I did test with my older Xbee S6B, and still saw the same instability, and didn't have the time to dig into it.
I will move to a new Xbee S6B, and also track the updating which I didn't have time to do.
I'm hoping to get some time to work on this soon :).
Again, many thanks for doing changeSettingIfNeeded(). It is the right thing to do to not be updating every communication cycle, which is what I think was happening.
Writing security once per reset sequence I would think would be fine.

@neilh10
Contributor Author

neilh10 commented Jul 19, 2021

I ran the tests with a brand new Xbee S6B. No different.
I reverted back to an older S6B and, as some accelerated testing, was running it at a 2-minute sleep interval.
Having an "unreliable link" (ideally being able to manage the reliability) is good for my protocol "reliable delivery" testing to MMW.
At the 2-minute sleep interval, it suddenly started delivering to MMW.
As part of some stability testing this weekend, I had it running at a 10-minute interval.
It worked for the first few MMW POSTs, and then failed for the next 30hrs. Then at 3am this morning - when my network router rebooted - it started working, and it's been very reliable since then.
So it does seem like it's something to do with the TCP/IP link expecting to be there when it comes out of sleep.
I have been trying a TinyGSM lib adaptation to reset the destination (tearing down) when going to sleep, and then re-establishing it when waking, but haven't managed to navigate through TinyGSM that well yet. I have had to shelve it to deal with a more urgent field reliability issue.

@neilh10
Contributor Author

neilh10 commented Sep 13, 2021

Some recent testing with an update time set at every two minutes resulted in some successful POSTs.
This suggests it is the TCP/IP link from the Xbee to the destination MMW that is decaying. That is, when the Xbee WiFi goes to sleep, it doesn't tear down the connection. Then when it wakes it isn't doing enough to re-establish the connection.
As suggested in the code, it needs to be torn down before it sleeps, perhaps by setting the destination to 127.0.0.1.
However, there is also some caching of the target IP# in the TinyGSM code, and I haven't quite figured out where and when the cache is referenced.

@neilh10
Contributor Author

neilh10 commented Oct 1, 2021

I have a potential user, an Environmental Scientist (Biologist), who has a stream location with WiFi access, to connect to a depth gage sensor.
I've been thinking that maybe the way to create a clean TCP/IP link is to be sure it's torn down from DigiXBeeWifi::disconnectInternet() by changing the IP to localhost 127.0.0.1.
After some further investigation, and watching the stream of AT commands, I see there is an attempt at teardown after every POST in TinyGsmClientXbee:modemStop(), by changing the timeout to 0 ("TM0") and then back to the default. It also forces a write update as well.
The issue is partly what the model is for accessing the server - that discussion is identified in ODM2/ODM2DataSharingPortal#485
The model I'm attempting is to initialize the Xbee S6B WiFi once (after Mayfly reset), and do an ATWR. Thereafter, on coming out of Xbee sleep, set up a link to MMW, attempt to validate it, then do all the POSTs, waiting for responses, and then be clear about tearing it down.

The communication model incorporates an application layer "delayed delivery" with an internal "reliable delivery".
That is, having multiple new readings generated between each attempt at connecting over WiFi to MMW. There is then an attempt to POST all of them. Sometimes it appears the first POST fails with a 5 second timeout, but then the second attempted POST within the same TCP/IP link setup succeeds.
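
The queue-and-retry behaviour described above could be sketched roughly as follows. This is an illustration only, not the code on the fork: `SessionResult`, `flushQueue`, the `post` callback, and `maxTries` are hypothetical names, and a return value of 0 is an assumed convention for a timeout. A 201 response is treated as success.

```cpp
#include <deque>
#include <functional>
#include <string>

// Result of one upload session over a single TCP/IP link.
struct SessionResult {
    int delivered = 0;  // records acknowledged and removed from the queue
    int attempts = 0;   // total POST attempts made
};

// Try to deliver every queued record; each record gets up to maxTries attempts.
// `post` is a hypothetical callback returning the HTTP status (0 on timeout).
SessionResult flushQueue(std::deque<std::string>& queue,
                         const std::function<int(const std::string&)>& post,
                         int maxTries) {
    SessionResult result;
    while (!queue.empty()) {
        bool ok = false;
        for (int t = 0; t < maxTries && !ok; ++t) {
            ++result.attempts;
            ok = (post(queue.front()) == 201);  // 201 Created is treated as success
        }
        if (!ok) break;  // keep this and all later records for the next wake cycle
        queue.pop_front();
        ++result.delivered;
    }
    return result;
}
```

Records are only removed after an acknowledged POST, so a failed session leaves the queue intact for the next wake cycle, and a first POST that times out is simply retried on the same link.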

@neilh10
Contributor Author

neilh10 commented Oct 13, 2021

I've got a fix for this and have had it testing successfully for the last couple of days.
There are two parts: 1) TinyGSM update 2) DigiXbeeWiFi
Would this be of interest as a PR for EnviroDIY/ModularSensors?
If for TinyGSM, then EnviroDIY/TinyGSM or vshymanskyy/TinyGSM?
More details at neilh10#21

@SRGDamia1
Contributor

I'm (finally!) looking at this. Could you explain what's going on with your fix? What are the caller ID offsets?

@neilh10
Contributor Author

neilh10 commented Nov 25, 2021

Hey, welcome back! And happy Thanksgiving - trying to get this comment in before an evening meal.

Compare against
https://github.com/neilh10/TinyGSM/blob/rel1/src/TinyGsmClientXBee.h
Ignore all the waitResponse() calls in my private branch, as these are debugging and wouldn't be included in a PR.

It seems the issue is that the S6B isn't tearing down the TCP/IP connection on sleep. So what I do is change the TCP/IP connection to local before sleeping,
which I think should have been enough to solve the problem, but it isn't,
so then I do a software reset.
I also tried some longer guard times for responses.
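
As a rough illustration of that pre-sleep sequence (not the code on the rel1 branch): the function below just builds the list of commands. `preSleepCommands` is a hypothetical name, and using `DL` for the destination address and `FR` for a software reset is an assumption that should be checked against the S6B user guide. No `ATWR` is included, so the temporary destination is never written to flash.

```cpp
#include <string>
#include <vector>

// Build the pre-sleep command sequence described above.
// ASSUMPTION: "DL" (destination address) and "FR" (software reset) are the
// relevant XBee AT commands; verify against the S6B user guide.
std::vector<std::string> preSleepCommands(bool alsoSoftReset) {
    std::vector<std::string> cmds;
    cmds.push_back("+++");            // enter command mode (guard times apply)
    cmds.push_back("ATDL127.0.0.1");  // point the socket at localhost
    if (alsoSoftReset) {
        cmds.push_back("ATFR");       // software reset of the module
    } else {
        cmds.push_back("ATCN");       // exit command mode
    }
    return cmds;
}
```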

Anyway, if you want a working S6B (not clear about that, as I know there is a lot going on) then tell me which repo it should be against and I'll put a tested version against it.
I realize there are a lot of outstanding PRs, and finite time in the day, so I thought to myself I'll wait until there is bandwidth to deal with it before doing all the work on my part.

@neilh10
Contributor Author

neilh10 commented Nov 10, 2022

Just to reference the above comment - on merging, it's broken the curated version of TinyGSM for the Digi WiFi module that I have. It's taken me two hours to track it down.
The new code updated in LoggerModemMacros.h is doing some highly unusual reprogramming to cope, it seems, with EspressifESP32 challenges, and is possibly having knock-on effects with all other modems.
IMHO the real issue is that there are a lot of communication modules supported and there is no curated list of modules that work on any release.
The matrix of parts that works for any project is a big challenge. There is no standard way that ModularSensors identifies regression tests, so that users of ModularSensors can contribute to the testing process.
The Digi WiFi module works for me as it has a variety of RF connectors for adding high gain antennas. WiFi at 2.4GHz is attenuated outdoors by moisture (leaves) and often benefits from a simple antenna extension for a signal boost to go further.

solution neilh10#125

neilh10 pushed a commit to neilh10/TinyGSM that referenced this issue Jun 20, 2023
neilh10 pushed a commit to neilh10/ModularSensors that referenced this issue Jun 21, 2023
neilh10 pushed a commit to neilh10/ModularSensors that referenced this issue Jun 21, 2023
@neilh10
Contributor Author

neilh10 commented Jun 22, 2023

The core of the solution is to force the TCP/IP link to close after each POST, before the WiFi device is put to sleep to save power.
The WiFi acc/pwd are also only programmed at startup - this is done by having the device driver know the state of the modem, and only programming it once on power up. Thereafter it's cached locally.
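
One way to picture the "program once per power-up" idea is a small piece of driver-side state. This is a sketch only; `WifiDriverState` and its methods are hypothetical and do not come from ModularSensors or the fork.

```cpp
// Hypothetical driver-side state that remembers whether the module has been
// given its credentials since the last power-up.
class WifiDriverState {
public:
    // Call on every wake. Returns true only on the first call after power-up,
    // which is when the SSID/password would be sent to the module.
    bool needsCredentialProgramming() {
        if (programmed_) return false;
        programmed_ = true;
        return true;
    }

    // Call when the module is power-cycled or hard reset.
    void onPowerUp() { programmed_ = false; }

private:
    bool programmed_ = false;
};
```

With this shape, ordinary sleep/wake cycles never trigger a credential write; only a power-up does.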

There is other debugging and housekeeping also included, and I haven't wanted to change it from the core of what I have tested over 2 years.

The solution I have is based on two systems that have been working in the field.

Both of these systems have had other problems - disconnected solar and the virtual failure - however when these problems were fixed, the upload over WiFi using reliable delivery/batch queue algorithms on my fork (neilh10#1) has worked for 100% of outstanding records.

https://monitormywatershed.org/sites/nh_LCC45/ transmitting reliably since at least Jan14/2022
https://monitormywatershed.org/sites/TUCA_Sa01/ - this transmits to a comcast wifi point - the strongest signal I've seen from a WiFi SSID. It has currently stopped transmitting due to a solar wire pulled out May 28th - probably by a deer. Before that it had ODM2/ODM2DataSharingPortal#658 - which was out for some 5 weeks. When it was fixed, it uploaded the fastest of all the systems.

I'm generating two PRs -
a) reference enviroDIY (develop) - 0.34.0 - and testing it locally, and submitting the files including two test setups. The Mayfly version should come up as 0.34.1-iss347a - #441
b) reference TinyGSM 0.11.5 - vshymanskyy/TinyGSM#731

The test setup only uses local sensors. As this is a verification test setup I also describe and track equipment.
The software runs on real equipment, Mayfly 1.1 Rev A - S/n unreadable - however initially programmed in EEPROM as sn[ MAYFLY22150 ]
XbeeWiFi internet comms with Digi XBee Wi-Fi Mac/Sn 409D8F65B4 HwVer 2730 FwVer 2026
Has a 2.2A LiP Adafruit battery
The WiFi network access is a Synology RT2600ac

For building I'm using the latest Pio on VSC. For working files in src - there is an alpha development environment that I configure in folder ModularSensors\a\DRWI_SIM7080LTE
Both the following tests manage the visibility of the AT cmd stream through the platformio.ini, thanks to the amazing StreamDebugger.h https://github.com/vshymanskyy/StreamDebugger

The platformio.ini is set up for development against local source ModularSensors\src. Change to the desired destination for TinyGSM.

For files referencing to the lib ModularSensors use folder ModularSensors\sensors_test\DRWI_SIM7080LTE. Change to desired destination

For testing - I've let it run a couple of hours at two minute sampling and verified it gets a '201' - this isn't a long term test

Then I've turned off the WiFi signal, let it try a couple of times to find it, then turned it back on, and it's continued transmitting, losing of course any attempted readings as Reliable Delivery is not implemented yet.

Hope I haven't missed anything. Happy to answer any questions - I'm out tomorrow 22nd at https://www.sensorsconverge.com/sensorsconvergecom/expo-highlights

@neilh10
Contributor Author

neilh10 commented Jun 23, 2023

@neilh10
Contributor Author

neilh10 commented Jun 29, 2023

Sara added to (develop) as part of
#445
I have extensively tested this in my fork; however, it has been modified on acceptance into (main) and I can't guarantee the traceability of my testing.

@neilh10 neilh10 mentioned this issue Jun 29, 2023
neilh10 added a commit to neilh10/ModularSensors that referenced this issue Jun 30, 2023
@neilh10
Contributor Author

neilh10 commented Jul 6, 2023

A weekend of testing and no level2 modem driver issues.
Plan on posting testing data here ODM2/ODM2DataSharingPortal#661
