How to test for reliable server #524

Closed
neilh10 opened this issue Nov 18, 2021 · 4 comments
@neilh10

neilh10 commented Nov 18, 2021

The https://monitormywatershed.org architecture has some inherent occasional stability/reliability issues.
Since Nov 13 it has been exhibiting a lot of data losses from POSTs, as seen by test06 - probably because of maintenance work.
The test system https://monitormywatershed.org/sites/tu_rc_test06/
uses a "reliable delivery" algorithm that only records a POST as successful when a 201 is received, so from past experience, the data losses seen when downloading the .csv file have been associated with the server.

https://staging.monitormywatershed.org/ is set up to sync its internal databases with monitormywatershed.org, and this is a great setup for testing some of the new presentation features, and much appreciated.

I believe staging.monitormywatershed.org also provides a new method for ensuring that POSTs are successful.

What type of test plan do you see being used to verify the new algorithms for no loss of POSTs?

It might not be possible to do this until staging transitions to production, but I just thought I would check in and ask the question.
Related question - #485

@neilh10
Author

neilh10 commented Dec 17, 2021

The release v0.12.x is now in production on the AWS servers and live for POSTs to data.enviroDIY.org.
Visualization of these readings is through https://monitormywatershed.org/ (MMW).
I've updated #485 (comment)

I'm looking to define a test that can characterize current conditions.

My specific purpose in characterizing this process is to know what I can tell the hydrologists I work with about whether they should continue downloading data from the Insitu LT500 depth gauges they visit in the field.
Currently they reliably retrieve data by "boot-net" - that is, they walk up to the site and retrieve the data.
The process of retrieving data from a Mayfly station is: unplug the rugged Insitu cable from the Insitu wire rat-tail connector, plug a USB cable into the sensor end, connect over USB, download the data set, unplug the USB cable, and then restore the connection to the Mayfly rat-tail.
This is sporadic and has its own sources of errors and equipment malfunction, but has been working reliably for them. The Insitu outdoor cable is designed to be rugged with an o-ring gasket, but it is still liable to failure and requires replacement. Long term, it seems better to download reliably from MMW, and then only visit the site when required for rating curves.

The objective would be that retrieving data from MMW has as good a confidence as retrieving the data by boot-net.
This uses my forked version of ModularSensors that includes reliable delivery - I designate it as azModularSensors - https://github.com/neilh10/ModularSensors/releases/tag/v0.25.0.release1_200906
The term "reliable delivery" is from functionality discussed in these issues.

This reliable delivery is configured through a local ms_cfg.ini file that is read on startup by azModularSensors.
The following reliable delivery parameters can be changed without having to rebuild the load:

[COMMON]
LOGGING_INTERVAL_MINUTES=2 ; aggressive testing, though typically every 15 minutes

[NETWORK]
COLLECT_READINGS=5 ; Number of readings to collect before send, 0 to 30
POST_MAX_NUM=100 ; Max number of readings to POST after
SEND_OFFSET_MIN=0 ; Minutes to wait after collection complete to send

[PROVIDER_MMW]
CLOUD_ID=data.enviroDIY.org
TIMER_POST_TOUT_MS=7000; Gateway Timeout (ms)
TIMER_POST_PACE_MS=3000; 
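
My reading of what those parameters do (an interpretation only, not the literal implementation): COLLECT_READINGS is how many samples are batched before a transmission is attempted, POST_MAX_NUM caps how many queued readings go out in one session, TIMER_POST_TOUT_MS is how long to wait for the 201 before treating a POST as failed, and TIMER_POST_PACE_MS spaces consecutive POSTs. Something like:

#include <cstdint>

// Sketch only: a struct mirroring the ms_cfg.ini values above, with my interpretation
// of each field; the real azModularSensors parsing and usage may differ.
struct ReliableDeliveryCfg {
    uint16_t loggingIntervalMin = 2;    // LOGGING_INTERVAL_MINUTES
    uint8_t  collectReadings    = 5;    // COLLECT_READINGS: batch size before sending
    uint16_t postMaxNum         = 100;  // POST_MAX_NUM: max queued readings per session
    uint16_t sendOffsetMin      = 0;    // SEND_OFFSET_MIN: delay after collection completes
    uint32_t postTimeoutMs      = 7000; // TIMER_POST_TOUT_MS: wait this long for the 201
    uint32_t postPaceMs         = 3000; // TIMER_POST_PACE_MS: gap between consecutive POSTs
};

// Hypothetical per-wake decision: sample every logging interval, transmit every Nth wake.
bool timeToTransmit(uint32_t wakeCount, const ReliableDeliveryCfg& cfg) {
    return cfg.collectReadings == 0 || (wakeCount % cfg.collectReadings) == 0;
}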

Initial testing is to define a baseline that can be used both to characterize a base ModularSensors release and to appear as a "normal load" on the server from a sensor node.
The simplest test scenario is to start a test at Friday 5pm and then inspect it at Monday 9am - that is, 64 hours.
If 30 sensor readings are created every hour, this is 1920 readings.
The test for success, providing the server is up and there is a reasonable transmission medium, is that the reliable delivery algorithm ensures the readings are transferred to the servers.
If there is one failure, this represents a reliability of 99.948%; if there are no failures, the demonstrated reliability is at least 99.948% (say 99.95%).
This would allow the base ModularSensors to go through accelerated testing: extrapolating the accelerated 2-minute interval to a more standard 15-minute interval, the 64-hour run represents 20 days of elapsed time.
The 30 messages an hour (one every 2 minutes) is the fastest the Mayfly can be operated while still representing normal operation of the Mayfly's internal clock mechanism.
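
For reference, the arithmetic behind those figures (just a quick check of the numbers quoted above):

#include <cstdio>

// Back-of-envelope check of the 64-hour test figures quoted above.
int main() {
    const double hours       = 64.0;                                // Friday 5pm to Monday 9am
    const double perHour     = 30.0;                                // one reading every 2 minutes
    const double readings    = hours * perHour;                     // 1920 readings
    const double reliability = 100.0 * (readings - 1.0) / readings; // one failure allowed
    const double elapsedDays = hours * (15.0 / 2.0) / 24.0;         // scale 2 min to 15 min interval
    printf("readings=%.0f reliability=%.3f%% equivalent days at 15 min=%.0f\n",
           readings, reliability, elapsedDays);                     // 1920, 99.948%, 20
    return 0;
}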

Periodically, when the above targets are met, a longer soak test lasting 3 months could be initiated.

@aufdenkampe
Member

@neilh10, I think it is worth changing your host to monitormywatershed.org, and also removing any references to our old IP address, as it seems that maybe even your newer sites are somehow being affected by maintenance on LimnoTech servers and not fully benefiting from AWS 99.999% uptime.

See my explanation here: #542 (comment)

@neilh10
Author

neilh10 commented Dec 28, 2021

Sorry, responding to @aufdenkampe: this was posted in a number of places. I did change both test devices' destination to monitormywatershed.org, and there were never any hardcoded IP addresses in there.

Given the current state of the server architecture, as in #543 and #541, it seems to me there should be no testing against the production server monitormywatershed.org unless specifically requested.

@aufdenkampe
Member

@neilh10, I think we addressed most of these issues with our Jan. 6 v0.12.1 Hotfixes & Tweaks to AWS release.

I'm closing this issue. Feel free to reopen if you feel any of these issues remain.
