How to test for reliable server #524

Closed
neilh10 opened this issue Nov 18, 2021 · 4 comments
@neilh10

neilh10 commented Nov 18, 2021

The https://monitormywatershed.org architecture has some inherent occasional stability/reliability issues.
Since Nov 13 it has been exhibiting a lot of data losses from POSTs, as seen by test06 - probably because of maintenance work.
The test system https://monitormywatershed.org/sites/tu_rc_test06/
uses a "reliable delivery" algorithm that only records a POST as successful when a 201 is received, so from past experience, the data losses seen when downloading the .csv file have been associated with the server.

https://staging.monitormywatershed.org/ is set up to sync its internal databases with monitormywatershed.org, and this is a great setup for testing some of the new presentation features, and much appreciated.

I believe staging.monitormywatershed.org also provides a new method for ensuring that POSTs are successful.

What type of test plan do you see being used to verify the new algorithms for no loss of POSTs?

It might not be possible to do this until staging transitions to production, but I just thought I would check in and ask the question.
Related question - #485

@neilh10
Author

neilh10 commented Dec 17, 2021

The release v0.12.x is now in production on the AWS servers and live for POSTs to data.enviroDIY.org.
Visualization of these readings is through https://monitormywatershed.org/ (MMW).
I've updated #485 (comment)

I'm looking to define a test that can characterize current conditions.

My specific purpose in characterizing this process is to know what I can tell the hydrologists I work with about whether they should continue downloading data from the Insitu LT500 depth gauges they visit in the field.
Currently they reliably retrieve data by "boot-net" - that is, they walk up to the site and retrieve the data.
The process of retrieving data from a Mayfly station is: unplug the rugged Insitu cable from the Insitu wire rat-tail connector, plug a USB cable into the sensor end, connect over USB, download the data set, unplug the USB cable, and then restore the connection to the Mayfly rat-tail.
This is sporadic and has its own sources of errors and equipment malfunction, but has been working reliably for them. The Insitu outdoor cable is designed to be rugged with an o-ring gasket, but it is still liable to failure and requires replacement. Long term, it seems better to download reliably from MMW, and then only visit the site when required for rating curves.

The objective would be that retrieving data from MMW has as good a confidence as retrieving the data by boot-net.
This uses my forked version of ModularSensors that includes reliable delivery - I designate it as azModularSensors - https://github.com/neilh10/ModularSensors/releases/tag/v0.25.0.release1_200906
The term "reliable delivery" is from functionality discussed in these issues.

This reliable delivery is configured through a local ms_cfg.ini file that is read on startup by azModularSensors.
The following reliable delivery parameters can be changed without having to rebuild the load:

[COMMON]
LOGGING_INTERVAL_MINUTES=2 ; aggressive testing, though typically every 15 minutes

[NETWORK]
COLLECT_READINGS=5 ; Number of readings to collect before send, 0 to 30
POST_MAX_NUM=100 ; Max number of readings to POST after
SEND_OFFSET_MIN=0 ; Minutes to wait after collection complete to send

[PROVIDER_MMW]
CLOUD_ID=data.enviroDIY.org
TIMER_POST_TOUT_MS=7000; Gateway Timeout (ms)
TIMER_POST_PACE_MS=3000; 
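
My reading of what those parameters do (an interpretation only, not the literal implementation): COLLECT_READINGS is how many samples are batched before a transmission is attempted, POST_MAX_NUM caps how many queued readings go out in one session, TIMER_POST_TOUT_MS is how long to wait for the 201 before treating a POST as failed, and TIMER_POST_PACE_MS spaces consecutive POSTs. Something like:

#include <cstdint>

// Sketch only: a struct mirroring the ms_cfg.ini values above, with my interpretation
// of each field; the real azModularSensors parsing and usage may differ.
struct ReliableDeliveryCfg {
    uint16_t loggingIntervalMin = 2;    // LOGGING_INTERVAL_MINUTES
    uint8_t  collectReadings    = 5;    // COLLECT_READINGS: batch size before sending
    uint16_t postMaxNum         = 100;  // POST_MAX_NUM: max queued readings per session
    uint16_t sendOffsetMin      = 0;    // SEND_OFFSET_MIN: delay after collection completes
    uint32_t postTimeoutMs      = 7000; // TIMER_POST_TOUT_MS: wait this long for the 201
    uint32_t postPaceMs         = 3000; // TIMER_POST_PACE_MS: gap between consecutive POSTs
};

// Hypothetical per-wake decision: sample every logging interval, transmit every Nth wake.
bool timeToTransmit(uint32_t wakeCount, const ReliableDeliveryCfg& cfg) {
    return cfg.collectReadings == 0 || (wakeCount % cfg.collectReadings) == 0;
}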

Initial testing is to define a baseline that can be used both to characterize a base ModularSensors release and to appear as a "normal load" on the server from a sensor node.
The simplest test scenario is to start a test at Friday 5pm and then inspect it at Monday 9am - that is, 64 hours.
If 30 sensor readings are created every hour, this is 1920 readings.
The test for success, providing the server is up and there is a reasonable transmission medium, is that the reliable delivery algorithm ensures the readings are transferred to the servers.
If there is one failure, this represents a reliability of 99.948%; if there are no failures, the demonstrated reliability is at least 99.948% (say 99.95%).
This would allow the base ModularSensors to go through accelerated testing: extrapolating the accelerated 2-minute interval to a more standard 15-minute interval, the 64-hour run represents 20 days of elapsed time.
The 30 messages an hour (one every 2 minutes) is the fastest the Mayfly can be operated while still representing normal operation of the Mayfly's internal clock mechanism.
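
For reference, the arithmetic behind those figures (just a quick check of the numbers quoted above):

#include <cstdio>

// Back-of-envelope check of the 64-hour test figures quoted above.
int main() {
    const double hours       = 64.0;                                // Friday 5pm to Monday 9am
    const double perHour     = 30.0;                                // one reading every 2 minutes
    const double readings    = hours * perHour;                     // 1920 readings
    const double reliability = 100.0 * (readings - 1.0) / readings; // one failure allowed
    const double elapsedDays = hours * (15.0 / 2.0) / 24.0;         // scale 2 min to 15 min interval
    printf("readings=%.0f reliability=%.3f%% equivalent days at 15 min=%.0f\n",
           readings, reliability, elapsedDays);                     // 1920, 99.948%, 20
    return 0;
}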

Periodically, when the above targets are met, a longer soak test lasting 3 months could be initiated.

@aufdenkampe
Member

@neilh10, I think it is worth changing your host to monitormywatershed.org, and also removing any references to our old IP address, as it seems that maybe even your newer sites are somehow being affected by maintenance on LimnoTech servers and not fully benefiting from AWS 99.999% uptime.

See my explanation here: #542 (comment)

@neilh10
Author

neilh10 commented Dec 28, 2021

Sorry, responding to @aufdenkampe: this was posted in a number of places. I did change both test devices' destination to monitormywatershed.org, and there were never any hardcoded IP addresses in there.

Given the current state of the server architecture, as in #543 and #541, it seems to me there should be no testing against the production server monitormywatershed.org unless specifically requested.

@aufdenkampe
Member

@neilh10, I think we addressed most of these issues with our Jan. 6 v0.12.1 Hotfixes & Tweaks to AWS release.

I'm closing this issue. Feel free to reopen if you feel any of these issues remain.
