Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-visible warning for disconnected/not saving state #56

Open
1 of 2 tasks
cben opened this issue Oct 3, 2014 · 15 comments
Open
1 of 2 tasks

User-visible warning for disconnected/not saving state #56

cben opened this issue Oct 3, 2014 · 15 comments

Comments

@cben
Copy link
Owner

cben commented Oct 3, 2014

Yesterday I pushed broken CM/firepad combo such that pages loaded, you could edit but edits weren't saved! There was no user-visible indication this is the case.
But simply being offline gives a similar result.

  • Visible indication that firebase is offline and not syncing.
  • Close the loop verifying edits really made it into firebase (REST api?), such that unsaved edits — for any reason — for several seconds give the warning.
    It's not enough to check firepad's idea of version made it to the server; I suspect in this case there was an exception early in firepad such that it's version. Ideal might be comparing full text (not too frequently)?

Local storage backup wouldn't hurt too, but that's another story.

@cben
Copy link
Owner Author

cben commented Apr 29, 2015

Asked for advice at firepad group: https://groups.google.com/forum/#!topic/firepad-io/s1aTZXZniAM

cben added a commit that referenced this issue Apr 30, 2015
cben added a commit that referenced this issue May 19, 2015
Also initial "loading..." indication.

Not sure it's water-tight wrt bugs like #85.
@cben
Copy link
Owner Author

cben commented May 22, 2015

This is partially implemented, and works very nicely when typing and when internet goes offline/online.
But I haven't yet tested how it behaves in the face of bugs like #85 — I'm relying on Firepad knowing when it's synced but there might be a situation when Firepad itself isn't aware there was a change in the editor.
I think I should use headless firepad API to periodically (e.g. once a minute) do a clean text reconstruction of what is currently saved in firebase and compare to the editor.
Also, cross-compare with Firebase's connected status for more useful reports e.g. "unsaved because offline" vs "online but not saved — BUG?"...

cben added a commit that referenced this issue May 22, 2015
restore generic "Alpha quality" warning.
While #56 is not water-tight yet, the specific bug has been fixed.
@cben
Copy link
Owner Author

cben commented May 25, 2015

Bah. What I pushed was somewhat broken
(interleaved for extra karma with pushing a an extremely sloppy "redesign" breaking the [New document] button and the whole display on IE)

  • The status indication overlaps the upper part of the header line, interfering with clicking stuff like [bugs]. e0f1b36 fixed this for most of the width, but the transparent "saved" or "done" text is still there, making it very confusing why you sometimes can't click the underlying [bugs] link (if your window width is such that they overlap).
  • On IE8, not only the transitions don't animate (which I expected an am fine with) but opacity doesn't work at all, so "done" and "saved" texts are visible all the time.

I'm somewhat fond of the fading out to "done" / "saved" so looking at solutions that don't give them up. http://viget.com/inspire/css-pointer-events-and-a-pure-css3-animating-tooltip and http://www.greywyvern.com/?post=337 sound promising...

@cben
Copy link
Owner Author

cben commented May 25, 2015

BTW, pointer-events: none initially looked like a good direction, except it doesn't work on IE<11,
and also didn't work on my Chrome until I set visibility: hidden on #status_container.
Setting that also helped visibility transition on status to work, and visibility support is robust so I'm going with only that.

cben added a commit that referenced this issue May 25, 2015
Also actually hide them on IE8,9.
(part of collateral damage I inflicted dealing with #56)

Tested on Chrome and Firefox, will test more before deploying.
This really could use automated testing, but it's rather tricky :-(

http://stackoverflow.com/a/10352525/239657 and
http://www.taccgl.org/blog/css-transition-visibility.html
have good explanations why `visibility` transition does the right thing.

I discovered that I had to set `#status_container{ visibility: hidden }`
or it wouldn't work on Chrome; this also avoids #status_container
from blocking clicks, so reverting the less elegant `height: 0`.

I wasn't sure if I should also set `pointer-events: none`.  It makes sure
you can click through the status even when it's visible, but I'm not sure
that's a feature, it doesn't work on IE<11, and would mysteriously prevent
adding a hover tooltip to statuses.
@cben
Copy link
Owner Author

cben commented May 26, 2015

Back to the real goal: would current indication based on firepad's synced event catch #85? => NO.

(Reproducing #85 took some effort. Going back to the buggy CM-MJ wasn't enough; it seems firebase-debug.js not only reports but also tolerates exceptions better, and I only managed to reproduce with the non-debug firepad.js)

@cben
Copy link
Owner Author

cben commented Jun 3, 2015

  • If we detect this kind of error, report it somehow in firebase (with full base version + doc text?) so future analysis can at least know how much this happened (and retroactively merge?).

@cben
Copy link
Owner Author

cben commented Jun 3, 2015

Aha moment!
After sendOperation() called with invalid operation., firepad goes back to report that it's "synced".
I can — and should — fix this upstream.

cben added a commit that referenced this issue Jun 4, 2015
Periodically creates new Firepad.Headless, loads the text and compares
to current editor content.
Also checks Firebase's online status for better messages.

After a couple days of torturing, this seems to reliably detect
#85 borked firepad state and warn "Online but unsaved — BUG?".
Problems:
- False positives during typing (saw this) or incoming changes (theoretical).
  Needs debouncing after a period of quiet...
- Currently for easy testing runs the check *every second*!
  Should increase testSyncIntervalMs to 30-60sec before landing.
  - If only to avoid constant 1-5% CPU usage.
- Performance concerns: after a day of holding and torture-editing
  2 chrome tabs, things crawled to unusably slow stuck-after-every-press.
  Their combined memory usage grew to ~1.5GB! (plus ~1GB for their dev tools)
  And at some point CPU usage grew to 100%.
  - Near the end, I timed Headless getText to take >3 seconds!
    Initially it's ~1ms for existing Headless and ~230ms for new Headless!
    This explains 100% CPU - 1sec timer should be after prev timer finished
    execution, but alas getText() returns immediately.
  - AFAICT, snapshots were being taken so it's not op composition overhead.
  - It seems memory leak does not happen in current mathdown.net version
    (0f69d29).
@cben
Copy link
Owner Author

cben commented Jun 4, 2015

Progress: Headless firepad works. (For some reason it didn't for ~ a day when I initially played in the JS console but now it works both from script and console)
And it does seem to reliably warn when firepad gets borked and doesn't save.
There are race conditions leading to false positives, should debounce it after document stops changing.

But the much bigger issue is performance. As commented on 0e8a2d4 , I'm seeing a memory leak and slowdown;
after about a day of holding 2 tabs open (with a tiny document), memory grew to 1.5GB and CPU approached 100%. I think there was also high GC churn but didn't look at numbers.
[This is with new Headless -> getText -> comparison every 1sec, for debugging.]
Memory leak doesn't seem to happen in current prod version (stays ~50MB / tab, half of which is JS memory).
CPU is explained by Headless getText() latency growing to several seconds, with the timer piling up another Headless -> getText every second.

Is memory leak root cause of slowness? I did experince whole computer slowdown.
Is getText pileup root cause of leak? Easy to imagine - Headless objects would pile up - but unlikely, as leak is observable immediately in a fresh tab.
Are they independent problems?

Ideas:

  • Dig deeper - heap profile, CPU profile on getText, ...
  • Lower checking interval to something like 1 minute. This is necessary in any case, as I won't have perfect understanding of performance ramifications. But first let's debug 1sec which amplifies problems.
  • Reuse one Headless instance instead of new one every time? new Firepad.Headless returns instantly but getText on new instance takes >200ms (>400ms after significant edits), while on existing one it's ~1ms. On my paper-length doc, getText is 20ms but still 200600ms after new.
    But I'd like to keep the new as more bullet-proof.
  • Prevent pileup by scheduling new check only once getText completes?
    Do I still care about pileup if it's done once per minute? Maybe, I'm worried getText might take unbounded time when offline.
    Can this lead to losing one check and never resuming? OTOH, if it's possible for getText to get lost, we might in theory get stuck in false testedSynced = true forever. But here is a concrete worrying scenario: data in firebase gets really corrupt (never happened AFAIK, just unsaved) causing getText logic to crash every time.
    Idea: assume the worst, set testedSynced = false every time until getText returns? [perhaps better to have separate less worrying "Veryfying save..." message.]
  • Debouncing should mitigate a lot. Do checks only when we've been online, firepad believes it's synced, and no incoming changes for some time (5sec?). But let's debug root causes first while it's bad.

@cben
Copy link
Owner Author

cben commented Jun 4, 2015

Oh, dev consoles leak a lot as well. Quickly reach hundreds of megabytes.
This is easily explained by my very verbose logging, including JS objects and DOM elements.
Chrome and FF do throw away old messages when console is closed for a long time, but is that true for all browsers?
Perhaps it's time to default logging to off in prod (with function and/or URL param turn it on)?

@cben
Copy link
Owner Author

cben commented Jun 4, 2015

https://groups.google.com/forum/#!topic/firebase-talk/AEDekJ_B480 is interesting.
It's likely that new Firepad.Headless() registers new firebase callbacks that never go away (and keep the Headless itself alive and updating...)

@cben
Copy link
Owner Author

cben commented Jun 4, 2015

Just reducing CPU usage to say once/minute is not good enough.
Consider a person with 30 mathdown tabs — now one of the tabs wakes up (and largely swaps in) every 2 seconds, even if nothing changed.
I should either avoid checks entirely when nothing changed, dynamically backoff frequency, or both.

I should also check whether this whole feature noticably increases a tab's memory usage. At best, I'll still be constructing 2 copies of the text (codemirror getValue, firepad getText). But that's negligible compared to the full usage I see.

@cben
Copy link
Owner Author

cben commented Jun 4, 2015

Using heap profiling on the ~100K document.
Without headless checking it's 70MB on load, 170MB after renderingm math.
With checking, after being open for 1000s, it's 360MB.
During 50s heap allocations profile, the biggest allocation is the string with the full text (~200K due to UTF-16) => 7MB over 50s.

All these strings are retained! They're retained in TextOp.text, which is retained in TextOperation.ops[0] array, retained as FirebaseAdapter.document_, which is retained in fb.core.view.ChildEventRegistration under callbacks.child_added closure's self.
Over many allocations, these accumulate in an array of callbacks (I think on the history/ ref, not 100% sure).

Which is exactly what I assumed above - new Firepad.Headless() retains everything via callbacks.
Let's try with single Headless now. Then I'll mail Firepad folks (about fixing synced and about destroying a Headless — I also need that for #7 long lived server) and possibly proceed.

@cben
Copy link
Owner Author

cben commented Jun 4, 2015

Oh, and the accumulating callbacks also explain slowdown :-)

cben added a commit that referenced this issue Jun 4, 2015
See #56 for detailed investigation; in short every Headless registers
firebase callbacks so it never goes away, and that's expensive because
the Headless has a copy of the text (and eventually expensive in CPU).
@cben
Copy link
Owner Author

cben commented Jun 4, 2015

Brilliant - 8d031b7 fixed the leak according to Chrome allocations profiler :-)
Process RAM still grows to >400MB but is then garbage-collected back to 250MB, same as the mathdown.net non-checking tab stabilized. This happens every 3040s but remember that's with getText every 1s!
The stable heap dump usage mainly consists of 50
60MB of DOM and ~40MB of CodeMirror.Lines arranged in a b-tree. It's similar with checking and without.
[P.S. One would expect the partial DOM to be much smaller than CodeMirror's full b-tree, but this is a math-heavy document.]

@cben
Copy link
Owner Author

cben commented Jun 29, 2015

Looked at firebase analytics and saw this strange bandwidth graph:

uploads/e2c919eb-c06b-4042-9402-27b2f0a4b270/bandwidth30d.png

It's never been this sharp! And why is it suddenly flat?
These are exactly the days when I was testing this feature!
(and yes, it's sad that I alone can dominate Mathdown's bandwidth.)

I really hope that the high bandwidth was specific to the fresh-Headless-every second version and the one-Headless is saner. Need to test before landing.
(of course, it's not gonna be every second. Good that I'm stress-testing this way or I wouldn't notice it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant