-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
binance: Retry keep alive. #2958
base: master
Are you sure you want to change the base?
Conversation
f436428
to
7b64763
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do something like this so that we can test.
Also, please look into the LIST_SUBSCRIPTIONS
websocket message to see if we can leverage that in a periodic check to catch discrepancies.
Looking into querying the subscriptions. |
Sorry I have not tested this on testnet/mainnet yet, but I should be able to use on testnet correct? |
Maybe? I haven't found testing mm on testnet very useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better technique would be if an orderbook has not received an update within a certain period, maybe 30 seconds, then resync the orderbook. Currently we will only know that an orderbook is stale every time checkSubs
is run, so it could remain stale for 30 mins.
This would result in tons of unnecessary reloads on low-volume markets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using this diff to test on mainnet. I've turned off my internet connection for 20 mins, then turned it back on, and the orderbooks immediately resync without hitting the new code here. The code looks good otherwise and works fine with the simnet test. Would be nice to be able to reproduce the issue though.
56ed9ba
to
e3cf2a8
Compare
@martonp does the order book desync look ok? https://github.com/decred/dcrdex/compare/1a223db8b8ae83b3e4da70b6ddd1adbf421ef027..e3cf2a8358ff0ac4a6e6e61507439a2b1478f56b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I disconnect for > 3 minutes, then reconnect, there seems to be a deadlock when calling VWAP
. I'm looking into it.
client/mm/libxc/binance.go
Outdated
@@ -288,6 +291,11 @@ func (b *binanceOrderBook) Connect(ctx context.Context) (*sync.WaitGroup, error | |||
if retry != nil { // don't hammer | |||
continue | |||
} | |||
case <-b.disconnectedChan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if for some reason the Websocket has not reconnected, but the snapshot request successfully completes, the orderbook will appear synced, but no messages will be coming through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the order book stuff doesn't happen over websocket? It's a separate secure http request? So we should stop the books from looking synced until the websocket looks up again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the last change https://github.com/decred/dcrdex/compare/7833026d42ae23acc577fd07548c97c2158e580f..adef014da6fd9c56e84197ce13a1a5e3a24d0147
So, not allowing sync again until the websocket tells us it is up. If the connect messages came out of order or we missed one that would be bad though, unsure if that can happen.
7833026
to
adef014
Compare
I don't think this test fail is due to my changes.... |
client/mm/libxc/binance.go
Outdated
// will not place new orders. | ||
connected := cs == comms.Connected | ||
bnc.booksMtx.RLock() | ||
defer bnc.booksMtx.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer bnc.booksMtx.RLock() | |
defer bnc.booksMtx.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops
client/mm/libxc/binance.go
Outdated
select { | ||
case reconnectC <- struct{}{}: | ||
default: | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I think this is a batch of changes I got from @buck54321
A natural reconnect shouldn't need a new connection I guess.
There were a few issues I fixed.. consider these changes: 68008b3 You can test using the I think another good change would be to trigger a reconnect if the list subscriptions request fails, because this probably means that the connection is broken. |
d2dc7fb
to
b97d591
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martonp I added your changes.
The changes in wsconn.go, why not just add the reconnect timer to keepAlive? As it is now we have to wait for a read or a write for the scheduled reconnect to happen. So, it will happen after we want it to, or not at all if we never have another read or write.
PingWait: time.Minute * 4, | ||
EchoPingData: true, | ||
ReconnectSync: func() { | ||
bnc.log.Debugf("Binance reconnected") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK not to trigger a full reconnect here, but maybe a call to checkSubs
would be prudent.
client/comms/wsconn.go
Outdated
type WsCfg struct { | ||
// URL is the websocket endpoint URL. | ||
URL string | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be a separate argument to the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to URL to only be stored in one place, be able to be updated, and I didn't want to add a mutex for the config or a field in the config. If we don't update the URL, when there is a read error, and it reconnects, it will only reconnect to the first market that was subscribed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to URL to only be stored in one place
I don't really see this as a compelling reason to change the function signature for every consumer. You can still have an unexported field that you update. When we have structs for config settings, we generally try to keep everything contained to the struct.
I made some updated to that commit: a15e541
You're right, but I don't think the timer should be in |
b97d591
to
a15e541
Compare
Thanks those changes look good to me. Added them. They replaced the last commit. |
Untested. Attempting to solve a problem with stuck books.