Initial write up of restore / desync troubles. #30

Draft · wants to merge 3 commits into main
Conversation

timbru (Owner) commented Sep 23, 2024

Needs more thought...

timbru marked this pull request as draft September 23, 2024 15:11
timbru linked an issue Sep 23, 2024 that may be closed by this pull request
synchronisation events where it issues an [@!RFC8181] list query even if it has
no new content to publish. Because this interaction requires that the
Publication Server signs an [@!RFC8181] list reply, this operation can be costly
for Publication Servers that serve a large number of publishers. Therefore,
A reviewer commented:

Why is this costly? Is it possible that this is a particular problem with Krill because it's generating a new EE certificate for each transaction, as it does for the RFC 6492 service?

timbru (Owner, Author) replied:

I think it's mostly okay, but in the case of 1500+ CAs (which happens) it does get rather chatty.

We spoke about this in the past, and there was indeed a problem in Krill then, but that was something else: an inefficiency in the implementation that meant it spent a lot of time deserializing state. That has since been fixed.

Nonetheless, 1500 signed responses every minute could be significant, and because the publication protocol does not (yet) support rate limiting there would be no clean way to tell clients to back off if needed.

So, I thought: fetching the notification file just hits the CDN, so there is no problem in hammering that from delegated CAs, which will likely be fewer in number than RPs anyway.
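For reference, the exchange being discussed is the [@!RFC8181] list query and list reply. Both are carried as CMS-signed objects, and the reply enumerates every object the server currently holds for that publisher, which is where the per-publisher signing cost comes from. A minimal illustration (URIs and hashes are placeholder values):

```xml
<!-- List query from the publisher (CA); carried in a CMS-signed object -->
<msg type="query" version="4"
     xmlns="http://www.hactrn.net/uris/rpki/publication-spec/">
  <list/>
</msg>

<!-- List reply from the Publication Server, also CMS-signed; one <list/>
     element per object currently held for this publisher -->
<msg type="reply" version="4"
     xmlns="http://www.hactrn.net/uris/rpki/publication-spec/">
  <list uri="rsync://rpki.example.org/repo/alice/0/example.cer"
        hash="f26eafba3d..."/>
</msg>
```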

timbru (Owner, Author) added:

To be clear: I am open to suggestions :) How about:

Because this interaction requires that the Publication Server signs an [@!RFC8181] list reply, this operation can be costly for Publication Servers that serve a large number of publishers.

New:

For Publication Servers that serve a large number (1000s) of publishers this operation could become costly, and unfortunately the [@!RFC8181] protocol has no clean support for rate limiting.

Would that work?

A reviewer replied:

Yep, I think mentioning the rate limiting consideration is a good idea. The suggested text above sounds fine to me.

Notification file the CA MAY perform this verification every minute.

If the expected files are not found to be published within a reasonable time
(let's say 5 minutes?), or if the CA recognises that there is a regression in
A reviewer commented:

I'm not sure about a suggestion like this. There will typically be a few moving parts between the publication server and the RRDP service accessed by the client, and it may not be that often that repeating the updates will fix the problem. Maybe instead something like "expect to see objects within five minutes, and if you don't, please contact the publication service operator"?

timbru (Owner, Author) replied:

As mentioned above for context, my idea was that hammering the CDN every minute would be fine.

  • wrt published within a reasonable time

But, you are right of course. There are moving parts in between, and the delay can even be 10 minutes or longer depending on the setup. And, unfortunately (another thing for my publication++ wishlist), there is no indication to the client about how long is normal here (e.g. the RFC8181 success reply could have included a hint).

So, if we include a time here then we should probably be conservative and use something like 15 minutes? But note that if the publisher does a full resync after 5 minutes, this is probably not an issue. It would just be another list request with a reply that tells the publisher that everything is there.

  • wrt CA recognises... regression

I think there is merit in monitoring the repository for changes regardless of recent publication activity. It could help publishers discover that the repository has regressed, in which case "contact publication server operator" could just be: issue a warning and do another full synchronisation (RFC8181 list and publish diff).

Does this make sense? I will think about updated text, but suggestions are welcome of course.
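For concreteness, a sketch of what the CA would be polling here: the [@!RFC8182] notification file served from the CDN. The CA can check that the session_id is unchanged and the serial does not move backwards, and that its recently published objects appear via the referenced snapshot or deltas; failing that, it would warn and fall back to the full resynchronisation described above. All values below are illustrative placeholders:

```xml
<!-- RFC 8182 notification file; the CA checks session_id continuity, that the
     serial only increases, and that its recent objects appear in the deltas -->
<notification xmlns="http://www.ripe.net/rpki/rrdp" version="1"
              session_id="9df4b597-af9e-4dca-bdda-719cce2c4e28"
              serial="1333">
  <snapshot uri="https://rrdp.example.org/9df4b597/1333/snapshot.xml"
            hash="ab66c2..."/>
  <delta serial="1333"
         uri="https://rrdp.example.org/9df4b597/1333/delta.xml"
         hash="4e11b1..."/>
</notification>
```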

A reviewer replied:

> So, if we include a time here then we should probably be conservative and use something like 15 minutes?

I think 5 minutes is better, because in the normal course of things that should be the outer limit of any delay. If it's regularly taking longer than that, then the setup on the server side needs to be improved.

> But note that if the publisher does a full resync after 5 minutes, this is probably not an issue. It would just be another list request with a reply that tells the publisher that everything is there.

Yep, that's a fair point. (I'd assumed that 'resynchronisation' meant 'delete existing objects and re-upload', which I think can happen inside a single request, so it may be worth documenting the resynchronisation process.)
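As an illustration of that resynchronisation happening inside a single request (URIs, hashes, and payload are placeholders): after comparing the [@!RFC8181] list reply against its own state, the publisher can send one query message that withdraws objects it no longer wants and (re)publishes anything missing or out of date:

```xml
<msg type="query" version="4"
     xmlns="http://www.hactrn.net/uris/rpki/publication-spec/">
  <!-- withdraw an object the server holds but the publisher no longer wants -->
  <withdraw uri="rsync://rpki.example.org/repo/alice/0/stale.roa"
            hash="dd38f1..."/>
  <!-- (re)publish an object that was missing from the list reply -->
  <publish uri="rsync://rpki.example.org/repo/alice/0/current.roa">
    ...base64 DER of the object...
  </publish>
</msg>
```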

Base automatically changed from cdn-not-mandatory to main September 25, 2024 09:28
Successfully merging this pull request may close these issues.

Server restore with loss of data should be clarified