[Confluence] Improve error handling in Confluence pagination #2610

parth-elastic · 2024-06-03T13:45:39Z

Closes #2394

Enhanced the error handling mechanism to raise an error if the exception rate exceeds 30%.
The api_call function uses retries and re-raises errors that are not handled.
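
A minimal sketch of the "retry, then re-raise" behavior described above, written as a hypothetical retry_then_raise decorator. It is not the connector's actual retry helper or api_call; the retry count, interval, and error types are assumptions for illustration.

```python
import asyncio
import functools


def retry_then_raise(retries=3, interval=0.0, retry_on=(Exception,)):
    """Hypothetical decorator: retry an async call, then re-raise the last error."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == retries:
                        # Out of attempts: surface the error instead of swallowing it.
                        raise
                    await asyncio.sleep(interval)

        return wrapper

    return decorator


attempts = {"count": 0}


@retry_then_raise(retries=3, retry_on=(ConnectionError,))
async def flaky_api_call():
    # Stand-in for an HTTP request that fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"results": []}


@retry_then_raise(retries=2, retry_on=(ConnectionError,))
async def unavailable_api_call():
    # Stand-in for an endpoint that never recovers.
    raise ConnectionError("still down")
```

Here asyncio.run(flaky_api_call()) returns {"results": []} on the third attempt, while asyncio.run(unavailable_api_call()) raises ConnectionError after two attempts. Exception types outside retry_on are not retried at all; they propagate on the first failure.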

Checklists

Pre-Review Checklist

Changes Requiring Extra Attention

Security-related changes (encryption, TLS, SSRF, etc.)
New external service dependencies added.

Related Pull Requests

Release Note

The api_call method uses retries and re-raises errors that are not handled.
Added a constant, API_FAILURE_THRESHOLD, set to 10% of the total number of indexed documents. An error is raised if the exception rate exceeds this threshold.
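
A minimal sketch of threshold-based failure accounting, assuming a hypothetical FailureTracker class. The 10% threshold comes from the release note above; the min_calls guard mirrors the MIN_API_CALL variable mentioned later in the thread, but its value here (10) is an assumption. Without such a guard, a single failure on the very first call would already count as a 100% failure rate.

```python
class FailureRateExceeded(Exception):
    """Hypothetical error raised once too large a share of API calls has failed."""


class FailureTracker:
    def __init__(self, threshold=0.10, min_calls=10):
        self.threshold = threshold  # 0.10 means 10%
        self.min_calls = min_calls  # don't judge the rate on a tiny sample
        self.total = 0
        self.failed = 0

    @property
    def failure_rate(self):
        return self.failed / self.total if self.total else 0.0

    def record(self, succeeded, error=None):
        """Record one API call outcome; raise if the failure rate is too high."""
        self.total += 1
        if not succeeded:
            self.failed += 1
        if self.total >= self.min_calls and self.failure_rate > self.threshold:
            raise FailureRateExceeded(
                f"{self.failed}/{self.total} API calls failed "
                f"({self.failure_rate:.0%} > {self.threshold:.0%})"
            ) from error
```

With threshold=0.10 and min_calls=10, nine successes followed by one failure (1/10, exactly 10%) do not raise, because the check is strictly "greater than"; a second failure (2/11, about 18%) raises FailureRateExceeded.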

seanstory

I wish you'd started a discussion about how you'd go about solving this, before making a whole PR. I feel bad that you put in this work, but I don't agree with the approach taken here. I'd prefer that you revisit the Acceptance Criteria on the linked ticket, and take the approach suggested there.

connectors/sources/confluence.py

parth-elastic · 2024-06-04T09:21:50Z

We appreciate your perspective and understand your preference for discussing the solution before implementing it. Following the Acceptance Criteria, the current approach uses retryable for error handling and re-throws errors that are not resolved. In addition, errors are managed so that a large synchronization does not fail because of minor data issues: an error is raised only once failures exceed a threshold of 30% of the indexed documents. This balances the need to detect errors quickly ("fail fast") against the need to avoid catastrophic failures in large synchronizations.

connectors/sources/confluence.py


moxarth-rathod · 2024-07-16T11:32:34Z

buildkite test this

parth-elastic · 2024-07-18T10:03:10Z

@seanstory The PR is ready for review

moxarth-rathod · 2024-08-09T05:21:32Z

buildkite test this

moxarth-rathod · 2024-08-09T05:24:17Z

@artem-shelkovnikov @seanstory We have implemented the changes. Can we have another round of review?


seanstory

Sorry for the delay, for some reason your comments did not show up in my github notifications.

This PR is still not in a mergeable state. There are still numerous comments from the last review that were not responded to or closed, and it's unclear if they were considered at all.

This implementation raises more exceptions in some spots, swallows exceptions in other spots, but other than one place does not handle the errors at all. It is unclear to me how this benefits the end user or solves the ticket it is linked to.

If it is still unclear how to move forward with this, we will close this PR and implement it internally.

connectors/sources/confluence.py

seanstory · 2024-09-13T19:40:10Z

connectors/sources/confluence.py

+        known_errors = {
+            ServerConnectionError,
+            ClientResponseError,
+            ClientPayloadError,


Why is this in the list if there's a _handle_client_payload_error() function?

To re-raise the exception if the error is not handled by _handle_client_payload_error().
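
A minimal sketch of how an error type can both appear in a list of known errors and have a dedicated handler: the list decides which errors get routed to a handler, the handler decides whether the error is recoverable, and anything it declines is re-raised. PayloadError and ResponseError are stand-ins for aiohttp's ClientPayloadError and ClientResponseError, and the retry policy shown is hypothetical.

```python
class PayloadError(Exception):
    """Stand-in for aiohttp's ClientPayloadError."""


class ResponseError(Exception):
    """Stand-in for aiohttp's ClientResponseError, carrying an HTTP status."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def handle_payload_error(exc):
    # Hypothetical policy: a truncated or malformed payload is worth retrying.
    return "retry"


def handle_response_error(exc):
    # Hypothetical policy: only 429 and 5xx responses are recoverable here.
    if exc.status == 429 or exc.status >= 500:
        return "retry"
    return None  # declined: the caller must see this error


HANDLERS = {
    PayloadError: handle_payload_error,
    ResponseError: handle_response_error,
}


def handle_or_raise(exc):
    """Dispatch exc to its handler; re-raise if there is none or it declines."""
    for error_type, handler in HANDLERS.items():
        if isinstance(exc, error_type):
            action = handler(exc)
            if action is not None:
                return action
            break
    raise exc
```

For example, handle_or_raise(ResponseError(503)) returns "retry", while ResponseError(404) or an unrelated KeyError is re-raised to the caller.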

seanstory · 2024-09-13T19:40:45Z

connectors/sources/confluence.py

+    async def _handle_client_payload_error(self, exception):
+        retry_seconds = DEFAULT_RETRY_SECONDS
+        response_headers = exception.headers or {}
+        if "Retry-After" in response_headers:


What about client payload errors makes it special and the only thing that should look at a retry-after header?

Client payload errors often include a Retry-After header to manage rate limits and server load, indicating how long to wait before retrying, similar to how we handle it for OneDrive and SharePoint Online.

But why not other error types too? For example, 429 responses should pretty much always include a Retry-After header.

You're right, 429 responses should also include a Retry-After header. However, we already handle this logic in _handle_client_errors(), which checks for the Retry-After header on 429 responses.
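
A minimal sketch of a status-agnostic Retry-After parser, so that 429s, 503s, and payload errors could share one code path. Per RFC 9110, Retry-After is either a number of seconds or an HTTP-date. The helper name and the DEFAULT_RETRY_SECONDS value are assumptions; the diff above references DEFAULT_RETRY_SECONDS, but its value is not shown.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_RETRY_SECONDS = 30  # assumed fallback value


def retry_after_seconds(headers, now=None, default=DEFAULT_RETRY_SECONDS):
    """Return how long to wait based on a Retry-After header, or the default.

    Note: a plain dict is case-sensitive, whereas aiohttp's response
    headers are case-insensitive.
    """
    value = (headers or {}).get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return int(value)  # delay-seconds form, e.g. "120"
    try:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # malformed header: fall back rather than crash
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0, int((retry_at - now).total_seconds()))
```

Because the helper only looks at headers, it does not care which exception or status code produced them; a handler for any error type that carries response headers could call it.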

seanstory · 2024-09-13T19:42:25Z

connectors/sources/confluence.py

+            except (
+                ServerConnectionError,
+                ClientResponseError,
+                ClientPayloadError,
+                Forbidden,
+                UnauthorizedException,
+                ThrottledError,
+                NotFound,
+                InternalServerError,
+                Exception,
+            ) as exception:


What's the purpose of having this whole list if it also includes Exception? Isn't this the same as

except Exception as exception:

?
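
Python matches an except clause whose target is a tuple if the raised exception is an instance of any class in that tuple, so once Exception is included, every ordinary exception matches and the other entries add nothing. A quick check, using a hypothetical caught_by helper:

```python
def caught_by(exc, exception_types):
    """Return True if an `except exception_types:` clause would catch exc."""
    try:
        raise exc
    except exception_types:
        return True
    except BaseException:
        return False


WITH_EXCEPTION = (ConnectionError, ValueError, Exception)
WITHOUT_EXCEPTION = (ConnectionError, ValueError)

for error in (ConnectionError(), ValueError(), KeyError("page"), RuntimeError()):
    wide = caught_by(error, WITH_EXCEPTION)
    narrow = caught_by(error, WITHOUT_EXCEPTION)
    # The wide tuple behaves exactly like `except Exception:`;
    # only the narrow tuple lets unrelated errors through.
    print(f"{type(error).__name__}: wide={wide}, narrow={narrow}")
```

KeyError and RuntimeError are caught by the wide tuple but not by the narrow one, which shows that listing specific types only has an effect when Exception is left out.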


parth-elastic · 2024-09-18T12:08:35Z

> This PR is still not in a mergeable state. There are still numerous comments from the last review that were not responded to or closed, and it's unclear if they were considered at all.

I've reviewed and addressed all previous comments. If I've missed anything, please let me know.

> This implementation raises more exceptions in some spots, swallows exceptions in other spots, but other than one place does not handle the errors at all. It is unclear to me how this benefits the end user or solves the ticket it is linked to.

We raise exceptions in the api_call method and handle them where the API is called; unknown exceptions are swallowed only while the failure rate stays within the threshold. If you have any questions or doubts regarding this, please let us know.

> If it is still unclear how to move forward with this, we will close this PR and implement it internally.

As per the acceptance criteria:
We handle the exceptions that can be handled in api_call, such as ServerConnectionError, ClientResponseError, and ClientPayloadError.
If those exceptions are not handled, we re-raise them wherever api_call() is called.
To balance the need to "fail fast" with the need to not crash a large sync because of a small data issue, we use the API_FAILURE_THRESHOLD and MIN_API_CALL variables, which cause the exception to be raised if the error percentage exceeds 10%.
If we are missing anything else, please let us know.
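
A minimal sketch of the call-site side of this approach: errors that api_call re-raises are caught while paginating, counted, and tolerated until the failure rate passes API_FAILURE_THRESHOLD, once at least MIN_API_CALL calls have been made. The two constant names come from the comment above, but MIN_API_CALL's value, the paginate and SyncAborted names, and the use of ConnectionError as the tolerated type are assumptions.

```python
import asyncio

API_FAILURE_THRESHOLD = 0.10  # abort if more than 10% of page requests fail
MIN_API_CALL = 10  # assumed value: don't judge the rate before this many calls


class SyncAborted(Exception):
    """Hypothetical error that ends the sync once failures exceed the threshold."""


async def paginate(api_call, page_count, tolerated=(ConnectionError,)):
    """Yield items page by page, skipping a small share of failed pages.

    Only exception types in `tolerated` are counted and skipped; anything
    else propagates immediately (fail fast on unexpected errors).
    """
    calls = failures = 0
    last_error = None
    for page in range(page_count):
        calls += 1
        try:
            items = await api_call(page)
        except tolerated as error:
            failures += 1
            last_error = error
            items = []  # skip this page but keep syncing
        if calls >= MIN_API_CALL and failures / calls > API_FAILURE_THRESHOLD:
            raise SyncAborted(
                f"{failures}/{calls} page requests failed"
            ) from last_error
        for item in items:
            yield item


async def collect(api_call, page_count):
    return [item async for item in paginate(api_call, page_count)]


async def one_bad_page(page):
    # Stand-in API: page 3 times out, every other page returns one item.
    if page == 3:
        raise ConnectionError("timeout")
    return [page]


if __name__ == "__main__":
    print(asyncio.run(collect(one_bad_page, 20)))  # page 3 is skipped
```

Unlike swallowing every unknown exception, this sketch tolerates only the types passed in tolerated; widening it to Exception would reproduce the broad catch questioned in the review.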

seanstory · 2024-10-18T15:00:29Z

I'm going to close this PR in favor of #2903. @moxarth-elastic @parth-elastic if you'd like to contribute to that branch (it still needs unit tests), please go ahead. Otherwise, when I get to it, I'll re-assign the parent issue internally, and close it.

[Confluence] Improve error handling in Confluence pagination

a46cd16

parth-elastic requested a review from a team June 3, 2024 13:45

github-actions bot added auto-backport v8.15.0.0 labels Jun 3, 2024

parth-elastic added confluence team:external and removed auto-backport v8.15.0.0 labels Jun 3, 2024

seanstory requested changes Jun 3, 2024

connectors/sources/confluence.py (outdated)

parth-elastic requested a review from seanstory June 4, 2024 09:25

Introduce threshold constants for error triggering

d9bc3d6

parth-elastic added the release_note label Jun 5, 2024

seanstory reviewed Jun 5, 2024

Addressed comments

55074f8

parth-elastic requested a review from seanstory June 10, 2024 13:03

parth-elastic added 5 commits June 10, 2024 19:03

Modify Logger and Fix pipeline

8d16015

MIN_API_CALL for error threshold handling

615f5d9

Merge branch 'main' of github.com:elastic/connectors into confluence-pagination-swallow-error

2841c50

Merge branch 'main' of github.com:elastic/connectors into confluence-pagination-swallow-error

77957b4

expanded error handling

32bb417

parth-elastic added 3 commits July 18, 2024 13:25

resolved conflict

475157d

revert changes

aa48872

revert change

4bdd8a3

parth-elastic added 4 commits August 8, 2024 12:01

resolve conflicts

2298450

resloved conflicts

e896eef

revert changes

f2e2e36

revert deleted files

1dda55f

parth-elastic and others added 3 commits August 13, 2024 11:24

Merge branch 'main' of github.com:elastic/connectors into confluence-pagination-swallow-error

c41b86e

increase queue size

1e63d05

Delete lib64

b2eb195

seanstory requested changes Sep 13, 2024

parth-elastic added 4 commits September 18, 2024 11:03

Merge branch 'main' of github.com:elastic/connectors into confluence-pagination-swallow-error

f4f0ee5

Remove redundant Exception

d57fee3

Merge branch 'confluence-pagination-swallow-error' of github.com:elastic/connectors into confluence-pagination-swallow-error

dababfd

fix lint

9c837c1

seanstory mentioned this pull request Oct 18, 2024

Stop swallowing errors when paginating #2903

Draft

7 tasks

seanstory closed this Oct 18, 2024

parth-elastic commented Jun 3, 2024 (edited)

seanstory left a comment

parth-elastic commented Jun 4, 2024

moxarth-rathod commented Jul 16, 2024

parth-elastic commented Jul 18, 2024

moxarth-rathod commented Aug 9, 2024

moxarth-rathod commented Aug 9, 2024

seanstory left a comment

seanstory Sep 13, 2024

parth-elastic Sep 18, 2024

seanstory Sep 13, 2024

parth-elastic Sep 18, 2024

seanstory Sep 20, 2024

parth-elastic Sep 23, 2024

seanstory Sep 13, 2024

parth-elastic commented Sep 18, 2024

seanstory commented Oct 18, 2024 (edited)
