[Confluence] Improve error handling in Confluence pagination #2610

Closed
parth-elastic wants to merge 22 commits

Conversation

parth-elastic
Collaborator

@parth-elastic parth-elastic commented Jun 3, 2024

Closes #2394

  • Enhanced the error-handling mechanism to raise an error when the exception rate exceeds 30%.
  • The api_call function uses retry and re-raises errors that are not handled.
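
To illustrate the second point, here is a minimal sketch of a retry wrapper that re-raises once its attempts are exhausted. It is not the connector's code: `call_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable type are all assumptions made for the example.

```python
import asyncio


async def call_with_retry(func, retries=3, delay=0.0, retry_on=(ConnectionError,)):
    """Await func() up to `retries` times; re-raise the last retryable error.

    Hypothetical helper for illustration only. Exceptions that are not
    listed in `retry_on` propagate immediately, without any retry.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return await func()
        except retry_on:
            if attempt == retries:
                # Retries exhausted: surface the error to the caller
                # instead of swallowing it.
                raise
            # Linear backoff between attempts.
            await asyncio.sleep(delay * attempt)
```

A caller would wrap a single request, for example `await call_with_retry(fetch_page)`, and decide at that call site what to do with an error that survives the retries.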

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

Changes Requiring Extra Attention

  • Security-related changes (encryption, TLS, SSRF, etc)
  • New external service dependencies added.

Related Pull Requests

Release Note

  • api_call method utilizes retry and re-raises errors if they are not handled.
  • Implemented a constant API_FAILURE_THRESHOLD set to 10% of the total number of indexed documents. An error will be raised if the exception rate surpasses this threshold.

Member

@seanstory seanstory left a comment

I wish you'd started a discussion about how you'd go about solving this, before making a whole PR. I feel bad that you put in this work, but I don't agree with the approach taken here. I'd prefer that you revisit the Acceptance Criteria on the linked ticket, and take the approach suggested there.

connectors/sources/confluence.py (outdated, resolved)
@parth-elastic
Collaborator Author

We appreciate your perspective and understand your preference for discussing the solution before implementing it. In line with the Acceptance Criteria, the current approach uses retryable for error handling and re-raises errors that are not resolved. It also manages errors so that a large sync does not fail because of minor data issues: an error is triggered only once failures exceed a threshold of 30% of the number of indexed documents. This balances the need for quick error detection ("fail fast") against the need to prevent catastrophic failures in large syncs.

connectors/sources/confluence.py (10 outdated, resolved review threads)
@moxarth-rathod
Collaborator

buildkite test this

@parth-elastic
Collaborator Author

@seanstory The PR is ready for review

@moxarth-rathod
Collaborator

buildkite test this

@moxarth-rathod
Collaborator

@artem-shelkovnikov @seanstory we have implemented the changes, can we have another round of review?

Member

@seanstory seanstory left a comment

Sorry for the delay; for some reason your comments did not show up in my GitHub notifications.

This PR is still not in a mergeable state. There are still numerous comments from the last review that were not responded to or closed, and it's unclear if they were considered at all.

This implementation raises more exceptions in some spots, swallows exceptions in other spots, but other than one place does not handle the errors at all. It is unclear to me how this benefits the end user or solves the ticket it is linked to.

If it is still unclear how to move forward with this, we will close this PR and implement it internally.

connectors/sources/confluence.py (outdated, resolved)
known_errors = {
    ServerConnectionError,
    ClientResponseError,
    ClientPayloadError,
Member

Why is this in the list if there's a _handle_client_payload_error() function?

Collaborator Author

To re-raise the exception if the error is not handled by _handle_client_payload_error().

async def _handle_client_payload_error(self, exception):
    retry_seconds = DEFAULT_RETRY_SECONDS
    response_headers = exception.headers or {}
    if "Retry-After" in response_headers:
Member

What makes client payload errors special, such that they are the only errors that should look at a Retry-After header?

Collaborator Author

Client payload errors often include a Retry-After header to manage rate limits and server load, indicating how long to wait before retrying, similar to how we handle it for OneDrive and SharePoint Online.

Member

But why not other error types too? For example, 429 responses should pretty much always include a Retry-After header.

Collaborator Author

You're right, 429 responses should also include a Retry-After header. However, we already handle this in _handle_client_errors(), which checks for the Retry-After header on 429 responses.
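
As a sketch of the reviewer's point that Retry-After handling need not be tied to one exception type, the helper below reads the header from any mapping of response headers. It is illustrative only: `retry_delay_seconds` is a hypothetical name, and the value of `DEFAULT_RETRY_SECONDS` is assumed because the thread does not show it. The header may carry either a number of seconds or an HTTP date, so both forms are parsed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_RETRY_SECONDS = 30  # assumed value; the PR's constant is not shown in this thread


def retry_delay_seconds(headers, default=DEFAULT_RETRY_SECONDS, now=None):
    """Return how long to wait before retrying, based on a Retry-After header.

    `headers` is any mapping of response headers (a plain dict here, so the
    lookup is case-sensitive). Falls back to `default` when the header is
    missing or cannot be parsed.
    """
    value = (headers or {}).get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():
        # Delay given directly in seconds, e.g. "Retry-After: 120".
        return int(value)
    try:
        # Delay given as an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative delay if the date is already in the past.
    return max(0, int((retry_at - now).total_seconds()))
```

Because the helper only needs the headers, the same call could serve a 429 handler and any other handler whose exception exposes response headers.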

Comment on lines 484 to 494
except (
    ServerConnectionError,
    ClientResponseError,
    ClientPayloadError,
    Forbidden,
    UnauthorizedException,
    ThrottledError,
    NotFound,
    InternalServerError,
    Exception,
) as exception:
Member

What's the purpose of having this whole list if it also includes Exception? Isn't this the same as

except Exception as exception:

?
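
The equivalence raised in this comment can be checked with a small sketch. The two exception classes are stand-ins assumed to subclass `Exception`, as the connector's own classes presumably do, and the function names are hypothetical.

```python
class Forbidden(Exception):
    """Stand-in for the connector's Forbidden error (assumed to subclass Exception)."""


class NotFound(Exception):
    """Stand-in for the connector's NotFound error (assumed to subclass Exception)."""


def caught_by_tuple(error):
    try:
        raise error
    except (Forbidden, NotFound, Exception) as exception:
        # Exception already matches every subclass, so the specific
        # entries in the tuple never change which errors are caught.
        return type(exception).__name__


def caught_by_base(error):
    try:
        raise error
    except Exception as exception:
        return type(exception).__name__


# Both handlers catch exactly the same errors.
for error in (Forbidden("403"), NotFound("404"), ValueError("other")):
    assert caught_by_tuple(error) == caught_by_base(error)
```

Listing specific types only changes behavior when `Exception` is left out of the tuple, or when each type gets its own `except` branch with distinct handling.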

@parth-elastic
Collaborator Author

> This PR is still not in a mergeable state. There are still numerous comments from the last review that were not responded to or closed, and it's unclear if they were considered at all.

I've reviewed and addressed all previous comments. If I've missed anything, please let me know.

> This implementation raises more exceptions in some spots, swallows exceptions in other spots, but other than one place does not handle the errors at all. It is unclear to me how this benefits the end user or solves the ticket it is linked to.

The intent is to raise exceptions in the api_call method and to handle them where the API is called; an unknown exception is swallowed only while the failure rate stays within the threshold. If anything about this is confusing or in doubt, please let us know.

> If it is still unclear how to move forward with this, we will close this PR and implement it internally.

As per the criteria:
We have tried to handle in api_call the exceptions that we can handle, such as ServerConnectionError, ClientResponseError, and ClientPayloadError.
If those exceptions are not handled, we re-raise them wherever api_call() is called.
To balance the need to "fail fast" with the need to not crash a large sync because of a small data issue, we use the API_FAILURE_THRESHOLD and MIN_API_CALL variables, which raise the exception if the error percentage is more than 10%.
If we are missing anything else, please let us know.
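
The threshold described in this comment could look roughly like the sketch below. It is not the PR's implementation: the `FailureTracker` class and `FailureRateExceeded` exception are hypothetical, the value of `MIN_API_CALL` is assumed, and the rate is measured against API calls rather than indexed documents for simplicity.

```python
API_FAILURE_THRESHOLD = 0.10  # 10%, as stated in the thread
MIN_API_CALL = 20  # assumed value; the thread names the constant but not its value


class FailureRateExceeded(Exception):
    """Hypothetical error raised when too many calls have failed."""


class FailureTracker:
    """Counts calls and failures; raises once the failure rate is too high."""

    def __init__(self, threshold=API_FAILURE_THRESHOLD, min_calls=MIN_API_CALL):
        self.threshold = threshold
        self.min_calls = min_calls
        self.calls = 0
        self.failures = 0

    def record_success(self):
        self.calls += 1

    def record_failure(self, error):
        self.calls += 1
        self.failures += 1
        # Do not judge the rate on a tiny sample: a single early failure
        # would otherwise look like a 100% failure rate.
        if self.calls < self.min_calls:
            return
        rate = self.failures / self.calls
        if rate > self.threshold:
            raise FailureRateExceeded(
                f"{self.failures} of {self.calls} calls failed"
            ) from error
```

With this shape, isolated failures in a large sync are tolerated, while a sync in which more than one call in ten fails stops with the most recent error attached as the cause.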

@seanstory
Member

seanstory commented Oct 18, 2024

I'm going to close this PR in favor of #2903. @moxarth-elastic @parth-elastic if you'd like to contribute to that branch (it still needs unit tests), please go ahead. Otherwise, when I get to it, I'll re-assign the parent issue internally, and close it.

@seanstory seanstory closed this Oct 18, 2024
Development

Successfully merging this pull request may close these issues.

confluence swallows any error that occurs during pagination
4 participants