Tag redacted document in ingest pipeline #113552

Merged
samxbr merged 6 commits into elastic:main on Sep 27, 2024

Conversation

samxbr
Contributor

@samxbr samxbr commented Sep 25, 2024

Adds a new trace_redact option to the redact processor that marks a document as redacted in the ingest pipeline. When the option is enabled and a redact processor redacts at least one field of a document, the ingest metadata field _ingest._redact._is_redacted is set to true.

Closes #94633
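
A minimal sketch of the behavior described above, not the processor's actual implementation. It assumes a plain regex in place of the processor's patterns, a fixed `<REDACTED>` replacement, and a flat map keyed by the dotted metadata path; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for a redact step; not Elasticsearch code.
final class TraceRedactSketch {
    // Metadata key named in the PR description, stored here as a flat key (assumption).
    static final String IS_REDACTED_PATH = "_ingest._redact._is_redacted";

    // Redacts every match of `pattern` in doc[field]; sets the flag only when
    // traceRedact is enabled and the field value actually changed.
    static void redact(Map<String, Object> doc, Map<String, Object> metadata,
                       String field, Pattern pattern, boolean traceRedact) {
        Object value = doc.get(field);
        if (!(value instanceof String)) {
            return;
        }
        String original = (String) value;
        String redacted = pattern.matcher(original).replaceAll("<REDACTED>");
        doc.put(field, redacted);
        if (traceRedact && original.equals(redacted) == false) {
            metadata.put(IS_REDACTED_PATH, true);
        }
    }

    public static void main(String[] args) {
        Pattern ip = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");
        Map<String, Object> doc = new HashMap<>();
        doc.put("message", "client 10.0.0.1 connected");
        Map<String, Object> metadata = new HashMap<>();

        redact(doc, metadata, "message", ip, true);

        System.out.println(doc.get("message"));             // client <REDACTED> connected
        System.out.println(metadata.get(IS_REDACTED_PATH)); // true
    }
}
```

If the pattern does not match, or if `traceRedact` is false, the metadata map is left untouched in this sketch.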

@samxbr samxbr added the >enhancement, :Data Management/Ingest Node, and v8.16.0 labels Sep 25, 2024
@elasticsearchmachine
Collaborator

Hi @samxbr, I've created a changelog YAML for you.

@samxbr samxbr changed the title Tag redacted document in ingest metadata Tag redacted document in ingest pipeline Sep 25, 2024
@samxbr
Contributor Author

samxbr commented Sep 25, 2024

@elasticmachine update branch

@samxbr samxbr marked this pull request as ready for review September 25, 2024 18:15
@elasticsearchmachine elasticsearchmachine added the Team:Data Management label Sep 25, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@samxbr samxbr requested a review from jbaiera September 25, 2024 18:17
Member

@jbaiera jbaiera left a comment

Left a couple of quick comments, but otherwise this looks good.

boolean isRedacted = fieldValue.equals(redacted) == false;

// document newly redacted
if (alreadyRedacted == false && isRedacted) {
Member

Does it matter if it's already redacted? We only skip setting this if the metadata field is true already. It doesn't look like it can ever be false. Couldn't we just overwrite the field regardless of its existing value?

Contributor Author

A redact processor may leave the document unmodified (when the pattern is not found). In that case, I think we should not set _is_redacted to true.

With multiple redact processors, if the first processor redacts the doc and sets _is_redacted to true, and the second processor does not redact the doc, the second processor should skip setting _is_redacted so that it does not override the earlier true value.
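
The two-processor scenario above can be sketched as follows. This is an illustration, not the PR's code: `guardedUpdate` mirrors the condition in the snippet under review, while `naiveOverwrite` is a hypothetical variant that stores each processor's own result, included only to show what the guard avoids.

```java
// Hypothetical illustration of how the flag evolves across processors; not Elasticsearch code.
final class RedactFlagUpdate {

    // Mirrors the reviewed condition: only flip the flag when the document is newly redacted.
    static boolean guardedUpdate(boolean alreadyRedacted, boolean isRedacted) {
        if (alreadyRedacted == false && isRedacted) {
            return true; // document newly redacted
        }
        return alreadyRedacted;
    }

    // Hypothetical naive variant: always store the latest processor's result.
    static boolean naiveOverwrite(boolean alreadyRedacted, boolean isRedacted) {
        return isRedacted;
    }

    public static void main(String[] args) {
        // First processor redacts the doc, second processor finds nothing to redact.
        boolean[] perProcessorResult = { true, false };

        boolean guarded = false;
        boolean naive = false;
        for (boolean isRedacted : perProcessorResult) {
            guarded = guardedUpdate(guarded, isRedacted);
            naive = naiveOverwrite(naive, isRedacted);
        }

        System.out.println("guarded=" + guarded + " naive=" + naive); // guarded=true naive=false
    }
}
```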

Comment on lines 219 to 220
if (traceRedact == false) return;
if (fieldValue == null) return;
Member

Style nitpick: These conditionals could be combined, but more importantly I think it's best to always put the braces in.
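
One way the suggested style could look, as a sketch only: the two early returns are merged into a single braced guard. The method name and its boolean return value are hypothetical; the final comparison follows the `fieldValue.equals(redacted) == false` check quoted earlier in the review.

```java
// Hypothetical sketch of the combined, braced guard; not the merged code.
final class TraceGuardSketch {

    // Returns true when the flag should be set for this field.
    static boolean shouldMarkRedacted(boolean traceRedact, String fieldValue, String redacted) {
        // Single guard with braces instead of two brace-less early returns.
        if (traceRedact == false || fieldValue == null) {
            return false;
        }
        return fieldValue.equals(redacted) == false;
    }

    public static void main(String[] args) {
        System.out.println(shouldMarkRedacted(true, "a@b.c", "<EMAIL>"));  // true
        System.out.println(shouldMarkRedacted(false, "a@b.c", "<EMAIL>")); // false
        System.out.println(shouldMarkRedacted(true, null, null));          // false
        System.out.println(shouldMarkRedacted(true, "same", "same"));      // false
    }
}
```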

Comment on lines 58 to 62
protected static final String REDACT_KEY = "_redact";
protected static final String IS_REDACTED_KEY = "_is_redacted";
protected static final String METADATA_PATH_REDACT = IngestDocument.INGEST_KEY + "." + REDACT_KEY;
// indicates if document has been redacted
protected static final String METADATA_PATH_REDACT_IS_REDACTED = METADATA_PATH_REDACT + "." + IS_REDACTED_KEY;
Member

Can we condense these down into one constant? It's a little hard to see exactly what the final key is.

Contributor Author

Some of these key constants are used in tests, and I didn't want to hard-code them in other places. That's why I defined them separately.

If it helps, I can add a comment giving the exact key: _ingest._redact._is_redacted.
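
A sketch of what such a comment might look like, keeping the separate constants. To stay self-contained, the local `INGEST_KEY` stands in for `IngestDocument.INGEST_KEY` and is assumed to be `"_ingest"`, consistent with the final key quoted above; the class name is hypothetical and the `protected` modifiers are dropped for brevity.

```java
// Hypothetical, self-contained copy of the constants for illustration only.
final class RedactKeysSketch {
    // Stand-in for IngestDocument.INGEST_KEY (assumed to be "_ingest").
    static final String INGEST_KEY = "_ingest";

    static final String REDACT_KEY = "_redact";
    static final String IS_REDACTED_KEY = "_is_redacted";
    static final String METADATA_PATH_REDACT = INGEST_KEY + "." + REDACT_KEY;
    // Indicates if the document has been redacted; resolves to "_ingest._redact._is_redacted".
    static final String METADATA_PATH_REDACT_IS_REDACTED = METADATA_PATH_REDACT + "." + IS_REDACTED_KEY;

    public static void main(String[] args) {
        System.out.println(METADATA_PATH_REDACT_IS_REDACTED); // _ingest._redact._is_redacted
    }
}
```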

@samxbr
Contributor Author

samxbr commented Sep 26, 2024

@elasticmachine update branch

@samxbr samxbr requested a review from jbaiera September 27, 2024 14:49
Member

@jbaiera jbaiera left a comment

LGTM

@samxbr samxbr merged commit 6917f16 into elastic:main Sep 27, 2024
15 checks passed
samxbr added a commit to samxbr/elasticsearch that referenced this pull request Sep 29, 2024
Adds a new option trace_redact in redact processor to indicate a document has been redacted in the ingest pipeline. If a document is processed by a redact processor AND any field is redacted, ingest metadata _ingest._redact._is_redacted = true will be set.

Closes elastic#94633
Comment on lines +58 to +59
protected static final String REDACT_KEY = "_redact";
protected static final String IS_REDACTED_KEY = "_is_redacted";
Member

As I'm adding those fields to the Elasticsearch specification, I'm wondering why we need underscores here. Under _ingest, we already have timestamp and pipeline, which don't use underscores.

Labels
:Data Management/Ingest Node, >enhancement, Team:Data Management, v8.16.0, v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[REDACT] Tag documents that have been redacted by the Redact Processor
5 participants