To help us get this pull request reviewed and merged quickly, please be sure to include the following items:
PR Type
What kind of change does this PR introduce?
Backward Compatibility
Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?
"Almost" -- see description below for corner cases.
Issue Linking
closes #315
What's new?
The loosey-goosey "base64-anything" matching introduced in 3.0.0 turned out to be problematic because we found several examples of things that looked like base64 encodings but weren't, and which generated new issues thanks to the extended combined alphabet. Simple tweaks looked problematic (read "nondeterministic") going forward.
This fix is a little bit more invasive:
util.find_strings_by_regex() spent most of its time throwing away short strings. Nothing in the tartufo codebase allows users to supply a different minimum length; I added a warning for folks who might be building other code on top of tartufo.
scanner.scan_entropy() used to split lines into words and scan the words individually. Since the supplied regexes will do this by themselves anyway, we eliminate the line.split() step and, with it, a pass over the input text. This required fixes to a number of unit tests, where we now generally make fewer calls on longer targets.
scanner.evaluate_entropy_string() now returns an optional issue instead of being a generator (that yielded either 1 or 0 issues). This allows its only caller to inspect the return value before passing it back to higher-level callers.
scanner.scan_entropy() now jumps through deduplication hoops.
My testing shows variations in the noise level when tartufo scans itself, but it seems like the extra overhead should be approximately balanced by the efficiency gains and we'll see a wash. Time will tell.
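For reviewers who want a feel for the shape of these changes, here is a minimal sketch, not the actual tartufo code. The pattern, the threshold, and the names evaluate_string() and scan_line() are hypothetical stand-ins, and the deduplication shown is the simplest possible version (a set of strings already seen), which may differ from what the PR does. It shows the regex running over the whole line with no line.split() pass, and an evaluator that returns an optional result so the caller can inspect it before reporting.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Set

# Hypothetical pattern and threshold, chosen only for illustration.
CANDIDATE_RE = re.compile(r"[A-Za-z0-9+/_-]{20,}={0,2}")
MIN_ENTROPY = 4.5


def shannon_entropy(text: str) -> float:
    """Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def evaluate_string(candidate: str) -> Optional[str]:
    """Return the candidate if it scores above the threshold, else None.

    Returning Optional (rather than yielding 0 or 1 items) lets the
    caller look at the result before passing it on.
    """
    if shannon_entropy(candidate) > MIN_ENTROPY:
        return candidate
    return None


def scan_line(line: str) -> List[str]:
    """Scan a whole line; the regex finds the candidates by itself."""
    seen: Set[str] = set()
    issues: List[str] = []
    for match in CANDIDATE_RE.finditer(line):  # no line.split() pass
        issue = evaluate_string(match.group())
        if issue is not None and issue not in seen:
            seen.add(issue)  # naive deduplication, an assumption
            issues.append(issue)
    return issues
```

With this shape, a line containing the same high-entropy string twice produces one issue, and a line with only short words produces none.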
Note it is still possible to get new issues (not reported by 2.x) if a string begins or ends with - or _ and the old string didn't generate an issue but the new longer string does.
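A small sketch of that corner case, under stated assumptions: OLD_RE and NEW_RE are hypothetical patterns (one limited to the standard base64 alphabet, one that also accepts - and _), and THRESHOLD is an arbitrary value picked so the effect is visible. None of these are taken from tartufo itself.

```python
import math
import re
from collections import Counter
from typing import List, Pattern

# Hypothetical patterns: the second also accepts '-' and '_'.
OLD_RE = re.compile(r"[A-Za-z0-9+/]{20,}")
NEW_RE = re.compile(r"[A-Za-z0-9+/_-]{20,}")
THRESHOLD = 5.05  # hypothetical entropy cutoff, bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of text, in bits per character."""
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def flagged(pattern: Pattern[str], line: str) -> List[str]:
    """Matches of pattern in line whose entropy exceeds THRESHOLD."""
    return [
        m.group()
        for m in pattern.finditer(line)
        if shannon_entropy(m.group()) > THRESHOLD
    ]


line = "token=-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef_"
old_hits = flagged(OLD_RE, line)  # matches the 32-character core only
new_hits = flagged(NEW_RE, line)  # match also includes the '-' and '_'
```

Here the shorter match has 32 distinct characters (5.0 bits) and stays under the cutoff, while the longer match has 34 distinct characters (about 5.09 bits) and crosses it, so only the second pattern reports an issue.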