Replies: 1 comment
This will be available via #2578 in v0.40.0 :) Thanks @Mac-lp3!
Problem
Devs should use masked PII in the lower environment.
The current SDK uses `faker` to generate masked values. However, the code generates a new masked value each time, so the same input can be masked to a different value on every call. This inconsistency can cause issues when performing SQL joins.
If luck is not on the developer's side, they may see very different results in the lower environment vs production, depending on the random values generated.
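To make the failure mode concrete, here is a minimal, dependency-free sketch. It does not use the SDK or Faker: `mask_random` is a hypothetical stand-in for an unseeded generator that returns a fresh value on every call, and `inner_join` is a naive in-memory join. The table contents are made up for illustration.

```python
import random
import string

# Stand-in for an unseeded generator: every call draws a fresh value.
_rng = random.Random()


def mask_random(value: str) -> str:
    """Return a new random 8-letter token; the input is deliberately ignored."""
    return "".join(_rng.choices(string.ascii_uppercase, k=8))


def inner_join(left, right):
    """Naive inner join on the 'first_name' key."""
    return [(a, b) for a in left for b in right
            if a["first_name"] == b["first_name"]]


names = ["MIKE", "ANNA", "JOSE"]
tbl1 = [{"id": i, "first_name": n} for i, n in enumerate(names)]
tbl2 = [{"order": i, "first_name": n} for i, n in enumerate(names)]

# Unmasked data: every row in tbl1 finds its partner in tbl2.
prod_rows = inner_join(tbl1, tbl2)

# Lower environment: each table is masked independently, row by row,
# so "MIKE" in tbl1 and "MIKE" in tbl2 end up with unrelated tokens.
dev_tbl1 = [{**row, "first_name": mask_random(row["first_name"])} for row in tbl1]
dev_tbl2 = [{**row, "first_name": mask_random(row["first_name"])} for row in tbl2]
dev_rows = inner_join(dev_tbl1, dev_tbl2)

print(len(prod_rows), len(dev_rows))
```

The unmasked join keeps all three rows, while the masked join will almost certainly return none, because nothing ties the two masked copies of the same name together.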
This is a very simple example, but imagine a highly-targeted join that relies on multiple address fields. Even with large data sets, there is a chance no records in `tbl1` will match any in `tbl2`.
This is actually something that happened to me once, and we were unable to test the business logic in the lower environment due to inconsistent masked data (the join returned 0 rows in dev, but over 12k in prod).
Solution
Give developers the option to enable masking consistency (first name `MIKE` is always masked to `TONY`).
This is possible with Faker. The seed just needs to be reapplied each time.
Thus, if the seed is the original value of the column, we can ensure the same first name is always mapped to the same masked value.
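The reseeding idea can be sketched without any dependencies. `TinyFaker` below is a hypothetical stand-in built on `random.Random`, not the real library; it only mimics the two calls that matter here. With the real Faker, `seed_instance()` on a `Faker` object plays the same role as the stand-in's method. The name pool is invented for the example.

```python
import random

# Invented pool of replacement names for the sketch.
FIRST_NAMES = ["TONY", "LISA", "OMAR", "NINA", "ERIC", "JUNE", "RAVI", "ELSA"]


class TinyFaker:
    """Hypothetical stand-in exposing two Faker-like methods."""

    def __init__(self):
        self._random = random.Random()

    def seed_instance(self, seed):
        # Reset the generator so the next draws depend only on `seed`.
        self._random.seed(seed)

    def first_name(self):
        return self._random.choice(FIRST_NAMES)


def mask_first_name(fake, original):
    """Reseed with the original column value, then generate the masked value."""
    fake.seed_instance(original)
    return fake.first_name()


fake = TinyFaker()
masked_a = mask_first_name(fake, "MIKE")
fake.first_name()  # an unrelated call in between advances the generator
masked_b = mask_first_name(fake, "MIKE")
masked_c = mask_first_name(TinyFaker(), "MIKE")  # a fresh instance agrees too

print(masked_a == masked_b == masked_c)
```

Because the generator is reset to the same state before every draw, the same original value always yields the same masked value, regardless of what was generated in between. One caveat of this sketch: with a small pool, two different originals can land on the same masked value.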
Proposal 1
Add a new property to the faker_config called `always_reseed`. Then update the `_eval` function in mapper.py to reseed the faker instance before calling the expression evaluator.
Proposal 2
Expressions are evaluated with `simpleeval`, which only allows a single statement. Let devs specify a list of commands which are run in sequence. The value returned by the last one will be used.
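Both proposals can be compared side by side in a rough sketch. Everything here is an assumption made for illustration: `mask_value` is not the SDK's `_eval`, `TinyFaker` is a stand-in for a Faker instance, the names `fake` and `value` exposed to expressions are invented, and Python's built-in `eval` is used only so the sketch runs without dependencies (it is not a safe substitute for `simpleeval`).

```python
import random

FIRST_NAMES = ["TONY", "LISA", "OMAR", "NINA", "ERIC", "JUNE", "RAVI", "ELSA"]


class TinyFaker:
    """Hypothetical stand-in for a Faker instance."""

    def __init__(self):
        self._random = random.Random()

    def seed_instance(self, seed):
        self._random.seed(seed)

    def first_name(self):
        return self._random.choice(FIRST_NAMES)


def eval_expr(expr, names):
    """Stand-in for a single-expression evaluator (not simpleeval itself)."""
    return eval(expr, {"__builtins__": {}}, names)


def mask_value(value, expressions, fake, always_reseed=False):
    """Mask one column value.

    Proposal 1: if always_reseed is set, reseed with the original value first.
    Proposal 2: run every expression in order and return the last result.
    """
    if always_reseed:
        fake.seed_instance(value)
    names = {"fake": fake, "value": value}
    result = None
    for expr in expressions:
        result = eval_expr(expr, names)
    return result


fake = TinyFaker()

# Proposal 1: a flag drives the reseed; the expression stays a single statement.
p1_first = mask_value("MIKE", ["fake.first_name()"], fake, always_reseed=True)
p1_second = mask_value("MIKE", ["fake.first_name()"], fake, always_reseed=True)

# Proposal 2: the reseed is spelled out as the first command in the list.
commands = ["fake.seed_instance(value)", "fake.first_name()"]
p2_first = mask_value("MIKE", commands, fake)
p2_second = mask_value("MIKE", commands, fake)

print(p1_first == p1_second == p2_first == p2_second)
```

In this sketch the two approaches produce the same masked value. The difference is where the control lives: Proposal 1 keeps it in configuration, while Proposal 2 lets the expression author decide what runs before the final value is generated.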