By David Bernstein, Data Privacy Engineer at Privitar

Data redaction is a data masking technique that enables you to mask (redact) data by removing or substituting all or part of the field value. This helps protect sensitive personally identifying data.

One of the first methods to protect sensitive information was to implement column-based security. Column-based security can ensure a sensitive column is not exposed to a user without the proper privileges. This method, while effective, can present issues to the calling application (like a BI tool), as it is expecting a certain number of columns to be returned from the query.

Redaction was one of the first methods to protect sensitive data while still returning a column value. Some redaction techniques can be referred to as ‘simple masking’ because they are one-way substitution schemes.

The most common use of the redaction technique is to ‘redact’ the entire column and replace it with a constant. In this method, the query returns the proper number of columns, but instead of the actual value, the column value is replaced with a constant. For example, when applying redaction to a Social Security Number (SSN), the result might be ‘N/A’ or ‘XXX-XX-XXXX.’

Another redaction technique is to have a ‘look-up’ to find a value to put in the resulting column, instead of a constant value. For example, a column ‘FirstName’ might have a value of “Susan,” and the look-up would get a name from a random list and replace it with “Cathy.”

Often, part of a sensitive column will have value. This part can be shown without exposing the entire column. We’ve all been on the phone when we are asked to verify ourselves using the ‘last 4 digits’ of our SSN. In this case, it is likely that the person asking you that question can only see the last 4 digits of your SSN. So, the redacted SSN (last 4 digits) has value in the verification process, but has been redacted enough to not be a direct identifier.

In data privacy terms, we have turned a direct identifier (SSN) into an indirect or quasi identifier (SSN last 4 digits). Because it is not a direct identifier, we are often asked another question, like the last 4 digits of our phone number, and the same methodology applies.

This technique of partial redaction goes beyond the SSN and phone number.

DOB: Simply replacing the ‘Date of Birth’ field with the ‘Year’ or the ‘Month/Year’ is a form of redaction. This could also be accomplished with other privacy techniques like generalization.

ICD-10 code: S86.011D, for example, is the code for a “strain of the right Achilles tendon.” By leaving the first letter and redacting the rest, we keep some information without the full ICD-10 code meaning. In the example above, the letter “S” designates that the diagnosis relates to “Injury, poisoning and certain other consequences of external causes.” The first three characters of the ICD-10 code above (S86) would reveal “Injury of muscle, fascia and tendon at lower leg level.” That is more information, but again, the full ICD-10 code and the exact injury are not revealed.

When designing a data privacy strategy, data redaction is often considered as a first step. This entails reviewing your sensitive data, and determining:

What sensitive data should be de-identified with redaction?
Which redaction technique should be used (full, partial, look-up)?
Once redacted, does this field still maintain the data value/utility for analysis downstream?
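
To make the three redaction techniques discussed in this article (full, partial, and look-up) a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the replacement name list, and the sample record are all hypothetical and are not taken from any particular product or library.

```python
import random
from datetime import date

# Hypothetical replacement pool for the look-up technique (an assumption).
REPLACEMENT_NAMES = ["Cathy", "Maria", "Joan", "Priya"]


def redact_full(value, constant="XXX-XX-XXXX"):
    """Full redaction: the whole value is replaced with a constant."""
    return constant


def redact_ssn_last4(ssn):
    """Partial redaction: keep only the last 4 digits of an SSN."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected an SSN with 9 digits")
    return "XXX-XX-" + "".join(digits[-4:])


def redact_icd10(code, keep=1):
    """Partial redaction: keep the first `keep` characters of an ICD-10 code."""
    return code[:keep]


def redact_dob_to_year(dob):
    """Partial redaction: reduce an ISO date (YYYY-MM-DD) to its year."""
    return str(date.fromisoformat(dob).year)


def redact_lookup(name, rng=random):
    """Look-up redaction: substitute a different name drawn from a list."""
    candidates = [n for n in REPLACEMENT_NAMES if n != name]
    return rng.choice(candidates)


if __name__ == "__main__":
    # Made-up sample record; the SSN and DOB are placeholders.
    record = {
        "FirstName": "Susan",
        "SSN": "123-45-6789",
        "DOB": "1984-07-19",
        "ICD10": "S86.011D",
    }
    redacted = {
        "FirstName": redact_lookup(record["FirstName"]),
        "SSN": redact_ssn_last4(record["SSN"]),
        "DOB": redact_dob_to_year(record["DOB"]),
        "ICD10": redact_icd10(record["ICD10"], keep=3),
    }
    print(redacted)
```

Every function returns a value for its column, so a calling application such as a BI tool still receives the number of columns it expects. How many characters of an ICD-10 code or how much of a date to keep is a utility-versus-privacy decision that depends on the downstream analysis.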