SQL Data Masking

For over 20 years Microsoft SQL Server has been one of the most widely used Relational Database Systems in the world. Countless organizations use it to support their application, data storage, reporting and analytics needs. With the world’s increasing dependence on technology, the amount of data being created on a daily basis has exploded.  The amount data created on a given day is measured in quintillions!?  How much is that? I don’t exactly know, but it’s a lot of zeros.

With the increased volume, there is a growing need for organizations to analyze this data to make business decisions.  Historically this would fall to the IT Department to create reports for End Users.  In that model it was fairly easy for DB Administrators and Security Teams to manage and monitor access.  Times have changed, no longer are departments willing to wait in a queue to have reports and data models built, business units want instant access to their data and are increasingly hiring their own data analysts to provide it for them.  While the intention is good, it has created a Data Governance “Wild West” for Security Teams.

What can Security Teams do to manage access and prevent disclosure?  Encryption is always the best option, but let’s face it not everything can be encrypted.  Legacy Applications and Third-Party applications don’t always support it and organizations may not want to invest in the development effort needed in legacy applications.  Data Loss Prevention tools can help, but they are prone to false positives and can be defeated.  One, relatively new option offered by Microsoft is Dynamic Data Masking which was introduced in SQL 2016 and is also supported with Azure SQL.

Dynamic Data Masking allows those tasked with protecting Data to Mask data at runtime, in so doing you can prevent information from being exposed as users explore data and develop new applications.  The ability to “unmask” data can be applied by User or Security Group, the one limitation however is that it is Database wide.  Meaning a user with “unmask” privilege has it for an entire DB.  Hopefully this will be addressed in future updates, but for now it’s simply: GRANT UNMASK to [USER].

While somewhat limited, when combined with SQL Views it can be quite powerful, particularly in DEV and QA environments.  Note however that Masked data is subject to Inference as the underlying value is evaluated in any WHERE clause filter.

How to Create a Data Mask:

CREATE TABLE [dbo].[UserTable](
    [ID] [int] NULL,
    [FirstName] [varchar](50) NULL,
    [LastName] [varchar](50) NULL,
    [DOB] [varchar](10) MASKED with (FUNCTION = 'partial(0,"XXXXXXX",4)'),
    [Phone] [varchar](10) NULL,
    [Address] [varchar](50) NULL,
    [City] [varchar](25) NULL,
    [State] [varchar](2) NULL,
    [Email] [varchar](50)  MASKED with (FUNCTION='email()'),
    [CreditCard] [varchar](20) MASKED with (FUNCTION = 'partial(4,"XXXXXXX",4)'),
    [Balance] [varchar](20) MASKED with (FUNCTION = 'default()')
) ON [PRIMARY]

The partial function allows you to create a custom mask. In our example the first and last four characters of the CreditCard field will be visible while the middle digits will be masked with an ‘x’.

Sample Output:

Masked Data

Incident Metrics

How To Grant Access to Unmasked Data

GRANT UNMASK to PrivilegedReader

Sample Output:

Unmasked Data

Find all masked columns by querying:

select * from sys.masked_columns

Pros:

  • Easy and inexpensive to implement

  • Low risk of breaking existing applications

  • Can secure sensitive data from users with Ad-Hoc query Access

Cons:

  • Unmask permission is at the Database Level

  •  Data is not masked at rest

  • Subject to Inference even if output is masked. (Select * where salary is >10000000)

Summary:

Masking Data is not a panacea, but it does have its place and can help with security and compliance initiatives.  It’s a great tool to add to your Defense in Depth Strategy.  Be sure to check out related topics on SQL Data Classification and SQL Encryption.

Previous
Previous

Why You Need Multi-factor Authentication

Next
Next

Classifying SQL Data