Enterprises often interpret a data security mandate as identifying configuration issues or vulnerabilities in their data infrastructure. To improve security posture, though, data security activities must extend to protecting the sensitive data assets themselves, such as customer information, trade secrets, financial information or patents. DSPM-based data classification offers a granular view that helps define adequate policies for the type, context and sensitivity of the data.
Typical labeling practices (public, internal, confidential, secret) fail to capture the nuances that separate one type of data from another, such as R&D documents versus customer payment information. In this blog post, we’ll present a set of data classification categories that can help you extract context from your data for richer and more accurate labeling.
Classification is the process of labeling and categorizing data based on the type of information it holds. Data classification helps organizations understand the value and sensitivity of their data, as well as the impact on the business if that data were exposed. This allows them to set more effective security policies.
Data classification plays a major part in improving an organization’s security posture. It’s also explicitly required by some compliance frameworks (HIPAA, SOC 2, ISO 27001) and can help streamline other GRC efforts. This can manifest in multiple ways:
Data classification is only effective if carried out consistently at a company level. Today’s complex data infrastructure means that data is often left unclassified or inadequately classified, rendering downstream policies ineffective.
Data Fragmentation
It’s challenging to discover and monitor every repository where data needs to be classified when data is spread across services in hybrid environments (cloud-based or on-premises databases, big data platforms, data lakes, collaboration systems).
Use of Unstructured Data
While structured data is queryable, its unstructured counterpart (documents, media files, PDFs and emails) requires more resources and frequent manual intervention to classify.
Shadow Data
The cloud’s elasticity lets developers spin services up and down with minimal friction. That same elasticity is a key reason data ends up unknown, undiscovered and, by implication, unclassified.
Mergers and Acquisitions
Differences in security policies, classification practices and IT architectures between two distinct business entities result in inconsistent classification and inadequate policy enforcement.
To define rich and comprehensive security policies, data must be classified based on its type, context, subject and sensitivity.
Data types are the most granular building block of classification to enable policy definition and enforcement. Some examples of data types include email addresses, social security numbers, country codes, payment card information, and the like. DSPM solutions will usually have pre-built classifiers or data types, as well as custom data types based on specific business needs.
It’s worth noting that data types can correctly classify data that would otherwise be difficult to identify with simple techniques like regular expressions. For example, not all nine-digit strings are social security numbers (SSNs), so regular expressions that search for nine-digit strings to identify SSNs may produce false positives. More advanced classification engines use context analysis, validation functions and ML/AI models to improve accuracy. This should be achieved with low resource consumption and high performance, without compromising on accuracy.
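To make the false-positive problem concrete, here is a minimal Python sketch that layers a validation function and a simple context check on top of a regular expression. The structural rules reflect values the Social Security Administration does not issue (area 000, 666 or 900-999, group 00, serial 0000). The keyword list, the 40-character window and the function names are our own assumptions for illustration, not how any particular DSPM engine works.

```python
import re

# Nine digits, optionally written as AAA-GG-SSSS.
SSN_PATTERN = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")

# Hypothetical context keywords; a real engine would use richer signals.
CONTEXT_KEYWORDS = ("ssn", "social security")


def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Validation step: reject number ranges that are never issued."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssns(text: str, window: int = 40) -> list[str]:
    """Return candidates that pass validation AND have a keyword nearby."""
    lowered = text.lower()
    hits = []
    for match in SSN_PATTERN.finditer(text):
        if not is_plausible_ssn(*match.groups()):
            continue
        start = max(0, match.start() - window)
        nearby = lowered[start:match.end() + window]
        if any(keyword in nearby for keyword in CONTEXT_KEYWORDS):
            hits.append(match.group(0))
    return hits
```

With this sketch, a nine-digit order number with no surrounding context is ignored, while the same digits next to the word "SSN" are flagged. The regex alone would have matched both.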
Simply labeling data by its type isn’t enough to derive appropriate policies, because some data types require different policies based on the business context. An email address, for example, requires different policies depending on who it belongs to and how it’s used. It can be associated with an employee or a customer, belong to someone from the US or the EU, or have a generic domain name such as @gmail.com or a sensitive one such as a government .gov domain.
Organizations can determine the context surrounding a data point by identifying metadata (e.g., timestamps, format, location) and by enriching the data - for example, by comparing it against other sources such as CRM or ERP.
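As a rough illustration of enrichment, the Python sketch below attaches context to an email address by comparing it against reference sources. The in-memory sets stand in for CRM and HR exports, and every name, address and domain list here is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical reference data standing in for CRM and HR system exports.
CRM_CUSTOMER_EMAILS = {"dana@example.com"}
HR_EMPLOYEE_EMAILS = {"lee@corp.example"}
GENERIC_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


@dataclass
class EmailContext:
    address: str
    owner: str        # "customer", "employee" or "unknown"
    domain_kind: str  # "generic", "government" or "other"


def enrich_email(address: str) -> EmailContext:
    """Attach business context to an email address via simple lookups."""
    normalized = address.strip().lower()
    domain = normalized.rsplit("@", 1)[-1]

    if normalized in CRM_CUSTOMER_EMAILS:
        owner = "customer"
    elif normalized in HR_EMPLOYEE_EMAILS:
        owner = "employee"
    else:
        owner = "unknown"

    if domain in GENERIC_DOMAINS:
        domain_kind = "generic"
    elif domain == "gov" or domain.endswith(".gov"):
        domain_kind = "government"
    else:
        domain_kind = "other"

    return EmailContext(normalized, owner, domain_kind)
```

The same data type (an email address) now carries two extra attributes that a policy can act on, for example by treating customer addresses differently from employee ones.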
Enrichment can also provide context by associating two disparate data points to extract the true value and level of sensitivity. For example, a name and address qualify as personally identifiable information and are subject to regulations such as GDPR. A name, address and credit card number stored together, however, are also subject to the Payment Card Industry Data Security Standard (PCI DSS).
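The combination logic in this example can be sketched in a few lines of Python. The field names and the mapping are simplified assumptions that mirror the example above. Real applicability depends on jurisdiction, data subjects and legal review, so treat this as an illustration of the idea rather than a compliance rule set.

```python
def applicable_regulations(fields: set[str]) -> set[str]:
    """Map a combination of detected data types to candidate regulations.

    Field names ("name", "address", "credit_card_number") are hypothetical.
    """
    regulations = set()
    # A name together with an address identifies a person (PII).
    if {"name", "address"} <= fields:
        regulations.add("GDPR")
    # Cardholder data brings PCI DSS into scope as well.
    if "credit_card_number" in fields:
        regulations.add("PCI DSS")
    return regulations
```

Calling `applicable_regulations({"name", "address"})` yields only GDPR, while adding `"credit_card_number"` to the set yields both GDPR and PCI DSS.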
DSPM tools can automate the data classification process to identify and enrich data points with business, privacy and security attributes such as location, how the data was generated, modifications, residency, retention period and applicable laws.
Some kinds of sensitive data can’t be accurately identified by predefined data types. For example, a contract might not match a specific PII pattern but still be considered sensitive because it contains trade secrets or intellectual property.
Sensitive data may be created and stored in a variety of file formats. The file’s subject offers a great deal of information about the type of data it holds. For example, these can be contracts, resumes, hospital discharge forms, patents, IT architecture documents, and even database tables.
Defining policies according to file subjects is both intuitive and rich. For example, IT architecture documents are entirely reserved for senior IT staff, such as architects. These are also highly sensitive documents, and any leaks would pose major cybersecurity concerns.
One challenge in using file subjects to define security policies is the inconsistency of naming conventions. For example, job applications may have associated files that can take multiple forms, such as ‘FirstName-LastName-Resume’ or ‘FirstName-LastName-CV,’ or even just ‘FirstName-LastName.’ Mature DSPM solutions can accurately classify these types of data across inconsistent naming conventions.
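One way to cope with inconsistent file names is to look at the content first and fall back to the name only as a hint. The Python sketch below does this with keyword counting. The keyword lists, the threshold and the function name are hypothetical, and production classifiers typically rely on far richer models than this.

```python
import re

# Hypothetical keyword lists per file subject.
SUBJECT_KEYWORDS = {
    "resume": ("work experience", "education", "skills", "references"),
    "contract": ("hereinafter", "governing law", "termination", "the parties"),
}

# File-name tokens that hint at a resume when the content is inconclusive.
RESUME_NAME_TOKENS = {"resume", "cv"}


def classify_subject(filename: str, text: str, min_hits: int = 2) -> str:
    """Guess a file's subject from its content, then from its name."""
    lowered = text.lower()
    best_subject, best_hits = "unknown", 0
    for subject, keywords in SUBJECT_KEYWORDS.items():
        hits = sum(keyword in lowered for keyword in keywords)
        if hits > best_hits:
            best_subject, best_hits = subject, hits
    if best_hits >= min_hits:
        return best_subject

    # Fallback: split the file name into tokens and look for hints.
    tokens = set(re.split(r"[^a-z0-9]+", filename.lower()))
    if tokens & RESUME_NAME_TOKENS:
        return "resume"
    return "unknown"
```

Under these assumptions, a file named only ‘FirstName-LastName’ is still classified as a resume when its content contains enough resume-like sections.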
Standards organizations, such as the International Organization for Standardization (ISO) and the National Institute of Standards and Technology (NIST), advise against practices that treat all data equally: Organizations are mandated by regulation to classify data and label data sensitivity, based on the contents of the data. The risk related to a specific dataset or record is determined based on the sensitivity and level of exposure.
Classifying data can help organizations determine the sensitivity levels associated with their data assets. Sensitivity is often determined by the consequences of exposing the data.
Additionally, sensitivity is determined by the breadth and depth of the affected data. For example, a shallow and narrow dataset might include just a list of first and family names. While this is considered PII, the impact of having this data compromised is low, and as such, the sensitivity is also low. As the information gets richer, such as adding a billing address, card number, transactions and the location of the transaction, the impact and associated sensitivity become much higher.
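The idea that sensitivity grows as a record gets richer can be sketched as a simple weighted score in Python. The field names, weights and thresholds below are made up for illustration. An actual scheme would be calibrated to the organization’s own impact assessment.

```python
# Hypothetical per-field weights; higher means more damaging if exposed.
FIELD_WEIGHTS = {
    "first_name": 1,
    "family_name": 1,
    "billing_address": 2,
    "card_number": 4,
    "transactions": 3,
    "transaction_location": 2,
}


def sensitivity_level(fields: list[str]) -> str:
    """Bucket a record's sensitivity by the combined weight of its fields."""
    # Deduplicate so a repeated field is not counted twice.
    score = sum(FIELD_WEIGHTS.get(field, 0) for field in set(fields))
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"
```

A list of names alone scores low, while the same names combined with a billing address, card number and transaction history land in the high bucket.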
Microsoft Information Protection is a system applicable to the whole Microsoft estate (as well as non-Microsoft resources) that assigns sensitivity labels to documents such as emails, Word documents, and spreadsheets. These labels are customizable by each customer, but default to the following:
Each label has additional security measures, such as encryption and read access controls, as well as restrictions on sharing files via email or uploading them to file servers or storage services. Of these labels, the default assigned whenever a document is created is ‘general.’
Besides the default label assignment when a document is created, the MIP labels are static, meaning that any changes to the labels are often made manually or via limited automations, without adequate consideration of the content of the document. This is an issue when a collaborative document labeled as ‘general’ has confidential information added to it without a label change.
A mature DSPM solution can read and interpret the contents of an MIP-labeled document to alert the security teams of the mislabeled file and suggest an adequate sensitivity level.
For insights into cloud security and a better understanding of how your data is exposed in the cloud, read our comprehensive State of Cloud Data Security 2023 report. This research sheds light on crucial aspects of cloud data security and provides actionable steps to effectively defend your valuable data.
And if you haven’t tried Prisma Cloud, take it for a test drive with a free 30-day Prisma Cloud trial.