Introduction
Classification management is the systematic process of organizing data into specific categories based on predefined criteria such as sensitivity, type, regulatory compliance, usage, and/or other categories useful to the business. Strong classification management practices within your organization are essential for several reasons:
- Data security: Classify data by its sensitivity and importance so you can implement security measures that protect against unauthorized access and potential breaches.
- Regulatory compliance: Comply with key regulations by ensuring sensitive data is handled according to legal and industry standards, such as HIPAA.
- Risk management: Identify critical data types and apply stringent protections to reduce the risk of data loss or exposure.
- Efficiency and cost savings: Reduce storage and backup costs by eliminating redundant data and streamlining the search and retrieval process for more efficient data management.
- Data lifecycle management: Ensure data at each stage is managed (e.g., stored, accessed, shared, archived, and deleted) according to its classification level.
- Data accessibility and retrieval: Make finding and retrieving essential data easier, particularly important for legal discovery and responding to information requests.
- Data protection policies: Provide a clear understanding of data sensitivity and necessary controls to drive effective data protection policies and data loss prevention strategies.
Read on to learn how Netskope One DSPM handles data sampling and classification, including built-in and custom sensitive data type (SDT) classifiers, confidence scoring, built-in and custom data tags, and possible use cases.
How Netskope One DSPM Classifies Data
Netskope One DSPM samples data from scanned data stores to identify and classify sensitive data. Across data types, classification is based on a combination of heuristic signals, including but not limited to:
- Keywords
- Field or file names
- Field or file content
- Dictionaries
- Proximity
- Query logs
- Regex
- Checksums
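As an illustration of how several of these signals can reinforce each other, here is a minimal sketch for credit card numbers that combines a field-name keyword, a regex, and a checksum. The pattern, the keyword list, and the function names are assumptions made for this example and are not Netskope One DSPM's actual rules.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Hypothetical keywords for the field-name signal.
CARD_KEYWORDS = ("card", "cc_num", "credit")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def card_signals(field_name: str, value: str) -> dict:
    """Report which signals fire for one field name and one sampled value."""
    match = CARD_PATTERN.search(value)
    return {
        "keyword": any(word in field_name.lower() for word in CARD_KEYWORDS),
        "regex": match is not None,
        "checksum": match is not None and luhn_valid(match.group()),
    }


print(card_signals("credit_card", "4111 1111 1111 1111"))
print(card_signals("notes", "1234 5678 9012 3456"))
```

The second call shows why a checksum is useful alongside a regex: the value has the shape of a card number, but it fails Luhn validation, so only the regex signal fires.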
Sampling for Structured Data
By default, Netskope One DSPM samples structured data by obtaining 50 non-null samples from each data store field and matching them against the heuristic signals described above. From these matches, the process determines whether the field contains sensitive data, its type and sensitivity level, and the resulting confidence score. You can configure the scan frequency and schedule when connecting the data store. Sampling does not impact data store performance.
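The per-field sampling idea can be sketched as follows. This is a simplified illustration, not the product's implementation: it takes the first 50 non-null values rather than assuming any particular selection strategy, and the helper names and the SSN-like pattern are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Optional

SAMPLE_SIZE = 50  # default number of non-null samples per field, per the text above


def sample_non_null(values: Iterable[Optional[str]], limit: int = SAMPLE_SIZE) -> List[str]:
    """Collect up to `limit` non-null values from a field, in the order given."""
    samples = []
    for value in values:
        if value is None:
            continue
        samples.append(value)
        if len(samples) == limit:
            break
    return samples


def match_fraction(samples: List[str], matcher: Callable[[str], bool]) -> float:
    """Fraction of samples accepted by a matcher (0.0 when there are no samples)."""
    if not samples:
        return 0.0
    return sum(1 for sample in samples if matcher(sample)) / len(samples)


# Hypothetical matcher: a value shaped like a US Social Security number.
def looks_like_ssn(value: str) -> bool:
    return re.fullmatch(r"\d{3}-\d{2}-\d{4}", value) is not None


column = [None, "123-45-6789", None, "n/a"]
samples = sample_non_null(column)
print(samples, match_fraction(samples, looks_like_ssn))
```

A fraction like this could feed into a field-level decision, since a field where most samples match is a stronger candidate than one with a single stray match.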
Sampling for Unstructured Data
For unstructured data, you can specify the scan frequency, schedule, and sampling rate when connecting the data store. The sampling rate is the percentage of files in a bucket that Netskope One DSPM receives per scan. Sampling works at the file level: scanned files are matched against the relevant heuristic signals to determine whether each file contains sensitive data, its type and sensitivity level, and the resulting confidence score. Sampling does not impact data store performance.
Read more about customizing the sampling rate when connecting AWS S3 and Google Cloud Storage unstructured data stores.
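To make the sampling rate concrete, the sketch below selects a percentage of object keys from a bucket listing. The function name, the random selection, and the rounding-up behavior are assumptions for illustration. The source does not describe how Netskope One DSPM chooses which files fall within the rate.

```python
import math
import random
from typing import List, Optional


def sample_files(file_keys: List[str], sampling_rate_percent: float,
                 seed: Optional[int] = None) -> List[str]:
    """Pick roughly `sampling_rate_percent` of the files, without repeats."""
    if not 0 < sampling_rate_percent <= 100:
        raise ValueError("sampling_rate_percent must be in the range (0, 100]")
    # Round up so that a small bucket still yields at least one file.
    count = math.ceil(len(file_keys) * sampling_rate_percent / 100)
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return rng.sample(file_keys, count)


keys = [f"reports/file_{i}.txt" for i in range(200)]
chosen = sample_files(keys, sampling_rate_percent=10, seed=1)
print(len(chosen))
```

With 200 files and a 10% rate, this selects 20 files per scan. A higher rate trades longer scans for broader coverage.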
Supported File Types for Unstructured Data Store Scanning
The following file types are currently supported for unstructured data classification:
File Category | Supported Extensions |
---|---|
Image Files | .png, .jpeg, .jpg |
Archive Files | .zip, .tar, .tar.gz |
Plain Text Files | .txt, .pem, .crt, .cer, .key, .p7b, .p7c |
Other Files | 1 Text portions only |
If a scanned data store contains files without an identifiable file type, “Unknown” will display within the Classifiable File Types field.
Sensitive Data Types
Netskope One DSPM uses the following built-in Sensitive Data Type classifiers and categories to classify sensitive data. Each category below lists its classifiers with a description and default sensitivity level.
Direct Identifiers (14)
Classifier Name | Description | Default Sensitivity Level |
---|---|---|
Name | Indicates the presence of an individual's name | Medium |
Email | Indicates the presence of an email | Medium |
Email (Masked) | Indicates the presence of a masked email | Medium |
Phone Number | Indicates the presence of a telephone number | Medium |
Phone Number (Masked) | Indicates the presence of a masked telephone number | Medium |
Address | Indicates the presence of a physical address | Medium |
Drivers License Number | Indicates the presence of a driver's license number | Medium |
Drivers License Number (Masked) | Indicates the presence of a masked driver's license number | Medium |
Social Security Number | Indicates the presence of a social security number | High |
Social Security Number (Masked) | Indicates the presence of a masked social security number | Medium |
Birth Date | Indicates the presence of a birth date | Medium |
Birth Date (Masked) | Indicates the presence of a masked birth date | Medium |
Birth Certificate Number | Indicates the presence of a birth certificate number | Medium |
International Passport Number | Indicates the presence of a passport number | High |
India Permanent Account Number | Indicates the presence of an India Permanent Account Number (PAN) | High |
India Unique Identification Number | Indicates the presence of an India Unique Identification Number (known as Aadhaar) | High |
India Universal Account Number | Indicates the presence of an India Universal Account Number | High |
India Voter ID (EPIC) | Indicates the presence of Elector's Photo Identity Card (EPIC) number | High |
India Tax Deduction and Collection | Indicates the presence of The Tax Deduction and Collection Account Number (TAN) | High |
India Goods and Services Tax | Indicates the presence of The Goods and Services Tax Identification Number (GSTIN) | High |
Indirect Identifiers (4)
Classifier Name | Description | Default Sensitivity Level |
---|---|---|
IP Address | Indicates the presence of an IP address | Low |
MAC Address | Indicates the presence of a MAC address | Low |
Vehicle Identification Number | Indicates the presence of a vehicle identification number | Medium |
Gender | Indicates the presence of gender | Medium |
Leverage direct and indirect identifier Sensitive Data Types to drive GDPR and CCPA compliance.
Financial Information (8)
Classifier Name | Description | Default Sensitivity Level |
---|---|---|
Credit Card Number | Indicates the presence of a credit card number | High |
Credit Card Number (Masked) | Indicates the presence of a masked credit card number | Medium |
International Bank Account Number | Indicates the presence of bank account numbers | Medium |
Bank Routing Number | Indicates the presence of bank routing numbers | Medium |
SWIFT Code | Indicates the presence of a SWIFT code | Medium |
Tax Identification Number | Indicates the presence of a tax identification number | Medium |
Legal Entity Identifier (LEI) | Indicates the presence of a Legal Entity Identifier | Medium |
Committee on Uniform Securities Identification Procedures numbers (CUSIP) | Indicates the presence of a CUSIP number | Medium |
Leverage financial information Sensitive Data Types to drive PCI DSS and SOX compliance.
Health Information (12)
Classifier Name | Description | Default Sensitivity Level |
---|---|---|
A1C Measurement | Indicates the presence of an A1C test result | Medium |
Blood Pressure | Indicates the presence of a blood pressure test result | Medium |
Body Mass Index | Indicates the presence of a BMI number | Medium |
Glucose Levels | Indicates the presence of a glucose level test result | Medium |
Health Plan Beneficiary Number | Indicates the presence of a health plan number | Medium |
Height | Indicates the presence of an individual's height | Medium |
ICD-9 Code | Indicates the presence of an ICD-9 code | Medium |
ICD-10 Code | Indicates the presence of an ICD-10 code | Medium |
Cholesterol Level | Indicates the presence of an LDL cholesterol test result | Medium |
Medication | Indicates the presence of a medication name | Medium |
National Drug Code | Indicates the presence of an NDC identifier | Medium |
Pulse | Indicates the presence of a pulse rate | Medium |
Leverage health information Sensitive Data Types to drive HIPAA and FERPA compliance.
Credentials (10)
Classifier Name | Description | Default Sensitivity Level |
---|---|---|
AWS Access Key | Indicates the presence of an Amazon Web Services access key | Medium |
Azure Access Token | Indicates the presence of a Microsoft Azure access token | Medium |
Certificate & Private Key | Indicates the presence of a certificate or private key | Medium |
Databricks Access Token | Indicates the presence of a Databricks access token | Medium |
Google API Key | Indicates the presence of a Google API key | Medium |
Heroku Access Token | Indicates the presence of a Heroku access token | Medium |
MongoDB Access Token | Indicates the presence of a MongoDB access token | Medium |
Password Hash | Indicates the presence of a password hash | High |
Snowflake Access Token | Indicates the presence of a Snowflake access token | Medium |
User Name | Indicates the presence of a user name | Medium |
Leverage credentials Sensitive Data Types to mitigate risk for data breaches.
Customization
You can adapt the classification process described above to your organization's needs by customizing tags, sensitivity levels, scan frequency, scheduling, and sampling rates. You can also define custom regex and context words to improve classification accuracy.
Read more about using Custom Sensitive Data Types to classify and protect sensitive data types specific to your organization.
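The following sketch shows how a custom regex and context words might work together. The `CustomSdt` class, its fields, the 40-character window, and the employee ID format are all hypothetical and do not reflect the product's configuration schema.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CustomSdt:
    """Hypothetical stand-in for a custom Sensitive Data Type definition."""
    name: str
    pattern: str
    context_words: Tuple[str, ...]
    sensitivity: str = "Medium"
    window: int = 40  # characters searched on each side of a regex match


def find_matches(sdt: CustomSdt, text: str) -> List[Dict[str, object]]:
    """Return each regex match and whether a context word appears nearby."""
    results = []
    for match in re.finditer(sdt.pattern, text):
        start = max(0, match.start() - sdt.window)
        nearby = text[start:match.end() + sdt.window].lower()
        results.append({
            "value": match.group(),
            "has_context": any(word.lower() in nearby for word in sdt.context_words),
        })
    return results


# Hypothetical organization-specific identifier.
employee_id = CustomSdt(
    name="Employee ID",
    pattern=r"\bEMP-\d{6}\b",
    context_words=("employee", "badge"),
    sensitivity="High",
)

print(find_matches(employee_id, "Badge issued to EMP-004217 on Monday"))
print(find_matches(employee_id, "Order ref EMP-123456 shipped"))
```

In this sketch, a match with a nearby context word is a stronger candidate than a bare pattern match, which is one way context words can reduce false positives.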
Confidence Scoring
Each identified Sensitive Data Type has an associated confidence score on the Classification Management page.

The confidence score represents the likelihood that the field or file contains the Sensitive Data Type predicted by Netskope One DSPM. Specifically, it measures how strongly the field or file matches the assigned Sensitive Data Type. The score is calculated from several signals, including data patterns, domain-specific validation, metadata, and more, and it increases as these features produce more matches against our trained classifier model. We recommend using the confidence score when evaluating your classification results to identify possible false positives and negatives.
If you manually mark a field as reviewed, the confidence score is set to 100%, because the application assumes your review takes precedence over any scoring the system performs.
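To show the general shape of signal-based scoring and the manual review override, here is a deliberately simplified sketch. The product uses a trained classifier model. The fixed weights, signal names, and function below are assumptions chosen only to illustrate that more matching signals produce a higher score and that a manual review sets the score to 100%.

```python
from typing import Dict

# Hypothetical weights in percentage points; the real score comes from a trained model.
SIGNAL_WEIGHTS = {"data_pattern": 50, "validation": 30, "metadata": 20}


def confidence_score(signal_strengths: Dict[str, float],
                     manually_reviewed: bool = False) -> int:
    """Return a confidence percentage from 0 to 100."""
    if manually_reviewed:
        # A human review takes precedence over any computed score.
        return 100
    total = 0.0
    for signal, weight in SIGNAL_WEIGHTS.items():
        # Clamp each strength to the range 0.0 to 1.0 before weighting.
        strength = min(max(signal_strengths.get(signal, 0.0), 0.0), 1.0)
        total += weight * strength
    return round(total)


pattern_only = confidence_score({"data_pattern": 1.0})
pattern_and_validation = confidence_score({"data_pattern": 1.0, "validation": 1.0})
reviewed = confidence_score({"data_pattern": 0.2}, manually_reviewed=True)
print(pattern_only, pattern_and_validation, reviewed)
```

A low computed score on a field that you know is sensitive, or a high score on one you know is not, is a reasonable prompt to review the field manually.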
Additional Documentation
To learn more about how to fully utilize our Classification capabilities, please read the articles below:
Using the Classification Management Page
Creating a Custom Sensitive Data Type
Managing Custom Sensitive Data Types
Using Regular Expressions in Custom Sensitive Data Types
Using Data Dictionaries in Custom Sensitive Data Types