Classification Management Overview

Introduction

Classification management is the systematic process of organizing data into specific categories based on predefined criteria such as sensitivity, type, regulatory requirements, usage, or other criteria useful to the business. Strong classification management practices within your organization are essential for several reasons:

Data security: Implement security measures that protect against unauthorized access and potential breaches by classifying data according to its sensitivity and importance.
Regulatory compliance: Comply with key regulations by ensuring sensitive data is handled according to legal and industry standards, such as HIPAA.
Risk management: Identify critical data types and apply stringent protections to reduce the risk of data loss or exposure.
Efficiency and cost savings: Reduce storage and backup costs by eliminating redundant data and streamlining the search and retrieval process for more efficient data management.
Data lifecycle management: Ensure data is managed at each stage (e.g., stored, accessed, shared, archived, and deleted) according to its classification level.
Data accessibility and retrieval: Make essential data easier to find and retrieve, which is particularly important for legal discovery and for responding to information requests.
Data protection policies: Provide a clear understanding of data sensitivity and necessary controls to drive effective data protection policies and data loss prevention strategies.

Read on to learn how Netskope One DSPM handles data sampling and classification, including built-in and custom sensitive data type (SDT) classifiers, confidence scoring, built-in and custom data tags, and possible use cases.

How Netskope One DSPM Classifies Data

Netskope One DSPM samples data from scanned data stores to identify and classify sensitive data. Across data types, classification is based on a combination of heuristic signals, including but not limited to:

Keywords
Field or file names
Field or file content
Dictionaries
Proximity
Query logs
Regex
Checksums
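To illustrate how several of these signals can work together for one Sensitive Data Type, here is a minimal sketch for credit card numbers that evaluates three of them: field names, regex, and checksums (the Luhn algorithm). The keyword set, the pattern, and the function names `luhn_valid` and `card_signals` are hypothetical; this is not Netskope One DSPM's actual rule set.

```python
import re

# Hypothetical signals for one Sensitive Data Type (credit card number).
# Keywords, pattern, and output shape are illustrative assumptions only.
CARD_PATTERN = re.compile(r"^\d{13,19}$")        # regex signal
FIELD_KEYWORDS = {"card", "cc", "ccnum", "pan"}  # field-name keyword signal


def luhn_valid(digits: str) -> bool:
    """Checksum signal: True if the digit string passes the Luhn check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_signals(field_name: str, value: str) -> dict[str, bool]:
    """Report which signals fire for one sampled value of one field."""
    # Split names like "customer_card" into tokens before keyword matching.
    tokens = set(re.split(r"[^a-z0-9]+", field_name.lower()))
    normalized = re.sub(r"[ -]", "", value)  # drop spaces and dashes
    regex_hit = CARD_PATTERN.match(normalized) is not None
    return {
        "field_name": bool(tokens & FIELD_KEYWORDS),
        "regex": regex_hit,
        "checksum": regex_hit and luhn_valid(normalized),
    }
```

For example, `card_signals("customer_card", "4111 1111 1111 1111")` reports all three signals as matching, while changing the last digit leaves the regex signal matching but fails the checksum. Tokenizing the field name avoids matching `pan` inside unrelated names such as `company`.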

Sampling for Structured Data

Netskope One DSPM's default sampling and classification process for structured data obtains 50 non-null samples from each data store field and matches them against the heuristic signals described above. From these matches, it determines whether the field contains sensitive data, the data's type and sensitivity level, and the resulting confidence score. This approach is configurable: you can set the scan frequency and schedule when connecting the data store. Sampling does not impact data store performance.
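The per-field sampling step can be sketched as follows, assuming the first 50 non-null values in scan order are taken (the text above does not say how the 50 are chosen). The helpers `sample_field` and `match_ratio` are hypothetical; `match_ratio` shows one simple way to summarize how many sampled values a classifier accepts.

```python
from typing import Callable, Iterable, Optional

SAMPLE_SIZE = 50  # default number of non-null samples per field, per the text above


def sample_field(values: Iterable[Optional[str]], size: int = SAMPLE_SIZE) -> list[str]:
    """Collect up to `size` non-null values from one field.

    Taking the first non-null values in scan order is an assumption.
    """
    samples: list[str] = []
    for value in values:
        if len(samples) >= size:
            break
        if value is None:
            continue  # nulls are skipped and do not count toward the sample
        samples.append(value)
    return samples


def match_ratio(samples: list[str], matcher: Callable[[str], bool]) -> float:
    """Fraction of sampled values accepted by one classifier's matcher."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if matcher(s)) / len(samples)
```

Skipping nulls before counting ensures a sparsely populated field still yields a full sample of real values to classify.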

Sampling for Unstructured Data

For unstructured data, you can specify a sampling rate in addition to the scan frequency and schedule when connecting the data store. The sampling rate is the percentage of files in a bucket that Netskope One DSPM samples during each scan. Sampling works at the file level: each sampled file is matched against the relevant heuristic signals to determine whether it contains sensitive data, the data's type and sensitivity level, and the resulting confidence score. Sampling does not impact data store performance.

Read more about customizing the sampling rate when connecting AWS S3 and Google Cloud Storage unstructured data stores.
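As a rough illustration of what a sampling rate means in practice, the sketch below selects a percentage of a bucket's object keys for one scan. Random selection with a fixed seed, rounding up, and the function name `select_files` are assumptions made for the example; the product's actual selection strategy is not described here.

```python
import math
import random


def select_files(keys: list[str], sampling_rate: float, seed: int = 0) -> list[str]:
    """Pick `sampling_rate` percent of the file keys for one scan.

    sampling_rate is a percentage in (0, 100]. Random selection with a
    fixed seed is an illustrative assumption.
    """
    if not 0 < sampling_rate <= 100:
        raise ValueError("sampling_rate must be in (0, 100]")
    # Round up so a small bucket with a low rate still gets at least one file.
    count = math.ceil(len(keys) * sampling_rate / 100)
    return random.Random(seed).sample(keys, count)
```

With 1,000 files and a 10% sampling rate, this sketch picks 100 distinct files; at 100% it scans every file.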

Supported File Types for Unstructured Data Store Scanning


The following file types are currently supported for unstructured data classification:

File Category	Supported Extensions
Image Files	`.png`, `.jpeg`, `.jpg`
Archive Files	`.zip`, `.tar`, `.tar.gz`
Plain Text Files	`.txt`, `.pem`, `.crt`, `.cer`, `.key`, `.p7b`, `.p7c`
Other Files	`.avro`, `.csv`, `.doc`, `.docx`, `.eml`, `.htm`, `.html`, `.js`, `.json`, `.jsonl`, `.parquet`, `.pdf`, `.ppt`¹, `.pptx`¹, `.tsv`, `.xls`, `.xlsx`, `.xml`, `.yaml`, `.yml`

¹Text portions only.

If a scanned data store contains files without an identifiable file type, “Unknown” will display within the Classifiable File Types field.
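A minimal sketch of mapping object keys to the categories in the table above by file extension. Matching on extension alone, returning `"Unknown"` only for names with no extension, and returning `None` for extensions that are present but not in the table are all assumptions; the function name `file_category` is hypothetical, and the product may identify file types differently.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension-to-category map taken from the supported file types table above.
FILE_CATEGORIES = {
    "Image Files": {".png", ".jpeg", ".jpg"},
    "Archive Files": {".zip", ".tar", ".tar.gz"},
    "Plain Text Files": {".txt", ".pem", ".crt", ".cer", ".key", ".p7b", ".p7c"},
    "Other Files": {
        ".avro", ".csv", ".doc", ".docx", ".eml", ".htm", ".html", ".js",
        ".json", ".jsonl", ".parquet", ".pdf", ".ppt", ".pptx", ".tsv",
        ".xls", ".xlsx", ".xml", ".yaml", ".yml",
    },
}


def file_category(key: str) -> Optional[str]:
    """Map an object key to a supported category by its extension.

    Returns "Unknown" when the name has no extension and None when the
    extension exists but is not in the table (both are assumptions).
    """
    suffixes = [s.lower() for s in PurePosixPath(key).suffixes]
    if not suffixes:
        return "Unknown"
    # Check the two-part suffix first so "x.tar.gz" matches ".tar.gz".
    for ext in ("".join(suffixes[-2:]), suffixes[-1]):
        for category, extensions in FILE_CATEGORIES.items():
            if ext in extensions:
                return category
    return None
```

Checking the two-part suffix first is what lets `.tar.gz` archives be recognized even though `.gz` on its own is not listed.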

Sensitive Data Types

Netskope One DSPM uses the following built-in Sensitive Data Type classifiers, grouped into categories, to classify sensitive data. The tables below list each category's classifiers with their descriptions and default sensitivity levels.

Direct Identifiers (20)

Classifier Name	Description	Default Sensitivity Level
Name	Indicates the presence of an individual's name	Medium
Email	Indicates the presence of an email	Medium
Email (Masked)	Indicates the presence of a masked email	Medium
Phone Number	Indicates the presence of a telephone number	Medium
Phone Number (Masked)	Indicates the presence of a masked telephone number	Medium
Address	Indicates the presence of a physical address	Medium
Drivers License Number	Indicates the presence of a driver's license number	Medium
Drivers License Number (Masked)	Indicates the presence of a masked driver's license number	Medium
Social Security Number	Indicates the presence of a social security number	High
Social Security Number (Masked)	Indicates the presence of a masked social security number	Medium
Birth Date	Indicates the presence of a birth date	Medium
Birth Date (Masked)	Indicates the presence of a masked birth date	Medium
Birth Certificate Number	Indicates the presence of a birth certificate number	Medium
International Passport Number	Indicates the presence of a passport number	High
India Permanent Account Number	Indicates the presence of an India Permanent Account Number (PAN)	High
India Unique Identification Number	Indicates the presence of an India Unique Identification Number (known as Aadhaar)	High
India Universal Account Number	Indicates the presence of an India Universal Account Number	High
India Voter ID (EPIC)	Indicates the presence of an Elector's Photo Identity Card (EPIC) number	High
India Tax Deduction and Collection	Indicates the presence of a Tax Deduction and Collection Account Number (TAN)	High
India Goods and Services Tax	Indicates the presence of a Goods and Services Tax Identification Number (GSTIN)	High
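Several direct identifiers come in masked and unmasked variants with different default sensitivity levels. The sketch below shows one way to tell a Social Security Number from its masked form. The patterns, the assumption that `*` or `X` is the masking character, and the function name `classify_ssn` are hypothetical; the unmasked pattern rejects numbers the Social Security Administration does not issue (area 000, 666, or 900 to 999; group 00; serial 0000).

```python
import re
from typing import Optional

# Hypothetical patterns; "*" or "X" as the masking character is an assumption.
SSN = re.compile(r"^(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}$")
SSN_MASKED = re.compile(r"^[*Xx]{3}-[*Xx]{2}-\d{4}$")

# Default sensitivity levels from the Direct Identifiers table above.
DEFAULT_SENSITIVITY = {
    "Social Security Number": "High",
    "Social Security Number (Masked)": "Medium",
}


def classify_ssn(value: str) -> Optional[tuple[str, str]]:
    """Return (classifier name, default sensitivity), or None if no match."""
    v = value.strip()
    if SSN.match(v):
        name = "Social Security Number"
    elif SSN_MASKED.match(v):
        name = "Social Security Number (Masked)"
    else:
        return None
    return name, DEFAULT_SENSITIVITY[name]
```

Separating the two variants matters because a masked value exposes only the last four digits, which is why the table assigns it a lower default sensitivity.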

Indirect Identifiers (4)

Classifier Name	Description	Default Sensitivity Level
IP Address	Indicates the presence of an IP address	Low
MAC Address	Indicates the presence of a MAC address	Low
Vehicle Identification Number	Indicates the presence of a vehicle identification number	Medium
Gender	Indicates the presence of gender information	Medium

Leverage direct and indirect identifier Sensitive Data Types to drive GDPR and CCPA compliance.
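Indirect identifiers such as IP and MAC addresses benefit from domain-specific validation rather than a loose regex alone. The sketch below uses Python's standard `ipaddress` module for IPv4 and IPv6 and a regex for colon- or hyphen-separated MAC addresses. The function name `indirect_identifier` and the accepted MAC formats are assumptions for illustration.

```python
import ipaddress
import re
from typing import Optional

# Six hex octets with one consistent separator (":" or "-"); an assumed format.
MAC = re.compile(r"^[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$")


def indirect_identifier(value: str) -> Optional[str]:
    """Return "IP Address", "MAC Address", or None for one sampled value."""
    v = value.strip()
    try:
        ipaddress.ip_address(v)  # raises ValueError for invalid addresses
        return "IP Address"
    except ValueError:
        pass
    if MAC.match(v):
        return "MAC Address"
    return None
```

Parsing with `ipaddress` rejects values that merely look like addresses, such as `999.1.1.1`, which a simple dotted-digit pattern would accept.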

Financial Information (8)

Classifier Name	Description	Default Sensitivity Level
Credit Card Number	Indicates the presence of a credit card number	High
Credit Card Number (Masked)	Indicates the presence of a masked credit card number	Medium
International Bank Account Number	Indicates the presence of an international bank account number (IBAN)	Medium
Bank Routing Number	Indicates the presence of a bank routing number	Medium
SWIFT Code	Indicates the presence of a SWIFT code	Medium
Tax Identification Number	Indicates the presence of a tax identification number	Medium
Legal Entity Identifier (LEI)	Indicates the presence of a Legal Entity Identifier	Medium
Committee on Uniform Securities Identification Procedures numbers (CUSIP)	Indicates the presence of a CUSIP number	Medium

Leverage financial information Sensitive Data Types to drive PCI DSS and SOX compliance.
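Financial identifiers often carry built-in check digits, which is where the checksum signal helps separate real values from look-alikes. The sketch below applies the ISO 13616 mod-97 test to an IBAN: move the first four characters to the end, convert letters to numbers (A=10 through Z=35), and check that the result modulo 97 equals 1. It skips per-country length rules, and the function name `iban_valid` is hypothetical.

```python
def iban_valid(iban: str) -> bool:
    """Validate an IBAN with the mod-97 check digit test (no per-country lengths)."""
    s = iban.replace(" ", "").upper()
    # Basic shape: ASCII alphanumerics, 15 to 34 characters in total.
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    # Two-letter country code followed by two check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A=10 ... Z=35.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, the widely used sample IBAN `GB82 WEST 1234 5698 7654 32` passes, while changing any single digit makes it fail.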

Health Information (12)

Classifier Name	Description	Default Sensitivity Level
A1C Measurement	Indicates the presence of an A1C test result	Medium
Blood Pressure	Indicates the presence of a blood pressure test result	Medium
Body Mass Index	Indicates the presence of a BMI number	Medium
Glucose Levels	Indicates the presence of a glucose level test result	Medium
Health Plan Beneficiary Number	Indicates the presence of a health plan number	Medium
Height	Indicates the presence of an individual's height	Medium
ICD-9 Code	Indicates the presence of an ICD-9 code	Medium
ICD-10 Code	Indicates the presence of an ICD-10 code	Medium
Cholesterol Level	Indicates the presence of an LDL cholesterol test result	Medium
Medication	Indicates the presence of a medication name	Medium
National Drug Code	Indicates the presence of an NDC identifier	Medium
Pulse	Indicates the presence of a pulse rate	Medium

Leverage health information Sensitive Data Types to drive HIPAA and FERPA compliance.
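Some health identifiers appear in several equivalent layouts. A National Drug Code (NDC) is written as labeler-product-package in a 4-4-2, 5-3-2, or 5-4-1 digit layout, and is often normalized to an 11-digit 5-4-2 form. The sketch below performs that normalization so different spellings of the same code can be recognized as one identifier; the function name `ndc_to_11` is hypothetical, and only hyphenated input is handled.

```python
import re
from typing import Optional

# 10-digit NDCs are written labeler-product-package in one of three layouts.
NDC_10 = re.compile(r"^(\d{4,5})-(\d{3,4})-(\d{1,2})$")
LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc_to_11(value: str) -> Optional[str]:
    """Normalize a hyphenated 10-digit NDC to the 11-digit 5-4-2 form."""
    m = NDC_10.match(value.strip())
    if m is None:
        return None
    labeler, product, package = m.groups()
    if (len(labeler), len(product), len(package)) not in LAYOUTS:
        return None  # e.g. 5-3-1 is not a valid NDC layout
    # Pad each segment with leading zeros to reach 5-4-2.
    return f"{labeler.zfill(5)}-{product.zfill(4)}-{package.zfill(2)}"
```

For example, `0002-1433-80` normalizes to `00002-1433-80`.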

Credentials (10)

Classifier Name	Description	Default Sensitivity Level
AWS Access Key	Indicates the presence of an Amazon Web Services access key	Medium
Azure Access Token	Indicates the presence of a Microsoft Azure access token	Medium
Certificate & Private Key	Indicates the presence of a certificate or private key	Medium
Databricks Access Token	Indicates the presence of a Databricks access token	Medium
Google API Key	Indicates the presence of a Google API key	Medium
Heroku Access Token	Indicates the presence of a Heroku access token	Medium
MongoDB Access Token	Indicates the presence of a MongoDB access token	Medium
Password Hash	Indicates the presence of a password hash	High
Snowflake Access Token	Indicates the presence of a Snowflake access token	Medium
User Name	Indicates the presence of a user name	Medium

Leverage credentials Sensitive Data Types to mitigate the risk of data breaches.
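Credentials usually have a recognizable shape. The sketch below finds candidate AWS access key IDs, assuming they are 20 uppercase alphanumeric characters beginning with `AKIA` (long-term keys) or `ASIA` (temporary credentials), and redacts them for reporting. The function names `find_aws_key_ids` and `redact` are hypothetical, and this is not the built-in AWS Access Key classifier.

```python
import re

# Assumption: 20 uppercase alphanumeric characters starting with AKIA or ASIA.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every candidate AWS access key ID found in the text."""
    return AWS_KEY_ID.findall(text)


def redact(key_id: str) -> str:
    """Keep only the first and last four characters for safe reporting."""
    return key_id[:4] + "*" * (len(key_id) - 8) + key_id[-4:]
```

Running it on a config line such as `aws_access_key_id = AKIAIOSFODNN7EXAMPLE` (the placeholder key from AWS documentation) finds one match, which `redact` reports as `AKIA************MPLE`.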

Customization

You can adapt the classification process described above to your organization's needs by customizing tags, sensitivity levels, scan frequency, scheduling, and sampling rates, and by defining custom regex and context words to improve classification accuracy.

Read more about using Custom Sensitive Data Types to classify and protect sensitive data types specific to your organization.
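To show how custom regex and context words can combine, here is a minimal sketch of a hypothetical custom Sensitive Data Type for employee IDs such as `EMP-123456`. A regex match counts only when a context word appears within a fixed number of characters on either side (a proximity check). The `CustomSDT` class, its fields, and the 50-character window are assumptions for illustration, not the product's configuration format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CustomSDT:
    """Hypothetical custom Sensitive Data Type: a regex plus context words."""
    name: str
    pattern: str
    context_words: list[str] = field(default_factory=list)
    window: int = 50  # characters searched on each side of a match

    def find(self, text: str) -> list[str]:
        """Return matches that have a context word within `window` characters."""
        lowered = text.lower()
        words = [w.lower() for w in self.context_words]
        hits = []
        for m in re.finditer(self.pattern, text):
            before = lowered[max(0, m.start() - self.window):m.start()]
            after = lowered[m.end():m.end() + self.window]
            # With no context words configured, the regex alone decides.
            if not words or any(w in before or w in after for w in words):
                hits.append(m.group(0))
        return hits
```

Requiring nearby context words trades some recall for precision: an ID-shaped string far from any mention of "employee" is ignored.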

Confidence Scoring

Each identified Sensitive Data Type has an associated confidence score on the Classification Management page.

Confidence scoring represents the likelihood that a field or file contains the Sensitive Data Type predicted by Netskope One DSPM; specifically, it measures how strongly the field or file matches the assigned Sensitive Data Type. The score is calculated from several signals, including data patterns, domain-specific validation, and metadata, and it increases as these features produce more matches against our trained classifier model. We recommend using the confidence score when evaluating your classification results to identify false positives and false negatives.

If you manually mark a field as reviewed, its confidence score is set to 100%, because the application assumes your review takes precedence over any scoring the system performs.
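The relationship between signals, confidence, and manual review can be sketched with a simple weighted sum. This is only an illustration: the actual score comes from a trained classifier model, and the weights, signal names, and `FieldClassification` class below are hypothetical. The sketch shows two behaviors described above: more matching signals yield a higher score, and a manual review overrides the score with 100%.

```python
from dataclasses import dataclass, field

# Hypothetical weights for three signal families named in the text above.
WEIGHTS = {"pattern": 0.4, "validation": 0.3, "metadata": 0.3}


@dataclass
class FieldClassification:
    sdt: str  # predicted Sensitive Data Type
    signals: dict[str, float] = field(default_factory=dict)  # strengths in [0, 1]
    reviewed: bool = False  # manually marked as reviewed

    def confidence(self) -> float:
        """Confidence as a percentage; a manual review overrides the score."""
        if self.reviewed:
            return 100.0
        score = sum(w * self.signals.get(name, 0.0) for name, w in WEIGHTS.items())
        return round(100 * score, 1)
```

Here a field with strong pattern and validation matches but no supporting metadata scores 70%; marking it as reviewed raises it to 100%.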

Additional Documentation

To learn more about how to fully utilize our Classification capabilities, please read the articles below:

Using the Classification Management Page

Data Tags

Object-Level Tagging

Creating a Custom Sensitive Data Type

Managing Custom Sensitive Data Types

Using Regular Expressions in Custom Sensitive Data Types

Using Data Dictionaries in Custom Sensitive Data Types
