Welcome to the Netskope One DSPM Knowledge Base

    Classification Management Overview

    Introduction

    Classification management is the systematic process of organizing data into specific categories based on predefined criteria such as sensitivity, type, regulatory compliance, usage, and/or other categories useful to the business. Strong classification management practices within your organization are essential for several reasons:

    • Data security: Implement security measures to protect against unauthorized access and potential breaches by classifying data according to its sensitivity and importance.
    • Regulatory compliance: Comply with key regulations by ensuring sensitive data is handled according to legal and industry standards, such as HIPAA.
    • Risk management: Identify critical data types and apply stringent protections to reduce the risk of data loss or exposure.
    • Efficiency and cost savings: Reduce storage and backup costs by eliminating redundant data and streamlining the search and retrieval process for more efficient data management.
    • Data lifecycle management: Ensure data at each stage is managed (e.g., stored, accessed, shared, archived, and deleted) according to its classification level.
    • Data accessibility and retrieval: Make finding and retrieving essential data easier, particularly important for legal discovery and responding to information requests.
    • Data protection policies: Provide a clear understanding of data sensitivity and necessary controls to drive effective data protection policies and data loss prevention strategies.

    Read on to learn how Netskope One DSPM handles data sampling and classification, including built-in and custom sensitive data type (SDT) classifiers, confidence scoring, built-in and custom data tags, and possible use cases.

    How Netskope One DSPM Classifies Data

    Netskope One DSPM samples data from scanned data stores to identify and classify sensitive data. Across data types, classification is based on a combination of heuristic signals, including but not limited to:

    • Keywords
    • Field or file names
    • Field or file content
    • Dictionaries
    • Proximity
    • Query logs
    • Regex
    • Checksums
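    As an illustration of how two of these signals can work together, the sketch below pairs a regex with a checksum to screen credit card number candidates. It is a minimal example, not the product's classifier: the pattern, the function names, and the choice of the Luhn checksum are assumptions made for this example.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_candidates(text: str) -> list[str]:
    """Return regex matches that also pass the checksum, with separators removed."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

    The regex alone would flag any long run of digits. Adding the checksum discards most random digit strings, which is one way a combination of signals can reduce false positives.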

    Sampling for Structured Data

    By default, Netskope One DSPM samples and classifies structured data by obtaining 50 non-null samples from each data store field and matching them against the heuristic signals described above. From these matches, the process determines whether the field contains sensitive data, its type and sensitivity level, and the resulting confidence score. You can configure the scan frequency and schedule when connecting the data store. Sampling does not impact data store performance.
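    The field-level sampling idea can be sketched as follows. This is a simplified illustration under stated assumptions: it takes the first 50 non-null values rather than whatever selection method the product uses, and the email pattern and function names are hypothetical stand-ins for a real classifier.

```python
import re
from itertools import islice

SAMPLE_SIZE = 50  # default number of non-null samples per field, per the text above

# Hypothetical signal: a simple email pattern standing in for a real classifier.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def sample_non_null(values, size=SAMPLE_SIZE):
    """Return up to `size` values that are neither None nor empty strings."""
    non_null = (v for v in values if v is not None and v != "")
    return list(islice(non_null, size))


def match_ratio(samples, pattern=EMAIL_PATTERN):
    """Return the fraction of samples matching the pattern (0.0 if no samples)."""
    if not samples:
        return 0.0
    matched = sum(1 for s in samples if pattern.match(str(s)))
    return matched / len(samples)


column = [None, "a@example.com", "", "b@example.com", "not-an-email", "c@example.com"]
samples = sample_non_null(column)
ratio = match_ratio(samples)  # 3 of the 4 non-null samples look like emails
```

    A higher ratio suggests the field holds the candidate data type. How the product turns such matches into a confidence score is not specified here.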

    Sampling for Unstructured Data

    For unstructured data, sampling works at the file level. The sampling rate, defined as the percentage of files in a bucket that Netskope One DSPM receives per scan, determines how many files are scanned. Each scanned file is matched against the relevant heuristic signals to determine whether it contains sensitive data, its type and sensitivity level, and the resulting confidence score. You can configure the scan frequency, schedule, and sampling rate when connecting the data store. Sampling does not impact data store performance.

    Read more about customizing the sampling rate when connecting AWS S3 and Google Cloud Storage unstructured data stores.
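    To make the sampling rate concrete, the sketch below selects a percentage of files from a bucket listing. It assumes uniform random selection and rounds the file count up. Both choices, along with the function and parameter names, are assumptions for illustration and may differ from how the product selects files.

```python
import math
import random


def select_files(file_keys, sampling_rate_percent, seed=None):
    """Pick a random subset of files covering the given percentage of the bucket."""
    if not 0 <= sampling_rate_percent <= 100:
        raise ValueError("sampling_rate_percent must be between 0 and 100")
    keys = list(file_keys)
    # Round up so a small bucket with a non-zero rate still yields at least one file.
    count = math.ceil(len(keys) * sampling_rate_percent / 100)
    rng = random.Random(seed)
    return rng.sample(keys, count)


bucket = [f"file_{i}.csv" for i in range(200)]
chosen = select_files(bucket, 10, seed=1)  # 10% of 200 files
```

    With a 10% rate and 200 files, 20 files are selected per scan. A lower rate scans fewer files per run at the cost of coverage.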

    Supported File Types for Unstructured Data Store Scanning

    The below file types are currently supported for unstructured data classification:

    Image Files: .png, .jpeg, .jpg
    Archive Files: .zip, .tar, .tar.gz
    Plain Text Files: .txt, .pem, .crt, .cer, .key, .p7b, .p7c
    Other Files: .avro, .csv, .doc, .docx, .eml, .htm, .html, .js, .json, .jsonl, .parquet, .pdf, .ppt (1), .pptx (1), .tsv, .xls, .xlsx, .xml, .yaml, .yml

    (1) Text portions only

    If a scanned data store contains files without an identifiable file type, “Unknown” will display within the Classifiable File Types field.
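    The supported types above can be expressed as a simple lookup. The sketch below maps a file name to its category by extension and falls back to "Unknown". Treating every unrecognized extension as "Unknown" is a simplification made for this example, and the names used are hypothetical.

```python
# Extensions copied from the supported file types listed above.
SUPPORTED_TYPES = {
    "Image Files": (".png", ".jpeg", ".jpg"),
    "Archive Files": (".zip", ".tar", ".tar.gz"),
    "Plain Text Files": (".txt", ".pem", ".crt", ".cer", ".key", ".p7b", ".p7c"),
    "Other Files": (
        ".avro", ".csv", ".doc", ".docx", ".eml", ".htm", ".html", ".js",
        ".json", ".jsonl", ".parquet", ".pdf", ".ppt", ".pptx", ".tsv",
        ".xls", ".xlsx", ".xml", ".yaml", ".yml",
    ),
}


def file_category(filename):
    """Return the category for a file name, or "Unknown" if no extension matches."""
    name = filename.lower()
    for category, extensions in SUPPORTED_TYPES.items():
        # str.endswith accepts a tuple and returns True if any suffix matches.
        if name.endswith(extensions):
            return category
    return "Unknown"
```

    For example, file_category("backup.tar.gz") returns "Archive Files", while a file named README with no extension returns "Unknown".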

     

    Sensitive Data Types

    Netskope One DSPM uses the following built-in Sensitive Data Type classifiers and categories to classify sensitive data. Each category below lists its classifiers with a description and default sensitivity level.

    Direct Identifiers (20)

    Classifier Name | Description | Default Sensitivity Level
    Name | Indicates the presence of an individual's name | Medium
    Email | Indicates the presence of an email | Medium
    Email (Masked) | Indicates the presence of a masked email | Medium
    Phone Number | Indicates the presence of a telephone number | Medium
    Phone Number (Masked) | Indicates the presence of a masked telephone number | Medium
    Address | Indicates the presence of a physical address | Medium
    Drivers License Number | Indicates the presence of a driver's license number | Medium
    Drivers License Number (Masked) | Indicates the presence of a masked driver's license number | Medium
    Social Security Number | Indicates the presence of a social security number | High
    Social Security Number (Masked) | Indicates the presence of a masked social security number | Medium
    Birth Date | Indicates the presence of a birth date | Medium
    Birth Date (Masked) | Indicates the presence of a masked birth date | Medium
    Birth Certificate Number | Indicates the presence of a birth certificate number | Medium
    International Passport Number | Indicates the presence of a passport number | High
    India Permanent Account Number | Indicates the presence of an India Permanent Account Number (PAN) | High
    India Unique Identification Number | Indicates the presence of an India Unique Identification Number (known as Aadhaar) | High
    India Universal Account Number | Indicates the presence of an India Universal Account Number | High
    India Voter ID (EPIC) | Indicates the presence of an Elector's Photo Identity Card (EPIC) number | High
    India Tax Deduction and Collection | Indicates the presence of a Tax Deduction and Collection Account Number (TAN) | High
    India Goods and Services Tax | Indicates the presence of a Goods and Services Tax Identification Number (GSTIN) | High

    Indirect Identifiers (4)

    Classifier Name | Description | Default Sensitivity Level
    IP Address | Indicates the presence of an IP address | Low
    MAC Address | Indicates the presence of a MAC address | Low
    Vehicle Identification Number | Indicates the presence of a vehicle identification number | Medium
    Gender | Indicates the presence of gender | Medium

    Leverage direct and indirect identifier Sensitive Data Types to drive GDPR and CCPA compliance. 

    Financial Information (8)

    Classifier Name | Description | Default Sensitivity Level
    Credit Card Number | Indicates the presence of a credit card number | High
    Credit Card Number (Masked) | Indicates the presence of a masked credit card number | Medium
    International Bank Account Number | Indicates the presence of bank account numbers | Medium
    Bank Routing Number | Indicates the presence of bank routing numbers | Medium
    SWIFT Code | Indicates the presence of a SWIFT code | Medium
    Tax Identification Number | Indicates the presence of a tax identification number | Medium
    Legal Entity Identifier (LEI) | Indicates the presence of a Legal Entity Identifier | Medium
    Committee on Uniform Securities Identification Procedures numbers (CUSIP) | Indicates the presence of a CUSIP number | Medium

    Leverage financial information Sensitive Data Types to drive PCI DSS and SOX compliance.

    Health Information (12)

    Classifier Name | Description | Default Sensitivity Level
    A1C Measurement | Indicates the presence of an A1C test result | Medium
    Blood Pressure | Indicates the presence of a blood pressure test result | Medium
    Body Mass Index | Indicates the presence of a BMI number | Medium
    Glucose Levels | Indicates the presence of a glucose level test result | Medium
    Health Plan Beneficiary Number | Indicates the presence of a health plan number | Medium
    Height | Indicates the presence of an individual's height | Medium
    ICD-9 Code | Indicates the presence of an ICD-9 code | Medium
    ICD-10 Code | Indicates the presence of an ICD-10 code | Medium
    Cholesterol Level | Indicates the presence of an LDL cholesterol test result | Medium
    Medication | Indicates the presence of a medication name | Medium
    National Drug Code | Indicates the presence of an NDC identifier | Medium
    Pulse | Indicates the presence of a pulse rate | Medium

    Leverage health information Sensitive Data Types to drive HIPAA and FERPA compliance.

    Credentials (10)

    Classifier Name | Description | Default Sensitivity Level
    AWS Access Key | Indicates the presence of an Amazon Web Services access key | Medium
    Azure Access Token | Indicates the presence of a Microsoft Azure access token | Medium
    Certificate & Private Key | Indicates the presence of a certificate or private key | Medium
    Databricks Access Token | Indicates the presence of a Databricks access token | Medium
    Google API Key | Indicates the presence of a Google API key | Medium
    Heroku Access Token | Indicates the presence of a Heroku access token | Medium
    MongoDB Access Token | Indicates the presence of a MongoDB access token | Medium
    Password Hash | Indicates the presence of a password hash | High
    Snowflake Access Token | Indicates the presence of a Snowflake access token | Medium
    User Name | Indicates the presence of a user name | Medium

    Leverage credentials Sensitive Data Types to mitigate risk for data breaches.

    Customization

    The Netskope One DSPM classification process described above can be adapted to your organization's needs. You can customize tags, sensitivity levels, scan frequency, scheduling, and sampling rates, and you can add custom regex and context words to improve classification accuracy.

    Read more about using Custom Sensitive Data Types to classify and protect sensitive data types specific to your organization.
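    To show why context words can improve accuracy, the sketch below pairs a custom regex with nearby context words. Everything in it is hypothetical: the employee ID format, the context words, the 40-character window, and the function name are assumptions chosen for illustration, not the product's configuration format.

```python
import re

# Hypothetical custom Sensitive Data Type: an internal employee ID such as "EMP-123456".
EMPLOYEE_ID_PATTERN = re.compile(r"\bEMP-\d{6}\b")
CONTEXT_WORDS = ("employee", "staff", "badge")
WINDOW = 40  # characters inspected on each side of a match (assumed value)


def find_with_context(text):
    """Return (match, has_context) pairs for every pattern match in the text."""
    results = []
    lowered = text.lower()
    for match in EMPLOYEE_ID_PATTERN.finditer(text):
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        nearby = lowered[start:end]
        # A context word near the match strengthens the case for a true positive.
        has_context = any(word in nearby for word in CONTEXT_WORDS)
        results.append((match.group(), has_context))
    return results
```

    A match with supporting context words nearby is more likely to be a true positive than a bare pattern match, so a classifier could weight the two cases differently.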

    Confidence Scoring

    Each identified Sensitive Data Type has an associated confidence score on the Classification Management page.

    The confidence score represents the likelihood that the field or file contains the Sensitive Data Type predicted by Netskope One DSPM. Specifically, it measures how strongly the field or file matches the assigned Sensitive Data Type. The score is calculated from several signals, including data patterns, domain-specific validation, metadata, and more, and it increases as these features produce more matches against our trained classifier model. We recommend using the confidence score when evaluating your classification results to identify false positives and false negatives.

    If you manually mark a field as reviewed, the confidence score is set to 100%, because the application assumes your review takes precedence over any scoring the system may perform.
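    The relationship between signals, score, and manual review can be sketched as follows. This is a toy weighted sum, not the trained classifier model: the signal names mirror those mentioned above, but the weights, the class, and the function are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical weights; the product's trained model is not described in this article.
SIGNAL_WEIGHTS = {
    "data_pattern": 0.5,
    "domain_validation": 0.3,
    "metadata": 0.2,
}


@dataclass
class FieldResult:
    signal_scores: dict  # signal name -> match strength between 0.0 and 1.0
    reviewed: bool = False


def confidence_percent(result: FieldResult) -> float:
    """Return a confidence score between 0 and 100 for one field or file."""
    if result.reviewed:
        # A manual review takes precedence over any computed score.
        return 100.0
    total = sum(
        weight * result.signal_scores.get(name, 0.0)
        for name, weight in SIGNAL_WEIGHTS.items()
    )
    return round(total * 100, 1)
```

    In this sketch, a field that matches on data pattern and domain validation but not on metadata scores 80.0, and the same field scores 100.0 once it is marked as reviewed.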

    Additional Documentation

    To learn more about how to fully utilize our Classification capabilities, please read the articles below:

    Using the Classification Management Page

    Data Tags

    Object-Level Tagging

    Creating a Custom Sensitive Data Type

    Managing Custom Sensitive Data Types

    Using Regular Expressions in Custom Sensitive Data Types

    Using Data Dictionaries in Custom Sensitive Data Types

     

     
