Why Most Sensitivity Label Deployments Fail
After deploying Microsoft Purview sensitivity labels across more than fifty organisations, from 200-seat law firms to 15,000-seat financial services groups, the lesson is clear: the technology is not the hard part. The taxonomy is. Every failed deployment shares the same root cause: someone opened the Purview compliance portal, created fifteen labels on day one, and pushed them to production without a naming convention, a hierarchy plan, or any concept of how auto-labelling would interact with DLP policies downstream.
This article is the operational playbook that should be standard reading before any deployment.
The 3-to-5 Tier Taxonomy
The single most important design decision is your label hierarchy. Microsoft supports up to five tiers of parent and sub-labels, but three tiers suit most organisations; reserve four or five for regulated industries with explicit data classification mandates (defence, financial services, healthcare).
Tier 1 - Sensitivity Level (Parent)
- Public
- Internal
- Confidential
- Highly Confidential
Tier 2 - Context Sub-labels
Under each parent, sub-labels encode the audience or handling requirement:
- Confidential \ All Employees
- Confidential \ Project Specific
- Confidential \ External Recipients (Encrypt)
- Highly Confidential \ Board Only
- Highly Confidential \ Legal Privilege
Tier 3 (regulated environments only) - Regulatory Tag
- Highly Confidential \ Board Only \ FCA Restricted
- Highly Confidential \ Legal Privilege \ SRA Matter
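A taxonomy like this can be scripted rather than clicked through the portal, which makes it reproducible across test and production tenants. A minimal sketch using Security & Compliance PowerShell, where the label names and tooltips are illustrative and New-Label's -ParentId links a sub-label to its parent:

```powershell
# Connect to Security & Compliance PowerShell first:
# Connect-IPPSSession -UserPrincipalName admin@contoso.com

# Tier 1: parent label
$parent = New-Label -Name "Confidential" -DisplayName "Confidential" `
    -Tooltip "Sensitive business information - internal distribution by default"

# Tier 2: context sub-labels under the parent
New-Label -Name "Confidential-AllEmployees" -DisplayName "All Employees" `
    -ParentId $parent.Guid -Tooltip "All staff may view; no external sharing"
New-Label -Name "Confidential-External" -DisplayName "External Recipients (Encrypt)" `
    -ParentId $parent.Guid -Tooltip "Encrypted for named external recipients"
```

Scripting the taxonomy matters once DLP policies start referencing label GUIDs, because a hand-built production taxonomy never matches the test tenant exactly.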
The Golden Rule
Never exceed five sub-labels per parent. Some tenants end up with 38 labels. Users ignore them all and pick "Internal" for everything. Cognitive overload kills classification accuracy faster than any technical limitation.
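The five-sub-labels rule can be policed in an existing tenant by grouping labels by parent. A sketch, assuming a Security & Compliance PowerShell session is already connected:

```powershell
# Count sub-labels per parent and flag any parent exceeding five
Get-Label | Where-Object { $_.ParentId } |
    Group-Object ParentId |
    ForEach-Object {
        [PSCustomObject]@{
            ParentId      = $_.Name
            SubLabelCount = $_.Count
            OverLimit     = $_.Count -gt 5
        }
    } | Format-Table -AutoSize
```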
Auto-Labelling: Trainable Classifiers vs Exact Data Match
Microsoft offers two main approaches for auto-labelling, and choosing the wrong one wastes months.
Trainable Classifiers
These are machine-learning models you train with 50-500 positive examples and a set of negative examples. They work well for:
- Legal contracts
- Financial statements
- Board minutes
- HR disciplinary records
Navigate to Microsoft Purview > Information Protection > Classifiers > Trainable classifiers to create a custom classifier. Expect 2-4 weeks for the model to train and stabilise.
A critical mistake is training classifiers on too-small sample sets and then enabling auto-labelling in enforce mode immediately. Always run in simulation mode for at least 30 days, then check the simulation results:
# Connect to Security & Compliance PowerShell
Connect-IPPSSession -UserPrincipalName admin@contoso.com
# Get auto-labelling policy simulation results
Get-AutoSensitivityLabelPolicy | Where-Object {$_.Mode -eq "TestWithNotifications"} | Format-List Name, Status, SimulationStatistics
Exact Data Match (EDM)
EDM is deterministic - it hashes your sensitive data (National Insurance numbers, account numbers, patient IDs) and matches against content at rest and in transit. No false positives if your source data is clean.
The setup is more involved:
- Define your EDM schema in the Purview portal under Data Classification > Exact data matches
- Prepare your sensitive information source table as a CSV
- Hash and upload using the EDM Upload Agent:
# Hash the sensitive data source file
EdmUploadAgent.exe /CreateHash /DataFile "C:\EDM\employee-data.csv" /HashFile "C:\EDM\employee-data.hash" /Salt "YourSaltValue"
# Upload the hashed data
EdmUploadAgent.exe /UploadHash /DataStoreName "EmployeeData" /HashFile "C:\EDM\employee-data.hash"
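The schema referenced in step 1 is an XML definition of the datastore and its searchable fields. A minimal sketch, where the datastore and field names are illustrative, so verify the format against the current EDM documentation:

```xml
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="EmployeeData" description="HR sensitive fields" version="1">
    <!-- Searchable fields are the primary match elements -->
    <Field name="NINumber" searchable="true" />
    <Field name="EmployeeId" searchable="true" />
    <!-- Non-searchable fields serve as supporting evidence -->
    <Field name="Surname" searchable="false" />
  </DataStore>
</EdmSchema>
```

The searchable fields should be the deterministic identifiers; supporting fields raise match confidence without being independently searchable.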
EDM is the right choice when you have structured, deterministic sensitive data. Trainable classifiers are right when the sensitivity is contextual. Most mature deployments use both.
Default Labels and Justification on Downgrade
Two settings that are non-negotiable in every deployment:
Default Label
In the Microsoft Purview compliance portal, navigate to Information Protection > Label policies. Edit your policy and set a default label for documents and emails. The recommended default is "Internal" for most organisations. This ensures no document leaves a user's device unclassified.
# Verify default label assignment via PowerShell
Get-LabelPolicy | Select-Object Name, Settings | ForEach-Object {
$_.Settings | Where-Object {$_ -like "*DefaultLabel*"}
}
Justification on Downgrade
When a user attempts to remove or lower a sensitivity label, require justification. This is configured in the same label policy settings - tick Require users to provide justification for removing a label or lowering its classification. The justification text is logged to the unified audit log and becomes evidence for ISO 27001 A.5.12 audits.
# Query downgrade and removal events from the unified audit log
Search-UnifiedAuditLog -StartDate (Get-Date).AddDays(-30) -EndDate (Get-Date) `
    -Operations "SensitivityLabelRemoved","SensitivityLabelUpdated" `
    -ResultSize 5000 |
    Select-Object CreationDate, UserIds, AuditData |
    Export-Csv -Path "C:\Audit\label-downgrades.csv" -NoTypeInformation
SharePoint Library Defaults
This is the feature most organisations miss entirely. You can set a default sensitivity label at the SharePoint document library level, meaning every document uploaded to that library inherits the label automatically - no user action required.
Navigate to the document library in SharePoint, click the gear icon, then Library settings > Default sensitivity label. A related but distinct control is the site-level container label, which can be set via PowerShell:
# Apply a container sensitivity label at the site level
# (note: this labels the site itself, not the documents in it - the
# library default for documents is set in Library settings or via a template)
Set-SPOSite -Identity "https://contoso.sharepoint.com/sites/finance" -SensitivityLabel "Confidential"
For project-specific sites, the library default should be set to "Confidential \ Project Specific" at provisioning time via a PnP PowerShell site template. This means that even if a user drags and drops a file without thinking, it is classified from the moment it lands.
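Provisioning-time classification can be scripted with PnP PowerShell. A sketch, where the site URL and template path are assumptions and the template file itself carries the library configuration, including its default sensitivity label:

```powershell
# Connect to the newly provisioned project site
Connect-PnPOnline -Url "https://contoso.sharepoint.com/sites/project-alpha" -Interactive

# Apply a site template that includes the document library configuration,
# including its default sensitivity label
Invoke-PnPSiteTemplate -Path "C:\Templates\project-site.xml"
```

Baking the label into the provisioning template removes the window between site creation and manual configuration, during which files would land unclassified.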
The Five Common Failures
1. Too Many Labels
As noted above, more than 20 labels total (including sub-labels) correlates with less than 30% voluntary classification rates in measured deployments.
2. Orphaned Labels
When you delete a label that is already applied to documents, those documents retain a label GUID that no longer resolves to a policy. This breaks DLP rules that reference the label. Always retire labels by making them invisible to users first, then run a content search to find and relabel affected items before deletion.
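Before deleting a label, record its GUID and search for content still carrying it. A sketch using a content search with the InformationProtectionLabelId KQL property, where the label name is illustrative:

```powershell
# Record the GUID of the label being retired
$label = Get-Label -Identity "Confidential-ProjectSpecific"

# Find content still carrying that label before deleting it
New-ComplianceSearch -Name "Orphan check - $($label.DisplayName)" `
    -ExchangeLocation All -SharePointLocation All `
    -ContentMatchQuery "InformationProtectionLabelId:$($label.Guid)"
Start-ComplianceSearch -Identity "Orphan check - $($label.DisplayName)"
```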
3. Label Conflicts with DLP
If your DLP policy says "block external sharing of Confidential documents" but your sensitivity label encryption allows "anyone with the link," you have a conflict. The DLP policy wins at the transport layer, but the user experience is confusing. Align your label encryption settings with your DLP policies before go-live.
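Pre-go-live alignment can be checked by dumping the DLP rules alongside the labels that apply encryption. A sketch; the exact properties on the rule output vary by condition type, so inspect the full objects in your own tenant:

```powershell
# List DLP rules and the conditions they evaluate
Get-DlpComplianceRule |
    Select-Object Name, ParentPolicyName, ContentContainsSensitiveInformation |
    Format-List

# List labels with encryption enabled, to compare against the DLP intent
Get-Label | Where-Object { $_.EncryptionEnabled } |
    Format-Table DisplayName, Name -AutoSize
```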
4. Mobile and Co-authoring Gaps
Sensitivity labels in Office mobile apps lag behind desktop. Test on iOS and Android before deploying encryption-enabled labels. Co-authoring with encrypted documents requires all participants to be on current channel Office builds.
5. No Monitoring
Labels are deployed and then forgotten. A monthly label inventory and adoption review addresses this. Note that Get-ComplianceTag returns retention labels, not sensitivity labels; use Get-Label for the inventory, and read per-label item counts from Data Classification > Content explorer in the portal:
# Inventory sensitivity labels (Get-Label - Get-ComplianceTag returns retention labels)
Get-Label | Sort-Object Priority | ForEach-Object {
    [PSCustomObject]@{
        LabelName    = $_.DisplayName
        Priority     = $_.Priority
        ParentId     = $_.ParentId
        LastModified = $_.WhenChanged
    }
} | Format-Table -AutoSize
# Per-label item counts are surfaced in Content explorer and Activity explorer;
# there is no cmdlet that returns sensitivity label usage counts directly.
The Purview Compliance Portal Walkthrough
For those new to the portal, here is the recommended navigation path on day one of every engagement:
- Microsoft Purview compliance portal (compliance.microsoft.com)
- Left nav: Information Protection > Labels - create your taxonomy
- Left nav: Information Protection > Label policies - assign labels to users, set defaults, configure downgrade justification
- Left nav: Information Protection > Auto-labelling - create simulation policies first
- Left nav: Data Classification > Overview - monitor label adoption metrics
- Left nav: Data Classification > Content explorer - verify labels are being applied to the right content
Closing Thoughts
Sensitivity labels are the foundation upon which every other Purview workload is built - DLP, Insider Risk, Records Management, and eDiscovery all reference labels. Getting the taxonomy wrong creates compounding technical debt. Invest the first two weeks of any engagement in taxonomy workshops with legal, compliance, and IT stakeholders. Run simulations for 30 days minimum. Monitor adoption monthly. And never, ever create more labels than your users can remember.