Ontario’s 2025 De-identification Guidelines for Structured Data
Disclaimer: The information provided in this post is for general informational purposes only and does not constitute legal advice. Laws and regulations vary by jurisdiction, and the application of legal principles depends on specific facts and circumstances. You should consult a qualified lawyer for advice regarding your individual situation. No lawyer-client relationship is created by your use of this material.
Raising the Bar
Ontario’s Information and Privacy Commissioner (IPC) has released the 2025 De-identification Guidelines for Structured Data, giving organizations a practical, risk-based framework for creating, managing and monitoring de-identified datasets. This comprehensive guidance stands in stark contrast to the record of the federal Office of the Privacy Commissioner of Canada, which, after more than two decades, has yet to issue anything remotely comparable to guide de-identification of structured data for legitimate business and research uses.
Why De-identification Guidance Matters
Structured data, such as spreadsheets and databases, are essential for analytics, research, and service delivery. However, hospitals and public bodies must exercise caution in using and sharing data to ensure the protection of individual privacy rights and compliance with statutory authorities.
For example, the Personal Health Information Protection Act (PHIPA) defines personal health information as identifying information about an individual that relates to, among other things, the providing of health care to the individual (s. 4(1)). “Identifying information” is defined as “information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify an individual” (s. 4(2)).
Conversely, to de-identify means “to remove any information that identifies the individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify the individual.” Amendments to PHIPA made in 2020 (not yet in force) will introduce regulations specifically addressing de-identification requirements.
By de-identifying personal health information, custodians under PHIPA can remove restrictions on uses and sharing without consent, thereby increasing the opportunities for research and innovation.
Key Takeaways
Key points in the IPC’s updated and expanded guidance are:
Zero Risk Is Unrealistic:
Achieving zero risk of re-identification is not practical. The goal is to reduce risk to a “very low” level, using quantitative thresholds and ongoing governance.
Public vs. Non-public Data Sharing:
Public releases require aggressive data transformation, since there are effectively no enforceable controls on recipients. An adversary attack is assumed to be certain (a probability of 1), so risk must be reduced through data transformations alone.
Non-public sharing allows for enforceable controls (contracts, security, privacy protocols) and a more balanced approach. Risk is evaluated from the perspective of the anticipated data recipient, taking into account their motives, capacity, and context.
Controls and Context Matter:
For non-public data, the controls and context at the recipient are critical in the risk assessment. For public data, controls are either absent or unenforceable, so transformation is the only safeguard.
Pseudonymization vs. De-identification:
Pseudonymized data, where only direct identifiers are transformed, generally remains personal information. Only when both direct and indirect identifiers are transformed, and controls are in place (where applicable), can data be considered non-personal.
Risk Assessment Is Ongoing:
Risk assessments that demonstrate data are de-identified have limited time validity. They must be regularly reviewed (every two to three years, or upon any material change) to remain valid.
Model-based Evaluation Is the Default:
Model-based data vulnerability evaluations are the default approach for assessing risk, rather than commissioned re-identification attacks.
Maximum vs. Average Vulnerability:
Maximum vulnerability measures are used for public data releases (focusing on the most vulnerable record), while average vulnerability measures (and variants) are generally used for non-public data releases.
Linking Increases Risk:
Linking datasets can raise the risk of re-identification because the linked records are more vulnerable than the records in either source alone. De-identification must therefore be performed after linking.
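To illustrate the difference between maximum and average vulnerability, here is a minimal sketch in Python. It assumes a common model-based approach that is not spelled out in this post: records sharing the same values on their indirect identifiers form an equivalence class, and each record's vulnerability is 1 divided by the size of its class. The function name, field names and sample records are hypothetical, and the IPC guidelines remain the authority on which measures to apply.

```python
from collections import Counter


def vulnerability_measures(records, indirect_identifiers):
    """Return (maximum, average) vulnerability for a list of dict records.

    Assumption: a record's vulnerability is 1 / (size of its equivalence
    class), where a class is the set of records that share the same values
    on the chosen indirect identifiers.
    """
    if not records:
        raise ValueError("records must not be empty")
    # Build one key per record from its indirect-identifier values.
    keys = [tuple(r[name] for name in indirect_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[key] for key in keys]
    return max(per_record), sum(per_record) / len(per_record)


# Hypothetical sample: equivalence classes of size 4, 3 and 1.
records = [
    {"age_band": "30-39", "region": "A"},
    {"age_band": "30-39", "region": "A"},
    {"age_band": "30-39", "region": "A"},
    {"age_band": "30-39", "region": "A"},
    {"age_band": "40-49", "region": "B"},
    {"age_band": "40-49", "region": "B"},
    {"age_band": "40-49", "region": "B"},
    {"age_band": "80-89", "region": "C"},
]

max_v, avg_v = vulnerability_measures(records, ["age_band", "region"])
print(f"maximum: {max_v:.3f}, average: {avg_v:.3f}")
```

In this sample the single record in the "80-89" class is unique, so it drives the maximum to 1 even though the average is far lower. That gap is why a measure focused on the most vulnerable record is the more conservative choice for public releases.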
Governance Process
The IPC sets out a 12-step de-identification governance process:
Prepare: Assemble expertise, define objectives, and provide notice to affected individuals.
Determine Release Model: Decide if data will be released publicly or to specific recipients.
Classify Variables: Identify direct, indirect, and other variables.
Pseudonymize Direct Identifiers: Use suppression, encryption, or tokenization.
Set Risk Thresholds: Use quantitative thresholds (e.g., 0.09 for low privacy invasion) to define “very low” risk.
Measure Data Vulnerability: Assess how easily records could be re-identified.
Assess Probability of Attack: Consider deliberate insider attacks, inadvertent recognition, and data breaches.
Calculate Overall Risk: Multiply vulnerability by probability of attack.
Transform Data and Controls: Apply further transformations and controls to reduce risk below the threshold.
Assess Data Utility: Ensure the transformed data remains useful.
Document Process and Results: Maintain detailed records for compliance, audits, and transparency.
Monitor and Review: Continually reassess risk and update processes as needed.
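The arithmetic behind the threshold, vulnerability, probability and overall risk steps can be sketched in a few lines of Python. Only three things are taken from the process above: overall risk is vulnerability multiplied by probability of attack, an attack is assumed to be certain for public releases, and 0.09 is the example threshold. The function names and input values are hypothetical, and treating a risk equal to the threshold as acceptable is an assumption made here for illustration.

```python
def overall_risk(vulnerability, attack_probability):
    """Overall risk = data vulnerability x probability of attack."""
    for name, value in (("vulnerability", vulnerability),
                        ("attack_probability", attack_probability)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return vulnerability * attack_probability


def meets_threshold(vulnerability, attack_probability, threshold=0.09):
    """True when overall risk is at or under the threshold.

    Assumption: a risk exactly equal to the threshold is acceptable.
    """
    return overall_risk(vulnerability, attack_probability) <= threshold


# Hypothetical inputs: the same measured vulnerability under two release models.
measured_vulnerability = 0.2

# Public release: attack is assumed certain, so probability is 1.
public_ok = meets_threshold(measured_vulnerability, 1.0)

# Non-public release: recipient controls lower the assumed probability of attack.
non_public_ok = meets_threshold(measured_vulnerability, 0.3)

print(f"public release acceptable: {public_ok}")
print(f"non-public release acceptable: {non_public_ok}")
```

When the check fails, as it does for the public release in this sketch, the next step in the process applies: transform the data further (or, for non-public sharing, strengthen controls) and measure again.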
For privacy officers, legal professionals, and data custodians, these guidelines are an essential reference for responsible data governance.