Data Governance
Policies, Ownership, and Compliance Frameworks for ComplyAI Data Assets
Table of Contents
- Data Governance Framework
- Data Ownership Model
- Data Classification
- Data Quality Standards
- Privacy & Compliance
- Data Retention Policies
- Access Control
- Change Management
- Incident Response
- Audit & Monitoring
Data Governance Framework
Governance Principles
| Principle | Description |
|---|---|
| Accountability | Every data asset has a designated owner responsible for its quality and security |
| Transparency | Data definitions, lineage, and policies are documented and accessible |
| Integrity | Data is accurate, complete, and consistent across systems |
| Security | Data is protected according to its classification level |
| Compliance | Data handling meets regulatory and contractual requirements |
| Stewardship | Data is treated as a valuable organizational asset |
Governance Structure
┌─────────────────────────────────────────────────────────────┐
│                   DATA GOVERNANCE COUNCIL                   │
│       (Executive Sponsor + Domain Leads + Compliance)       │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│   DATA OWNERS   │   │  DATA STEWARDS  │   │ DATA CUSTODIANS │
│                 │   │                 │   │                 │
│ Business        │   │ Quality &       │   │ Technical       │
│ Accountability  │   │ Standards       │   │ Implementation  │
│ for domains     │   │ Enforcement     │   │ & Operations    │
└─────────────────┘   └─────────────────┘   └─────────────────┘
Roles & Responsibilities
Data Governance Council
- Meets: Monthly (or as needed for urgent matters)
- Members: CEO/CTO (Sponsor), Engineering Lead, Product Lead, Compliance Lead
- Responsibilities:
- Set data governance strategy and priorities
- Approve major policy changes
- Resolve cross-domain data conflicts
- Allocate resources for governance initiatives
Data Owners (by Domain)
| Domain | Owner Role | Data Entities Covered |
|---|---|---|
| Customer Domain | Head of Product | Users, Organizations, Subscriptions |
| Ad Content Domain | Head of Engineering | Ad Accounts, Ads, Media Assets |
| Compliance Domain | Head of Compliance | Reviews, Policies, Violations |
| Operational Domain | Head of Engineering | Activity Events, Notifications |
Data Stewards
- Role: Cross-functional team members who ensure data quality
- Responsibilities:
- Monitor data quality metrics
- Enforce naming conventions
- Review data change requests
- Maintain documentation accuracy
Data Custodians
- Role: Engineering/DevOps team members
- Responsibilities:
- Implement technical controls
- Manage database access
- Execute backup/recovery
- Monitor system health
Data Ownership Model
Domain Ownership Matrix
| Data Entity | Domain | Owner | Steward | Primary System |
|---|---|---|---|---|
| `users` | Customer | Product | Engineering | complyai-core |
| `organizations` | Customer | Product | Engineering | complyai-core |
| `subscriptions` | Customer | Finance | Engineering | complyai-core |
| `roles` | Customer | Product | Engineering | complyai-core |
| `org_business_accounts` | Ad Content | Engineering | Engineering | complyai-core |
| `org_ad_accounts` | Ad Content | Engineering | Engineering | complyai-core |
| `org_ads` | Ad Content | Engineering | Engineering | complyai-api |
| `org_ads_score` | Compliance | Compliance | Engineering | complyai-violin |
| `facebook_ad_status` | Compliance | Compliance | Engineering | complyai-maestro |
| `activity_events` | Operational | Engineering | Engineering | complyai-core |
| `notifications` | Operational | Product | Engineering | complyai-triangle |
Ownership Responsibilities
Data Owners Must:
- Define business rules for their data
- Approve access requests
- Review data quality reports monthly
- Sign off on schema changes
- Ensure compliance with regulations
System of Record (SOR)
| Entity | System of Record | Rationale |
|---|---|---|
| User Profile | complyai-core | Central authentication/identity |
| Organization Details | complyai-core | Central customer management |
| Ad Account Metadata | Meta Graph API | External authoritative source |
| Ad Content/Creative | Meta Graph API | External authoritative source |
| Compliance Scores | complyai-violin | ComplyAI's proprietary analysis |
| Billing/Subscription | Stripe | External payment processor |
Data Classification
Classification Levels
| Level | Label | Description | Examples |
|---|---|---|---|
| 1 | 🔴 Restricted | Highly sensitive, regulatory impact | Access tokens, passwords, SSN, payment data |
| 2 | 🟠 Confidential | Business-sensitive, limited access | User PII, organization financials, contracts |
| 3 | 🟡 Internal | Internal use only | Internal metrics, employee data, system configs |
| 4 | 🟢 Public | Can be shared externally | Marketing content, public documentation |
Data Classification by Table
| Table | Classification | Rationale |
|---|---|---|
| `users.password` | 🔴 Restricted | Authentication credential |
| `users.access_token` | 🔴 Restricted | Meta API authentication |
| `org_business_accounts.system_user_access_token` | 🔴 Restricted | Service authentication |
| `subscriptions.stripe_customer_id` | 🔴 Restricted | Payment reference |
| `users.email` | 🟠 Confidential | Personally identifiable information |
| `users.first_name`, `users.last_name` | 🟠 Confidential | Personally identifiable information |
| `organizations.name` | 🟠 Confidential | Business information |
| `org_ads.*` | 🟠 Confidential | Client advertising data |
| `activity_events.*` | 🟡 Internal | Audit/operational data |
| `notifications.*` | 🟡 Internal | System messages |
| `policies.*` (CMS) | 🟢 Public | Published compliance guidance |
Handling Requirements by Classification
| Classification | Storage | Transmission | Access | Logging |
|---|---|---|---|---|
| 🔴 Restricted | Encrypted at rest (AES-256) | TLS 1.2+ required | Role-based, MFA required | Full audit trail |
| 🟠 Confidential | Encrypted at rest | TLS 1.2+ required | Role-based | Access logged |
| 🟡 Internal | Standard database | TLS recommended | Team-based | Standard logging |
| 🟢 Public | Standard | Any | Open | Minimal |
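The classification and handling tables above can be expressed as a small lookup that application code consults before reading or exporting a column. The following is a minimal sketch, not ComplyAI's actual implementation: the names `Classification`, `Handling`, `classify`, and `requires_mfa` are hypothetical, only a subset of columns is listed, and the fallback to Confidential for unlisted columns is an assumption of this example rather than documented policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Classification levels from the table above (1 is the most sensitive)."""
    RESTRICTED = 1
    CONFIDENTIAL = 2
    INTERNAL = 3
    PUBLIC = 4


@dataclass(frozen=True)
class Handling:
    storage: str
    transmission: str
    access: str
    logging: str


# Values copied from "Handling Requirements by Classification".
HANDLING = {
    Classification.RESTRICTED: Handling(
        "Encrypted at rest (AES-256)", "TLS 1.2+ required",
        "Role-based, MFA required", "Full audit trail"),
    Classification.CONFIDENTIAL: Handling(
        "Encrypted at rest", "TLS 1.2+ required", "Role-based", "Access logged"),
    Classification.INTERNAL: Handling(
        "Standard database", "TLS recommended", "Team-based", "Standard logging"),
    Classification.PUBLIC: Handling("Standard", "Any", "Open", "Minimal"),
}

# A subset of "Data Classification by Table". Keys are "table.column";
# "table.*" covers every column of that table.
COLUMN_CLASSIFICATION = {
    "users.password": Classification.RESTRICTED,
    "users.access_token": Classification.RESTRICTED,
    "users.email": Classification.CONFIDENTIAL,
    "org_ads.*": Classification.CONFIDENTIAL,
    "activity_events.*": Classification.INTERNAL,
}


def classify(table: str, column: str) -> Classification:
    """Return the classification of a column, preferring an exact match.

    Unlisted columns fall back to CONFIDENTIAL (an assumption of this sketch).
    """
    exact = COLUMN_CLASSIFICATION.get(f"{table}.{column}")
    if exact is not None:
        return exact
    return COLUMN_CLASSIFICATION.get(f"{table}.*", Classification.CONFIDENTIAL)


def requires_mfa(table: str, column: str) -> bool:
    """True when the handling rules for the column's level mention MFA."""
    return "MFA" in HANDLING[classify(table, column)].access
```

Keeping the rules in data rather than scattered conditionals means a policy change only touches the two dictionaries.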
Data Quality Standards
Data Quality Dimensions
| Dimension | Definition | Target | Measurement |
|---|---|---|---|
| Accuracy | Data correctly represents reality | 99%+ | Comparison with source systems |
| Completeness | Required fields are populated | 98%+ | NULL/empty field counts |
| Consistency | Data agrees across systems | 99%+ | Cross-system reconciliation |
| Timeliness | Data is up-to-date | Per SLA | Lag time monitoring |
| Uniqueness | No unintended duplicates | 100% | Duplicate detection queries |
| Validity | Data conforms to business rules | 99%+ | Constraint violation counts |
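Two of the dimensions above, completeness and uniqueness, reduce to simple ratios. The sketch below shows one way to compute them over rows held as dictionaries and compare them with the targets in the table. The function names and the in-memory row format are assumptions made for illustration; a production check would more likely run as SQL against the database.

```python
from typing import Iterable, List, Mapping, Sequence

# Targets from the Data Quality Dimensions table.
TARGETS = {"completeness": 0.98, "uniqueness": 1.0}


def completeness(rows: Sequence[Mapping[str, object]],
                 required: Iterable[str]) -> float:
    """Fraction of required field values that are populated (not None or "")."""
    required = list(required)
    total = len(rows) * len(required)
    if total == 0:
        return 1.0
    filled = sum(
        1 for row in rows for field in required
        if row.get(field) not in (None, "")
    )
    return filled / total


def uniqueness(rows: Sequence[Mapping[str, object]], key: str) -> float:
    """Distinct key values divided by the row count (1.0 means no duplicates)."""
    if not rows:
        return 1.0
    return len({row.get(key) for row in rows}) / len(rows)


def breaches(scores: Mapping[str, float]) -> List[str]:
    """Names of the dimensions whose score is below target."""
    return sorted(name for name, score in scores.items() if score < TARGETS[name])
```

For example, four rows with five of eight required values filled score 0.625 on completeness, well below the 98% target.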
Quality Rules by Entity
Users Table
| Field | Rule | Validation |
|---|---|---|
| `email` | Must be valid email format | Regex validation |
| `email` | Must be unique | Unique constraint |
| `auth0_user_id` | Must match Auth0 record | Periodic sync check |
| `created_at` | Must not be future date | Check constraint |
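The Users table rules that do not depend on an external system (email format, email uniqueness, `created_at` not in the future) could be checked in application code roughly as follows. This is a sketch under assumptions: `validate_user` is a hypothetical helper, the regular expression is deliberately loose, uniqueness is treated as case-insensitive, and `created_at` is expected to be a timezone-aware datetime. The Auth0 sync check is omitted because it needs a live API call.

```python
import re
from datetime import datetime, timezone
from typing import List, Mapping, Optional, Set

# Loose pattern: exactly one "@", no whitespace, at least one dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_user(row: Mapping[str, object],
                  seen_emails: Set[str],
                  now: Optional[datetime] = None) -> List[str]:
    """Return the rule violations for one users row (empty list if clean).

    `seen_emails` accumulates lower-cased emails across calls so duplicates
    within a batch are reported.
    """
    now = now or datetime.now(timezone.utc)
    problems = []

    email = row.get("email") or ""
    if not EMAIL_RE.match(email):
        problems.append("email: invalid format")
    elif email.lower() in seen_emails:
        problems.append("email: duplicate")
    else:
        seen_emails.add(email.lower())

    created_at = row.get("created_at")
    if created_at is not None and created_at > now:
        problems.append("created_at: in the future")
    return problems
```

Database constraints remain the authoritative enforcement point; a helper like this is mainly useful for reporting violations in bulk before a load.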
Organizations Table
| Field | Rule | Validation |
|---|---|---|
| `name` | Required, non-empty | NOT NULL constraint |
| `stripe_customer_id` | Must exist in Stripe (if set) | API verification |
| `subscription_status` | Must be valid enum | Check constraint |
Ad Accounts Table
| Field | Rule | Validation |
|---|---|---|
| `meta_id` | Must be valid Meta ID format | Format validation |
| `meta_id` | Must be unique per organization | Unique constraint |
| `status` | Must match Meta current status | 15-min sync check |
Data Quality Monitoring
Automated Checks (Run Daily)
- NULL field analysis across critical tables
- Orphan record detection (FKs pointing to deleted records)
- Duplicate detection on unique business keys
- Cross-system reconciliation (Meta vs. local counts)
- Stale data detection (records not updated per expected cadence)
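Orphan detection and duplicate detection from the list above are both single queries. The sketch below runs them against an in-memory SQLite database so it is self-contained. The table names come from this document, but the column `organization_id` and the sample rows are assumptions made for the example, and the production database engine may differ.

```python
import sqlite3

# Ad accounts whose parent organization no longer exists.
ORPHAN_SQL = """
SELECT a.id
FROM org_ad_accounts AS a
LEFT JOIN organizations AS o ON o.id = a.organization_id
WHERE o.id IS NULL
ORDER BY a.id
"""

# Business keys (organization, meta_id) that appear more than once.
DUPLICATE_SQL = """
SELECT organization_id, meta_id, COUNT(*) AS n
FROM org_ad_accounts
GROUP BY organization_id, meta_id
HAVING COUNT(*) > 1
ORDER BY organization_id, meta_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with one duplicate and one orphan."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE org_ad_accounts (
            id INTEGER PRIMARY KEY,
            organization_id INTEGER,
            meta_id TEXT
        );
        INSERT INTO organizations VALUES (1, 'Acme');
        INSERT INTO org_ad_accounts VALUES
            (10, 1, 'act_1'),
            (11, 1, 'act_1'),   -- duplicate business key
            (12, 99, 'act_2');  -- organization 99 does not exist
    """)
    return conn


conn = build_demo_db()
orphans = [row[0] for row in conn.execute(ORPHAN_SQL)]
duplicates = conn.execute(DUPLICATE_SQL).fetchall()
print("orphans:", orphans)
print("duplicates:", duplicates)
```

A daily job could run queries like these per critical table and feed the counts into the quality dashboards.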
Quality Dashboards
- Daily quality score by domain
- Trend analysis (week over week)
- Alert thresholds for quality degradation
Quality Issue Resolution Process
- Automated alert triggers on threshold breach
- Data Steward triages and assigns to owner
- Root cause analysis performed
- Fix implemented and validated
- Post-mortem for systemic issues
Privacy & Compliance
Regulatory Framework
| Regulation | Applicability | Key Requirements |
|---|---|---|
| GDPR | EU users/customers | Consent, right to erasure, data portability |
| CCPA/CPRA | California residents | Disclosure, opt-out, deletion rights |
| SOC 2 | All operations | Security, availability, confidentiality |
| Meta Platform Terms | Ad data | Data use restrictions, retention limits |
Personal Data Inventory
| Data Element | Legal Basis | Retention | Subject Rights |
|---|---|---|---|
| User email | Contract performance | Account lifetime + 30 days | Access, Rectification, Erasure |
| User name | Contract performance | Account lifetime + 30 days | Access, Rectification, Erasure |
| Activity logs | Legitimate interest | 2 years | Access |
| Ad account data | Contract performance | Account lifetime + 90 days | Access, Portability |
| Payment info | Contract/Legal | 7 years (financial records) | Access |
Data Subject Rights Procedures
Right to Access
- User submits request via Settings or email
- Identity verification performed
- Data export generated within 30 days
- Delivered securely to verified email
Right to Erasure ("Right to be Forgotten")
- User submits deletion request
- Verify no legal hold requirements
- Soft delete user record (anonymize PII)
- Cascade to related records per retention policy
- Confirm deletion to user within 30 days
Erasure Exceptions:
- Active subscription with outstanding balance
- Legal hold or regulatory requirement
- Data required for legal defense
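The erasure steps and exceptions above can be sketched as a guard followed by an anonymizing soft delete. Everything named here (`User`, `ErasureContext`, `erasure_blockers`, `erase`) is hypothetical, and the cascade to related records is left out; the point is only that the exception check runs before any PII is touched.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class User:
    id: int
    email: Optional[str]
    first_name: Optional[str]
    last_name: Optional[str]
    deleted: bool = False


@dataclass(frozen=True)
class ErasureContext:
    """Facts gathered before erasure; one flag per documented exception."""
    outstanding_balance: bool = False
    legal_hold: bool = False
    needed_for_legal_defense: bool = False


def erasure_blockers(ctx: ErasureContext) -> List[str]:
    """List the erasure exceptions that currently apply."""
    blockers = []
    if ctx.outstanding_balance:
        blockers.append("active subscription with outstanding balance")
    if ctx.legal_hold:
        blockers.append("legal hold or regulatory requirement")
    if ctx.needed_for_legal_defense:
        blockers.append("data required for legal defense")
    return blockers


def erase(user: User, ctx: ErasureContext) -> User:
    """Soft delete: keep the row and its id, blank the PII fields."""
    blockers = erasure_blockers(ctx)
    if blockers:
        raise PermissionError("erasure blocked: " + ", ".join(blockers))
    return replace(user, email=None, first_name=None, last_name=None, deleted=True)
```

Returning a new record rather than mutating in place makes it easy to log the before and after states for the audit trail.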
Right to Rectification
- User updates profile in-app, or
- Submits correction request
- Data steward reviews and applies
- Confirmation sent to user
Consent Management
| Purpose | Consent Type | Collection Point | Withdrawal Method |
|---|---|---|---|
| Account creation | Explicit | Registration | Account deletion |
| Email notifications | Explicit | Onboarding | Settings toggle |
| Marketing emails | Opt-in | Registration/Dashboard | Unsubscribe link |
| Analytics cookies | Opt-in | Cookie banner | Cookie settings |
| Data sharing (Meta) | Explicit | OAuth flow | Disconnect account |
Data Retention Policies
Retention Schedule
| Data Category | Retention Period | Archive Policy | Deletion Method |
|---|---|---|---|
| User Accounts | Account lifetime + 30 days | N/A | Hard delete after anonymization |
| Organization Data | Account lifetime + 90 days | Archive to cold storage | Hard delete |
| Ad Data (org_ads) | 2 years from creation | Archive after 1 year | Hard delete |
| Compliance Scores | 2 years from creation | Archive after 1 year | Hard delete |
| Activity Logs | 2 years | Archive after 6 months | Hard delete |
| Webhook Events | 90 days | N/A | Hard delete |
| Session Data | 30 days | N/A | Auto-expire |
| Backup Data | 30 days (daily), 1 year (monthly) | Encrypted S3 | Lifecycle policy |
| Financial Records | 7 years | Required for compliance | Legal hold |
Retention Triggers
| Trigger Event | Action | Timeline |
|---|---|---|
| User requests account deletion | Begin deletion workflow | Within 30 days |
| Organization churns | Mark for retention countdown | 90-day countdown |
| Subscription expires (no renewal) | Mark inactive | After 30-day grace |
| Retention period expires | Automated deletion job | Nightly job |
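The nightly deletion job needs to know, for each record, when its retention period ends. A minimal sketch of that date arithmetic follows. The category keys and function names are hypothetical, a year is approximated as 365 days, and the choice of anchor date (account closure versus creation) is read from the Retention Schedule table.

```python
from datetime import date, timedelta

# Retention periods in days, from the Retention Schedule table.
# Assumption: one year is treated as 365 days.
RETENTION_DAYS = {
    "user_accounts": 30,           # counted from account closure
    "organization_data": 90,       # counted from account closure
    "ad_data": 2 * 365,            # counted from creation
    "compliance_scores": 2 * 365,  # counted from creation
    "activity_logs": 2 * 365,
    "webhook_events": 90,
    "session_data": 30,
}


def deletion_due(category: str, anchor: date) -> date:
    """Date on which a record becomes eligible for automated deletion.

    `anchor` is the event that starts the countdown for that category.
    """
    return anchor + timedelta(days=RETENTION_DAYS[category])


def is_expired(category: str, anchor: date, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= deletion_due(category, anchor)
```

For an organization that churns on 2024-01-01, `deletion_due("organization_data", date(2024, 1, 1))` gives 2024-03-31, the end of the 90-day countdown.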
Archive Process
- Identify records meeting archive criteria
- Export to compressed, encrypted format
- Upload to S3 Glacier
- Verify archive integrity
- Remove from production database
- Update audit log
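The export and integrity-verification steps can be illustrated with the standard library alone. This sketch compresses records to gzip JSON Lines and verifies the result by checksum and row count before anything is removed from production. It deliberately omits encryption and the S3 Glacier upload, which depend on infrastructure not described here, and the function names are hypothetical.

```python
import gzip
import hashlib
import json
from typing import Iterable, Mapping, Tuple


def export_archive(records: Iterable[Mapping[str, object]]) -> Tuple[bytes, str]:
    """Serialize records as gzip-compressed JSON Lines.

    Returns the compressed bytes and their SHA-256 hex digest.
    """
    lines = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    blob = gzip.compress(lines.encode("utf-8"))
    return blob, hashlib.sha256(blob).hexdigest()


def verify_archive(blob: bytes, expected_sha256: str, expected_count: int) -> bool:
    """Check the checksum, then confirm the archive holds the expected rows."""
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        return False
    text = gzip.decompress(blob).decode("utf-8")
    rows = [json.loads(line) for line in text.splitlines() if line]
    return len(rows) == expected_count
```

Only after `verify_archive` succeeds on the stored copy would the production rows be removed and the audit log updated.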
Legal Hold Process
- Legal/Compliance issues hold request
- Tag affected records with hold flag
- Exclude from automated deletion
- Maintain until hold released
- Document hold duration and reason
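One way to make automated deletion respect holds is to store the hold reason on the record and filter on it. The sketch below assumes a hypothetical `Record` type with a `hold_reason` field; in a real schema this would more likely be a column or a separate holds table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Record:
    id: int
    delete_after: date
    hold_reason: Optional[str] = None  # set while a legal hold applies


def place_hold(record: Record, reason: str) -> None:
    """Tag the record; the reason doubles as the hold documentation."""
    record.hold_reason = reason


def release_hold(record: Record) -> None:
    """Clear the hold so normal retention rules apply again."""
    record.hold_reason = None


def deletable(records: Iterable[Record], today: date) -> List[int]:
    """Ids past their retention date that are not under a legal hold."""
    return [
        r.id for r in records
        if r.hold_reason is None and r.delete_after <= today
    ]
```

Because the hold check lives inside the selection function, a deletion job cannot skip it by accident.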
Access Control
Access Control Model
ComplyAI uses Role-Based Access Control (RBAC) with the following hierarchy:
┌───────────────────────────────────────────┐
│              PLATFORM ADMIN               │
│   (ComplyAI Staff - Full System Access)   │
└───────────────────────────────────────────┘
                      │
┌───────────────────────────────────────────┐
│            ORGANIZATION OWNER             │
│    (Customer Admin - Full Org Access)     │
└───────────────────────────────────────────┘
                      │
           ┌──────────┴──────────┐
           ▼                     ▼
   ┌───────────────┐     ┌───────────────┐
   │     ADMIN     │     │    MEMBER     │
   │               │     │               │
   │ Manage Users  │     │ View/Edit Ads │
   │ Manage Accts  │     │ View Reports  │
   │ View Reports  │     │               │
   └───────────────┘     └───────────────┘
Permission Matrix
| Action | Platform Admin | Org Owner | Admin | Member |
|---|---|---|---|---|
| View organization data | β | β | β | β |
| Edit organization settings | β | β | β | β |
| Add/remove users | β | β | β | β |
| Connect ad accounts | β | β | β | β |
| View ads | β | β | β | β |
| Edit ads/feedback | β | β | β | β |
| View billing | β | β | β | β |
| Change subscription | β | β | β | β |
| Access other organizations | β | β | β | β |
| Access admin dashboard | β | β | β | β |
Database Access Control
| Access Level | Who | Access Method | Audit |
|---|---|---|---|
| Production Read/Write | Designated DBAs only | Bastion host + MFA | Full query logging |
| Production Read-Only | Senior Engineers (approved) | Bastion host + MFA | Full query logging |
| Staging Full Access | Engineering team | VPN + credentials | Standard logging |
| Local Development | All developers | Local containers | N/A |
Access Request Process
- Submit access request via internal tool
- Manager approval
- Data Owner approval (for sensitive data)
- IT provisions access
- Access reviewed quarterly
Service-to-Service Authentication
- Internal services use API keys stored in AWS Secrets Manager
- Service mesh validates service identity
- All inter-service calls logged
Change Management
Schema Change Process
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Request    │───▶│    Review    │───▶│     Test     │───▶│    Deploy    │
│              │    │              │    │              │    │              │
│ RFC Document │    │ Data Owner   │    │ Staging      │    │ Production   │
│ Impact       │    │ DBA Review   │    │ Integration  │    │ Rollback     │
│ Analysis     │    │ Security     │    │ Tests        │    │ Plan Ready   │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
Change Categories
| Category | Examples | Approval Required | Lead Time |
|---|---|---|---|
| Emergency | Security patches, critical bugs | Post-hoc approval | Immediate |
| Standard | New columns (nullable), new indexes | Team lead | 1 sprint |
| Significant | New tables, column type changes | Data Owner + DBA | 2 sprints |
| Major | Schema redesign, migrations | Governance Council | 1 quarter |
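The category table lends itself to a lookup that a migration tool or pull-request check could use to decide who must approve a change. The sketch below is illustrative only: the change-kind identifiers such as `add_nullable_column` are invented for this example from the Examples column, and treating unknown change kinds as Major is a conservative assumption, not a stated rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangePolicy:
    approval: str
    lead_time: str


# Approval and lead time per category, from the Change Categories table.
POLICIES = {
    "emergency": ChangePolicy("Post-hoc approval", "Immediate"),
    "standard": ChangePolicy("Team lead", "1 sprint"),
    "significant": ChangePolicy("Data Owner + DBA", "2 sprints"),
    "major": ChangePolicy("Governance Council", "1 quarter"),
}

# Hypothetical change kinds mapped to categories, following the Examples column.
CHANGE_KINDS = {
    "security_patch": "emergency",
    "critical_bug": "emergency",
    "add_nullable_column": "standard",
    "add_index": "standard",
    "add_table": "significant",
    "alter_column_type": "significant",
    "schema_redesign": "major",
    "data_migration": "major",
}


def policy_for(change_kind: str) -> ChangePolicy:
    """Look up approval and lead time for a change kind.

    Unknown kinds fall back to "major" (an assumption of this sketch).
    """
    return POLICIES[CHANGE_KINDS.get(change_kind, "major")]
```

For instance, `policy_for("alter_column_type")` returns the Significant policy: Data Owner + DBA approval with a two-sprint lead time.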
Schema Change Checklist
Before Change:
- Impact analysis completed
- Backward compatibility assessed
- Rollback plan documented
- Data migration script tested
- Documentation updated
- Stakeholders notified
During Change:
- Change window communicated
- Monitoring in place
- Rollback ready to execute
After Change:
- Verification tests passed
- Performance validated
- Documentation finalized
- Change logged in changelog
API Change Management
| Change Type | Versioning | Deprecation Notice |
|---|---|---|
| Breaking changes | New version (v2, v3) | 6 months minimum |
| Additive changes | Same version | N/A |
| Bug fixes | Same version | N/A |
| Deprecations | Mark deprecated | 3 months minimum |
Incident Response
Data Incident Classification
| Severity | Description | Response Time | Escalation |
|---|---|---|---|
| P1 - Critical | Data breach, mass data loss | Immediate | CEO, Legal, All Hands |
| P2 - High | Data corruption, unauthorized access | < 1 hour | CTO, Security Lead |
| P3 - Medium | Data quality issues, minor exposure | < 4 hours | Data Owner, Engineering |
| P4 - Low | Documentation gaps, minor inconsistencies | < 24 hours | Data Steward |
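The severity table maps directly to a response deadline and an escalation list. The following sketch shows how an on-call tool might derive both when an incident is opened. The function names are hypothetical, "Immediate" for P1 is modeled as a zero-length window, and the channel name follows the `#incident-YYYYMMDD` pattern used in the response procedure.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Response windows from the Data Incident Classification table.
# Assumption: "Immediate" (P1) is represented as a zero-length window.
RESPONSE_WINDOW: Dict[str, timedelta] = {
    "P1": timedelta(0),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=24),
}

ESCALATION: Dict[str, List[str]] = {
    "P1": ["CEO", "Legal", "All Hands"],
    "P2": ["CTO", "Security Lead"],
    "P3": ["Data Owner", "Engineering"],
    "P4": ["Data Steward"],
}


def respond_by(severity: str, detected_at: datetime) -> datetime:
    """Latest time by which a response is due for the given severity."""
    return detected_at + RESPONSE_WINDOW[severity]


def open_incident(severity: str, detected_at: datetime) -> dict:
    """Bundle the channel name, deadline, and escalation list for an incident."""
    return {
        "channel": "#incident-" + detected_at.strftime("%Y%m%d"),
        "severity": severity,
        "respond_by": respond_by(severity, detected_at),
        "escalate_to": ESCALATION[severity],
    }
```

A P2 incident detected at 14:30 UTC would therefore carry a 15:30 UTC response deadline and escalate to the CTO and Security Lead.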
Incident Response Procedure
Phase 1: Detection & Triage
- Incident detected (monitoring, user report, audit)
- On-call engineer triages
- Severity assigned
- Incident channel created (#incident-YYYYMMDD)
Phase 2: Containment
- Isolate affected systems if needed
- Preserve evidence (logs, snapshots)
- Stop ongoing data exposure
- Notify affected parties (if P1/P2)
Phase 3: Investigation
- Root cause analysis
- Scope determination (what data, how many records)
- Timeline reconstruction
- Document findings
Phase 4: Remediation
- Fix vulnerability/issue
- Restore data if needed
- Verify fix effectiveness
- Update monitoring/alerts
Phase 5: Post-Incident
- Blameless post-mortem
- Update runbooks/procedures
- Regulatory notifications (if required)
- Customer communications (if required)
Breach Notification Requirements
| Regulation | Notification Timeline | Who to Notify |
|---|---|---|
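| GDPR | 72 hours from becoming aware (supervisory authority); without undue delay (affected users, when the risk to them is high) | Supervisory authority + affected users |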
| CCPA | "Without unreasonable delay" | Affected California residents |
| Contractual | Per agreement | Affected customers |
Audit & Monitoring
Audit Logging Requirements
| Event Type | Logged Fields | Retention |
|---|---|---|
| User login | user_id, timestamp, IP, success/fail | 2 years |
| Data access (sensitive) | user_id, table, record_id, timestamp | 2 years |
| Data modification | user_id, table, record_id, old/new values, timestamp | 2 years |
| Admin actions | user_id, action, target, timestamp | 2 years |
| API calls | endpoint, user/service, params, response_code | 90 days |
| System events | service, event_type, details, timestamp | 90 days |
Audit Log Structure
{
  "timestamp": "2024-12-08T14:30:00Z",
  "event_type": "data_access",
  "actor": {
    "type": "user",
    "id": "user_123",
    "ip": "192.168.1.1"
  },
  "resource": {
    "type": "table",
    "name": "users",
    "record_id": "456"
  },
  "action": "read",
  "outcome": "success",
  "metadata": {
    "fields_accessed": ["email", "first_name"],
    "query_context": "user_profile_view"
  }
}
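A small builder keeps every service emitting the same shape as the example record above. This is a sketch, not an existing ComplyAI helper: `audit_event` and its parameter names are assumptions, and the timestamp argument is expected to be a timezone-aware datetime.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def audit_event(*, event_type: str, actor_id: str, actor_ip: str,
                table: str, record_id: str, action: str, outcome: str,
                fields_accessed: List[str], query_context: str,
                actor_type: str = "user",
                timestamp: Optional[datetime] = None) -> dict:
    """Build an audit record matching the documented structure.

    The timestamp is normalized to UTC and rendered with a trailing "Z".
    """
    ts = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event_type": event_type,
        "actor": {"type": actor_type, "id": actor_id, "ip": actor_ip},
        "resource": {"type": "table", "name": table, "record_id": record_id},
        "action": action,
        "outcome": outcome,
        "metadata": {
            "fields_accessed": list(fields_accessed),
            "query_context": query_context,
        },
    }


event = audit_event(
    event_type="data_access", actor_id="user_123", actor_ip="192.168.1.1",
    table="users", record_id="456", action="read", outcome="success",
    fields_accessed=["email", "first_name"], query_context="user_profile_view",
    timestamp=datetime(2024, 12, 8, 14, 30, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```

Keyword-only arguments make call sites self-describing and prevent fields such as `action` and `outcome` from being swapped by position.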
Monitoring Dashboards
Data Quality Dashboard
- Completeness scores by table
- Freshness (time since last update)
- Anomaly detection alerts
- Cross-system reconciliation status
Security Dashboard
- Failed login attempts
- Unusual access patterns
- Privilege escalations
- API rate limit breaches
Compliance Dashboard
- Data subject requests (pending/completed)
- Consent status distribution
- Retention policy compliance
- Encryption status
Periodic Reviews
| Review Type | Frequency | Participants | Output |
|---|---|---|---|
| Access review | Quarterly | Data Owners, IT | Access adjustments |
| Quality review | Monthly | Data Stewards | Quality improvement plan |
| Policy review | Annually | Governance Council | Policy updates |
| Compliance audit | Annually | External auditor | Audit report |
| Penetration test | Annually | Security firm | Remediation plan |
Governance Metrics & KPIs
| Metric | Target | Current | Trend |
|---|---|---|---|
| Data quality score (overall) | >95% | TBD | - |
| Critical field completeness | >99% | TBD | - |
| Data subject request SLA | 100% within 30 days | TBD | - |
| Incident response SLA | 100% within target | TBD | - |
| Documentation coverage | 100% of tables | TBD | - |
| Access review completion | 100% quarterly | TBD | - |
Related Documents
- Data Dictionary - Field-level definitions
- Data Lineage - Data flow documentation
- Service Architecture - System design
- Quick Reference - Common lookups
Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-12 | Data Team | Initial governance framework |
Last Updated: December 2024