
ComplyAI Data Lineage

Understanding how data flows through ComplyAI systems


Overview

Data lineage maps show how data moves from source to destination, including all transformations along the way. This documentation helps you understand where data comes from, how it's processed, and where it ends up.


High-Level Data Flow


Detailed Lineage Maps

1. User Registration & Organization Setup


2. Ad Account Synchronization


3. Ad Status Change Webhook Flow


4. AI Compliance Scoring Pipeline


5. Subscription & Billing Flow


Data Source Documentation

Primary Data Sources

| Source | Type | Refresh Rate | Owner |
|---|---|---|---|
| Meta Graph API | External API | Real-time (webhooks) + 15-min sync | Meta |
| Stripe API | External API | Real-time (webhooks) | Stripe |
| User Portal | Internal | Real-time | ComplyAI |
| Auth0 | External service | Real-time | Auth0 |

Data Stores

| Store | Type | Purpose | Backup |
|---|---|---|---|
| PostgreSQL | Primary DB | All business data | Daily + WAL |
| Redis | Cache | Sessions, rate limiting | Hourly |
| S3 | Object storage | Media assets | Cross-region |

Lineage Metadata

Tracking Information

Every record carries lineage metadata. All tables have timestamp columns, and some tables also have creator and source-system columns:

-- Standard timestamp columns (all tables)
created_time TIMESTAMP NOT NULL DEFAULT NOW(),
updated_time TIMESTAMP NOT NULL DEFAULT NOW(),  -- DEFAULT applies on INSERT only

-- Additional tracking (where applicable)
created_by INTEGER REFERENCES users(id),
source_system VARCHAR(50)  -- 'meta_api', 'stripe', 'manual'
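A column DEFAULT is applied only when a row is inserted. As a result, `updated_time` keeps its insert value unless something rewrites it on UPDATE. This page does not say how ComplyAI handles that, and a trigger is one common approach. The following is a minimal sketch under stated assumptions:

- SQLite stands in for PostgreSQL so the example runs with the Python standard library.
- The table name `ad_accounts_demo` and the trigger name are hypothetical.
- The CHECK constraint limiting `source_system` to the three documented values is an assumption.

In PostgreSQL the usual equivalent is a BEFORE UPDATE trigger whose function sets `NEW.updated_time = NOW()`.

```python
import sqlite3

# SQLite stands in for PostgreSQL (assumption); table/trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);

CREATE TABLE ad_accounts_demo (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by INTEGER REFERENCES users(id),
    -- Assumed constraint: only the documented source systems (NULL allowed)
    source_system VARCHAR(50)
        CHECK (source_system IN ('meta_api', 'stripe', 'manual'))
);

-- DEFAULT only fires on INSERT, so this trigger refreshes updated_time on UPDATE.
-- SQLite's recursive_triggers is off by default, so the inner UPDATE does not re-fire it.
CREATE TRIGGER ad_accounts_demo_touch
AFTER UPDATE ON ad_accounts_demo
FOR EACH ROW
BEGIN
    UPDATE ad_accounts_demo
    SET updated_time = CURRENT_TIMESTAMP
    WHERE id = NEW.id;
END;
""")
```

Updating any column of a row leaves `created_time` untouched and moves `updated_time` forward. Inserting `source_system = 'unknown'` raises `sqlite3.IntegrityError`.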

Change Data Capture

Changes to key tables are recorded as rows in activity_events. These rows form an audit trail of who did what, from where, and when:

activity_events
├── user_id (who)
├── action (what)
├── description (details)
├── ip_address (where)
└── created_time (when)
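The following is a minimal sketch of writing to and reading from `activity_events`. It assumes the five columns in the tree above plus a surrogate `id`, with SQLite standing in for PostgreSQL. The helper names `log_activity` and `history_for_user` and the action string `ad_account.connected` are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity_events (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,            -- who
    action TEXT NOT NULL,       -- what
    description TEXT,           -- details
    ip_address TEXT,            -- where
    created_time TEXT NOT NULL  -- when (ISO 8601, UTC)
);
""")


def log_activity(conn, user_id, action, description, ip_address):
    """Append one audit row. This helper never updates or deletes rows."""
    conn.execute(
        "INSERT INTO activity_events "
        "(user_id, action, description, ip_address, created_time) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, action, description, ip_address,
         datetime.now(timezone.utc).isoformat()),
    )


def history_for_user(conn, user_id):
    """Return (action, description, ip_address, created_time) rows, oldest first."""
    return conn.execute(
        "SELECT action, description, ip_address, created_time "
        "FROM activity_events WHERE user_id = ? ORDER BY created_time, id",
        (user_id,),
    ).fetchall()


if __name__ == "__main__":
    log_activity(conn, 42, "ad_account.connected",
                 "Connected a Meta ad account", "203.0.113.7")
    print(history_for_user(conn, 42))
```

These columns record who acted and when, but not the before and after values of the changed row. Reconstructing a record's earlier state would therefore need the description text or another source.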

Impact Analysis

If Meta API Changes...

If PostgreSQL Goes Down...
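Questions like "what breaks if the Meta API changes?" amount to a downstream traversal of the lineage graph: start at the changed node and collect everything that consumes its output. The following is a minimal sketch using breadth-first search over a hypothetical adjacency list. The node names paraphrase the sources, flows, and stores named on this page, but the edges are illustrative assumptions, not ComplyAI's actual topology.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the nodes in its list.
# Edges are illustrative assumptions, not the real ComplyAI topology.
LINEAGE = {
    "meta_graph_api": ["ad_account_sync", "ad_status_webhook"],
    "stripe_api": ["billing_flow"],
    "auth0": ["user_registration"],
    "user_portal": ["user_registration"],
    "ad_account_sync": ["postgresql"],
    "ad_status_webhook": ["postgresql"],
    "billing_flow": ["postgresql"],
    "user_registration": ["postgresql"],
    "postgresql": ["ai_compliance_scoring"],
    "ai_compliance_scoring": [],
}


def downstream(graph, start):
    """Return every node reachable from start (breadth-first), excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {start}


if __name__ == "__main__":
    print(sorted(downstream(LINEAGE, "meta_graph_api")))
    # ['ad_account_sync', 'ad_status_webhook', 'ai_compliance_scoring', 'postgresql']
```

Storing edges in the direction data flows makes change-impact queries a plain forward traversal. An outage such as PostgreSQL going down also affects the upstream writers (the sync and webhook flows). Answering that case would need the reverse graph as well.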


Data Quality Checkpoints

| Checkpoint | Location | Validation |
|---|---|---|
| Webhook signature | Maestro ingress | HMAC verification |
| Token validity | Before Meta API call | Token refresh if expired |
| Required fields | Database insert | NOT NULL constraints |
| Foreign keys | Database insert | FK constraints |
| Data freshness | Scheduled job | Alert if > 1 hour old |
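Three of these checkpoints reduce to small pure functions. The following is a sketch under stated assumptions:

- The signature format follows Meta's `X-Hub-Signature-256` header: `sha256=` followed by a hex HMAC-SHA256 of the raw request body, keyed with the app secret. Stripe uses a different scheme (a `Stripe-Signature` header that includes a timestamp) and would need its own parser.
- The 5-minute token refresh margin is an assumption.
- All function names are hypothetical.
- The one-hour freshness threshold comes from the table.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# "Data freshness" checkpoint: alert if data is more than 1 hour old.
FRESHNESS_LIMIT = timedelta(hours=1)
# Assumed safety margin so a token is not used seconds before it expires.
REFRESH_MARGIN = timedelta(minutes=5)


def verify_webhook_signature(payload: bytes, signature_header: str, secret: bytes) -> bool:
    """Check a 'sha256=<hex digest>' HMAC-SHA256 signature over the raw body."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest runs in constant time, avoiding timing leaks.
    return hmac.compare_digest(expected.encode(), received.encode())


def token_needs_refresh(expires_at: datetime, now: datetime) -> bool:
    """True if the token has expired or will expire within REFRESH_MARGIN."""
    return now >= expires_at - REFRESH_MARGIN


def is_stale(last_synced: datetime, now: datetime) -> bool:
    """True if the last successful sync is more than FRESHNESS_LIMIT ago."""
    return now - last_synced > FRESHNESS_LIMIT


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_stale(now - timedelta(minutes=90), now))
```

Meta data syncs every 15 minutes (see Primary Data Sources). The one-hour freshness threshold therefore tolerates roughly four consecutive missed sync cycles before an alert fires.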

📝 Changelog

| Date | Change |
|---|---|
| 2024-12 | Initial lineage documentation |