ComplyAI Data Infrastructure
This document outlines the data architecture, databases, and key data models used across the ComplyAI infrastructure.
Database Overview
The infrastructure primarily relies on PostgreSQL for relational data and ChromaDB for vector embeddings (AI/ML).
| Service | Database | ORM / Driver | Notes |
|---|---|---|---|
| complyai-api | PostgreSQL | Flask-SQLAlchemy | Main business data |
| api-async | PostgreSQL | AsyncPG | High-performance async access |
| complyai_cms | PostgreSQL | Flask-SQLAlchemy | CMS content data |
| complyai-ipu | ChromaDB | N/A | Vector embeddings for AI |
| complyai-violin | ChromaDB | N/A | Vector embeddings for AI |
Key Data Models
Core API (complyai-api)
Located in app/models/core_models.py.
Ad & Spend Tracking
- SpendClientAdAccounts: Tracks client ad accounts.
- SpendClientAdAccountsSpendData: Stores spend data associated with accounts.
- LineOfCreditAdAccounts: Manages credit lines for ad accounts.
- LineOfCreditAdAccountSpendData: Spend data specific to credit lines.
- FacebookCurrentAdData: Real-time ad data from Facebook.
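As a rough illustration of how an account model and its spend-data model might relate, the sketch below uses plain Python dataclasses instead of the actual Flask-SQLAlchemy models. Every field name here (account_id, client_name, spend_date, amount) and the total_spend helper are assumptions made for illustration; the real columns are defined in app/models/core_models.py.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass
class SpendDataRow:
    """Hypothetical stand-in for one SpendClientAdAccountsSpendData row."""
    spend_date: date
    amount: Decimal


@dataclass
class AdAccount:
    """Hypothetical stand-in for one SpendClientAdAccounts row."""
    account_id: str
    client_name: str
    spend_rows: List[SpendDataRow] = field(default_factory=list)

    def total_spend(self) -> Decimal:
        # Sum every spend row attached to this account.
        return sum((row.amount for row in self.spend_rows), Decimal("0"))


account = AdAccount(account_id="act_1", client_name="Example Client")
account.spend_rows.append(SpendDataRow(date(2024, 1, 1), Decimal("10.50")))
account.spend_rows.append(SpendDataRow(date(2024, 1, 2), Decimal("4.25")))
print(account.total_spend())  # 14.75
```

Decimal is used for the amounts so that currency values are not subject to binary floating-point rounding.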
Reporting & Compliance
- DailyRejectedAdReport: Logs of rejected ads for compliance reporting.
- ScoreResult: Stores scoring results (likely for compliance checks).
- SpendClientsRatesChecker: Validates rates for clients.
Webhooks
FacebookWebhooks: Stores incoming webhook payloads from Facebook.
CMS (complyai_cms)
Located in www/<app>/models.py. Follows a modular app structure.
- home: Core homepage models.
- news: News articles or blog posts.
- standardpages: Generic CMS pages.
- users: User profiles and authentication extensions.
- forms: Dynamic form builders.
- images: Image asset management.
AI Infrastructure
Vector Store
- ChromaDB: Used by complyai-ipu and complyai-violin to store embeddings for semantic search and RAG (Retrieval-Augmented Generation) flows.
- Models: Uses HuggingFace transformers and torch for embedding generation.
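To make the idea of semantic search over embeddings concrete, here is a minimal sketch that ranks stored vectors by cosine similarity to a query vector. It is an in-memory stand-in, not ChromaDB itself: the store dictionary, the document ids, and the three-dimensional vectors are all made up for illustration. Cosine similarity is one common choice of metric; the metric actually configured in the ChromaDB collections is not stated in this document.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query, store[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical); real ones would come from the embedding model.
store = {
    "policy-a": [0.9, 0.1, 0.0],
    "policy-b": [0.0, 1.0, 0.0],
    "policy-c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
nearest = top_k(query, store, k=2)
print(nearest)  # ['policy-a', 'policy-c']
```

In a RAG flow, the documents behind the returned ids would then be passed to the generation step as context.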
Data Flow
- Ingestion: complyai-api ingests data via Webhooks (FacebookWebhooks) and Scrapers.
- Processing: api-async or Celery workers (in complyai-api) process raw data.
- Storage:
  - Structured data -> PostgreSQL (SpendData, AdAccounts).
  - Unstructured/Embedding data -> ChromaDB via complyai-ipu.
- Presentation: complyai-frontend and www fetch data via API endpoints.
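The ingestion, processing, and storage stages above can be sketched as three small functions. This is an illustrative outline only: the payload shape (account_id, spend, ad_text), the function names, and the two in-memory lists standing in for PostgreSQL and ChromaDB are assumptions, not the services' actual code.

```python
import json


def ingest(raw_payload: str) -> dict:
    """Ingestion: parse a raw webhook body (JSON is assumed here)."""
    return json.loads(raw_payload)


def process(event: dict) -> dict:
    """Processing: split an event into structured fields and free text."""
    return {
        "structured": {"account_id": event["account_id"], "spend": event["spend"]},
        "text": event.get("ad_text", ""),
    }


def store(record: dict, relational: list, vector: list) -> None:
    """Storage: structured rows go to one store, text to embed goes to the other."""
    relational.append(record["structured"])
    if record["text"]:
        vector.append(record["text"])


relational_store = []  # stand-in for PostgreSQL
vector_store = []      # stand-in for ChromaDB via complyai-ipu

payload = '{"account_id": "act_1", "spend": 12.5, "ad_text": "Limited offer"}'
store(process(ingest(payload)), relational_store, vector_store)
```

Keeping each stage as a separate function mirrors the split between services: any stage could be moved behind a queue or a worker without changing the others.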