
ComplyAI Data Infrastructure

This document outlines the data architecture, databases, and key data models used across the ComplyAI infrastructure.

Database Overview

The infrastructure primarily relies on PostgreSQL for relational data and ChromaDB for vector embeddings (AI/ML).

Service           Database     ORM / Driver       Notes
complyai-api      PostgreSQL   Flask-SQLAlchemy   Main business data
api-async         PostgreSQL   asyncpg            High-performance async access
complyai_cms      PostgreSQL   Flask-SQLAlchemy   CMS content data
complyai-ipu      ChromaDB     N/A                Vector embeddings for AI
complyai-violin   ChromaDB     N/A                Vector embeddings for AI

Key Data Models

Core API (complyai-api)

Located in app/models/core_models.py.

Ad & Spend Tracking

  • SpendClientAdAccounts: Tracks client ad accounts.
  • SpendClientAdAccountsSpendData: Stores spend data associated with accounts.
  • LineOfCreditAdAccounts: Manages credit lines for ad accounts.
  • LineOfCreditAdAccountSpendData: Spend data specific to credit lines.
  • FacebookCurrentAdData: Real-time ad data from Facebook.
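The account and spend-data models above form a parent/child pair: each ad account owns many spend rows. The sketch below shows how such a relationship might look and how per-account spend could be totalled. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and every table and column name is hypothetical rather than taken from app/models/core_models.py.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: sqlite3 stands in for PostgreSQL, and these table and
# column names are illustrative, not the real core_models.py definitions.
SCHEMA = """
CREATE TABLE spend_client_ad_accounts (
    id INTEGER PRIMARY KEY,
    ad_account_id TEXT NOT NULL UNIQUE,
    client_name TEXT NOT NULL
);
CREATE TABLE spend_client_ad_accounts_spend_data (
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES spend_client_ad_accounts(id),
    spend_date TEXT NOT NULL,
    spend_cents INTEGER NOT NULL
);
"""


def total_spend_by_account(conn: sqlite3.Connection) -> dict[str, int]:
    """Sum spend (in cents) per ad account, including accounts with no spend rows."""
    rows = conn.execute(
        """
        SELECT a.ad_account_id, COALESCE(SUM(s.spend_cents), 0)
        FROM spend_client_ad_accounts AS a
        LEFT JOIN spend_client_ad_accounts_spend_data AS s
            ON s.account_id = a.id
        GROUP BY a.ad_account_id
        ORDER BY a.ad_account_id
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO spend_client_ad_accounts (id, ad_account_id, client_name) VALUES (?, ?, ?)",
        [(1, "act_111", "Client A"), (2, "act_222", "Client B")],
    )
    conn.executemany(
        "INSERT INTO spend_client_ad_accounts_spend_data (account_id, spend_date, spend_cents) VALUES (?, ?, ?)",
        [(1, "2024-01-01", 1500), (1, "2024-01-02", 2500)],
    )
    print(total_spend_by_account(conn))  # {'act_111': 4000, 'act_222': 0}
```

Storing spend as integer cents avoids floating-point rounding in totals; the LEFT JOIN keeps accounts that have not spent anything yet.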

Reporting & Compliance

  • DailyRejectedAdReport: Logs of rejected ads for compliance reporting.
  • ScoreResult: Stores scoring results (likely for compliance checks).
  • SpendClientsRatesChecker: Validates rates for clients.
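As an illustration of what a daily rejection report involves, the following sketch groups rejection events by day and counts how often each rejection reason occurs. The RejectedAd record and its fields are hypothetical; the actual DailyRejectedAdReport columns may differ.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RejectedAd:
    """Hypothetical shape of one rejection event (not the real DailyRejectedAdReport columns)."""
    ad_id: str
    rejected_on: date
    reason: str


def daily_rejection_summary(events: list[RejectedAd]) -> dict[date, Counter]:
    """Group rejection events by day, counting each rejection reason per day."""
    summary: dict[date, Counter] = defaultdict(Counter)
    for event in events:
        summary[event.rejected_on][event.reason] += 1
    # Return a plain dict so missing days raise KeyError instead of appearing empty.
    return dict(summary)
```

For example, two ads rejected on the same day for the same reason produce a single entry for that day with a count of 2 for that reason.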

Webhooks

  • FacebookWebhooks: Stores incoming webhook payloads from Facebook.
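Facebook (Meta) webhook deliveries are normally authenticated before their payloads are stored. Meta signs each POST body with the app secret (HMAC-SHA256) and sends the result in the X-Hub-Signature-256 header as "sha256=<hex digest>", and it confirms a new subscription with a GET request carrying hub.mode, hub.verify_token and hub.challenge. This document does not say whether complyai-api performs these checks; the sketch below assumes it does, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Mapping, Optional


def verify_signature(raw_body: bytes, signature_header: Optional[str], app_secret: str) -> bool:
    """Check Meta's X-Hub-Signature-256 header ("sha256=<hex digest>") against the raw body."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


def handle_verification(params: Mapping[str, str], verify_token: str) -> Optional[str]:
    """Answer Meta's subscription handshake: echo hub.challenge when the token matches."""
    if params.get("hub.mode") == "subscribe" and params.get("hub.verify_token") == verify_token:
        return params.get("hub.challenge")
    return None
```

The signature must be computed over the raw request bytes, before any JSON parsing, because re-serialising the payload can change whitespace or key order and break the match.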

CMS (complyai_cms)

Models are located in www/<app>/models.py; the CMS follows a modular structure with one app per directory:

  • home: Core homepage models.
  • news: News articles or blog posts.
  • standardpages: Generic CMS pages.
  • users: User profiles and authentication extensions.
  • forms: Dynamic form builders.
  • images: Image asset management.

AI Infrastructure

Vector Store

  • ChromaDB: Used by complyai-ipu and complyai-violin to store embeddings for semantic search and RAG (Retrieval-Augmented Generation) flows.
  • Embedding models: Embeddings are generated with Hugging Face transformers and PyTorch (torch).
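Semantic search in a vector store comes down to ranking stored embeddings by their similarity to a query embedding. The sketch below shows that idea in plain Python using cosine similarity over toy three-dimensional vectors. It is not the ChromaDB API: the document IDs and vectors are made up, real embeddings would come from the transformers models listed above, and a production vector store typically uses an approximate nearest-neighbour index instead of this brute-force scan.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query (brute-force scan)."""
    ranked = sorted(store, key=lambda doc_id: cosine_similarity(query, store[doc_id]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical document IDs with toy embeddings.
    store = {
        "policy_alcohol": [0.9, 0.1, 0.0],
        "policy_finance": [0.0, 0.8, 0.6],
        "faq_billing": [0.1, 0.2, 0.9],
    }
    print(top_k([1.0, 0.0, 0.0], store, k=2))  # ['policy_alcohol', 'faq_billing']
```

In a RAG flow, the texts behind the returned IDs would then be passed to the language model as context for answering the query.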

Data Flow

  1. Ingestion: complyai-api ingests data via webhooks (stored as FacebookWebhooks records) and scrapers.
  2. Processing: api-async or Celery workers (in complyai-api) process raw data.
  3. Storage:
    • Structured data -> PostgreSQL (SpendData, AdAccounts).
    • Unstructured/Embedding data -> ChromaDB via complyai-ipu.
  4. Presentation: complyai-frontend and www fetch data via API endpoints.
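The storage step can be pictured as a router that splits processed records between the two stores. In this sketch, plain lists stand in for PostgreSQL and for ChromaDB (reached via complyai-ipu), and the routing rule, where a record with a "text" field is sent for embedding and everything else is treated as relational, is an assumption for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Sinks:
    """In-memory stand-ins for the real stores: PostgreSQL and ChromaDB."""
    relational: list[dict[str, Any]] = field(default_factory=list)
    vector: list[str] = field(default_factory=list)


def route_record(record: dict[str, Any], sinks: Sinks) -> str:
    """Send structured records to the relational sink and free text to the vector sink.

    Hypothetical rule: a record carrying a "text" field is unstructured content
    to be embedded; anything else is structured spend/account data.
    """
    if "text" in record:
        sinks.vector.append(record["text"])
        return "vector"
    sinks.relational.append(record)
    return "relational"
```

For example, a spend row such as {"ad_account_id": "act_111", "spend_cents": 1500} is routed to the relational sink, while ad copy submitted for compliance review would go to the vector sink.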