Troubleshooting Guide

Common issues and their solutions for the ComplyAI platform


📋 Table of Contents​

  1. User-Reported Issues
  2. Data Sync Issues
  3. Authentication Issues
  4. API Errors
  5. Database Issues
  6. External Integration Issues
  7. Performance Issues
  8. Deployment Issues

User-Reported Issues​

"I can't see my ads"​

Possible Causes & Solutions:

Cause | How to Check | Solution
Ad account not synced | Check org_ad_accounts.last_sync | Trigger manual sync
Sync failed | Check activity_events for errors | Review error, fix token if needed
Wrong organization | Check user's organization membership | Verify user_organizations table
Ad account disconnected | Check org_ad_accounts.is_connected | User needs to reconnect via OAuth
Recent ads (< 15 min) | Check ad creation time in Meta | Wait for next sync cycle

Diagnostic Query:

SELECT
    oaa.id,
    oaa.name,
    oaa.is_connected,
    oaa.last_sync,
    COUNT(oa.id) AS ad_count
FROM org_ad_accounts oaa
LEFT JOIN org_ads oa ON oaa.id = oa.org_ad_account_id
WHERE oaa.organization_id = {org_id}
GROUP BY oaa.id, oaa.name, oaa.is_connected, oaa.last_sync;
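
The causes in the table can be checked in a fixed order against one row of the query result. A minimal sketch, assuming the row's columns arrive as Python values and that the 30-minute staleness threshold from the Data Sync section applies; the function name and the returned strings are illustrative and not part of ComplyAI.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: same staleness threshold as the "Ads Not Syncing" symptoms
STALE_AFTER = timedelta(minutes=30)


def diagnose_ad_account(is_connected: bool, last_sync: Optional[datetime],
                        ad_count: int, now: Optional[datetime] = None) -> str:
    """Map one row of the diagnostic query to the most likely cause."""
    now = now or datetime.now(timezone.utc)
    if not is_connected:
        return "disconnected: user needs to reconnect via OAuth"
    if last_sync is None:
        return "never synced: trigger manual sync"
    if now - last_sync > STALE_AFTER:
        return "sync stale: check activity_events for errors"
    if ad_count == 0:
        return "synced but no ads: check organization membership or wait for next cycle"
    return "ok"
```

Passing `now` explicitly keeps the check reproducible when replaying an old incident.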

"My score is wrong"​

Possible Causes & Solutions:

Cause | How to Check | Solution
Score not yet calculated | Check org_ads_score exists | Wait for scoring cycle (30 min)
Stale score | Check org_ads_score.updated_at | Trigger re-scoring
Ad content changed | Compare ad hash with scored version | Re-sync and re-score
Model version mismatch | Check score metadata | Verify model version

Diagnostic Query:

SELECT
    oa.id,
    oa.name,
    oa.updated_at AS ad_updated,
    oas.overall_score,
    oas.text_score,
    oas.media_score,
    oas.updated_at AS score_updated
FROM org_ads oa
LEFT JOIN org_ads_score oas ON oa.id = oas.org_ad_id
WHERE oa.id = {ad_id};
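
Comparing the two timestamps from this query separates a missing score from a stale one. A small sketch under the assumption that `score_updated` is NULL (None in Python) when the LEFT JOIN finds no score row; the function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def diagnose_score(ad_updated: datetime, score_updated: Optional[datetime]) -> str:
    """Interpret ad_updated and score_updated from the diagnostic query."""
    if score_updated is None:
        # No org_ads_score row was joined
        return "not scored yet: wait for scoring cycle"
    if score_updated < ad_updated:
        # The ad changed after it was last scored
        return "stale: trigger re-scoring"
    return "current: check model version in score metadata"
```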

"I can't connect my Meta account"​

Possible Causes & Solutions:

Cause | How to Check | Solution
OAuth popup blocked | Browser settings | Enable popups for complyai.io
Missing Meta permissions | Check OAuth scope | User must grant all requested permissions
Meta account restricted | Meta Business Settings | User must resolve in Meta
Previous connection exists | Check org_business_accounts | Disconnect old connection first
Auth0 session expired | Check user session | User should log out and back in

OAuth Flow Diagram:

User clicks Connect → Auth0 redirects → Meta OAuth → 
User grants permissions → Redirect back → Token stored

"Notifications not working"​

Possible Causes & Solutions:

Cause | How to Check | Solution
Email notifications disabled | Check user preferences | Enable in Settings
In-app notifications disabled | Check notification settings | Enable in Settings
Email in spam | Check spam folder | Whitelist @complyai.io
Notification service down | Check Triangle service health | Restart service if needed

Diagnostic Query:

SELECT
    n.id,
    n.type,
    n.status,
    n.created_at,
    n.sent_at
FROM notifications n
WHERE n.user_id = {user_id}
ORDER BY n.created_at DESC
LIMIT 20;

Data Sync Issues​

Ads Not Syncing​

Symptoms:

  • org_ad_accounts.last_sync is stale (> 30 min old)
  • New ads in Meta not appearing in ComplyAI

Diagnostic Steps:

  1. Check Celery worker status

    # Do workers respond, and what are they executing right now?
    celery -A complyai inspect active
  2. Check queue depth

    # Tasks prefetched by workers but not yet started (the broker may hold more)
    celery -A complyai inspect reserved
  3. Check for sync errors

    SELECT * FROM activity_events
    WHERE action IN ('ad_sync_started', 'ad_sync_failed', 'ad_sync_completed')
    AND created_at > NOW() - INTERVAL '1 hour'
    ORDER BY created_at DESC;
  4. Check Meta API token validity

    curl "https://graph.facebook.com/v19.0/me?access_token={token}"

Common Fixes:

  • Restart Celery workers: kubectl rollout restart deployment/celery-worker
  • Refresh token: See RB-SYNC-002 in Runbooks
  • Clear stuck tasks: celery -A complyai purge (warning: this discards every queued task, not only stuck ones)

Webhook Events Not Processing​

Symptoms:

  • Meta shows webhook delivered
  • Events not appearing in database
  • Ad status changes not reflected

Diagnostic Steps:

  1. Verify webhook endpoint is reachable

    curl -X POST https://api.complyai.io/webhooks/meta/test
  2. Check webhook logs

    SELECT * FROM webhook_events
    WHERE source = 'meta'
    ORDER BY received_at DESC
    LIMIT 20;
  3. Verify signature validation

    • Check HMAC signature matches
    • Verify app secret is correct
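
The signature check in step 3 can be reproduced offline. Meta signs each webhook payload with HMAC-SHA256 keyed by the app secret and sends the hex digest in the X-Hub-Signature-256 header, prefixed with "sha256=". A minimal sketch; the function name is an assumption, and how ComplyAI's own handler reads the header and secret is not shown in this guide.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Return True if the header equals 'sha256=' + HMAC-SHA256(app_secret, raw_body)."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest runs in constant time, so it does not leak where the strings differ
    return hmac.compare_digest(expected.encode(), received.encode())
```

The digest must be computed over the raw request bytes. Re-serializing parsed JSON can change whitespace or key order and produce a mismatch even when the secret is correct.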

Common Fixes:

  • Resubscribe webhooks: See RB-SYNC-003 in Runbooks
  • Check firewall/ALB allows Meta IPs
  • Verify webhook secret in environment

Authentication Issues​

User Can't Log In​

Symptoms:

  • Login page shows error
  • User redirected back to login
  • "Invalid credentials" message

Diagnostic Steps:

  1. Check Auth0 logs

    • Log in to the Auth0 Dashboard
    • Filter Logs by the user's email
  2. Check user status

    SELECT id, email, is_active, auth0_user_id, created_at
    FROM users
    WHERE email = '{email}';
  3. Check if user exists in Auth0

    • Auth0 Dashboard → Users → Search

Common Fixes:

Issue | Solution
User disabled | Re-enable in Auth0
Password expired | User resets password
MFA issue | Reset MFA in Auth0
User not in database | Sync from Auth0 or re-register
Auth0 rules blocking | Check Auth0 rules

Token Expired​

Symptoms:

  • API returns 401
  • "Token expired" error
  • User forced to re-login

For User JWT Tokens:

  • Expected behavior: the user re-authenticates
  • If re-logins are too frequent, check the token lifetimes configured in Auth0

For Meta Access Tokens:

  • User tokens: long-lived tokens expire after about 60 days; the user must re-authenticate
  • System user tokens: should not expire; regenerate if needed

Diagnostic Query:

SELECT
    oba.id,
    oba.business_id,
    oba.token_expires_at,
    CASE
        WHEN oba.token_expires_at < NOW() THEN 'EXPIRED'
        WHEN oba.token_expires_at < NOW() + INTERVAL '7 days' THEN 'EXPIRING SOON'
        ELSE 'OK'
    END AS status
FROM org_business_accounts oba
WHERE oba.organization_id = {org_id};

API Errors​

Error Code Reference​

Code | Meaning | Common Cause | Solution
400 | Bad Request | Invalid parameters | Check request body/params
401 | Unauthorized | Invalid/expired token | Re-authenticate
403 | Forbidden | Insufficient permissions | Check user role
404 | Not Found | Resource doesn't exist | Verify ID/path
409 | Conflict | Duplicate resource | Check for existing record
422 | Unprocessable | Validation failed | Check field requirements
429 | Rate Limited | Too many requests | Implement backoff
500 | Server Error | Application error | Check logs
502 | Bad Gateway | Service unavailable | Check service health
503 | Service Unavailable | Overloaded/maintenance | Retry later
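
For the 429 row, "implement backoff" usually means retrying with exponentially growing, randomized delays. A minimal client-side sketch, assuming the caller wraps its HTTP call in a `send` function returning `(status, body)`; the set of retryable statuses and all names and defaults here are illustrative choices, not ComplyAI API behavior.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: treat rate limiting and temporary unavailability as retryable
RETRYABLE = {429, 502, 503}


def call_with_backoff(send: Callable[[], Tuple[int, str]], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() at least once, retrying retryable statuses up to max_attempts calls."""
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        # Full jitter: a random delay between 0 and min(max_delay, base_delay * 2^attempt)
        delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
        status, body = send()
    return status, body
```

The jitter spreads retries from many clients over time, so they do not all hit the API again at the same instant. The `sleep` parameter exists only so the loop can be exercised without waiting.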

Debugging API Requests​

Request Tracing:

# Add a request ID header for tracing
REQUEST_ID="debug-$(date +%s)"
curl -H "X-Request-ID: $REQUEST_ID" \
-H "Authorization: Bearer {token}" \
https://api.complyai.io/endpoint

# Find the same ID in the logs
grep "$REQUEST_ID" /var/log/complyai/*.log

Common API Issues:

Issue | Symptom | Solution
Missing auth header | 401 on all requests | Add Authorization: Bearer {token}
Wrong content type | 400 or 415 | Set Content-Type: application/json
Invalid JSON | 400 | Validate JSON syntax
Missing required field | 422 | Check API docs for required fields

Database Issues​

Connection Pool Exhausted​

Symptoms:

  • "Connection refused" errors
  • Timeouts on database operations
  • Application hangs

Diagnostic Steps:

  1. Check active connections

    SELECT count(*) FROM pg_stat_activity;

    -- By application
    SELECT application_name, count(*)
    FROM pg_stat_activity
    GROUP BY application_name;
  2. Check for idle connections

    SELECT pid, usename, application_name, state, query_start
    FROM pg_stat_activity
    WHERE state = 'idle'
    ORDER BY query_start;
  3. Check PgBouncer stats (if used)

    psql -h pgbouncer -p 6432 pgbouncer -c "SHOW POOLS;"

Common Fixes:

  • Kill idle connections: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND pid <> pg_backend_pid();
  • Increase pool size (with caution)
  • Check for connection leaks in code
  • Restart application pods
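
Connection leaks typically come from code paths that borrow a connection and skip the release when an exception is raised. A minimal sketch of the borrow-and-always-return pattern using a toy pool; `SimplePool` is a hypothetical stand-in for whatever pool the application actually uses, not ComplyAI code.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Iterator


class SimplePool:
    """Toy fixed-size pool used only to illustrate leak-free borrowing."""

    def __init__(self, connect: Callable[[], object], size: int) -> None:
        self._idle: "queue.Queue[object]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[object]:
        # Raises queue.Empty if no connection frees up within the timeout
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Runs on success and on exceptions, so the connection is never leaked
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()
```

When reviewing code for leaks, look for connections acquired outside a `with` block or a `try`/`finally`.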

Slow Queries​

Symptoms:

  • High latency on specific endpoints
  • Database CPU spikes
  • Timeout errors

Diagnostic Steps:

  1. Find slow queries

    SELECT pid, now() - query_start as duration, query
    FROM pg_stat_activity
    WHERE state = 'active'
    AND query NOT LIKE '%pg_stat_activity%'
    ORDER BY duration DESC;
  2. Check query plan

    EXPLAIN ANALYZE {slow_query};
  3. Check for missing indexes

    SELECT relname, seq_scan, idx_scan
    FROM pg_stat_user_tables
    WHERE seq_scan > idx_scan
    ORDER BY seq_scan DESC;

Common Fixes:

  • Add missing index: CREATE INDEX CONCURRENTLY idx_name ON table(column);
  • Update statistics: ANALYZE table_name;
  • Rewrite inefficient query
  • Add a query timeout (for example PostgreSQL's statement_timeout)

Deadlocks​

Symptoms:

  • "deadlock detected" errors
  • Transactions rolling back
  • Intermittent failures on writes

Diagnostic Steps:

  1. Check for locks

    SELECT blocked_locks.pid AS blocked_pid,
           blocking_locks.pid AS blocking_pid,
           blocked_activity.query AS blocked_query,
           blocking_activity.query AS blocking_query
    FROM pg_catalog.pg_locks blocked_locks
    JOIN pg_catalog.pg_locks blocking_locks
      ON blocking_locks.locktype = blocked_locks.locktype
     AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
     AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
     AND blocking_locks.pid != blocked_locks.pid  -- do not match a session against itself
    JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
    WHERE NOT blocked_locks.granted;

Common Fixes:

  • Ensure consistent ordering of table access
  • Reduce transaction duration
  • Use SELECT ... FOR UPDATE SKIP LOCKED for queue-style row claims
  • Implement retry logic in application
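
The retry logic in the last fix can key on PostgreSQL's SQLSTATE for deadlocks, 40P01 (deadlock_detected). A minimal sketch, assuming the driver exposes the code as `pgcode` (psycopg2) or `sqlstate` (psycopg 3); `work` and `rollback` are caller-supplied callables, and the function names are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
DEADLOCK_SQLSTATE = "40P01"  # PostgreSQL deadlock_detected


def is_deadlock(exc: BaseException) -> bool:
    """Detect a deadlock error from the driver's SQLSTATE attribute."""
    code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
    return code == DEADLOCK_SQLSTATE


def run_transaction(work: Callable[[], T], rollback: Callable[[], None],
                    max_attempts: int = 3) -> T:
    """Run work(); after a deadlock, roll back and rerun the whole transaction."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except Exception as exc:
            rollback()
            # Re-raise anything that is not a deadlock, or the final deadlock
            if not is_deadlock(exc) or attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

The whole transaction is rerun because PostgreSQL aborts the victim transaction, so none of its earlier statements survive.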

External Integration Issues​

Meta API Errors​

Error Code | Meaning | Solution
4 | Application rate limit | Reduce request frequency, implement backoff
17 | User rate limit | Wait for reset (1 hour)
100 | Invalid parameter | Check API parameters
190 | Access token expired | Refresh token
200 | Permission denied | User must grant permission
278 | Temporary issue | Retry after delay

Diagnostic Steps:

  1. Check rate limit status

    # -i prints response headers, which carry the usage counters
    curl -i "https://graph.facebook.com/v19.0/me?access_token={token}&debug=all"
    # Check the x-app-usage and x-business-use-case-usage headers
  2. Validate token

    curl "https://graph.facebook.com/debug_token?input_token={token}&access_token={app_token}"
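
Graph API failures come back as a JSON object with an `error` member holding `code` and `message`. The error table in this section can be applied to such a body mechanically. A minimal sketch; the action strings are copied from the table, and the function name and fallback messages are assumptions.

```python
import json
from typing import Optional

# Actions taken from the Meta API Errors table, keyed by error code
META_ERROR_ACTIONS = {
    4: "Application rate limit: reduce request frequency, implement backoff",
    17: "User rate limit: wait for reset",
    100: "Invalid parameter: check API parameters",
    190: "Access token expired: refresh token",
    200: "Permission denied: user must grant permission",
    278: "Temporary issue: retry after delay",
}


def meta_error_action(response_body: str) -> Optional[str]:
    """Return the suggested action for an error body, or None if the body is not an error."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return "Unparseable response: check raw body"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    code = error.get("code")
    return META_ERROR_ACTIONS.get(code, f"Unlisted error code {code}: see error message")
```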

Stripe Errors​

Error | Meaning | Solution
card_declined | Payment failed | Customer updates payment
expired_card | Card expired | Customer updates card
invalid_request_error | Bad API call | Check request parameters
authentication_error | Bad API key | Verify Stripe key
rate_limit_error | Too many requests | Implement backoff

Performance Issues​

High Latency​

Symptoms:

  • Slow page loads
  • API response times > 500ms
  • User complaints about speed

Diagnostic Steps:

  1. Check which service is slow

    • Review CloudWatch latency metrics by service
    • Check individual service health endpoints
  2. Check database performance

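    -- Average time per statement (requires the pg_stat_statements extension).
    -- On PostgreSQL 12 and earlier the column is total_time, not total_exec_time.
    SELECT query, calls, total_exec_time / calls AS avg_time_ms
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC LIMIT 20;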
  3. Check external API latency

    • Review third-party status pages
    • Check timeout configurations

Common Causes & Fixes:

Cause | Indicator | Fix
Database | High DB latency | Add indexes, optimize queries
External API | High Meta/Stripe latency | Add caching, increase timeouts
Application | CPU-bound | Scale horizontally, optimize code
Network | High latency between services | Check VPC configuration

Memory Issues​

Symptoms:

  • OOM (Out of Memory) kills
  • Service restarts
  • Increasing memory usage over time

Diagnostic Steps:

  1. Check container memory usage

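    # ECS (the time range, period, and statistic are required arguments)
    aws cloudwatch get-metric-statistics --namespace AWS/ECS \
    --metric-name MemoryUtilization \
    --dimensions Name=ClusterName,Value=production Name=ServiceName,Value={service} \
    --start-time {start_time} --end-time {end_time} --period 300 --statistics Average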
  2. Check for memory leaks

    • Monitor memory over time
    • Check for growing object counts
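
For Python services, growing allocations can be located with the standard library's tracemalloc module by comparing two snapshots. A minimal sketch, assuming the suspect code path can be invoked as a callable; the helper name and report format are illustrative.

```python
import tracemalloc
from typing import Callable, List


def top_memory_growth(workload: Callable[[], None], limit: int = 5) -> List[str]:
    """Run workload() and list the source lines whose live allocations grew the most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Keep only lines that hold more memory after the workload than before
    growth = [d for d in after.compare_to(before, "lineno") if d.size_diff > 0]
    growth.sort(key=lambda d: d.size_diff, reverse=True)
    return [
        f"{d.traceback[0].filename}:{d.traceback[0].lineno} "
        f"+{d.size_diff} bytes in {d.count_diff:+d} blocks"
        for d in growth[:limit]
    ]
```

A line that keeps appearing at the top across repeated runs of the same request is a likely leak. Tracing adds overhead, so this is better suited to staging than to production.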

Common Fixes:

  • Increase container memory limits
  • Fix memory leaks in code
  • Implement proper connection cleanup
  • Add request timeouts

Celery Queue Backlog​

Symptoms:

  • Tasks queuing up
  • Delayed processing
  • Flower shows large queue

Diagnostic Steps:

  1. Check queue depth

    celery -A complyai inspect active
    celery -A complyai inspect reserved
  2. Check worker status

    celery -A complyai inspect ping
  3. Check for failed tasks

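    # Celery has no "inspect failed" subcommand. Filter Flower's task list
    # by state FAILURE, or search the worker logs for task errors.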

Common Fixes:

  • Scale up workers
  • Increase worker concurrency
  • Clear stuck tasks: celery -A complyai purge (warning: this discards every queued task, not only stuck ones)
  • Fix failing tasks blocking queue
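
The output of `inspect reserved` is easier to read when grouped by task name, which shows which task type dominates the backlog. A minimal sketch, assuming the input is the dictionary returned by Celery's Python API `app.control.inspect().reserved()` (worker name mapped to a list of task dicts, or None when no worker replies); the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def summarize_reserved(reserved: Optional[Dict[str, List[dict]]]) -> Dict[str, int]:
    """Count reserved tasks per task name across all workers."""
    counts: Counter = Counter()
    # inspect() returns None when no worker answers, so treat that as empty
    for tasks in (reserved or {}).values():
        for task in tasks:
            counts[task.get("name", "unknown")] += 1
    return dict(counts)
```

Reserved tasks are only those already prefetched by workers. Messages still waiting in the broker are not included in this count.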

Deployment Issues​

Deployment Failing​

Symptoms:

  • ECS deployment stuck
  • Health checks failing
  • Rollback triggered

Diagnostic Steps:

  1. Check deployment status

    aws ecs describe-services --cluster production --services {service}
  2. Check task failures

    aws ecs describe-tasks --cluster production --tasks {task_arn}
  3. Check container logs

    aws logs get-log-events --log-group-name /ecs/{service} \
    --log-stream-name {stream}

Common Causes:

Issue | Symptom | Fix
Health check failing | Tasks start then stop | Fix health endpoint, check port
Missing env var | Application crash on start | Add to task definition
Resource constraints | Task won't start | Increase CPU/memory
Image pull failed | ECR auth error | Refresh ECR credentials

Rollback Procedure​

  1. Identify last working task definition

    # Prints the latest registered revision; the last working one is an earlier revision
    aws ecs describe-task-definition --task-definition {service} \
    --query 'taskDefinition.revision'
  2. Deploy previous version

    aws ecs update-service --cluster production --service {service} \
    --task-definition {service}:{previous_revision}
  3. Monitor rollback

    aws ecs wait services-stable --cluster production --services {service}

Quick Troubleshooting Checklist​

General Investigation Order​

  1. Is it down for everyone or just one user?

    • Check status page
    • Try different user/account
  2. What changed recently?

    • Recent deployments
    • Configuration changes
    • External service issues
  3. Check the obvious first

    • Service health
    • Database connectivity
    • External dependencies
  4. Follow the data

    • Trace request through services
    • Check each layer (UI → API → DB → External)
  5. Check logs

    • Application logs
    • Error rates
    • Recent errors

Escalation Path​

Level | When | Who
L1 | First responder, common issues | On-call engineer
L2 | Complex issues, database | Senior engineer
L3 | Architecture issues, major incidents | Engineering lead
Executive | Customer-impacting, data breach | CTO/CEO


Last Updated: December 2024