Company data backup and restore
This guide covers how to export and import complete company datasets using the management commands. These tools are designed for:
- Creating full backups of company data
- Migrating companies between environments (dev → staging → production)
- Testing with production-like data in lower environments (potentially with anonymization)
Overview
The backup system consists of two management commands:
- `export_company_data` - Exports all data for a company to JSON files and uploads to S3
- `import_company_data` - Downloads and imports company data from S3 backups
Both commands work with S3 storage.
Exporting company data
The export command creates a complete snapshot of a company's data, including infrastructure, emissions, events, users, and configuration.
Basic export usage
```bash
python manage.py export_company_data --owner "Company Name"
```
Export command options
| Option | Description | Required | Default |
|---|---|---|---|
| `--owner` | Company name (must be exact match) | Yes | - |
| `--bucket-name` | S3 bucket name for upload | No | From `AWS_STORAGE_BUCKET_NAME` env var |
| `--access-key-id` | S3 access key | No | From `AWS_S3_ACCESS_KEY_ID` env var |
| `--secret-access-key` | S3 secret key | No | From `AWS_S3_SECRET_ACCESS_KEY` env var |
| `--endpoint-url` | S3 endpoint URL | No | From `AWS_S3_ENDPOINT_URL` env var |
| `--batch-size` | Records per batch for memory-efficient processing | No | 1000 |
S3 credentials
The export command requires S3 credentials, which can be provided via:
- Command-line arguments (highest priority)
- Environment variables:
  - `AWS_STORAGE_BUCKET_NAME`
  - `AWS_S3_ACCESS_KEY_ID`
  - `AWS_S3_SECRET_ACCESS_KEY`
  - `AWS_S3_ENDPOINT_URL`
- Django settings (lowest priority)
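For example, a value passed on the command line takes precedence over the corresponding environment variable (the bucket names below are placeholders):

```bash
# Illustrative only: bucket names are placeholders
export AWS_STORAGE_BUCKET_NAME="default-backup-bucket"

# --bucket-name overrides AWS_STORAGE_BUCKET_NAME for this run
python manage.py export_company_data \
    --owner "Company Name" \
    --bucket-name "one-off-backup-bucket"
```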
What gets exported
The export includes all related data for the company:
- Infrastructure: Sites, equipment, hierarchies, aerial images, pipeline systems
- Emissions: Data batches, data points, emission records, scene observations
- Events: Events, root causes, action plans, event associations
- Configuration: Notification settings, matching configurations, waffle switches
- Users: Company users, memberships, permissions (excluding passwords)
- Analytics: Analytics data snapshots
- Historical records: Complete audit trail from django-simple-history
Export output
Files are uploaded to S3 in the following structure:
```
s3://bucket-name/company-name/YYYY-MM-DD/
├── manifest.json                  # Export metadata
├── accounts_Company.json
├── accounts_User.json
├── infrastructure_Site.json
├── emissions_EmissionRecord.json
└── ... (one file per model)
```
Memory-efficient processing
The export uses streaming writes to handle large datasets without memory issues:
- Processes records in batches (default 1000)
- Writes directly to files without accumulating in memory
- Preserves `created_at` and `updated_at` timestamps
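As an illustration of the batching approach, a queryset can be streamed to a JSON file as shown below. This is a minimal sketch with hypothetical function names, not the command's actual code:

```python
from django.core import serializers


def _flush(fh, batch, first):
    # serializers.serialize returns a JSON array string; strip the enclosing
    # brackets so successive batches concatenate into one valid array.
    payload = serializers.serialize("json", batch)[1:-1].strip()
    if payload:
        if not first:
            fh.write(",")
        fh.write(payload)
        first = False
    return first


def stream_model_to_json(path, queryset, batch_size=1000):
    """Write a queryset to a JSON array file batch by batch, so only one
    batch of records is held in memory at a time."""
    with open(path, "w") as fh:
        fh.write("[")
        first, batch = True, []
        for obj in queryset.iterator(chunk_size=batch_size):
            batch.append(obj)
            if len(batch) >= batch_size:
                first = _flush(fh, batch, first)
                batch = []
        if batch:
            first = _flush(fh, batch, first)
        fh.write("]")
```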
Importing company data
The import command downloads backups from S3 and restores them to the database. This is useful for migrating data between environments or restoring from backups.
Basic import usage
```bash
python manage.py import_company_data \
    --owner "Company Name" \
    --backup-date "2026-01-23"
```
Import command options
| Option | Description | Required | Default |
|---|---|---|---|
| `--owner` | Company name from export | Yes | - |
| `--backup-date` | Date of backup (YYYY-MM-DD) | Yes | - |
| `--models` | Specific models to import (space-separated) | No | All models |
| `--dry-run` | Preview import without making changes | No | false |
| `--batch-size` | Records per batch | No | 1000 |
| `--disable-constraints` | Disable FK constraints during import (faster) | No | false |
| `--reindex-presets` | Rebuild geo filter indexes after import | No | false |
| `--disable-notifications` | Skip importing notification settings | No | false |
| `--src-data-bucket-name` | Source S3 bucket name | Yes | From `AWS_SRC_DATA_BUCKET_NAME` env var |
| `--src-data-access-key-id` | Source S3 access key | Yes | From `AWS_SRC_DATA_S3_ACCESS_KEY_ID` env var |
| `--src-data-secret-access-key` | Source S3 secret key | Yes | From `AWS_SRC_DATA_S3_SECRET_ACCESS_KEY` env var |
| `--src-endpoint-url` | Source S3 endpoint URL | No | From `AWS_SRC_DATA_ENDPOINT_URL` env var |
S3 credentials
The import command requires two sets of S3 credentials:
- Source credentials (`AWS_SRC_DATA_*`) - To download backup files from the source S3 location
- Target credentials (`AWS_*`) - For the target environment's S3 storage
This dual-credential setup allows importing data from a different S3 location (e.g., production backups) into another environment (e.g., staging) that has its own S3 storage.
Source credentials can be provided via:
Command-line arguments (highest priority):
- `--src-data-bucket-name`
- `--src-data-access-key-id`
- `--src-data-secret-access-key`
- `--src-endpoint-url`
Environment variables (required):
- `AWS_SRC_DATA_BUCKET_NAME`
- `AWS_SRC_DATA_S3_ACCESS_KEY_ID`
- `AWS_SRC_DATA_S3_SECRET_ACCESS_KEY`
- `AWS_SRC_DATA_ENDPOINT_URL`
Target credentials are read from environment variables or Django settings:
- `AWS_STORAGE_BUCKET_NAME`
- `AWS_S3_ACCESS_KEY_ID`
- `AWS_S3_SECRET_ACCESS_KEY`
- `AWS_S3_ENDPOINT_URL`
Import workflow
The import process follows these steps:
- Download - Downloads JSON files from S3 to temporary directory
- Validate - Checks manifest and verifies files exist
- Import - Restores data in dependency order (foreign keys respected)
- Timestamps - Restores original `created_at`/`updated_at` values
- Sequences - Resets PostgreSQL sequences for auto-increment fields
- Cleanup - Removes temporary files
Selective imports
You can import specific models using the --models option:
```bash
python manage.py import_company_data \
    --owner "Company Name" \
    --backup-date "2026-01-23" \
    --models infrastructure.Site infrastructure.Equipment
```
This is useful for:
- Importing only infrastructure without emissions data
- Updating specific datasets without touching others
- Testing imports of problematic models
Preserving notification settings
When migrating data between environments, you may want to hold off on importing notification settings so the target environment does not immediately start sending emails. Use the `--disable-notifications` flag:
```bash
python manage.py import_company_data \
    --owner "Company Name" \
    --backup-date "2026-01-23" \
    --disable-notifications
```
This skips importing `EmissionNotificationSettings`, so make sure to import that model later when needed via the `--models` argument.
Performance optimization
For large imports, these options can significantly improve performance:
```bash
python manage.py import_company_data \
    --owner "Company Name" \
    --backup-date "2026-01-23" \
    --disable-constraints \
    --batch-size 2000
```
⚠️ Warning: Disabling constraints can lead to inconsistent data if the import fails partway through. Only use in controlled environments.
Dry run mode
Always test imports with --dry-run first to preview what would be imported:
```bash
python manage.py import_company_data \
    --owner "Company Name" \
    --backup-date "2026-01-23" \
    --dry-run
```
This shows which files would be processed without making any database changes.
Complete import checklist
This checklist should generally be followed when performing a company import.
Pre-import tasks
⚠️ Critical: Complete ALL pre-import tasks before starting the import
Scale down scheduler tasks
```bash
# Set scheduler container task number to 0 to prevent scheduled tasks from running
```
Clear S3 bucket for CVX
- Delete existing plumes, aerial images, data_downloads in the target environment's S3 bucket
- The target db should be empty, so these files are orphans anyway
Increase container resources
- Increase the size (CPU/memory) of the long-running container, or whichever container the import will run on
- Import is memory-intensive and requires additional resources
Drop and recreate database
```bash
# On target environment
# Drop and recreate db or relevant schemas
```
Run migrations
```bash
python manage.py migrate
```
Set credentials
```bash
# Source credentials (where backup files are stored)
export AWS_SRC_DATA_BUCKET_NAME="production-backup-bucket"
export AWS_SRC_DATA_S3_ACCESS_KEY_ID="prod-access-key"
export AWS_SRC_DATA_S3_SECRET_ACCESS_KEY="prod-secret-key"

# Verify target credentials are set (for current environment's S3)
echo $AWS_STORAGE_BUCKET_NAME
echo $AWS_S3_ACCESS_KEY_ID
```
Run import
Execute the import in two phases:
Phase 1: Import database records
```bash
python manage.py import_company_data \
    --owner "{company_name}" \
    --backup-date "{YYYY-MM-DD}" \
    --disable-notifications \
    --reindex-presets
```
Phase 2: Copy S3 files
```bash
python manage.py import_company_data \
    --owner "{company_name}" \
    --backup-date "{YYYY-MM-DD}" \
    --mode copy-files
```
Note: The `copy-files` mode transfers actual files (images, documents) from source S3 to target S3. This can take considerable time for large datasets, but the time required is offset by queuing copy jobs on the dataimport container.
Post-import tasks
Complete these tasks immediately after import finishes:
Upload user guide
- Upload company-specific user guide documentation
- Update any environment-specific links or instructions
Delete provider-specific data (if applicable for target environment)
```bash
# Delete Bridger and GHGSat data if not needed in non-production environments
python manage.py shell
>>> from emissions.models import DataBatch, EmissionRecord
>>> from event_management.models import Event
>>> # Note: PlumeImage, DataPoint, Scene, and SiteNonDetect must also be imported
>>> # from their respective apps (module paths not shown here)
>>> batch_ids = DataBatch.objects.filter(data_provider__name__in=["Bridger", "GHGSat"]).values_list("pk", flat=True)
>>> Event.objects.filter(main_emission_record__data_point__data_batch_id__in=batch_ids).delete()
>>> EmissionRecord.objects.filter(data_point__data_batch_id__in=batch_ids).delete()
>>> PlumeImage.objects.filter(data_batch_id__in=batch_ids).delete()
>>> DataPoint.objects.filter(data_batch_id__in=batch_ids).delete()
>>> DataBatch.objects.filter(pk__in=batch_ids).delete()
>>> Scene.objects.filter(data_provider__name__in=["Bridger", "GHGSat"]).delete()
>>> SiteNonDetect.objects.filter(data_provider__name__in=["Bridger", "GHGSat"]).delete()
```
Create SSO setup for Aerscape
- Configure SSO settings in Django Admin
- Add Aerscape email domain to SSO configuration
Resize container to normal size
- Return container resources to standard allocation
- Remove the temporary resource increase from pre-import step
Post-import tasks (to complete later)
These tasks should be completed after verifying the import was successful:
Enable notifications (when ready to start sending emails)
```bash
python manage.py import_company_data \
    --owner "{company_name}" \
    --backup-date "{YYYY-MM-DD}" \
    --models emissions.EmissionNotificationSettings
```
Important: Only import notification settings after verifying the environment is properly configured to send emails. This prevents accidentally spamming users during testing.
Restore scheduler tasks
```bash
# Set Celery worker replica count back to 1
# This re-enables automated background tasks
```
Import verification checklist
After completing the import, verify the following:
- [ ] Users can log in successfully
- [ ] Infrastructure (sites, equipment) displays correctly on maps
- [ ] Emission records are visible and properly matched
- [ ] Events show correct status and associations
- [ ] File uploads (aerial images, documents) are accessible
- [ ] Geo filters and presets work correctly
- [ ] No duplicate records or data inconsistencies
- [ ] Celery tasks remain disabled until verification complete
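A quick spot check from the Django shell can back up several of the items above. The module paths below are assumptions based on the app labels used elsewhere in this guide; adjust them to your project:

```bash
python manage.py shell
>>> # Module paths assumed from app labels in this guide
>>> from accounts.models import Company, User
>>> from infrastructure.models import Site
>>> from emissions.models import EmissionRecord
>>> Company.objects.count()
>>> User.objects.count()
>>> Site.objects.count()
>>> EmissionRecord.objects.count()
```

Compare the counts against the source environment (or the export's per-model files) to catch partially imported models early.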
Common workflows
Full company migration (production → staging)
Export from source environment:
```bash
# On production database
python manage.py export_company_data --owner "Acme Corp"
```
Import to target environment (requires both credential sets):
```bash
# On staging database - set source credentials to point to production S3
export AWS_SRC_DATA_BUCKET_NAME="production-backup-bucket"
export AWS_SRC_DATA_S3_ACCESS_KEY_ID="prod-access-key"
export AWS_SRC_DATA_S3_SECRET_ACCESS_KEY="prod-secret-key"

# Target credentials (AWS_STORAGE_BUCKET_NAME, etc.) should already be set for the staging environment
python manage.py import_company_data \
    --owner "Acme Corp" \
    --backup-date "2026-01-23" \
    --disable-notifications
```
Infrastructure-only import
```bash
# Import only sites and equipment
python manage.py import_company_data \
    --owner "Acme Corp" \
    --backup-date "2026-01-23" \
    --models infrastructure.Site infrastructure.Equipment \
    --dry-run
```
Technical details
Memory efficiency
Both commands use streaming I/O to handle millions of records with minimal memory:
- Export: Writes records to JSON files in batches without accumulating
- Import: Deserializes records one at a time using Django's streaming API
- Typical memory usage: ~100-200MB regardless of dataset size
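For illustration, batched consumption of Django's deserializer generator looks roughly like the sketch below (hypothetical function names, not the command's actual implementation):

```python
from django.core import serializers
from django.db import transaction


def import_json_file(path, batch_size=1000):
    """Save deserialized objects in fixed-size batches, each inside a transaction."""
    with open(path) as fh:
        batch = []
        # serializers.deserialize yields DeserializedObject instances one at a time
        for deserialized in serializers.deserialize("json", fh):
            batch.append(deserialized)
            if len(batch) >= batch_size:
                _save_batch(batch)
                batch = []
        if batch:
            _save_batch(batch)


def _save_batch(batch):
    with transaction.atomic():
        for obj in batch:
            obj.save()  # DeserializedObject.save() keeps the original primary key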
Timestamp preservation
Django's auto_now and auto_now_add fields are normally excluded from serialization. These commands preserve them:
- Export: Manually extracts timestamps after serialization
- Import: Uses raw SQL with CASE statements to restore timestamps in batches
This ensures imported records maintain their original creation/modification times.
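A rough sketch of the batched CASE-based update is shown below; the table and column names are placeholders, and this is not the command's actual SQL:

```python
from django.db import connection


def restore_created_at(table, rows, batch_size=1000):
    """rows: list of (pk, created_at_datetime) pairs captured at export time."""
    with connection.cursor() as cursor:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            cases = " ".join(["WHEN %s THEN %s"] * len(batch))
            case_params = [value for pk, ts in batch for value in (pk, ts)]
            pk_params = [pk for pk, _ in batch]
            placeholders = ", ".join(["%s"] * len(pk_params))
            # One UPDATE per batch instead of one per row
            sql = (
                f"UPDATE {table} SET created_at = CASE id {cases} END "
                f"WHERE id IN ({placeholders})"
            )
            cursor.execute(sql, case_params + pk_params)
```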
Sequence reset
PostgreSQL sequences for auto-increment primary keys are automatically reset after import to prevent ID conflicts:
- Only resets sequences for models that were imported
- Sets sequence to MAX(id) value to avoid collisions
- Handles both regular AutoField and historical model history_id fields
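This is the same kind of reset Django's loaddata performs; a minimal sketch using the public sequence-reset helpers (not necessarily how the import command is written):

```python
from django.core.management.color import no_style
from django.db import connection


def reset_sequences(models):
    """Run the sequence-reset SQL Django generates for the given model classes,
    which sets each sequence based on the current MAX of its primary key."""
    statements = connection.ops.sequence_reset_sql(no_style(), models)
    with connection.cursor() as cursor:
        for sql in statements:
            cursor.execute(sql)


# Usage (model list is an example):
# from accounts.models import Company
# reset_sequences([Company])
```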
Import order
Models are imported in a specific order to respect foreign key dependencies. The order is defined in the `MODEL_IMPORT_ORDER` constant in the import command.
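To illustrate the idea, parent models appear before the models that reference them. This is a hypothetical excerpt, not the actual constant, and the foreign key relationships noted are assumptions; see the import command for the real list:

```python
# Hypothetical ordering: parents before children so foreign keys resolve on insert
MODEL_IMPORT_ORDER = [
    "accounts.Company",          # no dependencies on other exported models (assumed)
    "accounts.User",             # assumed FK to Company
    "infrastructure.Site",       # assumed FK to Company
    "emissions.DataBatch",       # assumed FK to Company
    "emissions.EmissionRecord",  # assumed FK to DataBatch / Site
]
```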
Signal handling
Django signals are temporarily disabled during import to prevent:
- Automatic creation of related objects (e.g., notification settings)
- Triggering workflows or notifications
- Side effects from model save() methods
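One common pattern for this is a context manager that disconnects a receiver for the duration of the import. This is a generic sketch; `create_notification_settings` and its sender are hypothetical names, and the command may use a different mechanism:

```python
from contextlib import contextmanager
from django.db.models.signals import post_save


@contextmanager
def signal_disconnected(signal, receiver, sender):
    """Temporarily disconnect a signal receiver, reconnecting it afterwards."""
    signal.disconnect(receiver, sender=sender)
    try:
        yield
    finally:
        signal.connect(receiver, sender=sender)


# Usage (receiver and sender names are hypothetical):
# with signal_disconnected(post_save, create_notification_settings, sender=Company):
#     run_import()
```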
Troubleshooting
Export process killed
If exports fail with "Killed" message, the process likely ran out of memory. Reduce batch size or increase container size:
```bash
python manage.py export_company_data \
    --owner "Company Name" \
    --batch-size 500
```
Import foreign key errors
If imports fail with foreign key constraint violations, try:
```bash
python manage.py import_company_data \
    --owner "Company Name" \
    --backup-date "2026-01-23" \
    --disable-constraints
```
Sequence conflicts after import
If you see "duplicate key value violates unique constraint" errors after import, sequences weren't reset properly. Manually reset:
```bash
python manage.py shell
>>> from django.db import connection
>>> cursor = connection.cursor()
>>> cursor.execute("SELECT setval('accounts_company_id_seq', (SELECT MAX(id) FROM accounts_company))")
```
S3 connection issues
Verify credentials are set correctly.
For export:
```bash
echo $AWS_STORAGE_BUCKET_NAME
echo $AWS_S3_ACCESS_KEY_ID
echo $AWS_S3_SECRET_ACCESS_KEY
```
For import (requires BOTH sets):
```bash
# Source credentials (to read backup files)
echo $AWS_SRC_DATA_BUCKET_NAME
echo $AWS_SRC_DATA_S3_ACCESS_KEY_ID
echo $AWS_SRC_DATA_S3_SECRET_ACCESS_KEY

# Target credentials (for current environment's S3)
echo $AWS_STORAGE_BUCKET_NAME
echo $AWS_S3_ACCESS_KEY_ID
echo $AWS_S3_SECRET_ACCESS_KEY
```
Source credentials can also be provided via command-line arguments.
Best practices
- Always use --dry-run first when importing to new environments
- Export regularly - Automate exports with cron or scheduled tasks (see the example after this list)
- Test restores periodically - Verify backups can be restored successfully
- Use --disable-notifications when importing to non-production environments
- Monitor S3 storage - Old backups can accumulate; implement retention policies
- Document backup dates - Keep a log of when backups were created and why
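For example, a weekly export could be scheduled with a cron entry like the following (the project path and company name are placeholders):

```bash
# Run a full export for one company every Sunday at 02:00
0 2 * * 0 cd /srv/app && python manage.py export_company_data --owner "Acme Corp"
```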
Security considerations
- User passwords are excluded from exports
- Sensitive fields like API keys should be reviewed before cross-environment imports
- S3 buckets should use appropriate IAM policies to restrict access
- Consider encrypting S3 buckets for sensitive company data