Database Migration Tips

Best practices and expert advice for successful database migrations to Snowflake.

Planning

Always Analyze Before You Migrate

Never start a migration without understanding your source database. Run a schema analysis to identify:

- **Data types** that need conversion (ENUM, SET, custom types)
- **Tables without primary keys** (harder to sync incrementally)
- **Large tables** that need special handling
- **Stored procedures and triggers** that need rewriting
- **Foreign key relationships** for migration ordering

A 30-minute analysis can save weeks of troubleshooting.
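The checks above can be sketched as a small script. This is a hypothetical helper, not a real tool: the column dicts stand in for rows you would pull from MySQL's `information_schema`, and the function names are illustrative.

```python
# Hypothetical sketch: flag migration risks in schema metadata pulled from
# MySQL's information_schema. The dict shapes are assumptions for illustration.

def find_migration_risks(columns, tables_with_pk):
    """columns: list of {"table", "column", "type"} dicts.
    tables_with_pk: set of table names that have a primary key."""
    risks = {"needs_type_conversion": [], "no_primary_key": []}
    for col in columns:
        # ENUM and SET have no Snowflake equivalent and need conversion
        if col["type"].upper().startswith(("ENUM", "SET")):
            risks["needs_type_conversion"].append((col["table"], col["column"]))
    all_tables = {c["table"] for c in columns}
    # Tables without a primary key are hard to sync incrementally
    risks["no_primary_key"] = sorted(all_tables - tables_with_pk)
    return risks
```

Feeding this the output of a query against `information_schema.columns` gives you the conversion and sync problem list before any data moves.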

Security

Use a Read Replica for Analysis

Don't connect migration tools directly to your production database. Instead:

1. **Create a read replica** in your cloud provider (RDS, Cloud SQL, Azure Database)
2. **Use a read-only user** with minimal permissions
3. **Schedule analysis during low-traffic periods**
4. **Or export schema only**: `mysqldump --no-data mydb > schema.sql`

This protects production performance and reduces security exposure.
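The schema-only export in step 4 can be wrapped in a small script so it is repeatable and argument-safe. A sketch, assuming the `mysqldump` CLI is installed; the host and user names are placeholders:

```python
import subprocess

# Sketch: build the schema-only export command from step 4 as a subprocess
# argument list (no shell string interpolation).

def build_schema_dump_cmd(host, user, database):
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--no-data",             # schema only; no rows leave the server
        "--single-transaction",  # consistent read without locking tables
        database,
    ]

# Run against the replica, never production:
# with open("schema.sql", "w") as out:
#     subprocess.run(build_schema_dump_cmd("replica.internal", "readonly", "mydb"),
#                    stdout=out, check=True)
```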

Data Types

Plan for ENUM and SET Types

MySQL ENUM and SET types don't exist in Snowflake. Here's how to handle them:

**Option 1: VARCHAR with documentation**
```sql
-- Source: status ENUM('pending','active','closed')
-- Target:
status VARCHAR(50) COMMENT 'Values: pending, active, closed'
```

**Option 2: Lookup table** (documents valid values; note Snowflake declares but does not enforce foreign keys, so validate in your load process)
```sql
CREATE TABLE status_types (status VARCHAR(50) PRIMARY KEY);
INSERT INTO status_types VALUES ('pending'),('active'),('closed');
```

**Option 3: Validation query** (Snowflake does not support CHECK constraints on standard tables, so check values after loading)
```sql
-- Any rows returned indicate unexpected values
SELECT DISTINCT status FROM orders
WHERE status NOT IN ('pending','active','closed');
```
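Converting ENUM columns by hand is error-prone at scale, so the Option 1 rewrite can be automated. A minimal sketch, assuming simple single-line column definitions (real MySQL DDL has more edge cases):

```python
import re

# Sketch: rewrite a MySQL ENUM column definition as the VARCHAR + comment
# pattern from Option 1. Assumes a simple, single-line definition.

def enum_to_varchar(column_def):
    """"status ENUM('pending','active')" -> Snowflake-friendly DDL."""
    m = re.match(r"(\w+)\s+ENUM\((.+)\)", column_def, re.IGNORECASE)
    if not m:
        return column_def  # not an ENUM column; leave unchanged
    name, raw_values = m.groups()
    values = [v.strip().strip("'") for v in raw_values.split(",")]
    return f"{name} VARCHAR(50) COMMENT 'Values: {', '.join(values)}'"
```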

Performance

Batch Large Tables by Primary Key

Loading a 100GB table in a single query will likely fail or time out. Instead, batch by primary key:

```sql
-- Export in chunks
SELECT * FROM orders WHERE id BETWEEN 1 AND 1000000;
SELECT * FROM orders WHERE id BETWEEN 1000001 AND 2000000;
-- ... continue
```

**Best practices:**
- Use 1-10GB chunks depending on network speed
- Export to compressed CSV or Parquet
- Stage files in S3/GCS before loading to Snowflake
- Use Snowflake's COPY INTO with parallel file loading
- Track progress so you can resume on failure
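The chunking scheme above is easy to generate programmatically. A sketch; the chunk size is a tuning knob you adjust to your network and file-size targets, not a fixed rule:

```python
# Sketch: generate inclusive primary-key ranges so each export query stays
# bounded, then render them as the BETWEEN queries shown above.

def chunk_ranges(min_id, max_id, chunk_size):
    """Yield (start, end) inclusive ranges covering [min_id, max_id]."""
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield start, end
        start = end + 1

def chunk_queries(table, min_id, max_id, chunk_size):
    return [
        f"SELECT * FROM {table} WHERE id BETWEEN {lo} AND {hi};"
        for lo, hi in chunk_ranges(min_id, max_id, chunk_size)
    ]
```

Persisting the last completed range (a plain text file is enough) gives you the resume-on-failure behavior from the list above.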

Testing

Test in a Zero-Copy Clone

Snowflake's zero-copy cloning lets you test without extra storage costs:

```sql
-- Create instant clone of your migrated database
CREATE DATABASE mydb_test CLONE mydb_production;

-- Run all your tests
-- If something breaks, drop and re-clone
DROP DATABASE mydb_test;
```

**Use clones to:**
- Test application queries before cutover
- Validate data integrity
- Train users on Snowflake
- Run destructive tests safely
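If you re-clone often, generating the statements with a dated suffix avoids name collisions between test runs. A small sketch; the naming convention is an assumption, not a Snowflake requirement:

```python
from datetime import date

# Sketch: build the CREATE/DROP clone statements above with a dated test
# database name so repeated test cycles don't collide.

def clone_statements(source_db, suffix=None):
    suffix = suffix or date.today().strftime("%Y%m%d")
    test_db = f"{source_db}_test_{suffix}"
    return {
        "create": f"CREATE DATABASE {test_db} CLONE {source_db};",
        "drop": f"DROP DATABASE {test_db};",  # drop and re-clone any time tests break things
    }
```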

Validation

Validate Row Counts and Checksums

Never assume the migration succeeded. Always validate:

**1. Row counts** (quick check)
```sql
-- Source (MySQL)
SELECT 'orders' as tbl, COUNT(*) FROM orders
UNION ALL SELECT 'users', COUNT(*) FROM users;

-- Target (Snowflake)
SELECT 'orders' as tbl, COUNT(*) FROM orders
UNION ALL SELECT 'users', COUNT(*) FROM users;
```

**2. Aggregation checks** (data integrity)
```sql
SELECT SUM(amount), MIN(created_at), MAX(created_at)
FROM orders;
```

**3. Sample comparison** (spot check)
```sql
SELECT * FROM orders WHERE id IN (1, 1000, 50000, 99999);
```
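Check 1 becomes mechanical once both result sets are in hand. A sketch of the comparison step; the dicts stand in for the query results from each system:

```python
# Sketch: compare per-table row counts from source and target, reporting
# tables that differ or exist on only one side.

def compare_counts(source_counts, target_counts):
    """Return {table: {"source": n, "target": m}} for every mismatch."""
    mismatches = {}
    for table in set(source_counts) | set(target_counts):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src != tgt:
            mismatches[table] = {"source": src, "target": tgt}
    return mismatches
```

The same shape works for check 2: run the aggregation query on both sides and diff the `(SUM, MIN, MAX)` tuples per table.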

Performance

Use Clustering Keys Instead of Indexes

Snowflake doesn't use traditional indexes. For large tables (>1TB), use clustering keys:

```sql
-- Cluster by commonly filtered columns
ALTER TABLE orders CLUSTER BY (created_date, customer_id);
```

**When to use clustering:**
- Tables over 1TB
- Frequent filters on specific columns (dates, IDs)
- Range queries (BETWEEN, >, <)

**When NOT needed:**
- Small tables (< 1TB) - Snowflake's automatic micro-partitioning handles these
- Tables rarely queried
- Full table scans
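The two lists above amount to a simple decision rule. A rough helper encoding this article's guidance; the 1TB threshold is a rule of thumb here, not a Snowflake-enforced limit:

```python
# Rough decision helper: cluster only large tables with selective filters.

def should_cluster(size_tb, has_selective_filters, mostly_full_scans=False):
    if size_tb < 1 or mostly_full_scans:
        return False  # automatic micro-partitioning is usually enough
    return has_selective_filters
```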

Planning

Set Up Incremental Sync Before Cutover

Don't do a big-bang migration. Set up continuous sync first:

**1. Initial bulk load** - Load all historical data

**2. Incremental sync** - Keep Snowflake updated
```sql
-- Requires updated_at column on source tables
SELECT * FROM orders
WHERE updated_at > '2025-01-01 00:00:00';
```

**3. Validation period** - Run both systems in parallel

**4. Cutover** - A brief maintenance window (usually under 5 minutes) to:
- Stop writes to the source
- Run a final sync
- Switch application connection strings
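The incremental sync in step 2 is a watermark loop: query for rows changed since the last run, then advance the watermark past the newest row seen. A sketch; the row dicts stand in for actual query results, and the function names are illustrative:

```python
# Sketch of step 2: watermark-based incremental sync. Rows are dicts with
# an "updated_at" value; timestamps compare correctly as ISO-format strings.

def build_incremental_query(table, watermark):
    return (f"SELECT * FROM {table} "
            f"WHERE updated_at > '{watermark}' ORDER BY updated_at;")

def sync_batch(rows, watermark):
    """Return the changed rows and the advanced watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    if changed:
        watermark = max(r["updated_at"] for r in changed)
    return changed, watermark
```

Persist the watermark between runs (a metadata table works well) so the sync resumes cleanly after a failure instead of re-copying rows.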

Put These Tips Into Practice

Start with a free schema analysis and get personalized migration recommendations.
