Workload migration from on-prem to AWS - MAP Assessment

A Mid-size K-12 Ed-tech SaaS

Client

A mid-size K-12 ed-tech SaaS provider with more than 20 years of experience builds cloud learning platforms used nationwide. Its platform serves heavy concurrent traffic, which makes this case relevant to any content-heavy SaaS business managing large volumes of unstructured data in an aging co-location facility.

Challenge

  • Move a stateful PHP learning platform from co-location to AWS.
  • Migrate a 15-node web tier and a 4-node MariaDB Galera cluster.
  • Move 340 TB of unstructured data (~914 M small objects) with no defined migration path.
  • A 15 Mbps link made online data transfer impractical.
  • The PHP app relied on local sessions, a filesystem cache, and temp files, which complicated horizontal scaling.
  • The database required a near-zero-downtime migration while serving live users.
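
The bandwidth constraint is easy to quantify. The sketch below assumes decimal units and a fully saturated link with no protocol overhead (both idealizations, not figures from the assessment) and estimates the pure online transfer time plus the average object size implied by the numbers above.

```python
"""Back-of-the-envelope check: can 340 TB move online over a 15 Mbps link?

Idealized assumptions (not figures from the assessment): decimal units
(1 TB = 10**12 bytes, 1 Mbps = 10**6 bit/s) and a fully used link with no
protocol overhead, so real transfers would take longer.
"""

SECONDS_PER_DAY = 86_400


def online_transfer_days(data_tb: float, link_mbps: float,
                         utilization: float = 1.0) -> float:
    """Days needed to push `data_tb` terabytes through a `link_mbps` link."""
    bits = data_tb * 10**12 * 8
    effective_bps = link_mbps * 10**6 * utilization
    return bits / effective_bps / SECONDS_PER_DAY


def average_object_kb(data_tb: float, object_count: float) -> float:
    """Mean object size in kilobytes (10**3 bytes)."""
    return data_tb * 10**12 / object_count / 10**3


if __name__ == "__main__":
    days = online_transfer_days(340, 15)
    print(f"Ideal online transfer: {days:,.0f} days (~{days / 365:.2f} years)")
    print(f"Average object size: ~{average_object_kb(340, 914e6):,.0f} KB")
```

Under these ideal assumptions the link needs roughly 2,100 days (about 5.75 years), so "more than a year" is a conservative bound, and real throughput would be lower still. The ~372 KB average object size is also a reminder that per-object overhead, not just raw bytes, shapes the transfer plan.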

Key Results

  • Cut the 340 TB data transfer from more than a year to ~5–6 weeks using a hybrid Snowball Edge + DataSync approach.
  • Reduced cloud storage costs by ~62% with a lifecycle transition to Glacier Instant Retrieval, paying back the one-time investment in ~6 months.
  • Enabled a near-zero-downtime database migration with AWS DMS (CDC) and a cross-region read replica, achieving an RPO of seconds and an RTO of minutes.
  • Saved ~20–25% in app infrastructure costs by running a fixed 15-instance EC2 fleet instead of Auto Scaling, avoiding ~$800–$1,100/month in additional S3 request fees.
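
The hybrid approach works only if the link can keep up once the bulk data has been seeded. A minimal sketch, assuming the whole 15 Mbps link is available to DataSync and using hypothetical backlog and daily-change figures (the assessment does not publish a change rate), computes the daily delta budget and the catch-up time after the Snowball Edge import.

```python
"""How much changed data can DataSync carry per day after the Snowball seed?

Assumptions (illustrative): the full 15 Mbps link is available to DataSync,
decimal units, and no protocol overhead. `backlog_gb` and `daily_change_gb`
are hypothetical inputs; the assessment does not state the change rate.
"""

SECONDS_PER_DAY = 86_400


def daily_link_capacity_gb(link_mbps: float, utilization: float = 1.0) -> float:
    """Gigabytes (10**9 bytes) the link can move in one day."""
    return link_mbps * 10**6 * utilization * SECONDS_PER_DAY / 8 / 10**9


def days_to_catch_up(backlog_gb: float, daily_change_gb: float,
                     link_mbps: float, utilization: float = 1.0) -> float:
    """Days for delta sync to drain `backlog_gb` of changes that accumulated
    while the devices were in transit, given ongoing daily churn.

    Returns infinity when new changes arrive faster than the link ships them.
    """
    net_gb_per_day = daily_link_capacity_gb(link_mbps, utilization) - daily_change_gb
    if net_gb_per_day <= 0:
        return float("inf")
    return backlog_gb / net_gb_per_day


if __name__ == "__main__":
    print(f"Daily delta budget: {daily_link_capacity_gb(15):.0f} GB/day")
    # Hypothetical example: 2,000 GB backlog, 50 GB/day of new changes.
    print(f"Catch-up time: {days_to_catch_up(2_000, 50, 15):.1f} days")
```

With these assumptions the link carries about 162 GB per day, and a hypothetical 2,000 GB backlog with 50 GB/day of new changes drains in roughly 18 days. If daily churn ever exceeds the link's capacity, the function returns infinity, signalling that delta sync alone cannot converge.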

Solution

  • Structured Engagement: Delivered a MAP Assessment with a migration roadmap, architecture design, and TCO analysis across four workstreams.
  • Infrastructure Discovery: Documented the co-location stack (15-node web tier, 4-node Galera cluster, load balancers, network), baseline OpEx, and stateful app characteristics to inform the target architecture.
  • Storage Migration Plan: Designed a hybrid “Seed and Sync” path using Snowball Edge + DataSync to migrate ~340 TB (~914 M objects), with validation layers and rollback procedures.
  • Database Migration: Used AWS DMS full load + CDC to migrate to Amazon RDS for MariaDB (Multi-AZ + cross-region read replica) with near-zero downtime, plus Reserved Instance guidance to reduce cost.
  • App Architecture Evaluation: Compared a fixed 15-instance EC2 fleet against Auto Scaling with externalized sessions and cache, and recommended the fixed fleet for cost efficiency given the workload patterns.
  • S3 Strategy & Lifecycle: Phased Intelligent-Tiering in Year 1 and Glacier Instant Retrieval from Year 2 onward (~62% storage cost reduction), with strong security controls (encryption, bucket policies, MFA Delete).
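
The phased storage strategy maps naturally onto a single S3 Lifecycle rule. The sketch below builds that rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. Applying it to the whole bucket, transitioning at day 0, modeling "Year 2" as 365 days after object creation, and the bucket name are all illustrative assumptions, not details from the assessment.

```python
"""Sketch of a phased S3 lifecycle: Intelligent-Tiering, then Glacier IR.

Assumptions (hypothetical): one rule for the whole bucket, objects enter
Intelligent-Tiering immediately, "Year 2" = 365 days after creation, and
the bucket name below is a placeholder.
"""


def phased_lifecycle_configuration(glacier_ir_after_days: int = 365) -> dict:
    """Build a LifecycleConfiguration dict in the shape boto3 expects."""
    return {
        "Rules": [
            {
                "ID": "intelligent-tiering-then-glacier-ir",
                "Filter": {"Prefix": ""},  # empty prefix matches every object
                "Status": "Enabled",
                "Transitions": [
                    # Year 1: S3 moves objects between access tiers automatically.
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
                    # Year 2 onward: archive tier that still serves reads in ms.
                    {"Days": glacier_ir_after_days, "StorageClass": "GLACIER_IR"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Push the rule to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    import json

    print(json.dumps(phased_lifecycle_configuration(), indent=2))
    # apply_lifecycle("example-learning-content-bucket", phased_lifecycle_configuration())
```

Lifecycle ages are counted per object from its creation in S3, so data seeded through Snowball Edge starts its clock at import time rather than at its original co-location timestamp. It may also be worth checking the bucket's minimum-object-size setting for transitions, since newer S3 defaults skip lifecycle transitions for objects under 128 KB and this corpus contains ~914 M small objects.
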
Technologies Used
  • AWS Snowball Edge Storage Optimized (Physical Data Transfer)
  • AWS DataSync (Enhanced Mode — Delta Sync)
  • Amazon S3 (Intelligent-Tiering, Glacier Instant Retrieval, Lifecycle Management)
  • AWS Database Migration Service (DMS) with Change Data Capture (CDC)
  • Amazon RDS for MariaDB (Multi-AZ, Cross-Region Read Replica)
  • Amazon EC2 (t3a.large) with Application Load Balancer (ALB)
  • Amazon CloudWatch (Monitoring, Transfer Validation, DB Insights)
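
For the database workstream, the DMS side of "Full Load + CDC" can be sketched in two parts: a check that the MariaDB source writes row-based binary logs with full row images (two settings DMS needs for CDC from MySQL-compatible sources), and a replication task created with `MigrationType="full-load-and-cdc"`. The schema name, ARNs, and task identifier are hypothetical placeholders, and the check covers only those two settings, not every DMS prerequisite.

```python
"""Sketch of the AWS DMS pieces behind "Full Load + CDC".

Assumptions (hypothetical): the schema name, ARNs, and task identifier are
placeholders, and only two binlog settings are checked, not every DMS
source prerequisite.
"""
import json

# Binlog settings DMS needs on a MySQL-compatible source for CDC.
REQUIRED_BINLOG_SETTINGS = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def missing_cdc_prereqs(server_vars: dict) -> list:
    """Return required settings whose current values differ (case-insensitive)."""
    return [
        name
        for name, expected in REQUIRED_BINLOG_SETTINGS.items()
        if str(server_vars.get(name, "")).upper() != expected
    ]


def table_mappings(schema: str) -> str:
    """DMS selection rule (JSON string) that includes every table in `schema`."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all-tables",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    })


def create_full_load_cdc_task(source_arn: str, target_arn: str,
                              instance_arn: str, schema: str) -> dict:
    """Create the replication task (requires boto3 and AWS credentials)."""
    import boto3  # lazy import so the helpers above work offline

    return boto3.client("dms").create_replication_task(
        ReplicationTaskIdentifier="mariadb-to-rds-full-load-cdc",
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",
        TableMappings=table_mappings(schema),
    )
```

After the full load completes, the same task keeps applying changes from the binary log, which is what holds the source and the RDS target in step until cutover. On a multi-writer Galera cluster, the node DMS reads from likely also needs `log_slave_updates` enabled so that writes replicated from other nodes reach its binary log; confirm this against the DMS source-endpoint documentation.
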
Summary

An ed-tech company needed to move 340 TB of unstructured data and a stateful PHP platform from an aging co-location facility to AWS, despite a 15 Mbps link that made online transfer impractical. The MAP Assessment defined a hybrid Snowball Edge + DataSync “Seed and Sync” approach that cut the data migration from an infeasible 1+ year to ~5–6 weeks. It also set out phased S3 tiering for ~62% lower ongoing storage costs and used AWS DMS CDC for a near-zero-downtime database migration with near-real-time RPO.
