Case Study
Building a Scalable Image Transformation Pipeline with AWS ECS and CloudFront
Solution
How Xavor replaced legacy image processing with a modern, cost-efficient, and high-performance solution for enterprise e-commerce
Industry
E-Commerce / Retail
Core Technology
Cloud Architecture, DevOps, Infrastructure
Overview
At Xavor Corporation, we work with enterprise clients who rely on us for scalable, efficient digital solutions to meet the growing demands of today’s data-heavy world. For many of our clients, delivering images across a variety of channels in real time, whether for e-commerce, media, or large-scale content platforms, requires a robust, flexible, and high-performance image management solution.
This case study details how Xavor designed and implemented a custom image transformation pipeline using AWS ECS Fargate, CloudFront, and Sharp to replace an aging Aprimo CDN system for a major enterprise client. The legacy system had become a significant bottleneck: it offered only basic caching, supported no dynamic transformations, and relied on the unsustainable practice of pre-generating every possible image variant.
The new solution delivers transformative results: 70% faster response times, a 96% cache hit rate, and significant cost savings compared to enterprise alternatives like Cloudinary and Scene7. More importantly, the architecture is designed to scale seamlessly with the client’s growing business demands.
Client Background
Our client is a leading enterprise in the retail and e-commerce sector, operating multiple brands and serving millions of customers across North America. Their digital presence spans web platforms, mobile applications, and partner integrations, all of which require consistent, high-quality product imagery.
With a product catalog containing hundreds of thousands of SKUs, each requiring multiple image variants (different sizes, formats, quality levels), the client was processing and serving billions of image requests monthly. The scale of operations demanded an infrastructure that could not only handle current load but also accommodate future growth without proportional cost increases.
Business Requirements
- Support for multiple image formats including WebP, AVIF, and JPEG for optimal browser compatibility
- Dynamic resizing and quality adjustment based on device and network conditions
- Sub-second response times for optimal user experience
- Cost-effective scaling to handle traffic spikes during promotional events
- Seamless integration with existing DAM (Digital Asset Management) systems
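The first two requirements imply choosing an output format per client. A minimal sketch of one common approach, content negotiation on the HTTP `Accept` header, is shown below in TypeScript. The function name, the AVIF-then-WebP-then-JPEG preference order, and the example header strings are assumptions for illustration, not details from the project.

```typescript
// Output formats the pipeline serves. The preference order (AVIF, then WebP,
// then JPEG) is an assumption for this sketch.
type NegotiatedFormat = "avif" | "webp" | "jpeg";

// Picks the best format a client advertises in its Accept header.
// Simplification: q-values are ignored, and wildcards such as image/* do not
// count as AVIF or WebP support, so such clients fall back to JPEG.
function negotiateFormat(acceptHeader: string | undefined): NegotiatedFormat {
  const accepted = new Set(
    (acceptHeader ?? "")
      .split(",")
      .map((part) => part.split(";")[0].trim().toLowerCase()),
  );
  if (accepted.has("image/avif")) return "avif";
  if (accepted.has("image/webp")) return "webp";
  return "jpeg"; // universally supported fallback
}

// Example Accept headers (illustrative values).
const modernBrowser = "image/avif,image/webp,image/apng,image/*,*/*;q=0.8";
const webpOnlyBrowser = "image/webp,image/png,image/*;q=0.8,*/*;q=0.5";
console.log(negotiateFormat(modernBrowser)); // avif
console.log(negotiateFormat(webpOnlyBrowser)); // webp
console.log(negotiateFormat(undefined)); // jpeg
```

Whatever selects the format, the chosen format has to be part of the cache key (for example, folded into the normalized path). Forwarding the raw `Accept` header into the key would fragment the cache, because browsers send many different `Accept` strings.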
The Challenge
When you’re serving millions of product images across multiple channels, every millisecond counts. The client’s existing Aprimo CDN infrastructure was becoming a critical bottleneck that threatened both user experience and operational efficiency.
Legacy System Limitations
The existing Aprimo CDN provided basic caching capabilities but lacked the flexibility required for modern e-commerce operations. The system had several critical limitations:
No Dynamic Transformation: Every image variant had to be pre-generated and stored, creating massive storage overhead and slow time-to-market for new image requirements.
Poor Performance at Scale: Cache hit rates were stuck at 78%, meaning more than one in five requests (22%) required origin processing, leading to inconsistent response times.
Limited Format Support: Modern formats like WebP and AVIF weren’t supported, resulting in larger file sizes and slower page loads for end users.
Inflexible Architecture: Adding new transformation capabilities required significant vendor coordination and often took months to implement.
Evaluating Third-Party Solutions
Before engaging Xavor, the client evaluated several enterprise image management solutions, including industry leaders Cloudinary and Adobe Scene7. While these platforms offered robust feature sets and would have solved the technical challenges, the cost analysis raised significant concerns.
The analysis made it clear: a custom-built solution leveraging AWS services would provide not only significant cost savings but also complete control over the architecture, enabling future enhancements without vendor dependencies.
The Solution
Xavor designed and implemented a “cache-first, process-on-miss” architecture that leverages AWS-native services for optimal performance, scalability, and cost efficiency. The solution was built with a focus on simplicity, maintainability, and infrastructure-as-code principles.
Architecture Overview
The architecture follows a layered approach where requests are handled at the edge whenever possible, with origin processing only occurring when necessary. This design minimizes latency for end users while keeping compute costs under control.
When a request arrives at the system, it follows this optimized path:
1. CloudFront Edge: The request arrives at the nearest CloudFront edge location, providing global low-latency access.
2. CloudFront Function: A lightweight function normalizes the request path, converting query parameters into a standardized cache key format.
3. Cache Check: CloudFront checks its edge cache for the normalized path. If found (96% of requests), the image is returned immediately.
4. Origin Group (S3): On a cache miss, the request goes to S3, where previously processed images are stored.
5. Failover to ALB: If S3 returns a 404 (a new image variant), CloudFront automatically fails over to the Application Load Balancer.
6. ECS Fargate Processing: The ALB routes the request to ECS Fargate tasks running Sharp for image transformation.
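The cache-first, process-on-miss path can be modeled in a few lines. This is a hypothetical in-memory simulation, not the production code: `Map` objects stand in for the edge cache and the processed-image bucket, a string stands in for the Sharp output, and it assumes the ECS service writes each new variant back to S3 so that later misses are served from S3.

```typescript
// Which tier answered a request, mirroring the request path above.
type ServedBy = "edge-cache" | "s3" | "ecs";

interface PipelineResult {
  body: string;
  servedBy: ServedBy;
}

// In-memory stand-ins for the three tiers. Hypothetical model: the real tiers
// are CloudFront's edge cache, the processed-image S3 bucket, and ECS Fargate.
class PipelineModel {
  readonly edgeCache = new Map<string, string>();
  readonly processedBucket = new Map<string, string>();
  transformCount = 0;

  // Stand-in for the Sharp transformation running on ECS Fargate.
  private transform(key: string): string {
    this.transformCount += 1;
    return `rendered:${key}`;
  }

  handle(key: string): PipelineResult {
    // Cache check: serve from the edge whenever possible.
    const cached = this.edgeCache.get(key);
    if (cached !== undefined) return { body: cached, servedBy: "edge-cache" };

    // Origin group, primary member: the processed-image bucket in S3.
    let body = this.processedBucket.get(key);
    let servedBy: ServedBy = "s3";

    // S3 miss: the origin group fails over to the ALB and ECS.
    if (body === undefined) {
      body = this.transform(key);
      // Assumption: the ECS service writes each new variant back to S3.
      this.processedBucket.set(key, body);
      servedBy = "ecs";
    }

    // CloudFront caches the origin response at the edge for later requests.
    this.edgeCache.set(key, body);
    return { body, servedBy };
  }
}

const pipeline = new PipelineModel();
const key = "/f_webp,w_800/images/shoe.jpg"; // hypothetical normalized key
console.log(pipeline.handle(key).servedBy); // ecs
console.log(pipeline.handle(key).servedBy); // edge-cache
pipeline.edgeCache.clear(); // simulate eviction at the edge
console.log(pipeline.handle(key).servedBy); // s3
```

The model makes the cost profile visible: a variant is transformed once, and every later request is answered by the edge or, after eviction, by S3.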
Key Technical Decisions
Several critical architectural decisions shaped the final solution. Each choice was made after careful evaluation of alternatives, considering factors such as performance, cost, operational complexity, and future scalability.
Why ECS Fargate Over Lambda?
AWS Lambda was initially considered for its simplicity and automatic scaling. However, ECS Fargate proved to be the better choice for this workload for several reasons:
- No Cold Starts: With a minimum of 3 tasks always running, there's no initialization delay. Lambda cold starts can add 500ms-2s to image processing time.
- Better Resource Control: Precise CPU and memory allocation (1024 CPU units, i.e. 1 vCPU, and 2048 MB of memory per task) gives each task dedicated resources, keeping performance consistent under concurrent load.
- Connection Pooling: Long-running containers maintain persistent connections to S3 and DynamoDB, eliminating connection establishment overhead.
- Cost Efficiency at Scale: For sustained workloads, Fargate's per-second billing for reserved task capacity is more economical than Lambda's model, which charges per request plus per GB-second of execution time.
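The cost argument depends on the shape of the traffic, which a simple model makes concrete. This is a sketch under stated assumptions: the prices are placeholders rather than real AWS rates, the request volumes and the 0.25-second average duration are hypothetical, and the model ignores data transfer, the ALB, and any Lambda free tier.

```typescript
// Placeholder price inputs: substitute current regional prices from the AWS
// pricing pages. None of the numbers in this sketch are actual AWS prices.
interface LambdaPrices {
  perMillionRequests: number;
  perGbSecond: number;
}

interface FargatePrices {
  perVcpuHour: number;
  perGbHour: number;
}

// Lambda charges per request plus per GB-second of execution time,
// so its monthly cost grows linearly with request volume.
function lambdaMonthlyCost(
  requests: number,
  avgDurationSeconds: number,
  memoryGb: number,
  prices: LambdaPrices,
): number {
  const requestCharge = (requests / 1_000_000) * prices.perMillionRequests;
  const computeCharge = requests * avgDurationSeconds * memoryGb * prices.perGbSecond;
  return requestCharge + computeCharge;
}

// Fargate charges for the vCPU and memory each running task reserves,
// so its cost tracks task-hours rather than individual requests.
function fargateMonthlyCost(
  taskCount: number,
  hours: number,
  vcpu: number,
  memoryGb: number,
  prices: FargatePrices,
): number {
  return taskCount * hours * (vcpu * prices.perVcpuHour + memoryGb * prices.perGbHour);
}

// Illustrative comparison for a 1 vCPU / 2 GB sizing with made-up prices:
// three always-on tasks (about 730 hours per month) versus Lambda at two volumes.
const lambdaPrices: LambdaPrices = { perMillionRequests: 0.2, perGbSecond: 0.00001 };
const fargatePrices: FargatePrices = { perVcpuHour: 0.03, perGbHour: 0.004 };
for (const monthlyRequests of [2_000_000, 20_000_000]) {
  const lambdaCost = lambdaMonthlyCost(monthlyRequests, 0.25, 2, lambdaPrices);
  const fleetCost = fargateMonthlyCost(3, 730, 1, 2, fargatePrices);
  console.log(monthlyRequests, lambdaCost.toFixed(2), fleetCost.toFixed(2));
}
```

With these placeholder numbers the three-task fleet costs the same every month, Lambda is cheaper at 2 million requests, and the fleet is cheaper at 20 million, which is the sustained-workload effect described above.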
ARM64 and Graviton Processors
The decision to use AWS Graviton (ARM64) processors was driven by both performance and cost considerations:
- 40% Better Price-Performance: Graviton processors offer significantly better value compared to equivalent x86 instances.
- Excellent Sharp Compatibility: The Sharp image processing library runs well on ARM64, and prebuilt native binaries are published for Linux on ARM64.
- Energy Efficiency: Lower power consumption aligns with sustainability goals while reducing operational costs.
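The sizing and processor choices above come down to a few task definition fields. This sketch uses ECS `RegisterTaskDefinition` field names; the family, container name, image URI, and port are hypothetical placeholders, and the size check encodes Fargate's CPU/memory pairings for 0.25 to 4 vCPU only.

```typescript
// Fields that pin the task to Graviton and to the sizing quoted above.
// family, container name, image URI, and port are hypothetical placeholders.
const taskDefinition = {
  family: "image-transformer",
  requiresCompatibilities: ["FARGATE"],
  networkMode: "awsvpc", // required for Fargate tasks
  cpu: "1024", // 1 vCPU
  memory: "2048", // MiB
  runtimePlatform: {
    cpuArchitecture: "ARM64", // run on Graviton
    operatingSystemFamily: "LINUX",
  },
  containerDefinitions: [
    {
      name: "image-transformer",
      image: "123456789012.dkr.ecr.us-east-1.amazonaws.com/image-transformer:latest",
      essential: true,
      portMappings: [{ containerPort: 3000, protocol: "tcp" }],
    },
  ],
};

// Fargate only accepts certain CPU/memory pairs. This check covers the
// 256 to 4096 CPU-unit sizes (memory in MiB); larger sizes are omitted.
function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  if (cpu === 256) return [512, 1024, 2048].includes(memoryMiB);
  const ranges: Record<number, [number, number] | undefined> = {
    512: [1024, 4096],
    1024: [2048, 8192],
    2048: [4096, 16384],
    4096: [8192, 30720],
  };
  const range = ranges[cpu];
  if (range === undefined) return false;
  // Above 256 CPU units, memory steps in 1 GiB increments within the range.
  return memoryMiB >= range[0] && memoryMiB <= range[1] && memoryMiB % 1024 === 0;
}

console.log(isValidFargateSize(Number(taskDefinition.cpu), Number(taskDefinition.memory))); // true
```

The container image itself must be built for `linux/arm64` (for example with `docker buildx build --platform linux/arm64`); installing Sharp during that build should pull its prebuilt ARM64 binary.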
CloudFront Origin Groups
Origin Groups provide native failover capabilities within CloudFront, eliminating the need for custom routing logic:
- Automatic Failover: When S3 returns a 404, CloudFront automatically routes to the ALB without any custom code.
- Reduced Complexity: No Lambda@Edge functions needed for routing decisions, reducing both cost and latency.
- Built-in Reliability: AWS-managed failover is more reliable than custom implementations and requires no maintenance.
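An origin group is declared in the distribution configuration rather than in code. The sketch below mirrors one `OriginGroups` item using CloudFront API field names; the IDs are hypothetical, and listing 403 alongside 404 is our assumption, based on S3 returning 403 for missing keys when the reader lacks `s3:ListBucket` permission.

```typescript
// One entry of DistributionConfig.OriginGroups.Items, written with the
// CloudFront API's field names. The group and origin IDs are hypothetical.
const imageOriginGroup = {
  Id: "processed-images-then-transformer",
  FailoverCriteria: {
    // 404: the variant has not been rendered yet. 403 is an assumption: S3
    // returns 403 rather than 404 for a missing key when the caller lacks
    // s3:ListBucket, which is common behind origin access control.
    StatusCodes: { Quantity: 2, Items: [403, 404] },
  },
  Members: {
    // Order matters: the first member is primary, the second is the failover.
    Quantity: 2,
    Items: [{ OriginId: "s3-processed-images" }, { OriginId: "alb-image-transformer" }],
  },
};

// Besides connection failures and timeouts, CloudFront fails over only on the
// listed status codes (and only for GET, HEAD, and OPTIONS requests).
function triggersFailover(status: number): boolean {
  return imageOriginGroup.FailoverCriteria.StatusCodes.Items.includes(status);
}

// The API expects each Quantity to match the length of its Items array.
function quantitiesConsistent(): boolean {
  const { StatusCodes } = imageOriginGroup.FailoverCriteria;
  return (
    StatusCodes.Quantity === StatusCodes.Items.length &&
    imageOriginGroup.Members.Quantity === imageOriginGroup.Members.Items.length
  );
}

console.log(triggersFailover(404), triggersFailover(500), quantitiesConsistent());
```

In Terraform, the same structure is an `origin_group` block inside `aws_cloudfront_distribution`, with a `failover_criteria` block and two `member` blocks; the cache behavior's target origin is then the origin group's ID.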
Aggressive Path Normalization
Cache key normalization proved to be the single most impactful optimization for cache hit rates. The CloudFront Function transforms varied request formats into a consistent cache key.
This normalization ensures that requests with different parameter ordering or formatting still hit the same cache entry, dramatically improving hit rates.
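A minimal sketch of such a normalization, written as a TypeScript function over the `uri` and `querystring` shapes that CloudFront Functions expose. The parameter names and aliases (`w`/`width`, `f`/`format`, and so on), the `f_webp,w_800` key layout, and the quality clamp are hypothetical; the project's actual key format is not documented here.

```typescript
// Shape of event.request.querystring in CloudFront Functions: each key maps
// to an object with a `value` (multi-value parameters are ignored here).
type QueryString = Record<string, { value: string }>;

// Hypothetical parameter aliases: every accepted spelling maps to one
// canonical single-letter key.
const PARAM_ALIASES: Record<string, string | undefined> = {
  w: "w", width: "w",
  h: "h", height: "h",
  q: "q", quality: "q",
  f: "f", fmt: "f", format: "f",
};

const SUPPORTED_FORMATS = new Set(["avif", "webp", "jpeg"]);

// Builds a canonical cache key such as "/f_webp,w_800/images/shoe.jpg".
function normalizeImageKey(uri: string, query: QueryString): string {
  const params = new Map<string, string>();
  for (const rawName of Object.keys(query)) {
    const canonical = PARAM_ALIASES[rawName.toLowerCase()];
    if (canonical === undefined) continue; // drop unknown parameters (e.g. tracking tags)
    let value = query[rawName].value.trim().toLowerCase();
    if (canonical === "f") {
      if (value === "jpg") value = "jpeg";
      if (!SUPPORTED_FORMATS.has(value)) continue;
    } else {
      const n = parseInt(value, 10);
      if (isNaN(n) || n <= 0) continue;
      value = String(canonical === "q" ? Math.min(n, 100) : n); // clamp quality to 1-100
    }
    params.set(canonical, value);
  }
  // Sorting makes parameter order irrelevant; collapsing slashes merges
  // "/images//shoe.jpg" and "/images/shoe.jpg". Path case is preserved
  // because S3 keys are case-sensitive.
  const segment = Array.from(params.keys())
    .sort()
    .map((k) => `${k}_${params.get(k)}`)
    .join(",");
  const cleanPath = uri.replace(/\/{2,}/g, "/");
  return segment ? `/${segment}${cleanPath}` : cleanPath;
}

console.log(
  normalizeImageKey("/images//shoe.jpg", {
    width: { value: "800" },
    Format: { value: "WEBP" },
    utm_source: { value: "email" },
  }),
); // /f_webp,w_800/images/shoe.jpg
```

Inside the function's `handler(event)`, the result would be assigned to `event.request.uri` and the query string cleared, so a cache policy that ignores query strings caches each variant once. CloudFront Functions run plain JavaScript, so the type annotations would need to be stripped and the code checked against the runtime's supported syntax.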
Implementation
Infrastructure as Code
The entire infrastructure is managed with Terraform, ensuring reproducibility, version control, and easy environment management. The codebase is organized into reusable modules:
- ECS Module: Cluster configuration, service definitions, task definitions, and auto-scaling policies
- CloudFront Module: Distribution settings, origin configurations, cache behaviors, and function associations
- Networking Module: VPC, subnets, security groups, and ALB configuration
- Monitoring Module: CloudWatch dashboards, alarms, and log groups
Container Architecture
The image processing container is built on Node.js Alpine with Sharp and targets the ARM64 architecture.
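On the container side, the service has to turn a normalized path back into transformation parameters before calling Sharp. This sketch assumes a hypothetical key layout such as `/f_webp,q_75,w_800/images/shoe.jpg`, defaults of quality 80 and JPEG, and rejection of anything it does not recognize; the Sharp calls appear only in a comment so the example runs without the library installed.

```typescript
type TargetFormat = "avif" | "webp" | "jpeg";

interface TransformSpec {
  sourceKey: string; // S3 key of the original image
  width?: number;
  height?: number;
  quality: number;
  format: TargetFormat;
}

const TARGET_FORMATS: readonly string[] = ["avif", "webp", "jpeg"];

function isTargetFormat(value: string): value is TargetFormat {
  return TARGET_FORMATS.includes(value);
}

// Parses a hypothetical normalized key such as
// "/f_webp,q_75,w_800/images/shoe.jpg". Defaults (quality 80, JPEG) are assumptions.
function parseTransformKey(key: string): TransformSpec | null {
  const match = /^\/([a-z]_[a-z0-9]+(?:,[a-z]_[a-z0-9]+)*)\/(.+)$/.exec(key);
  if (match === null) return null; // not a transform request
  const spec: TransformSpec = { sourceKey: match[2], quality: 80, format: "jpeg" };
  for (const pair of match[1].split(",")) {
    const [param, value] = pair.split("_");
    if (param === "f") {
      if (!isTargetFormat(value)) return null; // unsupported format
      spec.format = value;
      continue;
    }
    if (!/^\d+$/.test(value)) return null; // numeric parameters only
    const n = parseInt(value, 10);
    if (param === "w") spec.width = n;
    else if (param === "h") spec.height = n;
    else if (param === "q") spec.quality = n;
    else return null; // unknown parameter: reject rather than guess
  }
  return spec;
}

// In the service, the spec would drive Sharp roughly like this:
//   sharp(input)
//     .resize({ width: spec.width, height: spec.height, fit: "inside", withoutEnlargement: true })
//     .toFormat(spec.format, { quality: spec.quality })
console.log(parseTransformKey("/f_webp,q_75,w_800/images/shoe.jpg"));
```

Rejecting unknown parameters, rather than ignoring them, keeps arbitrary paths from creating new variants on the ECS tier.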
Performance Optimizations
Multiple optimization techniques were implemented to achieve sub-500ms P99 latency for cache misses:
Container Prewarming: Minimum 3 tasks always running ensures immediate availability without cold start delays.
Connection Pooling: Persistent connections to S3 and DynamoDB eliminate connection establishment overhead.
Parallel Operations: Source image fetch and configuration retrieval happen simultaneously.
Sharp Streaming: For large images, streaming processing reduces memory pressure and improves throughput.
Intelligent Caching: 30-day TTLs with cache key normalization achieve 96%+ hit rates.
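The parallel-operations and caching items can be sketched together. In this hypothetical example, `fetchSourceImage` and `fetchTransformConfig` are stand-ins that simulate S3 and DynamoDB latency with timers, and `Promise.all` starts both reads before awaiting either. The `Cache-Control` value expresses the 30-day TTL; CloudFront applies such a header within the minimum and maximum TTLs of the distribution's cache policy.

```typescript
// Resolves after `ms` milliseconds; used to simulate I/O latency.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical stand-ins for the two independent reads: in the service these
// would be an S3 GetObject for the source image and a DynamoDB lookup for
// transformation settings.
async function fetchSourceImage(_sourceKey: string): Promise<Uint8Array> {
  await sleep(40);
  return new Uint8Array([0xff, 0xd8, 0xff]); // JPEG signature bytes as a placeholder
}

async function fetchTransformConfig(_profile: string): Promise<{ maxWidth: number }> {
  await sleep(40);
  return { maxWidth: 2048 };
}

// Starting both reads before awaiting either makes the wait roughly
// max(a, b) instead of a + b.
async function loadInputs(sourceKey: string, profile: string) {
  const [image, config] = await Promise.all([
    fetchSourceImage(sourceKey),
    fetchTransformConfig(profile),
  ]);
  return { image, config };
}

// The 30-day TTL as a Cache-Control value:
// 30 days x 24 h x 3600 s = 2,592,000 seconds.
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;
const CACHE_CONTROL = `public, max-age=${THIRTY_DAYS_SECONDS}`;

const started = Date.now();
loadInputs("images/shoe.jpg", "default").then(({ image, config }) => {
  console.log(image.length, config.maxWidth, `${Date.now() - started} ms`, CACHE_CONTROL);
});
```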
Results
The implementation delivered significant improvements across all key performance metrics, exceeding initial project goals and providing a foundation for future scalability.
Performance Metrics
Cost Optimization
The solution achieved significant cost savings through multiple optimization strategies, reducing the total cost of ownership compared to both the legacy system and enterprise alternatives:
- Fargate Pay-as-you-go: Billing is per second for the vCPU and memory that running tasks reserve, and auto-scaling back to the three-task minimum keeps idle capacity small during low-traffic periods.
- Fargate Spot: Batch operations and prewarming tasks use Spot capacity for up to 70% savings on compute costs.
- Intelligent Caching: 96% cache hit rate means only 4% of requests require origin processing, dramatically reducing compute costs.
- Graviton Processors: 40% better price-performance compared to x86 instances.
- S3 Lifecycle Policies: Automatic tiering moves infrequently accessed images to cheaper storage classes.
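The leverage of the hit rate is easy to quantify: origin traffic is proportional to the miss rate, so moving from the legacy 78% hit rate (22% misses) to 96% (4% misses) cuts origin requests by a factor of 0.22 / 0.04 = 5.5 at the same traffic. A short sketch of that arithmetic, with a hypothetical round volume of 1 billion requests per month:

```typescript
// Share of requests that miss the edge cache and reach the origin group.
function originShare(hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return 1 - hitRate;
}

// Factor by which origin traffic shrinks when the hit rate improves,
// for the same total request volume (a new hit rate of 1 gives Infinity).
function originLoadReduction(oldHitRate: number, newHitRate: number): number {
  return originShare(oldHitRate) / originShare(newHitRate);
}

// Hypothetical volume: 1 billion requests per month.
const monthlyRequests = 1_000_000_000;
const legacyOriginRequests = Math.round(monthlyRequests * originShare(0.78));
const newOriginRequests = Math.round(monthlyRequests * originShare(0.96));
console.log(legacyOriginRequests, newOriginRequests);
console.log(originLoadReduction(0.78, 0.96).toFixed(1));
```

Not all of those origin requests reach ECS: in the request path described earlier, S3 serves previously processed variants, and only new variants are transformed.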
Business Impact
Beyond the technical metrics, the new infrastructure has delivered measurable business benefits:
- Improved Page Load Times: Faster image delivery contributed to a 15% improvement in overall page load times.
- Higher Conversion Rates: The client reported a 3% increase in conversion rates, partially attributed to improved user experience.
- Reduced Operational Overhead: Serverless architecture eliminated the need for dedicated infrastructure management resources.
- Faster Time-to-Market: New image variants can be served on-demand without pre-generation, accelerating content updates.
Lessons Learned
The project provided valuable insights that will inform future implementations:
1. Cache Key Normalization is Critical
Aggressive path normalization was the single biggest factor in achieving high cache hit rates. Investing time in designing an effective normalization strategy pays dividends in reduced origin load and improved response times.
2. Origin Groups Simplify Architecture
Native CloudFront failover through Origin Groups eliminated the need for Lambda@Edge routing logic, reducing both complexity and cost. AWS-managed solutions should be preferred over custom implementations when available.
3. ARM64/Graviton is Production-Ready
The Sharp library and Node.js ecosystem run reliably on ARM64. The 40% price-performance improvement makes Graviton the default choice for new workloads where compatibility is confirmed.
4. Infrastructure as Code is Essential
Managing the entire stack with Terraform enabled rapid iteration during development and ensures reproducibility across environments. Version-controlled infrastructure makes auditing and disaster recovery straightforward.
Conclusion
The combination of ECS Fargate, CloudFront, and Sharp delivered a high-performance, cost-effective image transformation pipeline that exceeds the capabilities of enterprise solutions at a fraction of the cost. The architecture is built for scale, with auto-scaling capabilities that can handle traffic spikes during promotional events without manual intervention.
More importantly, the solution provides complete control over the infrastructure, enabling future enhancements without vendor dependencies. New transformation capabilities, format support, or optimization techniques can be implemented on the client’s timeline, not a vendor’s roadmap.
This project demonstrates Xavor’s ability to design and deliver enterprise-grade solutions that leverage AWS services effectively, providing our clients with competitive advantages through superior technology.
Is inefficient cloud infrastructure proving costly?
Xavor builds efficient, reliable cloud environments. Partner with us if you need professional help with AWS, Azure, Google Cloud, and more.