Object Storage Migration Case Study
Background
Hundreds of thousands of image files are stored on 52Poké Wiki. Previously, we used AWS S3 to store all these images, AWS Lambda for image processing (generating thumbnails and converting to WebP format), and two layers of cache (a container on Linode and Cloudflare CDN) for image distribution.
Our monthly AWS bill grew steadily with the number of images and the amount of traffic. Even with two layers of cache, more than 280GB of traffic per month still fell through to AWS S3, incurring about $20 in data transfer charges every month.
Fortunately, there are many S3-compatible services available at much more competitive prices. We considered switching to Cloudflare R2, Backblaze B2, or Linode Object Storage.
Choice
Currently, we have 150GB of files. Storing them costs $2.25 per month on Cloudflare R2, $0.90 on Backblaze B2, and $5 on Linode Object Storage. Egress is effectively free on all three services: Backblaze B2 doesn’t charge for egress up to 3x the monthly data stored, Linode draws from a transfer pool that comfortably covers our usage, and Cloudflare never charges for egress bandwidth. If our data grows to 250GB, the cost will be $3.75 on Cloudflare, $1.50 on Backblaze, and still $5 on Linode.
However, since the main services of 52Poké are located in Linode’s Tokyo datacenter, there is significant latency when proxying from the other side of the Pacific Ocean. Linode recently announced Object Storage availability in their Osaka datacenter, and with a latency of only 8ms between Tokyo and Osaka, this is an optimal choice performance-wise. We can also eliminate one cache layer (20GB) backed by block storage, saving $2 monthly. Ultimately, by migrating from AWS S3 to Linode Object Storage, we can save $17 every month.
Steps
Creating a Bucket on Linode and Setting Up the Bucket Policy
We can create an Object Storage bucket in Akamai Cloud Manager with the bucket name media.52poke.com.
Previously, we configured a bucket policy on S3 to allow access solely from the IP addresses of 52Poké’s cloud servers. While Akamai Cloud Manager doesn’t support configuring bucket policies directly, they can be set up using s3cmd as per Linode’s documentation.
The bucket policy looks like the sketch below (the IP addresses are placeholders for 52Poké’s actual server addresses):
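```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": ["*"] },
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::media.52poke.com/*"],
      "Condition": {
        "IpAddress": { "aws:SourceIp": ["203.0.113.10/32", "203.0.113.20/32"] }
      }
    }
  ]
}
```

The policy can then be applied with s3cmd:

```bash
s3cmd setpolicy bucket_policy.json s3://media.52poke.com
```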
Full Migration
To minimize disruption to 52Poké Wiki during this migration, we’ll use a two-step approach. First, we’ll copy all files from the AWS S3 bucket while users can still upload and update files. Later, we’ll disable uploads and run an incremental migration of anything that changed in the meantime, which should take considerably less time.
A Kubernetes Job will be used to run rclone sync to copy all files to the new Linode Object Storage bucket. For the Linode Object Storage access key ID and secret access key, we’ll use SOPS to encrypt them and store them in the Git repository; they can then only be decrypted with the private key held in the 52Poké Kubernetes cluster and deployed automatically.
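For instance, the Secret manifest holding the keys can be encrypted in place before being committed (a sketch; the file name is hypothetical and assumes a .sops.yaml config pointing at the cluster’s public key):

```bash
# Encrypts the manifest's values in place; only the cluster's private key can decrypt them.
sops --encrypt --in-place linode-object-storage-secret.yaml
```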
The Job manifest looks roughly like this (a sketch; the image tag, remote names, bucket names, and secret name are assumptions):
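```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: media-migration
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: rclone
          image: rclone/rclone:1.62
          # "aws" and "linode" are rclone remotes configured via RCLONE_CONFIG_* variables.
          args: ["sync", "aws:media-52poke", "linode:media.52poke.com", "--checksum"]
          envFrom:
            # Secret decrypted from the SOPS-encrypted manifest in the Git repository.
            - secretRef:
                name: linode-object-storage
```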
Disabling Uploads
In the second step, we must disable uploads on MediaWiki to prevent data loss. This can be configured via LocalSettings.php.
```php
$wgEnableUploads = false;
```
Incremental Migration
We’ll now run the Kubernetes job again to sync updates from the old AWS S3 bucket to the new one. Additionally, we’ll manually check that recently uploaded files from 52Poké Wiki have been synced to the new bucket.
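A one-way rclone check can help verify that recent uploads made it across (a sketch, using the same hypothetical remote names as above):

```bash
# Lists files that exist on the source but are missing or differ on the destination.
rclone check aws:media-52poke linode:media.52poke.com --one-way
```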
Updating Malasada
Malasada is a serverless function running on AWS Lambda that generates image thumbnails and converts images to the WebP format to conserve bandwidth.
Previously, we used an IAM policy to grant the Serverless function access to the S3 bucket. Now, with the shift from AWS S3, we’ll need to manually provide the access key ID, secret access key, and the endpoint of Linode Object Storage to the Serverless function using environment variables.
The client construction then looks roughly like this (a sketch using the AWS SDK for JavaScript; the environment variable names are assumptions):
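```javascript
const { S3 } = require('@aws-sdk/client-s3');

const s3 = new S3({
  // Point the client at Linode Object Storage instead of AWS.
  endpoint: process.env.S3_ENDPOINT, // e.g. http://jp-osa-1.linodeobjects.com
  region: 'jp-osa-1',
  credentials: {
    accessKeyId: process.env.S3_ACCESS_KEY_ID,
    secretAccessKey: process.env.S3_SECRET_ACCESS_KEY,
  },
});
```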
Configuring Nginx
The Nginx configuration for the domains media.52poke.com, s0.52poke.wiki, and s1.52poke.wiki must be updated to replace the AWS S3 domain with that of Linode Object Storage. We also removed the persistent volume claim backed by block storage from the Nginx deployment.
```nginx
proxy_pass http://media.52poke.com.jp-osa-1.linodeobjects.com$request_uri;
```
After updating the Nginx configuration, we noticed all requests were failing due to permission issues. However, there was no issue when requesting the upstream Object Storage URL with curl from inside the Nginx container.
Upon further investigation, we determined that Linode Object Storage was rejecting requests carrying “X-Forwarded-For” and “X-Real-IP” headers, because the client IPs in those headers didn’t match the source addresses allowed by the bucket policy. Stripping these headers from the proxied request resolved the issue.
```nginx
proxy_set_header X-Forwarded-For '';
proxy_set_header X-Real-IP '';
```
Configuring MediaWiki
We now need to adjust the AWS S3 MediaWiki extension to use Linode Object Storage instead of AWS S3. This adjustment can be quickly made via the endpoint parameter. Afterward, we can re-enable file uploads on MediaWiki.
```php
$wgFileBackends['s3']['endpoint'] = 'http://jp-osa-1.linodeobjects.com';
$wgEnableUploads = true; // re-enable uploads once the endpoint is switched over
```
Configuring WordPress
The 52Poké homepage uses the WP Offload Media Lite plugin to upload files to S3. While the plugin’s admin page doesn’t offer a way to change the object storage endpoint, our investigation of the source code revealed hooks that allow modifying the S3 connection configuration. This can be adjusted in wp-config.php.
A sketch of that filter (the endpoint and region values are what we’d expect for the Osaka datacenter):
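```php
add_filter( 'as3cf_aws_s3_client_args', function ( $args ) {
    // Send S3 API calls to Linode Object Storage instead of AWS.
    $args['endpoint'] = 'http://jp-osa-1.linodeobjects.com';
    $args['region']   = 'jp-osa-1';
    return $args;
} );
```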
Backup
As the migration nears completion, it’s important to remember that we have a cron job set up to back up files from AWS S3 to Backblaze B2. We’ll need to update the backup source to Linode Object Storage, which can be achieved through the environment variables of rclone.
The relevant part of the CronJob spec looks roughly like this (a sketch; the remote name, endpoint, and secret names are assumptions):
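```yaml
env:
  # Defines an rclone remote named "src" pointing at Linode Object Storage.
  - name: RCLONE_CONFIG_SRC_TYPE
    value: s3
  - name: RCLONE_CONFIG_SRC_ENDPOINT
    value: jp-osa-1.linodeobjects.com
  - name: RCLONE_CONFIG_SRC_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: linode-object-storage
        key: access-key-id
  - name: RCLONE_CONFIG_SRC_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: linode-object-storage
        key: secret-access-key
```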
Conclusion
52Poké Wiki successfully transitioned from AWS S3 to Linode Object Storage for image storage. Despite the availability of lower-cost options like Cloudflare R2 and Backblaze B2, Linode Object Storage was the optimal choice given its low latency to the site’s primary services in Linode’s Tokyo datacenter. With this change, we anticipate monthly savings of $17 and improved performance. Through a careful migration process, including addressing challenges with permissions and configuration, the transition was executed smoothly. This move underscores the importance of regularly evaluating and optimizing infrastructure choices to ensure both cost-effectiveness and performance for online communities.