Transferring Data between AWS ElastiCache clusters on isolated networks (without using redis-cli and dump import)

Hariharan Anantharaman
4 min read · Oct 4, 2023

Problem Statement

How would you transfer data between different clusters when

  • The two clusters do not share the same network
  • The networks in which the clusters reside are not connected (and cannot be connected)
  • The cloud infrastructure team does not allow Redis dumps to be uploaded to S3, so we cannot import from a dump.
Problem statement depicting the data-movement challenges

Context and Background

In our digital transformation program, development and lower-environment testing happen in the partner companies' networks. The application (Docker images) is then shipped to the customer-secured environment, where the UAT users test it. Developers add new entries every sprint and change them frequently. Assume that a release happens every month and every sprint is two weeks.
To give some background, below are the constraints. Please note that we use AWS-managed Redis (AWS ElastiCache).

  • AWS ElastiCache is, by design, not discoverable outside its VPC. Exposing it via a proxy on a VM is a security vulnerability (and is rightly flagged by AWS).
  • The lower environments and the UAT and higher environments are in different cloud networks managed by different teams. Application teams have only limited access and cannot change networking rules.
  • The destination network (UAT and higher) does not allow outbound internet connections.
  • We cannot use Redis import from dumps, as the cloud security team in the customer's network does not allow external content to be imported into S3. For the Redis clusters within the customer's network, we were able to export from one cluster to an S3 bucket and import into another.

The customer being a bank, high security restrictions are expected, and getting exceptions usually takes a long cycle. Any request or recommendation that introduces a security risk is not approved.

As application development progressed, within a couple of releases we realised that transferring data between the two clusters is a challenge. The sections below describe how we achieved a balance that ensures:

  • Developer productivity is high
  • Deployment time is low

The diagram below represents our deployment flow. Due to the network restrictions, options like RIOT (Redis Input/Output Tools) did not meet our needs.

Initial approach — Hail the scripts

As with any solution designer who started their journey before the container world, we started with shell scripts. Below is the process:

  • Developers do CRUD operations on Redis entries using redis-cli or the APIs we have written for the purpose.
  • During every release, SREs/developers create a Redis export as a CSV file (a sketch of what such an export step could look like follows this list).
  • The deployment team in the customer network receives the CSV as an email.
  • The deployment team copies the CSV file to an EC2 instance in the cluster.
  • Redis commands are executed to update the Redis cluster with the content of the CSV file.
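
For illustration, here is a minimal sketch of what such an export step could look like, using the Jedis client to dump one hash into a flat hashKey,field,value CSV. The host, hash name, and column layout are assumptions made for the example, not our actual scripts.

```java
import redis.clients.jedis.Jedis;

import java.io.PrintWriter;
import java.util.Map;

public class HashCsvExporter {
    public static void main(String[] args) throws Exception {
        String host = args[0];     // ElastiCache endpoint of the source cluster
        String hashKey = args[1];  // name of the Redis hash to export
        String outFile = args[2];  // destination CSV file

        try (Jedis jedis = new Jedis(host, 6379);
             PrintWriter out = new PrintWriter(outFile)) {
            // Dump every field/value pair of the hash as one CSV row.
            Map<String, String> entries = jedis.hgetAll(hashKey);
            for (Map.Entry<String, String> entry : entries.entrySet()) {
                // No escaping here -- values containing commas or line breaks
                // are exactly what caused the formatting pain described below.
                out.printf("%s,%s,%s%n", hashKey, entry.getKey(), entry.getValue());
            }
        }
    }
}
```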

Pitfalls with the scripts approach

As you can see from the above, the process is a bit complicated and has a lot of human touch points. Below are a few of the challenges we faced:

  • Data formatting. The contents of our cache can have special values (e.g. line breaks). When we created the CSV export and re-imported it via redis-cli, there were a lot of issues in the imported content. To mitigate this, after creating the export we had to format the special characters accordingly. This was a time-consuming task, and with each release the time it took increased due to the growing data size and the dependency on the developers (a small illustration of the line-break problem follows this list).
  • With the team being very large (8 squads, each with 6 developers), who would take ownership was a big question. One squad had to fully learn the correct data format required by the other squads. The additional review step consumed a lot of time, with the associated overhead of coordinating the entire sequence.
  • The deployment team was not comfortable executing ad-hoc shell scripts as part of the deployment. If there were any issues running the scripts, the developers had to be called in again.
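
To make the formatting pitfall concrete, the tiny self-contained example below (the key name and value are made up) shows how a single cached value with an embedded line break turns into two rows for a line-oriented importer:

```java
public class CsvLineBreakPitfall {
    public static void main(String[] args) {
        // A cached value that legitimately contains a line break.
        String value = "Dear customer,\nYour request is approved.";

        // A plain export writes it as a single CSV row...
        String row = "notification:templates,approval," + value;

        // ...but a line-oriented import sees two rows instead of one.
        for (String line : row.split("\n")) {
            System.out.println("imported row -> " + line);
        }
    }
}
```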

What worked: K8S jobs

Developers in my team stepped up, understood the problem, and designed a solution using K8S jobs. Our data is stored in Redis hashes. Below are the steps:

  • For each hash, create a separate CSV file.
  • Developers make any change (create or update) to the keys in the file (not via the CLI or API).
  • A Java program was written to read the CSV file(s) and update Redis (a minimal sketch follows the solution diagram below).
  • The project was extended to support multiple files, and the list of files is given as a configurable entry.
  • A Dockerfile was created to execute the Java program on startup.
  • The entire repository, with the files and the Java code, was packaged as an image.
  • The image was deployed in K8S.
  • This image acted as a one-time job that was invoked during the deployment.
Solution where developers used K8S jobs creatively
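
Below is a minimal sketch of such a seeding program using Jedis. The environment-variable names, the hashKey,field,value row layout, and the error handling are illustrative assumptions rather than our exact implementation.

```java
import redis.clients.jedis.Jedis;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class RedisSeedJob {
    public static void main(String[] args) throws Exception {
        // Hypothetical configuration names, injected by the K8S Job spec.
        String redisHost = System.getenv("REDIS_HOST");   // ElastiCache endpoint
        String seedFiles = System.getenv("SEED_FILES");   // e.g. "users.csv,limits.csv"

        try (Jedis jedis = new Jedis(redisHost, 6379)) {
            for (String file : seedFiles.split(",")) {
                List<String> lines = Files.readAllLines(Paths.get(file.trim()));
                for (String line : lines) {
                    // Assumed row layout: hashKey,field,value.
                    // Limit the split to 3 so commas inside the value survive.
                    String[] parts = line.split(",", 3);
                    if (parts.length < 3) {
                        continue; // skip blank or malformed rows
                    }
                    jedis.hset(parts[0], parts[1], parts[2]); // create or update the field
                }
            }
        }
        // A normal exit lets the Kubernetes Job mark the run as completed.
    }
}
```

Packaged into the image and executed as a run-to-completion Kubernetes Job (restartPolicy: Never), a program like this runs once per deployment, and the Job status tells the deployment team whether the seeding succeeded.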

Benefits

  • Formatting issues are taken care of by the developers while they update the CSV with their entries.
  • Developers were freed from deployment support. Given that we have a lot of environments, from UAT to production, this was a huge productivity improvement.
  • No manual intervention by the customer's deployment team. The need to execute ad-hoc scripts has been eliminated.

In fact, all the challenges caused by scripting have been addressed with this approach.
