Sectigo (Codeguard)

The Codeguard Migration

This was a discovery project, porting the Codeguard application from Amazon Web Services (AWS) to Google Cloud Platform (GCP).

The product offered automated services to back up clients' websites and databases. Not every database engine was supported, and to back up websites Codeguard required firewalled access to clients' repositories or network file systems.

The portions of Codeguard expected to be operational on GCP were the scheduled backup and manual restore functionality.

Process

The high-level tasks were as follows:

  1. Investigate and document the existing architecture
  2. Identify GCP counterparts for the AWS deployment, improving where possible
  3. Propose options for the GCP deployment
  4. Determine the deployment schedule for each option

Existing Architecture

The entire Codeguard product was built on a dated version of Ruby on Rails, and ran as a monolith across multiple AWS EC2 instances. To scale these instances, Codeguard maintained several database tables that operated as job queues; when a queue filled up, a script was run manually to increase the instance count for the affected Codeguard deployment.

The application managed authentication and authorization itself (via Rails and Redis), and DNS routing was configured with Amazon Route 53.

For website backups, Codeguard relied on an in-house version control system that backed up binaries to AWS S3 buckets.

For database backups, Codeguard supported only a few versions of Microsoft SQL Server (MSSQL). The code executed scripts to securely connect to clients' database clusters, infer schemas, and bulk download data between specified dates into Sectigo's own Amazon RDS instances.

Logging and alerting were handled through AWS CloudWatch.

Google Cloud Platform Offerings and Proposals

For either proposal, AWS CloudWatch functionality can be replaced with GCP Cloud Monitoring.

Complete Migration

This proposal ports the deployment as-is to GCP, with minor changes to work with the GCP counterparts. The AWS deployment does not need to stay up for the GCP deployment to function, but keeping it running may be useful while migrating data.

For the actual service deployment, we could use Google Compute Engine (GCE) for a 1:1 migration.

To fulfill the need for job queues, we could rely on Cloud SQL for the job tables.
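
As an illustration of how a GCE worker might consume such a table, here is a minimal Go sketch. It assumes a Postgres-flavored Cloud SQL instance and a hypothetical jobs table with id, status, and payload columns; none of these names come from the Codeguard codebase.

    package main

    import (
        "context"
        "database/sql"
        "errors"
        "fmt"

        _ "github.com/lib/pq" // Postgres driver; Cloud SQL speaks the same protocol
    )

    // claimNextJob atomically marks one pending job as running, so concurrent
    // workers never pick up the same row.
    func claimNextJob(ctx context.Context, db *sql.DB) (int64, string, error) {
        row := db.QueryRowContext(ctx, `
            UPDATE jobs SET status = 'running'
            WHERE id = (
                SELECT id FROM jobs
                WHERE status = 'pending'
                ORDER BY id
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            RETURNING id, payload`)
        var id int64
        var payload string
        err := row.Scan(&id, &payload)
        return id, payload, err
    }

    func main() {
        db, err := sql.Open("postgres", "host=10.0.0.1 dbname=codeguard user=worker")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        id, payload, err := claimNextJob(context.Background(), db)
        if errors.Is(err, sql.ErrNoRows) {
            fmt.Println("queue empty")
            return
        } else if err != nil {
            panic(err)
        }
        fmt.Printf("claimed job %d: %s\n", id, payload)
    }

FOR UPDATE SKIP LOCKED lets many workers poll the same table concurrently without ever claiming the same row twice.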

Authentication and authorization can still be managed by Rails if we port the whole monolith to GCE. We would need to configure GCP Cloud DNS to mirror the existing AWS Route 53 DNS routing configuration.

For the website backup functionality, we could rely on Cloud SQL tables to track metadata about each backup, and Cloud Storage buckets for the actual binaries.
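
As a rough sketch of the binary upload path, using the official cloud.google.com/go/storage client (the bucket and object names here are hypothetical):

    package main

    import (
        "context"
        "fmt"
        "io"
        "os"

        "cloud.google.com/go/storage"
    )

    // uploadBackup streams a local backup artifact into a Cloud Storage bucket.
    func uploadBackup(ctx context.Context, bucket, object, path string) error {
        client, err := storage.NewClient(ctx)
        if err != nil {
            return err
        }
        defer client.Close()

        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()

        w := client.Bucket(bucket).Object(object).NewWriter(ctx)
        if _, err := io.Copy(w, f); err != nil {
            w.Close()
            return err
        }
        // The object only becomes durable once Close returns without error.
        return w.Close()
    }

    func main() {
        ctx := context.Background()
        err := uploadBackup(ctx, "codeguard-site-backups", "site-123/2024-01-01.tar.gz", "backup.tar.gz")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }

Since the object is only committed when Close succeeds, the Cloud SQL metadata row should be written after that point.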

For the database backup functionality, we could rely on Cloud SQL tables to track metadata, and SQL Server on Google Cloud to store the actual data.

Break Out Backup and Restore Features

This proposal requires both AWS and GCP to be running at the same time. Some code would have to be introduced in the AWS deployment to forward requests to GCP.

For the backup/restore features, we can break the relevant modules out into smaller containerized services, then leverage Google Kubernetes Engine (GKE) to enable horizontal scaling more effectively.

For the job queue implementation, Google Cloud Pub/Sub is an excellent stand-in. Pub/Sub can be configured with exactly-once delivery, but consumers should still be idempotent to prevent double-scheduling of jobs, and some message-state persistence would be useful. I suggested either relying on Redis with Redlock, or Temporal Cron Jobs for their core distributed scheduling.
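
A minimal sketch of such an idempotent consumer, assuming the cloud.google.com/go/pubsub and github.com/redis/go-redis/v9 clients; the project ID, subscription name, and Redis key scheme are hypothetical.

    package main

    import (
        "context"
        "log"
        "time"

        "cloud.google.com/go/pubsub"
        "github.com/redis/go-redis/v9"
    )

    // handleBackupJobs consumes backup-job messages, using a Redis SETNX key as
    // an idempotency guard so a redelivered message is acked without re-running.
    func handleBackupJobs(ctx context.Context, projectID, subID string, rdb *redis.Client) error {
        client, err := pubsub.NewClient(ctx, projectID)
        if err != nil {
            return err
        }
        defer client.Close()

        return client.Subscription(subID).Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
            // First writer wins; duplicates see false and are acked as no-ops.
            fresh, err := rdb.SetNX(ctx, "job:"+msg.ID, "done", 24*time.Hour).Result()
            if err != nil {
                msg.Nack() // Redis unavailable: let Pub/Sub redeliver later
                return
            }
            if fresh {
                runBackup(msg.Data) // the actual backup work
            }
            msg.Ack()
        })
    }

    func runBackup(payload []byte) { log.Printf("running backup job: %s", payload) }

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
        if err := handleBackupJobs(ctx, "my-gcp-project", "backup-jobs-sub", rdb); err != nil {
            log.Fatal(err)
        }
    }

Note the deliberate simplification: the key is set before the work runs, so a crash mid-backup loses the job. Closing that window is exactly where a lease-style lock such as Redlock, or Temporal's durable scheduling, comes in.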

If we only break out the backup and restore modules, then the AWS deployment must remain active for the entirety of Codeguard to be offered. This means that authentication and authorization are still managed by legacy Codeguard, but the AWS-to-GCP communication should still be secured. We agreed that using a generated API key managed by GCP's Apigee product minimized the impact on legacy code.
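
The real mediator would live in the Rails monolith, but the Go sketch below illustrates the forwarding shape; the header name, target URL, and environment variable are assumptions rather than details from the project.

    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
    )

    // forwardToGCP relays a backup/restore request from the AWS deployment to
    // the GCP service, attaching the Apigee-managed API key.
    func forwardToGCP(w http.ResponseWriter, r *http.Request) {
        req, err := http.NewRequestWithContext(r.Context(), r.Method,
            "https://codeguard-gcp.example.com"+r.URL.Path, r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        req.Header = r.Header.Clone()
        req.Header.Set("x-api-key", os.Getenv("CODEGUARD_GCP_API_KEY"))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        defer resp.Body.Close()

        // Mirror the GCP response back to the original caller.
        for k, vals := range resp.Header {
            for _, v := range vals {
                w.Header().Add(k, v)
            }
        }
        w.WriteHeader(resp.StatusCode)
        io.Copy(w, resp.Body)
    }

    func main() {
        http.HandleFunc("/backups/", forwardToGCP)
        http.HandleFunc("/restores/", forwardToGCP)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }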

Schedule for Deployment

With the Complete Migration proposal, backup schedules would be paused. After the new deployment succeeds and acceptance tests pass, schedules are resumed in the new deployment. If no errors occur, the old deployment is decommissioned.

With the Break Out proposal, both the GCP and AWS deployments would be active. First, the new GCP deployment must succeed and its tests must pass. Then a new AWS deployment would roll out, containing the mediator code that forwards backup and restore requests to GCP. No significant downtime is expected.

Migrating Binaries

The above proposals only cover migrating functionality; eventually the backed-up binaries themselves need to be moved to GCP.

For this phase of the project, there are two options:

  1. Migrate everything at once
  2. Have both AWS and GCP running at the same time, and forward all new backup/restore requests to GCP

Option 1 is not feasible: transferring the sheer amount of data could take months, and the transfer bandwidth pricing is costly.

The costs of Option 2 come from running the same product on both GCP and AWS. However, this organic migration allows existing backups to expire naturally. Any clients not yet migrated would be asked to perform a new backup if they had not been active during the migration period. The added benefit is that we can assess whether the new product is running as expected, and routing requests back to legacy is quick if errors arise.

Selected Proposal

The client ended up moving forward with the Break Out proposal for the following reasons:

  • The legacy instance scaling code was difficult to maintain
  • In the long term, the client wanted to break down the monolith, and this was a step towards that goal
  • If the Complete Migration were selected, the versions of both the Rails framework and Ruby would have to be updated, which is a non-trivial effort
  • They were already looking at rewriting portions of Codeguard in Golang, so this was a good opportunity to tackle that tech debt directly
  • Job queueing in tables didn't scale the instances accurately to the needs of the system

The main risk in this proposal was the code changes around forwarding backup/restore requests in the AWS Codeguard deployment. However, code changes to legacy Codeguard were deemed unavoidable anyway once the project entered the data migration phase.