A few weeks ago, we discussed the benefits of AWS managed services and why using them is a good idea. Today, we will proceed to the second part of the article, where we will analyze some of these solutions and IaaS alternatives that we could implement.

We will conduct a theoretical exercise to understand how AWS managed services work and explore the feasibility of designing IaaS alternatives to these solutions.

When analyzing how a managed service works, we will rely on the sessions that AWS usually makes available, such as "Deep Dives" or "Under the Hood" talks. In this article, we will not go into an exhaustive level of detail, since that would make it endless.

Today, we will analyze a service with a fascinating infrastructure implementation.

Let's get started!

Amazon Aurora

Beyond including many improvements at the MySQL and PostgreSQL level, Aurora's differentiating point is how it uses its infrastructure.

How does Amazon Aurora work?

Amazon Aurora decouples the DB engine layer from the persistence layer: it uses dedicated SSD-based storage with six copies of the data distributed across three Availability Zones.

There are two copies of the data in each Availability Zone. This provides high durability and allows writes to be segregated from reads.
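According to AWS's own deep-dive sessions, the storage layer treats those six copies as a quorum: a write must be acknowledged by four of the six copies, and a read needs three. A minimal sketch of that quorum arithmetic (the names are ours, not Aurora's):

```python
# Sketch of Aurora's 4-of-6 write / 3-of-6 read quorum over six storage
# copies, as described in AWS's deep-dive talks. Names are illustrative.

WRITE_QUORUM = 4  # a write must reach 4 of the 6 copies to be durable
READ_QUORUM = 3   # a read needs 3 of the 6 copies to be reachable
TOTAL_COPIES = 6  # 2 copies in each of 3 Availability Zones

def write_succeeds(acks: int) -> bool:
    """A write is durable once a write quorum of copies has acknowledged it."""
    return acks >= WRITE_QUORUM

def read_succeeds(available: int) -> bool:
    """A read can be served while a read quorum of copies is reachable."""
    return available >= READ_QUORUM

# Losing a full AZ (2 copies) still leaves 4 copies: both quorums hold.
copies_after_az_loss = TOTAL_COPIES - 2
print(write_succeeds(copies_after_az_loss))  # True: writes survive an AZ loss
print(read_succeeds(copies_after_az_loss))   # True: reads survive an AZ loss

# Because the quorums overlap (4 + 3 > 6), any read quorum intersects the
# last successful write quorum, so reads never miss acknowledged data.
print(WRITE_QUORUM + READ_QUORUM > TOTAL_COPIES)  # True
```

The overlap check in the last line is the key property: it is what lets the storage layer stay consistent even though no single copy is required to have everything.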

In the Availability Zone where we deploy the primary Aurora instance, we have a copy of the data where write operations are executed, and Aurora replicates them to all the other copies (the second copy in the same AZ and the copies in the other AZs).

But instead of using that same copy for read operations, we will use the replicated copy in the same AZ.

This preserves data consistency, since reads only see data that has already been replicated across the copies. It also makes Read Replicas straightforward, since the data is fully synchronized at the storage level.

Amazon Aurora Read Replicas

The main advantage of this model is that decoupling storage, and decoupling read from write operations, gives a lot of flexibility when designing for high availability without compromising performance. And Aurora's performance is very high.

In this way, the database engine is completely decoupled. We cannot change from MySQL to PostgreSQL (the data formats are not compatible), but we can change the instance without any problem since, in the end, the Aurora instance holds no data locally. Aurora Serverless v2 is a marvel in this sense, allowing automatic vertical scaling, while the traditional model allows horizontal read scaling.

But the high-availability behavior with a Read Replica in another AZ is fascinating. Suppose an AZ fails. Aurora still has its data replicated in the other AZs. The Read Replica can be promoted to primary instance, so that one of the storage copies in its AZ takes over write operations (replicating to all copies) while the other copy continues to serve reads.
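The promotion sequence above can be sketched as a small decision function (entirely illustrative; Aurora's real failover logic is internal to the service):

```python
# Illustrative sketch of promoting a Read Replica when the primary's AZ
# fails. Placement mirrors the article: the primary sits in one AZ and
# Read Replicas sit in the others, all sharing the same storage copies.

def promote_replica(primary_az: str, failed_az: str, replica_azs: list) -> str:
    """Return the AZ whose instance acts as primary after an AZ failure."""
    if failed_az != primary_az:
        return primary_az  # primary unaffected, nothing to do
    survivors = [az for az in replica_azs if az != failed_az]
    if not survivors:
        raise RuntimeError("no replica left to promote")
    # Promote any surviving replica; its local storage copy now takes
    # writes while the second copy in that AZ keeps serving reads.
    return survivors[0]

print(promote_replica("az-a", "az-a", ["az-b", "az-c"]))  # az-b
print(promote_replica("az-a", "az-b", ["az-b", "az-c"]))  # az-a
```

The point the sketch makes is that no data movement is needed at promotion time: the new primary already has synchronized storage copies in its AZ.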

Read Replicas across storage

This process is fast and runs automatically, completing in a few minutes.

Moreover, we don't even need to keep a Read Replica running, since the storage copies are always available. This would not be a high-availability mechanism, because it is not automatic, but we can always launch an Aurora instance in another AZ if the primary zone has downtime.

Aurora is not just this functionality; it has many more, such as performance improvements, automatic backups with point-in-time restore, a Global Database model for cross-Region operation, and more.

For our exercise, we will focus only on this functionality and then see why.

Our Solution

We start with the most logical solution: setting up a MySQL or PostgreSQL instance with Read Replicas.

It is even a viable option using RDS, so we would not even need to set up an EC2 instance.

MySQL or PostgreSQL instance with Read Replicas

But we don't want to use a managed service:

MySQL or PostgreSQL instance

They seem like very similar deployments, but quite a few differences exist.

The first and main one is latency. In this model, replication does not happen at the disk level; instead, data is replicated over the network. The network in AWS is indeed fast, but synchronization at the disk level is faster.

The second problem is that we triple the infrastructure in certain parts. It is true that in Aurora we can also have three instances, but as we discussed before, we can use just one and the data is still applied to the three AZs. This does not happen if we use EC2, since then we have to carry out this replication ourselves.

On the other hand, we have to design and plan how our system behaves in case of failure, although it is true that we can rely on third-party software to implement it.

Another big problem is scaling, since scaling in this model is going to be slower.

In Aurora, horizontal scaling only requires adding new Read Replicas, which synchronize automatically; vertical scaling only requires launching a new, larger Read Replica and performing a switchover.

At the level of pure infrastructure costs, an EC2 instance is cheaper, but there are certain shortcomings.

In Aurora, we can have horizontal (and, with the Serverless version, even vertical) scaling economically, since we do not have to duplicate the storage. In contrast, in this model we need EBS disks for each Read Replica, which increases the cost.

At the cost level, we also have to take licensing into account, since high-availability models often require functionality only available in Enterprise editions.

All this without counting backup, version management, maintenance of operating systems, etc.

However, we must still meet the main requirement, which is decoupling the storage layer.

Let's now look at a possible solution: decoupling the storage layer.

Possible solution decoupling the storage layer

For this use case, we must understand some of the limitations of the EBS (Amazon Elastic Block Store) service.

EBS is a zonal service, so the volumes we create are only available in the zone where we create them. This has implications for this model because we cannot attach a volume created in zone A to an instance in zone B. However, we can attach the same volume to multiple instances in the same zone using Amazon EBS Multi-Attach.

In this way, we can set up an instance that reads from the disk on which our DB is writing, and with different tools, we can synchronize the data with other instances in other AZs.

For this, we can use multiple tools, such as DRBD, but we encounter the same problem as before: latency. This type of synchronization adds extra latency.
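As a sketch, a DRBD resource that mirrors a volume between two instances in different AZs could look like this (hostnames, IPs, and device names are hypothetical; protocol A is DRBD's asynchronous mode, chosen here to limit the cross-AZ latency penalty):

```
resource db_storage {
  protocol A;             # asynchronous replication: lowest latency impact
  device    /dev/drbd0;   # block device the DB actually mounts
  disk      /dev/xvdf;    # backing EBS volume (hypothetical)
  meta-disk internal;

  on db-node-az-a {        # hypothetical hostnames
    address 10.0.1.10:7789;
  }
  on db-node-az-b {
    address 10.0.2.10:7789;
  }
}
```

Even in asynchronous mode, every write now crosses the network and the remote node's disk, which is exactly the latency problem mentioned above.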

On the other hand, we have to manage storage instances for disk synchronization. It is true that if we have multiple Databases, we can use the same instance for several synchronizations, but it still adds complexity and cost.

Additionally, we have to manage replication and synchronization events at the disk level.

Moreover, databases use storage in complex ways, and a synchronous low-level copy of the data does not imply that the data is consistent. DBs usually do not write directly to disk; instead, they buffer writes in memory and flush them in the most efficient way possible. Nor do they perform sequential writes to separate files; they always work with open data files. This means that low-level synchronization requires additional management by the DB itself.
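PostgreSQL, for example, handles this itself through WAL-based physical streaming replication; a minimal configuration sketch (host address and user are hypothetical, and this assumes PostgreSQL 12 or later):

```
# postgresql.conf on the primary: ship WAL to standbys
wal_level = replica
max_wal_senders = 5

# On each standby: an empty standby.signal file, plus the primary's
# address (typically in postgresql.auto.conf), e.g.:
primary_conninfo = 'host=10.0.1.10 port=5432 user=replicator'
```

Because the database itself ships its write-ahead log, the standby applies changes in a crash-consistent order, which raw block-level copying cannot guarantee on its own.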

Almost all DBs have mechanisms that allow this replication at the physical level, but they require additional management on our part.

In addition, we have all the drawbacks we already saw in the previous model: we have to manage high availability, OS maintenance, database version management, etc.


As we have seen, in the end it is possible to set up a solution similar to Aurora, but it requires managing much more infrastructure and replication.

This is common with managed services: it is possible to build the same solution, but that solution will be complex to manage, will require more infrastructure than initially planned, and will often perform worse.

This is why managed services are usually recommended. They may seem expensive and somewhat closed, but in the end the advantages are considerable, and often what we perceive as closed is simply an implementation that makes our day-to-day easier.

Tell us what you think.
