This is the seventh installment of our series on architecture patterns. Here are the previous articles, in case you missed any:
- Microservices Architecture Patterns: What Are They and What Benefits Do They Offer?
- Architecture Patterns: Organization and Structure of Microservices.
- Architecture Patterns: Microservices Communication and Coordination.
- Microservices Architecture Patterns: SAGA, API Gateway, and Service Discovery.
- Microservices Architecture Patterns: Event Sourcing and Event-Driven Architecture (EDA).
- Microservices Architecture Patterns: Communication and Coordination with CQRS, BFF, and Outbox.
In this post, we explore a pattern for scalability and resource management: auto-scaling.
Auto-Scaling Concept
In modern application development and deployment, especially in cloud environments, the ability to scale resources efficiently is crucial. Auto-scaling automatically adjusts the number of resources assigned to an application based on current demand, so that resources are used optimally and the application absorbs workload variations without manual intervention.
This includes adding more server instances when demand increases or reducing the number of instances when demand decreases, thereby optimizing both cost and performance.
Types of Scaling
Vertical Scaling (Scale-Up). Increasing the resources (CPU, memory) of a single server instance. This usually means adding more power to an existing machine, often requires manual intervention, and can force service downtime. So, while it is worth mentioning, it is generally not what auto-scaling refers to.
Horizontal Scaling (Scale-Out, Scale-In). Increasing the number of server instances running the application. This is the most common approach in cloud environments due to its flexibility and ability to handle large volumes of traffic. Similarly, instances can be reduced when demand decreases.
Benefits of Auto-Scaling
- Efficient resource utilization: Ensures that resources are optimally used, reducing operational costs.
- Improved availability: Helps maintain application availability even under varying workloads and traffic spikes.
- Performance optimization: Adjusts resources to maintain the desired application performance.
- Reduced manual intervention: Minimizes the need for manual adjustments, allowing operations teams to focus on more strategic tasks.
Challenges of Auto-Scaling
- Configuration and parameter tuning:
- Proper thresholds: Defining the right thresholds to trigger scaling is critical. A threshold that is too low causes unnecessary scaling, while one that is too high lets performance degrade before capacity arrives.
- Scaling policies: Creating policies that fit application needs requires a deep understanding of workload behavior and traffic patterns (see the target tracking sketch after this list).
- Scaling latency:
- Instance startup time: There can be a significant delay between the scaling decision and the availability of new resources. For example, launching a new server instance can take several minutes.
- Temporary performance impact: During this waiting period, the application may experience performance degradation if sufficient resources are not available.
- Cost:
- Unexpected costs: If not properly configured, auto-scaling can lead to significant operational cost increases due to over-provisioning.
- Maintenance costs: Constant monitoring and tuning of scaling policies to optimize cost and performance can be challenging.
- Configuration complexity:
- Complex dependencies: Applications with multiple services and dependencies can be difficult to scale properly. For example, scaling a database service may also require scaling dependent services.
- Mixed scaling strategies: Implementing a combination of vertical and horizontal scaling adds complexity to configuration and resource management.
- Monitoring and alerts:
- Accurate metrics: The effectiveness of auto-scaling depends on the accuracy of monitored metrics. Poorly configured or inaccurate metrics can lead to incorrect scaling decisions.
- Relevant alerts: Setting up relevant alerts to monitor performance and the status of scaled resources is essential to detect issues in time and take corrective action.
- Testing and validation:
- Load testing: It is necessary to conduct load tests to validate that the scaling policies work as expected under different traffic scenarios.
- Scenario simulation: Simulating traffic spikes and resource failures helps ensure that the auto-scaling system responds appropriately.
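To ground the threshold and policy challenges, here is a sketch of a target tracking policy in AWS (names are illustrative). Rather than hand-tuned scale-out and scale-in thresholds, the group is given a single target value that the service tries to hold:

{
  "AutoScalingGroupName": "my-auto-scaling-group",
  "PolicyName": "hold-cpu-at-50",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }
}

With a policy like this, the autoscaler adds instances when average CPU drifts above 50% and removes them when it falls below, eliminating one of the tuning knobs described above.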
Implementing Auto-Scaling
The implementation of auto-scaling generally involves using cloud services that support this functionality. Major cloud providers such as AWS, Azure, and Google Cloud offer built-in auto-scaling capabilities.
AWS Auto-Scaling
AWS provides several tools for auto-scaling, including:
- Auto Scaling Groups (ASG). Allows defining policies to add or remove EC2 instances based on metrics such as CPU usage, network traffic, or custom metrics.
- Elastic Load Balancing (ELB). Distributes network traffic across multiple instances, ensuring that no instance is overloaded.
- AWS Lambda. Automatically scales serverless functions in response to incoming traffic, handling function execution in parallel as needed.
Example of an Auto Scaling Group configuration in AWS:
{
  "AutoScalingGroupName": "my-auto-scaling-group",
  "LaunchConfigurationName": "my-launch-configuration",
  "MinSize": 1,
  "MaxSize": 10,
  "DesiredCapacity": 2,
  "AvailabilityZones": ["us-west-2a", "us-west-2b"],
  "HealthCheckType": "EC2",
  "HealthCheckGracePeriod": 300,
  "Tags": [
    {
      "Key": "Name",
      "Value": "my-instance",
      "PropagateAtLaunch": true
    }
  ],
  "TerminationPolicies": ["OldestInstance"]
}
- AutoScalingGroupName: The name of the auto-scaling group.
- LaunchConfigurationName: The launch configuration defining how EC2 instances are launched. (Note that AWS now recommends launch templates, supplied via a LaunchTemplate block, over launch configurations.)
- MinSize: The minimum number of instances in the group.
- MaxSize: The maximum number of instances in the group.
- DesiredCapacity: The initial number of instances to run.
- AvailabilityZones: The availability zones where instances will be launched.
- HealthCheckType: The type of health check (EC2 in this case).
- HealthCheckGracePeriod: Seconds to wait after launch before health checks count against an instance (300 here, i.e., five minutes).
- Tags: Tags applied to instances.
- TerminationPolicies: Policies defining how instances are terminated (in this case, the oldest instance is terminated first).
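Assuming the configuration above is saved as asg.json, a group like this can be created with the AWS CLI's generic JSON input:

aws autoscaling create-auto-scaling-group --cli-input-json file://asg.json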
Azure Auto Scale
Azure also provides auto-scaling capabilities through:
- Azure Virtual Machine Scale Sets. Allows creating and managing a group of identical virtual machines, automatically adjusting the number of instances.
- Azure App Services. Offers auto-scaling for web, API, and mobile applications.
- Azure Functions. Automatically scales serverless applications based on demand.
Example of an Auto Scale configuration in Azure App Services:
{
  "location": "East US",
  "properties": {
    "profiles": [
      {
        "name": "Profile1",
        "capacity": {
          "minimum": 1,
          "maximum": 10,
          "default": 2
        },
        "rules": [
          {
            "metricTrigger": {
              "metricName": "CpuPercentage",
              "metricResourceUri": "<resource-uri>",
              "timeGrain": "PT1M",
              "statistic": "Average",
              "timeWindow": "PT5M",
              "timeAggregation": "Average",
              "operator": "GreaterThan",
              "threshold": 70
            },
            "scaleAction": {
              "direction": "Increase",
              "type": "ChangeCount",
              "value": "1",
              "cooldown": "PT5M"
            }
          },
          {
            "metricTrigger": {
              "metricName": "CpuPercentage",
              "metricResourceUri": "<resource-uri>",
              "timeGrain": "PT1M",
              "statistic": "Average",
              "timeWindow": "PT5M",
              "timeAggregation": "Average",
              "operator": "LessThan",
              "threshold": 30
            },
            "scaleAction": {
              "direction": "Decrease",
              "type": "ChangeCount",
              "value": "1",
              "cooldown": "PT5M"
            }
          }
        ]
      }
    ]
  }
}
- location: Resource location.
- profiles: Defines the scaling profiles.
  - name: Profile name.
  - capacity: Minimum, maximum, and default instance counts.
  - rules: Scaling rules based on metrics.
    - metricTrigger: The metric condition (CPU percentage here).
      - metricName: Metric name.
      - metricResourceUri: URI of the monitored resource.
      - timeGrain: Sampling frequency.
      - statistic: Statistic applied to the samples (average).
      - timeWindow: Time window over which the metric is evaluated.
      - timeAggregation: How the samples are aggregated over the window.
      - operator: Comparison operator.
      - threshold: Threshold that triggers the scaling action.
    - scaleAction: The scaling action to take.
      - direction: Scaling direction (increase or decrease).
      - type: Type of action (ChangeCount changes the instance count).
      - value: Number of instances to add or remove.
      - cooldown: Cooldown period between consecutive actions.
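For context, a profile like the one above is not deployed on its own: it forms the properties of an autoscale settings resource that points at the resource being scaled. A minimal wrapper, sketched with an assumed (long-standing) API version:

{
  "type": "Microsoft.Insights/autoscaleSettings",
  "apiVersion": "2015-04-01",
  "name": "my-autoscale-setting",
  "location": "East US",
  "properties": {
    "enabled": true,
    "targetResourceUri": "<resource-uri>",
    "profiles": [ "...the profile shown above..." ]
  }
}

Here targetResourceUri identifies the App Service plan (or other resource) whose instance count the profiles control.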
Google Cloud Auto Scaling
Google Cloud offers auto-scaling capabilities through:
- Google Kubernetes Engine (GKE). Automatically scales nodes in a Kubernetes cluster based on workload (a workload-level HPA sketch appears at the end of this section).
- Compute Engine Autoscaler. Adjusts the number of VM instances in a managed instance group in response to demand.
- Google Cloud Functions. Automatically scales serverless functions based on incoming traffic.
Example of an Auto Scaling configuration in Google Compute Engine:
autoscalingPolicy:
  minNumReplicas: 1
  maxNumReplicas: 10
  coolDownPeriodSec: 60
  cpuUtilization:
    utilizationTarget: 0.6
- autoscalingPolicy: Defines the auto-scaling policy.
  - minNumReplicas: Minimum number of replicas.
  - maxNumReplicas: Maximum number of replicas.
  - coolDownPeriodSec: Cooldown period in seconds after a scaling action.
  - cpuUtilization: Scaling signal based on CPU utilization.
    - utilizationTarget: Target average utilization (0.6, i.e., 60%).
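At the workload level in GKE, the same idea is commonly expressed as a Kubernetes HorizontalPodAutoscaler, complementing the node-level autoscaling mentioned above. A minimal sketch (resource names are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

This keeps the average CPU of the my-app pods near 60%, mirroring the utilizationTarget of 0.6 in the Compute Engine example.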
Auto-Scaling Strategies
Proactive Scaling
Adjusts resources based on known traffic patterns or predictions. For example, increasing capacity before a planned event.
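In AWS, one way to express proactive scaling is a one-off scheduled action that raises capacity ahead of a known event. A sketch, with an illustrative name and date:

{
  "AutoScalingGroupName": "my-auto-scaling-group",
  "ScheduledActionName": "pre-event-ramp-up",
  "StartTime": "2025-11-28T06:00:00Z",
  "MinSize": 5,
  "MaxSize": 20,
  "DesiredCapacity": 10
}

Unlike the recurring schedules shown later, an action with a StartTime and no Recurrence runs exactly once.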
Predictive Analysis
Uses predictive analytics and machine learning to anticipate traffic spikes and adjust capacity ahead of them.
Example of a predictive scaling policy in AWS, a sketch using the PredictiveScaling policy type (names illustrative):

{
  "AutoScalingGroupName": "my-auto-scaling-group",
  "PolicyName": "my-predictive-policy",
  "PolicyType": "PredictiveScaling",
  "PredictiveScalingConfiguration": {
    "MetricSpecifications": [
      {
        "TargetValue": 70,
        "PredefinedMetricPairSpecification": {
          "PredefinedMetricType": "ASGCPUUtilization"
        }
      }
    ],
    "Mode": "ForecastAndScale",
    "SchedulingBufferTime": 300
  }
}

- PolicyType: PredictiveScaling forecasts load from the group's metric history (at least 24 hours of data is required) and schedules capacity ahead of it.
- MetricSpecifications: The metric pair used to forecast and scale; ASGCPUUtilization works from average CPU across the group.
- TargetValue: The utilization the forecast aims to maintain (70% here).
- Mode: ForecastAndScale both forecasts and acts; ForecastOnly produces forecasts without scaling, useful for validating the model first.
- SchedulingBufferTime: Seconds of lead time before forecasted demand at which instances are launched (300, i.e., five minutes).

Unlike the reactive policies below, this provisions capacity before the traffic arrives, based on learned daily and weekly patterns.
Reactive Scaling
Adjusts resources in response to real-time demand. For example, adding instances when CPU usage exceeds a predefined threshold.
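A common reactive mechanism in AWS is a step scaling policy driven by a CloudWatch alarm. A sketch (names illustrative); note that the bounds are offsets from the alarm threshold, not absolute values:

{
  "AutoScalingGroupName": "my-auto-scaling-group",
  "PolicyName": "scale-out-on-high-cpu",
  "PolicyType": "StepScaling",
  "AdjustmentType": "ChangeInCapacity",
  "StepAdjustments": [
    { "MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 1 },
    { "MetricIntervalLowerBound": 20, "ScalingAdjustment": 2 }
  ]
}

The policy itself does not watch the metric: a CloudWatch alarm (for example, average CPU above 70% for five minutes) invokes it, adding one instance just above the threshold and two once CPU exceeds it by 20 points.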
Continuous Monitoring
Continuously monitor resource usage and dynamically adjust scaling thresholds based on real-time performance.
Because reactive scaling still pays the instance startup cost, AWS offers warm pools: pre-initialized instances kept on standby so that scale-out latency drops from minutes to seconds. Warm pool configuration in AWS (a sketch after the PutWarmPool API):

warmPool:
  MinSize: 2
  MaxGroupPreparedCapacity: 5
  PoolState: Stopped
  InstanceReusePolicy:
    ReuseOnScaleIn: true

- warmPool: Configuration of the pool of pre-initialized instances.
- MinSize: Minimum number of instances kept warm.
- MaxGroupPreparedCapacity: Maximum number of prepared instances (the API uses this name rather than MaxSize).
- PoolState: State the instances wait in; Stopped avoids paying for compute while they are idle.
- InstanceReusePolicy / ReuseOnScaleIn: Return instances to the warm pool on scale-in instead of terminating them.
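A configuration like this maps to the put-warm-pool call in the AWS CLI, roughly: aws autoscaling put-warm-pool --auto-scaling-group-name my-auto-scaling-group --min-size 2 --max-group-prepared-capacity 5 --pool-state Stopped --instance-reuse-policy ReuseOnScaleIn=true (flag names stated as an assumption of the current CLI).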
Scheduled Scaling
Adjusts resources at specific times based on predictable usage patterns, for example increasing capacity during business hours and reducing it at night. Schedules are defined in advance around known traffic patterns, such as working hours or special events.
Example of a scheduled scaling configuration in AWS (shaped after the batch scheduled-action request):

{
  "AutoScalingGroupName": "my-auto-scaling-group",
  "ScheduledUpdateGroupActions": [
    {
      "ScheduledActionName": "ScaleUpMorning",
      "Recurrence": "0 8 * * *",
      "MinSize": 5,
      "MaxSize": 15,
      "DesiredCapacity": 10
    },
    {
      "ScheduledActionName": "ScaleDownEvening",
      "Recurrence": "0 18 * * *",
      "MinSize": 1,
      "MaxSize": 5,
      "DesiredCapacity": 2
    }
  ]
}
- AutoScalingGroupName: Name of the auto-scaling group.
- ScheduledUpdateGroupActions: The list of scheduled scaling actions (the field name used by the batch scheduled-action request).
- ScheduledActionName: Name of the scheduled action.
- Recurrence: When the action repeats, in cron format (evaluated in UTC unless a time zone is specified).
- MinSize: Minimum group size applied while the action is in effect.
- MaxSize: Maximum group size applied while the action is in effect.
- DesiredCapacity: Desired capacity set when the action runs.
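These recurring actions correspond to the put-scheduled-update-group-action and batch-put-scheduled-update-group-action operations in the AWS CLI and SDKs.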
Conclusion
Auto-scaling is an essential technique for efficient resource management in modern applications, especially in cloud environments. Despite its challenges, such as threshold and policy tuning, scaling latency, and cost control, its benefits in resource efficiency, availability, and performance optimization make it indispensable.
The proactive, reactive, and scheduled strategies, along with practical examples in AWS, Azure, and Google Cloud, demonstrate how to effectively implement auto-scaling to handle workload variability and ensure applications remain available and efficient.