Auto-scaling Google Cloud compute instances

August 29, 2018 3-minute read

A great thing about cloud computing is the ability to dynamically increase the computational resources your service or application needs during periods of high demand. Every major cloud provider offers auto-scaling, which is based on configured thresholds: when a threshold is crossed, computational resources are added or removed automatically. These thresholds could be, for example, CPU or memory usage, the number of HTTP requests, or the number of messages in a queue. Google Cloud Platform supports auto-scaling, and I’ll demonstrate how to configure it from the terminal using the gcloud command-line tool.

We’ll need two things for this demonstration:

An instance template
A group of managed instances

Instance template and group of managed instances

Before we start configuring auto-scaling, we need an instance template. An instance template is a model (machine type, boot image, and other settings) from which we can create new instances without going through the whole configuration process each time. An instance template is also required to create a managed instance group. The following command creates a template for an f1-micro instance running Debian 9:

gcloud compute instance-templates create unique-custom-template \
  --machine-type f1-micro \
  --image-project debian-cloud \
  --image-family debian-9

Now that we have the instance template, it’s time to create the group. A managed instance group is a set of identical instances, created from the same template, that work as a single unit. This means that changes are made at the group level rather than instance by instance: you change the group, for example by giving it a new template, and the change applies to every instance in it. The following command creates the group:

gcloud compute instance-groups managed create unique-custom-group \
  --base-instance-name unique-base-name \
  --template unique-custom-template \
  --zone europe-west4-a \
  --size 3

Here we asked Google Cloud Platform to create a group of three instances in the europe-west4-a zone, based on the instance template we created earlier; the instance names are derived from the base name. At this point, we can already scale the group up and down manually with the gcloud compute instance-groups managed resize command. When you change the size of the group, Google Cloud Platform automatically creates or deletes instances to match the new size. Once the group is created, we can enable auto-scaling with the following command:

gcloud compute instance-groups managed set-autoscaling unique-custom-group \
  --zone europe-west4-a \
  --max-num-replicas 9 \
  --target-cpu-utilization 0.75

The autoscaler now tries to keep the average CPU utilization across the group at 75%. When utilization rises above that target, it adds instances, up to the maximum set with --max-num-replicas (nine in this case); when utilization drops, it removes instances that are no longer needed. Something important to notice is that scaling down is deliberately cautious: before removing instances, the autoscaler uses up to 10 minutes of collected usage data, so that a short dip in load does not remove capacity that will be needed again moments later. It might look like a delay, but it’s a built-in feature.
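To make the target-utilization idea concrete, the shell functions below sketch a simplified model of the decision. The assumption is that the recommended size is the current size scaled by observed over target utilization, rounded up and clamped between a minimum and the maximum number of replicas, and that scaling down uses the largest recommendation from the last ~10 minutes. This is an approximation for illustration, not Google’s exact algorithm, and recommended_size and stabilized_size are hypothetical names. Utilizations are whole percentages so that plain shell integer arithmetic is enough.

```shell
#!/bin/sh
# Simplified model of target-utilization autoscaling. This is an
# approximation for illustration, not Google's exact algorithm.
# All utilizations are whole percentages (75 means 75%).

# recommended_size CURRENT_SIZE OBSERVED_UTIL TARGET_UTIL MIN MAX
# Prints ceil(CURRENT_SIZE * OBSERVED_UTIL / TARGET_UTIL), clamped to [MIN, MAX].
recommended_size() {
  size=$1 observed=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: (a + b - 1) / b
  n=$(( (size * observed + target - 1) / target ))
  if [ "$n" -lt "$min" ]; then n=$min; fi
  if [ "$n" -gt "$max" ]; then n=$max; fi
  echo "$n"
}

# stabilized_size RECOMMENDATION...
# Hypothetical helper: given the recommendations produced over the last
# ~10 minutes, prints the largest one, so that a short dip in load does
# not immediately remove instances.
stabilized_size() {
  best=$1
  shift
  for r in "$@"; do
    if [ "$r" -gt "$best" ]; then best=$r; fi
  done
  echo "$best"
}
```

For example, three instances averaging 90% CPU against the 75% target give recommended_size 3 90 75 1 9 = 4, since 3 × 90 / 75 = 3.6 rounds up to 4; with nine instances at 100%, the raw result of 12 is capped at 9 by the maximum number of replicas.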

Although this was a short demonstration of auto-scaling based on CPU usage, it’s also possible to auto-scale based on HTTP load-balancing serving capacity, monitoring metrics, or a combination of them. Auto-scaling makes services more resilient to spikes in demand and, when fine-tuned, can also save money by removing instances that are no longer needed.
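The other signals are configured with the same set-autoscaling command and different flags. The sketch below wraps two of them in hypothetical helper functions, autoscale_on_lb and autoscale_on_metric: --target-load-balancing-utilization, which only applies when the group is a backend of an HTTP(S) load balancer, and --custom-metric-utilization for a monitoring metric. Any metric name you pass in is your own choice; the one in the usage note is a placeholder, not a real metric. Setting GCLOUD=echo prints each command instead of running it.

```shell
#!/bin/sh
# Sketch: alternative autoscaling signals for a managed instance group.
# Hypothetical helpers; set GCLOUD=echo to print commands instead of running them.

# autoscale_on_lb GROUP ZONE MAX TARGET
# Scale on HTTP(S) load-balancing serving capacity (e.g. TARGET=0.8).
autoscale_on_lb() {
  "${GCLOUD:-gcloud}" compute instance-groups managed set-autoscaling "$1" \
    --zone "$2" \
    --max-num-replicas "$3" \
    --target-load-balancing-utilization "$4"
}

# autoscale_on_metric GROUP ZONE MAX METRIC TARGET
# Scale on a monitoring metric of type GAUGE. METRIC is a metric type such as
# custom.googleapis.com/<your-metric> (placeholder, not a real metric).
autoscale_on_metric() {
  "${GCLOUD:-gcloud}" compute instance-groups managed set-autoscaling "$1" \
    --zone "$2" \
    --max-num-replicas "$3" \
    --custom-metric-utilization "metric=$4,utilization-target=$5,utilization-target-type=GAUGE"
}
```

For example, with GCLOUD=echo set, autoscale_on_metric unique-custom-group europe-west4-a 9 custom.googleapis.com/queue_depth 100 prints the full command without running it (queue_depth is a hypothetical metric name). Several of these flags can also be passed to a single set-autoscaling call; the autoscaler then computes a recommended size for each signal and uses the largest one.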

See you!