Nobody, of course, plans for downtime. But problems are inevitable, and without a plan to fix them immediately and automatically, you will lose revenue when your services go down. High availability helps you plan for worst-case scenarios.
What is high availability?
High availability (HA) is the practice of keeping server downtime to a minimum, ideally zero. It encompasses many techniques, such as auto-scaling, real-time monitoring, and automated blue/green deployments.
The core concept is quite simple: one server is not a server. Two servers are one server. The more redundancy you plan for, the higher the availability of your service. Your service should not be interrupted even if one of your components goes up in flames.
This can be achieved with something as simple as an auto-scaling group, which cloud services like AWS support very well. If a server has a problem, such as a sudden crash, the load balancer will detect that it is no longer responding. It can then redirect traffic from the crashed server to the other servers in the cluster, and even launch a new instance if it needs the capacity.
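The failover logic described above can be sketched in a few lines. This is a toy model, not real AWS behavior: the `Server` class, `min_capacity` parameter, and replacement-naming scheme are all illustrative assumptions.

```python
# A minimal sketch of load-balancer failover: route only to healthy servers,
# and launch replacements when capacity drops below the minimum.
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


class LoadBalancer:
    def __init__(self, servers, min_capacity=2):
        self.servers = servers
        self.min_capacity = min_capacity

    def route(self):
        """Return only servers that pass their health check."""
        return [s for s in self.servers if s.healthy]

    def reconcile(self):
        """Drop crashed servers from rotation and launch replacements if short."""
        alive = self.route()
        while len(alive) < self.min_capacity:
            replacement = Server(f"replacement-{len(alive)}")
            self.servers.append(replacement)
            alive.append(replacement)
        return alive


lb = LoadBalancer([Server("a"), Server("b")])
lb.servers[0].healthy = False  # "a" suddenly crashes
print([s.name for s in lb.reconcile()])  # traffic shifts to "b" plus a new instance
```

The key property is that traffic never stops: the surviving server keeps serving while the replacement boots.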
This redundant philosophy applies to every level of your component hierarchy. For example, if you have a microservice that handles image processing for user-uploaded media, it shouldn't run as a single instance; it should be just as redundant as your primary servers.
Sometimes you have to guarantee availability for customers. If you promise 99.999% availability in a Service Level Agreement (SLA), your service cannot be down for more than about five minutes per year. This makes HA necessary for many large companies from the start.
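The downtime budget for a given SLA is simple arithmetic: the fraction of the year you are allowed to be unavailable. A quick calculation shows why "five nines" is so demanding:

```python
# How much downtime per year does each availability level allow?
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget(availability_percent):
    """Allowed downtime in minutes per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for nines in ("99.9", "99.99", "99.999"):
    print(f"{nines}% -> {downtime_budget(float(nines)):.2f} minutes/year")
```

At 99.9% you get almost nine hours of slack per year; at 99.999% you get roughly 5.26 minutes, barely enough for a single reboot.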
For example, services like AWS S3 are designed for 99.999999999% (eleven 9's) of data durability. In practice, this means your data is replicated across different locations, making it safe from everything short of a giant meteor hitting your data center. Even then, with physical separation, it can survive small meteors, or at least the much more realistic data-center fire or power outage.
Components of good HA systems
What causes downtime? Barring force majeure, it is usually human error or random failure.
Random failures cannot really be predicted, but they can be planned for with redundant systems. They can also be caught as they happen by good monitoring systems that alert you to problems in your network.
Human errors can be planned for. First, minimize the number of errors with careful testing in staging environments. But everyone makes mistakes, even big companies, so you need a plan in place for when mistakes do occur.
Automatic scalability and redundancy
Auto-scaling is the process of automatically adjusting the number of servers you run, usually in response to daily traffic patterns, but also during high-stress situations.
One of the main ways services go down is the "hug of death," when thousands of users flock to the site or traffic otherwise spikes. Without auto-scaling, you're out of luck: new servers won't boot on their own, so you have to wait for the load to decrease or manually start new instances to meet demand.
Auto-scaling means you will never really have to deal with this problem (although you will have to pay for the extra server time you use). This is part of why services like serverless databases and AWS Lambda functions are so great: they scale extremely well out of the box.
However, this goes beyond just your primary servers: if you have other components or services on your network, they should be scalable too. For example, you may be able to start additional web servers to meet traffic demands, but if your database server is overwhelmed, you still have a problem.
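The core auto-scaling decision can be sketched as target tracking: pick a capacity so that each server runs at a comfortable utilization. The per-server capacity and target numbers below are made-up assumptions, not AWS defaults:

```python
# A sketch of target-tracking auto-scaling: choose enough servers so each one
# stays at or below the target utilization for the current request rate.
import math

def desired_capacity(requests_per_sec, capacity_per_server=100,
                     target_utilization=0.7):
    """Return how many servers should be running for the current load."""
    needed = requests_per_sec / (capacity_per_server * target_utilization)
    return max(1, math.ceil(needed))  # never scale to zero

print(desired_capacity(150))   # light load: a few servers suffice
print(desired_capacity(2000))  # traffic spike: scale out hard
```

A real auto-scaling group layers cooldown periods and min/max bounds on top of this, so a brief spike doesn't cause the fleet to thrash.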
If you’d like to learn more, read our article on getting started with AWS autoscaling.
Real-time monitoring
Monitoring means keeping logs and metrics about your services in real time. Doing this with automated alarms can alert you to problems in your network as they occur, rather than after they affect users.
For example, you can set an alarm to go off when your server reaches 90% memory usage, which could indicate a memory leak or a problem with an overloaded application.
You can then configure this alarm to tell your auto-scaling group to add another instance or replace the current instance with a new one.
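The alarm behavior described above can be modeled in a few lines. This mirrors how CloudWatch-style alarms use consecutive evaluation periods to avoid firing on a single noisy sample; the function name, threshold, and sample data are illustrative:

```python
# A toy metric alarm: fire only when memory usage breaches the threshold for
# several consecutive samples, so a one-off spike doesn't page anyone.
def alarm_state(samples, threshold=90, periods=3):
    """Return 'ALARM' if the last `periods` samples all exceed the threshold."""
    if len(samples) >= periods and all(s > threshold for s in samples[-periods:]):
        return "ALARM"
    return "OK"

memory_usage = [72, 85, 91, 93, 95]  # percent, one sample per minute
print(alarm_state(memory_usage))     # the last three samples exceed 90%
```

In a real setup, the ALARM state would trigger an action, such as notifying your team or telling the auto-scaling group to replace the instance.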
Automated blue/green updates
The most common failure scenario is a botched update, when your code changes and breaks an unforeseen part of your application. This can be planned for with blue/green deployments.
A blue/green deployment is a slow, gradual process that rolls out your code changes in stages rather than all at once. Imagine you have 10 servers running the same piece of software behind a load balancer.
A normal deployment might update all of them at once when new changes are pushed, or at best update them one at a time to avoid downtime.
A blue/green deployment would instead boot an 11th server in your auto-scaling group and install the new code changes on it. Once it was "green," that is, accepting requests and ready to use, it would immediately replace one of the existing "blue" servers in your group. You would then rinse and repeat for every server in the cluster. Even if you only had one server, this update method would result in no downtime.
Better yet, you can immediately roll back to the blue servers if your monitoring systems and alarms detect problems. This means that even a completely failed update will not take your service down for more than a few minutes, and ideally not at all if you have multiple servers and can roll out the update slowly. Blue/green deployments can be configured so that only 10% of your servers are updated every few minutes, for example, rolling the update out gradually over an hour.
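The staged rollout with automatic rollback can be sketched as a batching loop. Everything here is a simplified stand-in: `health_check` represents your monitoring, and a real deployment tool would keep the blue servers running until their green replacements pass:

```python
# A sketch of a staged blue/green rollout: update servers in small batches,
# checking health after each batch and aborting (rolling back) on failure.
def rolling_deploy(servers, batch_percent=10, health_check=lambda s: True):
    """Return (servers updated, final status) for a batched rollout."""
    batch_size = max(1, len(servers) * batch_percent // 100)
    updated = []
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        if not all(health_check(s) for s in batch):
            return updated, "rolled back"  # stop here; blue fleet keeps serving
        updated.extend(batch)
    return updated, "complete"

servers = [f"srv-{n}" for n in range(10)]
done, status = rolling_deploy(servers)
print(len(done), status)  # all 10 updated, one batch at a time
```

Because only one batch is in flight at a time, a bad release is caught after touching a fraction of the fleet, and the untouched blue servers absorb the traffic.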