Microservice reliability
A microservice can rarely function independently of the rest of the system. Most systems are built so that each microservice plays a small part in the whole. The services are interconnected and depend on one another. To do its job reliably, every part of the system must be able to trust its collaborators.
This is very hard to achieve in a microservice architecture, because a system can fail for many reasons. Let's explore what can go wrong:
- hardware failures (individual hosts, host configuration, data centers, physical network, OS)
- communication breakage (network connectivity failure, inadequate firewall configuration, DNS errors, messaging failure, inadequate health checks)
- dependency failures (timeouts, backward-incompatible changes, internal component failures, external dependencies)
- service practices (poorly designed system, immature deployment strategy, incomplete testing)
Because every microservice contributes to the system as a whole, one part that does not function properly can cause a cascading failure. The result is inadequate responses, unreliable behavior, and many other problems. To ensure that a service keeps functioning properly even when a collaborator service fails, we can use some of the following approaches:
- retries
- caching
- graceful degradation
- functional redundancy
- stubbed data
- timeouts
- deadlines
- circuit breakers
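To make a few of these approaches concrete, here is a minimal Python sketch that combines retries with exponential backoff, an overall deadline, a fallback for graceful degradation (for example cached or stubbed data), and a simple circuit breaker. All names here (call_with_retries, CircuitBreaker, the thresholds and delays) are illustrative assumptions, not part of any particular library, and a production service would likely catch only transient errors rather than every exception.

```python
import time


def call_with_retries(operation, fallback, max_attempts=3, base_delay=0.1,
                      deadline_seconds=2.0, sleep=time.sleep,
                      clock=time.monotonic):
    """Call operation(); retry with exponential backoff until it succeeds,
    the attempts run out, or the next wait would pass the deadline.
    Return fallback() if no attempt succeeded."""
    deadline = clock() + deadline_seconds
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # assumption: every error is treated as retryable
            if attempt == max_attempts or clock() + delay > deadline:
                break
            sleep(delay)
            delay *= 2  # back off: 0.1s, 0.2s, 0.4s, ...
    return fallback()  # degrade gracefully, e.g. cached or stubbed data


class CircuitBreaker:
    """Stop calling a failing collaborator for a while after repeated errors."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without calling
            # reset_timeout elapsed: let one trial call through (half-open)
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A caller might wrap a hypothetical remote call as breaker.call(lambda: call_with_retries(fetch_profile, cached_profile), cached_profile). The retries absorb short glitches, while the breaker keeps a collaborator that is really down from being hammered with requests.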
Collaborators are not the only ones that can fail: the service itself can fail too. Several techniques help maximize its availability and fault tolerance:
- load balancing and health checks
- rate limits
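As an illustration of rate limiting, the sketch below uses a token bucket, which is one common way to implement it and not the only one. The class name TokenBucket and its parameters are assumptions made for this example. Each request spends one token, tokens refill at a fixed rate, and requests that find the bucket empty are rejected so the service is not overwhelmed.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity` requests and a sustained
    `rate` of requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request handler could call allow() first and return an error such as HTTP 429 when it yields False. The capacity controls how large a burst is tolerated, and the rate controls the long-run throughput.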
A system is only as strong as its weakest link. Keep this in mind while designing it!