Everything is getting smart. Not everyone, but everything.
My lamp, dishwasher, washing machine, car and everything in between now connect to the internet and send notifications about important events, from finishing their duties to running low on the consumables they need to function.
With all these devices connecting to a distributed and redundant network like the internet, which in turn hosts distributed systems that span heterogeneous hardware, multiple software stacks and geographically separated data centers, something, somewhere, is bound to fail. It may fail for only a fraction of a second, but it will fail. The device may lose its Wi-Fi signal momentarily, the router may hiccup during a power brownout, the fiber-optic cable may meet the wrath of an excavator bucket, or the server may die in the line of duty before the redundant one kicks in. All of these are real possibilities, and we experience them all the time.
Issues that are ephemeral in nature like these are known as transient errors in the programming domain. They are there one moment and gone the next, and because they do not persist, they are hard to debug: reproducing them may not always be feasible.
One of the proven ways to increase dependability is to increase the availability of the desired product or service. Web services and applications run on a 24-hour schedule on servers that continuously consume power. One way to increase their availability is to increase redundancy, i.e., to make the same service or application available on multiple servers. The servers may all sit in the same data center or be geographically spread out. Geographic spread guards against a situation where a single data center suffers a power outage, a network outage or a natural calamity and takes down the entire service or application.
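To put a rough number on what redundancy buys, here is a minimal back-of-the-envelope sketch. It assumes every server has the same availability and that servers fail independently of one another, which real deployments only approximate (a shared power feed or network link breaks that assumption). The function name `combined_availability` and the 99% figure are my own, chosen purely for illustration.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Chance that at least one of `replicas` servers is up.

    Assumes each server is up a fraction `single` of the time and that
    failures are independent (an idealization, not a guarantee).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at once.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} server(s): {combined_availability(0.99, n):.4%}")
```

Under these assumptions, a second 99% server lifts the combined figure to 99.99%, and each further replica adds less than the one before it while costing just as much to run.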
But increased availability always comes at a cost. Every server, virtual machine or container costs money to run, and the more of them there are, the more man-hours go into their upkeep and maintenance.
To reduce costs without adding more redundancy than is warranted, we build smarts into the software so that the application keeps performing the desired actions without throwing errors, creating an illusion of high availability. This kind of resilience can be added to an application with the help of programming frameworks. One such framework for .NET developers, and as far as I know the only one at the moment, is Polly.
With Polly, we can easily incorporate resilience patterns such as retry, circuit breaker, timeout and bulkhead isolation into our applications and services. Each pattern deserves a post of its own, and in the future I will share more information on the patterns along with code examples.
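To give a feel for the simplest of these patterns, here is a minimal, language-neutral sketch of retry with exponential backoff. It is written in Python and is not Polly's API; `TransientError`, `retry` and `make_flaky` are hypothetical names I made up for illustration, and the delays are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            # Double the wait after each failure and add a little jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky(failures):
    """Build a fake operation that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def call():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TransientError("temporarily unavailable")
        return "ok"

    return call


if __name__ == "__main__":
    # Fails twice, succeeds on the third attempt.
    print(retry(make_flaky(2), base_delay=0.01))
```

Only errors marked as transient are retried here; anything else propagates immediately, since retrying a genuine bug just delays the bad news. The number of attempts is capped for the same reason: an outage that outlasts the retries should surface to the caller rather than hang forever.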
Until then, please enjoy a video in which the late Mr. Scott Allen discusses building resilient applications in the cloud, and look for opportunities to apply a resilience framework like Polly.