You've got the nod to host your first production application in someone else's cloud. You've got hundreds of the things running on your own hardware right now, so how hard can it be? The cloud's easy, right? Point and click. Now, where do I find IIS in the Azure portal...

Getting Started

The Azure Portal is a very simple and, in my opinion, well-thought-out system, though my perspective may be tainted by having had to deal with the old portal and the transition from it to the new one.

When you log in, you'll see the Azure dashboard, which looks something like the below:

[caption id="attachment_109" align="alignright" width="457"] AzurePortal The Azure Portal - Default Dashboard Homepage[/caption]

And from there you can click around to your heart's content, creating anything from a new VM to a complete highly available database.

The UI is very good, and I find it one of the best ways to explore what Azure has to offer. It can be quite hard to work out exactly what one of their products does from its documentation alone, and I've often found that quickly creating a throwaway resource through the portal helps me understand the boundaries of what is on offer. Of course, the reverse can also be true: if you just click around in the UI, you may never know that some products even exist, or that hosting your service outside of the recommended region pairings could result in extended downtime.

Availability

When you host anything in a third party's cloud offering you become subject to their way of doing things. For Azure and availability, one of the key things is their paired regions system.

[caption id="attachment_111" align="alignnone" width="779"] GeoRegionDataCenter Above image from https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions[/caption]

Every single Azure region is part of a pair. The UK pair comprises the UK South and UK West regions. When you create a service in Azure, it is recommended that you host that service across a paired region so that if a single region suffers a failure your service can continue, with little or no downtime, in the second region of the pair. This becomes very important if there is a prolonged and widespread outage that affects multiple regions; in this scenario, Microsoft will prioritise bringing one region in each pair back online first. If you didn't know this you may have used two regions from different pairs, UK South and West Europe, believing that you'd be protected because they're in different Azure geographies. Unfortunately, this isn't the case, and you may well suffer a longer than necessary outage if both turn out to be the second region in their pair to be brought back online. Microsoft's documentation in this area is well worth a read if you're considering hosting a service in Azure that you or your customers have any kind of dependence on. It can be found at the link below:

https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions
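A quick way to avoid the mistake described above is to check your chosen regions against the pairing list before you deploy. The snippet below is a minimal sketch of that check: the `REGION_PAIRS` table and the `is_paired` helper are my own illustrative names, and the table holds only a small subset of pairs, so treat Microsoft's documentation as the source of truth.

```python
# Illustrative subset of Azure region pairs (an assumption for this sketch).
# Check Microsoft's paired regions documentation for the current, full list.
REGION_PAIRS = {
    "uksouth": "ukwest",
    "northeurope": "westeurope",
    "eastus": "westus",
}


def is_paired(region_a: str, region_b: str) -> bool:
    """Return True if the two regions form a pair in REGION_PAIRS."""
    a, b = region_a.lower(), region_b.lower()
    # The table stores each pair once, so check both directions.
    return REGION_PAIRS.get(a) == b or REGION_PAIRS.get(b) == a


if __name__ == "__main__":
    print(is_paired("uksouth", "ukwest"))      # both halves of the UK pair
    print(is_paired("uksouth", "westeurope"))  # regions from two different pairs
```

Running a check like this as part of a deployment pipeline is one way to catch an accidental cross-pair deployment before it reaches production.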

If you've got the application in two regions then you'll need some way of directing traffic to them, and of course Azure has an answer for this: Azure Traffic Manager. Traffic Manager is a DNS-based service that monitors the health of a set of endpoints and, when queried, returns the records that correspond to healthy ones. For an Azure web app, this allows you to host in multiple regions in either an Active/Active or an Active/Passive configuration. It can also route users to the lowest-latency endpoint or route by geography; it is the way to achieve multi-region availability with Azure hosted web apps.
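To make the Active/Passive and Active/Active distinction concrete, here is a small simulation of the selection logic. It is only a sketch of the idea, not Traffic Manager's actual implementation or API: the `Endpoint` class and both functions are hypothetical, the hostnames are examples, and returning nothing when every endpoint is unhealthy is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    target: str      # hostname handed back to the client
    priority: int    # lower number = preferred (used for Active/Passive)
    healthy: bool    # result of the most recent health probe


def resolve_active_passive(endpoints: List[Endpoint]) -> Optional[str]:
    """Return the healthy endpoint with the lowest priority number."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # simplification: nothing healthy to hand back
    return min(healthy, key=lambda e: e.priority).target


def resolve_active_active(endpoints: List[Endpoint]) -> List[str]:
    """Return every healthy endpoint; traffic is shared between them."""
    return [e.target for e in endpoints if e.healthy]


if __name__ == "__main__":
    endpoints = [
        Endpoint("uksouth-webapp.azurewebsites.net", priority=1, healthy=True),
        Endpoint("ukwest-webapp.azurewebsites.net", priority=2, healthy=True),
    ]
    print(resolve_active_passive(endpoints))
    endpoints[0].healthy = False  # simulate UK South failing its probe
    print(resolve_active_passive(endpoints))
```

Because the failover happens at the DNS layer, clients only pick up the change once their cached record expires, so the record's TTL is part of your effective failover time.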

Disaster Recovery

Disaster recovery is an important consideration when utilising the cloud. At what point do you declare your cloud provider a total loss and look to host elsewhere? Do you have that decision point and the alternatives documented as part of your DR plan? Whilst it is very unlikely that a company of Microsoft's scale would have long-term, broad outages, that may not be the thing that kills your cloud application stone dead. A local ISP may be unable to route traffic to Azure for most of your customers, or for one of your biggest. What happens if your account runs out of credit? How do you recover from an admin deleting the wrong resource group?

In all of these cases the answer may simply be to execute your plan to manually re-host the application elsewhere; be that in your own environment or temporarily with another provider whilst you sort the mess out. With web apps, this is normally fairly simple stuff involving repointing or moving DNS records and redeploying your web application elsewhere with some minor configuration changes. As ever though, the devil is in the detail and if you haven't tested your plan out then you cannot be sure if it'll work or how long it'll take. Disaster Recovery drills are key here.

If your web app relies on some kind of database backend, things start to get a little more complicated. If you're using the Azure SQL database offering, then you can easily set up things like Geo-Replication to ensure your database is replicated to another region, but this isn't a DR solution. You can back up your database and copy it somewhere, but you won't be able to restore that backup to anything other than Azure, limiting its effectiveness as a DR solution. This is something that isn't immediately obvious, and it's very easy to make an assumption about how it should work in theory as opposed to how it does in practice. Again, testing is key here and your DR drills should help to flush out these kinds of issues.

Putting It All Together

For an example web application that uses an Azure SQL database backend, your solution may contain the following elements:

- Web App hosted in both regions of a pair, UK South and UK West
- SQL database configured with geo-replication between the two regions, with automatic failover
- Azure Traffic Manager with each region as a single endpoint returning region-based CNAME records:
  - uksouth-webapp.azurewebsites.net
  - ukwest-webapp.azurewebsites.net
- DNS tying it all together:
  - mywebapp.com
    - mywebapp.trafficmanager.net
      - uksouth-webapp.azurewebsites.net
      - ukwest-webapp.azurewebsites.net
- Regular BACPAC export of the data from your Azure SQL database for DR purposes

With all of the above, you have a good foundation for a higher availability web application whilst maintaining the ability to move elsewhere if required for disaster recovery purposes.
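The DNS chain in the list above can be walked through with a few lines of Python. This is purely an illustration of how a lookup for mywebapp.com ends up at one of the regional web apps; the `resolve_chain` function and its dictionaries are hypothetical stand-ins for real DNS and Traffic Manager behaviour, and picking the first healthy target is an assumption standing in for whichever routing method you configure.

```python
from typing import Dict, List

# Static CNAME record you control at your DNS provider.
CNAMES: Dict[str, str] = {
    "mywebapp.com": "mywebapp.trafficmanager.net",
}

# Traffic Manager profile: candidate targets in order of preference.
PROFILES: Dict[str, List[str]] = {
    "mywebapp.trafficmanager.net": [
        "uksouth-webapp.azurewebsites.net",
        "ukwest-webapp.azurewebsites.net",
    ],
}


def resolve_chain(name: str, health: Dict[str, bool], max_hops: int = 10) -> List[str]:
    """Follow the chain from `name`, returning every hostname visited."""
    chain = [name]
    for _ in range(max_hops):  # hop limit guards against record loops
        if name in CNAMES:
            name = CNAMES[name]
        elif name in PROFILES:
            healthy = [t for t in PROFILES[name] if health.get(t, False)]
            if not healthy:
                break  # simplification: stop if no target is healthy
            name = healthy[0]
        else:
            break  # reached a name with no further indirection
        chain.append(name)
    return chain


if __name__ == "__main__":
    health = {
        "uksouth-webapp.azurewebsites.net": False,  # simulate a regional outage
        "ukwest-webapp.azurewebsites.net": True,
    }
    print(" -> ".join(resolve_chain("mywebapp.com", health)))
```

The same chain is also what you would edit during a DR event: repointing the first record away from Traffic Manager is how traffic gets moved to a copy of the application hosted elsewhere.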
