In this blog post, I will talk about some of the best practices for building cloud applications. I originally prepared this material as a presentation for a conference; that didn’t work out, hence this blog post. Please note that these are only some of the best practices I think one can follow while building cloud applications running in Windows Azure. There are many more out there. This blog post focuses on building Stateless PaaS Cloud Services (you know, that Web/Worker role thingie :)) utilizing Windows Azure Storage (Blobs/Queues/Tables) and Windows Azure SQL Databases (SQL Azure).
So let’s start!
Things To Consider
Before jumping into building cloud applications, there’re certain things one must take into consideration:
- Cloud infrastructure is shared.
- Cloud infrastructure is built on commodity hardware to achieve best bang-for-buck and it is generally assumed that eventually it will fail.
- A typical cloud application consists of many sub-systems where:
  - Each sub-system is a shared system on its own e.g. Windows Azure Storage.
  - Each sub-system has its limits and thresholds.
- Individual nodes in a datacenter sometimes fail and, though very rarely, an entire datacenter can fail too.
- You don’t get physical access to the datacenter.
- Understanding latency is very important.
With these things in mind, let’s talk about some of the best practices.
Best Practices – Protection Against Hardware Issues
These are some of the best practices to protect your application against hardware issues:
- Deploy multiple instances of your application.
- Scale out instead of scaling up; in other words, favor horizontal scaling over vertical scaling. It is generally recommended that you go with a larger number of smaller Virtual Machines (VMs) instead of a few larger VMs, unless you have a specific need for the larger size.
- Don’t rely on VM’s local storage as it is transient and not fail-safe. Use persistent storage like Windows Azure Blob Storage instead.
- Build decoupled applications to safeguard your application against hardware failures.
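To make the decoupling point a bit more concrete, here is a minimal sketch of why putting a queue between two parts of an application protects you when an instance dies mid-work. It is written in plain Python with an in-memory stand-in for a cloud queue; `InMemoryQueue`, `process_next` and the timeout values are hypothetical names and numbers I made up for illustration, not the Windows Azure Queue API. The idea it models is that a message taken from the queue is only hidden for a while, and is deleted only after it has been processed successfully.

```python
import time
import uuid


class InMemoryQueue:
    """Toy stand-in for a cloud queue with a visibility timeout (hypothetical)."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self._visibility_timeout = visibility_timeout
        self._clock = clock
        self._messages = {}  # message id -> [body, time at which it is visible]

    def put(self, body):
        message_id = str(uuid.uuid4())
        self._messages[message_id] = [body, self._clock()]
        return message_id

    def get(self):
        """Return (id, body) of a visible message and hide it, or None."""
        now = self._clock()
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                # Hide the message instead of removing it.
                entry[1] = now + self._visibility_timeout
                return message_id, entry[0]
        return None

    def delete(self, message_id):
        """Remove a message once it has been processed successfully."""
        self._messages.pop(message_id, None)


def process_next(queue, handler):
    """One worker step: take a message, delete it only if handler succeeds."""
    message = queue.get()
    if message is None:
        return False
    message_id, body = message
    handler(body)  # if this raises, the message is never deleted
    queue.delete(message_id)
    return True
```

If the handler fails (or the instance running it disappears), the message simply becomes visible again after the timeout and another instance can pick it up. The flip side is that a message may be processed more than once, so the handler should be safe to run repeatedly.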
Best Practices – Cloud Services Development
Now let’s talk about some of the best practices for building cloud services:
- It is important to understand what web roles and worker roles are and what benefits they offer. Choose wisely how you distribute functionality between a web role and a worker role.
- Decouple your application logic between web role and worker role.
- Build stateless applications. For state management, it is recommended that you make use of a distributed cache.
- Make proper use of service configuration / app.config / web.config files. While you can dynamically change the values in a service configuration file without redeploying, the same is not true of an app.config or web.config file.
- To achieve best value for money, ensure that your application is making proper use of all VM instances in which it is deployed.
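Here is a small sketch of what "stateless" means in practice, assuming a shared cache that every instance can reach. `SharedCache` and `RoleInstance` are hypothetical names and the cache is just a Python dictionary, so treat this as an illustration of the idea rather than a real distributed cache client.

```python
class SharedCache:
    """Stand-in for a distributed cache reachable from every instance."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class RoleInstance:
    """One instance of a role. It keeps no per-user state of its own."""

    def __init__(self, name, cache):
        self.name = name
        self._cache = cache

    def add_to_cart(self, session_id, item):
        # Read the state from the shared cache, change it, write it back.
        cart = list(self._cache.get(session_id, []))
        cart.append(item)
        self._cache.set(session_id, cart)
        return cart


cache = SharedCache()
web_0 = RoleInstance("web_0", cache)
web_1 = RoleInstance("web_1", cache)

web_0.add_to_cart("session-42", "book")
# The next request lands on a different instance and still sees the cart.
cart = web_1.add_to_cart("session-42", "pen")
```

Because neither instance holds the cart in its own memory, the load balancer is free to send each request to any instance, and losing an instance loses no user state. Note that this sketch ignores concurrent updates to the same key, which a real application would have to deal with.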
Best Practices – Windows Azure Storage/SQL Database
Now let’s talk about some of the best practices for using Windows Azure Storage (Blobs, Tables and Queues) and SQL Database.
Some General Recommendations
Here’re some recommendations I could think of:
- Blob/Table/SQL Database – Understand what each can do for you. For example, one might be tempted to save images in a SQL database, whereas blob storage is a much better place for them. Likewise, one could consider Table storage over SQL database if transactional/relational features are not required.
- It is important to understand that these are shared resources with limits and thresholds which are not in your control i.e. you don’t get to set these limits and thresholds.
- It is important to understand the scalability targets of each storage component and design your application to stay within those scalability targets.
- Be prepared that you’ll encounter “transient errors” and have your application handle (and recover from) these transient errors.
- It is recommended that your application uses retry logic to recover from these transient errors.
- You can use TOPAZ or the Storage Client Library’s built-in retry mechanism to handle transient errors. In case you don’t know, TOPAZ is Microsoft’s Transient Fault Handling Application Block, which is part of Enterprise Library 5.0 for Windows Azure. You can read more about TOPAZ here: http://entlib.codeplex.com/wikipage?title=EntLib5Azure.
- For best performance, co-locate your application and storage. With storage accounts, the cloud service should be in the same affinity group; with WASD, the cloud service should be in the same datacenter.
- From a disaster recovery point of view, enable geo-replication on your storage accounts.
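To show what retry logic for transient errors boils down to, here is a minimal sketch of retrying with exponential backoff. TOPAZ and the Storage Client Library are .NET libraries and this is not their API; it is a plain Python illustration, and `TransientError`, `retry` and the default delay values are assumptions of mine.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def retry(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
          sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Double the wait after every failed attempt, up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many instances from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

A call would then look like `retry(lambda: save_order(order))`, where `save_order` is whatever storage or database call you want to protect and raises `TransientError` when it gets throttled or times out. Only retry errors that are actually transient; retrying something like a bad request just wastes time and adds load to a shared resource.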
Best Practices – Windows Azure SQL Database (WASD)
Here’re some recommendations I could think of as far as working with WASD:
- It is important to understand (it was mentioned above and will be mentioned many more times in this post :)) that it’s a shared resource. So expect your requests to get throttled or timed out.
- It is important to understand that WASD != On-Premises SQL Server. You may have to make some changes in your data access layer.
- It is important to understand that you don’t get access to data/log files. You will have to rely on alternate mechanisms like “Copy Database” or “BACPAC” functionality for backup purposes.
- Prepare your application to handle transient errors with WASD. Use TOPAZ for implementing retry logic in your application.
- Co-locate your application and SQL Database in the same data center for best performance.
Best Practices – Windows Azure Storage (Blobs, Tables & Queues)
Here’re some recommendations I could think of as far as working with Windows Azure Storage:
- (Again :)) It is important to understand that it’s a shared resource. So expect your requests to get throttled or timed out.
- Understand the scalability targets of Storage components and design your applications accordingly.
- Prepare your application to handle transient errors with Windows Azure Storage. Use TOPAZ or the Storage Client Library’s Retry Policies for implementing retry logic in your application.
- Co-locate your application and storage account in the same affinity group (best option) or the same data center (next best option) for best performance.
- Table Storage does not support relationships so you may need to de-normalize the data.
- Table Storage does not support secondary indexes, so pay special attention to how you query data as a query may result in a full table scan. Always ensure that you’re using PartitionKey or PartitionKey/RowKey in your query for best performance.
- Table Storage has limited transaction support. For full transaction support, consider using Windows Azure SQL Database.
- With Table Storage, pay very special attention to “PartitionKey” as this is how data in a table is organized and managed.
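The Table Storage points above are easier to see with a toy model. The sketch below is plain Python and not the Table Storage API; `ToyTable`, its methods and the sample entities are all hypothetical. It models a table as partitions of rows, shows de-normalized entities (the customer name is repeated on every order), and contrasts a query that uses the keys with one that doesn’t.

```python
class ToyTable:
    """Toy model of a table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._partitions = {}  # PartitionKey -> {RowKey -> entity}

    def insert(self, partition_key, row_key, **properties):
        entity = {"PartitionKey": partition_key, "RowKey": row_key}
        entity.update(properties)
        self._partitions.setdefault(partition_key, {})[row_key] = entity

    def point_query(self, partition_key, row_key):
        """Both keys known: a direct lookup of a single entity."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition_query(self, partition_key):
        """PartitionKey known: only that partition is read."""
        return list(self._partitions.get(partition_key, {}).values())

    def scan(self, predicate):
        """No key in the filter: every entity in every partition is examined."""
        examined = 0
        matches = []
        for partition in self._partitions.values():
            for entity in partition.values():
                examined += 1
                if predicate(entity):
                    matches.append(entity)
        return matches, examined


orders = ToyTable()
# De-normalized: CustomerName lives on each order, there is no join.
orders.insert("customer-1", "order-001", CustomerName="Ann", Total=20)
orders.insert("customer-1", "order-002", CustomerName="Ann", Total=35)
orders.insert("customer-2", "order-001", CustomerName="Bob", Total=35)

one_order = orders.point_query("customer-1", "order-002")
anns_orders = orders.partition_query("customer-1")
matches, examined = orders.scan(lambda e: e["Total"] == 35)
```

The first two queries touch only the data they need, while the filter on `Total` has to look at every entity in the table. That is why it pays to choose PartitionKey and RowKey around the queries your application runs most often.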
Best Practices – Managing Latency
Here’re some recommendations I could think of as far as managing latency is concerned:
- Co-locate your application and data stores. For best performance, co-locate your cloud services and storage accounts in the same affinity group and co-locate your cloud services and SQL database in the same data center.
- Make appropriate use of Windows Azure CDN.
- Load balance your application using Windows Azure Traffic Manager when deploying a single application in different data centers.
Some Recommended Reading
Though you’ll find a lot of material online, a few books/blogs/sites I can recommend are:
- Cloud Architecture Patterns – Bill Wilder: http://shop.oreilly.com/product/0636920023777.do
- CALM (Cloud ALM) – Simon Munro: https://github.com/projectcalm/Azure-EN
- Windows Azure Storage Team Blog: http://blogs.msdn.com/b/windowsazurestorage/
- Patterns & Practices Windows Azure Guidance: http://wag.codeplex.com/
What I presented above are only a few of the best practices one could follow while building cloud services. I kept this blog post rather short on purpose; in fact, one could write a blog post for each item. I hope you’ve found this information useful. I’m pretty sure that there are more best practices out there, so please do share them in the comments. If I have made any mistakes in this post, please let me know and I will fix them ASAP. If you have any questions, feel free to ask them in the comments as well.