Amazon Web Services (AWS) announced global availability of Elastic Block Storage (EBS) optimized support for four extra Elastic Cloud Computing (EC2) instance types. The support enables optimized performance between standard and provisioned IOP EBS volumes and EC2 instances to meet different bandwidth or throughput needs (learn more about AWS EBS, EC2, S3 and Glacier here).
The four EBS optimized instance types are m3.xlarge, m3.2xlarge, m2.2xlarge and c1.xlarge for dedicated bandwidth or throughput between the EC2 instances and EBS volumes. The performance or bandwidth ranges from 500 Mbits (500 / 8 = 62.5 MBytes) per second, to 1,000 Mbits (1,000 / 8 = 125MBytes) per second depending on the type of instance.
As a refresher, EC2 instances (why by time you read this could change) vary in size and functionality with different amounts of EC2 Unit of Compute (ECU), number of virtual cores, amount of storage space included, 32 or 64 bit, storage and networking IO performance, and EBS Optimized or not. In addition to instances, different operating system images can be installed using those licensed from AWS such as various Windows and Unix or supply your own.
There are also different generations of instances such as M1 (first generation where one ECU = 1.0 to 1.2 Ghz of a 2007 era Opteron or Xeon processor), M3 (second generation with faster processors) along with Micro low-cost options. There are also other optimized instances including high or large amounts of memory, high CPU or compute processing, clustered compute, high memory clustered, clustered GPU (e.g. using Nivida Tesla GPUs), high IO and high storage space capacity needs
Here is the announcement from AWS:
Dear Amazon Web Services Customer,
We are delighted to announce the global availability of EBS-optimized support for four additional instance types: m3.xlarge, m3.2xlarge, m2.2xlarge, and c1.xlarge. EBS-optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS, with options between 500 Megabits per second and 1,000 Megabits per second depending on the instance type used. The dedicated throughput minimizes contention between EBS I/O and other traffic from your Amazon EC2 instance, providing the best performance for your EBS volumes.
EBS-optimized instances are designed for use with both Standard and Provisioned IOPS EBS volumes. Standard volumes deliver 100 IOPS on average with a best effort ability to burst to hundreds of IOPS, making them well-suited for workloads with moderate and bursty I/O needs. When attached to an EBS-optimized instance, Provisioned IOPS volumes are designed to consistently deliver up to 2000 IOPS from a single volume, making them ideal for I/O intensive workloads such as databases. You can attach multiple Amazon EBS volumes to a single instance and stripe your data across them for increased I/O and throughput performance.
Amazon EBS-optimized support is now available for m3.xlarge, m3.2xlarge, m2.2xlarge, m2.4xlarge, m1.large, m1.xlarge, and c1.xlarge instance types, and is currently supported in the US-East (N. Virginia), US-West (N. California), US-West (Oregon), EU-West (Ireland), Asia Pacific (Singapore), Asia Pacific (Japan), Asia Pacific (Sydney), and South America (São Paulo) Regions.
You can learn more by visiting the Amazon EC2 detail page.
The Amazon EC2 Team
What this means is that AWS is enabling customers to size their compute instances and storage volumes with more flexibility to meet different needs. For example, EC2 instances with various compute processing capabilities, amount of memory, network and storage I/O performance to volumes. In addition, storage volumes based on different space capacity size, standard or provisioned IOP’s, bandwidth or throughput performance between the instance and volume, along with data protection such as snapshots.
This means that the cost per space capacity of an EBS volume varies based on which AWS availability zone it is in, standard (lower IOP performance) or provisioned IOP’s (faster), along with instance type. In other words, cloud storage is not just about the cost per GByte, it’s also about the cost for IOPS, bandwidth to use it, where it is located (e.g. with AWS which Availability Zone), type of service, level of availability and durability among other attributes.
For those not familiar, Simple Storage Services (S3), Glacier and Elastic Block Storage (EBS) are part of the AWS cloud storage portfolio of services. There are several other storage and data related service for little data database (SQL and NoSql based) other offerings include compute, data management, application and networking for different needs shown in the following image.
AWS Services Console via www.amazon.com
Simple Storage Service (S3) is commonly used in the context of cloud storage and object storage accessed via its S3 API. S3 can be used externally from outside AWS as well as within or via other AWS services. There are various S3 modes including standard, Reduced Redundancy (RR) and Infrequent Access (IA). For example with Elastic Cloud Compute (EC2) including via the Amazon Storage Gateway (read more here and about EC2 here). Glacier is the AWS cold or deep storage service for inactive data and is a companion to S3 that you can read more about here.
S3 is well suited for both big and little data repositories of objects ranging from backup to archive to active video images and much more. In fact if you are using some of the different AaaS or SaaS services including backup or file and video sharing, those may be using S3 as its back-end storage repository. For example NetFlix leverages various AWS capabilities as part of its data and applications infrastructure (read more here).
AWS consists of multiple regions that contain multiple availability zones where data and applications are supported from.
Note that objects stored in a region never leave that region, such as data stored in the EU west never leave Ireland, or data in the US East never leaves Virginia.
AWS does support the ability for user controlled movement of data between regions for business continuance (BC), high availability (HA) and disaster recovery (DR). Read more here at the AWS Security and Compliance site and in this AWS white paper.
What about EBS'
That brings us to Elastic Block Storage (EBS) that is used by EC2 (read more about EC2 and instances here) as storage for cloud and virtual machines or compute instances. In addition to using S3 as a persistent backing store or target for holding snapshots EBS can be thought of as primary storage. You can provision and allocate EBS volumes in the different data centers of the various AWS availability zones. As part of allocating your EBS volume you indicate the type (standard) or provisioned IOP’s or the new EBS Optimized volumes. EBS Optimized volumes enables instances that support the feature to have better IO performance to storage.
The following image shows an EC2 instance with EBS volumes (standard and provisioned IOPS’s) along with S3 volumes and snapshots. In the following example the instance and volumes are being served via the AWS US East region (Northern Virginia) using availability zone US East 1a. In addition, EBS optimized volumes are shown being used in the example to increase bandwidth or throughput performance between storage and the compute instance.
Using the above as a basis, you can build on that to leverage multiple availability zones or regions for HA, BC and DR combined with application, network load balancing and other capabilities. Note that EBS volumes are protected for durability by being spread across different servers and storage in an availability zone. Additional protection is provided by using snapshots combined with S3. Additional BC and DR or HA protection can be accomplished by replicating data across availability zones.
The above is an example of tying various components and services together. For example using different AWS availability zones, instances, EBS, S3 and other tools including those from third parties. Here is a link to a free chapter download from Cloud and Virtual Data Storage Networking (CRC Press) pertaining to data protection, BC and DR (available at Amazon here and Kindle here). In addition here is an AWS white paper on using their services for BC, HA and DR.
EBS volumes are created ranging in size from 1GByte to 1Tbyte in space capacity with multiple volumes being mapped or attached to an EC2 instances. EBS volumes appear as a virtual disk drive for block storage. From the EC2 instance and guest operating system you can mount, format and use the EBS volumes as any other block disk drive with your favorite tools and file systems. In addition to space capacity, EBS volumes are also provisioned with standard IO (e.g. disk based) performance or high performance Provisioned IOPS (e.g. SSD) for thousands of IOPS per instance. AWS states that a standard EBS volume should support about 100 IOP’s on average, with about 2,000 IOPS for a provisioned IOP volume. Need more than 2,000 IOPS, then the AWS recommendation is to use multiple IOP provisioned volumes with data spread across those. Following is an example of AWS EBS volumes seen via the EC2 management interface.
AWS EC2 and EBS configuration status
Note that there is a 10 to 1 ratio of space capacity to IOP’s being provisioned. If you try to play a game of 1,000 IOPS provisioned on a 10GByte EBS volume to keep your costs down you are out of luck. Thus to get 1,000 IOPS’s you would need to allocate at least a 100GByte EBS volume of which you will be billed for the actual space used on a monthly pro-rated basis. The following is an example of provisioning an AWS EBS volume using provisioned IOPS in the US East region in the 1a availability zone.
Provisioning IOPS with EBS volume
Standard and Provisioned IOPS EBS volumes
Standard EBS volumes are good for boot images or other application usage that are not IO performance intensive. For database or other active applications where more performance is needed, then EBS Provisioned IOPS volumes are your option. Note that the provisioned IOP rate is persistent for the specific volume during its life. Thus if you set it and forget it including not using it without turning it off, you will be billed for provisioning it.
For those not familiar, Simple Storage Services (S3), Glacier and Elastic Block Storage (EBS) are part of the AWS cloud storage portfolio of services. With S3, you specify a region where a bucket is created that will contain objects that can be written, read, listed and deleted. You can create multiple buckets in a region with unlimited number of objects ranging from 1 byte to 5 Tbytes in size per bucket. Each object has a unique, user or developer assigned access key. In addition to indicating which AWS region, S3 buckets and objects are provisioned using different levels of availability, durability, SLA’s and costs (view S3 SLA’s here).
Cost will vary depending on the AWS region being used, along if Standard or Reduced Redundancy Storage (RSS) selected. Standard S3 storage is designed with 99.999999999% durability (how many copies exists) and 99.99% availability (how often can it be accessed) on an annual basis capable of two data centers becoming un-available.
As its name implies, for a lower fee and level of durability, S3 RRS has an annual durability of 99.999% and availability of 99.99% capable of a single data center loss. In the following figure durability is how many copies of data exist spread across different servers and storage systems in various data centers and availability zones.
What would you put in RRS vs. Standard S3 storage'
Items that need some level of persistence that can be refreshed, recreated or restored from some other place or pool of storage such as thumbnails or static content or read caches. Other items would be those that you could tolerant some downtime while waiting for data to be restored, recovered or rebuilt from elsewhere in exchange for a lower cost.
Different AWS regions can be chosen for regulatory compliance requirements, performance, SLA’s, cost and redundancy with authentication mechanisms including encryption (SSL and HTTPS) to make sure data is kept secure. Various rights and access can be assigned to objects including making them public or private. In addition to logical data protection (security, identity and access management (IAM), encryption, access control) policies also apply to determine level of durability and availability or accessibility of buckets and objects. Other attributes of buckets and objects include life-cycle management polices and logging of activity to the items. Also part of the objects are meta data containing information about the data being stored shown in a generic example below.
Access to objects is via standard REST and SOAP interfaces with an Application Programming Interface (API). For example default access is via HTTP along with a Bit Torrent interface with optional support via various gateways, appliances and software tools.
Example cloud and object storage access
The above figure via Cloud and Virtual Data Storage Networking (CRC Press) shows a generic example applicable to AWS services including S3 being accessed in different ways. For example I access my S3 buckets and objects via Jungle Disk (one of the tools I use for data protection) that can also access my Rackspace Cloudfiles data. In the following figure there are examples of some of my S3 buckets and objects used by different applications and tools that I have in various AWS regions.
AWS S3 buckets and objects in different regions
Note that I sometimes use other AWS regions outside the US for testing purposes, for compliance purpose my production, business or personal data is only in the US regions.
The following figure is a generic example of how cloud and object storage are accessed using different tools, hardware, software and API’s along with gateways. AWS is an example of what is shown in the following figure as a Cloud Service and S3, EBS or Glacier as cloud storage. Common example API commands are also shown which will vary by different vendors, products or solution definitions or implementations. While Amazon S3 API which is REST HTTP based has become an industry de facto standard, there are other API’s including CDMI (Cloud Data Management Interface) developed by SNIA which has gained ISO accreditation.
Cloud and object storage access example via Cloud and Virtual Data Storage Networking
In addition to using Jungle Disk which manages my AWS keys and objects that it creates, I can also access my S3 objects via the AWS management console and web tools, also via third-party tools including Cyberduck.
Cloud and object storage access example via Cloud and Virtual Data Storage Networking
AWS cloud storage gateway
In 2012 AWS released their Storage Gateway that you can use and try for free here using either an EC2 Amazon Machine Instance (AMI), or deployed locally on a hypervisor such as VMware vSphere/ESXi. About a year ago I did a storage gateway post (First, second and third impressions) when it was first released. I will do a new post soon following up with my later impressions and experiences of having used it recently. For now, my quick (fourth impressions can be found here in this AWS Marketplace review). In general, the gateway is an AWS alternative to using third product gateway, appliances of software tools for accessing AWS storage.
Image courtesy of www.amazon.com
When deployed locally on a VM, the storage gateway communicates using the AWS API’s back to the S3 and EBS (depending on how configured) storage services. Locally, the storage gateway presents an iSCSI block access method for Windows or other servers to use.
There are two modes with one being Gateway-Stored and the other Gateway-Cached. Gateway-Stored uses your primary storage mapped to the storage gateway as primary storage and asynchronous (time delayed) snapshots (user defined) to S3 via EBS volumes. This is a handy way to have local storage for low latency access, yet use AWS for HA, BC and DR, along with a means for doing migration into or out of AWS. Gateway-cache mode places primary storage in AWS S3 with a local cached copy to reduce network overhead.
When I tried the gateway a month or so ago, using both modes, I was not able to view any of my data using standard S3 tools. For example if I looked in my S3 buckets the objects do not appear, something that AWS said had to do with where and how those buckets and objects are managed. Otoh, I was able to see EBS snapshots for the gateway-stored mode including using that as a means of moving data between local and AWS EC2 instances. Note that regardless of the AWS storage gateway mode, some local cache storage is needed, and likewise some EBS volumes will be needed depending on what mode is used.
When I used the gateway, a Windows Server mounted the iSCSI volume presented by the storage gateway and in turn served that to other systems as a shared folder. Thus while having block such as iSCSI is nice, a NAS (NFS or CIFS) presentation and access mode would also be useful. However more on the storage gateway in a future post. Also note that beyond the free trial period (you may have to pay for storage being used) for using the gateway, there are also fees for S3 and EBS storage volumes use.
What about Glacier'
Shortly after its release last year, I did this piece about Glacier and have since been doing some testing proof of concepts with it.
I like Glacier and its prospects for doing some various things, particular for inactive data including deep archives that will seldom if every be accessed, yet need to be retained. The business value proposition of Glacier is that it has a very high durability and low-cost assuming that you do not need to frequently access your data, and when you do, that you can wait 3 to 5 hours before retrieving it from your S3 buckets.
Access to Glacier is via API or AWS console so getting things into and out of it can be a challenge. For example I wanted to see if I could use AWS storage gateway to more easily bulk move things into Glacier via S3, however no luck, or at least today. Speaking of S3, by setting your policies you determine when objects get moved into Glacier as well as how long they will stay there, you can read more about Glacier here and via AWS here.
How much do these AWS services cost'
Fees vary depending on which region is selected, amount of space capacity, level or durability and availability, performance along with type of service. S3 pricing can be found here including a free trial tier along with optional fees. Other AWS fees for EC2 can be found here, EBS pricing here, Glacier here, and storage gateway costs are located here.
Note that there is a myth that cloud vendors have hidden fees which may be the case for some, however so far I have not seen that to be the case with AWS. However, as a consumer, designer or architect, doing your homework and looking at the above links among others you can be ready and understand the various fees and options. Hence like procuring traditional hardware, software or services, do your due diligence and be an informed shopper.
Some more service cost notes include:
Note that with S3 Standard and RRS objects there is not a charge for deletion of objects, however there is a pro-rated charge per GByte of Glacier objects removed prior to 90 days. Glacier also allows up to 5% of your average monthly storage usage (pro-rated daily) to be restored with no charge, other fees apply for restoring larger amounts in a given period. Thus if you are planning on accessing and using data, analyze what your activity and usage will be as part of calculating your costs with Glacier. Read more about Glacier here.
Standard EBS volumes are changed by the amount of storage space capacity you provision in GB until released. For EBS snapshot copies there are fees for transferring data across regions, once moved, the rates of the new region apply for the snapshot.
As with Standard volumes, volume storage for Provisioned IOPS volumes is charged by the amount you provision in GB per month. With Provisioned IOPS volumes, you are also charged by the amount you provision in IOPS pro-rated as a percentage of days you have it in use for the month.
Thus important for cloud storage planning to know not only your space requirements, also IOP’s, bandwidth, and level of availability as well as durability. so for Standard volumes, you will likely see a lower number of I/O requests on your bill than is seen by your application unless you sync all of your I/Os to disk. Thus pay attention to what your needs are in terms of availability (accessibility), durability (resiliency or survivability), space capacity, and performance.
Leverage AWS CloudWatch tools and API’s to monitoring that matter for timely insight and situational awareness into how EBS, EC2, S3, Glacier, Storage Gateway and other services are being used (or costing you). Also visit the AWS service health status dashboard to gain insight into how things are running to help gain confidence with cloud services and solutions.
When it comes to Cloud, Virtualization, Data and Storage Networking along with AWS among other services, tools and technologies including object storage, we are just scratching the surface here.
Hopefully this helps to fill in some gaps giving more information addressing questions, along with generating new ones to prepare for your journey with clouds. After all, don’t be scared of clouds. Be prepared, do your homework, identify your concerns and then address those to gain cloud confidence.