For those not familiar, Simple Storage Services (S3), Glacier and Elastic Block Storage (EBS) are part of the AWS cloud storage portfolio of services. There are several other storage and data related service for little data database (SQL and NoSql based) other offerings include compute, data management, application and networking for different needs shown in the following image.
AWS%202.jpg" width="448" height="252" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Simple Storage Service (S3) is commonly used in the context of cloud storage and object storage accessed via its S3 API. S3 can be used externally from outside AWS as well as within or via other AWS services. There are various S3 modes including standard, Reduced Redundancy (RR) and Infrequent Access (IA). For example with Elastic Cloud Compute (EC2) including via the Amazon Storage Gateway. Glacier is the AWS cold or deep storage service for inactive data and is a companion to S3.
S3 is well suited for both big and little data repositories of objects ranging from backup to archive to active video images and much more. In fact if you are using some of the different AaaS or SaaS services including backup or file and video sharing, those may be using S3 as its back-end storage repository. For example NetFlix leverages various AWS capabilities as part of its data and applications infrastructure.
AWS consists of multiple regions that contain multiple availability zones where data and applications are supported from.
AWS%203.jpg" width="435" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Note that objects stored in a region never leave that region, such as data stored in the EU west never leave Ireland, or data in the US East never leaves Virginia.
AWS does support the ability for user controlled movement of data between regions for business continuance (BC), high availability (HA), and disaster recovery (DR). Read more here at the AWS Security and Compliance site.
For those not familiar, Simple Storage Services (S3), Glacier and Elastic Block Storage (EBS) are part of the AWS cloud storage portfolio of services. With S3, you specify a region where a bucket is created that will contain objects that can be written, read, listed and deleted. You can create multiple buckets in a region with unlimited number of objects ranging from 1 byte to 5 Tb in size per bucket. Each object has a unique, user or developer assigned access key. In addition to indicating which AWS region, S3 buckets and objects are provisioned using different levels of availability, durability, SLA’s and costs (view S3 SLA’s here).
AWS%208.jpg" width="465" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Cost will vary depending on the AWS region being used, along if Standard or Reduced Redundancy Storage (RSS) selected. Standard S3 storage is designed with 99.999999999% durability (how many copies exists) and 99.99% availability (how often can it be accessed) on an annual basis capable of two data centers becoming un-available.
As its name implies, for a lower fee and level of durability, S3 RRS has an annual durability of 99.999% and availability of 99.99% capable of a single data center loss. In the following figure durability is how many copies of data exist spread across different servers and storage systems in various data centers and availability zones.
AWS%209.jpg" width="435" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
What would you put in RRS vs. Standard S3 storage'
Items that need some level of persistence that can be refreshed, recreated or restored from some other place or pool of storage such as thumbnails or static content or read caches. Other items would be those that you could tolerant some downtime while waiting for data to be restored, recovered or rebuilt from elsewhere in exchange for a lower cost.
Different AWS regions can be chosen for regulatory compliance requirements, performance, SLA’s, cost and redundancy with authentication mechanisms including encryption (SSL and HTTPS) to make sure data is kept secure. Various rights and access can be assigned to objects including making them public or private. In addition to logical data protection (security, identity and access management (IAM), encryption, access control) policies also apply to determine level of durability and availability or accessibility of buckets and objects. Other attributes of buckets and objects include life-cycle management polices and logging of activity to the items. Also part of the objects are meta data containing information about the data being stored shown in a generic example below.
AWS%2010.jpg" width="435" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Access to objects is via standard REST and SOAP interfaces with an Application Programming Interface (API). For example default access is via HTTP along with a Bit Torrent interface with optional support via various gateways, appliances and software tools.
AWS%2011.jpg" width="435" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Example cloud and object storage access
The above figure via Cloud and Virtual Data Storage Networking (CRC Press) shows a generic example applicable to AWS services including S3 being accessed in different ways. For example I access my S3 buckets and objects via Jungle Disk (one of the tools I use for data protection) that can also access my Rackspace Cloudfiles data. In the following figure there are examples of some of my S3 buckets and objects used by different applications and tools that I have in various AWS regions.
AWS%2012.jpg" width="453" height="253" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
AWS S3 buckets and objects in different regions
Note that I sometimes use other AWS regions outside the US for testing purposes, for compliance purpose my production, business or personal data is only in the US regions.
The following figure is a generic example of how cloud and object storage are accessed using different tools, hardware, software and API’s along with gateways. AWS is an example of what is shown in the following figure as a Cloud Service and S3, EBS or Glacier as cloud storage. Common example API commands are also shown which will vary by different vendors, products or solution definitions or implementations. While Amazon S3 API which is REST HTTP based has become an industry de facto standard, there are other API’s including CDMI (Cloud Data Management Interface) developed by SNIA which has gained ISO accreditation.
AWS%2013.jpg" width="435" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Cloud and object storage access example via Cloud and Virtual Data Storage Networking
In addition to using Jungle Disk which manages my AWS keys and objects that it creates, I can also access my S3 objects via the AWS management console and web tools, also via third-party tools including Cyberduck.
AWS%2014.jpg" width="435" height="250" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Cloud and object storage access example via Cloud and Virtual Data Storage Networking
AWS cloud storage gateway
In 2012 AWS released their Storage Gateway that you can use and try for free here using either an EC2 Amazon Machine Instance (AMI), or deployed locally on a hypervisor such as VMware vSphere/ESXi. In general, the gateway is an AWS alternative to using third product gateway, appliances of software tools for accessing AWS storage.
AWS%2015.jpg" width="465" height="220" style="cursor: pointer; max-width: 100%; height: auto; vertical-align: middle;">
Image courtesy of www.amazon.com
When deployed locally on a VM, the storage gateway communicates using the AWS API’s back to the S3 and EBS (depending on how configured) storage services. Locally, the storage gateway presents an iSCSI block access method for Windows or other servers to use.
There are two modes with one being Gateway-Stored and the other Gateway-Cached. Gateway-Stored uses your primary storage mapped to the storage gateway as primary storage and asynchronous (time delayed) snapshots (user defined) to S3 via EBS volumes. This is a handy way to have local storage for low latency access, yet use AWS for HA, BC and DR, along with a means for doing migration into or out of AWS. Gateway-cache mode places primary storage in AWS S3 with a local cached copy to reduce network overhead.
When I tried the gateway a month or so ago, using both modes, I was not able to view any of my data using standard S3 tools. For example if I looked in my S3 buckets the objects do not appear, something that AWS said had to do with where and how those buckets and objects are managed. Otoh, I was able to see EBS snapshots for the gateway-stored mode including using that as a means of moving data between local and AWS EC2 instances. Note that regardless of the AWS storage gateway mode, some local cache storage is needed, and likewise some EBS volumes will be needed depending on what mode is used.
When I used the gateway, a Windows Server mounted the iSCSI volume presented by the storage gateway and in turn served that to other systems as a shared folder. Thus while having block such as iSCSI is nice, a NAS (NFS or CIFS) presentation and access mode would also be useful. However more on the storage gateway in a future post. Also note that beyond the free trial period (you may have to pay for storage being used) for using the gateway, there are also fees for S3 and EBS storage volumes use.
How much do these AWS services cost'
Fees vary depending on which region is selected, amount of space capacity, level or durability and availability, performance along with type of service. S3 pricing can be found here including a free trial tier along with optional fees.
Note that there is a myth that cloud vendors have hidden fees which may be the case for some, however so far I have not seen that to be the case with AWS. However, as a consumer, designer or architect, doing your homework and looking at the above links among others you can be ready and understand the various fees and options. Hence like procuring traditional hardware, software or services, do your due diligence and be an informed shopper.
Some more service cost notes include:
Note that with S3 Standard and RRS objects, there is not a charge for deletion of objects.
As with Standard volumes, volume storage for Provisioned IOPS volumes is charged by the amount you provision in GB per month. With Provisioned IOPS volumes, you are also charged by the amount you provision in IOPS pro-rated as a percentage of days you have it in use for the month.
Thus important for cloud storage planning to know not only your space requirements, also IOP’s, bandwidth, and level of availability as well as durability. so for Standard volumes, you will likely see a lower number of I/O requests on your bill than is seen by your application unless you sync all of your I/Os to disk. Thus pay attention to what your needs are in terms of availability (accessibility), durability (resiliency or survivability), space capacity, and performance.
Leverage AWS CloudWatch tools and API’s to monitoring that matter for timely insight and situational awareness into how EBS, EC2, S3, Glacier, Storage Gateway and other services are being used (or costing you). Also visit the AWS service health status dashboard to gain insight into how things are running to help gain confidence with cloud services and solutions.
When it comes to Cloud, Virtualization, Data and Storage Networking along with AWS among other services, tools and technologies including object storage, we are just scratching the surface here.
Hopefully this helps to fill in some gaps giving more information addressing questions, along with generating new ones to prepare for your journey with clouds. After all, don’t be scared of clouds. Be prepared, do your homework, identify your concerns and then address those to gain cloud confidence.