Part II: How many IOPS can a HDD HHDD SSD do with VMware?

How many IOPS can a HDD HHDD SSD do with VMware?

server storage data infrastructure i/o iop hdd ssd trends

Updated 2/10/2018

This is the second post of a two-part series looking at storage performance, specifically in the context of drive or device (e.g. mediums) characteristics of How many IOPS can a HDD HHDD SSD do with VMware. In the first post the focus was around putting some context around drive or device performance with the second part looking at some workload characteristics (e.g. benchmarks).

A common question is how many IOPS (IO Operations Per Second) can a storage device or system do?

The answer is or should be it depends.

Here are some examples to give you some more insight.

For example, the following shows how IOPS vary by changing the percent of reads, writes, random and sequential for a 4K (4,096 bytes or 4 KBytes) IO size with each test step (4 minutes each).

IO Size for test
Workload Pattern of test
Avg. Resp (R+W) ms
Avg. IOP Sec (R+W)
Bandwidth KB Sec (R+W)
4KB
100% Seq 100% Read
0.0
29,736
118,944
4KB
60% Seq 100% Read
4.2
236
947
4KB
30% Seq 100% Read
7.1
140
563
4KB
0% Seq 100% Read
10.0
100
400
4KB
100% Seq 60% Read
3.4
293
1,174
4KB
60% Seq 60% Read
7.2
138
554
4KB
30% Seq 60% Read
9.1
109
439
4KB
0% Seq 60% Read
10.9
91
366
4KB
100% Seq 30% Read
5.9
168
675
4KB
60% Seq 30% Read
9.1
109
439
4KB
30% Seq 30% Read
10.7
93
373
4KB
0% Seq 30% Read
11.5
86
346
4KB
100% Seq 0% Read
8.4
118
474
4KB
60% Seq 0% Read
13.0
76
307
4KB
30% Seq 0% Read
11.6
86
344
4KB
0% Seq 0% Read
12.1
82
330

Dell/Western Digital (WD) 1TB 7200 RPM SATA HDD (Raw IO) thread count 1 4K IO size

In the above example the drive is a 1TB 7200 RPM 3.5 inch Dell (Western Digital) 3Gb SATA device doing raw (non file system) IO. Note the high IOP rate with 100 percent sequential reads and a small IO size which might be a result of locality of reference due to drive level cache or buffering.

Some drives have larger buffers than others from a couple to 16MB (or more) of DRAM that can be used for read ahead caching. Note that this level of cache is independent of a storage system, RAID adapter or controller or other forms and levels of buffering.

Does this mean you can expect or plan on getting those levels of performance?

I would not make that assumption, and thus this serves as an example of using metrics like these in the proper context.

Building off of the previous example, the following is using the same drive however with a 16K IO size.

IO Size for test
Workload Pattern of test
Avg. Resp (R+W) ms
Avg. IOP Sec (R+W)
Bandwidth KB Sec (R+W)
16KB
100% Seq 100% Read
0.1
7,658
122,537
16KB
60% Seq 100% Read
4.7
210
3,370
16KB
30% Seq 100% Read
7.7
130
2,080
16KB
0% Seq 100% Read
10.1
98
1,580
16KB
100% Seq 60% Read
3.5
282
4,522
16KB
60% Seq 60% Read
7.7
130
2,090
16KB
30% Seq 60% Read
9.3
107
1,715
16KB
0% Seq 60% Read
11.1
90
1,443
16KB
100% Seq 30% Read
6.0
165
2,644
16KB
60% Seq 30% Read
9.2
109
1,745
16KB
30% Seq 30% Read
11.0
90
1,450
16KB
0% Seq 30% Read
11.7
85
1,364
16KB
100% Seq 0% Read
8.5
117
1,874
16KB
60% Seq 0% Read
10.9
92
1,472
16KB
30% Seq 0% Read
11.8
84
1,353
16KB
0% Seq 0% Read
12.2
81
1,310

Dell/Western Digital (WD) 1TB 7200 RPM SATA HDD (Raw IO) thread count 1 16K IO size

The previous two examples are excerpts of a series of workload simulation tests (ok, you can call them benchmarks) that I have done to collect information, as well as try some different things out.

The following is an example of the summary for each test output that includes the IO size, workload pattern (reads, writes, random, sequential), duration for each workload step, totals for reads and writes, along with averages including IOP’s, bandwidth and latency or response time.

disk iops

Want to see more numbers, speeds and feeds, check out the following table which will be updated with extra results as they become available.

Device
Vendor
Make

Model

Form Factor
Capacity
Interface
RPM Speed
Raw
Test Result
HDD
HGST
Desktop
HK250-160
2.5
160GB
SATA
5.4K
HDD
Seagate
Mobile
ST2000LM003
2.5
2TB
SATA
5.4K
HDD
Fujitsu
Desktop
MHWZ160BH
2.5
160GB
SATA
7.2K
HDD
Seagate
Momentus
ST9160823AS
2.5
160GB
SATA
7.2K
HDD
Seagate
MomentusXT
ST95005620AS
2.5
500GB
SATA
7.2K(1)
HDD
Seagate
Barracuda
ST3500320AS
3.5
500GB
SATA
7.2K
HDD
WD/Dell
Enterprise
WD1003FBYX
3.5
1TB
SATA
7.2K
HDD
Seagate
Barracuda
ST3000DM01
3.5
3TB
SATA
7.2K
HDD
Seagate
Desktop
ST4000DM000
3.5
4TB
SATA
HDD
HDD
Seagate
Capacity
ST6000NM00
3.5
6TB
SATA
HDD
HDD
Seagate
Capacity
ST6000NM00
3.5
6TB
12GSAS
HDD
HDD
Seagate
Savio 10K.3
ST9300603SS
2.5
300GB
SAS
10K
HDD
Seagate
Cheetah
ST3146855SS
3.5
146GB
SAS
15K
HDD
Seagate
Savio 15K.2
ST9146852SS
2.5
146GB
SAS
15K
HDD
Seagate
Ent. 15K
ST600MP0003
2.5
600GB
SAS
15K
SSHD
Seagate
Ent. Turbo
ST600MX0004
2.5
600GB
SAS
SSHD
SSD
Samsung
840 PRo
MZ-7PD256
2.5
256GB
SATA
SSD
HDD
Seagate
600 SSD
ST480HM000
2.5
480GB
SATA
SSD
SSD
Seagate
1200 SSD
ST400FM0073
2.5
400GB
12GSAS
SSD

Performance characteristics 1 worker (thread count) for RAW IO (non-file system)

Note: (1) Seagate Momentus XT is a Hybrid Hard Disk Drive (HHDD) based on a 7.2K 2.5 HDD with SLC nand flash integrated for read buffer in addition to normal DRAM buffer. This model is a XT I (4GB SLC nand flash), may add an XT II (8GB SLC nand flash) at some future time.

As a starting point, these results are raw IO with file system based information to be added soon along with more devices. These results are for tests with one worker or thread count, other results will be added with such as 16 workers or thread counts to show how those differ.

The above results include all reads, all writes, mix of reads and writes, along with all random, sequential and mixed for each IO size. IO sizes include 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1024K and 2048K. As with any workload simulation, benchmark or comparison test, take these results with a grain of salt as your mileage can and will vary. For example you will see some what I consider very high IO rates with sequential reads even without file system buffering. These results might be due to locality of reference of IO’s being resolved out of the drives DRAM cache (read ahead) which vary in size for different devices. Use the vendor model numbers in the table above to check the manufactures specs on drive DRAM and other attributes.

If you are used to seeing 4K or 8K and wonder why anybody would be interested in some of the larger sizes take a look at big fast data or cloud and object storage. For some of those applications 2048K may not seem all that big. Likewise if you are used to the larger sizes, there are still applications doing smaller sizes. Sorry for those who like 512 byte or smaller IO’s as they are not included. Note that for all of these unless indicated a 512 byte standard sector or drive format is used as opposed to emerging Advanced Format (AF) 4KB sector or block size. Watch for some more drive and device types to be added to the above, along with results for more workers or thread counts, along with file system and other scenarios.

Using VMware as part of a Server, Storage and IO (aka StorageIO) test platform

vmware vexpert

The above performance results were generated on Ubuntu 12.04 (since upgraded to 14.04 which was hosted on a VMware vSphere 5.1 (upgraded to 5.5U2) purchased version (you can get the ESXi free version here) with vCenter enabled system. I also have VMware workstation installed on some of my Windows-based laptops for doing preliminary testing of scripts and other activity prior to running them on the larger server-based VMware environment. Other VMware tools include vCenter Converter, vSphere Client and CLI. Note that other guest virtual machines (VMs) were idle during the tests (e.g. other guest VMs were quiet). You may experience different results if you ran Ubuntu native on a physical machine or with different adapters, processors and device configurations among many other variables (that was a disclaimer btw ;) ).

Storage I/O trends

All of the devices (HDD, HHDD, SSD’s including those not shown or published yet) were Raw Device Mapped (RDM) to the Ubuntu VM bypassing VMware file system.

Example of creating an RDM for local SAS or SATA direct attached device.

vmkfstools -z /vmfs/devices/disks/naa.600605b0005f125018e923064cc17e7c /vmfs/volumes/dat1/RDM_ST1500Z110S6M5.vmdk

The above uses the drives address (find by doing a ls -l /dev/disks via VMware shell command line) to then create a vmdk container stored in a dat. Note that the RDM being created does not actually store data in the .vmdk, it’s there for VMware management operations.

If you are not familiar with how to create a RDM of a local SAS or SATA device, check out this post to learn how.This is important to note in that while VMware was used as a platform to support the guest operating systems (e.g. Ubuntu or Windows), the real devices are not being mapped through or via VMware virtual drives.

vmware iops

The above shows examples of RDM SAS and SATA devices along with other VMware devices and dats. In the next figure is an example of a workload being run in the test environment.

vmware iops

One of the advantages of using VMware (or other hypervisor) with RDM’s is that I can quickly define via software commands where a device gets attached to different operating systems (e.g. the other aspect of software defined storage). This means that after a test run, I can quickly simply shutdown Ubuntu, remove the RDM device from that guests settings, move the device just tested to a Windows guest if needed and restart those VMs. All of that from where ever I happen to be working from without physically changing things or dealing with multi-boot or cabling issues.

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

So how many IOPs can a device do?

That depends, however have a look at the above information and results.

Check back from time to time here to see what is new or has been added including more drives, devices and other related themes.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

How many I/O iops can flash SSD or HDD do?

How many i/o iops can flash ssd or hdd do with vmware?

sddc data infrastructure Storage I/O ssd trends

Updated 2/10/2018

A common question I run across is how many I/O iopsS can flash SSD or HDD storage device or system do or give.

The answer is or should be it depends.

This is the first of a two-part series looking at storage performance, and in context specifically around drive or device (e.g. mediums) characteristics across HDD, HHDD and SSD that can be found in cloud, virtual, and legacy environments. In this first part the focus is around putting some context around drive or device performance with the second part looking at some workload characteristics (e.g. benchmarks).

What about cloud, tape summit resources, storage systems or appliance?

Lets leave those for a different discussion at another time.

Getting started

Part of my interest in tools, metrics that matter, measurements, analyst, forecasting ties back to having been a server, storage and IO performance and capacity planning analyst when I worked in IT. Another aspect ties back to also having been a sys admin as well as business applications developer when on the IT customer side of things. This was followed by switching over to the vendor world involved with among other things competitive positioning, customer design configuration, validation, simulation and benchmarking HDD and SSD based solutions (e.g. life before becoming an analyst and advisory consultant).

Btw, if you happen to be interested in learn more about server, storage and IO performance and capacity planning, check out my first book Resilient Storage Networks (Elsevier) that has a bit of information on it. There is also coverage of metrics and planning in my two other books The Green and Virtual Data Center (CRC Press) and Cloud and Virtual Data Storage Networking (CRC Press). I have some copies of Resilient Storage Networks available at a special reader or viewer rate (essentially shipping and handling). If interested drop me a note and can fill you in on the details.

There are many rules of thumb (RUT) when it comes to metrics that matter such as IOPS, some that are older while others may be guess or measured in different ways. However the answer is that it depends on many things ranging from if a standalone hard disk drive (HDD), Hybrid HDD (HHDD), Solid State Device (SSD) or if attached to a storage system, appliance, or RAID adapter card among others.

Taking a step back, the big picture

hdd image
Various HDD, HHDD and SSD’s

Server, storage and I/O performance and benchmark fundamentals

Even if just looking at a HDD, there are many variables ranging from the rotational speed or Revolutions Per Minute (RPM), interface including 1.5Gb, 3.0Gb, 6Gb or 12Gb SAS or SATA or 4Gb Fibre Channel. If simply using a RUT or number based on RPM can cause issues particular with 2.5 vs. 3.5 or enterprise and desktop. For example, some current generation 10K 2.5 HDD can deliver the same or better performance than an older generation 3.5 15K. Other drive factors (see this link for HDD fundamentals) including physical size such as 3.5 inch or 2.5 inch small form factor (SFF), enterprise or desktop or consumer, amount of drive level cache (DRAM). Space capacity of a drive can also have an impact such as if all or just a portion of a large or small capacity devices is used. Not to mention what the drive is attached to ranging from in internal SAS or SATA drive bay, USB port, or a HBA or RAID adapter card or in a storage system.

disk iops
HDD fundamentals

How about benchmark and performance for marketing or comparison tricks including delayed, deferred or asynchronous writes vs. synchronous or actually committed data to devices? Lets not forget about short stroking (only using a portion of a drive for better IOP’s) or even long stroking (to get better bandwidth leveraging spiral transfers) among others.

Almost forgot, there are also thick, standard, thin and ultra thin drives in 2.5 and 3.5 inch form factors. What’s the difference? The number of platters and read write heads. Look at the following image showing various thickness 2.5 inch drives that have various numbers of platters to increase space capacity in a given density. Want to take a wild guess as to which one has the most space capacity in a given footprint? Also want to guess which type I use for removable disk based archives along with for onsite disk based backup targets (compliments my offsite cloud backups)?

types of disks
Thick, thin and ultra thin devices

Beyond physical and configuration items, then there are logical configuration including the type of workload, large or small IOPS, random, sequential, reads, writes or mixed (various random, sequential, read, write, large and small IO). Other considerations include file system or raw device, number of workers or concurrent IO threads, size of the target storage space area to decide impact of any locality of reference or buffering. Some other items include how long the test or workload simulation ran for, was the device new or worn in before use among other items.

Tools and the performance toolbox

Then there are the various tools for generating IO’s or workloads along with recording metrics such as reads, writes, response time and other information. Some examples (mix of free or for fee) include Bonnie, Iometer, Iorate, IOzone, Vdbench, TPC, SPC, Microsoft ESRP, SPEC and netmist, Swifttest, Vmark, DVDstore and PCmark 7 among many others. Some are focused just on the storage system and IO path while others are application specific thus exercising servers, storage and IO paths.

performance tools
Server, storage and IO performance toolbox

Having used Iometer since the late 90s, it has its place and is popular given its ease of use. Iometer is also long in the tooth and has its limits including not much if any new development, never the less, I have it in the toolbox. I also have Futremark PCmark 7 (full version) which turns out has some interesting abilities to do more than exercise an entire Windows PC. For example PCmark can use a secondary drive for doing IO to.

PCmark can be handy for spinning up with VMware (or other tools) lots of virtual Windows systems pointing to a NAS or other shared storage device doing real world type activity. Something that could be handy for testing or stressing virtual desktop infrastructures (VDI) along with other storage systems, servers and solutions. I also have Vdbench among others tools in the toolbox including Iorate which was used to drive the workloads shown below.

What I look for in a tool are how extensible are the scripting capabilities to define various workloads along with capabilities of the test engine. A nice GUI is handy which makes Iometer popular and yes there are script capabilities with Iometer. That is also where Iometer is long in the tooth compared to some of the newer generation of tools that have more emphasis on extensibility vs. ease of use interfaces. This also assumes knowing what workloads to generate vs. simply kicking off some IOPs using default settings to see what happens.

Another handy tool is for recording what’s going on with a running system including IO’s, reads, writes, bandwidth or transfers, random and sequential among other things. This is where when needed I turn to something like HiMon from HyperIO, if you have not tried it, get in touch with Tom West over at HyperIO and tell him StorageIO sent you to get a demo or trial. HiMon is what I used for doing start, stop and boot among other testing being able to see IO’s at the Windows file system level (or below) including very early in the boot or shutdown phase.

Here is a link to some other things I did awhile back with HiMon to profile some Windows and VDI activity test profiling.

What’s the best tool or benchmark or workload generator?

The one that meets your needs, usually your applications or something as close as possible to it.

disk iops
Various 2.5 and 3.5 inch HDD, HHDD, SSD with different performance

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

That depends, however continue reading part II of this series to see some results for various types of drives and workloads.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

EMC ViPR software defined object storage part III

Storage I/O trends

This is part III in a series of posts pertaining to EMC ViPR software defined storage and object storage. You can read part I here and part II here.

EMCworld

More on the object opportunity

Other object access includes OpenStack storage part Swift, AWS S3 HTTP and REST API access. This also includes ViPR supporting EMC Atmos, VNX and Isilon arrays as southbound persistent storage in addition.

object storage
Object (and cloud) storage access example

EMC is claiming that over 250 VNX systems can be abstracted to support scaling with stability (performance, availability, capacity, economics) using ViPR. Third party storage will be supported along with software such as OpenStack Swift, Ceph and others running on commodity hardware. Note that EMC has some history with object storage and access including Centera and Atmos. Visit the micro site I have setup called www.objectstoragecenter.com and watch for more content to be updated and added there.

More on the ViPR control plane and controller

ViPR differs from some others in that it does not sit in the data path all the time (e.g. between application servers and storage systems or cloud services) to cut potential for bottlenecks.

ViPR architecture

Organizations that can use ViPR include enterprise, SMB, CSP or MSP and hosting sites. ViPR can be used in a control mode to leverage underlying storage systems, appliances and services intelligence and functionality. This means ViPR can be used to complement as oppose to treat southbound or target storage systems and services as dumb disks or JBOD.

On the other hand, ViPR will also have a suite of data services such as snapshot, replication, data migration, movement, tiering to add value for when those do not exist. Customers will be free to choose how they want to use and deploy ViPR. For example leveraging underlying storage functionality (e.g. lightweight model), or in a more familiar storage virtualization model heavy lifting model. In the heavy lifting model more work is done by the virtualization or abstraction software to create an added value, however can be a concern for bottlenecks depending how deployed.

Service categories

Software defined, storage hypervisor, virtual storage or storage virtualization?

Most storage virtualization, storage hypervisors and virtual storage solutions that are hardware or software based (e.g. software defined) implemented what is referred to as in band. With in band the storage virtualization software or hardware sits between the applications (northbound) and storage systems or services (southbound).

While this approach can be easier to carry out along with add value add services, it can also introduce scaling bottlenecks depending on implementations. Examples of in band storage virtualization includes Actifio, DataCore, EMC VMAX with third-party storage, HDS with third-party storage, IBM SVC (and their V7000 Storwize storage system based on it) and NetApp Vseries among others. An advantage of in band approaches is that there should not need to be any host or server-side software requirements and SAN transparency.

There is another approach called out-of-band that has been tried. However pure out-of-band requires a management system along with agents, drivers, shims, plugins or other software resident on host application servers.

fast path control path
Example of generic fast path control path model

ViPR takes a different approach, one that was seen a few years ago with EMC Invista called fast path, control path that for the most part stays out of the data path. While this is like out-of-band, there should not be a need for any host server-side (e.g. northbound) software. By being a fast path control path, the virtualization or abstraction and management functions stay out of the way for data being moved or work being done.

Hmm, kind of like how management should be, there to help when needed, out-of-the-way not causing overhead other times ;).

Is EMC the first (even with Invista) to leverage fast path control path?

Actually up until about a year or so ago, or shortly after HP acquired 3PAR they had a solution called Storage Virtualization Services Platform (SVPS) that was OEMd from LSI (e.g. StorAge). Unfortunately, HP decided to retire that as opposed to extend its capabilities for file and object access (northbound) as well as different southbound targets or destination services.

Whats this northbound and southbound stuff?

Simply put, think in terms of a vertical stack with host servers (PMs or VMs) on the top with applications (and hypervisors or other tools such as databases) on top of them (e.g. north).

software defined storage
Northbound servers, southbound storage systems and cloud services

Think of storage systems, appliances, cloud services or other target destinations on the bottom (or south). ViPR sits in between providing storage services and management to the northbound servers leveraging the southbound storage.

What host servers can VIPR support for serving storage?

VIPR is being designed to be server agnostic (e.g. virtual or physical), along with operating system agnostic. In addition VIPR is being positioned as capable of serving northbound (e.g. up to application servers) block, file or object as well as accessing southbound (e.g. targets) block, file and object storage systems, file systems or services.

Note that a difference between earlier similar solutions from EMC have been either block based (e.g. Invista, VPLEX, VMAX with third-party storage), or file based. Also note that this means VIPR is not just for VMware or virtual server environments and that it can exist in legacy, virtual or cloud environments.

ViPR image

Likewise VIPR is intended to be application agnostic supporting little data, big data, very big data ( VBD) along with Hadoop or other specialized processing. Note that while VIPR will support HDFS in addition to NFS and CIFS file based access, Hadoop will not be running on or in the VIPR controllers as that would live or run elsewhere.

How will VIPR be deployed and licensed?

EMC has indicated that the VIPR controller will be delivered as software that installs into a virtual appliance (e.g. VMware) running as a virtual machine (VM) guest. It is not clear when support will exist for other hypervisors (e.g. Microsoft Hyper-V, Citrix/XEN, KVM or if VMware vSphere with vCenter or simply on ESXi free version). As of the announcement pre briefing, EMC had not yet finalized pricing and licensing details. General availability is expected in the second half of calendar 2013.

Keep in mind that the VIPR controller (software) runs as a VM that can be hosted on a clustered hypervisor for HA. In addition, multiple VIPR controllers can exist in a cluster to further enhance HA.

Some questions to be addressed among others include:

  • How and where are IOs intercepted?
  • Who can have access to the APIs, what is the process, is there a developers program, SDK along with resources?
  • What network topologies are supported local and remote?
  • What happens when JBOD is used and no advanced data services exist?
  • What are the characteristics of the object access functionality?
  • What if any specific switches or data path devices and tools are needed?
  • How does a host server know to talk with its target and ViPR controller know when to intercept for handling?
  • Will SNIA CDMI be added and when as part of the object access and data services capabilities?
  • Are programmatic bindings available for the object access along with support for other APIs including IOS?
  • What are the performance characteristics including latency under load as well as during a failure or fault scenario?
  • How will EMC place Vplex and its caching model on a local and wide area basis vs. ViPR or will we see those two create some work together, if so, what will that be?

Bottom line (for now):

Good move for EMC, now let us see how they execute including driving adoption of their open APIs, something they have had success in the past with Centera and other solutions. Likewise, let us see what other storage vendors become supported or add support along with how pricing and licensing are rolled out. EMC will also have to articulate when and where to use ViPR vs. VPLEX along with other storage systems or management tools.

Additional related material:
Are you using or considering implementation of a storage hypervisor?
Cloud and Virtual Data Storage Networking (CRC)
Cloud conversations: Public, Private, Hybrid what about Community Clouds?
Cloud, virtualization, storage and networking in an election year
Does software cut or move place of vendor lock-in?
Don’t Use New Technologies in Old Ways
EMC VPLEX: Virtual Storage Redefined or Respun?
How many degrees separate you and your information?
Industry adoption vs. industry deployment, is there a difference?
Many faces of storage hypervisor, virtual storage or storage virtualization
People, Not Tech, Prevent IT Convergence
Resilient Storage Networks (Elsevier)
Server and Storage Virtualization Life beyond Consolidation
Should Everything Be Virtualized?
The Green and Virtual Data Center (CRC)
Two companies on parallel tracks moving like trains offset by time: EMC and NetApp
Unified storage systems showdown: NetApp FAS vs. EMC VNX
backup, restore, BC, DR and archiving
VMware buys virsto, what about storage hypervisor’s?
Who is responsible for vendor lockin?

Ok, nuff said (for now)

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

EMC ViPR software defined object storage part II

Storage I/O trends

This is part II in a series of posts pertaining to EMC ViPR software defined storage and object storage. You can read part I here and part III here.

EMCworld

Some questions and discussion topics pertaining to ViPR:

Whom is ViPR for?

Organizations that need to scale with stability across EMC, third-party or open storage software stacks and commodity hardware. This applies to large and small enterprise, cloud service providers, managed service providers, virtual and cloud environments/

What this means for EMC hardware/platform/systems?

They can continue to be used as is, or work with ViPR or other deployment modes.

Does this mean EMC storage systems are nearing their end of life?

IMHO for the most part not yet, granted there will be some scenarios where new products will be used vs. others, or existing ones used in new ways for different things.

As has been the case for years if not decades, some products will survive, continue to evolve and find new roles, kind of like different data storage mediums (e.g. ssd, disk, tape, etc).

How does ViPR work?

ViPR functions as a control plane across the data and storage infrastructure supporting both north and southbound. northbound refers to use from or up to application servers (physical machines PM and virtual machines VMs). southbound refers target or destination storage systems. Storage systems can be traditional EMC or third-party (NetApp mentioned as part of first release), appliances, just a bunch of disks (JBOD) or cloud services.

Some general features and functions:

  • Provisioning and allocation (with automation)
  • Data and storage migration or tiering
  • Leverage scripts, templates and workbooks
  • Support service categories and catalogs
  • Discovery, registration of storage systems
  • Create of storage resource pools for host systems
  • Metering, measuring, reporting, charge or show back
  • Alerts, alarms and notification
  • Self-service portal for access and provisioning

ViPR data plane (adding data services and value when needed)

Another part is the data plane for implementing data services and access. For block and file when not needed, ViPR steps out-of-the-way leveraging the underlying storage systems or services.

object storage
Object storage access

When needed, the ViPR data plane can step in to add added services and functionality along with support object based access for little data and big data. For example, Hadoop Distributed File System (HDFS) services can support northbound analytic software applications running on servers accessing storage managed by ViPR.

Continue reading in part III of this series here including how ViPR works, who it is for and more analysis.

Ok, nuff said (for now)

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

EMC ViPR virtual physical object and software defined storage (SDS)

Storage I/O trends

Introducing EMC ViPR

This is the first in a three part series, read part II here, and part III here.

During the recent EMCworld event in Las Vegas among other things, EMC announced ViPR (read announcement here) . Note that this ViPR is not the same EMC Viper project from a few years ago that was focused on data footprint reduction (DFR) including dedupe. ViPR has been in the works for a couple of years taking a step back rethinking how storage is can be used going forward.

EMCworld

ViPR is not a technology creation developed in a vacuum instead includes customer feedback, wants and needs. Its core themes are extensible, open and scalable.

EMCworld

On the other hand, ViPR addresses plenty of buzzword bingo themes including:

  • Agility, flexibility, multi-tenancy, orchestration
  • Virtual appliance and control plane
  • Data services and storage management
  • IT as a Service (ITaaS) and Infrastructure as a Service (IaaS)
  • Scaling with stability without compromise
  • Software defined storage
  • Public, private, hybrid cloud
  • Big data and little data
  • Block, file and object storage
  • Control plane and data plane
  • Storage hypervisor, virtualization and virtual storage
  • Heterogeneous (third-party) storage support
  • Open API and automation
  • Self-service portals, service catalogs

Buzzword bingo

Note that this is essentially announcing the ViPR product and program initiative with general availability slated for second half of 2013.

What is ViPR addressing?

IT and data infrastructure (server, storage, IO and networking hardware, software) challenges for traditional, virtual and cloud environments.

  • Data growth, after all, there is no such thing as an information recession with more data being generated, moved, processed, stored and retained for longer periods of time. Then again, people and data are both getting larger and living longer, for both little data and big data along with very big data.
  • Overhead and complexities associated with managing and using an expanding, homogenous (same vendor, perhaps different products) or heterogeneous (different vendors and products) data infrastructure across cloud, virtual and physical, legacy and emerging. This includes add, changes or moves, updates and upgrades, retirement and replacement along with disposition, not to mention protecting data in an expanding footprint.
  • road to cloud

  • Operations and service management, fault and alarm notification, resolution and remediation, rapid provisioning, removing complexity and cost of doing things vs. simply cutting cost and compromising service.

EMC ViPR

What is this software defined storage stuff?

There is the buzzword aspect, and then there is the solution and business opportunity.

First the buzzword aspect and bandwagon:

  • Software defined marketing (SDM) Leveraging software defined buzzwords.
  • Software defined data centers (SDDC) Leveraging software to derive more value from hardware while enabling agility, flexibility, and scalability and removing complexity. Think the Cloud and Virtual Data Center models including those from VMware among others.
  • Software defined networking (SDN) Rather than explain, simply look at Nicira that VMware bought in 2012.
  • Software defined storage (SDS) Storage software that is independent of any specific hardware, which might be a bit broad, however it is also narrower than saying anything involving software.
  • Software defined BS (SDBS) Something that usually happens as a result when marketers and others jump on a bandwagon, in this case software defined marketing.

Note that not everything involved with software defined is BS, only some of the marketing spins and overuse. The downside to the software defined marketing and SDBS is the usual reaction of skepticism, cynicism and dismissal, so let us leave the software defined discussion here for now.

software defined storage

An example of software defined storage can be storage virtualization, virtual storage and storage hypervisors that are hardware independent. Note that when I say hardware independent, that also means being able to support different vendors systems. Now if you want to have some fun with the software defined storage diehards or purist, tell them that all hardware needs software and all software needs hardware, even if virtual. Further hardware is defined by its software, however lets leave sleeping dogs lay where they rest (at least for now ;)).

Storage hypervisors were a 2012 popular buzzword bingo topic with plenty of industry adoption and some customer deployment. While 2012 saw plenty of SDM buzz including SDC, SDN 2013 is already seeing an increase including software defined servers, and software defined storage.

Regardless of what you view of software defined storage, storage hypervisor, storage virtualization and virtual storage is, the primary focus and goal should be addressing business and application needs. Unfortunately, some of the discussions or debates about what is or is not software defined and related themes lose focus of what should be the core goal of enabling business and applications.

Continue reading in part II of this series here including how ViPR works, who it is for and more analysis.

Ok, nuff said (for now)

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

April 2013 Server and StorageIO Update Newsletter

StorageIO News Letter Image
April 2013 News letter

Welcome to the April 2013 edition of the StorageIO Update. This edition includes more on nand flash SSD, after all its not if, rather when, where, why, with what along with how much SSD is in your future. Also more on object storage, clouds, big data and little data, HDDs, SNW, backup/restore, HA, BC, DR and data protection along with data center topics and trends.

You can get access to this news letter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions.

Click on the following links to view the April 2013 edition as (HTML sent via Email) version, or PDF versions.

Visit the news letter page to view previous editions of the StorageIO Update.

You can subscribe to the news letter by clicking here.

Enjoy this edition of the StorageIO Update news letter, let me know your comments and feedback.

Ok Nuff said, for now

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Spring SNW 2013, Storage Networking World Recap

Storage I/O trends

A couple of weeks ago I attended the spring 2013 Storage Networking World (SNW) in Orlando Florida. Talking with SNIA Chairman Wayne Adams and SNIA Director Leo Legar this was the 28th edition of the US SNW (two shows a year), plus the international ones. While I have not been to all 28 of the US SNWs, I have been to a couple of dozen SNWs in the US, Europe and Brazil going back to around 2001 as an attendee, main stage as well as breakout, and tutorial presenter (see here and here).

SNW image

For the spring 2013 SNW I was there for a mix of meetings, analyst briefings, attending the expo, doing some podcasts (see below), meeting with IT professionals (e.g. customers), VARs, vendors along with presenting three sessions (you can download them and others backup, restore, BC, DR and archiving).

Some of the buzz and themes heard included big data was a little topic at the event, while cloud was in the conversations, dedupe and data footprint reduction (DFR) do matter for some people and applications. However also a common theme with customers including Media and Entertainment (M&E) is that not everything can be duped thus other DFR approaches are needed.

There was some hype in and around hybrid storage along with storage hypervisors, which was also an entertaining panel discussion with HDS (Claus Mikkelsen aka @YoClaus), Datacore, IBM and Virstro.

The theme of that discussion seemed for the most part to gravitate towards realities of storage virtualization and less about the hypervisor hype. Some software defined marketing hype I heard is that it is impossible to spend more than a million dollars on a server today. I guess with the applicable caveats, qualifiers and context that could be true, however I also know some vendors and customers that would say otherwise.

Lunch
Lunchtime at SNW Spring 2013

Not surprisingly, there was an increase in vendors wanting to jump on the software defined and object storage bandwagons; however, customers tended to be curious at best, confused or concerned otherwise. Speaking of object storage, check out this podcast discussion with Cleversafe customer Justin Stottlemyer of Shutterfly and his 80PB environment.

In addition to Cleversafe, heard from Astute (if you need fast iSCSI storage check them out), Avere has a new NAS for dummies book out, Exablox a storage system startup with emphasis on scalability, ease of use and NAS access and hybrid storage Tegile. Also, check out SwifTest for generating application workloads and measurement that had their customer Go Daddy presenting at the event. A couple of others to keep an eye on include Raxco with their thin provision storage reclamation tool, and Infinio with their NAS acceleration for VMware software tools among others.

backup, restore, BC, DR and archiving

Here are the three presentations that I did while at the event:

Analyst Perspective: Increase Your Return on Innovation (The New ROI) With Data Management and Dedupe
There is no such thing as an information recession with more data to move, process and store, however there are economic challenges. Likewise, people and data are living longer and getting larger which requires leveraging data footprint reduction (DFR) techniques on a broader focus. It is time to move upstream finding and fixing things at the source to reduce the downstream impact of expanding data footprints, enabling more to be done with what you have.

Analyst Perspective: Metrics that Matter – Meritage of Data Management and Data Protection
Not everything in the data center or information factory is the same. This session recaps and builds off the morning increase your ROI with data footprint and data management session while setting the stage for the rethinking data protection (backup, BC and DR). Are you maximizing the return on innovation in how using new tools and technology in new ways, vs. using new tools in old ways? Also discussed performance capacity planning, forecasting analysis in cloud, virtual and physical environments. Without metrics that matter, you are flying blind, or perhaps missing opportunities to further drive your return on innovation and return on investment.

Analyst Perspective: Time to Rethink Data Protection Including BC and DR
When it comes to today’s data centers and information factories including physical, virtual and cloud, everything is not the same, so why treat business continuance (BC), disaster recovery (DR) and data protection in general the same? Simply using new tools, technologies and techniques in the same old ways is no longer a viable option. Since there is no such thing as a data or information recession, yet there are economic and budget challenges, along with new or changing threat risks, now is the time to review data protection including BC and DR including using new technologies in new ways.

You can view the complete SNW USA spring 2013 agenda here.

audio
Podcasts are also available on

Here are links to some podcasts from spring 2013 SNW:
Stottlemyer of Shutterfly and object storage discussion
Dave Demming talking tech education from SNW Spring 2013
Farley Flies into SNW Spring 2013
Talking with Tony DiCenzo at SNW Spring 2013
SNIA Spring 2013 update with Wayne Adams
SNIA’s new SPDEcon conference

Also, check out these podcasts from fall 2012 US and Europe SNWs:
Ben Woo on Big Data Buzzword Bingo and Business Benefits
Networking with Bruce Ravid and Bruce Rave
Industry trends and perspectives: Ray Lucchesi on Storage and SNW
Learning with Leo Leger of SNIA
Meeting up with Marty Foltyn of SNIA
Catching up with Quantum CTE David Chapa (Now with Evault)
Chatting with Karl Chen at SNW 2012
SNW 2012 Wayne’s World
SNW Podcast on Cloud Computing
HDS Claus Mikkelsen talking storage from SNW Fall 2012

Storage I/O trends

What this all means?

While busy, I liked this edition of SNW USA in that it had a great agenda with diversity and balance of speaker sessions (some tutorials, some vendors, some IT customers, and some analysts) vs. too many of one specific area.

In addition to the agenda and session length, the venue was good, big enough, however not spread out so much to cause loss of the buzz and energy of the event.

This SNW had some similar buzz or energy as early versions granted without the hype and fanfare of a startup industry or focus area (that would be some of the other events today)

Should SNW go to a once a year event?

While it would be nice to have a twice a year venue for convenience, practicality and budgets say once would be enough given all the other conferences and venues on the agenda (or that could be).

The next SNW USA will be October 15 to 17 2013 in Long Beach California, and Europe in Frankfurt Germany October 29-30 2013.

Thanks again to all the attendees, participants, vendor exhibitors, event organizers and SNIA, SNW/Computerworld staffs for another great event.

Ok, nuff said

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

HP Moonshot 1500 software defined capable compute servers

Storage I/O cloud virtual and big data perspectives

Riding the current software defined data center (SDC) wave being led by the likes of VMware and software defined networking (SDN) also championed by VMware via their acquisition of Nicira last year, Software Defined Marketing (SDM) is in full force. HP being a player in providing the core building blocks for traditional little data and big data, along with physical, virtual, converged, cloud and software defined has announced a new compute, processor or server platform called the Moonshot 1500.

HP Moonshot software defined server image

Software defined marketing aside, there are some real and interesting things from a technology standpoint that HP is doing with the Moonshot 1500 along with other vendors who are offering micro server based solutions.

First, for those who see server (processor and compute) improvements as being more and faster cores (and threads) per socket, along with extra memory, not to mention 10GbE or 40GbE networking and PCIe expansion or IO connectivity, hang on to your hats.

HP Moonshot software defined server image individual server blade

Moonshot is in the model of the micro servers or micro blades such as what HP has offered in the past along with the likes of Dell and Sea Micro (now part of AMD). The micro servers are almost the opposite of the configuration found on regular servers or blades where the focus is putting more ability on a motherboard or blade.

With micro servers the approach support those applications and environments that do not need lots of CPU processing capability, large amount of storage or IO or memory. These include some web hosting or cloud application environments that can leverage more smaller, lower power, less performance or resource intensive platforms. For example big data (or little data) applications whose software or tools benefit from many low-cost, low power, and lower performance with distributed, clustered, grid, RAIN or ring based architectures can benefit from this type of solution.

HP Moonshot software defined server image and components

What is the Moonshot 1500 system?

  • 4.3U high rack mount chassis that holds up to 45 micro servers
  • Each hot-swap micro server is its own self-contained module similar to blade server
  • Server modules install vertically from the top into the chassis similar to some high-density storage enclosures
  • Compute or processors are Intel Atom S1260 2.0GHz based processors with 1 MB of cache memory
  • Single S0-DIMM slot (unbuffered ECC at 1333 MHz) supports 8GB (1 x 8GB DIMM) DRAM
  • Each server module has a single 2.5″ SATA 200GB SSD, 500GB or 1TB HDD onboard
  • A dual port Broadcom 5720 1 Gb Ethernet LAn per server module that connects to chassis switches
  • Marvel 9125 storage controller integrated onboard each server module
  • Chassis and enclosure management along with ACPI 2.0b, SMBIOS 2.6.1 and PXE support
  • A pair of Ethernet switches each give up to six x 10GbE uplinks for the Moonshot chassis
  • Dual RJ-45 connectors for iLO chassis management are also included
  • Status LEDs on the front of each chassis providers status of the servers and network switches
  • Support for Canonical Ubuntu 12.04, RHEL 6.4, SUSE Linux LES 11 SP2

Storage I/O cloud virtual and big data perspectives

Notice a common theme with moonshot along with other micro server-based systems and architectures?

If not, it is simple, I mean literally simple and flexible is the value proposition.

Simple is the theme (with software defined for marketing) along with low-cost, lower energy power demand, lower performance, less of what is not needed to remove cost.

Granted not all applications will be a good fit for micro servers (excuse me, software defined servers) as some will need the more robust resources of traditional servers. With solutions such as HP Moonshot, system architects and designers have more options available to them as to what resources or solution options to use. For example, a cloud or object storage system based solutions that does not need a lot of processing performance per node or memory, and a low amount of storage per node might find this as an interesting option for mid to entry-level needs.

Will HP release a version of their Lefthand or IBRIX (both since renamed) based storage management software on these systems for some market or application needs?

How about deploying NoSQL type tools including Cassandra or Mongo, how about CloudStack, OpenStack Swift, Basho Riak (or Riak CS) or other software including object storage, on these types of solutions, or web servers and other applications that do not need the fastest processors or most memory per node?

Thus micro server-based solutions such as Moonshot enable return on innovation (the new ROI) by enabling customers to leverage the right tool (e.g. hard product) to create their soft product allowing their users or customers to in turn innovate in a cost-effective way.

Will the Moonshot servers be the software defined turnaround for HP, click here to see what Bloomberg has to say, or Forbes here.

Learn more about Moonshot servers at HP here, here or data sheets found here.

Btw, HP claims that this is the industries first software defined server, hmm.

Ok, nuff said (for now).

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

SNIA’s new SPDEcon conference

Now also available via

This is a new episode in the continuing StorageIO industry trends and perspectives pod cast series (you can view more episodes or shows along with other audio and video content here) as well as listening via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml)

StorageIO industry trends cloud, virtualization and big data

In this episode from SNW Spring 2013 in Orlando Florida, Bruce Ravid (@BruceRave) and me visit with our guests SNIA Chairman Wayne Adams (@wma01606) and from SNIA Education SW Worth. Wayne was one of our first podcast guests in the episode titled Waynes World, SNIA and SNW that you can listen to here.

SNIA image logo

Our conversation centers around the new SNIA SPDEcon conference that will occur June 10th in Santa Clara California.

SNIA SPDEcon image

The tag line of the event is for experts by experts and those who want to become experts. Listen to our conversation and check out the snia.org and snia.org/spdecon websites to signup and take part in this new event.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Wayne and SW.

StorageIO podcast

Also available via

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts pod casts and other upcoming events. Also be sure to heck out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode from SNW Spring 2013 with Wayne Adams and SW Worth of SNIA to learn about the new SPDEcon conference.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Conversation with Justin Stottlemyer of Shutterfly and object storage discussion

Now also available via

This is a new episode in the continuing StorageIO industry trends and perspectives pod cast series (you can view more episodes or shows along with other audio and video content here) as well as listening via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml)

StorageIO industry trends cloud, virtualization and big data

In this episode from SNW Spring 2013 in Orlando Florida, Bruce Ravid (@BruceRave) and me visit with Justin Stottlemyer (@JHStott) who is a Fellow and Storage Architect at Shutterfly.

Shutterfly image via shutterfly.com

Our conversation centers on how Justin and Shutterfly maximize their return on innovation (the new ROI) by using object storage along with other technology and techniques to create a resilient, scalable flexible data infrastructure.

Justin was at SNW presenting on overcoming object integration at Shutterfly where their data infrastructure consists of 80PB of storage to house over 30PB of user content data that continues to grow.

Example of how we have used Shutterfly to create photo books from vacations

For those not familiar, Shutterfly providers customers with free unlimited storage of their photos which can then be printed in coffee table type books such as the one shown in the above figure. My wife has used Shutterfly a few times to create photo books such as the one shown above in the image.

As you will hear Justin explain in the pod cast, photos get uploaded and ingested into their environment and then available for printing.

In addition to talking about object storage, private clouds, business continuance (BC) and disaster recovery, other topics include performance and capacity planning, maximizing return on innovation in addition to return on investment among other items.
Varies and managed by user interface

Listen in to hear how Justin and Shutterfly are currently managing 80PB of storage with over 30PB of user data that continues to grow.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Justin and myself.

StorageIO podcast

Also available via

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts pod casts and other upcoming events. Also be sure to heck out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode from SNW Spring 2013 with Justin Stottlemyer of Shutterfly.

Speaking of cloud and object storage, check out www.objectstoragecenter.com to view more related material.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

March 2013 Server and StorageIO Update Newsletter

StorageIO News Letter Image
March 2013 News letter

Welcome to the March 2013 edition of the StorageIO Update news letter including a new format and added content.

You can get access to this news letter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions.

Click on the following links to view the March 2013 edition as (HTML sent via Email) version, or PDF versions.

Visit the news letter page to view previous editions of the StorageIO Update.

You can subscribe to the news letter by clicking here.

Enjoy this edition of the StorageIO Update news letter, let me know your comments and feedback.

Nuff said for now

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Cloud conversations: AWS EBS, Glacier and S3 overview (Part III)

Storage I/O industry trends image

Amazon Web Services (AWS) recently added EBS Optimized support for enhanced bandwidth EC2 instances (read more here). This industry trends and perspective cloud conversation is the third (tying the posts together) in a three-part series companion to the AWS EBS optimized post found here. Part I is here (closer look at EBS) and part II is here (closer look at S3).

AWS image via Amazon.com

Cloud storage and object storage I/O figure
Cloud and object storage access example via Cloud and Virtual Data Storage Networking

AWS cloud storage gateway

In 2012 AWS released their Storage Gateway that you can use and try for free here using either an EC2 Amazon Machine Instance (AMI), or deployed locally on a hypervisor such as VMware vSphere/ESXi. About a year ago I did a storage gateway post (First, second and third impressions) when it was first released. I will do a new post soon following up with my later impressions and experiences of having used it recently. For now, my quick (fourth impressions can be found here in this AWS Marketplace review). In general, the gateway is an AWS alternative to using third product gateway, appliances of software tools for accessing AWS storage.

AWS Storage Gateway
Image courtesy of www.amazon.com

When deployed locally on a VM, the storage gateway communicates using the AWS API’s back to the S3 and EBS (depending on how configured) storage services. Locally, the storage gateway presents an iSCSI block access method for Windows or other servers to use.

There are two modes with one being Gateway-Stored and the other Gateway-Cached. Gateway-Stored uses your primary storage mapped to the storage gateway as primary storage and asynchronous (time delayed) snapshots (user defined) to S3 via EBS volumes. This is a handy way to have local storage for low latency access, yet use AWS for HA, BC and DR, along with a means for doing migration into or out of AWS. Gateway-cache mode places primary storage in AWS S3 with a local cached copy to reduce network overhead.

Storage I/O industry trends image

When I tried the gateway a month or so ago, using both modes, I was not able to view any of my data using standard S3 tools. For example if I looked in my S3 buckets the objects do not appear, something that AWS said had to do with where and how those buckets and objects are managed. Otoh, I was able to see EBS snapshots for the gateway-stored mode including using that as a means of moving data between local and AWS EC2 instances. Note that regardless of the AWS storage gateway mode, some local cache storage is needed, and likewise some EBS volumes will be needed depending on what mode is used.

When I used the gateway, a Windows Server mounted the iSCSI volume presented by the storage gateway and in turn served that to other systems as a shared folder. Thus while having block such as iSCSI is nice, a NAS (NFS or CIFS) presentation and access mode would also be useful. However more on the storage gateway in a future post. Also note that beyond the free trial period (you may have to pay for storage being used) for using the gateway, there are also fees for S3 and EBS storage volumes use.

AWS image via Amazon.com

What about Glacier?

Shortly after its release last year, I did this piece about Glacier and have since been doing some testing proof of concepts with it.

I like Glacier and its prospects for doing some various things, particular for inactive data including deep archives that will seldom if every be accessed, yet need to be retained. The business value proposition of Glacier is that it has a very high durability and low-cost assuming that you do not need to frequently access your data, and when you do, that you can wait 3 to 5 hours before retrieving it from your S3 buckets.

Access to Glacier is via API or AWS console so getting things into and out of it can be a challenge. For example I wanted to see if I could use AWS storage gateway to more easily bulk move things into Glacier via S3, however no luck, or at least today. Speaking of S3, by setting your policies you determine when objects get moved into Glacier as well as how long they will stay there, you can read more about Glacier here and via AWS here.

Storage I/O industry trends image

How much do these AWS services cost?

Fees vary depending on which region is selected, amount of space capacity, level or durability and availability, performance along with type of service. S3 pricing can be found here including a free trial tier along with optional fees. Other AWS fees for EC2 can be found here, EBS pricing here, Glacier here, and storage gateway costs are located here.

Note that there is a myth that cloud vendors have hidden fees which may be the case for some, however so far I have not seen that to be the case with AWS. However, as a consumer, designer or architect, doing your homework and looking at the above links among others you can be ready and understand the various fees and options. Hence like procuring traditional hardware, software or services, do your due diligence and be an informed shopper.

Amazon Web Services (AWS) image

Some more service cost notes include:

Note that with S3 Standard and RRS objects there is not a charge for deletion of objects, however there is a pro-rated charge per GByte of Glacier objects removed prior to 90 days. Glacier also allows up to 5% of your average monthly storage usage (pro-rated daily) to be restored with no charge, other fees apply for restoring larger amounts in a given period. Thus if you are planning on accessing and using data, analyze what your activity and usage will be as part of calculating your costs with Glacier. Read more about Glacier here.

Standard EBS volumes are changed by the amount of storage space capacity you provision in GB until released. For EBS snapshot copies there are fees for transferring data across regions, once moved, the rates of the new region apply for the snapshot.

Amazon Web Services (AWS) image

As with Standard volumes, volume storage for Provisioned IOPS volumes is charged by the amount you provision in GB per month. With Provisioned IOPS volumes, you are also charged by the amount you provision in IOPS pro-rated as a percentage of days you have it in use for the month.

Thus important for cloud storage planning to know not only your space requirements, also IOP’s, bandwidth, and level of availability as well as durability. so for Standard volumes, you will likely see a lower number of I/O requests on your bill than is seen by your application unless you sync all of your I/Os to disk. Thus pay attention to what your needs are in terms of availability (accessibility), durability (resiliency or survivability), space capacity, and performance.

Leverage AWS CloudWatch tools and API’s to monitoring that matter for timely insight and situational awareness into how EBS, EC2, S3, Glacier, Storage Gateway and other services are being used (or costing you). Also visit the AWS service health status dashboard to gain insight into how things are running to help gain confidence with cloud services and solutions.

Storage I/O industry trends image

When it comes to Cloud, Virtualization, Data and Storage Networking along with AWS among other services, tools and technologies including object storage, we are just scratching the surface here.

Hopefully this helps to fill in some gaps giving more information addressing questions, along with generating new ones to prepare for your journey with clouds. After all, don’t be scared of clouds. Be prepared, do your homework, identify your concerns and then address those to gain cloud confidence.

Additional reading and related items:

  • Cloud conversations: AWS EBS optimized instances
  • Cloud conversations: AWS EBS, Glacier and S3 overview (Part I)
  • Cloud conversations: AWS EBS, Glacier and S3 overview (Part II)
  • Cloud conversations: AWS Government Cloud (GovCloud)
  • Cloud conversations: Gaining cloud confidence from insights into AWS outages
  • AWS (Amazon) storage gateway, first, second and third impressions
  • Cloud conversations: Public, Private, Hybrid what about Community Clouds?
  • Amazon cloud storage options enhanced with Glacier
  • Amazon Web Services (AWS) and the NetFlix Fix?
  • Cloud conversation, Thanks Gartner for saying what has been said
  • Cloud and Virtual Data Storage Networking via Amazon.com
  • Seven Databases in Seven Weeks
  • www.objectstoragecenter.com
  • Ok, nuff said (for now).

    Cheers
    Gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Cloud conversations: AWS EBS, Glacier and S3 overview (Part II S3)

    Storage I/O industry trends image

    Amazon Web Services (AWS) recently added EBS Optimized support for enhanced bandwidth EC2 instances (read more here). This industry trends and perspective cloud conversation is the second (looking at S3 object storage) in a three-part series companion to the AWS EBS optimized post found here. Part I is here (closer look at EBS) and part III is here (tying it all together).

    AWS image via Amazon.com

    For those not familiar, Simple Storage Services (S3), Glacier and Elastic Block Storage (EBS) are part of the AWS cloud storage portfolio of services. With S3, you specify a region where a bucket is created that will contain objects that can be written, read, listed and deleted. You can create multiple buckets in a region with unlimited number of objects ranging from 1 byte to 5 Tbytes in size per bucket. Each object has a unique, user or developer assigned access key. In addition to indicating which AWS region, S3 buckets and objects are provisioned using different levels of availability, durability, SLA’s and costs (view S3 SLA’s here).

    AWS S3 example image

    Cost will vary depending on the AWS region being used, along if Standard or Reduced Redundancy Storage (RSS) selected. Standard S3 storage is designed with 99.999999999% durability (how many copies exists) and 99.99% availability (how often can it be accessed) on an annual basis capable of two data centers becoming un-available.

    As its name implies, for a lower fee and level of durability, S3 RRS has an annual durability of 99.999% and availability of 99.99% capable of a single data center loss. In the following figure durability is how many copies of data exist spread across different servers and storage systems in various data centers and availability zones.

    cloud storage and object storage across availability zone image

    What would you put in RRS vs. Standard S3 storage?

    Items that need some level of persistence that can be refreshed, recreated or restored from some other place or pool of storage such as thumbnails or static content or read caches. Other items would be those that you could tolerant some downtime while waiting for data to be restored, recovered or rebuilt from elsewhere in exchange for a lower cost.

    Different AWS regions can be chosen for regulatory compliance requirements, performance, SLA’s, cost and redundancy with authentication mechanisms including encryption (SSL and HTTPS) to make sure data is kept secure. Various rights and access can be assigned to objects including making them public or private. In addition to logical data protection (security, identity and access management (IAM), encryption, access control) policies also apply to determine level of durability and availability or accessibility of buckets and objects. Other attributes of buckets and objects include life-cycle management polices and logging of activity to the items. Also part of the objects are meta data containing information about the data being stored shown in a generic example below.

    Cloud storage and object storage spread across availability zones figure

    Access to objects is via standard REST and SOAP interfaces with an Application Programming Interface (API). For example default access is via HTTP along with a Bit Torrent interface with optional support via various gateways, appliances and software tools.

    Cloud storage and object storage IO figure
    Example cloud and object storage access

    The above figure via Cloud and Virtual Data Storage Networking (CRC Press) shows a generic example applicable to AWS services including S3 being accessed in different ways. For example I access my S3 buckets and objects via Jungle Disk (one of the tools I use for data protection) that can also access my Rackspace Cloudfiles data. In the following figure there are examples of some of my S3 buckets and objects used by different applications and tools that I have in various AWS regions.

    Image of AWS S3 usage
    AWS S3 buckets and objects in different regions

    Note that I sometimes use other AWS regions outside the US for testing purposes, for compliance purpose my production, business or personal data is only in the US regions.

    The following figure is a generic example of how cloud and object storage are accessed using different tools, hardware, software and API’s along with gateways. AWS is an example of what is shown in the following figure as a Cloud Service and S3, EBS or Glacier as cloud storage. Common example API commands are also shown which will vary by different vendors, products or solution definitions or implementations. While Amazon S3 API which is REST HTTP based has become an industry de facto standard, there are other API’s including CDMI (Cloud Data Management Interface) developed by SNIA which has gained ISO accreditation.

    Cloud storage and object storage I/O figure
    Cloud and object storage access example via Cloud and Virtual Data Storage Networking

    In addition to using Jungle Disk which manages my AWS keys and objects that it creates, I can also access my S3 objects via the AWS management console and web tools, also via third-party tools including Cyberduck.

    Cyberduck tool.

    Additional reading and related items:

    Cloud conversations: AWS EBS, Glacier and S3 overview (Part I)

    Storage I/O industry trends image

    Amazon Web Services (AWS) recently added EBS Optimized support for enhanced bandwidth EC2 instances (read more here). This industry trends and perspective cloud conversation is the first (looking at EBS) in a three-part series companion to the AWS EBS optimized post found here. Part II is here (closer look at S3) and part III is here (tying it all together).

    AWS image via Amazon.com

    For those not familiar, Simple Storage Services (S3), Glacier and Elastic Block Storage (EBS) are part of the AWS cloud storage portfolio of services. There are several other storage and data related service for little data database (SQL and NoSql based) other offerings include compute, data management, application and networking for different needs shown in the following image.

    AWS services console image
    AWS Services Console via www.amazon.com

    Simple Storage Service (S3) is commonly used in the context of cloud storage and object storage accessed via its S3 API. S3 can be used externally from outside AWS as well as within or via other AWS services. For example with Elastic Cloud Compute (EC2) including via the Amazon Storage Gateway (read more here and about EC2 here). Glacier is the AWS cold or deep storage service for inactive data and is a companion to S3 that you can read more about here.

    S3 is well suited for both big and little data repositories of objects ranging from backup to archive to active video images and much more. In fact if you are using some of the different AaaS or SaaS services including backup or file and video sharing, those may be using S3 as its back-end storage repository. For example NetFlix leverages various AWS capabilities as part of its data and applications infrastructure (read more here).

    AWS basics

    AWS consists of multiple regions that contain multiple availability zones where data and applications are supported from.

    yyyy

    Note that objects stored in a region never leave that region, such as data stored in the EU west never leave Ireland, or data in the US East never leaves Virginia.

    AWS does support the ability for user controlled movement of data between regions for business continuance (BC), high availability (HA) and disaster recovery (DR). Read more here at the AWS Security and Compliance site and in this AWS white paper.

    What about EBS?

    That brings us to Elastic Block Storage (EBS) that is used by EC2 (read more about EC2 and instances here) as storage for cloud and virtual machines or compute instances. In addition to using S3 as a persistent backing store or target for holding snapshots EBS can be thought of as primary storage. You can provision and allocate EBS volumes in the different data centers of the various AWS availability zones. As part of allocating your EBS volume you indicate the type (standard) or provisioned IOP’s or the new EBS Optimized volumes. EBS Optimized volumes enables instances that support the feature to have better IO performance to storage.

    The following image shows an EC2 instance with EBS volumes (standard and provisioned IOPS’s) along with S3 volumes and snapshots. In the following example the instance and volumes are being served via the AWS US East region (Northern Virginia) using availability zone US East 1a. In addition, EBS optimized volumes are shown being used in the example to increase bandwidth or throughput performance between storage and the compute instance.

    xxxxxxx

    Using the above as a basis, you can build on that to leverage multiple availability zones or regions for HA, BC and DR combined with application, network load balancing and other capabilities. Note that EBS volumes are protected for durability by being spread across different servers and storage in an availability zone. Additional protection is provided by using snapshots combined with S3. Additional BC and DR or HA protection can be accomplished by replicating data across availability zones.

    SQL applications using cloud and object storage services

    The above is an example of tying various components and services together. For example using different AWS availability zones, instances, EBS, S3 and other tools including those from third parties. Here is a link to a free chapter download from Cloud and Virtual Data Storage Networking (CRC Press) pertaining to data protection, BC and DR (available at Amazon here and Kindle here). In addition here is an AWS white paper on using their services for BC, HA and DR.

    EBS volumes are created ranging in size from 1GByte to 1Tbyte in space capacity with multiple volumes being mapped or attached to an EC2 instances. EBS volumes appear as a virtual disk drive for block storage. From the EC2 instance and guest operating system you can mount, format and use the EBS volumes as any other block disk drive with your favorite tools and file systems. In addition to space capacity, EBS volumes are also provisioned with standard IO (e.g. disk based) performance or high performance Provisioned IOPS (e.g. SSD) for thousands of IOPS per instance. AWS states that a standard EBS volume should support about 100 IOP’s on average, with about 2,000 IOPS for a provisioned IOP volume. Need more than 2,000 IOPS, then the AWS recommendation is to use multiple IOP provisioned volumes with data spread across those. Following is an example of AWS EBS volumes seen via the EC2 management interface.

    Image of mapping AWS EBS to ECS instance
    AWS EC2 and EBS configuration status

    Note that there is a 10 to 1 ratio of space capacity to IOP’s being provisioned. If you try to play a game of 1,000 IOPS provisioned on a 10GByte EBS volume to keep your costs down you are out of luck. Thus to get 1,000 IOPS’s you would need to allocate at least a 100GByte EBS volume of which you will be billed for the actual space used on a monthly pro-rated basis. The following is an example of provisioning an AWS EBS volume using provisioned IOPS in the US East region in the 1a availability zone.

    Image of AWS EBS provisioned IOPs
    Provisioning IOPS with EBS volume

    Standard and Provisioned IOPS EBS volumes

    Standard EBS volumes are good for boot images or other application usage that are not IO performance intensive. For database or other active applications where more performance is needed, then EBS Provisioned IOPS volumes are your option. Note that the provisioned IOP rate is persistent for the specific volume during its life. Thus if you set it and forget it including not using it without turning it off, you will be billed for provisioning it.

    Additional reading and related items:

  • Cloud conversations: AWS EBS optimized instances
  • Cloud conversations: AWS EBS, Glacier and S3 overview (Part II S3)
  • Cloud conversations: AWS EBS, Glacier and S3 overview (Part III)
  • Cloud conversations: AWS Government Cloud (GovCloud)
  • Cloud conversations: Gaining cloud confidence from insights into AWS outages
  • AWS (Amazon) storage gateway, first, second and third impressions
  • Cloud conversations: Public, Private, Hybrid what about Community Clouds?
  • Amazon cloud storage options enhanced with Glacier
  • Amazon Web Services (AWS) and the NetFlix Fix?
  • Cloud conversation, Thanks Gartner for saying what has been said
  • Cloud and Virtual Data Storage Networking via Amazon.com
  • Seven Databases in Seven Weeks
  • www.objectstoragecenter.com
  • Continue reading part II (closer look at S3) here and part III (tying it all together) here.

    Ok, nuff said (for now)

    Cheers
    Gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved