Big Files Lots of Little File Processing Benchmarking with Vdbench

Big Files Lots of Little File Processing Benchmarking with Vdbench


server storage data infrastructure i/o File Processing Benchmarking with Vdbench

Updated 2/10/2018

Need to test a server, storage I/O networking, hardware, software, services, cloud, virtual, physical or other environment that is either doing some form of file processing, or, that you simply want to have some extra workload running in the background for what ever reason? An option is File Processing Benchmarking with Vdbench.

I/O performance

Getting Started


Here’s a quick and relatively easy way to do it with Vdbench (Free from Oracle). Granted there are other tools, both for free and for fee that can similar things, however we will leave those for another day and post. Here’s the con to this approach, there is no Uui Gui like what you have available with some other tools Here’s the pro to this approach, its free, flexible and limited by your creative, amount of storage space, server memory and I/O capacity.

If you need a background on Vdbench and benchmarking, check out the series of related posts here (e.g. www.storageio.com/performance).

Get and Install the Vdbench Bits and Bytes


If you do not already have Vdbench installed, get a copy from the Oracle or Source Forge site (now points to Oracle here).

Vdbench is free, you simply sign-up and accept the free license, select the version down load (it is a single, common distribution for all OS) the bits as well as documentation.

Installation particular on Windows is really easy, basically follow the instructions in the documentation by copying the contents of the download folder to a specified directory, set up any environment variables, and make sure that you have Java installed.

Here is a hint and tip for Windows Servers, if you get an error message about counters, open a command prompt with Administrator rights, and type the command:

$ lodctr /r


The above command will reset your I/O counters. Note however that command will also overwrite counters if enabled so only use it if you have to.

Likewise *nix install is also easy, copy the files, make sure to copy the applicable *nix shell script (they are in the download folder), and verify Java is installed and working.

You can do a vdbench -t (windows) or ./vdbench -t (*nix) to verify that it is working.

Vdbench File Processing

There are many options with Vdbench as it has a very robust command and scripting language including ability to set up for loops among other things. We are only going to touch the surface here using its file processing capabilities. Likewise, Vdbench can run from a single server accessing multiple storage systems or file systems, as well as running from multiple servers to a single file system. For simplicity, we will stick with the basics in the following examples to exercise a local file system. The limits on the number of files and file size are limited by server memory and storage space.

You can specify number and depth of directories to put files into for processing. One of the parameters is the anchor point for the file processing, in the following examples =S:\SIOTEMP\FS1 is used as the anchor point. Other parameters include the I/O size, percent reads, number of threads, run time and sample interval as well as output folder name for the result files. Note that unlike some tools, Vdbench does not create a single file of results, rather a folder with several files including summary, totals, parameters, histograms, CSV among others.


Simple Vdbench File Processing Commands

For flexibility and ease of use I put the following three Vdbench commands into a simple text file that is then called with parameters on the command line.
fsd=fsd1,anchor=!fanchor,depth=!dirdep,width=!dirwid,files=!numfiles,size=!filesize

fwd=fwd1,fsd=fsd1,rdpct=!filrdpct,xfersize=!fxfersize,fileselect=random,fileio=random,threads=!thrds

rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=!etime,interval=!itime

Simple Vdbench script

# SIO_vdbench_filesystest.txt
#
# Example Vdbench script for file processing
#
# fanchor = file system place where directories and files will be created
# dirwid = how wide should the directories be (e.g. how many directories wide)
# numfiles = how many files per directory
# filesize = size in in k, m, g e.g. 16k = 16KBytes
# fxfersize = file I/O transfer size in kbytes
# thrds = how many threads or workers
# etime = how long to run in minutes (m) or hours (h)
# itime = interval sample time e.g. 30 seconds
# dirdep = how deep the directory tree
# filrdpct = percent of reads e.g. 90 = 90 percent reads
# -p processnumber = optional specify a process number, only needed if running multiple vdbenchs at same time, number should be unique
# -o output file that describes what being done and some config info
#
# Sample command line shown for Windows, for *nix add ./
#
# The real Vdbench script with command line parameters indicated by !=
#

fsd=fsd1,anchor=!fanchor,depth=!dirdep,width=!dirwid,files=!numfiles,size=!filesize

fwd=fwd1,fsd=fsd1,rdpct=!filrdpct,xfersize=!fxfersize,fileselect=random,fileio=random,threads=!thrds

rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=!etime,interval=!itime

Big Files Processing Script


With the above script file defined, for Big Files I specify a command line such as the following.
$ vdbench -f SIO_vdbench_filesystest.txt fanchor=S:\SIOTemp\FS1 dirwid=1 numfiles=60 filesize=5G fxfersize=128k thrds=64 etime=10h itime=30 numdir=1 dirdep=1 filrdpct=90 -p 5576 -o SIOWS2012R220_NOFUZE_5Gx60_BigFiles_64TH_STX1200_020116

Big Files Processing Example Results


The following is one of the result files from the folder of results created via the above command for Big File processing showing totals.


Run totals

21:09:36.001 Starting RD=format_for_rd1

Feb 01, 2016 .Interval. .ReqstdOps.. ...cpu%... read ....read.... ...write.... ..mb/sec... mb/sec .xfer.. ...mkdir... ...rmdir... ..create... ...open.... ...close... ..delete...
rate resp total sys pct rate resp rate resp read write total size rate resp rate resp rate resp rate resp rate resp rate resp
21:23:34.101 avg_2-28 2848.2 2.70 8.8 8.32 0.0 0.0 0.00 2848.2 2.70 0.00 356.0 356.02 131071 0.0 0.00 0.0 0.00 0.1 109176 0.1 0.55 0.1 2006 0.0 0.00

21:23:35.009 Starting RD=rd1; elapsed=36000; fwdrate=max. For loops: None

07:23:35.000 avg_2-1200 4939.5 1.62 18.5 17.3 90.0 4445.8 1.79 493.7 0.07 555.7 61.72 617.44 131071 0.0 0.00 0.0 0.00 0.0 0.00 0.1 0.03 0.1 2.95 0.0 0.00


Lots of Little Files Processing Script


For lots of little files, the following is used.


$ vdbench -f SIO_vdbench_filesystest.txt fanchor=S:\SIOTEMP\FS1 dirwid=64 numfiles=25600 filesize=16k fxfersize=1k thrds=64 etime=10h itime=30 dirdep=1 filrdpct=90 -p 5576 -o SIOWS2012R220_NOFUZE_SmallFiles_64TH_STX1200_020116

Lots of Little Files Processing Example Results


The following is one of the result files from the folder of results created via the above command for Big File processing showing totals.
Run totals

09:17:38.001 Starting RD=format_for_rd1

Feb 02, 2016 .Interval. .ReqstdOps.. ...cpu%... read ....read.... ...write.... ..mb/sec... mb/sec .xfer.. ...mkdir... ...rmdir... ..create... ...open.... ...close... ..delete...
rate resp total sys pct rate resp rate resp read write total size rate resp rate resp rate resp rate resp rate resp rate resp
09:19:48.016 avg_2-5 10138 0.14 75.7 64.6 0.0 0.0 0.00 10138 0.14 0.00 158.4 158.42 16384 0.0 0.00 0.0 0.00 10138 0.65 10138 0.43 10138 0.05 0.0 0.00

09:19:49.000 Starting RD=rd1; elapsed=36000; fwdrate=max. For loops: None

19:19:49.001 avg_2-1200 113049 0.41 67.0 55.0 90.0 101747 0.19 11302 2.42 99.36 11.04 110.40 1023 0.0 0.00 0.0 0.00 0.0 0.00 7065 0.85 7065 1.60 0.0 0.00


Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

The above examples can easily be modified to do different things particular if you read the Vdbench documentation on how to setup multi-host, multi-storage system, multiple job streams to do different types of processing. This means you can benchmark a storage systems, server or converged and hyper-converged platform, or simply put a workload on it as part of other testing. There are even options for handling data footprint reduction such as compression and dedupe.

Ok, nuff said, for now.

Gs

Greg Schulz - Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

How to test your HDD SSD or all flash array (AFA) storage fundamentals

How to test your HDD SSD AFA Hybrid or cloud storage

server storage data infrastructure i/o hdd ssd all flash array afa fundamentals

Updated 2/14/2018

Over at BizTech Magazine I have a new article 4 Ways to Performance Test Your New HDD or SSD that provides a quick guide to verifying or learning what the speed characteristic of your new storage device are capable of.

An out-take from the article used by BizTech as a "tease" is:

These four steps will help you evaluate new storage drives. And … psst … we included the metrics that matter.

Building off the basics, server storage I/O benchmark fundamentals

The four basic steps in the article are:

  • Plan what and how you are going to test (what’s applicable for you)
  • Decide on a benchmarking tool (learn about various tools here)
  • Test the test (find bugs, errors before a long running test)
  • Focus on metrics that matter (what’s important for your environment)

Server Storage I/O performance

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

To some the above (read the full article here) may seem like common sense tips and things everybody should know otoh there are many people who are new to servers storage I/O networking hardware software cloud virtual along with various applications, not to mention different tools.

Thus the above is a refresher for some (e.g. Dejavu) while for others it might be new and revolutionary or simply helpful. Interested in HDD’s, SSD’s as well as other server storage I/O performance along with benchmarking tools, techniques and trends check out the collection of links here (Server and Storage I/O Benchmarking and Performance Resources).

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

I/O, I/O how well do you know good bad ugly server storage I/O iops?

How well do you know good bad ugly I/O iops?

server storage i/o iops activity data infrastructure trends

Updated 2/10/2018

There are many different types of server storage I/O iops associated with various environments, applications and workloads. Some I/Os activity are iops, others are transactions per second (TPS), files or messages per time (hour, minute, second), gets, puts or other operations. The best IO is one you do not have to do.

What about all the cloud, virtual, software defined and legacy based application that still need to do I/O?

If no IO operation is the best IO, then the second best IO is the one that can be done as close to the application and processor as possible with the best locality of reference.

Also keep in mind that aggregation (e.g. consolidation) can cause aggravation (server storage I/O performance bottlenecks).

aggregation causes aggravation
Example of aggregation (consolidation) causing aggravation (server storage i/o blender bottlenecks)

And the third best?

It’s the one that can be done in less time or at least cost or effect to the requesting application, which means moving further down the memory and storage stack.

solving server storage i/o blender and other bottlenecks
Leveraging flash SSD and cache technologies to find and fix server storage I/O bottlenecks

On the other hand, any IOP regardless of if for block, file or object storage that involves some context is better than those without, particular involving metrics that matter (here, here and here [webinar] )

Server Storage I/O optimization and effectiveness

The problem with IO’s is that they are a basic operations to get data into and out of a computer or processor, so there’s no way to avoid all of them, unless you have a very large budget. Even if you have a large budget that can afford an all flash SSD solution, you may still meet bottlenecks or other barriers.

IO’s require CPU or processor time and memory to set up and then process the results as well as IO and networking resources to move data too their destination or retrieve them from where they are stored. While IO’s cannot be eliminated, their impact can be greatly improved or optimized by, among other techniques, doing fewer of them via caching and by grouping reads or writes (pre-fetch, write-behind).

server storage I/O STI and SUT

Think of it this way: Instead of going on multiple errands, sometimes you can group multiple destinations together making for a shorter, more efficient trip. However, that optimization may also mean your drive will take longer. So, sometimes it makes sense to go on a couple of quick, short, low-latency trips instead of one larger one that takes half a day even as it accomplishes many tasks. Of course, how far you have to go on those trips (i.e., their locality) makes a difference about how many you can do in a given amount of time.

Locality of reference (or proximity)

What is locality of reference?

This refers to how close (i.e., its place) data exists to where it is needed (being referenced) for use. For example, the best locality of reference in a computer would be registers in the processor core, ready to be acted on immediately. This would be followed by levels 1, 2, and 3 (L1, L2, and L3) onboard caches, followed by main memory, or DRAM. After that comes solid-state memory typically NAND flash either on PCIe cards or accessible on a direct attached storage (DAS), SAN, or NAS device. 

server storage I/O locality of reference

Even though a PCIe NAND flash card is close to the processor, there still remains the overhead of traversing the PCIe bus and associated drivers. To help offset that impact, PCIe cards use DRAM as cache or buffers for data along with meta or control information to further optimize and improve locality of reference. In other words, this information is used to help with cache hits, cache use, and cache effectiveness vs. simply boosting cache use.

SSD to the rescue?

What can you do the cut the impact of IO’s?

There are many steps one can take, starting with establishing baseline performance and availability metrics.

The metrics that matter include IOP’s, latency, bandwidth, and availability. Then, leverage metrics to gain insight into your application’s performance.

Understand that IO’s are a fact of applications doing work (storing, retrieving, managing data) no matter whether systems are virtual, physical, or running up in the cloud. But it’s important to understand just what a bad IO is, along with its impact on performance. Try to identify those that are bad, and then find and fix the problem, either with software, application, or database changes. Perhaps you need to throw more software caching tools, hypervisors, or hardware at the problem. Hardware may include faster processors with more DRAM and faster internal busses.

Leveraging local PCIe flash SSD cards for caching or as targets is another option.

You may want to use storage systems or appliances that rely on intelligent caching and storage optimization capabilities to help with performance, availability, and capacity.

Where to gain insight into your server storage I/O environment

There are many tools that you can be used to gain insight into your server storage I/O environment across cloud, virtual, software defined and legacy as well as from different layers (e.g. applications, database, file systems, operating systems, hypervisors, server, storage, I/O networking). Many applications along with databases have either built-in or optional tools from their provider, third-party, or via other sources that can give information about work activity being done. Likewise there are tools to dig down deeper into the various data information infrastructure to see what is happening at the various layers as shown in the following figures.

application storage I/O performance
Gaining application and operating system level performance insight via different tools

windows and linux storage I/O performance
Insight and awareness via operating system tools on Windows and Linux

In the above example, Spotlight on Windows (SoW) which you can download for free from Dell here along with Ubuntu utilities are shown, You could also use other tools to look at server storage I/O performance including Windows Perfmon among others.

vmware server storage I/O
Hypervisor performance using VMware ESXi / vsphere built-in tools

vmware server storage I/O performance
Using Visual ESXtop to dig deeper into virtual server storage I/O performance

vmware server storage i/o cache
Gaining insight into virtual server storage I/O cache performance

Wrap up and summary

There are many approaches to address (e.g. find and fix) vs. simply move or mask data center and server storage I/O bottlenecks. Having insight and awareness into how your environment along with applications is important to know to focus resources. Also keep in mind that a bit of flash SSD or DRAM cache in the applicable place can go along way while a lot of cache will also cost you cash. Even if you cant eliminate I/Os, look for ways to decrease their impact on your applications and systems.

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

>Keep in mind: SSD including flash and DRAM among others are in your future, the question is where, when, with what, how much and whose technology or packaging.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Revisiting RAID data protection remains relevant resource links

Revisiting RAID data protection remains relevant and resources

Storage I/O trends

Updated 2/10/2018

RAID data protection remains relevant including erasure codes (EC), local reconstruction codes (LRC) among other technologies. If RAID were really not relevant anymore (e.g. actually dead), why do some people spend so much time trying to convince others that it is dead or to use a different RAID level or enhanced RAID or beyond raid with related advanced approaches?

When you hear RAID, what comes to mind?

A legacy monolithic storage system that supports narrow 4, 5 or 6 drive wide stripe sets or a modern system support dozens of drives in a RAID group with different options?

RAID means many things, likewise there are different implementations (hardware, software, systems, adapters, operating systems) with various functionality, some better than others.

For example, which of the items in the following figure come to mind, or perhaps are new to your RAID vocabulary?

RAID questions

There are Many Variations of RAID Storage some for the enterprise, some for SMB, SOHO or consumer. Some have better performance than others, some have poor performance for example causing extra writes that lead to the perception that all parity based RAID do extra writes (some actually do write gathering and optimization).

Some hardware and software implementations using WBC (write back cache) mirrored or battery backed-BBU along with being able to group writes together in memory (cache) to do full stripe writes. The result can be fewer back-end writes compared to other systems. Hence, not all RAID implementations in either hardware or software are the same. Likewise, just because a RAID definition shows a particular theoretical implementation approach does not mean all vendors have implemented it in that way.

RAID is not a replacement for backup rather part of an overall approach to providing data availability and accessibility.

data protection and durability

What’s the best RAID level? The one that meets YOUR needs

There are different RAID levels and implementations (hardware, software, controller, storage system, operating system, adapter among others) for various environments (enterprise, SME, SMB, SOHO, consumer) supporting primary, secondary, tertiary (backup/data protection, archiving).

RAID comparison
General RAID comparisons

Thus one size or approach does fit all solutions, likewise RAID rules of thumbs or guides need context. Context means that a RAID rule or guide for consumer or SOHO or SMB might be different for enterprise and vise versa, not to mention on the type of storage system, number of drives, drive type and capacity among other factors.

RAID comparison
General basic RAID comparisons

Thus the best RAID level is the one that meets your specific needs in your environment. What is best for one environment and application may be different from what is applicable to your needs.

Key points and RAID considerations include:

· Not all RAID implementations are the same, some are very much alive and evolving while others are in need of a rest or rewrite. So it is not the technology or techniques that are often the problem, rather how it is implemented and then deployed.

· It may not be RAID that is dead, rather the solution that uses it, hence if you think a particular storage system, appliance, product or software is old and dead along with its RAID implementation, then just say that product or vendors solution is dead.

· RAID can be implemented in hardware controllers, adapters or storage systems and appliances as well as via software and those have different features, capabilities or constraints.

· Long or slow drive rebuilds are a reality with larger disk drives and parity-based approaches; however, you have options on how to balance performance, availability, capacity, and economics.

· RAID can be single, dual or multiple parity or mirroring-based.

· Erasure and other coding schemes leverage parity schemes and guess what umbrella parity schemes fall under.

· RAID may not be cool, sexy or a fun topic and technology to talk about, however many trendy tools, solutions and services actually use some form or variation of RAID as part of their basic building blocks. This is an example of using new and old things in new ways to help each other do more without increasing complexity.

·  Even if you are not a fan of RAID and think it is old and dead, at least take a few minutes to learn more about what it is that you do not like to update your dead FUD.

Wait, Isn’t RAID dead?

There is some dead marketing that paints a broad picture that RAID is dead to prop up something new, which in some cases may be a derivative variation of parity RAID.

data dispersal
Data dispersal and durability

RAID rebuild improving
RAID continues to evolve with rapid rebuilds for some systems

Otoh, there are some specific products, technologies, implementations that may be end of life or actually dead. Likewise what might be dead, dying or simply not in vogue are specific RAID implementations or packaging. Certainly there is a lot of buzz around object storage, cloud storage, forward error correction (FEC) and erasure coding including messages of how they cut RAID. Catch is that some object storage solutions are overlayed on top of lower level file systems that do things such as RAID 6, granted they are out of sight, out of mind.

RAID comparison
General RAID parity and erasure code/FEC comparisons

Then there are advanced parity protection schemes which include FEC and erasure codes that while they are not your traditional RAID levels, they have characteristic including chunking or sharding data, spreading it out over multiple devices with multiple parity (or derivatives of parity) protection.

Bottom line is that for some environments, different RAID levels may be more applicable and alive than for others.

Via BizTech – How to Turn Storage Networks into Better Performers

  • Maintain Situational Awareness
  • Design for Performance and Availability
  • Determine Networked Server and Storage Patterns
  • Make Use of Applicable Technologies and Techniques

If RAID is alive, what to do with it?

If you are new to RAID, learn more about the past, present and future keeping mind context. Keeping context in mind means that there are different RAID levels and implementations for various environments. Not all RAID 0, 1, 1/0, 10, 2, 3, 4, 5, 6 or other variations (past, present and emerging) are the same for consumer vs. SOHO vs. SMB vs. SME vs. Enterprise, nor are the usage cases. Some need performance for reads, others for writes, some for high-capacity with low performance using hardware or software. RAID Rules of thumb are ok and useful, however keep them in context to what you are doing as well as using.

What to do next?

Take some time to learn, ask questions including what to use when, where, why and how as well as if an approach or recommendation are applicable to your needs. Check out the following links to read some extra perspectives about RAID and keep in mind, what might apply to enterprise may not be relevant for consumer or SMB and vise versa.

Some advise needed on SSD’s and Raid (Via Spiceworks)
RAID 5 URE Rebuild Means The Sky Is Falling (Via BenchmarkReview)
Double drive failures in a RAID-10 configuration (Via SearchStorage)
Industry Trends and Perspectives: RAID Rebuild Rates (Via StorageIOblog)
RAID, IOPS and IO observations (Via StorageIOBlog)
RAID Relevance Revisited (Via StorageIOBlog)
HDDs Are Still Spinning (Rust Never Sleeps) (Via InfoStor)
When and Where to Use NAND Flash SSD for Virtual Servers (Via TheVirtualizationPractice)
What’s the best way to learn about RAID storage? (Via Spiceworks)
Design considerations for the host local FVP architecture (Via Frank Denneman)
Some basic RAID fundamentals and definitions (Via SearchStorage)
Can RAID extend nand flash SSD life? (Via StorageIOBlog)
I/O Performance Issues and Impacts on Time-Sensitive Applications (Via CMG)
The original RAID white paper (PDF) that while over 20 years old, it provides a basis, foundation and some history by Katz, Gibson, Patterson et al
Storage Interview Series (Via Infortrend)
Different RAID methods (Via RAID Recovery Guide)
A good RAID tutorial (Via TheGeekStuff)
Basics of RAID explained (Via ZDNet)
RAID and IOPs (Via VMware Communities)

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

What is my favorite or preferred RAID level?

That depends, for some things its RAID 1, for others RAID 10 yet for others RAID 4, 5, 6 or DP and yet other situations could be a fit for RAID 0 or erasure codes and FEC. Instead of being focused on just one or two RAID levels as the solution for different problems, I prefer to look at the environment (consumer, SOHO, small or large SMB, SME, enterprise), type of usage (primary or secondary or data protection), performance characteristics, reads, writes, type and number of drives among other factors. What might be a fit for one environment would not be a fit for others, thus my preferred RAID level along with where implemented is the one that meets the given situation. However also keep in mind is tying RAID into part of an overall data protection strategy, remember, RAID is not a replacement for backup.

What this all means

Like other technologies that have been declared dead for years or decades, aka the Zombie technologies (e.g. dead yet still alive) RAID continues to be used while the technologies evolves. There are specific products, implementations or even RAID levels that have faded away, or are declining in some environments, yet alive in others. RAID and its variations are still alive, however how it is used or deployed in conjunction with other technologies also is evolving.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

USENIX FAST (File and Storage Technologies) 2014 Conference Proceedings

Storage I/O trends

USENIX FAST (File and Storage Technologies) 2014 Conference Proceedings

In case you missed it, the 12th annual USENIX conference on File and Storage Technologies (FAST) was recently held in Santa Clara, CA.

USENIX FAST 2014

Big Data, Little Data, Fast SSD and Erasure Code Data

If like me you are interested in FAST related technologies, trends, tools and related research, check out the conference PDF proceedings here.

You can also go here to the USENIX FAST site to view additional information about the sessions along with other download material.

The PDF format proceedings contain over 320 pages of content including some good white papers and information covering RAID and Erasure code, Big Data and Little Data, Cloud and Virtualization, Flash, DRAM, SSD, Filesystem performance, metrics, measurement and related software along with plenty of file system related material.

USENIX FAST 2014 Proceedings Index

USENIX FAST 2014 Proceedings Index part 3

Heads up though, these are not your usual vendor high-level marketing white papers rather what you would expect from a technical conference such as FAST as you can see in the above index with abstracts.

So add the 2014 USENIX FAST Proceedings to your reading list.

Ok, nuff said

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Welcome to the Data Protection Diaries

Updated 1/10/2018

Storage I/O trends

Welcome to the Data Protection Diaries

This is a series of posts about data protection which includes security (logical and physical), backup/restore, business continuance (BC), disaster recovery (DR), business resiliency (BR) along with high availability (HA), archiving and related topic themes, technologies and trends.

Think of data protection like protect, preserve and serve information across cloud, virtual and physical environments spanning traditional servers, storage I/O networking along with mobile (ok, some IoT as well), SOHO/SMB to enterprise.

Getting started, taking a step back

Recently I have done a series of webinars and Google+ hangouts as part of the BackupU initiative brought to you by Dell Software (that’s a disclosure btw ;) ) that are vendor and technology neutral. Instead of the usual vendor product or technology focused seminars and events, these are about getting back to the roots, the fundamentals of what to protect when and why, then decide your options as well as different approaches (e.g. what tools to use when).

In addition over the past year (ok, years) I have also been doing other data protection related events, seminars, workshops, articles, tips, posts across cloud, virtual and physical from SOHO/SMB to enterprise. These are in addition to the other data infrastructure server and storage I/O stuff (e.g. SSD, object storage, software defined, big data, little data, buzzword bingo and others).

Keep in mind that in the data center or information factory everything is not the same as there are different applications, threat risk scenarios, availability and durability among other considerations. In this series like the cloud conversations among others, I’m going to be pulling various data protection themes together hopefully to make it easier for others to find, as well as where I know where to get them.

data protection diaries
Some notes for an upcoming post in this series using my Livescribe about data protection

Data protection topics, trends, technologies and related themes

Here are some more posts to checkout pertaining to data protection trends, technologies and perspectives:

Ok, nuff said (for now)

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Viking SATADIMM: Nand flash SATA SSD in DDR3 DIMM slot?

Storage I/O trends

Today computer and data storage memory vendor Viking announced that SSD vendor Solidfire has deployed their SATADIMM modules in DDR3 DIMM (e.g. Random Access Memory (RAM) main memory) slots of their SF SSD based storage solution.

solidfire ssd storage with satadimm
Solidfire SD solution with SATADIMM via Viking

Nand flash SATA SSD in a DDR3 DIMM slot?

Per Viking, Solidfire uses the SATADIMM as boot devices and cache to complement the normal SSD drives used in their SF SSD storage grid or cluster. For those not familiar, Solidfire SF storage systems or appliances are based on industry standard servers that are populated with SSD devices which in turn are interconnected with other nodes (servers) to create a grid or cluster of SSD performance and space capacity. Thus as nodes are added, more performance, availability and capacity are also increased all of which are accessed via iSCSI. Learn more about Solidfire SD solutions on their website here.

Here is the press release that Viking put out today:

Viking Technology SATADIMM Increases SSD Capacity in SolidFire’s Storage System (Press Release)

Viking Technology’s SATADIMM enables higher total SSD capacity for SolidFire systems, offering cloud infrastructure providers an optimized and more powerful solution

FOOTHILL RANCH, Calif., August 12, 2013 – Viking Technology, an industry leading supplier of Solid State Drives (SSDs), Non-Volatile Dual In-line Memory Module (NVDIMMs), and DRAM, today announced that SolidFire has selected its SATADIMM SSD as both the cache SSD and boot volume SSD for their storage nodes. Viking Technology’s SATADIMM SSD enables SolidFire to offer enhanced products by increasing both the number and the total capacity of SSDs in their solution.

“The Viking SATADIMM gives us an additional SSD within the chassis allowing us to dedicate more drives towards storage capacity, while storing boot and metadata information securely inside the system,” says Adam Carter, Director of Product Management at SolidFire. “Viking’s SATADIMM technology is unique in the market and an important part of our hardware design.”

SATADIMM is an enterprise-class SSD in a Dual In-line Memory Module (DIMM) form factor that resides within any empty DDR3 DIMM socket. The drive enables SSD caching and boot capabilities without using a hard disk drive bay. The integration of Viking Technology’s SATADIMM not only boosts overall system performance but allows SolidFire to minimize potential human errors associated with data center management, such as accidentally removing a boot or cache drive when replacing an adjacent failed drive.

“We are excited to support SolidFire with an optimal solid state solution that delivers increased value to their customers compared to traditional SSDs,” says Adrian Proctor, VP of Marketing, Viking Technology. “SATADIMM is a solid state drive that takes advantage of existing empty DDR3 sockets and provides a valuable increase in both performance and capacity.”

SATADIMM is a 6Gb SATA SSD with capacities up to 512GB. A next generation SAS solution with capacities of 1TB & 2TB will be available early in 2014. For more information, visit our website www.vikingtechnology.com or email us at sales@vikingtechnology.com.

Sales information is available at: www.vikingtechnology.com, via email at sales@vikingtechnology.com or by calling (949) 643-7255.

About Viking Technology Viking Technology is recognized as a leader in NVDIMM technology. Supporting a broad range of memory solutions that bridge DRAM and SSD, Viking delivers solutions to OEMs in the enterprise, high-performance computing, industrial and the telecommunications markets. Viking Technology is a division of Sanmina Corporation (Nasdaq: SANM), a leading Electronics Manufacturing Services (EMS) provider. More information is available at www.vikingtechnology.com.

About SolidFire SolidFire is the market leader in high-performance data storage systems designed for large-scale public and private cloud infrastructure. Leveraging an all-flash scale-out architecture with patented volume-level quality of service (QoS) control, providers can now guarantee storage performance to thousands of applications within a shared infrastructure. In-line data reduction techniques along with system-wide automation are fueling new block-storage services and advancing the way the world uses the cloud.

What’s inside the press release

On the surface this might cause some to jump to the conclusion that the nand flash SSD is being accessed via the fast memory bus normally used for DRAM (e.g. main memory) of a server or storage system controller. For some this might even cause a jump to conclusion that Viking has figured out a way to use nand flash for reads and writes not only via a DDR3 DIMM memory location, as well as doing so with the Serial ATA (SATA) protocol enabling server boot and use by any operating system or hypervisors (e.g. VMware vSphere or ESXi, Microsoft Hyper-V, Xen or KVM among others).

Note for those not familiar or needing a refresh on DRAM, DIMM and related items, here is an excerpt from Chapter 7 (Servers – Physical, Virtual and Software) from my book "The Green and Virtual Data Center" (CRC Press).

7.2.2 Memory

Computers rely on some form of memory ranging from internal registers, local on-board processor Level 1 (L1) and Level 2 (L2) caches, random accessible memory (RAM), non-volatile RAM (NVRAM) or Flash along with external disk storage. Memory, which includes external disk storage, is used for storing operating system software along with associated tools or utilities, application programs and data. Read more of the excerpt here…

Is SATADIMM memory bus nand flash SSD storage?

In short no.

Some vendors or their surrogates might be tempted to spin such a story by masking some details to allow your imagination to run wild a bit. When I saw the press release announcement I reached out to Tinh Ngo (Director Marketing Communications) over at Viking with some questions. I was expecting the usual marketing spin story, dancing around the questions with long answers or simply not responding with anything of substance (or that requires some substance to believe). Again what I found was the opposite and thus want to share with you some of the types of questions and answers.

So what actually is SATADIMM? See for yourself in the following image (click on it to view or Viking site).

Via Viking website, click on image or here to learn more about SATADIMM

Does SATADIMM actually move data via DDR3 and memory bus? No, SATADIMM only draws power from it (yes nand flash does need power when in use contrary to a myth I was told about).

Wait, then how is data moved and how does it get to and through the SATA IO stack (hardware and software)?

Simple, there is a cable connector that attached to the SATADIMM that in turn attached to an internal SATA port. Or using a different connector cable attach the SATADIMM (up to four) to a standard SAS internal port such as on a main board, HBA, RAID or caching adapter.

industry trend

Does that mean that Viking and who ever uses SATADIMM is not actually moving data or implementing SATA via the memory bus and DDR3 DIMM sockets? That would be correct, data movement occurs via cable connection to standard SATA or SAS ports.

Wait, why would I give up a DDR3 DIMM socket in my server that could be used for more DRAM? Great question and one that should be it depends on if you need more DRAM or more nand flash? If you are out of drive slots or PCIe card slots and have enough DRAM for your needs along with available DDR3 slots, you can stuff more nand flash into those locations assuming you have SAS or SATA connectivity.

satadimm
SATADIMM with SATA connector top right via Viking

satadimm sata connector
SATADIMM SATA connector via Viking

satadimm sas connector
SATADIMM SAS (Internal) connector via Viking

Why not just use the onboard USB ports and plug-in some high-capacity USB thumb drives to cut cost? If that is your primary objective it would probably work and I can also think of some other ways to cut cost. However those are also probably not the primary tenants that people looking to deploy something like SATADIMM would be looking for.

What are the storage capacities that can be placed on the SATADIMM? They are available in different sizes up to 400GB for SLC and 480GB for MLC. Viking indicated that there are larger capacities and faster 12Gb SAS interfaces in the works which would be more of a surprise if there were not. Learn more about current product specifications here.

Good questions. Attached are three images that sort of illustrates the connector. As well, why not a USB drive; well, there are customers that put 12 of these in the system (with up to 480GB usable capacity) that equates to roughly an added 5.7TBs inside the box without touching the drive bays (left for mass HDD’s). You will then need to raid/connect) all the SATADIMM via a HBA.

How fast is the SATADIMM and does putting it into a DDR3 slot speed things up or slow them down? Viking has some basic performance information on their site (here). However generally should be the same or similar to reach a SAS or SATA SSD drive, although keep SSD metrics and performance in the proper context. Also keep in mind that the DDR3 DIMM slot is only being used for power and not real data movement.

Is the SATADIMM using 3Gbs or 6Gbs SATA? Good questions, today is 6Gb SATA (remember that SATA can attach to a SAS port however not vise versa). Lets see if Viking responds in the comments with more including RAID support (hardware or software) along with other insight such as UNMAP, TRIM, Advanced Format (AF) 4KByte blocks among other things.

Have I actually tried SATADIMM yet? No, not yet. However would like to give it a test drive and workout if one were to show up on my doorstep along with disclosure and share the results if applicable.

industry trend

Future of nand flash in DRAM DIMM sockets

Keep in mind that someday nand flash will actually seem not only in a Webex or Powerpoint demo preso (e.g. similar to what Diablo Technology is previewing), as well as in real use for example what Micron earlier this year predicted for flash on DDR4 (more DDR3 vs. DDR4 here).

Is SATADIMM the best nand flash SSD approach for every solution or environment? No, however it does give some interesting options for those who are PCIe card, or HDD and SSD drive slot constrained that also have available DDR3 DIMM sockets. As to price, check with Viking, wish I could say tell them Greg from StorageIO sent you for a good value, however not sure what they would say or do.

Related more reading:
How much storage performance do you want vs. need?
Can RAID extend the life of nand flash SSD?
Can we get a side of context with them IOPS and other storage metrics?
SSD & Real Estate: Location, Location, Location
What is the best kind of IO? The one you do not have to do
SSD, flash and DRAM, DejaVu or something new?

Ok, nuff said (for now).

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Can RAID extend the life of nand flash SSD?

Storage I/O trends

Can RAID extend nand flash SSD life?

Imho, the short answer is YES, under some circumstances.

There is a myth and some FUD that RAID (Redundant Array of Independent Disks) can shorten the life durability of nand flash SSD (Solid State Device) vs. HDD (Hard Disk Drives) due to extra IOP’s. The reality is that depending on how configured, RAID level, implementation and other factors, nand flash SSD can be extended as I discuss in this here video.

Video

Nand flash SSD cells and wear

First, there is a myth that nand flash SSD does not have moving parts like hard disk drives (HDD’s) thus do not wear out or break. That is just a myth in that nand flash by its nature wears out with write usage. This is due to how they store data in cells that have a rated number of program erase (P/E) cycles that vary by type of medium. For example, Single Level Cell (SLC) has a longer P/E life duration vs. Multi-Level Cells (MLC) and eMLC that stack multiple cells together.

There are a number of factors that contribute to nand flash wear, also known as duty cycle or durability tied to P/E. For example, some storage systems or controllers do a better job both at the lower level flash translation layer (FTL) in addition to controllers, firmware, caching using DRAM and IO optimization such as write ordering or grouping.

Now what about this RAID and SSD thing?

Ok first as a recap keep in mind that there are many RAID levels along with variations, enhancements and where, or how implemented ranging from software to hardware, adapters to controllers to storage systems.

In the case of RAID 1 or mirroring, just like replication or other one to one or one too many copy operation a write to one device is echoed to another. In the case of RAID 5, data is spread across drives and parity; however, the parity is rotated across all drives in an equal manner.

Some FUD or myths or misunderstandings come into play is that not all RAID 5 implementations as an example are not the same. Some do a better job of buffering or caching data in battery protected mirrored DRAM memory until a full stripe write can occur, or if needed, a partial write.

Another attribute is the chunk or shard size (how much data is sent to each drive member) along with the stripe width (how many drives). Some systems have narrow stripes of say 3+1 or 4+1 or 5+1 while others can be 14+1 or 15+1 or wider. Thus, data can be written across a wider number of drives reducing the P/E consumption or use of a single drive depending on implementation.

How about RAID 6 (dual parity)?

Same thing, it is a matter of how well the implementation is, how the write gathering is done and so forth.

What about RAID wearing out nand flash SSD?

While it is possible that it has or can occur depending on type of RAID implementation, lack of caching or optimization, configuration, type of SSD, RAID level and other things, in general I will say myth busted.

Want some proof?

I could go through a long technical proof point and citing lots of facts, figures, experts and so forth leaving you all silenced and dazed similar to the students listening to Ben Stein in Ferris Buelers day off (Click here to see what I mean) asking “anybody anybody Buleler?

Ben Stein via https://nostagjicmoviesandthings.blogspot.com
Image via nostagjicmoviesandthings.blogspot.com

How about some simple SSD and storage math?

On a very conservative basis, my estimate is that around 250PB of nand flash SSD drives are shipped and installed on a revenue basis attached to or in storage systems and appliances. Combine what Dell + DotHill + EMC + Fujitsu + HDS + HP + IBM (including TMS) + NEC + NetApp + NEC + Oracle among other legacy along with new all flash as well as hybrid vendors (e.g. Cloudbyte, FusionIO (Via their Nexgen acquisition), Kaminario, Greenbytes, Nutanix or Nimble, Purestorage, Starboard or Solidfire, Tegile or Tintri, Violin or Whiptail among others).

It is also a safe assumption based on how customers configure and use those and other storage systems is with some form of RAID. Thus if things were as bad as some researchers were, vendors and their pundits have made them out to be, wouldn’t’t we be hearing of those issues?

Is it just a RAID 5 problem and that RAID 6 magically corrects the problem?

Well, that depends on apples to apples vs. apples to oranges comparisons.

For example if you are using a 14+2 (16 drive) RAID 6 to compare to say a 3+1 (4 drive) RAID 5 that is not a fair comparison. Granted, it is a handy one if you are a vendor that supports wider RAID groups, stripes and ranks vs. those who do not. However also keep in mind that some legacy vendors actually also support wide stripes and RAID groups.

So in some cases the magic is not in the RAID level, rather the implementation or how configured or lack thereof.

Video

Watch this TechTarget produced video recorded live while I was at EMCworld 2013 to learn more.

Otherwise, ok, nuff said (for now).

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

As the platters spin, HDD’s for cloud, virtual and traditional storage environments

HDDs for cloud, virtual and traditional storage environments

Storage I/O trends

Updated 1/23/2018

As the platters spin is a follow-up to a recent series of posts on Hard Disk Drives (HDD’s) along with some posts about How Many IOPS HDD’s can do.

HDD and storage trends and directions include among others

HDD’s will continue to be declared dead into the next decade, just as they have been for over a decade, meanwhile they are being enhanced, continued to be used in evolving roles.

hdd and ssd

SSD will continue to coexist with HDD, either as separate or converged HHDD’s. Where, where and how they are used will also continue to evolve. High IO (IOPS) or low latency activity will continue to move to some form of nand flash SSD (PCM around the corner), while storage capacity including some of which has been on tape stays on disk. Instead of more HDD capacity in a server, it moves to a SAN or NAS or to a cloud or service provider. This includes for backup/restore, BC, DR, archive and online reference or what some call active archives.

The need for storage spindle speed and more

The need for faster revolutions per minute (RPM’s) performance of drives (e.g. platter spin speed) is being replaced by SSD and more robust smaller form factor (SFF) drives. For example, some of today’s 2.5” SFF 10,000 RPM (e.g. 10K) SAS HDD’s can do as well or better than their larger 3.5” 15K predecessors can for both IOPS and bandwidth. This is also an example where the RPM speed of a drive may not be the only determination for performance as it has been in the past.


Performance comparison of four different drive types, click to view larger image.

The need for storage space capacity and areal density

In terms of storage enhancements, watch for the appearance of Shingled Magnetic Recording (SMR) enabled HDD’s to help further boost the space capacity in the same footprint. Using SMR HDD manufactures can put more bits (e.g. areal density) into the same physical space on a platter.


Traditional vs. SMR to increase storage areal density capacity

The generic idea with SMR is to increase areal density (how many bits can be safely stored per square inch) of data placed on spinning disk platter media. In the above image on the left is a representative example of how traditional magnetic disk media lays down tracks next to each other. With traditional magnetic recording approaches, the tracks are placed as close together as possible for the write heads to safely write data.

With new recording formats such as SMR along with improvements to read/write heads, the tracks can be more closely grouped together in an overlapping way. This overlapping way (used in a generic sense) is like how the shingles on a roof overlap, hence Shingled Magnetic Recording. Other magnetic recording or storage enhancements in the works include Heat Assisted Magnetic Recording (HAMR) and Helium filed drives. Thus, there is still plenty of bits and bytes room for growth in HDD’s well into the next decade to co-exist and complement SSD’s.

DIF and AF (Advanced Format), or software defining the drives

Another evolving storage feature that ties into HDD’s is Data Integrity Feature (DIF) that has a couple of different types. Depending on which type of DIF (0, 1, 2, and 3) is used; there can be added data integrity checks from the application to the storage medium or drive beyond normal functionality. Here is something to keep in mind, as there are different types or levels of DIF, when somebody says they support or need DIF, ask them which type or level as well as why.

Are you familiar with Advanced Format (AF)? If not you should be. Traditionally outside of special formats for some operating systems or controllers, that standard open system data storage block, page or sector has been 512 bytes. This has served well in the past, however; with the advent of TByte and larger sized drives, a new mechanism is needed. The need is to support both larger average data allocation sizes from operating systems and storage systems, as well as to cut the overhead of managing all the small sectors. Operating systems and file systems have added new partitioning features such as GUID Partition Table (GPT) to support 1TB and larger SSD, HDD and storage system LUN’s.

These enhancements are enabling larger devices to be used in place of traditional Master Boot Record (MBR) or other operating system partition and allocation schemes. The next step, however, is to teach operating systems, file systems, and hypervisors along with their associated tools or drives how to work with 4,096 byte or 4 Kbyte sectors. The advantage will be to cut the overhead of tracking all of those smaller sectors or file system extents and clusters. Today many HDD’s support AF however by default may have 512-byte emulation mode enabled due to lack of operating system or other support.

Intelligent Power Management, moving beyond drive spin down

Intelligent Power Management (IPM) is a collection of techniques that can be applied to vary the amount of energy consumed by a drive, controller or processor to do its work. These include in the case of an HDD slowing the spin rate of platters, however, keep in mind that mass in motion tends to stay in motion. This means that HDD’s once up and spinning do not need as much relative power as they function like a flywheel. Where their power draw comes in is during reading and write, in part to the movement of reading/write heads, however also for running the processors and electronics that control the device. Another big power consumer is when drives spin up, thus if they can be kept moving, however at a lower rate, along with disabling energy used by read/write heads and their electronics, you can see a drop in power consumption. Btw, a current generation 3.5” 4TB 6Gbs SATA HDD consumes about 6-7 watts of power while in active use, or less when in idle mode. Likewise a current generation high performance 2.5” 1.2TB HDD consumes about 4.8 watts of energy, a far cry from the 12-16 plus watts of energy some use as HDD fud.

Hybrid Hard Disk Drives (HHDD) and Solid State Hybrid Drives (SSDHD)

Hybrid HDD’s (HHDD’s) also known as Solid State Hybrid Drives (SSHD) have been around for a while and if you have read my earlier posts, you know that I have been a user and fan of them for several years. However one of the drawbacks of the HHDD’s has been lack of write acceleration, (e.g. they only optimize for reads) with some models. Current and emerging HDDD’s are appearing with a mix of nand flash SLC (used in earlier versions), MLC and eMLC along with DRAM while enabling write optimization. There are also more drive options available as HHDD’s from different manufactures both for desktop and enterprise class scenarios.

The challenge with HHDD’s is that many vendors either do not understand how they fit and compliment their tiering or storage management software tools or simply do not see the value proposition. I have had vendors and others tell me that the HHDD’s don’t make sense as they are too simple, how can they be a fit without requiring tiering software, controllers, SSD and HDD’s to be viable?

Storage I/O trends

I also see a trend similar to when the desktop high-capacity SATA drives appeared for enterprise-class storage systems in the early 2000s. Some of the same people did not see where or how a desktop class product or technology could ever be used in an enterprise solution.

Hmm, hey wait a minute, I seem to recall similar thinking when SCSI drives appeared in the early 90s, funny how some things do not change, DejaVu anybody?

Does that mean HHDD’s will be used everywhere?

Not necessarily, however, there will be places where they make sense, others where either an HDD or SSD will be more practical.

Networking with your server and storage

Drive native interfaces near-term will remain as 6Gbs (going to 12Gbs) SAS and SATA with some FC (you might still find a parallel SCSI drive out there). Likewise, with bridges or interface cards, those drives may appear as USB or something else.

What about SCSI over PCIe, will that catch on as a drive interface? Tough to say however I am sure we can find some people who will gladly try to convince you of that. FC based drives operating at 4Gbs FC (4GFC) are still being used for some environments however most activity is shifting over to SAS and SATA. SAS and SATA are switching over from 3Gbs to 6Gbs with 12Gbs SAS on the roadmaps.

So which drive is best for you?

That depends; do you need bandwidth or IOPS, low latency or high capacity, small low profile thin form factor or feature functions? Do you need a hybrid or all SSD or a self-encrypting device (SED) also known as Instant Secure Erase (ISE), these are among your various options.

Disk drives

Why the storage diversity?

Simple, some are legacy soon to be replaced and disposed of while others are newer. I also have a collection so to speak that get used for various testing, research, learning and trying things out. Click here and here to read about some of the ways I use various drives in my VMware environment including creating Raw Device Mapped (RDM) local SAS and SATA devices.

Other capabilities and functionality existing or being added to HDD’s include RAID and data copy assist; securely erase, self-encrypting, vibration dampening among other abilities for supporting dense data environments.

Where To Learn More

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Do not judge a drive only by its interface, space capacity, cost or RPM alone. Look under the cover a bit to see what is inside in terms of functionality, performance, and reliability among other options to fit your needs. After all, in the data center or information factory not everything is the same.

From a marketing and fun to talk about new technology perspective, HDD’s might be dead for some. The reality is that they are very much alive in physical, virtual and cloud environments, granted their role is changing.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Seagate provides proof of life: Enterprise HDD enhancements

Storage I/O trends

Proof of life: Enterprise Hard Disk Drives (HDD’s) are enhanced

Last week while hard disk drive (HDD) competitor Western Digital (WD) was announcing yet another (Velobit) in a string of acquisitions ( e.g. earlier included Stec, Arkeia) and investments (Skyera), Seagate announced new enterprise class HDD’s to their portfolio. Note that it was only two years ago that WD acquired Hitachi Global Storage Technologies (HGST) the disk drive manufacturing business of Hitachi Ltd. (not to be confused with HDS).

Seagate

Similar to WD expanding their presence in the growing nand flash SSD market, Seagate also in May of this year extended their existing enterprise class SSD portfolio. These enhancements included new drives with 12Gbs SAS interface, along with a partnership (and investment) with PCIe flash card startup vendor Virident. Other PCIe flash SSD card vendors (manufacturers and OEMs) include Cisco, Dell, EMC, FusionIO, HP, IBM, LSI, Micron, NetApp and Oracle among others.

These new Seagate enterprise class HDD’s are designed for use in cloud and traditional data center servers and storage systems. A month or two ago Seagate also announced new ultra-thin (5mm) client (aka desktop) class HDD’s along with a 3.5 inch 4TB video optimized HDD. The video optimized HDD’s are intended for Digital Video Recorders (DVR’s), Set Top Boxes (STB’s) or other similar applications.

What was announced?

Specifically what Seagate announced were two enterprise class drives, one for performance (e.g. 1.2TB 10K) and the other for space capacity (e.g. 4TB).

 

Enterprise High Performance 10K.7 (aka formerly known as Savio)

Enterprise Terascale (aka formerly known as constellation)

Class/category

Enterprise / High Performance

Enterprise High Capacity

Form factor

2.5” Small Form Factor (SFF)

3.5”

Interface

6Gbs SAS

6Gbs SATA

Space capacity

1,200GB (1.2TB)

4TB

RPM speed

10,000

5,900

Average seek

2.9 ms

12 ms

DRAM cache

64MB

64MB

Power idle / operating

4.8 watts

5.49 / 6.49 watts

Intelligent Power Management (IPM)

Yes – Seagate PowerChoice

Yes – Seagate PowerChoice

Warranty

Limited 5 years

Limited 3 years

Instant Secure Erase (ISE)

Yes

Optional

Other features

RAID Rebuild assist, Self-Encrypting Device (SED)

Advanced Format (AF) 4K block in addition to standard 512 byte sectors

Use cases

Replace earlier generation 3.5” 15K SAS and Fibre Channel HDD’s for higher performance applications including file systems, databases where SSD are not practical fit.

Backup and data protection, replication, copy operations for erasure coding and data dispersal, active in dormant archives, unstructured NAS, big data, data warehouse, cloud and object storage.

Note the Seagate Terascale has a disk rotation speed of 5,900 (5.9K RPM) which is not a typo given the more traditional 5.4K RPM drives. This slight increase in performance from 5.4K to 5.9K should give when combined with other enhancements (e.g. firmware, electronics) to boost performance for higher capacity workloads.

Let us watch for some performance numbers to be published by Seagate or others. Note that I have not had a chance to try these new drives yet, however look forward to getting my hands on them (among others) sometime in the future for a test drive to add to the growing list found here (hey Seagate and WD, that’s a hint ;) ).

What this all means?

Storage I/O trends

Wait, weren’t HDD’s supposed to be dead or dying?

Some people just like new and emerging things and thus will declare anything existing or that they have lost interest in (or their jobs need it) as old, boring or dead.

For example if you listen to some, they may say nand flash SSD are also dead or dying. For what it is worth, imho nand flash-based SSDs still have a bright future in front of them even with new technologies emerging as they will take time to mature (read more here or listen here).

However, the reality is that for at least the next decade, like them or not, HDD’s will continue to play a role that is also evolving. Thus, these and other improvements with HDD’s will be needed until current nand flash or emerging PCM (Phase Change Memory) among other forms of SSD are capable of picking up all the storage workloads in a cost-effective way.

Btw, yes, I am also a fan and user of nand flash-based SSD’s, in addition to HDD’s and see roles for both as being viable complementing each other for traditional, virtual and cloud environments.

In short, HDD’s will keep spinning (pun intended) for some time granted their roles and usage will also evolve similar to that of tape summit resources.

Storage I/O trends

With this announcement by Seagate along with other enhancements from WD show that the HDD will not only see its 60th birthday, (and here), it will probably also easily see its 70th and not from the comfort of a computer museum. The reason is that there is yet another wave of HDD improvements just around the corner including Shingled Magnetic Recording (SMR) (more info here) along with Heat Assisted Magnetic Recording (HAMR) among others. Watch for more on HAMR and SMR in future posts. With these and other enhancements, we should be able to see a return to the rapid density improvements with HDD’s observed during the mid to late 2000 era when Perpendicular recording became available.

What is up with this ISE stuff is that the same as what Xiotech (e.g. XIO) had?

Is this the same technology that Xiotech (now Xio) referred to the ISE the answer is no. This Seagate ISE is for fast secure erase of data on disk. The benefit of Instant Secure Erase (ISE) is to cut from hours or days the time required to erase a drive for secure disposal to seconds (or less). For those environments that already factor drives erase time as part of those overall costs, this can increase the useful time in service to help improve TCO and ROI.

Wait a minute, aren’t slower RPM’s supposed to be lower performance?

Some of you might be wondering or asking the question of wait, how can a 10,000 revolution per minute (10K RPM) HDD be considered fast vs. a 15K HDD, let alone SSD?

Storage I/O trends

There is a trend occurring with HDD’s that the old rules of IOPS or performance being tied directly to the size and rotational speed (RPM’s) of drives, along with their interfaces. This comes down to being careful to judge a book or in this case a drive by its cover. While RPM’s do have an impact on performance, new generation drives at 10K such as some 2.5” models are delivering performance equal to or better than earlier generation 3.5” 15K device’s.

Likewise, there are similar improvements with 5.4K devices vs. previous generation 7.2K models. As you will see in some of the results found here, not all the old rules of thumbs when it comes to drive performance are still valid. Likewise, keep those metrics that matter in the proper context.


Click on above image to see various performance results

For example as seen in the results (above), the more DRAM or DDR cache on the drives has a positive impact on sequential reads which can be good news if that is what your applications need. Thus, do your homework and avoid judging a device simply by its RPM, interface or form factor.

Other considerations, temperature and vibration

Another consideration is that with increased density of more drives being placed in a given amount of space, some of which may not have the best climate controls, humidity and vibration are concerns. Thus, the importance of drives having vibration dampening or safeguards to keep up performance are important. Likewise, even though drive heads and platters are sealed, there are also considerations that need to be taken care of for humidity in data center or cloud service providers in hot environments near the equator.

If this is not connecting with you, think about how close parts of Southeast Asia and the India subcontinent are to the equator along with the rapid growth and low-cost focus occurring there. Your data center might be temperature and humidity controlled, however others who very focused on cost cutting may not be as concerned with normal facilities best practices.

What type of drives should be used for cloud, virtual and traditional storage?

Good question and one where the answer should be it depends upon what you are trying or need to do (e.g. see previous posts here or here and here (via Seagate)).For example here are some tips for big data storage and storage making decisions in general.

Disclosure

Seagate recently invited me along with several other industry analysts to their cloud storage analyst summit in San Francisco where they covered roundtrip coach airfare, lodging, airport transfers and a nice dinner at the Epic Roast house.

hdd image

I also have received in the past a couple of Momentus XT HHDD (aka SSHD) from Seagate. These are in addition to those that I bought including various Seagate, WD along with HGST, Fujitsu, Toshiba and Samsung (SSD and HDD’s) that I use for various things.

Ok, nuff said (for now).

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Can we get a side of context with them IOPS server storage metrics?

Can we get a side of context with them server storage metrics?

Whats the best server storage I/O network metric or benchmark? It depends as there needs to be some context with them IOPS and other server storage I/O metrics that matter.

There is an old saying that the best I/O (Input/Output) is the one that you do not have to do.

In the meantime, let’s get a side of some context with them IOPS from vendors, marketers and their pundits who are tossing them around for server, storage and IO metrics that matter.

Expanding the conversation, the need for more context

The good news is that people are beginning to discuss storage beyond space capacity and cost per GByte, TByte or PByte for both DRAM or nand flash Solid State Devices (SSD), Hard Disk Drives (HDD) along with Hybrid HDD (HHDD) and Solid State Hybrid Drive (SSHD) based solutions. This applies to traditional enterprise or SMB IT data center with physical, virtual or cloud based infrastructures.

hdd and ssd iops

This is good because it expands the conversation beyond just cost for space capacity into other aspects including performance (IOPS, latency, bandwidth) for various workload scenarios along with availability, energy effective and management.

Adding a side of context

The catch is that IOPS while part of the equation are just one aspect of performance and by themselves without context, may have little meaning if not misleading in some situations.

Granted it can be entertaining, fun to talk about or simply make good press copy for a million IOPS. IOPS vary in size depending on the type of work being done, not to mention reads or writes, random and sequential which also have a bearing on data throughout or bandwidth (Mbytes per second) along with response time. Not to mention block, file, object or blob as well as table.

However, are those million IOP’s applicable to your environment or needs?

Likewise, what do those million or more IOPS represent about type of work being done? For example, are they small 64 byte or large 64 Kbyte sized, random or sequential, cached reads or lazy writes (deferred or buffered) on a SSD or HDD?

How about the response time or latency for achieving them IOPS?

In other words, what is the context of those metrics and why do they matter?

storage i/o iops
Click on image to view more metrics that matter including IOP’s for HDD and SSD’s

Metrics that matter give context for example IO sizes closer to what your real needs are, reads and writes, mixed workloads, random or sequential, sustained or bursty, in other words, real world reflective.

As with any benchmark take them with a grain (or more) of salt, they key is use them as an indicator then align to your needs. The tool or technology should work for you, not the other way around.

Here are some examples of context that can be added to help make IOP’s and other metrics matter:

  • What is the IOP size, are they 512 byte (or smaller) vs. 4K bytes (or larger)?
  • Are they reads, writes, random, sequential or mixed and what percentage?
  • How was the storage configured including RAID, replication, erasure or dispersal codes?
  • Then there is the latency or response time and IO queue depths for the given number of IOPS.
  • Let us not forget if the storage systems (and servers) were busy with other work or not.
  • If there is a cost per IOP, is that list price or discount (hint, if discount start negotiations from there)
  • What was the number of threads or workers, along with how many servers?
  • What tool was used, its configuration, as well as raw or cooked (aka file system) IO?
  • Was the IOP’s number with one worker or multiple workers on a single or multiple servers?
  • Did the IOP’s number come from a single storage system or total of multiple systems?
  • Fast storage needs fast serves and networks, what was their configuration?
  • Was the performance a short burst, or long sustained period?
  • What was the size of the test data used; did it all fit into cache?
  • Were short stroking for IOPS or long stroking for bandwidth techniques used?
  • Data footprint reduction (DFR) techniques (thin provisioned, compression or dedupe) used?
  • Were write data committed synchronously to storage, or deferred (aka lazy writes used)?

The above are just a sampling and not all may be relevant to your particular needs, however they help to put IOP’s into more contexts. Another consideration around IOPS are the configuration of the environment, from an actual running application using some measurement tool, or are they generated from a workload tool such as IOmeter, IOrate, VDbench among others.

Sure, there are more contexts and information that would be interesting as well, however learning to walk before running will help prevent falling down.

Storage I/O trends

Does size or age of vendors make a difference when it comes to context?

Some vendors are doing a good job of going for out of this world record-setting marketing hero numbers.

Meanwhile other vendors are doing a good job of adding context to their IOP or response time or bandwidth among other metrics that matter. There is a mix of startup and established that give context with their IOP’s or other metrics, likewise size or age does not seem to matter for those who lack context.

Some vendors may not offer metrics or information publicly, so fine, go under NDA to learn more and see if the results are applicable to your environments.

Likewise, if they do not want to provide the context, then ask some tough yet fair questions to decide if their solution is applicable for your needs.

Storage I/O trends

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

What this means is let us start putting and asking for metrics that matter such as IOP’s with context.

If you have a great IOP metric, if you want it to matter than include some context such as what size (e.g. 4K, 8K, 16K, 32K, etc.), percentage of reads vs. writes, latency or response time, random or sequential.

IMHO the most interesting or applicable metrics that matter are those relevant to your environment and application. For example if your main application that needs SSD does about 75% reads (random) and 25% writes (sequential) with an average size of 32K, while fun to hear about, how relevant is a million 64 byte read IOPS? Likewise when looking at IOPS, pay attention to the latency, particular if SSD or performance is your main concern.

Get in the habit of asking or telling vendors or their surrogates to provide some context with them metrics if you want them to matter.

So how about some context around them IOP’s (or latency and bandwidth or availability for that matter)?

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Part II: How many IOPS can a HDD HHDD SSD do with VMware?

How many IOPS can a HDD HHDD SSD do with VMware?

server storage data infrastructure i/o iop hdd ssd trends

Updated 2/10/2018

This is the second post of a two-part series looking at storage performance, specifically in the context of drive or device (e.g. mediums) characteristics of How many IOPS can a HDD HHDD SSD do with VMware. In the first post the focus was around putting some context around drive or device performance with the second part looking at some workload characteristics (e.g. benchmarks).

A common question is how many IOPS (IO Operations Per Second) can a storage device or system do?

The answer is or should be it depends.

Here are some examples to give you some more insight.

For example, the following shows how IOPS vary by changing the percent of reads, writes, random and sequential for a 4K (4,096 bytes or 4 KBytes) IO size with each test step (4 minutes each).

IO Size for test
Workload Pattern of test
Avg. Resp (R+W) ms
Avg. IOP Sec (R+W)
Bandwidth KB Sec (R+W)
4KB
100% Seq 100% Read
0.0
29,736
118,944
4KB
60% Seq 100% Read
4.2
236
947
4KB
30% Seq 100% Read
7.1
140
563
4KB
0% Seq 100% Read
10.0
100
400
4KB
100% Seq 60% Read
3.4
293
1,174
4KB
60% Seq 60% Read
7.2
138
554
4KB
30% Seq 60% Read
9.1
109
439
4KB
0% Seq 60% Read
10.9
91
366
4KB
100% Seq 30% Read
5.9
168
675
4KB
60% Seq 30% Read
9.1
109
439
4KB
30% Seq 30% Read
10.7
93
373
4KB
0% Seq 30% Read
11.5
86
346
4KB
100% Seq 0% Read
8.4
118
474
4KB
60% Seq 0% Read
13.0
76
307
4KB
30% Seq 0% Read
11.6
86
344
4KB
0% Seq 0% Read
12.1
82
330

Dell/Western Digital (WD) 1TB 7200 RPM SATA HDD (Raw IO) thread count 1 4K IO size

In the above example the drive is a 1TB 7200 RPM 3.5 inch Dell (Western Digital) 3Gb SATA device doing raw (non file system) IO. Note the high IOP rate with 100 percent sequential reads and a small IO size which might be a result of locality of reference due to drive level cache or buffering.

Some drives have larger buffers than others from a couple to 16MB (or more) of DRAM that can be used for read ahead caching. Note that this level of cache is independent of a storage system, RAID adapter or controller or other forms and levels of buffering.

Does this mean you can expect or plan on getting those levels of performance?

I would not make that assumption, and thus this serves as an example of using metrics like these in the proper context.

Building off of the previous example, the following is using the same drive however with a 16K IO size.

IO Size for test
Workload Pattern of test
Avg. Resp (R+W) ms
Avg. IOP Sec (R+W)
Bandwidth KB Sec (R+W)
16KB
100% Seq 100% Read
0.1
7,658
122,537
16KB
60% Seq 100% Read
4.7
210
3,370
16KB
30% Seq 100% Read
7.7
130
2,080
16KB
0% Seq 100% Read
10.1
98
1,580
16KB
100% Seq 60% Read
3.5
282
4,522
16KB
60% Seq 60% Read
7.7
130
2,090
16KB
30% Seq 60% Read
9.3
107
1,715
16KB
0% Seq 60% Read
11.1
90
1,443
16KB
100% Seq 30% Read
6.0
165
2,644
16KB
60% Seq 30% Read
9.2
109
1,745
16KB
30% Seq 30% Read
11.0
90
1,450
16KB
0% Seq 30% Read
11.7
85
1,364
16KB
100% Seq 0% Read
8.5
117
1,874
16KB
60% Seq 0% Read
10.9
92
1,472
16KB
30% Seq 0% Read
11.8
84
1,353
16KB
0% Seq 0% Read
12.2
81
1,310

Dell/Western Digital (WD) 1TB 7200 RPM SATA HDD (Raw IO) thread count 1 16K IO size

The previous two examples are excerpts of a series of workload simulation tests (ok, you can call them benchmarks) that I have done to collect information, as well as try some different things out.

The following is an example of the summary for each test output that includes the IO size, workload pattern (reads, writes, random, sequential), duration for each workload step, totals for reads and writes, along with averages including IOP’s, bandwidth and latency or response time.

disk iops

Want to see more numbers, speeds and feeds, check out the following table which will be updated with extra results as they become available.

Device
Vendor
Make

Model

Form Factor
Capacity
Interface
RPM Speed
Raw
Test Result
HDD
HGST
Desktop
HK250-160
2.5
160GB
SATA
5.4K
HDD
Seagate
Mobile
ST2000LM003
2.5
2TB
SATA
5.4K
HDD
Fujitsu
Desktop
MHWZ160BH
2.5
160GB
SATA
7.2K
HDD
Seagate
Momentus
ST9160823AS
2.5
160GB
SATA
7.2K
HDD
Seagate
MomentusXT
ST95005620AS
2.5
500GB
SATA
7.2K(1)
HDD
Seagate
Barracuda
ST3500320AS
3.5
500GB
SATA
7.2K
HDD
WD/Dell
Enterprise
WD1003FBYX
3.5
1TB
SATA
7.2K
HDD
Seagate
Barracuda
ST3000DM01
3.5
3TB
SATA
7.2K
HDD
Seagate
Desktop
ST4000DM000
3.5
4TB
SATA
HDD
HDD
Seagate
Capacity
ST6000NM00
3.5
6TB
SATA
HDD
HDD
Seagate
Capacity
ST6000NM00
3.5
6TB
12GSAS
HDD
HDD
Seagate
Savio 10K.3
ST9300603SS
2.5
300GB
SAS
10K
HDD
Seagate
Cheetah
ST3146855SS
3.5
146GB
SAS
15K
HDD
Seagate
Savio 15K.2
ST9146852SS
2.5
146GB
SAS
15K
HDD
Seagate
Ent. 15K
ST600MP0003
2.5
600GB
SAS
15K
SSHD
Seagate
Ent. Turbo
ST600MX0004
2.5
600GB
SAS
SSHD
SSD
Samsung
840 PRo
MZ-7PD256
2.5
256GB
SATA
SSD
HDD
Seagate
600 SSD
ST480HM000
2.5
480GB
SATA
SSD
SSD
Seagate
1200 SSD
ST400FM0073
2.5
400GB
12GSAS
SSD

Performance characteristics 1 worker (thread count) for RAW IO (non-file system)

Note: (1) Seagate Momentus XT is a Hybrid Hard Disk Drive (HHDD) based on a 7.2K 2.5 HDD with SLC nand flash integrated for read buffer in addition to normal DRAM buffer. This model is a XT I (4GB SLC nand flash), may add an XT II (8GB SLC nand flash) at some future time.

As a starting point, these results are raw IO with file system based information to be added soon along with more devices. These results are for tests with one worker or thread count, other results will be added with such as 16 workers or thread counts to show how those differ.

The above results include all reads, all writes, mix of reads and writes, along with all random, sequential and mixed for each IO size. IO sizes include 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1024K and 2048K. As with any workload simulation, benchmark or comparison test, take these results with a grain of salt as your mileage can and will vary. For example you will see some what I consider very high IO rates with sequential reads even without file system buffering. These results might be due to locality of reference of IO’s being resolved out of the drives DRAM cache (read ahead) which vary in size for different devices. Use the vendor model numbers in the table above to check the manufactures specs on drive DRAM and other attributes.

If you are used to seeing 4K or 8K and wonder why anybody would be interested in some of the larger sizes take a look at big fast data or cloud and object storage. For some of those applications 2048K may not seem all that big. Likewise if you are used to the larger sizes, there are still applications doing smaller sizes. Sorry for those who like 512 byte or smaller IO’s as they are not included. Note that for all of these unless indicated a 512 byte standard sector or drive format is used as opposed to emerging Advanced Format (AF) 4KB sector or block size. Watch for some more drive and device types to be added to the above, along with results for more workers or thread counts, along with file system and other scenarios.

Using VMware as part of a Server, Storage and IO (aka StorageIO) test platform

vmware vexpert

The above performance results were generated on Ubuntu 12.04 (since upgraded to 14.04 which was hosted on a VMware vSphere 5.1 (upgraded to 5.5U2) purchased version (you can get the ESXi free version here) with vCenter enabled system. I also have VMware workstation installed on some of my Windows-based laptops for doing preliminary testing of scripts and other activity prior to running them on the larger server-based VMware environment. Other VMware tools include vCenter Converter, vSphere Client and CLI. Note that other guest virtual machines (VMs) were idle during the tests (e.g. other guest VMs were quiet). You may experience different results if you ran Ubuntu native on a physical machine or with different adapters, processors and device configurations among many other variables (that was a disclaimer btw ;) ).

Storage I/O trends

All of the devices (HDD, HHDD, SSD’s including those not shown or published yet) were Raw Device Mapped (RDM) to the Ubuntu VM bypassing VMware file system.

Example of creating an RDM for local SAS or SATA direct attached device.

vmkfstools -z /vmfs/devices/disks/naa.600605b0005f125018e923064cc17e7c /vmfs/volumes/dat1/RDM_ST1500Z110S6M5.vmdk

The above uses the drives address (find by doing a ls -l /dev/disks via VMware shell command line) to then create a vmdk container stored in a dat. Note that the RDM being created does not actually store data in the .vmdk, it’s there for VMware management operations.

If you are not familiar with how to create a RDM of a local SAS or SATA device, check out this post to learn how.This is important to note in that while VMware was used as a platform to support the guest operating systems (e.g. Ubuntu or Windows), the real devices are not being mapped through or via VMware virtual drives.

vmware iops

The above shows examples of RDM SAS and SATA devices along with other VMware devices and dats. In the next figure is an example of a workload being run in the test environment.

vmware iops

One of the advantages of using VMware (or other hypervisor) with RDM’s is that I can quickly define via software commands where a device gets attached to different operating systems (e.g. the other aspect of software defined storage). This means that after a test run, I can quickly simply shutdown Ubuntu, remove the RDM device from that guests settings, move the device just tested to a Windows guest if needed and restart those VMs. All of that from where ever I happen to be working from without physically changing things or dealing with multi-boot or cabling issues.

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

So how many IOPs can a device do?

That depends, however have a look at the above information and results.

Check back from time to time here to see what is new or has been added including more drives, devices and other related themes.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

How many I/O iops can flash SSD or HDD do?

How many i/o iops can flash ssd or hdd do with vmware?

sddc data infrastructure Storage I/O ssd trends

Updated 2/10/2018

A common question I run across is how many I/O iopsS can flash SSD or HDD storage device or system do or give.

The answer is or should be it depends.

This is the first of a two-part series looking at storage performance, and in context specifically around drive or device (e.g. mediums) characteristics across HDD, HHDD and SSD that can be found in cloud, virtual, and legacy environments. In this first part the focus is around putting some context around drive or device performance with the second part looking at some workload characteristics (e.g. benchmarks).

What about cloud, tape summit resources, storage systems or appliance?

Lets leave those for a different discussion at another time.

Getting started

Part of my interest in tools, metrics that matter, measurements, analyst, forecasting ties back to having been a server, storage and IO performance and capacity planning analyst when I worked in IT. Another aspect ties back to also having been a sys admin as well as business applications developer when on the IT customer side of things. This was followed by switching over to the vendor world involved with among other things competitive positioning, customer design configuration, validation, simulation and benchmarking HDD and SSD based solutions (e.g. life before becoming an analyst and advisory consultant).

Btw, if you happen to be interested in learn more about server, storage and IO performance and capacity planning, check out my first book Resilient Storage Networks (Elsevier) that has a bit of information on it. There is also coverage of metrics and planning in my two other books The Green and Virtual Data Center (CRC Press) and Cloud and Virtual Data Storage Networking (CRC Press). I have some copies of Resilient Storage Networks available at a special reader or viewer rate (essentially shipping and handling). If interested drop me a note and can fill you in on the details.

There are many rules of thumb (RUT) when it comes to metrics that matter such as IOPS, some that are older while others may be guess or measured in different ways. However the answer is that it depends on many things ranging from if a standalone hard disk drive (HDD), Hybrid HDD (HHDD), Solid State Device (SSD) or if attached to a storage system, appliance, or RAID adapter card among others.

Taking a step back, the big picture

hdd image
Various HDD, HHDD and SSD’s

Server, storage and I/O performance and benchmark fundamentals

Even if just looking at a HDD, there are many variables ranging from the rotational speed or Revolutions Per Minute (RPM), interface including 1.5Gb, 3.0Gb, 6Gb or 12Gb SAS or SATA or 4Gb Fibre Channel. If simply using a RUT or number based on RPM can cause issues particular with 2.5 vs. 3.5 or enterprise and desktop. For example, some current generation 10K 2.5 HDD can deliver the same or better performance than an older generation 3.5 15K. Other drive factors (see this link for HDD fundamentals) including physical size such as 3.5 inch or 2.5 inch small form factor (SFF), enterprise or desktop or consumer, amount of drive level cache (DRAM). Space capacity of a drive can also have an impact such as if all or just a portion of a large or small capacity devices is used. Not to mention what the drive is attached to ranging from in internal SAS or SATA drive bay, USB port, or a HBA or RAID adapter card or in a storage system.

disk iops
HDD fundamentals

How about benchmark and performance for marketing or comparison tricks including delayed, deferred or asynchronous writes vs. synchronous or actually committed data to devices? Lets not forget about short stroking (only using a portion of a drive for better IOP’s) or even long stroking (to get better bandwidth leveraging spiral transfers) among others.

Almost forgot, there are also thick, standard, thin and ultra thin drives in 2.5 and 3.5 inch form factors. What’s the difference? The number of platters and read write heads. Look at the following image showing various thickness 2.5 inch drives that have various numbers of platters to increase space capacity in a given density. Want to take a wild guess as to which one has the most space capacity in a given footprint? Also want to guess which type I use for removable disk based archives along with for onsite disk based backup targets (compliments my offsite cloud backups)?

types of disks
Thick, thin and ultra thin devices

Beyond physical and configuration items, then there are logical configuration including the type of workload, large or small IOPS, random, sequential, reads, writes or mixed (various random, sequential, read, write, large and small IO). Other considerations include file system or raw device, number of workers or concurrent IO threads, size of the target storage space area to decide impact of any locality of reference or buffering. Some other items include how long the test or workload simulation ran for, was the device new or worn in before use among other items.

Tools and the performance toolbox

Then there are the various tools for generating IO’s or workloads along with recording metrics such as reads, writes, response time and other information. Some examples (mix of free or for fee) include Bonnie, Iometer, Iorate, IOzone, Vdbench, TPC, SPC, Microsoft ESRP, SPEC and netmist, Swifttest, Vmark, DVDstore and PCmark 7 among many others. Some are focused just on the storage system and IO path while others are application specific thus exercising servers, storage and IO paths.

performance tools
Server, storage and IO performance toolbox

Having used Iometer since the late 90s, it has its place and is popular given its ease of use. Iometer is also long in the tooth and has its limits including not much if any new development, never the less, I have it in the toolbox. I also have Futremark PCmark 7 (full version) which turns out has some interesting abilities to do more than exercise an entire Windows PC. For example PCmark can use a secondary drive for doing IO to.

PCmark can be handy for spinning up with VMware (or other tools) lots of virtual Windows systems pointing to a NAS or other shared storage device doing real world type activity. Something that could be handy for testing or stressing virtual desktop infrastructures (VDI) along with other storage systems, servers and solutions. I also have Vdbench among others tools in the toolbox including Iorate which was used to drive the workloads shown below.

What I look for in a tool are how extensible are the scripting capabilities to define various workloads along with capabilities of the test engine. A nice GUI is handy which makes Iometer popular and yes there are script capabilities with Iometer. That is also where Iometer is long in the tooth compared to some of the newer generation of tools that have more emphasis on extensibility vs. ease of use interfaces. This also assumes knowing what workloads to generate vs. simply kicking off some IOPs using default settings to see what happens.

Another handy tool is for recording what’s going on with a running system including IO’s, reads, writes, bandwidth or transfers, random and sequential among other things. This is where when needed I turn to something like HiMon from HyperIO, if you have not tried it, get in touch with Tom West over at HyperIO and tell him StorageIO sent you to get a demo or trial. HiMon is what I used for doing start, stop and boot among other testing being able to see IO’s at the Windows file system level (or below) including very early in the boot or shutdown phase.

Here is a link to some other things I did awhile back with HiMon to profile some Windows and VDI activity test profiling.

What’s the best tool or benchmark or workload generator?

The one that meets your needs, usually your applications or something as close as possible to it.

disk iops
Various 2.5 and 3.5 inch HDD, HHDD, SSD with different performance

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

That depends, however continue reading part II of this series to see some results for various types of drives and workloads.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Conversation with Justin Stottlemyer of Shutterfly and object storage discussion

Now also available via

This is a new episode in the continuing StorageIO industry trends and perspectives pod cast series (you can view more episodes or shows along with other audio and video content here) as well as listening via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml)

StorageIO industry trends cloud, virtualization and big data

In this episode from SNW Spring 2013 in Orlando Florida, Bruce Ravid (@BruceRave) and me visit with Justin Stottlemyer (@JHStott) who is a Fellow and Storage Architect at Shutterfly.

Shutterfly image via shutterfly.com

Our conversation centers on how Justin and Shutterfly maximize their return on innovation (the new ROI) by using object storage along with other technology and techniques to create a resilient, scalable flexible data infrastructure.

Justin was at SNW presenting on overcoming object integration at Shutterfly where their data infrastructure consists of 80PB of storage to house over 30PB of user content data that continues to grow.

Example of how we have used Shutterfly to create photo books from vacations

For those not familiar, Shutterfly providers customers with free unlimited storage of their photos which can then be printed in coffee table type books such as the one shown in the above figure. My wife has used Shutterfly a few times to create photo books such as the one shown above in the image.

As you will hear Justin explain in the pod cast, photos get uploaded and ingested into their environment and then available for printing.

In addition to talking about object storage, private clouds, business continuance (BC) and disaster recovery, other topics include performance and capacity planning, maximizing return on innovation in addition to return on investment among other items.
Varies and managed by user interface

Listen in to hear how Justin and Shutterfly are currently managing 80PB of storage with over 30PB of user data that continues to grow.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Justin and myself.

StorageIO podcast

Also available via

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts pod casts and other upcoming events. Also be sure to heck out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode from SNW Spring 2013 with Justin Stottlemyer of Shutterfly.

Speaking of cloud and object storage, check out www.objectstoragecenter.com to view more related material.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved