Which Enterprise HDD for Content Applications Different File Size Impact


Updated 1/23/2018

Which enterprise HDD to use with a content server platform different file size impact.

Insight for effective server storage I/O decision making
Server StorageIO Lab Review

Which enterprise HDD to use for content servers

This is the fifth in a multi-part series (read part four here) based on a white paper hands-on lab report I did compliments of Servers Direct and Seagate that you can read in PDF form here. The focus is on the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drives (HDDs). This post looks at large and small file I/O processing.

File Performance Activity

Tip: Content solutions use files in various ways. Use the following to gain perspective on how various HDDs handle workloads similar to your specific needs.

Two separate file processing workloads were run (12), one with a relatively small number of large files, and another with a large number of small files. For the large file processing (table-3), 5 GByte sized files were created and then accessed via 128 KByte (128KB) sized I/O over a 10 hour period with 90% reads using 64 threads (workers). The large file workload simulates what might be seen with higher definition video, image or other content streaming.

(Note 12) File processing workloads were run using Vdbench 5.04 and file anchors with sample script configuration below. Instead of vdbench you could also use other tools such as sysbench or fio among others.

VdbenchFSBigTest.txt
# Sample script for big files testing
fsd=fsd1,anchor=H:,depth=1,width=5,files=20,size=5G
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=128k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSBigTest.txt -m 16 -o Results_FSbig_H_060615

VdbenchFSSmallTest.txt
# Sample script for small files testing
fsd=fsd1,anchor=H:,depth=1,width=64,files=25600,size=16k
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=1k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSSmallTest.txt -m 16 -o Results_FSsmall_H_060615

The 10% writes are intended to reflect some update activity for new content or other changes to content. Note that 128 KBytes per second translates to roughly 1 Mbps (128 KBytes x 8 bits per byte) of streaming content; higher definition video needs more than that, and 4K video (not optimized) requires higher speeds still as well as resulting in larger file sizes. Table-3 shows the performance during the large file access period, including average read and write rates and response times, CPU usage and bandwidth (MBps).

| Drive / RAID | Avg. File Read Rate | Avg. Read Resp. Time (Sec.) | Avg. File Write Rate | Avg. Write Resp. Time (Sec.) | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write |
|---|---|---|---|---|---|---|---|---|
| ENT 15K R1 | 580.7 | 107.9 | 64.5 | 19.7 | 52.2 | 35.5 | 72.6 | 8.1 |
| ENT 10K R1 | 455.4 | 135.5 | 50.6 | 44.6 | 34.0 | 22.7 | 56.9 | 6.3 |
| ENT CAP R1 | 285.5 | 221.9 | 31.8 | 19.0 | 43.9 | 28.3 | 37.7 | 4.0 |
| ENT 10K R10 | 690.9 | 87.21 | 76.8 | 48.6 | 35.0 | 21.8 | 86.4 | 9.6 |

Table-3 Performance summary for large file access operations (90% read)
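As a sanity check on the bandwidth columns, the MBps figures can be approximated as the file operation rate multiplied by the I/O transfer size. The following is a minimal sketch, assuming 1 MByte = 1,024 KBytes; the function name is illustrative, and the rates are copied from the 15K RAID 1 row of table-3. Most rows line up with this arithmetic to within rounding, so treat it as a cross-check rather than a description of how Vdbench derives its numbers.

```python
# Sketch: estimate bandwidth (MBps) from an operation rate and I/O size.
# Assumption: 1 MByte = 1,024 KBytes. Rates are copied from table-3.

def mbps(ops_per_sec: float, xfer_kbytes: float) -> float:
    """Return bandwidth in MBytes per second for a given rate and I/O size."""
    return ops_per_sec * xfer_kbytes / 1024.0

# Average file read and write rates for the ENT 15K R1 row of table-3
read_rate, write_rate = 580.7, 64.5
xfer_kb = 128  # xfersize=128k in the large file script

print(f"read  {mbps(read_rate, xfer_kb):.1f} MBps")
print(f"write {mbps(write_rate, xfer_kb):.1f} MBps")
```

The same function can be pointed at the small file workload by using a 1 KByte transfer size.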

Table-3 shows that for two-drive RAID 1, the Enterprise 15K HDDs deliver the fastest performance. However, a RAID 10 using four 10K HDDs with enhanced cache features provides a good price, performance and space capacity option. Software RAID was used in this workload test.

Figure-4 shows the relative performance of various HDD options handling large files. Keep in mind that for the response time lines lower is better, while for the activity rate higher is better.

Figure-4 Large file processing 90% read, 10% write rate and response time

In figure-4 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). As a reminder, lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K HDDs).

Results in figure-4 above and table-4 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-4 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-4 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however that also comes at a budget cost (e.g. price).

| Drive / RAID | Avg. File Reads Per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost / Per GB Capacity | Cost / Per GB Usable (Protected) Cap. | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per usable GB per RPS | Avg. File Read Resp. (Sec.) |
|---|---|---|---|---|---|---|---|---|---|
| ENT 15K R1 | 580.7 | $1.02 | $2.05 | $0.99 | $0.99 | $1,190 | 100% | $2.1 | 107.9 |
| ENT 10K R1 | 455.5 | $1.92 | $3.84 | $0.49 | $0.49 | $1,750 | 100% | $3.8 | 135.5 |
| ENT CAP R1 | 285.5 | $1.40 | $2.80 | $0.20 | $0.20 | $798 | 100% | $2.8 | 271.9 |
| ENT 10K R10 | 690.9 | $1.27 | $5.07 | $0.49 | $0.97 | $3,500 | 100% | $5.1 | 87.2 |

Table-4 Performance, capacity and cost analysis for big file processing
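The cost per RPS columns in table-4 can be reproduced with simple division: drive cost divided by average file reads per second. The sketch below is illustrative only; the class and method names are made up for this example, the costs and rates are copied from table-4, and the drive counts (two for RAID 1, four for RAID 10) are taken from the configurations described above.

```python
# Sketch: reproduce the cost-per-RPS columns of table-4.
# Names are illustrative; the input values are copied from table-4.
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    drives: int           # number of HDDs in the RAID set
    total_cost: float     # drive cost for all drives, in dollars
    reads_per_sec: float  # average file reads per second (RPS)

    def single_drive_cost_per_rps(self) -> float:
        return (self.total_cost / self.drives) / self.reads_per_sec

    def multi_drive_cost_per_rps(self) -> float:
        return self.total_cost / self.reads_per_sec

configs = [
    Config("ENT 15K R1", 2, 1190, 580.7),
    Config("ENT 10K R1", 2, 1750, 455.5),
    Config("ENT CAP R1", 2, 798, 285.5),
    Config("ENT 10K R10", 4, 3500, 690.9),
]

for c in configs:
    print(f"{c.name:12} single ${c.single_drive_cost_per_rps():.2f} "
          f"multi ${c.multi_drive_cost_per_rps():.2f}")
```

Swapping in your own drive prices and measured rates gives a quick way to compare candidate configurations on the same basis.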

Small File Size Processing

To simulate a general file sharing environment, or content streaming with many smaller objects, 1,638,400 16KB sized files were created on each device being tested (table-5). These files were spread across 64 directories (25,600 files each) and accessed via 64 threads (workers) doing 90% reads with a 1KB I/O size over a ten hour time frame. Like the large file test and the database activity, all workloads were run at the same time (e.g. test devices were concurrently busy).
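Before running a test like this it helps to estimate how many files and how much space the parameters imply. The following sketch assumes that files are created in the lowest level directories, so the directory count is width raised to the power of depth; the function name is hypothetical, and the inputs come from the small file script shown earlier (width=64, depth=1, files=25600, size=16k).

```python
# Sketch: estimate file count and space for a Vdbench-style file set.
# Assumption: files live in the lowest level directories (width ** depth).

def fileset(width: int, depth: int, files_per_dir: int, file_kbytes: int):
    """Return (directories, total files, total GBytes) for the file set."""
    dirs = width ** depth
    total_files = dirs * files_per_dir
    total_gbytes = total_files * file_kbytes / (1024 * 1024)
    return dirs, total_files, total_gbytes

dirs, files, gbytes = fileset(width=64, depth=1, files_per_dir=25600, file_kbytes=16)
print(f"{dirs} directories, {files:,} files, {gbytes:.1f} GBytes")
```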

| Drive / RAID | Avg. File Read Rate | Avg. Read Resp. Time (Sec.) | Avg. File Write Rate | Avg. Write Resp. Time (Sec.) | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write |
|---|---|---|---|---|---|---|---|---|
| ENT 15K R1 | 3,415.7 | 1.5 | 379.4 | 132.2 | 24.9 | 19.5 | 3.3 | 0.4 |
| ENT 10K R1 | 2,203.4 | 2.9 | 244.7 | 172.8 | 24.7 | 19.3 | 2.2 | 0.2 |
| ENT CAP R1 | 1,063.1 | 12.7 | 118.1 | 303.3 | 24.6 | 19.2 | 1.1 | 0.1 |
| ENT 10K R10 | 4,590.5 | 0.7 | 509.9 | 101.7 | 27.7 | 22.1 | 4.5 | 0.5 |

Table-5 Performance summary for small sized (16KB) file access operations (90% read)

Figure-5 shows the relative performance of various HDD options handling small files. Keep in mind that for the response time lines lower is better, while for the activity rate higher is better.

Figure-5 Small file processing 90% read, 10% write rate and response time

In figure-5 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). As a reminder, lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K RPM HDDs) that has higher performance and capacity along with higher costs (table-6).

Results in figure-5 above and table-6 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-6 shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-6 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however that also comes at a budget cost (e.g. price).

| Drive / RAID | Avg. File Reads Per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost / Per GB Capacity | Cost / Per GB Usable (Protected) Cap. | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per usable GB per RPS | Avg. File Read Resp. (Sec.) |
|---|---|---|---|---|---|---|---|---|---|
| ENT 15K R1 | 3,415.7 | $0.17 | $0.35 | $0.99 | $0.99 | $1,190 | 100% | $0.35 | 1.51 |
| ENT 10K R1 | 2,203.4 | $0.40 | $0.79 | $0.49 | $0.49 | $1,750 | 100% | $0.79 | 2.90 |
| ENT CAP R1 | 1,063.1 | $0.38 | $0.75 | $0.20 | $0.20 | $798 | 100% | $0.75 | 12.70 |
| ENT 10K R10 | 4,590.5 | $0.19 | $0.76 | $0.49 | $0.97 | $3,500 | 100% | $0.76 | 0.70 |

Table-6 Performance, capacity and cost analysis for small file processing

Looking at the small file processing analysis in table-5 shows that the 15K HDDs on an apples to apples basis (e.g. same RAID level and number of drives) provide the best performance. However, when also factoring in space capacity, different RAID levels or other protection schemes along with cost, there are other considerations. For example, the Enterprise Capacity 2TB HDDs have a low cost per capacity, however they do not have the performance of the other options, which matters if your applications need more performance.

Thus the right HDD for one application may not be the best one for a different scenario, and multiple metrics such as those shown in table-5 need to be included in an informed storage decision making process.

Where To Learn More

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.


What This All Means

File processing is a common content application task, with files being small, large or mixed, and with both reads and writes. Even if your content environment is using object storage, chances are that unless it is a new application or a gateway exists, you are using NAS or file based access. Thus if your applications are doing file based processing, either run your own applications or use tools that can simulate as closely as possible what your environment is doing.

Continue reading part six in this multi-part series here where the focus is around general I/O including 8KB and 128KB sized IOPs along with associated metrics.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

HDDs evolve for Content Application servers


Updated 1/23/2018

Enterprise HDDs evolve for content server platform


Which enterprise HDD to use for content servers

This is the seventh and final post in this multi-part series (read part six here) based on a white paper hands-on lab report I did compliments of Servers Direct and Seagate that you can read in PDF form here. The focus is on the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drives (HDDs). This post compares how HDDs continue to evolve over various generations, boosting performance as well as capacity and reliability. It also looks at how there is more to HDD performance than the traditional focus on Revolutions Per Minute (RPM) as a speed indicator.

Comparing Different Enterprise 10K And 15K HDD Generations

There is more to HDD performance than RPM speed of the device. RPM plays an important role, however there are other things that impact HDD performance. A common myth is that HDD’s have not improved on performance over the past several years with each successive generation. Table-10 shows a sampling of various generations of enterprise 10K and 15K HDD’s (14) including different form factors and how their performance continues to improve.

Figure-9 10K and 15K HDD performance improvements

Figure-9 shows how performance continues to improve with 10K and 15K HDDs with each new generation, including those with enhanced cache features. With improvements in cache software within the drives, along with enhanced persistent non-volatile memory (NVM) and incremental mechanical drive improvements, both read and write performance continue to be enhanced.

Figure-9 puts into perspective the continued performance enhancements of HDDs by comparing various enterprise 10K and 15K devices. The workload is the same TPC-C test used earlier, run on a similar server (14) with no RAID. Figure-9 shows 100 simulated users accessing a database on each of the different drives, all running concurrently. The older 15K 3.5” Cheetah and 2.5” Savio drives had a capacity of 146GB and used a database scale factor of 1500 (134GB). All other drives used a scale factor of 3000 (276GB). Figure-9 also highlights the improvements in both TPS performance and response time with new HDDs, including those with the performance enhanced cache feature.

The workloads run are same as the TPC-C ones shown earlier, however these drives were not configured with any RAID. The TPC-C activity used Benchmark Factory with similar setup and configuration to those used earlier including on a multi-socket, multi-core Windows 2012 R2 server supporting a Microsoft SQL Server 2012 database with a database for each drive type.

| Drive | Metric | 1 user | 20 users | 50 users | 100 users |
|---|---|---|---|---|---|
| ENT 10K V3 2.5" | TPS (TPC-C) | 14.8 | 50.9 | 30.3 | 39.9 |
| ENT 10K V3 2.5" | Resp. Time (Sec.) | 0.0 | 0.4 | 1.6 | 1.7 |
| ENT (Cheetah) 15K 3.5" | TPS (TPC-C) | 14.6 | 51.3 | 27.1 | 39.3 |
| ENT (Cheetah) 15K 3.5" | Resp. Time (Sec.) | 0.0 | 0.3 | 1.8 | 2.1 |
| ENT 10K 2.5" (with cache) | TPS (TPC-C) | 19.2 | 146.3 | 72.6 | 71.0 |
| ENT 10K 2.5" (with cache) | Resp. Time (Sec.) | 0.0 | 0.1 | 0.7 | 0.0 |
| ENT (Savio) 15K 2.5" | TPS (TPC-C) | 15.8 | 59.1 | 40.2 | 53.6 |
| ENT (Savio) 15K 2.5" | Resp. Time (Sec.) | 0.0 | 0.3 | 1.2 | 1.2 |
| ENT 15K V4 2.5" | TPS (TPC-C) | 19.7 | 119.8 | 75.3 | 69.2 |
| ENT 15K V4 2.5" | Resp. Time (Sec.) | 0.0 | 0.1 | 0.6 | 1.0 |
| ENT 15K (enhanced cache) 2.5" | TPS (TPC-C) | 20.1 | 184.1 | 113.7 | 122.1 |
| ENT 15K (enhanced cache) 2.5" | Resp. Time (Sec.) | 0.0 | 0.1 | 0.4 | 0.2 |

Table-10 Continued Enterprise 10K and 15K HDD performance improvements
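One way to read table-10 is to express each drive's TPS relative to an older baseline drive. The sketch below does that for the 100 user results. The function name is illustrative, the pairing of values to drives reflects my reading of the table layout, and keep in mind that the older 146GB drives ran a smaller database scale factor, so the ratios are indicative rather than a strict apples to apples comparison.

```python
# Sketch: relative TPS at 100 simulated users versus an older baseline drive.
# Values are copied from table-10; the drive-to-value pairing is an assumption
# based on the table layout.
TPS_AT_100_USERS = {
    'ENT (Cheetah) 15K 3.5"': 39.3,
    'ENT 10K V3 2.5"': 39.9,
    'ENT (Savio) 15K 2.5"': 53.6,
    'ENT 15K V4 2.5"': 69.2,
    'ENT 10K 2.5" (with cache)': 71.0,
    'ENT 15K (enhanced cache) 2.5"': 122.1,
}

def relative_tps(results: dict, baseline: str) -> dict:
    """Return each drive's TPS as a multiple of the baseline drive's TPS."""
    base = results[baseline]
    return {drive: tps / base for drive, tps in results.items()}

ratios = relative_tps(TPS_AT_100_USERS, 'ENT (Cheetah) 15K 3.5"')
for drive, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{drive:32} {ratio:.1f}x")
```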

(Note 14) 10K and 15K generational comparisons were run on a separate comparable server to what was used for other test workloads. Workload configuration settings were the same as other database workloads including using Microsoft SQL Server 2012 on a Windows 2012 R2 system with Benchmark Factory driving the workload. Database memory size was reduced, however, to only 8GB vs. 16GB used in other tests.

Where To Learn More

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.


What This All Means

A little bit of flash in the right place with applicable algorithms goes a long way, an example being the Seagate Enterprise HDDs with enhanced cache feature. Likewise, HDDs are very much alive, complementing SSDs and vice versa. For high-performance content application workloads, flash SSD solutions including NVMe, 12Gbps SAS and 6Gbps SATA devices are cost-effective solutions. HDDs continue to be cost-effective data storage devices both for capacity and for environments that do not need the performance of flash SSD.

For some environments, using a combination of flash and HDDs complementing each other along with cache software can be a cost-effective solution. The previous workload examples provide insight for making cost-effective informed storage decisions.

Evaluate today’s HDDs on their effective performance running workloads as similar as possible to your own, or actually try them out with your applications. Today there is more to HDD performance than just RPM speed, particularly with the Seagate Enterprise Performance 10K and 15K HDDs with enhanced caching feature.

The Enterprise Performance 10K with enhanced cache feature provides a good balance of capacity and performance while being cost-effective. If you are using older 3.5” 15K or even previous generation 2.5” 15K RPM and “non-performance enhanced” HDDs, take a look at how the newer generation HDDs perform, looking beyond the RPM of the device.

Fast content applications need fast content and flexible content solution platforms such as those from Servers Direct and HDDs from Seagate. Key to a successful content application deployment is having the flexibility to hardware define and software define the platform to meet your needs. Just as there are many different types of content applications along with diverse environments, content solution platforms need to be flexible, scalable and robust, not to mention cost-effective.

Ok, nuff said, for now.

Gs


Big Files Lots of Little File Processing Benchmarking with Vdbench


Updated 2/10/2018

Need to test a server, storage I/O networking, hardware, software, services, cloud, virtual, physical or other environment that is doing some form of file processing, or do you simply want to have some extra workload running in the background for whatever reason? An option is file processing benchmarking with Vdbench.


Getting Started


Here’s a quick and relatively easy way to do it with Vdbench (free from Oracle). Granted there are other tools, both free and for fee, that can do similar things, however we will leave those for another day and post. Here’s the con to this approach: there is no UI or GUI like what you have available with some other tools. Here’s the pro to this approach: it’s free, flexible and limited only by your creativity, amount of storage space, server memory and I/O capacity.

If you need a background on Vdbench and benchmarking, check out the series of related posts here (e.g. www.storageio.com/performance).

Get and Install the Vdbench Bits and Bytes


If you do not already have Vdbench installed, get a copy from the Oracle or Source Forge site (now points to Oracle here).

Vdbench is free; you simply sign up and accept the free license, select the version to download (it is a single, common distribution for all operating systems) and download the bits as well as the documentation.

Installation, particularly on Windows, is really easy: follow the instructions in the documentation by copying the contents of the download folder to a specified directory, set up any environment variables, and make sure that you have Java installed.

Here is a hint and tip for Windows Servers, if you get an error message about counters, open a command prompt with Administrator rights, and type the command:

$ lodctr /r


The above command will reset your I/O (performance) counters. Note, however, that the command will also overwrite counters if enabled, so only use it if you have to.

Likewise *nix install is also easy, copy the files, make sure to copy the applicable *nix shell script (they are in the download folder), and verify Java is installed and working.

You can do a vdbench -t (windows) or ./vdbench -t (*nix) to verify that it is working.
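Before running the Vdbench self-test it can save time to confirm that the required executables are reachable on the search path. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and whether `vdbench` itself is on your path depends on how you installed it.

```python
# Sketch: report whether required command line tools are on the search path.
import shutil

def find_tools(names):
    """Map each tool name to its full path, or None if it is not found."""
    return {name: shutil.which(name) for name in names}

for tool, path in find_tools(["java", "vdbench"]).items():
    print(f"{tool}: {path if path else 'not found on PATH'}")
```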

Vdbench File Processing

There are many options with Vdbench as it has a very robust command and scripting language, including the ability to set up for loops among other things. We are only going to scratch the surface here using its file processing capabilities. Likewise, Vdbench can run from a single server accessing multiple storage systems or file systems, as well as from multiple servers to a single file system. For simplicity, we will stick with the basics in the following examples to exercise a local file system. The number of files and file size are limited by server memory and storage space.

You can specify the number and depth of directories to put files into for processing. One of the parameters is the anchor point for the file processing; in the following examples S:\SIOTEMP\FS1 is used as the anchor point. Other parameters include the I/O size, percent reads, number of threads, run time and sample interval as well as the output folder name for the result files. Note that unlike some tools, Vdbench does not create a single file of results, rather a folder with several files including summary, totals, parameters, histograms and CSV among others.


Simple Vdbench File Processing Commands

For flexibility and ease of use I put the following three Vdbench commands into a simple text file that is then called with parameters on the command line.
fsd=fsd1,anchor=!fanchor,depth=!dirdep,width=!dirwid,files=!numfiles,size=!filesize

fwd=fwd1,fsd=fsd1,rdpct=!filrdpct,xfersize=!fxfersize,fileselect=random,fileio=random,threads=!thrds

rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=!etime,interval=!itime

Simple Vdbench script

# SIO_vdbench_filesystest.txt
#
# Example Vdbench script for file processing
#
# fanchor = file system location where directories and files will be created
# dirwid = how wide the directories should be (e.g. how many directories wide)
# numfiles = how many files per directory
# filesize = file size in k, m or g, e.g. 16k = 16KBytes
# fxfersize = file I/O transfer size, e.g. 128k = 128KBytes
# thrds = how many threads or workers
# etime = how long to run in minutes (m) or hours (h)
# itime = sample interval time in seconds, e.g. 30
# dirdep = how deep the directory tree is
# filrdpct = percent of reads, e.g. 90 = 90 percent reads
# -p processnumber = optional process number, only needed if running multiple Vdbench instances at the same time; the number should be unique
# -o = output folder name for the result files
#
# Sample command line shown for Windows, for *nix add ./
#
# The real Vdbench script with command line parameters indicated by !
#

fsd=fsd1,anchor=!fanchor,depth=!dirdep,width=!dirwid,files=!numfiles,size=!filesize

fwd=fwd1,fsd=fsd1,rdpct=!filrdpct,xfersize=!fxfersize,fileselect=random,fileio=random,threads=!thrds

rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=!etime,interval=!itime

Big Files Processing Script


With the above script file defined, for Big Files I specify a command line such as the following.
$ vdbench -f SIO_vdbench_filesystest.txt fanchor=S:\SIOTemp\FS1 dirwid=1 numfiles=60 filesize=5G fxfersize=128k thrds=64 etime=10h itime=30 numdir=1 dirdep=1 filrdpct=90 -p 5576 -o SIOWS2012R220_NOFUZE_5Gx60_BigFiles_64TH_STX1200_020116

Big Files Processing Example Results


The following is one of the result files from the folder of results created via the above command for Big File processing showing totals.


Run totals

21:09:36.001 Starting RD=format_for_rd1

Feb 01, 2016 .Interval. .ReqstdOps.. ...cpu%... read ....read.... ...write.... ..mb/sec... mb/sec .xfer.. ...mkdir... ...rmdir... ..create... ...open.... ...close... ..delete...
rate resp total sys pct rate resp rate resp read write total size rate resp rate resp rate resp rate resp rate resp rate resp
21:23:34.101 avg_2-28 2848.2 2.70 8.8 8.32 0.0 0.0 0.00 2848.2 2.70 0.00 356.0 356.02 131071 0.0 0.00 0.0 0.00 0.1 109176 0.1 0.55 0.1 2006 0.0 0.00

21:23:35.009 Starting RD=rd1; elapsed=36000; fwdrate=max. For loops: None

07:23:35.000 avg_2-1200 4939.5 1.62 18.5 17.3 90.0 4445.8 1.79 493.7 0.07 555.7 61.72 617.44 131071 0.0 0.00 0.0 0.00 0.0 0.00 0.1 0.03 0.1 2.95 0.0 0.00


Lots of Little Files Processing Script


For lots of little files, the following is used.


$ vdbench -f SIO_vdbench_filesystest.txt fanchor=S:\SIOTEMP\FS1 dirwid=64 numfiles=25600 filesize=16k fxfersize=1k thrds=64 etime=10h itime=30 dirdep=1 filrdpct=90 -p 5576 -o SIOWS2012R220_NOFUZE_SmallFiles_64TH_STX1200_020116
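When running many variations of these tests, assembling the command line from a dictionary of parameters helps avoid typos. The sketch below is one way to do that; the function name and the output folder name are hypothetical, while the parameter names match the `!` variables in the script above. It only builds the argument list and does not launch Vdbench.

```python
# Sketch: build the Vdbench command line for the parameterized script above.
# The helper name and output folder are illustrative, not part of Vdbench.

def vdbench_command(script, output_dir, params, executable="vdbench"):
    """Return the argument list: executable, script, name=value pairs, output."""
    args = [executable, "-f", script]
    args += [f"{name}={value}" for name, value in params.items()]
    args += ["-o", output_dir]
    return args

small_files = {
    "fanchor": r"S:\SIOTEMP\FS1",  # raw string keeps the backslashes literal
    "dirwid": 64,
    "numfiles": 25600,
    "filesize": "16k",
    "fxfersize": "1k",
    "thrds": 64,
    "etime": "10h",
    "itime": 30,
    "dirdep": 1,
    "filrdpct": 90,
}

cmd = vdbench_command("SIO_vdbench_filesystest.txt", "results_small", small_files)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` if you want to launch the test from a script.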

Lots of Little Files Processing Example Results


The following is one of the result files from the folder of results created via the above command for Lots of Little Files processing showing totals.
Run totals

09:17:38.001 Starting RD=format_for_rd1

Feb 02, 2016 .Interval. .ReqstdOps.. ...cpu%... read ....read.... ...write.... ..mb/sec... mb/sec .xfer.. ...mkdir... ...rmdir... ..create... ...open.... ...close... ..delete...
rate resp total sys pct rate resp rate resp read write total size rate resp rate resp rate resp rate resp rate resp rate resp
09:19:48.016 avg_2-5 10138 0.14 75.7 64.6 0.0 0.0 0.00 10138 0.14 0.00 158.4 158.42 16384 0.0 0.00 0.0 0.00 10138 0.65 10138 0.43 10138 0.05 0.0 0.00

09:19:49.000 Starting RD=rd1; elapsed=36000; fwdrate=max. For loops: None

19:19:49.001 avg_2-1200 113049 0.41 67.0 55.0 90.0 101747 0.19 11302 2.42 99.36 11.04 110.40 1023 0.0 0.00 0.0 0.00 0.0 0.00 7065 0.85 7065 1.60 0.0 0.00
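The run totals lines are whitespace separated, so they are straightforward to pull into a script for comparison across runs. The following sketch splits an average line into named fields. The column names are my own labels inferred from the two header rows shown above, not names defined by Vdbench, and the sample line is copied from the small file results.

```python
# Sketch: split a Vdbench "avg" totals line into named fields.
# Column names are inferred from the header rows above (an assumption).
COLUMNS = [
    "interval", "ops_rate", "ops_resp", "cpu_total", "cpu_sys", "read_pct",
    "read_rate", "read_resp", "write_rate", "write_resp",
    "mb_read", "mb_write", "mb_total", "xfer_size",
    "mkdir_rate", "mkdir_resp", "rmdir_rate", "rmdir_resp",
    "create_rate", "create_resp", "open_rate", "open_resp",
    "close_rate", "close_resp", "delete_rate", "delete_resp",
]

def parse_totals_line(line):
    """Return a dict with the timestamp, interval label and numeric fields."""
    tokens = line.split()
    if len(tokens) != len(COLUMNS) + 1:
        raise ValueError(f"expected {len(COLUMNS) + 1} fields, got {len(tokens)}")
    row = {"timestamp": tokens[0], "interval": tokens[1]}
    for name, value in zip(COLUMNS[1:], tokens[2:]):
        row[name] = float(value)
    return row

SAMPLE = ("19:19:49.001 avg_2-1200 113049 0.41 67.0 55.0 90.0 101747 0.19 "
          "11302 2.42 99.36 11.04 110.40 1023 0.0 0.00 0.0 0.00 0.0 0.00 "
          "7065 0.85 7065 1.60 0.0 0.00")

row = parse_totals_line(SAMPLE)
print(row["read_rate"], row["write_rate"], row["mb_total"])
```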


Where To Learn More


Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.


What This All Means

The above examples can easily be modified to do different things, particularly if you read the Vdbench documentation on how to set up multi-host, multi-storage system and multiple job streams to do different types of processing. This means you can benchmark a storage system, server or converged and hyper-converged platform, or simply put a workload on it as part of other testing. There are even options for handling data footprint reduction such as compression and dedupe.

Ok, nuff said, for now.

Gs


Water, Data and Storage Analogy


Recently I did a piece over at InfoStor titled "Water, Data and Storage Analogy". Besides being taken for granted and all of us being dependent on them, several other similarities exist between water, data, and storage. In addition to being a link to that piece, this is a companion with some different images to help show the similarities between water, data and storage, if for no other reason than to have a few moments of fun. Read the entire piece here.

Water, Data and Storage Similarities

Water can get cold and freeze, data can also go cold becoming dormant and a candidate for archiving or cold cloud storage.

Like data and storage water can be frozen

Various types of storage drives (HDD & SSD)

Different types and tiers of frozen water storage containers

Data, like water, can move or be dormant, can be warm and active, or cold, frozen and inactive. Water, data and storage can also be used for work or fun.

Fishing on water vs. phishing for data on storage

Eagle fly fishing on water over st croix river

Data can be transformed into 3D images and video, while water transformed into snow can also be made into various virtual images or things.

Data on storage can be transformed like water (e.g. snow)

Data, like water, can exist in clouds, resulting in storms that if not properly prepared for, can cause problems.

Data and storage can be damaged including by water, water can also be damaged by putting things into it or the environment.

Water can destroy things, data and storage can be destroyed

There are data lakes, data pools, data ponds, oceans of storage and seas of data as well as data centers.

Rows of servers and storage in a data center

An indoor water lake (e.g. not an indoor data lake)

As water flows downstream it tends to increase in volume as tributaries or streams add to the volume in lakes, reservoirs and rivers. Another similarity is that water will tend to flow and seek its level, filling up space, while data can involve a seek on an HDD in addition to filling up space.

Flood of water vs. flood of data (e.g. need for Data Protection)

There are also hybrid uses (or types) of water, just like hybrid technologies for supporting data infrastructures.

Hybrid Automobile on water

What this all means

We might take water, data and storage for granted, yet they each need to be managed, protected, preserved and served. Servers utilize storage to support applications for managing water; water is used for cooling and powering storage, not to mention for making coffee for those who take care of IT resources.

When you hear about data lakes, ponds or pools, keep in mind that there are also data streams, all of which need to be managed to prevent the flood of data from overwhelming you.

Ok, nuff said (for now)

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO All Rights Reserved

April 2015 Server StorageIO Update Newsletter

Volume 15, Issue IV

Hello and welcome to this April 2015 Server and StorageIO update newsletter.

This month's newsletter has a focus on cloud and object storage for bulk data, unstructured data, big data and archiving, among other scenarios.

Enjoy this edition of the Server and StorageIO update newsletter and watch for new tips, articles, StorageIO lab report reviews, blog posts, videos and Podcasts along with in the news commentary appearing soon.

Storage I/O trends

StorageIOblog posts

April StorageIOblog posts include:

View other recent as well as past blog posts here

April Newsletter Feature Theme
Cloud and Object Storage Fundamentals

There are many facets to object storage including technology implementation, products, services, access and architectures for various applications and use scenarios. The following is a short synopsis of some basic terms and concepts associated with cloud and object storage.

Common cloud and object storage terms

  • Account or project – Top of the hierarchy, representing the owner or billing information for a service, and where buckets are attached.
  • Availability Zone (AZ) – Can be a rack of servers and storage, or a data center, across which data is spread for storage and durability.
  • AWS regions and availability zones (AZ)
    Example of some AWS Regions and AZ’s

  • Bucket or Container – Where objects or sub-folders containing objects are attached and accessed. Note in some environments such as AWS S3 you can have sub-folders in a bucket.
  • Connector – How your applications access the cloud or object storage, such as via an API, S3, Swift, REST, CDMI, Torrent, JSON, NAS file, block or other access gateway or software.
  • Durability – Data dispersed with copies in multiple locations to survive failure of storage or server hardware, software, zone or even region. Availability = Access + Durability.
  • End-point – Where or what your software, application or tool and utilities or gateways attach to for accessing buckets and objects.
  • Ephemeral – Temporary or non-persistent
  • Eventual consistency – Data is eventually made consistent; think in terms of asynchronous or deferred writes where there is a time lag vs. synchronous or real-time updates.
  • Immutable – Persistent, non-altered or write once read many copy of data. Objects generally are not updated, rather new objects created.
  • Object storage and cloud
    Via Cloud Virtual Data Storage (CRC)

  • Object – Byte (or bit) stream that can be as small as one byte or as large as several TBytes (some solutions and services support up to 5TByte sized objects). The object contains whatever data, in any organization, along with metadata. Different solutions and services support from a couple hundred KBytes of metadata to MBytes worth of metadata. In terms of what can be stored in an object: anything from files, videos, images, virtual disks (VMDK’s, VHDX), ZIP or tar files, backup and archive save sets, executable images or ISO’s, anything you want.
  • OPS – Objects per second, or how many objects are accessed, similar to an IOP. Access includes gets, puts, list, head and deletes for a CRUD interface, e.g. Create, Read, Update, Delete.
  • Region – Location where data is stored that can include one or more data centers also known as Availability Zones.
  • Sub-folder – While object storage can be accessed in a flat name space, for commonality and organization some solutions and services support the notion of sub-folders that resemble a traditional directory hierarchy.
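To make several of the terms above concrete, here is a minimal in-memory sketch in Python. It is not any vendor's API: the class names `Bucket` and `StoredObject`, the method names, and the use of a SHA-256 hash as a stand-in "etag" are all assumptions made for illustration. It shows a bucket holding immutable objects with metadata, sub-folders as key prefixes in a flat name space, and an operation counter from which an OPS figure could be derived.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """An immutable object: a byte stream plus metadata (hypothetical model)."""
    key: str
    data: bytes
    metadata: dict
    etag: str  # content hash, used here only to tell versions apart


@dataclass
class Bucket:
    """A bucket (container) holding objects in a flat name space."""
    name: str
    objects: dict = field(default_factory=dict)
    ops: int = 0  # running count of object operations (basis for OPS)

    def put(self, key, data, **metadata):
        # Objects are not updated in place; a put stores a whole new object.
        self.ops += 1
        obj = StoredObject(key, data, dict(metadata),
                           hashlib.sha256(data).hexdigest())
        self.objects[key] = obj
        return obj

    def get(self, key):
        self.ops += 1
        return self.objects[key]

    def head(self, key):
        # Metadata only, without transferring the object data.
        self.ops += 1
        return self.objects[key].metadata

    def list_keys(self, prefix=""):
        # "Sub-folders" are simply key prefixes in the flat name space.
        self.ops += 1
        return sorted(k for k in self.objects if k.startswith(prefix))

    def delete(self, key):
        self.ops += 1
        del self.objects[key]


bucket = Bucket("videos")
bucket.put("2015/april/intro.mp4", b"...video bytes...", content_type="video/mp4")
bucket.put("2015/may/update.mp4", b"...more bytes...", content_type="video/mp4")
print(bucket.list_keys("2015/april/"))
print(bucket.head("2015/april/intro.mp4"))
print("operations so far:", bucket.ops)
```

Dividing the `ops` counter by elapsed time would give an objects per second (OPS) figure for this toy model; real services add durability, regions, end-points and consistency behavior that this sketch leaves out.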

Learn more in Cloud and Virtual Data Storage Networking (CRC Press) and at www.objectstoragecenter.com

Storage I/O trends

OpenStack Manila (e.g. Folders and Files)

AWS recently announced their new cloud based Elastic File System (EFS) to complement their existing Elastic Block Store (EBS) offerings. However, are you aware of what is going on with cloud files within OpenStack?

For those who are familiar with OpenStack or simply talk about it and Swift object storage, or perhaps Cinder block storage, are you aware that there is also a file (NAS or Network Attached Storage) component called Manila?

In concept, Manila should provide a similar capability to what AWS has recently announced with their Elastic File System (EFS), or depending on your perspective, perhaps the other way around. If you are familiar with and have done anything with Manila, what are your initial thoughts and perspectives?

What this all means

People routinely tell me these are the most exciting and interesting times ever in servers, storage, I/O networking, hardware, software, backup or data protection, performance, cloud and virtual, or take your pick, to which I would not disagree.

However, for the past several years (no, make that decades), there have been new and more interesting things, including in adjacent areas.

I predict that at least for the next few years (no, make that decades), we will continue to see plenty of new and interesting things. Questions include:

However, what’s applicable to you and your environment vs. simply fun and interesting to watch?

Ok, nuff said, for now

Cheers gs

 

In This Issue

  • Industry Trends Perspectives News
  • Commentary in the news
  • Tips and Articles
  • StorageIOblog posts
  • Events and Webinars
  • Server StorageIO Lab reports
  • Resources and Links
  • Industry News and Activity

    Recent Industry news and activity

    View other recent industry activity here

    StorageIO Commentary in the news

    StorageIO news (image licensed for use from Shutterstock by StorageIO)
    Recent Server StorageIO commentary and industry trends perspectives about news, activities and announcements.

    CyberTrend: Comments on Software Defined Data Center and Virtualization

    View more trends comments here

    StorageIO Tips and Articles

    Check out these resources and links on server storage I/O performance and benchmarking tools. View more tips and articles here

    Various Industry Events

    EMCworld – May 4-6 2015 (Las Vegas)

    Interop – April 29 2015 (Las Vegas)
    Presenting
    Smart Shopping for Your Enterprise Storage Strategy

    View other recent and upcoming events here

    Webinars


    BrightTalk Webinar – June 23 2015
    Server Storage I/O Innovation Update

    View other webinars here

    Videos and Podcasts

    Data Protection Gumbo Podcast
    Protect Preserve and Serve Data

    In this episode, Greg Schulz is a guest on Data Protection Gumbo hosted by Demetrius Malbrough (@dmalbrough). The conversation covers various aspects of data protection, with a focus on protecting, preserving and serving information, applications and data across different environments and customer segments.

    While we discuss enterprise and SMB data protection, we also talk about trends from mobile to the cloud, among many other tools, technologies and techniques. Check out the podcast here.

    Springtime in Kentucky
    With Kendrick Coleman of EMCcode
    Cloud Object Storage S3motion and more

    In this episode, @EMCcode (part of EMC) developer advocate Kendrick Coleman (@KendrickColeman) joins me (i.e. Greg Schulz) for a conversation.

    The conversation covers what EMCcode is, EMC Federation, Cloud Foundry, clouds, object storage, buckets, containers, objects, node.js, Docker, OpenStack, AWS S3, micro services, and the S3motion tool Kendrick developed.

    S3motion is a good tool to have in your server storage I/O tool box for working with cloud and object storage, along with others such as Cloudberry, S3fs, Cyberduck and S3 browser, among many others. You can get S3motion for free from GitHub here. Check out the companion blog post for this podcast here.

    StorageIO podcasts are also available via Server Storage I/O audio podcast, Server Storage I/O video and at StorageIO.tv

    From StorageIO Labs

    Research, Reviews and Reports

    AWS S3 Cross-Region Replication

    AWS S3 Cross region replication
    Moving and Replicating Buckets/Containers, Sub folders and Objects (Click on Image to read about AWS Cross-Region Replication)

    View other StorageIO lab review reports here

    Resources and Links


    How to test your HDD SSD or all flash array (AFA) storage fundamentals

    How to test your HDD SSD AFA Hybrid or cloud storage

    server storage data infrastructure i/o hdd ssd all flash array afa fundamentals

    Updated 2/14/2018

    Over at BizTech Magazine I have a new article, 4 Ways to Performance Test Your New HDD or SSD, that provides a quick guide to verifying or learning what speed characteristics your new storage device is capable of.

    An out-take from the article used by BizTech as a "tease" is:

    These four steps will help you evaluate new storage drives. And … psst … we included the metrics that matter.

    Building off the basics, server storage I/O benchmark fundamentals

    The four basic steps in the article are:

    • Plan what and how you are going to test (what’s applicable for you)
    • Decide on a benchmarking tool (learn about various tools here)
    • Test the test (find bugs, errors before a long running test)
    • Focus on metrics that matter (what’s important for your environment)
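As a minimal sketch of the "test the test" and "metrics that matter" steps, the following Python runs a very short write pass and summarizes it before you commit to a long benchmark. The function names, the 128 KB I/O size and the 64 I/O count are assumptions chosen for illustration; this is a harness sanity check, not a substitute for a tool such as Vdbench, fio or Diskspd, and the numbers it prints say little about sustained device performance.

```python
import os
import statistics
import tempfile
import time


def summarize(latencies_s, io_size_bytes):
    """Turn per-I/O latencies (seconds) into a few metrics that matter."""
    total = sum(latencies_s)
    ordered = sorted(latencies_s)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "ios": len(latencies_s),
        "iops": len(latencies_s) / total,
        "mb_per_s": len(latencies_s) * io_size_bytes / total / 1e6,
        "avg_ms": statistics.mean(latencies_s) * 1e3,
        "p99_ms": ordered[p99_index] * 1e3,
    }


def smoke_test(path, io_size_bytes=128 * 1024, io_count=64):
    """Short sequential write pass to catch script errors before a long run."""
    block = os.urandom(io_size_bytes)
    latencies = []
    with open(path, "wb", buffering=0) as f:
        for _ in range(io_count):
            start = time.perf_counter()
            f.write(block)
            os.fsync(f.fileno())  # ask the OS to push the write to the device
            latencies.append(time.perf_counter() - start)
    return summarize(latencies, io_size_bytes)


if __name__ == "__main__":
    # Point the path at the device or file system you actually plan to test.
    with tempfile.TemporaryDirectory() as tmp:
        print(smoke_test(os.path.join(tmp, "smoke.bin")))
```

If the short pass reports plausible values for I/O rate, bandwidth and latency, the same plan can then be scaled up in your benchmarking tool of choice with the I/O sizes, read/write mix and run time that match your environment.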

    Server Storage I/O performance

    Where To Learn More

    View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    To some, the above (read the full article here) may seem like common sense tips and things everybody should know. On the other hand, there are many people who are new to servers, storage, I/O networking, hardware, software, cloud and virtual, along with various applications, not to mention different tools.

    Thus the above is a refresher for some (e.g. déjà vu), while for others it might be new and revolutionary, or simply helpful. If you are interested in HDDs and SSDs, as well as other server storage I/O performance topics along with benchmarking tools, techniques and trends, check out the collection of links here (Server and Storage I/O Benchmarking and Performance Resources).

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    Server Storage I/O Benchmark Performance Resource Tools

    Server Storage I/O Benchmarking Performance Resource Tools

    server storage I/O trends

    Updated 1/23/2018

    Server storage I/O benchmark performance resource tools, various articles and tips. These include tools for legacy, virtual, cloud and software defined environments.

    benchmark performance resource tools server storage I/O performance

    The best server and storage I/O (input/output operation) is the one that you do not have to do, the second best is the one with the least impact.

    server storage I/O locality of reference

    This is where the idea of locality of reference (e.g. how close is the data to where your application is running) comes into play which is implemented via tiered memory, storage and caching shown in the figure above.

    Cloud virtual software defined storage I/O

    Server storage I/O performance applies to cloud, virtual, software defined and legacy environments

    What this has to do with server storage I/O (and networking) performance benchmarking is keeping locality of reference, context and the application workload in perspective, whether in cloud, virtual, software defined or legacy physical environments.

    StorageIOblog: I/O, I/O how well do you know about good or bad server and storage I/Os?
    StorageIOblog: Server and Storage I/O benchmarking 101 for smarties
    StorageIOblog: Which Enterprise HDDs to use for a Content Server Platform (7 part series with using benchmark tools)
    StorageIO.com: Enmotus FuzeDrive MicroTiering lab test using various tools
    StorageIOblog: Some server storage I/O benchmark tools, workload scripts and examples (Part I) and (Part II)
    StorageIOblog: Get in the NVMe SSD game (if you are not already)
    Doridmen.com: Transcend SSD360S Review with tips on using ATTO and Crystal benchmark tools
    ComputerWeekly: Storage performance metrics: How suppliers spin performance specifications

    Via StorageIO Podcast: Kevin Closson discusses SLOB Server CPU I/O Database Performance benchmarks
    Via @KevinClosson: SLOB Use Cases By Industry Vendors. Learn SLOB, Speak The Experts’ Language
    Via BeyondTheBlocks (Reduxio): 8 Useful Tools for Storage I/O Benchmarking
    Via CCSIObench: Cold-cache Sequential I/O Benchmark
    CISJournal: Benchmarking the Performance of Microsoft Hyper-V server, VMware ESXi and Xen Hypervisors (PDF)
    Microsoft TechNet: Windows Server 2016 Hyper-V large-scale VM performance for in-memory transaction processing
    InfoStor: What’s The Best Storage Benchmark?
    StorageIOblog: How to test your HDD, SSD or all flash array (AFA) storage fundamentals
    Via ATTO: Atto V3.05 free storage test tool available
    Via StorageIOblog: Big Files and Lots of Little File Processing and Benchmarking with Vdbench

    Via StorageIO.com: Which Enterprise Hard Disk Drives (HDDs) to use with a Content Server Platform (White Paper)
    Via VMware Blogs: A Free Storage Performance Testing Tool For Hyperconverged
    Microsoft Technet: Test Storage Spaces Performance Using Synthetic Workloads in Windows Server
    Microsoft Technet: Microsoft Windows Server Storage Spaces – Designing for Performance
    BizTech: 4 Ways to Performance-Test Your New HDD or SSD
    EnterpriseStorageForum: Data Storage Benchmarking Guide
    StorageSearch.com: How fast can your SSD run backwards?
    OpenStack: How to calculate IOPS for Cinder Storage ?
    StorageAcceleration: Tips for Measuring Your Storage Acceleration

    server storage I/O STI and SUT

    Spiceworks: Determining HDD SSD SSHD IOP Performance
    Spiceworks: Calculating IOPS from Perfmon data
    Spiceworks: profiling IOPs

    vdbench server storage I/O benchmark
    Vdbench example via StorageIOblog.com

    StorageIOblog: What does server storage I/O scaling mean to you?
    StorageIOblog: What is the best kind of IO? The one you do not have to do
    Testmyworkload.com: Collect and report various OS workloads
    Whoishostingthis: Various SQL resources
    StorageAcceleration: What, When, Why & How to Accelerate Storage
    Filesystems.org: Various tools and links
    StorageIOblog: Can we get a side of context with them IOPS and other storage metrics?

    flash ssd and hdd

    BrightTalk Webinar: Data Center Monitoring – Metrics that Matter for Effective Management
    StorageIOblog: Enterprise SSHD and Flash SSD Part of an Enterprise Tiered Storage Strategy
    StorageIOblog: Has SSD put Hard Disk Drives (HDD’s) On Endangered Species List?

    server storage I/O bottlenecks and I/O blender

    Microsoft TechNet: Measuring Disk Latency with Windows Performance Monitor (Perfmon)
    Via Scalegrid.io: How to benchmark MongoDB with YCSB? (Perfmon)
    Microsoft MSDN: List of Perfmon counters for sql server
    Microsoft TechNet: Taking Your Server’s Pulse
    StorageIOblog: Part II: How many IOPS can a HDD, HHDD or SSD do with VMware?
    CMG: I/O Performance Issues and Impacts on Time-Sensitive Applications

    flash ssd and hdd

    Virtualization Practice: IO IO it is off to Storage and IO metrics we go
    InfoStor: Is HP Short Stroking for Performance and Capacity Gains?
    StorageIOblog: Is Computer Data Storage Complex? It Depends
    StorageIOblog: More storage and IO metrics that matter
    StorageIOblog: Moving Beyond the Benchmark Brouhaha
    Yellow-Bricks: VSAN VDI Benchmarking and Beta refresh!

    server storage I/O benchmark example

    YellowBricks: VSAN performance: many SAS low capacity VS some SATA high capacity?
    StorageIOblog: Seagate 1200 12Gbs Enterprise SAS SSD StorageIO lab review
    StorageIOblog: Part II: Seagate 1200 12Gbs Enterprise SAS SSD StorageIO lab review
    StorageIOblog: Server Storage I/O Network Benchmark Winter Olympic Games

    flash ssd and hdd

    VMware VDImark aka View Planner (also here, here and here) as well as VMmark here
    StorageIOblog: SPC and Storage Benchmarking Games
    StorageIOblog: Speaking of speeding up business with SSD storage
    StorageIOblog: SSD and Storage System Performance

    Hadoop server storage I/O performance
    Various Server Storage I/O tools in a hadoop environment

    Michael-noll.com: Benchmarking and Stress Testing an Hadoop Cluster With TeraSort, TestDFSIO
    Virtualization Practice: SSD options for Virtual (and Physical) Environments Part I: Spinning up to speed on SSD
    StorageIOblog: Storage and IO metrics that matter
    InfoStor: Storage Metrics and Measurements That Matter: Getting Started
    SilvertonConsulting: Storage throughput vs. IO response time and why it matters
    Splunk: The percentage of Read / Write utilization to get to 800 IOPS?

    flash ssd and hdd
    Various server storage I/O benchmarking tools

    Spiceworks: What is the best IO IOPs testing tool out there
    StorageIOblog: How many IOPS can a HDD, HHDD or SSD do?
    StorageIOblog: Some Windows Server Storage I/O related commands
    Openmaniak: Iperf overview and Iperf.fr: Iperf overview
    StorageIOblog: Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I and Part II)
    Quest: SQL Server Perfmon Poster (PDF)
    Server and Storage I/O Networking Performance Management (webinar)
    Data Center Monitoring – Metrics that Matter for Effective Management (webinar)
    Flash back to reality – Flash SSD Myths and Realities (Industry trends & benchmarking tips), (MSP CMG presentation)
    DBAstackexchange: How can I determine how many IOPs I need for my AWS RDS database?
    ITToolbox: Benchmarking the Performance of SANs

    server storage IO labs

    StorageIOblog: Dell Inspiron 660 i660, Virtual Server Diamond in the rough (Server review)
    StorageIOblog: Part II: Lenovo TS140 Server and Storage I/O Review (Server review)
    StorageIOblog: DIY converged server software defined storage on a budget using Lenovo TS140
    StorageIOblog: Server storage I/O Intel NUC nick knack notes First impressions (Server review)
    StorageIOblog & ITKE: Storage performance needs availability, availability needs performance
    StorageIOblog: Why SSD based arrays and storage appliances can be a good idea (Part I)
    StorageIOblog: Revisiting RAID storage remains relevant and resources

    If you are interested in cloud and object storage, visit our objectstoragecenter.com page; for flash SSD, check out the storageio.com/ssd page, along with data protection, RAID, various industry links and more here.

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    Watch for additional links to be added above in addition to those that appear via comments.

    Ok, nuff said, for now.

    Gs


    I/O, I/O how well do you know good bad ugly server storage I/O iops?

    How well do you know good bad ugly I/O iops?

    server storage i/o iops activity data infrastructure trends

    Updated 2/10/2018

    There are many different types of server storage I/O iops associated with various environments, applications and workloads. Some I/O activity is measured in iops, other activity in transactions per second (TPS), files or messages per unit of time (hour, minute, second), gets, puts or other operations. The best IO is one you do not have to do.

    What about all the cloud, virtual, software defined and legacy based applications that still need to do I/O?

    If no IO operation is the best IO, then the second best IO is the one that can be done as close to the application and processor as possible with the best locality of reference.

    Also keep in mind that aggregation (e.g. consolidation) can cause aggravation (server storage I/O performance bottlenecks).

    aggregation causes aggravation
    Example of aggregation (consolidation) causing aggravation (server storage i/o blender bottlenecks)

    And the third best?

    It’s the one that can be done in less time or at least cost or effect to the requesting application, which means moving further down the memory and storage stack.

    solving server storage i/o blender and other bottlenecks
    Leveraging flash SSD and cache technologies to find and fix server storage I/O bottlenecks

    On the other hand, any IOP, whether for block, file or object storage, that involves some context is better than those without, particularly involving metrics that matter (here, here and here [webinar]).

    Server Storage I/O optimization and effectiveness

    The problem with IO’s is that they are basic operations to get data into and out of a computer or processor, so there’s no way to avoid all of them, unless you have a very large budget. Even if you have a large budget that can afford an all flash SSD solution, you may still meet bottlenecks or other barriers.

    IO’s require CPU or processor time and memory to set up and then process the results, as well as IO and networking resources to move data to its destination or retrieve it from where it is stored. While IO’s cannot be eliminated, their impact can be greatly reduced by, among other techniques, doing fewer of them via caching and by grouping reads or writes (pre-fetch, write-behind).
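The effect of caching and write grouping on back-end I/O counts can be sketched with a toy model in Python. The class `CachedBlockDevice` and its parameters are hypothetical and model only I/O counts, not real data paths; in particular, counting each grouped flush as a single back-end write is a simplifying assumption.

```python
from collections import OrderedDict


class CachedBlockDevice:
    """Toy model: a slow device fronted by an LRU read cache and a
    write-behind buffer. Only I/O counts are modeled."""

    def __init__(self, cache_blocks=4, write_group=4):
        self.cache = OrderedDict()      # block number -> data, in LRU order
        self.cache_blocks = cache_blocks
        self.pending = {}               # dirty blocks waiting to be flushed
        self.write_group = write_group
        self.backend = {}               # stands in for the slow device
        self.backend_reads = 0
        self.backend_writes = 0         # one per grouped flush (assumption)

    def read(self, block):
        if block in self.pending:       # newest data is still in the buffer
            return self.pending[block]
        if block in self.cache:         # cache hit: no back-end I/O
            self.cache.move_to_end(block)
            return self.cache[block]
        self.backend_reads += 1         # cache miss: one back-end read
        data = self.backend.get(block, b"")
        self._remember(block, data)
        return data

    def write(self, block, data):
        self.pending[block] = data      # rewrites of one block coalesce here
        if len(self.pending) >= self.write_group:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.backend.update(self.pending)
        for block, data in self.pending.items():
            self._remember(block, data)
        self.pending.clear()
        self.backend_writes += 1        # the whole group goes out together

    def _remember(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used


dev = CachedBlockDevice(cache_blocks=4, write_group=4)
for block in range(8):
    dev.write(block, b"x")              # 8 application writes
for _ in range(10):
    dev.read(7)                         # hot block served from cache
print("back-end writes:", dev.backend_writes, "back-end reads:", dev.backend_reads)
```

In this run, eight application writes and ten reads turn into two grouped back-end writes and no back-end reads. The trade-off is the one described in the errands analogy: data sitting in a write-behind buffer waits longer before it reaches the device, so real implementations protect that buffer and bound how long writes are held.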

    server storage I/O STI and SUT

    Think of it this way: Instead of going on multiple errands, sometimes you can group multiple destinations together making for a shorter, more efficient trip. However, that optimization may also mean your drive will take longer. So, sometimes it makes sense to go on a couple of quick, short, low-latency trips instead of one larger one that takes half a day even as it accomplishes many tasks. Of course, how far you have to go on those trips (i.e., their locality) makes a difference about how many you can do in a given amount of time.

    Locality of reference (or proximity)

    What is locality of reference?

    This refers to how close (i.e., its place) data exists to where it is needed (being referenced) for use. For example, the best locality of reference in a computer would be registers in the processor core, ready to be acted on immediately. This would be followed by levels 1, 2, and 3 (L1, L2, and L3) onboard caches, followed by main memory, or DRAM. After that comes solid-state memory typically NAND flash either on PCIe cards or accessible on a direct attached storage (DAS), SAN, or NAS device. 
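A rough way to see why locality of reference matters is to compute an average access time across the tiers. The Python sketch below uses illustrative, order-of-magnitude latencies that are assumptions, not measurements of any device (an HDD tier is added as the slowest level for contrast), and it assumes every request that reaches a tier pays that tier's access time before falling through on a miss.

```python
# Illustrative, order-of-magnitude access times in nanoseconds.
# These values are assumptions for the sketch, not measurements.
TIERS = [
    ("L1 cache", 1),
    ("L2 cache", 4),
    ("L3 cache", 20),
    ("DRAM", 100),
    ("PCIe NAND flash", 100_000),
    ("HDD", 10_000_000),
]


def average_access_ns(hit_ratios):
    """Expected access time when each tier satisfies a fraction of the
    requests that reach it and the misses fall through to the next tier.

    hit_ratios holds one value per tier; the last tier must be 1.0
    because the data has to live somewhere.
    """
    if len(hit_ratios) != len(TIERS) or hit_ratios[-1] != 1.0:
        raise ValueError("need one hit ratio per tier, ending with 1.0")
    expected = 0.0
    reach = 1.0  # fraction of requests that get as far as this tier
    for (_, latency_ns), hit in zip(TIERS, hit_ratios):
        expected += reach * latency_ns  # everyone reaching the tier pays it
        reach *= 1.0 - hit              # misses continue down the stack
    return expected


good_locality = [0.95, 0.90, 0.90, 0.99, 0.99, 1.0]
poor_locality = [0.50, 0.50, 0.50, 0.80, 0.80, 1.0]
print("good locality: %.1f ns" % average_access_ns(good_locality))
print("poor locality: %.1f ns" % average_access_ns(poor_locality))
```

Even with made-up numbers, the shape of the result holds: a small change in how often requests fall through to the slowest tiers dominates the average, which is why keeping data close to where it is referenced pays off.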

    server storage I/O locality of reference

    Even though a PCIe NAND flash card is close to the processor, there still remains the overhead of traversing the PCIe bus and associated drivers. To help offset that impact, PCIe cards use DRAM as cache or buffers for data along with meta or control information to further optimize and improve locality of reference. In other words, this information is used to help with cache hits, cache use, and cache effectiveness vs. simply boosting cache use.

    SSD to the rescue?

    What can you do to cut the impact of IO’s?

    There are many steps one can take, starting with establishing baseline performance and availability metrics.

    The metrics that matter include IOP’s, latency, bandwidth, and availability. Then, leverage metrics to gain insight into your application’s performance.
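These metrics are related, so a baseline is more useful when they are read together. The following Python sketch shows three simple relationships; the function names are hypothetical, and the formulas are the usual ones (bandwidth as I/O rate times I/O size, Little's law for I/Os in flight, and availability as uptime over total time).

```python
def bandwidth_mb_s(iops, io_size_kb):
    """Bandwidth implied by an I/O rate and I/O size, in decimal MB/s."""
    return iops * io_size_kb * 1024 / 1e6


def outstanding_ios(iops, avg_latency_ms):
    """Little's law: average I/Os in flight = I/O rate x average latency."""
    return iops * avg_latency_ms / 1e3


def availability_pct(uptime_hours, downtime_hours):
    """Availability as a percentage of total time."""
    return 100.0 * uptime_hours / (uptime_hours + downtime_hours)


# Example baseline: 1,000 IOPS of 128 KB I/Os at 2 ms average latency.
print(bandwidth_mb_s(1000, 128))      # MB/s moved by that workload
print(outstanding_ios(1000, 2))       # average queue depth it implies
print(availability_pct(8759, 1))      # one hour of downtime in a year
```

The same IOPS figure can mean very different things: small I/Os give a high rate with modest bandwidth, while large I/Os give a low rate with high bandwidth, which is why a number without context says little.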

    Understand that IO’s are a fact of applications doing work (storing, retrieving, managing data) no matter whether systems are virtual, physical, or running up in the cloud. But it’s important to understand just what a bad IO is, along with its impact on performance. Try to identify those that are bad, and then find and fix the problem, either with software, application, or database changes. Perhaps you need to throw more software caching tools, hypervisors, or hardware at the problem. Hardware may include faster processors with more DRAM and faster internal busses.

    Leveraging local PCIe flash SSD cards for caching or as targets is another option.

    You may want to use storage systems or appliances that rely on intelligent caching and storage optimization capabilities to help with performance, availability, and capacity.

    Where to gain insight into your server storage I/O environment

    There are many tools that can be used to gain insight into your server storage I/O environment across cloud, virtual, software defined and legacy, as well as from different layers (e.g. applications, database, file systems, operating systems, hypervisors, server, storage, I/O networking). Many applications, along with databases, have either built-in or optional tools from their provider, third parties, or other sources that can give information about the work activity being done. Likewise, there are tools to dig deeper into the data infrastructure to see what is happening at the various layers, as shown in the following figures.

    application storage I/O performance
    Gaining application and operating system level performance insight via different tools

    windows and linux storage I/O performance
    Insight and awareness via operating system tools on Windows and Linux

    In the above example, Spotlight on Windows (SoW), which you can download for free from Dell here, is shown along with Ubuntu utilities. You could also use other tools to look at server storage I/O performance, including Windows Perfmon among others.

    vmware server storage I/O
    Hypervisor performance using VMware ESXi / vsphere built-in tools

    vmware server storage I/O performance
    Using Visual ESXtop to dig deeper into virtual server storage I/O performance

    vmware server storage i/o cache
    Gaining insight into virtual server storage I/O cache performance

    Wrap up and summary

    There are many approaches to address (e.g. find and fix) vs. simply move or mask data center and server storage I/O bottlenecks. Having insight and awareness into how your environment and applications behave is important for knowing where to focus resources. Also keep in mind that a bit of flash SSD or DRAM cache in the applicable place can go a long way, while a lot of cache will also cost you cash. Even if you can’t eliminate I/Os, look for ways to decrease their impact on your applications and systems.

    Where To Learn More

    View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    Keep in mind: SSD, including flash and DRAM among others, is in your future; the question is where, when, with what, how much and whose technology or packaging.

    Ok, nuff said, for now.

    Gs


    Revisiting RAID data protection remains relevant resource links

    Revisiting RAID data protection remains relevant and resources

    Storage I/O trends

    Updated 2/10/2018

    RAID data protection remains relevant including erasure codes (EC), local reconstruction codes (LRC) among other technologies. If RAID were really not relevant anymore (e.g. actually dead), why do some people spend so much time trying to convince others that it is dead or to use a different RAID level or enhanced RAID or beyond raid with related advanced approaches?

    When you hear RAID, what comes to mind?

    A legacy monolithic storage system that supports narrow 4, 5 or 6 drive wide stripe sets, or a modern system that supports dozens of drives in a RAID group with different options?

    RAID means many things, likewise there are different implementations (hardware, software, systems, adapters, operating systems) with various functionality, some better than others.

    For example, which of the items in the following figure come to mind, or perhaps are new to your RAID vocabulary?

    RAID questions

    There are many variations of RAID storage, some for the enterprise, some for SMB, SOHO or consumer. Some have better performance than others; some have poor performance, for example causing extra writes, which leads to the perception that all parity based RAID does extra writes (some actually do write gathering and optimization).

    Some hardware and software implementations use WBC (write back cache) that is mirrored or battery backed (BBU), and can group writes together in memory (cache) to do full stripe writes. The result can be fewer back-end writes compared to other systems. Hence, not all RAID implementations in either hardware or software are the same. Likewise, just because a RAID definition shows a particular theoretical implementation approach does not mean all vendors have implemented it in that way.
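The difference write gathering can make is easy to see by counting back-end I/Os for a single-parity (RAID 5 style) group. The Python sketch below is a textbook model under stated assumptions, not a description of any particular vendor's implementation: a small write is treated as a read-modify-write costing four back-end I/Os, and a gathered full stripe write costs one write per drive with parity computed in memory.

```python
def raid5_backend_ios(data_blocks_written, drives, full_stripe):
    """Back-end I/Os for writing data blocks to a single-parity group.

    full_stripe=False: each block is a read-modify-write
        (read old data, read old parity, write new data, write new parity).
    full_stripe=True: writes were gathered in cache into whole stripes,
        so each stripe costs one write per drive and no reads.
    """
    data_per_stripe = drives - 1        # one drive's worth holds parity
    if full_stripe:
        if data_blocks_written % data_per_stripe:
            raise ValueError("this sketch only handles whole stripes")
        stripes = data_blocks_written // data_per_stripe
        return stripes * drives
    return data_blocks_written * 4


# Eight data blocks written to a 4+1 group:
print(raid5_backend_ios(8, drives=5, full_stripe=False))  # small random writes
print(raid5_backend_ios(8, drives=5, full_stripe=True))   # gathered in cache
```

Under these assumptions the same eight blocks cost 32 back-end I/Os as individual small writes and 10 as two full stripe writes, which is the kind of gap that separates one parity RAID implementation from another.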

    RAID is not a replacement for backup rather part of an overall approach to providing data availability and accessibility.

    data protection and durability

    What’s the best RAID level? The one that meets YOUR needs

    There are different RAID levels and implementations (hardware, software, controller, storage system, operating system, adapter among others) for various environments (enterprise, SME, SMB, SOHO, consumer) supporting primary, secondary, tertiary (backup/data protection, archiving).

    RAID comparison
    General RAID comparisons

    Thus one size or approach does not fit all solutions, likewise RAID rules of thumb or guides need context. Context means that a RAID rule or guide for consumer or SOHO or SMB might be different for enterprise and vice versa, not to mention the type of storage system, number of drives, drive type and capacity among other factors.

    RAID comparison
    General basic RAID comparisons

    Thus the best RAID level is the one that meets your specific needs in your environment. What is best for one environment and application may be different from what is applicable to your needs.

    Key points and RAID considerations include:

    · Not all RAID implementations are the same, some are very much alive and evolving while others are in need of a rest or rewrite. So it is not the technology or techniques that are often the problem, rather how it is implemented and then deployed.

    · It may not be RAID that is dead, rather the solution that uses it, hence if you think a particular storage system, appliance, product or software is old and dead along with its RAID implementation, then just say that product or vendors solution is dead.

    · RAID can be implemented in hardware controllers, adapters or storage systems and appliances as well as via software and those have different features, capabilities or constraints.

    · Long or slow drive rebuilds are a reality with larger disk drives and parity-based approaches; however, you have options on how to balance performance, availability, capacity, and economics.

    · RAID can be single, dual or multiple parity or mirroring-based.

    · Erasure and other coding schemes leverage parity schemes and guess what umbrella parity schemes fall under.

    · RAID may not be cool, sexy or a fun topic and technology to talk about, however many trendy tools, solutions and services actually use some form or variation of RAID as part of their basic building blocks. This is an example of using new and old things in new ways to help each other do more without increasing complexity.

    · Even if you are not a fan of RAID and think it is old and dead, at least take a few minutes to learn more about what it is that you do not like to update your dead FUD.
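
    On the rebuild point in the list above, a quick back-of-envelope sketch shows why larger drives mean longer rebuilds. The drive capacities and sustained rebuild rates below are assumptions for illustration, not measurements, and the math is best case (no competing application I/O).

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_sec: float) -> float:
    """Best-case hours to rewrite one whole drive at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB to MB
    return capacity_mb / rebuild_mb_per_sec / 3600


# Assumed drive sizes and sustained rebuild rates, for illustration only.
for capacity_tb in (1, 4, 8):
    for rate in (50, 150):
        hours = rebuild_hours(capacity_tb, rate)
        print(f"{capacity_tb} TB at {rate} MB/s: {hours:.1f} hours")
```

    Swap in your own drive sizes and the rebuild rates you actually observe to see where you land.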

    Wait, Isn’t RAID dead?

    There is some dead marketing that paints a broad picture that RAID is dead to prop up something new, which in some cases may be a derivative variation of parity RAID.

    data dispersal
    Data dispersal and durability

    RAID rebuild improving
    RAID continues to evolve with rapid rebuilds for some systems

    Otoh, there are some specific products, technologies and implementations that may be end of life or actually dead. Likewise what might be dead, dying or simply not in vogue are specific RAID implementations or packaging. Certainly there is a lot of buzz around object storage, cloud storage, forward error correction (FEC) and erasure coding including messages of how they cut RAID. Catch is that some object storage solutions are overlaid on top of lower level file systems that do things such as RAID 6, granted they are out of sight, out of mind.

    RAID comparison
    General RAID parity and erasure code/FEC comparisons

    Then there are advanced parity protection schemes which include FEC and erasure codes that, while they are not your traditional RAID levels, share characteristics including chunking or sharding data and spreading it out over multiple devices with multiple parity (or derivatives of parity) protection.
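
    To make the chunking and parity idea concrete, here is a minimal sketch using plain single XOR parity (RAID 5 style). Note that this is the simplest member of the family and an assumption for illustration; actual erasure codes and FEC schemes use more elaborate math to survive multiple device losses.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def split_into_chunks(data: bytes, count: int) -> list[bytes]:
    """Split data into `count` equal chunks, zero-padding the tail."""
    size = -(-len(data) // count)  # ceiling division
    padded = data.ljust(size * count, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


data = b"content server file data"
chunks = split_into_chunks(data, 4)  # one chunk per data device
parity = xor_chunks(chunks)          # stored on a fifth device

# Simulate losing device 2, then rebuild its chunk from the survivors.
survivors = [c for i, c in enumerate(chunks) if i != 2]
rebuilt = xor_chunks(survivors + [parity])
print(rebuilt == chunks[2])
```

    The same XOR that produced the parity also regenerates the missing chunk, which is the basic building block that multiple parity schemes extend.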

    Bottom line is that for some environments, different RAID levels may be more applicable and alive than for others.

    Via BizTech – How to Turn Storage Networks into Better Performers

    • Maintain Situational Awareness
    • Design for Performance and Availability
    • Determine Networked Server and Storage Patterns
    • Make Use of Applicable Technologies and Techniques

    If RAID is alive, what to do with it?

    If you are new to RAID, learn more about the past, present and future, keeping context in mind. Keeping context in mind means that there are different RAID levels and implementations for various environments. Not all RAID 0, 1, 1/0, 10, 2, 3, 4, 5, 6 or other variations (past, present and emerging) are the same for consumer vs. SOHO vs. SMB vs. SME vs. enterprise, nor are the usage cases. Some need performance for reads, others for writes, some need high capacity with low performance, using hardware or software. RAID rules of thumb are ok and useful, however keep them in context to what you are doing as well as using.

    What to do next?

    Take some time to learn, ask questions including what to use when, where, why and how as well as if an approach or recommendation is applicable to your needs. Check out the following links to read some extra perspectives about RAID and keep in mind, what might apply to enterprise may not be relevant for consumer or SMB and vice versa.

    Some advise needed on SSD’s and Raid (Via Spiceworks)
    RAID 5 URE Rebuild Means The Sky Is Falling (Via BenchmarkReview)
    Double drive failures in a RAID-10 configuration (Via SearchStorage)
    Industry Trends and Perspectives: RAID Rebuild Rates (Via StorageIOblog)
    RAID, IOPS and IO observations (Via StorageIOBlog)
    RAID Relevance Revisited (Via StorageIOBlog)
    HDDs Are Still Spinning (Rust Never Sleeps) (Via InfoStor)
    When and Where to Use NAND Flash SSD for Virtual Servers (Via TheVirtualizationPractice)
    What’s the best way to learn about RAID storage? (Via Spiceworks)
    Design considerations for the host local FVP architecture (Via Frank Denneman)
    Some basic RAID fundamentals and definitions (Via SearchStorage)
    Can RAID extend nand flash SSD life? (Via StorageIOBlog)
    I/O Performance Issues and Impacts on Time-Sensitive Applications (Via CMG)
    The original RAID white paper (PDF) that while over 20 years old, it provides a basis, foundation and some history by Katz, Gibson, Patterson et al
    Storage Interview Series (Via Infortrend)
    Different RAID methods (Via RAID Recovery Guide)
    A good RAID tutorial (Via TheGeekStuff)
    Basics of RAID explained (Via ZDNet)
    RAID and IOPs (Via VMware Communities)

    Where To Learn More

    View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    What is my favorite or preferred RAID level?

    That depends, for some things it’s RAID 1, for others RAID 10, yet for others RAID 4, 5, 6 or DP, and yet other situations could be a fit for RAID 0 or erasure codes and FEC. Instead of being focused on just one or two RAID levels as the solution for different problems, I prefer to look at the environment (consumer, SOHO, small or large SMB, SME, enterprise), type of usage (primary or secondary or data protection), performance characteristics, reads, writes, type and number of drives among other factors. What might be a fit for one environment would not be a fit for others, thus my preferred RAID level, along with where it is implemented, is the one that meets the given situation. Also keep in mind tying RAID into an overall data protection strategy, and remember, RAID is not a replacement for backup.

    What this all means

    Like other technologies that have been declared dead for years or decades, aka the Zombie technologies (e.g. dead yet still alive), RAID continues to be used while the technology evolves. There are specific products, implementations or even RAID levels that have faded away, or are declining in some environments, yet alive in others. RAID and its variations are still alive, however how they are used or deployed in conjunction with other technologies is also evolving.

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    VMware VVOLs storage I/O fundamentals (Part 1)

    VMware VVOL’s storage I/O fundamentals (Part I)

    Note that this is a three part series with the first piece here (e.g. Are VMware VVOL’s in your virtual server and storage I/O future?), the second piece here (e.g. VMware VVOL’s and storage I/O fundamentals Part 1) and the third piece here (e.g. VMware VVOL’s and storage I/O fundamentals Part 2).

    Some of you may already be participating in the VMware beta of VVOL involving one of the initial storage vendors also in the beta program.

    Ok, now let’s go a bit deeper, however if you want some good music to listen to while reading this, check out @BruceRave GoDeepMusic.Net and shows here.

    Taking a step back, digging deeper into Storage I/O and VVOL’s fundamentals

    Instead of a VM host accessing its virtual disk (aka VMDK) which is stored in a VMFS formatted data store (part of ESXi hypervisor) built on top of a SCSI LUN (e.g. SAS, SATA, iSCSI, Fibre Channel aka FC, FCoE aka FC over Ethernet, IBA/SRP, etc) or an NFS file system presented by a storage system (or appliance), VVOL’s push more functionality and visibility down into the storage system. VVOL’s shift more intelligence and work from the hypervisor down into the storage system. Instead of a storage system simply presenting a SCSI LUN or NFS mount point and having limited (coarse) to no visibility into how the underlying storage bits, bytes as well as blocks are being used, storage systems gain more awareness.

    Keep in mind that even files and objects still get ultimately mapped to pages and blocks aka sectors even on nand flash-based SSD’s. However also keep an eye on some new technology such as the Seagate Kinetic drive that instead of responding to SCSI block based commands, leverage object API’s and associated software on servers. Read more about these emerging trends here and here at objectstoragecenter.com.

    With a normal SCSI LUN the underlying storage system has no knowledge of how the upper level operating system, hypervisor, file system or application such as a database (doing raw IO) is allocating the pages or blocks of memory aka storage. It is up to the upper level storage and data management tools to map from objects and files to the corresponding extents, pages and logical block address (LBA) understood by the storage system. In the case of a NAS solution, there is a layer of abstractions placed over the underlying block storage handling file management and the associated file to LBA mapping activity.
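
    As a rough sketch of that mapping work done by the upper layers, the following translates a byte offset in a file into an LBA on the LUN. The 512 byte block size, the extent list and the function name are assumptions for illustration, not how any particular file system lays things out.

```python
SECTOR_BYTES = 512  # assumed logical block size


def file_offset_to_lba(extents, offset):
    """Translate a byte offset within a file into an LBA on the LUN.

    `extents` is a list of (start_lba, block_count) tuples in file order,
    the kind of map a file system keeps and the storage system never sees.
    """
    block_in_file = offset // SECTOR_BYTES
    for start_lba, block_count in extents:
        if block_in_file < block_count:
            return start_lba + block_in_file
        block_in_file -= block_count
    raise ValueError("offset is beyond the end of the file")


# Hypothetical file stored as two extents: 8 blocks at LBA 1000, 4 at LBA 5000.
extents = [(1000, 8), (5000, 4)]
print(file_offset_to_lba(extents, 0))        # first block of the first extent
print(file_offset_to_lba(extents, 8 * 512))  # first block of the second extent
```

    All the storage system sees are reads and writes to those LBAs; the knowledge that they belong to one file stays up in the host.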

    Storage I/O basics
    Storage I/O and IOP basics and addressing: LBA’s and LBN’s

    Getting back to VVOL, instead of simply presenting a LUN, which is essentially a linear range of LBA’s (think of a big table or array) in which the hypervisor then manages data placement and access, the storage system now gains insight into what LBA’s correspond to various entities such as a VMDK or VMX, log, clone, swap or other VMware objects. With this added insight, storage systems can now do native and more granular functions such as clone, replication and snapshot among others, as opposed to simply working on a coarse LUN basis. Similar concepts extend over to NAS NFS based access. Granted, there is more to VVOL’s, including the ability to get the underlying storage system more closely integrated with the virtual machine, hypervisor and associated management, including supported service management and classes or categories of service across performance, availability, capacity and economics.
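
    Here is a minimal sketch of what that awareness buys a storage system. The LBA ranges, object names and function names are all hypothetical and this is not how VVOL’s are actually implemented; it simply shows that once LBA ranges are tied to objects, a function such as clone can work on one object rather than the whole LUN.

```python
from bisect import bisect_right

# Hypothetical layout: (first_lba, last_lba, object_name), sorted by first_lba.
OBJECT_MAP = [
    (0, 99_999, "vm01.vmdk"),
    (100_000, 100_999, "vm01.vmx"),
    (101_000, 149_999, "vm01 swap"),
    (150_000, 249_999, "vm02.vmdk"),
]
STARTS = [first for first, _, _ in OBJECT_MAP]


def object_for_lba(lba):
    """Return the object that owns an LBA, or None if it is unallocated."""
    index = bisect_right(STARTS, lba) - 1
    if index < 0:
        return None
    first, last, name = OBJECT_MAP[index]
    return name if first <= lba <= last else None


def lbas_to_copy(name):
    """LBA ranges to copy when cloning one object instead of the whole LUN."""
    return [(first, last) for first, last, owner in OBJECT_MAP if owner == name]


print(object_for_lba(100_500))
print(lbas_to_copy("vm02.vmdk"))
```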

    What about VVOL, VAAI and VASA?

    VVOL’s build on earlier VMware initiatives including VAAI and VASA. With VAAI, VMware hypervisors can off-load common functions such as copy, clone and zero copy among others to storage systems that support them, much like how a computer can off-load graphics processing to a graphics card if present.

    VASA however provides a means for visibility, insight and awareness between the hypervisor and its associated management (e.g. vCenter etc) as well as the storage system. This includes storage systems being able to communicate and publish to VMware its capabilities for storage space capacity, availability, performance and configuration among other things.

    With VVOL’s, VASA gets leveraged for bidirectional (e.g. two-way) communication where the VMware hypervisor and management tools can tell the storage system about things such as configuration and activities to do among others. Hence why VASA is important to have in your VMware CASA.

    What’s this object storage stuff?

    VVOL’s are a form of object storage access in that they differ from traditional block (LUN’s) and files (NAS volumes/mount points). However, keep in mind that not all object storage is the same, as there are different object storage access methods and architectures.

    object storage
    Object Storage basics, generalities and block file relationships

    When you hear object storage, avoid the mistake of assuming it means ANSI T10 (the folks that manage the SCSI command specifications) Object Storage Device (OSD) or any other single thing. There are many different types of underlying object storage architectures, some with block and file as well as object access front ends. Likewise there are many different types of object access that sit on top of object architectures as well as traditional storage systems.

    Object storage I/O
    An example of how some object storage gets accessed (not VMware specific)

    Also keep in mind that there are many different types of object access mechanisms including HTTP REST based, S3 (e.g. a common industry de facto standard based on Amazon Simple Storage Service), SNIA CDMI, SOAP, Torrent, XAM, JSON, XML, DICOM and HL7 just to name a few, not to mention various programmatic bindings or application specific implementations and API’s. Read more about object storage architectures, access and related topics, themes and trends at www.objectstoragecenter.com
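
    Whatever the access mechanism, the common thread is that objects are addressed by a name or key along with metadata instead of by LBA. The following is a minimal in-memory sketch of that idea. The class, its methods and the bucket and key names are made up for illustration and are not the API of S3, CDMI or any real product.

```python
class TinyObjectStore:
    """In-memory stand-in for an object store; not any real product's API."""

    def __init__(self):
        self._objects = {}

    def put(self, bucket, key, data, **metadata):
        """Store a whole object plus user metadata under bucket/key."""
        self._objects[(bucket, key)] = (bytes(data), dict(metadata))

    def get(self, bucket, key):
        """Return (data, metadata) for bucket/key, raising KeyError if absent."""
        return self._objects[(bucket, key)]

    def list_keys(self, bucket, prefix=""):
        """List keys in a bucket that start with an optional prefix."""
        return sorted(
            k for b, k in self._objects if b == bucket and k.startswith(prefix)
        )


store = TinyObjectStore()
store.put("media", "video/intro.mp4", b"...bytes...", content_type="video/mp4")
store.put("media", "images/logo.png", b"...bytes...", content_type="image/png")
data, meta = store.get("media", "video/intro.mp4")
print(meta["content_type"], store.list_keys("media", prefix="video/"))
```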

    Let’s take a break here and when you are ready, click here to read the third piece in this series VMware VVOL’s and storage I/O fundamentals Part 2.

    Ok, nuff said (for now)

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    Are VMware VVOLs in your virtual server and storage I/O future?

    Are VMware VVOL’s in your virtual server and storage I/O future?

    Note that this is a three part series with the first piece here (e.g. Are VMware VVOL’s in your virtual server and storage I/O future?), the second piece here (e.g. VMware VVOL’s and storage I/O fundamentals Part 1) and the third piece here (e.g. VMware VVOL’s and storage I/O fundamentals Part 2).

    With VMworld 2014 just around the corner, for some of you the question is not if Virtual Volumes (VVOL’s) are in your future, rather when, where, how and with what.

    What this means is that for some hands on beta testing is already occurring or will be soon, while for others that might be around the corner or down the road.

    Some of you may already be participating in the VMware beta of VVOL involving one of the first storage vendors also in the beta program.

    VMware vvol beta

    On the other hand, some of you may not be in VMware centric environments and thus VVOL’s may not yet be in your vocabulary.

    How do you know if VVOL are in your future if you don’t know what they are?

    First, to be clear, as of the time this was written VMware VVOL’s are not released and only in beta as well as having been covered in earlier VMworld’s. Consequently what you are going to read here is based on what VVOL material has already been made public in various venues including earlier VMworld’s and VMware blogs among other places.

    A quick overview of VMware VVOL’s:

    • Higher level of abstraction of storage vs. traditional SCSI LUN’s or NAS NFS mount points
    • Tighter level of integration and awareness between VMware hypervisors and storage systems
    • Simplified management for storage and virtualization administrators
    • Removing complexity to support increased scaling
    • Enable automation and service managed storage aka software defined storage management

    VVOL considerations and your future

    As mentioned, as of this writing, VVOL’s are still a future item granted they exist in beta.

    For those of you in VMware environments, now is the time to add VVOL to your vocabulary which might mean simply taking the time to read a piece like this, or digging deeper into the theories of operations, configuration, usage, hints and tips, tutorials along with vendor specific implementations.

    Explore your options, and ask yourself, do you want VVOL or do you need it?

    What support does your current vendor(s) have for VVOL, or what is their statement of direction (SOD), which you might have to get from them under NDA?

    This means that there will be some first vendors with some of their products supporting VVOL’s with more vendors and products following (hence watch for many statements of direction announcements).

    Speaking of vendors, watch for a growing list of vendors to announce their current support or plans for supporting VVOL’s, not to mention watch some of them jump up and down like Donkey in Shrek saying "oh oh pick me pick me".

    When you ask a vendor if they support VVOL’s, move beyond the simple yes or no. Ask which of their specific products, whether it is block (e.g. iSCSI) or NAS file (e.g. NFS) based, and about other caveats or configuration options.

    Watch for more information about VVOL’s in the weeks and months to come both from VMware along with from their storage provider partners.

    How will VVOL impact your organization’s best practices, policies and workflows, including who does what, along with associated responsibilities?

    Where to learn more

    Check out the companion piece to this that takes a closer look at storage I/O and VMware VVOL fundamentals here and here.

    Also check out this good VMware blog via Cormac Hogan (@CormacJHogan) that includes a video demo, granted it’s from 2012, however some of this stuff actually does take time and thus this is very timely. Speaking of VMware, Duncan Epping (aka @DuncanYB) at his Yellow-Bricks site has some good posts to check out as well with links to others including this here. Also check out the various VVOL related sessions at VMworld as well as the many existing, and soon to be many more blogs, articles and videos you can find via Google. And if you need a refresher, Why VASA is important to have in your VMware CASA.

    Of course keep an eye here or whichever venue you happen to read this for future follow-up and companion posts, and if you have not done so, sign up for the beta here as there are lots of good material including SDKs, configuration guides and more.

    VVOL Poll

    What are your VVOL plans? View results and cast your vote here

    Wrap up (for now)

    Hope you found this quick overview on VVOL’s of use, since VVOL’s at the time of this writing are not yet released, you will need to wait for more detailed info, or join the beta or poke around the web (for now).

    Keep an eye on and learn more about VVOL’s at VMworld 2014 as well as in various other venues.

    IMHO VVOL’s are or will be in your future, however the question will be is there going to be a back to the future moment for some of you with VVOL’s?

    Also what VVOL questions, comments and concerns are in your future and on your mind?

    And remember to check out the second part to this series here.

    Ok, nuff said (for now)

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    Fall 2013 StorageIO Update Newsletter

    Storage I/O trends

    Fall 2013 StorageIO Update Newsletter

    Welcome to the Fall 2013 (joint September and October) edition of the StorageIO Update (newsletter) containing trends perspectives on cloud, virtualization and data infrastructure topics. It is fall (at least here in North America) which means conferences, symposiums, virtual and physical events, seminars and webinars in addition to normal client project activities. The fall (or back to school) season of activity kicked off with VMworld in San Francisco back in late August. VMworld was followed by many other events, both in-person and virtual or on-line such as webinars and Google+ hangouts among others, not to mention all the briefings for vendor product announcements and updates. Check out the industry trends perspectives articles, comments and blog posts below that cover some activity over the past few months.

    VMworld 2013
    Congratulations to VMworld on the 10th anniversary of the event. With the largest installment yet of a VMworld in terms of attendance, there were also many announcements. Here is a synopsis of some of those announcements, which of course included plenty of software defined marketing (SDM).

    CMG and Storage Performance
    During mid-September I was invited to give an industry trends and perspectives presentation to the Storage Performance Council (SPC) board. The SPC board were meeting in the Minneapolis area and I gave a brief talk about Metrics that Matter and importance of context with focus on applications. Speaking of the Minneapolis area, Tom Becchetti (@tbecchetti) organized a great CMG event hosted over at Blue Cross Blue Shield of Minnesota. I gave a discussion around Technolutionary, technology evolution and revolution, using old and new things in new ways.

    Check out our backup, restore, BC, DR and archiving (Under the resources section on StorageIO.com) for various presentation, book chapter downloads and other content.

    SNW Fall 2013 Long Beach
    Talking about traveling, there was a quick trip out to Long Beach for the fall 2013 edition of Storage Networking World (SNW) where I had some good meetings and conversations with those who were actually there. No need to sugar coat it, likewise no need to kick sand in its face. Plain and simple, that SNW is not the event it used to be has been a common discussion theme for several years, so I had set my expectations accordingly.

    Some have asked me why I even spent time, money and resources to attend SNW?

    My answer is that I had some meetings to attend to, wanted to see and meet with others who were going to be there, and perhaps even say goodbye to an event that I have been involved with for over a decade.

    Does that mean I’m all done with SNW?

    Not sure yet, as I will have to wait and see what SNIA and IDG/Computerworld, the event co-owners and producers, put together for future events. However there are enough other events and activities to pick up the slack, which is part of what has caused the steady decline in events like SNW among others.

    Perhaps it is time for SNIA to partner with another adjacent yet like-minded organization such as CMG to collaborate and try doing something like what was done in the early 2000s? That is, SNIA providing their own seminars along with others such as myself who are involved with CMG, SNW and SNIA to beef up or set up a storage and I/O focused track at the CMG event.

    Beyond those items mentioned above, or in the following section, there are plenty of interesting and exciting things occurring in the background that I can’t talk about yet. However watch for future posts, commentary, perspectives and other information down the road (and in the not so distant future).

    Enjoy this edition of the StorageIO Update newsletter.

    Ok, nuff said (for now)

    Cheers gs

    StorageIO Industry Trends and Perspectives
    Industry trends perspectives and commentary
    What is being seen, heard and talked about while out and about

    The following is a synopsis of some StorageIOblog posts, articles and comments in different venues on various industry trends, perspectives and related themes about clouds, virtualization, data and storage infrastructure topics among related themes.

    Storage I/O trends

    InfoStor: Perspectives on Data Dynamics file migration tool (Read more about StorageX later in this newsletter)
    SearchStorage: Perspectives on Data Dynamics resurrects StorageX for file migration
    SearchStorage: Perspectives on Cisco buying SSD storage vendor Whiptail

    Recent StorageIO Tips and Articles in various venues:

    21cIT:  Why You Should Consider Object Storage
    InfoStor:  HDDs Are Still Spinning (Rust Never Sleeps)
    21cIT:  Object Storage Is in Your Future, Even if You Use Files
    21cIT:  Playing the Name Game With Virtual Storage
    InfoStor:  Flash Data Storage: Myth vs. Reality
    InfoStor:  The Nand Flash Cache SSD Cash Dance
    SearchEnterpriseWAN:  Remote Office / ROBO backup and data protection for networking Pro’s
    TheVirtualizationPractice:  When and Where to use NAND Flash SSD for Virtual Servers
    FedTech:  These Data Center (DCIM) Tools Can Streamline Computing Resources

    Storage I/O posts

    Recent StorageIO blog post:

    Seagate Kinetic Cloud and Object Storage I/O platform (and Ethernet HDD)
    Cloud conversations: Has Nirvanix shutdown caused cloud confidence concerns?
    Cisco buys Whiptail continuing the SSD storage I/O flash cash cache dash
    WD buys nand flash SSD storage I/O cache vendor Virident
    EMC New VNX MCx doing more storage I/O work vs. just being more
    Is more of something always better? Depends on what you are doing
    VMworld 2013 Vmware, server, storage I/O and networking update (Day 1)
    EMC ViPR software defined object storage part II

    Check out our objectstoragecenter.com page where you will find a growing collection of information and links pertaining to cloud and object storage themes, technologies and trends.

    Brouwer Storage Consultancy

    StorageIO in Europe (Netherlands)
    Spent over a week in the Netherlands where I presented three different seminar workshop sessions organized by Brouwer Storage Consultancy who is celebrating their 10th anniversary in business. These sessions spanned five full days of interactive discussions with an engaged diverse group of attendees in the Nijkerk area who came from across Holland to take part in these workshops.

    Congratulations to Gert and Frank Brouwer on their ten years of being in business and best wishes for many more. Fwiw those who are curious StorageIO will be ten years young in business in about two years.

    StorageIO Industry Trends and Perspectives

    Some observations from while in Europe:

    Continued cloud privacy concerns amplified by NSA and suspicion of US-based companies, yet many are not aware of similar concerns of European or UK-based firms from those outside those areas. While there were some cloud concern conversations over the demise of Nirvanix, those seemed less so than in the media or US given that at least in Holland they have seen other cloud and storage as a service firms come and go already. It should be noted that the US has also seen cloud and storage as a service startups come and go, however I think sometimes we or at least the media tend to have a short if not selective memory at times.

    In one of our workshop sessions we were talking about service level objectives (SLO), service level agreements (SLA), recovery point objectives (RPO) and recovery time objectives (RTO) among other themes. Somebody asked why the focus in RPO is on time, and why not a transactional perspective, which I thought was a brilliant question. We had a good conversation in the group and concurred that while RPO is what the industry uses, there also needs to be a transactional state context tied to what is inferred or assumed with RPO and RTO. Thus the importance of looking beyond just the point in time to a transactional context or state, that is, not just to a given time but to a given transactional point.

    Note that transactional could mean database, file system, backup or data protection index or catalog, meta data repository or other entity. This is where some should be jumping up and down like Donkey in Shrek wanting to point out that this is exactly what RTO and RPO refer to, which would be great. However all too often what is assumed is not conveyed, thus those who don’t know, well, they assume or simply don’t know what others.
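
    To illustrate the time vs. transactional view of RPO from that workshop conversation, here is a small sketch. The timestamps, workloads and function name are made up for illustration. It shows two workloads with the same time based exposure yet very different amounts of transactional loss.

```python
from datetime import datetime, timedelta


def exposure(last_recovery_point, failure_time, commit_times):
    """Return (time lost, committed transactions lost) for one failure."""
    lost = [t for t in commit_times if last_recovery_point < t <= failure_time]
    return failure_time - last_recovery_point, len(lost)


# Hypothetical timeline: recovery point at 12:00, failure at 12:15.
recovery_point = datetime(2013, 10, 1, 12, 0)
failure = datetime(2013, 10, 1, 12, 15)

# One commit in the window vs. one commit per second for ten minutes.
quiet = [recovery_point + timedelta(minutes=5)]
busy = [recovery_point + timedelta(seconds=s) for s in range(1, 601)]

print(exposure(recovery_point, failure, quiet))
print(exposure(recovery_point, failure, busy))
```

    Both cases show the same 15 minutes of exposure, however one loses a single transaction while the other loses hundreds, hence the need for context.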

    StorageIO Industry Trends and Perspectives

    Data Dynamics StorageX 7.0 Intelligent Policy Based File Data Migration – There is no such thing as a data or information recession. Likewise, people and data are living longer as well as getting larger. These span various use cases from traditional to personal or at work productivity. From little to big data content, collaboration including file or document sharing to rich media applications all of which are leveraging unstructured data. For example, email, word processing back-office documents, web and text files, presentations (e.g. PowerPoint), photos, audio and video among others. These macro trends result in the continued growth of unstructured Network Attached Storage (NAS) file data.

    Thus, a common theme is adding management including automated data movement and migration to bring structure around unstructured NAS file data. More than a data mover or storage migration tool, Data Dynamics StorageX is a software platform for adding storage management structure around unstructured local and distributed NAS file data. This includes heterogeneous vendor support across different storage systems, protocols and tools including Windows CIFS and Unix/Linux NFS.
    (Disclosure DataDynamics has been a StorageIO client). Visit Data Dynamics at www.datadynamicsinc.com/

    Server and StorageIO seminars, conferences, web casts, events, activities
    StorageIO activities (out and about)

    Seminars, symposium, conferences, webinars
    Live in person and recorded recent and upcoming events

    Announcing: Backup.U brought to you by Dell

    Some on-line (live and recorded) events have included an ongoing series tied to data protection (Backup/restore, HA, BC, DR and Archiving) called Backup.U organized and sponsored by Dell Data Protection Software that you can learn more about at the landing page www.software.dell.com/backupu (more on this in a future post). In addition to data protection, some other events and activities include a BrightTalk webinar on storage I/O and networking for cloud environments (here).

    In addition to the above, check out the StorageIO calendar to see more recent and upcoming activities.

    Watch for more 2013 events to be added soon to the StorageIO events calendar page. Topics include data protection modernization (backup/restore, HA, BC, DR, archive), data footprint reduction (archive, compression, dedupe), storage optimization, SSD, object storage, server and storage virtualization, big data, little data, cloud and object storage, performance and management trends among others.

    Vendors, VAR’s and event organizers, give us a call or send an email to discuss having us involved in your upcoming podcast, webcast, virtual seminar, conference or other events.

    If you missed the Summer (July and August) 2013 StorageIO update newsletter, click here to view that and other previous editions as HTML or PDF versions. Click here to subscribe to this newsletter (and pass it along). View archives of past StorageIO update newsletters as well as download PDF versions at: www.storageio.com/newsletter

    Ok, nuff said (for now)

    Cheers
    Gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Seagate Kinetic Cloud and Object Storage I/O platform (and Ethernet HDD)

    Seagate Kinetic Cloud and Object Storage I/O platform

    Seagate announced today their Kinetic platform and drives designed for object API accessed storage, including cloud deployments. The Kinetic platform includes Hard Disk Drives (HDD) that attach via 1Gb Ethernet (1 GbE) and speak an object access API, or what Seagate refers to as key / value.

    Seagate Kinetic architecture

    What is being announced with Seagate Kinetic Cloud and Object (Ethernet HDD) Storage?

    • Kinetic Open Storage Platform – Ethernet drives, key / value (object access) API, partner software
    • Software developer’s kits (SDK) – Developer tools, documentation, drive simulator, code libraries, code samples including for SwiftStack and Riak.
    • Partner ecosystem

    What is Kinetic?

    While it has 1 GbE ports, do not expect to be able to use those for iSCSI or NAS including NFS, CIFS or other standard access methods. Although Ethernet based, the Kinetic drive only supports the key value object access API. What this means is that applications, cloud or object stacks, key value and NoSQL data repositories, or other software that adopt the API can communicate directly with the drive using object access.

    Seagate Kinetic storage

    Internally, the HDD functions as a normal drive would for storing and accessing data; what changes is that the object access function and translation layer shift from an Object Storage Device (OSD) server node to inside the HDD. The Kinetic drive takes on the key value API personality over 1 GbE ports instead of traditional Logical Block Addressing (LBA) and Logical Block Number (LBN) access using 3g, 6g or emerging 12g SAS or SATA interfaces. Kinetic drives instead respond to object access (aka what Seagate calls key / value) API commands such as Get and Put among others. Learn more about object storage, access and clouds at www.objectstoragecenter.com.
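
    To make the difference between block (LBA) access and key / value access concrete, here is a minimal toy sketch in Python. It is not the Kinetic API or SDK; the class names, method names and the sample key are hypothetical and exist only to illustrate who has to keep track of where data lives.

```python
# Toy model only: NOT the Seagate Kinetic API. All names here are hypothetical.

class BlockDevice:
    """Traditional access: fixed-size sectors addressed by Logical Block Address."""

    def __init__(self, sector_count, sector_size=512):
        self.sector_size = sector_size
        self.sectors = [bytes(sector_size)] * sector_count

    def write(self, lba, data):
        # Block devices move whole sectors; the caller decides which LBA to use.
        if len(data) != self.sector_size:
            raise ValueError("block writes must be exactly one sector")
        self.sectors[lba] = data

    def read(self, lba):
        return self.sectors[lba]


class KeyValueDrive:
    """Object-style access: the drive maps keys to variable-length values."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]


block = BlockDevice(sector_count=8)
block.write(3, b"x" * 512)  # software above the drive must remember LBA 3

kv = KeyValueDrive()
kv.put(b"photo-001", b"jpeg bytes")  # software above the drive only needs the key

print(len(block.read(3)), kv.get(b"photo-001"))
```

    In the block case a file system or OSD server node tracks which LBAs hold which data; in the key / value case that mapping moves inside the drive, which is the shift described above.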

    Some questions and comments

    Is this the same as what was attempted almost a decade ago now with the T10 OSD drives?

    Seagate claims no.

    What is different this time around with Seagate doing a drive that to some may vaguely resemble the predecessor failed T10 OSD approach?

    Industry support for object access and API development has progressed from an era of build it and they will come thinking, to now where the drives are adapted to support current cloud, object and key value software deployments.

    Won’t 1GbE ports be too slow vs. 12g or 6g or even 3g SAS and SATA ports?

    Keep in mind those would be apples to oranges comparisons based on the protocols and types of activity being handled. Kinetic types of devices initially will be used for large data intensive applications where the emphasis is on storing or retrieving large amounts of information, vs. low latency transactional activity. Also, keep in mind that one of the design premises is to keep cost low and spread the work over many nodes and devices to meet those goals, while relying on server-side caching tools.
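
    As a rough back-of-the-envelope sketch of the "spread the work over many devices" premise, the Python below computes the raw 1 GbE line rate per port and the aggregate across a number of drives. The drive counts and the one-port-per-drive default are illustrative assumptions, not Seagate specifications, and protocol overhead is ignored.

```python
GBE_BITS_PER_SEC = 1_000_000_000  # 1 GbE raw line rate, before protocol overhead


def port_mbytes_per_sec(bits_per_sec=GBE_BITS_PER_SEC):
    """Convert a link rate in bits/sec to decimal MBytes/sec."""
    return bits_per_sec / 8 / 1_000_000


def aggregate_mbytes_per_sec(drive_count, ports_per_drive=1):
    """Raw aggregate line rate if every port could be driven in parallel."""
    return drive_count * ports_per_drive * port_mbytes_per_sec()


# Hypothetical drive counts, chosen only for illustration.
for drives in (1, 12, 60):
    rate = aggregate_mbytes_per_sec(drives)
    print(f"{drives:>3} drives: {rate:,.0f} MB/s raw line rate")
```

    A single port tops out at 125 MBytes/sec of raw line rate, while 60 such drives in parallel would present 7,500 MBytes/sec in aggregate, which is why the comparison with a single SAS or SATA port is apples to oranges.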

    Does this mean that the HDD is actually software defined?

    Seagate and other HDD manufacturers have not yet noticed the software defined marketing (SDM) bandwagon. They could join the software defined fun (SDF) and talk about a software defined disk (SDD) or software defined HDD (SDHDD), however let us leave that alone for now.

    The reality is that far more software exists in a typical HDD than is commonly realized. Sure, some of that is packaged inside ASICs (Application Specific Integrated Circuits) or runs as firmware that can be updated. However, there is a lot of software running in an HDD, hence the need for the powerful yet energy-efficient processors found in those devices. On a drive per drive basis, you may see a Kinetic device consume more energy vs. other equivalent HDDs due to the increase in processing (compute) needed to run the extra software. However, that also represents an off-load of some work from servers, enabling them to be smaller or do more work.

    Are these drives for everybody?

    It depends on whether your application, environment, platform and technology can leverage them. If you view the world only through what is new or emerging, then these drives may seem to be for all environments; in practice, other environments will continue to leverage different drive options.

    Object storage access

    Does this mean that block storage access is now dead?

    Not quite. After all, there is still some block activity involved; it has just been further abstracted. On the other hand, many applications, systems or environments still rely on block as well as file based access.

    What about OpenStack, Ceph, Cassandra, Mongo, Hbase and other support?

    Seagate has indicated those and others are targeted to be included in the ecosystem.

    Seagate needs to be careful balancing their story and message with Kinetic to play to and support those focused on the new and emerging, while also addressing their bread and butter legacy markets. The balancing act is communicating options, flexibility to choose and adopt the right technology for the task without being scared of the future, or clinging to the past, not to mention throwing the baby out with the bath water in exchange for something new.

    For those looking to do object storage systems, or cloud and other scale based solutions, Kinetic represents a new tool to learn more about as part of your due diligence.

    Ok, nuff said (for now)

    Cheers
    Gs


    As the platters spin, HDD’s for cloud, virtual and traditional storage environments

    HDDs for cloud, virtual and traditional storage environments

    Updated 1/23/2018

    As the platters spin is a follow-up to a recent series of posts on Hard Disk Drives (HDD’s) along with some posts about How Many IOPS HDD’s can do.

    HDD and storage trends and directions include among others

    HDD’s will continue to be declared dead into the next decade, just as they have been for over a decade. Meanwhile they are being enhanced and continue to be used in evolving roles.

    hdd and ssd

    SSD will continue to coexist with HDD, either as separate devices or as converged HHDD’s. Where, when and how they are used will also continue to evolve. High IO (IOPS) or low latency activity will continue to move to some form of NAND flash SSD (with PCM around the corner), while storage capacity, including some of which has been on tape, stays on disk. Instead of more HDD capacity in a server, it moves to a SAN or NAS or to a cloud or service provider. This includes capacity for backup/restore, BC, DR, archive and online reference, or what some call active archives.

    The need for storage spindle speed and more

    The need for faster revolutions per minute (RPM’s) performance of drives (e.g. platter spin speed) is being replaced by SSD and more robust smaller form factor (SFF) drives. For example, some of today’s 2.5” SFF 10,000 RPM (e.g. 10K) SAS HDD’s can do as well as or better than their larger 3.5” 15K predecessors for both IOPS and bandwidth. This is also an example where the RPM speed of a drive may not be the only determinant of performance, as it has been in the past.


    Performance comparison of four different drive types, click to view larger image.
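
    As a minimal sketch of why RPM matters yet is not the whole story, the Python below computes average rotational latency, which is half a revolution on average. This is only one component of drive response time; seek time, areal density, caching and interface speed are deliberately not modeled, and the function name is my own.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: on average, half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

    A 15K drive saves about 1 ms of rotational latency per I/O compared with a 10K drive (2 ms vs. 3 ms). A newer 10K SFF drive can make up that difference elsewhere, which is consistent with the comparison above.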

    The need for storage space capacity and areal density

    In terms of storage enhancements, watch for the appearance of Shingled Magnetic Recording (SMR) enabled HDD’s to help further boost the space capacity in the same footprint. Using SMR, HDD manufacturers can put more bits (e.g. areal density) into the same physical space on a platter.


    Traditional vs. SMR to increase storage areal density capacity

    The generic idea with SMR is to increase areal density (how many bits can be safely stored per square inch) of data placed on spinning disk platter media. In the above image on the left is a representative example of how traditional magnetic disk media lays down tracks next to each other. With traditional magnetic recording approaches, the tracks are placed as close together as possible for the write heads to safely write data.

    With new recording formats such as SMR along with improvements to read/write heads, the tracks can be more closely grouped together in an overlapping way. This overlapping way (used in a generic sense) is like how the shingles on a roof overlap, hence Shingled Magnetic Recording. Other magnetic recording or storage enhancements in the works include Heat Assisted Magnetic Recording (HAMR) and helium-filled drives. Thus, there is still plenty of bits and bytes room for growth in HDD’s well into the next decade to co-exist with and complement SSD’s.

    DIF and AF (Advanced Format), or software defining the drives

    Another evolving storage feature that ties into HDD’s is the Data Integrity Field (DIF), which comes in several types. Depending on which type of DIF (0, 1, 2 or 3) is used, there can be added data integrity checks from the application to the storage medium or drive beyond normal functionality. Keep in mind that because there are different types or levels of DIF, when somebody says they support or need DIF, ask them which type or level as well as why.

    Are you familiar with Advanced Format (AF)? If not, you should be. Traditionally, outside of special formats for some operating systems or controllers, the standard open system data storage block, page or sector has been 512 bytes. This has served well in the past; however, with the advent of TByte and larger sized drives, a new mechanism is needed. The need is to support larger average data allocation sizes from operating systems and storage systems, as well as to cut the overhead of managing all the small sectors. Operating systems and file systems have added new partitioning features such as GUID Partition Table (GPT) to support SSD, HDD and storage system LUN’s beyond the roughly 2TB that traditional partition tables can address with 512-byte sectors.

    These enhancements are enabling larger devices to be used in place of traditional Master Boot Record (MBR) or other operating system partition and allocation schemes. The next step, however, is to teach operating systems, file systems, and hypervisors along with their associated tools or drivers how to work with 4,096 byte or 4 Kbyte sectors. The advantage will be to cut the overhead of tracking all of those smaller sectors or file system extents and clusters. Today many HDD’s support AF, however by default they may have 512-byte emulation mode enabled due to lack of operating system or other support.
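
    The sector overhead argument can be shown with some simple arithmetic. The Python sketch below counts how many sectors must be tracked on a 4TB (decimal) drive at 512 bytes vs. 4,096 bytes per sector, and how much capacity a 32-bit sector address, as used by MBR partition entries, can reach. The 4TB capacity is just an example value and the helper names are my own.

```python
TB = 10**12  # decimal terabyte, as used on drive labels


def sector_count(capacity_bytes, sector_size):
    """Number of sectors needed to cover the given capacity."""
    return capacity_bytes // sector_size


def max_addressable_bytes(sector_size, address_bits=32):
    """Capacity reachable with a fixed-width sector address (MBR uses 32 bits)."""
    return (2**address_bits) * sector_size


capacity = 4 * TB  # example drive size
legacy = sector_count(capacity, 512)
advanced = sector_count(capacity, 4096)

print(f"512-byte sectors:   {legacy:,}")
print(f"4,096-byte sectors: {advanced:,}")
print(f"Reduction factor:   {legacy // advanced}x")
print(f"32-bit limit at 512 bytes:   {max_addressable_bytes(512) / TB:.2f} TB")
print(f"32-bit limit at 4,096 bytes: {max_addressable_bytes(4096) / TB:.2f} TB")
```

    Moving from 512-byte to 4 Kbyte sectors cuts the number of sectors to track by a factor of eight, and 32-bit sector addressing at 512 bytes runs out at about 2.2TB, which is why both larger sectors and GPT matter for TByte-class drives.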

    Intelligent Power Management, moving beyond drive spin down

    Intelligent Power Management (IPM) is a collection of techniques that can be applied to vary the amount of energy consumed by a drive, controller or processor to do its work. In the case of an HDD these include slowing the spin rate of platters; however, keep in mind that mass in motion tends to stay in motion. This means that HDD’s, once up and spinning, do not need as much relative power, as they function like a flywheel. Where their power draw comes in is during reads and writes, in part due to the movement of read/write heads, but also for running the processors and electronics that control the device. Another big power consumer is drive spin up; thus if platters can be kept moving, albeit at a lower rate, while the energy used by read/write heads and their electronics is reduced, you can see a drop in power consumption. Btw, a current generation 3.5” 4TB 6Gbs SATA HDD consumes about 6-7 watts of power while in active use, or less when in idle mode. Likewise a current generation high performance 2.5” 1.2TB HDD consumes about 4.8 watts, a far cry from the 12-16 plus watts some use as HDD FUD.
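
    To put those wattage figures in perspective, here is a small Python sketch that converts them to watts per TByte and to kWh per year. It assumes, purely for illustration, that each drive runs at its active power draw around the clock; the drive labels and helper names are my own, and the 16 watt entry is the upper end of the FUD range quoted above rather than a measured drive.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def watts_per_tb(watts, capacity_tb):
    """Power draw normalized by capacity."""
    return watts / capacity_tb


def annual_kwh(watts, hours=HOURS_PER_YEAR):
    """Energy per year if the drive drew this power continuously (an assumption)."""
    return watts * hours / 1000


# (label, active watts, capacity in TB); wattage figures are from the text above.
drives = [
    ("3.5in 4TB SATA (upper figure)", 7.0, 4.0),
    ("2.5in 1.2TB high performance", 4.8, 1.2),
]

for label, watts, tb in drives:
    print(f"{label}: {watts_per_tb(watts, tb):.2f} W/TB, {annual_kwh(watts):.1f} kWh/year")

print(f"16 W FUD figure: {annual_kwh(16.0):.1f} kWh/year")
```

    Under these assumptions the 4TB drive works out to about 1.75 watts per TByte and roughly 61 kWh per year, while the 16 watt figure would be about 140 kWh per year, more than double.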

    Hybrid Hard Disk Drives (HHDD) and Solid State Hybrid Drives (SSHD)

    Hybrid HDD’s (HHDD’s), also known as Solid State Hybrid Drives (SSHD), have been around for a while, and if you have read my earlier posts, you know that I have been a user and fan of them for several years. However, one of the drawbacks of HHDD’s has been the lack of write acceleration with some models (e.g. they only optimize for reads). Current and emerging HHDD’s are appearing with a mix of NAND flash SLC (used in earlier versions), MLC and eMLC along with DRAM while enabling write optimization. There are also more drive options available as HHDD’s from different manufacturers for both desktop and enterprise class scenarios.

    The challenge with HHDD’s is that many vendors either do not understand how they fit with and complement their tiering or storage management software tools, or simply do not see the value proposition. I have had vendors and others tell me that HHDD’s don’t make sense as they are too simple; how can they be a fit without requiring tiering software, controllers, SSD and HDD’s to be viable?

    I also see a trend similar to when the desktop high-capacity SATA drives appeared for enterprise-class storage systems in the early 2000s. Some of the same people did not see where or how a desktop class product or technology could ever be used in an enterprise solution.

    Hmm, hey wait a minute, I seem to recall similar thinking when SCSI drives appeared in the early 90s. Funny how some things do not change, déjà vu anybody?

    Does that mean HHDD’s will be used everywhere?

    Not necessarily, however, there will be places where they make sense, others where either an HDD or SSD will be more practical.

    Networking with your server and storage

    Drive native interfaces near-term will remain as 6Gbs (going to 12Gbs) SAS and SATA with some FC (you might still find a parallel SCSI drive out there). Likewise, with bridges or interface cards, those drives may appear as USB or something else.

    What about SCSI over PCIe, will that catch on as a drive interface? Tough to say however I am sure we can find some people who will gladly try to convince you of that. FC based drives operating at 4Gbs FC (4GFC) are still being used for some environments however most activity is shifting over to SAS and SATA. SAS and SATA are switching over from 3Gbs to 6Gbs with 12Gbs SAS on the roadmaps.

    So which drive is best for you?

    That depends. Do you need bandwidth or IOPS, low latency or high capacity, a small low profile thin form factor or feature functions? Do you need a hybrid, all SSD, or a self-encrypting device (SED), also known as Instant Secure Erase (ISE)? These are among your various options.

    Disk drives

    Why the storage diversity?

    Simple, some are legacy, soon to be replaced and disposed of, while others are newer. I also have a collection, so to speak, that gets used for various testing, research, learning and trying things out. Click here and here to read about some of the ways I use various drives in my VMware environment including creating Raw Device Mapped (RDM) local SAS and SATA devices.

    Other capabilities and functionality existing or being added to HDD’s include RAID and data copy assist, secure erase, self-encryption and vibration dampening, among other abilities for supporting dense data environments.

    Where To Learn More

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    Do not judge a drive by its interface, space capacity, cost or RPM alone. Look under the cover a bit to see what is inside in terms of functionality, performance, and reliability among other options to fit your needs. After all, in the data center or information factory not everything is the same.

    From a marketing and fun to talk about new technology perspective, HDD’s might be dead for some. The reality is that they are very much alive in physical, virtual and cloud environments, granted their role is changing.

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.