Data Infrastructure IT Industry Related Resource Links to Others


IT Data Center and Data Infrastructure Industry Resources

Updated 2/20/2018

Following are some useful Data Infrastructure IT Industry Resource Links to cloud, virtual and traditional IT data infrastructure related web sites. The data infrastructure environment (servers, storage, I/O and networking, hardware, software, services, virtual, container and cloud) is rapidly changing, so you may encounter a missing URL or a URL that has changed. This list is updated on a regular basis to reflect changes (additions, modifications and retirements).

Disclaimer and note: URLs submitted for inclusion on this site will be reviewed for consideration and to ensure they are in generally accepted good taste with regard to the theme of this site.

Best effort has been made to validate and verify the data infrastructure URLs that appear on this page and web site; however, they are subject to change. The author and/or maintainer(s) of this page and web site make no endorsement of, and assume no responsibility for, the URLs listed on this page or their content.

Software Defined Data Infrastructure Essentials Book SDDC

Send an email note to info at storageio dot com that includes company name, URL, contact name, title and phone number, along with a brief 40-character description, to be considered for addition to (or removal from) the above data infrastructure list. Note that Server StorageIO and UnlimitedIO LLC (e.g. StorageIO) does not sell, trade, barter, borrow or share your contact information, per our Privacy and Disclosure policy. View related data infrastructure Server StorageIO content here, and sign up for our free newsletter here.

Links A-E
Links F-J
Links K-O
Links P-T
Links U-Z
Other Links

  • www.10gea.org    10Gb Ethernet industry trade organization
  • www.1394ta.org    1394 (Firewire) trade association
  • www.3com.com    Networking equipment (Bought by HP)
  • www.3leafnetworks.com    I/O virtualization
  • www.3par.com    Clustered storage systems (Bought by HP)
  • www.3tera.com    IT Cloud management tools (Bought by CA)
  • www.4blox.com    Data center design services
  • www.4blox.com    Linux iSCSI target optimization stack
  • www.4bridgeworks.com aka Bridgeworks    SAN networking and connectivity solutions
  • www.80plus.org    Energy efficient power supply trade group

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Visit the following additional data infrastructure and IT data center related links.

Links A-E
Links F-J
Links K-O
Links P-T
Links U-Z
Other Links

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Cloud and Object storage are in your future, what are some questions?



IMHO there is no doubt that cloud and object storage are in your future, so what are some of the questions?

Granted, what type of cloud and object storage or service to use, and whether it is for work or entertainment, are among those questions.

Likewise, what are your cloud and object storage concerns (assuming you have already heard about the benefits)?

Other questions include when and where to use it for different application workload needs, as well as how and with what, among others.

Keep in mind that there are many aspects to cloud storage and not all of them are object; likewise, there are many facets to object storage.

Recently I did a piece over at InfoStor titled Cloud Storage Concerns, Considerations and Trends that looks at the above among other items including:

  • Is cloud storage cheaper than traditional storage?
  • How do you access cloud object storage from legacy block and file applications?
  • How do you implement on-site cloud storage?
  • Is enterprise file sync and share (EFSS) safe and secure?
  • Does cloud storage need to be backed up and protected?
  • What geographic location requirements or regulations apply to you?

When it comes to cloud computing and, in particular, cloud storage, context matters. Conversations are necessary to discuss concerns, as well as discuss various considerations, options and alternatives. People often ask me questions about the best cloud storage to use, concerns about privacy, security, performance and cost.

Some of the most common cloud conversation topics that involve context include:

  • Public, private or hybrid cloud; turnkey subscription service or do it yourself (DIY)?
  • Storage, compute server, networking, applications or development tools?
  • Storage application such as file sync and share like Dropbox?
  • Storage resources such as table, queues, objects, file or block?
  • Storage for applications in the cloud, on-site or hybrid?

Continue reading Cloud Storage Concerns, Considerations and Trends over at InfoStor.

Where To Learn More

Additional related content can be found at:

What This All Means

As I mentioned above, cloud and object storage are in your future; granted, your future may not rely on cloud or object storage alone. Take a few minutes to check out some of the conversation topics, tips and trends in my piece over at InfoStor, Cloud Storage Concerns, Considerations and Trends, along with more material at www.objectstoragecenter.com.

Btw, what are your questions, comments, concerns, claims or caveats as part of cloud and object storage conversations?

Ok, nuff said, for now…

Cheers
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, vSAN and VMware vExpert. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO All Rights Reserved

EMCworld 2016 Getting Started on Dell EMC announcements



It’s the first morning of EMCworld 2016 here in Las Vegas, with some items already announced today and more in the wings. One of the underlying themes and discussions, besides what’s new or who’s doing what, is that this is for all practical purposes the last EMCworld given the upcoming Dell acquisition. What’s not clear is whether there will be a renamed and repackaged Dell/EMCworld.

With current EMC President Jeremy Burton, who used to be the Chief Marketing Officer (CMO) at EMC, slated to become the CMO across all of Dell, my bet is that there will be some type of new event picking up from, and moving to a new level beyond, where EMCworld and Dellworld have been. More on the future of EMC and Dell in future posts; for now, let’s see what has unfolded so far today.

Today’s EMCworld theme is modernize the data center, which means a mix of hardware, software and services announcements spanning physical, virtual and cloud, among others (e.g. how do you want your servers, storage and data infrastructure wrapped). While the themes are still EMC, as the Dell acquisition has yet to be completed, there is a Dell presence, including Michael Dell here in person (more on Dell later).

The first wave of announcements include:

  • Unity All Flash Array (AFA) for small, entry-level environments
  • EMC Enterprise Copy Data Management software tools portfolio
  • ViPR Version 3.0 Controller
  • Virtustream global hyper-scale Storage Cloud for data protection and cloud native object
  • MyService360
  • Data Domain Virtual Edition and long-term archive

What About The Dell Deal

Michael Dell, who is here at EMCworld, announced on the main stage that Dell Technologies will be the name of the family of businesses.

This family of businesses includes the joint Dell, EMC, VMware, Pivotal, Secureworks, RSA and Virtustream. The Dell client focused business will be called Dell, leveraging that brand, while the new joint Dell and EMC enterprise business will be called Dell EMC, leveraging both of those brands. As a reminder, the Dell servers business unit will be moving into the existing EMC business as part of the enterprise business unit.

Let’s move on to the technology announcements from today.

Unity AFA (and Hybrid)

The new Unity all flash array (AFA) is a dual controller storage system optimized for Nonvolatile Memory (NVM) flash SSD, with unified (block and file) access. EMC is positioning Unity as an entry-level AFA starting around $18K USD for a 2U solution (how much capacity that includes is not yet known; more on that in a future post). As well as having a low entry cost, EMC is positioning Unity for a broad, mass-market, volume distribution that can be leveraged by their partners, including Dell. More on Unity in future posts. While Unity is new and modern, it comes from the same group that created the VNXe, leveraging that knowledge and skills base.

Note that Unity is positioned for small, mid-sized, remote office branch office (ROBO), departmental and specialized AFA situations, whereas the EMC NVMe-based DSSD D5 is positioned for higher-end shared direct attached server flash, and XtremIO and VMAX are positioned for higher-end, higher performance and workload consolidation scenarios.

  • Simple, flexible and easy to use, in 2U packaging that scales up to 80TB of NVM flash SSD storage
  • Scalable up to 3PB of storage for larger expanded configurations
  • Affordable ($18K USD starting price, $10K entry-level hybrid)
  • Modern AFA storage for entry, small, mid-sized, workgroup, departments and specialized environments
  • Unified file, block, and VMware VVOL support for storage access
  • Also available in hybrid, as well as software defined virtual and converged configurations
  • Higher performance (EMC indicates 300,000 IOPS) for given entry-level systems
  • Available in all-flash array, hybrid array, software-defined and converged configurations
  • Native controller based encryption with synchronous and asynchronous replication
  • VMware VASA 2.0, VAAI, VVols and VMware integration
  • Tight integration with EMC Data Protection portfolio tools

Read more about Unity here.

Copy Data Management

Enterprise Copy Data Management (eCDM) spans data copies from data protection including backup, BC, DR as well as for operational, analytics, test, dev, devops among other uses. Another term is Enterprise Copy Data Analytics (eCDA) which includes monitoring and management along with insight, awareness and of course analytics. These new offerings and initiatives tie together various capabilities across storage platforms and software defined storage management. Watch for more activity in and around eCDM and general copy data management. Read more here.

ViPR Controller 3.0

ViPR Controller 3.0 enhancements build on previous announcements and include automation, as well as failover with native replication to a standby ViPR controller. Note that there can actually be two standby controllers that are synchronized asynchronously with software built into ViPR. This means that there is no need for RecoverPoint or other products to replicate the ViPR controllers. To be clear, this is for high availability of the ViPR controllers themselves and not a replacement for HA or replication of upper layer applications, storage servers or underlying storage services. Also note that ViPR is available via open source (CoprHD via GitHub here). Read more here.

MyService360

MyService360 is a cloud based dashboard and data infrastructure monitoring management platform. Read more here.

Virtustream Storage Cloud

Virtustream cloud services and software tools complement EMC (and other) storage systems as a back-end for cool, cold or other bulk data storage needs. The focus is to sell primary storage to customers, then leverage back-end public cloud services for backup, archive, copy data management and other applications. This also means that the Virtustream storage cloud is not just for data protection such as archiving, backup, BC and DR; it’s also for other big fast data, including cloud and object native applications. Does this mean Virtustream is an alternative to other cloud and object storage services such as AWS S3 and Google GCS, among others? Yup. Read more here.

Where To Learn More

  • Session Streaming For video of keynotes, general sessions, backstage sessions, and EMC TV coverage, click here
  • Social: Follow @EMCWorld,  @EMCCorp, @EMC_News and @EMCStorage, and join conversations with  #EMCWORLD, and like EMC on Facebook
  • Photos: Access event photos via  Flickr and EMC Pulse Blog or visit the special EMC World News microsite here
  • Reflections: Read Core Technologies President, Guy Churchward’s Reflections post on today’s announcements here
  • Visit the EMC Store, the EMC Community Network Site and The Core Blog

What This All Means

With the announcement of Unity and the impending Dell deal, some of you might (or should) have a déjà vu moment from over a decade ago, when Dell and EMC entered into an OEM agreement around the then Clariion mid-range storage arrays (e.g. predecessors of VNX and VNXe). Unity is being designed as a high performance, easy to use, flexible, scalable, cost-effective storage solution for a broad, high-volume sales and distribution channel market.

What does Unity mean for EMC VNX and VNXe as well as XtremIO? Unity will be positioned near where the VNXe has been, along with some of the competing solutions from Dell, among others. There might be some overlap with other EMC solutions; however, if executed properly, Unity should open up some new markets, perhaps at the expense of some of the newer popular startups that only offer AFA vs. hybrids. Likewise, I would expect Unity to appear in future converged solutions such as those via the EMC Converged business unit (e.g. VCE).

Even with the upcoming Dell acquisition and integration, EMC continues to evolve and innovate in many areas.

Watch for more announcements later today and throughout the week.

Ok, nuff said

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio


Which Enterprise HDD for Content Applications Different File Size Impact



Updated 1/23/2018

Which enterprise HDD to use with a content server platform: different file size impact.

Insight for effective server storage I/O decision making
Server StorageIO Lab Review

Which enterprise HDD to use for content servers

This is the fifth in a multi-part series (read part four here) based on a white paper hands-on lab report I did, courtesy of Servers Direct and Seagate, that you can read in PDF form here. The focus is looking at the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drives (HDDs). This post focuses on large and small file I/O processing.

File Performance Activity

Tip: Content solutions use files in various ways. Use the following to gain perspective on how various HDDs handle workloads similar to your specific needs.

Two separate file processing workloads were run (see Note 12): one with a relatively small number of large files, and another with a large number of small files. For the large file processing (table-3), 5 GByte files were created and then accessed via 128 KByte (128KB) sized I/Os over a 10 hour period with 90% reads, using 64 threads (workers). The large file workload simulates what might be seen with higher definition video, image or other content streaming.

(Note 12) File processing workloads were run using Vdbench 5.04 and file anchors, with the sample script configurations below. Instead of Vdbench, you could also use other tools such as sysbench or fio, among others.

VdbenchFSBigTest.txt
# Sample script for big files testing
fsd=fsd1,anchor=H:,depth=1,width=5,files=20,size=5G
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=128k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSBigTest.txt -m 16 -o Results_FSbig_H_060615

VdbenchFSSmallTest.txt
# Sample script for small files testing
fsd=fsd1,anchor=H:,depth=1,width=64,files=25600,size=16k
fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=1k,fileselect=random,fileio=random,threads=64
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSSmallTest.txt -m 16 -o Results_FSsmall_H_060615
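As a rough sizing aid, the sketch below estimates the file count and capacity each fsd= line above creates. It assumes Vdbench builds width^depth lowest-level directories holding files= files each, and that the k and G size suffixes are powers of 1024; fsd_footprint is a hypothetical helper, not part of Vdbench.

```python
# Sketch: estimate the file count and capacity a simple Vdbench fsd= line creates.
# Assumptions: width**depth lowest-level directories, files= files in each,
# and k/G size suffixes meaning powers of 1024.

KIB = 1024
GIB = 1024 ** 3


def fsd_footprint(depth, width, files, size_bytes):
    """Return (total_files, total_bytes) for one fsd definition."""
    leaf_dirs = width ** depth        # directories at the lowest level
    total_files = leaf_dirs * files   # files are placed only in the lowest level
    return total_files, total_files * size_bytes


# Values from VdbenchFSBigTest.txt and VdbenchFSSmallTest.txt above.
big_files, big_bytes = fsd_footprint(depth=1, width=5, files=20, size_bytes=5 * GIB)
small_files, small_bytes = fsd_footprint(depth=1, width=64, files=25600, size_bytes=16 * KIB)

print(f"big:   {big_files:,} files, {big_bytes / GIB:,.0f} GiB")
print(f"small: {small_files:,} files, {small_bytes / GIB:,.1f} GiB")
```

Under those assumptions the big file set works out to 100 files (500 GiB) and the small file set to 1,638,400 files (25 GiB), which is worth checking against free space before setting format=yes.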

The 10% writes are intended to reflect some update activity for new content or other changes to content. Note that 128KB per second translates to roughly 1 Mbps of streaming content, such as higher definition video. However, 4K video (not optimized) would require higher speeds as well as resulting in larger file sizes. Table-3 shows the performance during the large file access period, including average read/write rates and response times, CPU utilization and bandwidth (MBps).
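As a quick check on streaming-rate arithmetic, the sketch below converts a per-stream rate in KBytes per second to megabits per second. It assumes 1 KByte = 1,024 bytes and the networking convention of 1 Mbps = 1,000,000 bits per second; the function name is our own.

```python
# Sketch: convert a per-stream transfer rate from KBytes/sec to Mbits/sec.
# Assumptions: 1 KByte = 1,024 bytes; 1 Mbps = 1,000,000 bits per second.


def kbytes_per_sec_to_mbps(kbytes_per_sec):
    bits_per_sec = kbytes_per_sec * 1024 * 8
    return bits_per_sec / 1_000_000


rate = kbytes_per_sec_to_mbps(128)
print(f"128 KB/s is about {rate:.2f} Mbps")
```

A 128 KB/s stream comes out at about 1.05 Mbps, while 128 MB/s (131,072 KB/s) comes out at roughly 1,074 Mbps, or about 1 Gbps, so keeping KB/s and MB/s straight matters when sizing streams.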

Configuration | Avg. File Read Rate (ops/sec) | Avg. Read Resp. Time (ms) | Avg. File Write Rate (ops/sec) | Avg. Write Resp. Time (ms) | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write
--- | --- | --- | --- | --- | --- | --- | --- | ---
ENT 15K R1 | 580.7 | 107.9 | 64.5 | 19.7 | 52.2 | 35.5 | 72.6 | 8.1
ENT 10K R1 | 455.4 | 135.5 | 50.6 | 44.6 | 34.0 | 22.7 | 56.9 | 6.3
ENT CAP R1 | 285.5 | 221.9 | 31.8 | 19.0 | 43.9 | 28.3 | 37.7 | 4.0
ENT 10K R10 | 690.9 | 87.21 | 76.8 | 48.6 | 35.0 | 21.8 | 86.4 | 9.6

Table-3 Performance summary for large file access operations (90% read)
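The MBps columns in table-3 line up with the file operation rates multiplied by the 128KB transfer size, at least for the rows checked below. This minimal sketch assumes each file read or write moves one full xfersize block and that MBps means 1,048,576 bytes per second; ops_to_mbps is an illustrative name.

```python
# Sketch: relate Vdbench file operation rates to bandwidth in MBps.
# Assumptions: every file read/write moves one full xfersize block;
# MBps means 1,048,576 bytes per second.


def ops_to_mbps(ops_per_sec, xfer_kb=128):
    """Bandwidth for ops_per_sec operations of xfer_kb KBytes each."""
    return ops_per_sec * xfer_kb / 1024


# (reads/sec, writes/sec) taken from table-3
rows = {"ENT 15K R1": (580.7, 64.5), "ENT 10K R10": (690.9, 76.8)}
for name, (reads, writes) in rows.items():
    print(f"{name}: read {ops_to_mbps(reads):.1f} MBps, write {ops_to_mbps(writes):.1f} MBps")
```

These two rows come out at 72.6/8.1 and 86.4/9.6 MBps, matching the table; passing xfer_kb=1 applies the same check to the 1KB small file test.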

Table-3 shows that for two-drive RAID 1, the Enterprise 15K drives deliver the fastest performance; however, a RAID 10 with four 10K HDDs with enhanced cache features provides a good price, performance and space capacity option. Software RAID was used in this workload test.

Figure-4 shows the relative performance of various HDD options handling large files; keep in mind that for the response time line lower is better, while for the activity rate higher is better.

Figure-4 Large file processing 90% read, 10% write rate and response time

In figure-4 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). Remember that lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K HDDs).

Results in figure-4 above and table-4 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-4 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-4 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example, RAID 10 using four 10K drives provides good performance and protection along with large usable space; however, that also comes at a higher budget cost (e.g. price).

Configuration | Avg. File Reads Per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost per GB Capacity | Cost per GB Usable (Protected) Capacity | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per Usable GB per RPS | Avg. File Read Resp. (ms)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
ENT 15K R1 | 580.7 | $1.02 | $2.05 | $0.99 | $0.99 | $1,190 | 100% | $2.1 | 107.9
ENT 10K R1 | 455.5 | $1.92 | $3.84 | $0.49 | $0.49 | $1,750 | 100% | $3.8 | 135.5
ENT CAP R1 | 285.5 | $1.40 | $2.80 | $0.20 | $0.20 | $798 | 100% | $2.8 | 221.9
ENT 10K R10 | 690.9 | $1.27 | $5.07 | $0.49 | $0.97 | $3,500 | 100% | $5.1 | 87.2

Table-4 Performance, capacity and cost analysis for big file processing
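The cost-per-RPS columns in table-4 can be reproduced from the drive cost, the number of drives and the average file reads per second. The sketch assumes that Drive Cost (Multiple Drives) is the total for every drive in the RAID set (two for RAID 1, four for RAID 10), so a single drive costs that total divided by the drive count; the Config class is hypothetical.

```python
# Sketch: reproduce the cost-per-RPS columns from drive cost, drive count and RPS.
# Assumption: total_drive_cost covers all drives in the RAID set.

from dataclasses import dataclass


@dataclass
class Config:
    name: str
    total_drive_cost: float  # dollars for all drives in the set
    drives: int              # 2 for RAID 1, 4 for RAID 10
    rps: float               # average file reads per second

    def single_drive_cost_per_rps(self):
        return self.total_drive_cost / self.drives / self.rps

    def multi_drive_cost_per_rps(self):
        return self.total_drive_cost / self.rps


configs = [
    Config("ENT 15K R1", 1190, 2, 580.7),
    Config("ENT 10K R1", 1750, 2, 455.5),
    Config("ENT CAP R1", 798, 2, 285.5),
    Config("ENT 10K R10", 3500, 4, 690.9),
]
for c in configs:
    print(f"{c.name}: ${c.single_drive_cost_per_rps():.2f} single, "
          f"${c.multi_drive_cost_per_rps():.2f} multi")
```

Swapping in your own drive prices and measured RPS gives the same kind of comparison for other configurations.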

Small File Size Processing

To simulate a general file sharing environment, or content streaming with many smaller objects, 1,638,400 16KB files were created on each device being tested (table-5). These files were spread across 64 directories (25,600 files each) and accessed via 64 threads (workers) doing 90% reads with a 1KB I/O size over a ten hour time frame. Like the large file test and database activity, all workloads were run at the same time (i.e. test devices were concurrently busy).

Configuration | Avg. File Read Rate (ops/sec) | Avg. Read Resp. Time (ms) | Avg. File Write Rate (ops/sec) | Avg. Write Resp. Time (ms) | Avg. CPU % Total | Avg. CPU % System | Avg. MBps Read | Avg. MBps Write
--- | --- | --- | --- | --- | --- | --- | --- | ---
ENT 15K R1 | 3,415.7 | 1.5 | 379.4 | 132.2 | 24.9 | 19.5 | 3.3 | 0.4
ENT 10K R1 | 2,203.4 | 2.9 | 244.7 | 172.8 | 24.7 | 19.3 | 2.2 | 0.2
ENT CAP R1 | 1,063.1 | 12.7 | 118.1 | 303.3 | 24.6 | 19.2 | 1.1 | 0.1
ENT 10K R10 | 4,590.5 | 0.7 | 509.9 | 101.7 | 27.7 | 22.1 | 4.5 | 0.5

Table-5 Performance summary for small sized (16KB) file access operations (90% read)

Figure-5 shows the relative performance of various HDD options handling small files; keep in mind that for the response time line lower is better, while for the activity rate higher is better.

Figure-5 Small file processing 90% read, 10% write rate and response time

In figure-5 you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with the number of file operations per second (reads solid blue column bar, writes green column bar). Remember that lower response times and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K RPM HDDs) that has higher performance and capacity along with costs (table-5).

Results in figure-5 and table-5 above show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-6 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-6 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example, RAID 10 using four 10K drives provides good performance and protection along with large usable space; however, that also comes at a higher budget cost (e.g. price).

Configuration | Avg. File Reads Per Sec. (RPS) | Single Drive Cost per RPS | Multi-Drive Cost per RPS | Single Drive Cost per GB Capacity | Cost per GB Usable (Protected) Capacity | Drive Cost (Multiple Drives) | Protection Overhead (Space Capacity for RAID) | Cost per Usable GB per RPS | Avg. File Read Resp. (ms)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
ENT 15K R1 | 3,415.7 | $0.17 | $0.35 | $0.99 | $0.99 | $1,190 | 100% | $0.35 | 1.51
ENT 10K R1 | 2,203.4 | $0.40 | $0.79 | $0.49 | $0.49 | $1,750 | 100% | $0.79 | 2.90
ENT CAP R1 | 1,063.1 | $0.38 | $0.75 | $0.20 | $0.20 | $798 | 100% | $0.75 | 12.70
ENT 10K R10 | 4,590.5 | $0.19 | $0.76 | $0.49 | $0.97 | $3,500 | 100% | $0.76 | 0.70

Table-6 Performance, capacity and cost analysis for small file processing

The small file processing results in table-5 show that, on an apples to apples basis (e.g. same RAID level and number of drives), the 15K HDDs provide the best performance. However, when also factoring in space capacity, performance, different RAID levels or other protection schemes along with cost, there are other considerations. On the other hand, the Enterprise Capacity 2TB HDDs have a low cost per capacity but do not have the performance of the other options, which matters if your applications need more performance.

Thus the right HDD for one application may not be the best one for a different scenario, and multiple metrics, as shown in table-5, need to be included in an informed storage decision-making process.
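To illustrate weighing more than one metric, here is a hypothetical selection helper using the table-6 numbers: it keeps only configurations that meet a minimum small-file read rate and then picks the one with the lowest total drive cost. The requirement values are made up for illustration, and a real decision would also weigh capacity, protection and response time.

```python
# Hypothetical selection helper (not from the lab report): filter by a minimum
# small-file read rate, then choose the lowest total drive cost.
# Data comes from table-6; the min_rps values below are illustrative only.

from typing import NamedTuple, Optional


class Option(NamedTuple):
    name: str
    rps: float                 # average file reads per second
    total_cost: float          # drive cost for all drives in the set
    cost_per_usable_gb: float  # kept for secondary comparisons


OPTIONS = [
    Option("ENT 15K R1", 3415.7, 1190, 0.99),
    Option("ENT 10K R1", 2203.4, 1750, 0.49),
    Option("ENT CAP R1", 1063.1, 798, 0.20),
    Option("ENT 10K R10", 4590.5, 3500, 0.97),
]


def cheapest_meeting(min_rps, options=OPTIONS) -> Optional[Option]:
    """Lowest-cost option whose RPS meets min_rps, or None if none qualify."""
    candidates = [o for o in options if o.rps >= min_rps]
    return min(candidates, key=lambda o: o.total_cost, default=None)


for need in (1000, 3000, 4000):
    choice = cheapest_meeting(need)
    print(need, choice.name if choice else "no option meets the requirement")
```

With these numbers, a 1,000 RPS requirement is met most cheaply by the capacity drives, 3,000 RPS by the 15K RAID 1 pair and 4,000 RPS only by the four-drive 10K RAID 10, which mirrors the point that the right HDD depends on the scenario.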

Where To Learn More

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

File processing is a common content application task, with files that are small, large or mixed, along with both reads and writes. Even if your content environment is using object storage, chances are that unless it is a new application or a gateway exists, you may be using NAS or file based access. Thus, if your applications are doing file based processing, it is important to either run your own applications or use tools that simulate, as closely as possible, what your environment is doing.

Continue reading part six in this multi-part series here, where the focus is on general I/O, including 8KB and 128KB sized I/Os along with associated metrics.

Ok, nuff said, for now.

Gs


Big Files Lots of Little File Processing Benchmarking with Vdbench




Updated 2/10/2018

Need to test a server, storage I/O networking, hardware, software, services, cloud, virtual, physical or other environment that is doing some form of file processing, or do you simply want some extra workload running in the background for whatever reason? An option is File Processing Benchmarking with Vdbench.


Getting Started


Here’s a quick and relatively easy way to do it with Vdbench (free from Oracle). Granted, there are other tools, both free and for fee, that can do similar things; however, we will leave those for another day and post. Here’s the con to this approach: there is no GUI like what you have available with some other tools. Here’s the pro to this approach: it’s free, flexible and limited only by your creativity, amount of storage space, server memory and I/O capacity.

If you need a background on Vdbench and benchmarking, check out the series of related posts here (e.g. www.storageio.com/performance).

Get and Install the Vdbench Bits and Bytes


If you do not already have Vdbench installed, get a copy from the Oracle or SourceForge site (which now points to Oracle here).

Vdbench is free; you simply sign up and accept the free license, select the version, and download the bits (it is a single, common distribution for all operating systems) as well as the documentation.

Installation, particularly on Windows, is really easy: basically follow the instructions in the documentation by copying the contents of the download folder to a specified directory, setting up any environment variables, and making sure that you have Java installed.

Here is a hint and tip for Windows Servers: if you get an error message about counters, open a command prompt with Administrator rights and type the command:

$ lodctr /r


The above command rebuilds (resets) your performance counters. Note, however, that the command will also overwrite existing counter settings, so only use it if you have to.

Likewise, *nix install is also easy: copy the files, make sure to copy the applicable *nix shell script (it is in the download folder), and verify Java is installed and working.

You can run vdbench -t (Windows) or ./vdbench -t (*nix) to verify that it is working.
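Before a first run, a small pre-flight check can confirm that Java and the Vdbench launcher are reachable. This sketch assumes the launcher is vdbench.bat on Windows and the vdbench shell script on *nix, and that its folder is on your PATH; adjust the names to match your install.

```python
# Sketch: check that Java and the Vdbench launcher can be found on PATH.
# Assumption: launcher names vdbench.bat (Windows) / vdbench (*nix).

import shutil
import sys


def missing_tools(names):
    """Return the subset of executable names that shutil.which cannot locate."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    launcher = "vdbench.bat" if sys.platform.startswith("win") else "vdbench"
    missing = missing_tools(["java", launcher])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Java and the Vdbench launcher were found; next, try: vdbench -t")
```

This only confirms the executables can be located; vdbench -t is still the real test that Vdbench and Java work together.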

Vdbench File Processing

There are many options with Vdbench, as it has a very robust command and scripting language, including the ability to set up for loops among other things. We are only going to scratch the surface here using its file processing capabilities. Likewise, Vdbench can run from a single server accessing multiple storage systems or file systems, as well as from multiple servers to a single file system. For simplicity, we will stick with the basics in the following examples to exercise a local file system. The number of files and file sizes are limited by server memory and storage space.

You can specify the number and depth of directories to put files into for processing. One of the parameters is the anchor point for the file processing; in the following examples, S:\SIOTEMP\FS1 is used as the anchor point. Other parameters include the I/O size, percent reads, number of threads, run time and sample interval, as well as the output folder name for the result files. Note that unlike some tools, Vdbench does not create a single file of results, but rather a folder with several files including summary, totals, parameters, histograms and CSV, among others.


Simple Vdbench File Processing Commands

For flexibility and ease of use I put the following three Vdbench commands into a simple text file that is then called with parameters on the command line.
fsd=fsd1,anchor=!fanchor,depth=!dirdep,width=!dirwid,files=!numfiles,size=!filesize

fwd=fwd1,fsd=fsd1,rdpct=!filrdpct,xfersize=!fxfersize,fileselect=random,fileio=random,threads=!thrds

rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=!etime,interval=!itime

Simple Vdbench script

# SIO_vdbench_filesystest.txt
#
# Example Vdbench script for file processing
#
# fanchor = file system place where directories and files will be created
# dirwid = how wide should the directories be (e.g. how many directories wide)
# numfiles = how many files per directory
# filesize = size in k, m, g e.g. 16k = 16KBytes
# fxfersize = file I/O transfer size in kbytes
# thrds = how many threads or workers
# etime = how long to run in minutes (m) or hours (h)
# itime = interval sample time e.g. 30 seconds
# dirdep = how deep the directory tree
# filrdpct = percent of reads e.g. 90 = 90 percent reads
# -p portnumber = optional Java socket port override, only needed if running multiple vdbench instances at the same time; the number should be unique
# -o output folder where the result files (summary, totals, parameters, histograms etc.) are written
#
# Sample command line shown for Windows, for *nix add ./
#
# The real Vdbench script with command line parameters indicated by !=
#

fsd=fsd1,anchor=!fanchor,depth=!dirdep,width=!dirwid,files=!numfiles,size=!filesize

fwd=fwd1,fsd=fsd1,rdpct=!filrdpct,xfersize=!fxfersize,fileselect=random,fileio=random,threads=!thrds

rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=!etime,interval=!itime
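Before committing to a 10 hour run, it can help to estimate how much data the format phase will create. The following is a minimal Python sketch, assuming (per the Vdbench documentation's description of the fsd= parameters) that width**depth lowest-level directories are created, each holding files= files, and treating k, m and g sizes as 1024-based; the helper names are hypothetical and not part of Vdbench.

```python
# Hypothetical helpers, not part of Vdbench.
# Assumption: width**depth lowest-level directories, each holding `files` files.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert a size such as '16k' or '5G' to bytes (1024-based units)."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def fileset_footprint(width: int, depth: int, files: int, size: str) -> tuple:
    """Return (lowest-level directories, total files, total bytes)."""
    leaf_dirs = width ** depth
    total_files = leaf_dirs * files
    return leaf_dirs, total_files, total_files * parse_size(size)


if __name__ == "__main__":
    runs = {
        "Big Files": (1, 1, 60, "5G"),                  # dirwid, dirdep, numfiles, filesize
        "Lots of Little Files": (64, 1, 25600, "16k"),
    }
    for name, args in runs.items():
        dirs, nfiles, nbytes = fileset_footprint(*args)
        print(f"{name}: {dirs} dirs, {nfiles} files, {nbytes / 1024 ** 3:.1f} GiB")
```

With the Big Files and Lots of Little Files parameters used in this post, this works out to 60 files (about 300 GiB) and 1,638,400 files (about 25 GiB) respectively, which is worth comparing against free space on the anchor file system.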

Big Files Processing Script


With the above script file defined, for Big Files I specify a command line such as the following.
$ vdbench -f SIO_vdbench_filesystest.txt fanchor=S:\SIOTemp\FS1 dirwid=1 numfiles=60 filesize=5G fxfersize=128k thrds=64 etime=10h itime=30 numdir=1 dirdep=1 filrdpct=90 -p 5576 -o SIOWS2012R220_NOFUZE_5Gx60_BigFiles_64TH_STX1200_020116

Big Files Processing Example Results


The following is one of the result files from the folder of results created via the above command for Big File processing showing totals.


Run totals

21:09:36.001 Starting RD=format_for_rd1

Feb 01, 2016
21:23:34.101 avg_2-28: ReqstdOps rate 2848.2, resp 2.70; cpu% total 8.8, sys 8.32; read pct 0.0; read rate 0.0, resp 0.00; write rate 2848.2, resp 2.70; mb/sec read 0.00, write 356.0, total 356.02; xfer size 131071; mkdir rate 0.0, resp 0.00; rmdir rate 0.0, resp 0.00; create rate 0.1, resp 109176; open rate 0.1, resp 0.55; close rate 0.1, resp 2006; delete rate 0.0, resp 0.00

21:23:35.009 Starting RD=rd1; elapsed=36000; fwdrate=max. For loops: None

07:23:35.000 avg_2-1200: ReqstdOps rate 4939.5, resp 1.62; cpu% total 18.5, sys 17.3; read pct 90.0; read rate 4445.8, resp 1.79; write rate 493.7, resp 0.07; mb/sec read 555.7, write 61.72, total 617.44; xfer size 131071; mkdir rate 0.0, resp 0.00; rmdir rate 0.0, resp 0.00; create rate 0.0, resp 0.00; open rate 0.1, resp 0.03; close rate 0.1, resp 2.95; delete rate 0.0, resp 0.00
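Because whitespace-separated totals are hard to read column by column, a small parser that pairs each value with its column name can help when comparing runs. This is a sketch, assuming the 25-value column order of this Vdbench file-system totals report (ReqstdOps, cpu%, read pct, read, write, mb/sec, xfer size, then rate and resp for mkdir, rmdir, create, open, close and delete); the field names are shorthand chosen here, not Vdbench identifiers.

```python
# Column order of the 25 values that follow the timestamp and interval label
# in a Vdbench file-system totals line (shorthand names chosen for this sketch).
FIELDS = [
    "ops_rate", "ops_resp", "cpu_total", "cpu_sys", "read_pct",
    "read_rate", "read_resp", "write_rate", "write_resp",
    "mb_read", "mb_write", "mb_total", "xfer_size",
    "mkdir_rate", "mkdir_resp", "rmdir_rate", "rmdir_resp",
    "create_rate", "create_resp", "open_rate", "open_resp",
    "close_rate", "close_resp", "delete_rate", "delete_resp",
]


def parse_totals_line(line: str) -> dict:
    """Split one whitespace-separated totals line into named numeric fields."""
    parts = line.split()
    timestamp, interval, values = parts[0], parts[1], parts[2:]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    row = {"timestamp": timestamp, "interval": interval}
    row.update({name: float(value) for name, value in zip(FIELDS, values)})
    return row


# The Big Files rd1 totals line from the run above, as raw Vdbench output.
BIG_FILES_RD1 = (
    "07:23:35.000 avg_2-1200 4939.5 1.62 18.5 17.3 90.0 4445.8 1.79 "
    "493.7 0.07 555.7 61.72 617.44 131071 0.0 0.00 0.0 0.00 0.0 0.00 "
    "0.1 0.03 0.1 2.95 0.0 0.00"
)

if __name__ == "__main__":
    row = parse_totals_line(BIG_FILES_RD1)
    for name in ("ops_rate", "read_pct", "mb_total", "read_resp", "write_resp"):
        print(f"{name:>10}: {row[name]}")
```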


Lots of Little Files Processing Script


For lots of little files, the following is used.


$ vdbench -f SIO_vdbench_filesystest.txt fanchor=S:\SIOTEMP\FS1 dirwid=64 numfiles=25600 filesize=16k fxfersize=1k thrds=64 etime=10h itime=30 dirdep=1 filrdpct=90 -p 5576 -o SIOWS2012R220_NOFUZE_SmallFiles_64TH_STX1200_020116
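Rather than editing long command lines by hand for each variation, the name=value parameters can be kept as data and the command line built from them. The hypothetical Python helper sketched below reproduces the Lots of Little Files command above; it assumes vdbench is on the PATH (on Windows you may need to pass the full path to vdbench.bat via the executable argument) and treats the -p value as the Vdbench socket port override.

```python
def vdbench_command(script: str, params: dict, port: int, output: str,
                    executable: str = "vdbench") -> list:
    """Build a Vdbench argv: -f script, name=value substitutions, then -p and -o."""
    argv = [executable, "-f", script]
    argv += [f"{name}={value}" for name, value in params.items()]
    argv += ["-p", str(port), "-o", output]
    return argv


# Parameter set matching the Lots of Little Files command line above.
SMALL_FILES = {
    "fanchor": r"S:\SIOTEMP\FS1", "dirwid": 64, "numfiles": 25600,
    "filesize": "16k", "fxfersize": "1k", "thrds": 64, "etime": "10h",
    "itime": 30, "dirdep": 1, "filrdpct": 90,
}

if __name__ == "__main__":
    cmd = vdbench_command("SIO_vdbench_filesystest.txt", SMALL_FILES, 5576,
                          "SIOWS2012R220_NOFUZE_SmallFiles_64TH_STX1200_020116")
    print(" ".join(cmd))
    # To launch the run (Vdbench must be installed): subprocess.run(cmd, check=True)
```

Passing an argv list (rather than one string) avoids quoting problems with the backslashes in the Windows anchor path.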

Lots of Little Files Processing Example Results


The following is one of the result files from the folder of results created via the above command for Lots of Little Files processing, showing totals.
Run totals

09:17:38.001 Starting RD=format_for_rd1

Feb 02, 2016
09:19:48.016 avg_2-5: ReqstdOps rate 10138, resp 0.14; cpu% total 75.7, sys 64.6; read pct 0.0; read rate 0.0, resp 0.00; write rate 10138, resp 0.14; mb/sec read 0.00, write 158.4, total 158.42; xfer size 16384; mkdir rate 0.0, resp 0.00; rmdir rate 0.0, resp 0.00; create rate 10138, resp 0.65; open rate 10138, resp 0.43; close rate 10138, resp 0.05; delete rate 0.0, resp 0.00

09:19:49.000 Starting RD=rd1; elapsed=36000; fwdrate=max. For loops: None

19:19:49.001 avg_2-1200: ReqstdOps rate 113049, resp 0.41; cpu% total 67.0, sys 55.0; read pct 90.0; read rate 101747, resp 0.19; write rate 11302, resp 2.42; mb/sec read 99.36, write 11.04, total 110.40; xfer size 1023; mkdir rate 0.0, resp 0.00; rmdir rate 0.0, resp 0.00; create rate 0.0, resp 0.00; open rate 7065, resp 0.85; close rate 7065, resp 1.60; delete rate 0.0, resp 0.00
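The totals are internally consistent, which makes a useful sanity check on any run: reads plus writes should equal requested ops, and mb/sec should be roughly rate times transfer size divided by 2^20. A small sketch, assuming mb/sec is 1024*1024 based (these numbers bear that out) and using the configured fxfersize of 1k rather than the rounded xfer size column, which shows 1023.

```python
MIB = 1024 * 1024  # assumption: Vdbench mb/sec is 1024*1024 based


def derived_metrics(read_rate: float, write_rate: float, xfer_bytes: int) -> dict:
    """Recompute total ops, read percent and MB/sec from per-second rates."""
    total = read_rate + write_rate
    return {
        "ops_rate": total,
        "read_pct": 100.0 * read_rate / total,
        "mb_read": read_rate * xfer_bytes / MIB,
        "mb_write": write_rate * xfer_bytes / MIB,
        "mb_total": total * xfer_bytes / MIB,
    }


if __name__ == "__main__":
    # Lots of Little Files rd1 totals: 101747 reads/s and 11302 writes/s of 1k each.
    for name, value in derived_metrics(101747, 11302, 1024).items():
        print(f"{name:>9}: {value:.2f}")
```

This prints 113049.00 ops, 90.00 read percent and 99.36, 11.04 and 110.40 MB/sec, matching the rd1 totals line.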


Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences, along with common questions (and answers) as well as tips, can be found in the Software Defined Data Infrastructure Essentials book.


What This All Means

The above examples can easily be modified to do different things, particularly if you read the Vdbench documentation on how to set up multi-host, multi-storage system and multiple job streams to do different types of processing. This means you can benchmark a storage system, server, or converged and hyper-converged platform, or simply put a workload on it as part of other testing. There are even options for handling data footprint reduction such as compression and dedupe.

Ok, nuff said, for now.

Gs

Greg Schulz - Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com; any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

April 2015 Server StorageIO Update Newsletter

Volume 15, Issue IV

Hello and welcome to this April 2015 Server and StorageIO update newsletter.

This month’s newsletter has a focus on cloud and object storage for bulk data, unstructured data, big data and archiving, among other scenarios.

Enjoy this edition of the Server and StorageIO update newsletter and watch for new tips, articles, StorageIO lab report reviews, blog posts, videos and Podcasts along with in the news commentary appearing soon.


StorageIOblog posts

April StorageIOblog posts include:

View other recent as well as past blog posts here

April Newsletter Feature Theme
Cloud and Object Storage Fundamentals

There are many facets to object storage including technology implementation, products, services, access and architectures for various applications and use scenarios. The following is a short synopsis of some basic terms and concepts associated with cloud and object storage.

Common cloud and object storage terms

  • Account or project – Top of the hierarchy that represents the owner or billing information for a service, and to which buckets are attached.
  • Availability Zone (AZ) – Can be a rack of servers and storage, or a data center, across which data is spread for storage and durability.
  • AWS regions and availability zones (AZ)
    Example of some AWS Regions and AZ’s

  • Bucket or Container – Where objects or sub-folders containing objects are attached and accessed. Note in some environments such as AWS S3 you can have sub-folders in a bucket.
  • Connector – How your applications access the cloud or object storage, such as via an API, S3, Swift, REST, CDMI, Torrent, JSON, NAS file, block or other access gateway or software.
  • Durability – Data dispersed with copies in multiple locations to survive failure of storage or server hardware, software, zone or even region. Availability = Access + Durability.
  • End-point – Where or what your software, application or tool and utilities or gateways attach to for accessing buckets and objects.
  • Ephemeral – Temporary or non-persistent
  • Eventual consistency – Data is eventually made consistent; think in terms of asynchronous or deferred writes where there is a time lag, vs. synchronous or real-time updates.
  • Immutable – Persistent, non-altered or write once read many copy of data. Objects generally are not updated, rather new objects created.
  • Object storage and cloud
    Via Cloud Virtual Data Storage (CRC)

  • Object – Byte (or bit) stream that can be as small as one byte or as large as several TBytes (some solutions and services support up to 5 TByte sized objects). The object contains whatever data, in any organization, along with metadata. Different solutions and services support from a couple hundred KBytes to MBytes worth of metadata. In terms of what can be stored in an object: anything from files, videos, images, virtual disks (VMDKs, VHDX), ZIP or tar files, backup and archive save sets, executable images or ISOs, anything you want.
  • OPS – Objects per second, or how many objects are accessed, similar to IOPS. Access includes gets, puts, lists, heads and deletes for a CRUD (Create, Read, Update, Delete) interface.
  • Region – Location where data is stored that can include one or more data centers also known as Availability Zones.
  • Sub-folder – While object storage can be accessed in a flat name space, for commonality and organization some solutions and services support the notion of sub-folders that resemble a traditional directory hierarchy.
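To make the hierarchy concrete, here is a toy in-memory Python model of an account holding buckets, each bucket holding objects (an opaque byte stream plus metadata) in a flat name space, with sub-folders emulated by key prefixes and each access counted toward an OPS-style figure. It is an illustrative sketch only: the class and method names are hypothetical, they do not correspond to any particular service's API, and durability, regions and consistency are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                   # opaque byte stream
    metadata: dict = field(default_factory=dict)  # user-defined key/value metadata


class ObjectStore:
    """Toy model: one account (owner/billing) -> buckets -> objects (flat namespace)."""

    def __init__(self, account: str):
        self.account = account
        self.buckets = {}   # bucket name -> {object key -> StoredObject}
        self.ops = 0        # object operations performed (basis for an OPS figure)

    def create_bucket(self, bucket: str) -> None:
        self.buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, data: bytes, **metadata) -> None:
        """Create or replace a whole object; objects are not modified in place."""
        self.ops += 1
        self.buckets[bucket][key] = StoredObject(bytes(data), dict(metadata))

    def get(self, bucket: str, key: str) -> StoredObject:
        self.ops += 1
        return self.buckets[bucket][key]

    def head(self, bucket: str, key: str) -> dict:
        """Return only the metadata, without the object data."""
        self.ops += 1
        return dict(self.buckets[bucket][key].metadata)

    def delete(self, bucket: str, key: str) -> None:
        self.ops += 1
        del self.buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> list:
        """Keys are flat strings; a 'sub-folder' is just a shared prefix like 'photos/2015/'."""
        self.ops += 1
        return sorted(k for k in self.buckets[bucket] if k.startswith(prefix))
```

For example, after store.put("media", "photos/2015/a.jpg", data, content_type="image/jpeg"), calling store.list_keys("media", "photos/") behaves like browsing a sub-folder even though the name space is flat.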

Learn more in Cloud and Virtual Data Storage Networking (CRC) and www.objectstoragecenter.com


OpenStack Manila (e.g. Folders and Files)

AWS recently announced their new cloud-based Elastic File System (EFS) to complement their existing Elastic Block Store (EBS) offerings. However, are you aware of what is going on with cloud files within OpenStack?

For those who are familiar with OpenStack or simply talk about it and Swift object storage, or perhaps Cinder block storage, are you aware that there is also a file (NAS or Network Attached Storage) component called Manila?

In concept, Manila should provide a similar capability to what AWS recently announced with their Elastic File System (EFS), or depending on your perspective, perhaps the other way around. If you are familiar with and have done anything with Manila, what are your initial thoughts and perspectives?

What this all means

People routinely tell me these are the most exciting and interesting times ever in servers, storage, I/O networking, hardware, software, backup or data protection, performance, cloud and virtual, or take your pick, to which I would not disagree.

However, for the past several years (no, make that decade), there have been new and more interesting things, including in adjacent areas.

I predict that at least for the next few years (no, make that decades), we will continue to see plenty of new and interesting things. Questions include:

However, what’s applicable to you and your environment vs. simply fun and interesting to watch?

Ok, nuff said, for now

Cheers gs

 

In This Issue

  • Industry Trends Perspectives News
  • Commentary in the news
  • Tips and Articles
  • StorageIOblog posts
  • Events and Webinars
  • Server StorageIO Lab reports
  • Resources and Links
  • Industry News and Activity

    Recent Industry news and activity

    View other recent industry activity here

    StorageIO Commentary in the news

    StorageIO news (image licensed for use from Shutterstock by StorageIO)
    Recent Server StorageIO commentary and industry trends perspectives about news, activities and announcements.

    CyberTrend: Comments on Software Defined Data Center and Virtualization

    View more trends comments here

    StorageIO Tips and Articles

    Check out these resources and links on server storage I/O performance and benchmarking tools. View more tips and articles here

    Various Industry Events

    EMCworld – May 4-6 2015 (Las Vegas)

    Interop – April 29 2015 (Las Vegas)
    Presenting
    Smart Shopping for Your Enterprise Storage Strategy

    View other recent and upcoming events here

    Webinars


    BrightTalk Webinar – June 23 2015
    Server Storage I/O Innovation Update

    View other webinars here

    Videos and Podcasts

    Data Protection Gumbo Podcast
    Protect Preserve and Serve Data

    In this episode, Greg Schulz is a guest on Data Protection Gumbo hosted by Demetrius Malbrough (@dmalbrough). The conversation covers various aspects of data protection, with a focus on protecting, preserving and serving information, applications and data across different environments and customer segments.

    While we discuss enterprise and SMB data protection, we also talk about trends from mobile to the cloud, among many other tools, technologies and techniques. Check out the podcast here.

    Springtime in Kentucky
    With Kendrick Coleman of EMCcode
    Cloud Object Storage S3motion and more

    In this episode, @EMCcode (Part of EMC) developer advocate Kendrick Coleman (@KendrickColeman) joins me (e.g. Greg Schulz) for a conversation.

    The conversation covers what EMCcode is, EMC Federation, Cloud Foundry, clouds, object storage, buckets, containers, objects, node.js, Docker, OpenStack, AWS S3, micro services, and the S3motion tool Kendrick developed.

    S3motion is a good tool to have in your server storage I/O tool box for working with cloud and object storage, along with others such as Cloudberry, S3fs, Cyberduck, S3 browser among many others. You can get S3motion for free from GitHub here. Check out the companion blog post for this podcast here.

    StorageIO podcasts are also available via Server Storage I/O audio podcast, Server Storage I/O video and at StorageIO.tv

    From StorageIO Labs

    Research, Reviews and Reports

    AWS S3 Cross-Region Replication

    Moving and Replicating Buckets/Containers, Sub folders and Objects

    View other StorageIO lab review reports here

    Resources and Links


    S3motion Buckets Containers Objects AWS S3 Cloud and EMCcode


    It’s springtime in Kentucky and recently I had the opportunity to have a conversation with Kendrick Coleman to talk about S3motion, Buckets, Containers, Objects, AWS S3, Cloud and Object Storage, node.js, EMCcode and open source among other related topics which are available in a podcast here, or video here and available at StorageIO.tv.

    In this Server StorageIO industry trends perspective podcast episode, @EMCcode (Part of EMC) developer advocate Kendrick Coleman (@KendrickColeman) joins me for a conversation. Our conversation spans spring-time in Kentucky (where Kendrick lives) which means Bourbon and horse racing as well as his blog (www.kendrickcoleman.com).

    Btw, in the podcast I refer to Captain Obvious and Kendrick’s beard; for those not familiar with @Captainobvious, click here to learn more.


    @Kendrickcoleman
    & @Captainobvious

    What about Clouds, Object Storage, Programming and other technical stuff?

    Of course we also talk some tech including what EMCcode is, EMC Federation, Cloud Foundry, clouds, object storage, buckets, containers, objects, node.js, Docker, OpenStack, AWS S3, micro services, and the S3motion tool that Kendrick developed.


    Kendrick explains the motivation behind S3motion along with trends in and around objects (including GET, PUT vs. traditional Read, Write) as well as programming among related topic themes and how context matters.


    I have used S3motion for moving buckets, containers and objects around including between AWS S3, Google Cloud Storage (GCS) and Microsoft Azure as well as to/from local. S3motion is a good tool to have in your server storage I/O tool box for working with cloud and object storage along with others such as Cloudberry, S3fs, Cyberduck, S3 browser among many others.

    You can get S3motion free from GitHub here.


    Where to learn more

    Here are some links to learn more about AWS S3, Cloud and Object Storage along with related topics


    What this all means and wrap-up

    Context matters when it comes to many things, particularly objects, as they can mean different things. Tools such as S3motion make it easy to move your buckets or containers, along with objects, from one cloud storage system, solution or service to another. Also check out EMCcode to see what they are doing on different fronts, from supporting new and greenfield development with Cloud Foundry and PaaS, to OpenStack, to bridging current environments to the next generation of platforms. Also check out Kendrick’s blog site, as he has a lot of good technical content as well as some other fun stuff to learn about. I look forward to having Kendrick on as a guest again soon to continue our conversations. In the meantime, check out S3motion to see how it can fit into your server storage I/O tool box.

    Ok, nuff said, for now..

    Cheers gs


    March 2015 Server StorageIO Update Newsletter

    Volume 15, Issue III

    Hello and welcome to this March 2015 Server and StorageIO update newsletter. Here in the northern hemisphere, spring is here, at least by the calendar, although weather-wise winter continues to linger in some areas. In the US, March also means college and university sports tournaments, with many people focused on their NCAA men’s basketball championship brackets.

    Besides various college championships, March also has a connection to backup and data protection. Thus this month’s newsletter has a focus on data protection; after all, March 31 is World Backup Day, which means it should also be World Restore Test Day!

    Focus on Data Protection

    Data protection including backup/restore, business continuance (BC), disaster recovery (DR), business resiliency (BR) and archiving across physical, virtual and cloud environments.

    Data Protection Fundamentals

    A reminder about data protection, including backup, BC, DR and related technologies: make sure they are occurring as planned. Also test your copies, and remember the 4 3 2 1 rule or guide.

    4 – Versions (different time intervals)
    3 – Copies of critical data (including versions)
    2 – Different media, devices or systems
    1 – Off-site (cloud or elsewhere)

    The above means having at least four (4) different versions of your data from various points in time. Having three (3) copies, including various versions, protects against one or more copies being corrupt or damaged. Placing those versions and copies on at least two (2) different storage systems, devices or media protects you if something happens to one of them.

    While it might be common sense, a bad April Fools recovery joke would be finding out all of your copies were on the same device, which is damaged. That might seem obvious; however, sometimes the obvious needs to be stated. Also make sure that at least one (1) of your copies is off-site, either on off-line media (tape, disk, SSD, optical) or in the cloud.
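The 4 3 2 1 guide lends itself to a simple checklist. The following Python sketch, using a hypothetical ProtectionCopy record (version, media, off-site flag) and a made-up sample plan, reports which parts of the guide a given set of copies meets; it only checks counts, not whether the copies can actually be restored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionCopy:
    version: str   # point in time the copy represents, e.g. "2015-03-31"
    media: str     # storage system, device or media holding the copy
    offsite: bool  # True if kept off-site (cloud, vault or elsewhere)


def check_4321(copies: list) -> dict:
    """Report which parts of the 4 3 2 1 guide a set of protection copies meets."""
    return {
        "4 versions": len({c.version for c in copies}) >= 4,
        "3 copies": len(copies) >= 3,
        "2 media": len({c.media for c in copies}) >= 2,
        "1 off-site": any(c.offsite for c in copies),
    }


if __name__ == "__main__":
    # Hypothetical sample plan, for illustration only.
    plan = [
        ProtectionCopy("2015-03-28", "nas", False),
        ProtectionCopy("2015-03-29", "nas", False),
        ProtectionCopy("2015-03-30", "tape", True),
        ProtectionCopy("2015-03-31", "cloud", True),
    ]
    for rule, ok in check_4321(plan).items():
        print(f"{rule}: {'ok' if ok else 'MISSING'}")
```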

    Take a few moments to verify that your data protection strategy is being implemented and practiced as intended. Also test what is being copied: not only restore the data from cloud, disk, SSD or tape, but also make sure you can actually read or use the data being protected. This means making sure that your security credentials, including access certificates and decryption, work as expected.
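Reading data back is only part of the test; comparing checksums of restored files against the originals is one simple way to confirm the bytes came back intact. This is a minimal Python sketch, assuming both the original and the restored copy are reachable as local directories; it does not prove that applications can open the data, or that decryption keys and certificates work, which still need their own tests.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """List files that are missing from, or differ in, the restored copy."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: verify_restore.py ORIGINAL_DIR RESTORED_DIR")
    else:
        issues = verify_restore(Path(sys.argv[1]), Path(sys.argv[2]))
        print("\n".join(issues) if issues else "restored copy matches original")
```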

    Watch for more news, updates industry trends perspectives commentary, tips, articles and other information at Storageio.com, StorageIOblog.com, various partner venues as well as in future newsletters.

    StorageIOblog posts

    Data Protection Diaries

    Are restores ready for World Backup Day?
    In case you forgot or did not know, World Backup Day is March 31 2015 (@worldbackupday), so now is a good time to be ready. World Backup Day (view their site here), which has gone on for a few years now, is a good way to call out the importance of backing up or protecting data.

    However, it’s also time to put more emphasis and focus on making sure those backups or protection copies actually work.

    By this I mean doing more than making sure that your data can be read from tape, disk, SSD or a cloud service, and going a step further to verify that restored data can actually be used (read, written, etc).

    The problem, issue and challenge are simple: are your applications, systems and data protected, and can you use those protection copies (e.g. backups, snapshots, replicas or archives) when as well as where needed? Read more here about World Backup Day and what I’m doing, as well as various tips to be ready for successful recovery and avoid being an April 1st fool ;).

    Cloud Conversations
    AWS S3 Cross Region Replication
    Amazon Web Services (AWS) announced several enhancements including a new Simple Storage Service (S3) cross-region replication of objects from a bucket (e.g. container) in one region to a bucket in another region.

    AWS also recently enhanced Elastic Block Store (EBS), increasing the maximum performance and size of Provisioned IOPS (SSD) and General Purpose (SSD) volumes. EBS enhancements included the ability to store up to 16 TBytes of data in a single volume and do 20,000 input/output operations per second (IOPS). Read more about EBS and other AWS server, storage I/O enhancements here.
    AWS regions and availability zones (AZ)
    Example of some AWS Regions and AZs

    AWS S3 buckets and objects are stored in a specific region designated by the customer or user (AWS S3, EBS, EC2, Glacier, Regions and Availability Zone primer can be found here). The challenge being addressed by AWS with S3 replication is being able to move data (e.g. objects) stored in AWS buckets in one region to another in a safe, secure, timely, automated, cost-effective way.

    Continue reading more here about AWS S3 bucket and object replication feature along with related material.

    Additional March StorageIOblog posts include:

    View other recent as well as past blog posts here

    In This Issue

  • Industry Trends Perspectives News
  • Commentary in the news
  • Tips and Articles
  • StorageIOblog posts
  • Events and Webinars
  • Recommended Reading List
  • Server StorageIO Lab reports
  • Resources and Links
  • Industry News and Activity

    Recent Industry news and activity

    EMC sets up cloudfoundry Dojo
    AWS S3, EBS IOPs and other updates
    New backup/data protection vendor Rubrik
    Google adds nearline Cloud Storage
    AWS and Microsoft Cloud Price battle

    View other recent and upcoming events here

    StorageIO Commentary in the news

    StorageIO news (image licensed for use from Shutterstock by StorageIO)
    Recent Server StorageIO commentary and industry trends perspectives about news, activities and announcements.

    Processor: Enterprise Backup Solution Tips
    Processor: Failed & Old Drives
    EnterpriseStorageForum: Disk Buying Guide
    ChannelProNetwork: 2015 Tech and SSD
    Processor: Detect & Avoid Drive Failures

    View more trends comments here

    StorageIO Tips and Articles

    So you have a new storage device or system. How will you test or find its performance? Check out this quick-read tip on storage benchmark and testing fundamentals over at BizTech.

    Keeping with this month’s theme of data protection including backup/restore, BC, DR, BR and archiving, here are some more tips. These tips span server storage I/O networking hardware, software, cloud, virtual, performance, data protection applications and related themes including:

    • Test your data restores: can you read and actually use the data? Is your data decrypted, with the proper security certificates applied?
    • Remember to back up or protect your security encryption keys, certificates and application settings!
    • Revisit what format your data is being saved in, including how you will be able to use data saved to the cloud. Will you be able to do a restore to a cloud server, or do you need to make sure a copy of your backup tools is on your cloud server instances?

    Check out these resources and links on server storage I/O performance and benchmarking tools. View more tips and articles here

    Various Industry Events

    EMCworld – May 4-6 2015

    Interop – April 29 2015 (Las Vegas)

    Presenting Smart Shopping for Your Storage Strategy

    NAB – April 14-15 2015

    SNIA DSI Event – April 7-9

    View other recent and upcoming events here

    Webinars

    December 11, 2014 – BrightTalk
    Server & Storage I/O Performance

    December 10, 2014 – BrightTalk
    Server & Storage I/O Decision Making

    December 9, 2014 – BrightTalk
    Virtual Server and Storage Decision Making

    December 3, 2014 – BrightTalk
    Data Protection Modernization

    Videos and Podcasts

    StorageIO podcasts are also available via and at StorageIO.tv

    From StorageIO Labs

    Research, Reviews and Reports

    Datadynamics StorageX

    More than a data mover migration tool, StorageX is a tool for adding management and automation around unstructured local and distributed NAS (NFS, CIFS, DFS) file data. Read more here.

    View other StorageIO lab review reports here

    Recommended Reading List

    This is a new section being introduced in this edition of the Server StorageIO update mentioning various books, websites, blogs, articles, tips, tools, videos, podcasts along with other things I have found interesting and want to share with you.

    • Introducing s3motion (via EMCcode e.g. opensource) a tool for copying buckets and objects between public, private and hybrid clouds (e.g. AWS S3, GCS, Microsoft Azure and others) as well as object storage systems. This is a great tool which I have added to my server storage I/O cloud, virtual and physical toolbox. If you are not familiar with EMCcode check it out to learn more…
    • Running Hadoop on Ubuntu Linux (Series of tutorials) for those who want to get their hands dirty vs. using one of the All In One (AIO) appliances.

    Resources and Links

    Check out these useful links and pages:
    storageio.com/links
    objectstoragecenter.com
    storageioblog.com/data-protection-diaries-main/

    storageperformance.us
    thessdplace.com
    storageio.com/raid
    storageio.com/ssd

    Enjoy this edition of the Server and StorageIO update newsletter and watch for new tips, articles, StorageIO lab report reviews, blog posts, videos and podcasts along with in the news commentary appearing soon.

    Cheers gs


    Cloud Conversations: AWS S3 Cross Region Replication storage enhancements


    Amazon Web Services (AWS) recently announced, among other enhancements, new Simple Storage Service (S3) cross-region replication of objects from a bucket (e.g. container) in one region to a bucket in another region. AWS also recently enhanced Elastic Block Store (EBS), increasing the maximum performance and size of Provisioned IOPS (SSD) and General Purpose (SSD) volumes. EBS enhancements included the ability to store up to 16 TBytes of data in a single volume and do 20,000 input/output operations per second (IOPS). Read more about EBS and other recent AWS server, storage I/O and application enhancements here.


    The Problem, Issue, Challenge, Opportunity and Need

    The challenge is being able to move data (e.g. objects) stored in AWS buckets in one region to another in a safe, secure, timely, automated, cost-effective way.

    Even though AWS has a global name-space, buckets and their objects (e.g. files, data, videos, images, bit and byte streams) are stored in a specific region designated by the customer or user (AWS S3, EBS, EC2, Glacier, Regions and Availability Zone primer can be found here).


    Understanding the challenge and designing a strategy

    The following diagram shows the challenge and how to copy or replicate objects in an S3 bucket in one region to a destination bucket in a different region. While objects can be copied or replicated without S3 cross-region replication, that essentially involves reading your objects, pulling that data out via the internet and then writing it to another place. The catch is that this can add extra costs, take time, consume network bandwidth and need extra tools (Cloudberry, Cyberduck, S3fuse, S3motion, S3browser, S3 tools (not AWS) and a long list of others).

    What is AWS S3 Cross-region replication

    Highlights of AWS S3 Cross-region replication include:

    • AWS S3 Cross region replication is as its name implies, replication of S3 objects from a bucket in one region to a destination bucket in another region.
    • S3 replication of new objects added to an existing or new bucket (note: only new objects get replicated)
    • Policy based replication tied into S3 versioning and life-cycle rules
    • Quick and easy to set up for use in a matter of minutes via S3 dashboard or other interfaces
    • Keeps region to region data replication and movement within AWS networks (potential cost advantage)
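The behavior in the highlights above (versioning as a prerequisite, an optional prefix filter, and only objects written after replication is enabled getting copied) can be modeled in a few lines. This toy Python model is an assumption-laden sketch, not how AWS implements replication; the class is hypothetical, and it deliberately leaves out deletes, timing and costs.

```python
class ReplicatedBucket:
    """Toy model of one-way, prefix-filtered replication of newly written objects."""

    def __init__(self):
        self.objects = {}        # key -> list of versions (newest last)
        self.versioning = False
        self.replication = None  # (destination dict, key prefix) once enabled

    def enable_versioning(self) -> None:
        self.versioning = True

    def enable_replication(self, destination: dict, prefix: str = "") -> None:
        if not self.versioning:
            raise RuntimeError("versioning must be enabled before replication")
        # Objects already in the bucket are NOT copied; only later writes are.
        self.replication = (destination, prefix)

    def put(self, key: str, data: bytes) -> None:
        versions = self.objects.setdefault(key, [])
        if self.versioning:
            versions.append(data)   # keep every version
        else:
            versions[:] = [data]    # unversioned: overwrite
        if self.replication is not None:
            destination, prefix = self.replication
            if key.startswith(prefix):
                destination[key] = data   # copy the new version to the other region
```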

    To activate, you simply enable versioning on a bucket, enable cross-region replication, indicate the source bucket (or prefix of objects in the bucket), specify the destination region and target bucket name (or create one), then create or select an IAM (Identity and Access Management) role, and objects should be replicated.
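The same setup can be scripted instead of clicked through. The following is a sketch using the AWS SDK for Python (boto3), whose S3 client provides put_bucket_versioning and put_bucket_replication; the bucket names and IAM role ARN are hypothetical placeholders, and the rule uses the older Prefix-style replication schema (newer configurations may use a Filter element instead), so check it against current S3 documentation. Versioning is enabled on both buckets because S3 requires it on source and destination.

```python
def build_replication_config(role_arn: str, dest_bucket: str, prefix: str = "") -> dict:
    """ReplicationConfiguration using the older Prefix-style rule schema."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [{
            "ID": "crr-" + (prefix.rstrip("/") or "all"),
            "Prefix": prefix,    # "" applies the rule to every new object
            "Status": "Enabled",
            "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
        }],
    }


def enable_crr(src_s3, dst_s3, source_bucket: str, dest_bucket: str,
               role_arn: str, prefix: str = "") -> None:
    """Enable versioning on both buckets, then attach the replication rule.

    src_s3 and dst_s3 are boto3 S3 clients for the source and destination regions.
    """
    for client, bucket in ((src_s3, source_bucket), (dst_s3, dest_bucket)):
        client.put_bucket_versioning(
            Bucket=bucket, VersioningConfiguration={"Status": "Enabled"})
    src_s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, dest_bucket, prefix),
    )
```

For a US Standard to EU (Ireland) setup like the one in the hands-on section of this post, the two clients would be created with something like boto3.client("s3", region_name="us-east-1") and boto3.client("s3", region_name="eu-west-1").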

    Some AWS S3 cross-region replication things to keep in mind (e.g. considerations):
    • As with other forms of mirroring and replication, if you add something on one side it gets replicated to the other side
    • As with other forms of mirroring and replication, if you delete something from the other side it can be deleted on both (be careful and do some testing)
    • Keep costs in perspective as you still need to pay for your S3 storage at both locations as well as applicable internal data transfer and GET fees
    • Click here to see current AWS S3 fees for various regions

    S3 Cross-region replication and alternative approaches

    There are several regions around the world, and up until today AWS customers could copy, sync or replicate S3 bucket contents between AWS regions manually (or via automation) using various tools such as Cloudberry, Cyberduck, S3browser and S3motion, to name just a few, as well as via various gateways and other technologies. Some of those tools and technologies are open-source or free, some are freemium and some are premium, and they also vary by interface (some with GUI, others with CLI or APIs), including the ability to mount an S3 bucket as a local network drive and use tools to sync or copy.

    However, a catch with the above mentioned tools (among others) and approaches is that replicating your data (e.g. objects in a bucket) can involve other AWS S3 fees. For example, reading data (e.g. a GET, which has a fee) from one AWS region and then copying it out to the internet has fees. Likewise, when copying data into another AWS S3 region, data transfer in is free but PUT requests carry a per-request fee, and there is also the cost of storage at the destination.
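To compare the two approaches for a given bucket, the fee categories can be added up. The Python sketch below takes all prices as inputs because they change and vary by region; every rate is a placeholder to be filled in from current AWS pricing. It assumes the tool-in-the-middle copy pays GET and PUT requests, internet data transfer out and one month of destination storage, while cross-region replication pays replication PUTs, inter-region transfer and destination storage; this post also mentions GET fees with replication, so adjust the formulas to match your bill.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Prices in USD. All values are placeholders; use current AWS pricing."""
    get_per_1k: float            # GET requests, per 1,000
    put_per_1k: float            # PUT requests, per 1,000
    internet_out_per_gb: float   # data transfer out to the internet
    inter_region_per_gb: float   # data transfer between AWS regions
    storage_per_gb_month: float  # S3 storage at the destination


def copy_via_internet(objects: int, gb: float, r: Rates) -> float:
    """Tool in the middle: GET each object, pull it out over the internet, PUT it back."""
    requests = objects / 1000 * (r.get_per_1k + r.put_per_1k)
    return requests + gb * r.internet_out_per_gb + gb * r.storage_per_gb_month


def copy_via_crr(objects: int, gb: float, r: Rates) -> float:
    """Cross-region replication (assumed): replication PUTs, inter-region transfer, storage."""
    requests = objects / 1000 * r.put_per_1k
    return requests + gb * r.inter_region_per_gb + gb * r.storage_per_gb_month
```

Calling both functions with the same object count, gigabytes and Rates shows that the difference comes down to GET request fees and internet transfer out versus inter-region transfer, since destination storage is paid either way.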


    AWS S3 cross-region hands on experience (first look)

    For my first hands on (first look) experience with AWS cross-region replication today, I enabled a bucket in the US Standard region (e.g. Northern Virginia) and created a new target destination bucket in EU (Ireland). Setup and configuration were very quick, literally just a few minutes, with most of the time spent reading the text on the new AWS S3 dashboard properties configuration displays.

    I selected an existing test bucket to replicate and noticed that nothing had replicated over to the other bucket, until I realized that only new objects would be replicated. Once some new objects were added to the source bucket, within a matter of moments (e.g. a few minutes) they appeared across the pond in my EU Ireland bucket. When I deleted those replicated objects from my EU Ireland bucket and switched back to my view of the source bucket in the US, those new objects were already deleted from the source. Yes, just like regular mirroring or replication, pay attention to how you have things configured (e.g. synchronized vs. contribute vs. echo of changes etc.).

    While I was not able to do a solid, quantifiable performance test, based simply on some quick copies and my network speed, moving data via S3 cross-region replication was faster than using something like s3motion with my server in the middle.

    It also appears from some initial testing today that a benefit of AWS S3 cross-region replication (besides being bundled and part of AWS) is that some fees to pull data out of AWS and transfer out via the internet can be avoided.


    Where to learn more

    Here are some links to learn more about AWS S3 and related topics

    What this all means and wrap-up

    For those who are looking for a way to streamline replicating data (e.g. objects) from an AWS bucket in one region to a bucket in a different region, you now have a new option. There are potential cost savings if that is your goal, along with performance benefits, and it can be used in addition to whatever might already be working in your environment. Replicating objects provides a way of expanding your business continuance (BC), business resiliency (BR) and disaster recovery (DR) involving S3 across regions, as well as a means for content caching or distribution, among other possible uses.

    Overall, I like this ability for moving S3 objects within AWS, however I will continue to use other tools such as S3motion and s3fs for moving data in and out of AWS, as well as among other public cloud services and local resources.

    Ok, nuff said, for now…

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Cloud conversations: If focused on cost you might miss other cloud storage benefits


    Drew Robb (@robbdrew) has a good piece (e.g. article) over at InfoStor titled Eight Ways to Avoid Cloud Storage Pricing Surprises that you can read here.

    Drew starts his piece out with this nice analogy or story:

    Let’s begin with a cautionary tale about pricing: a friend hired a moving company as they quoted a very attractive price for a complex move. They lured her in with a low-ball price then added more and more “extras” to the point where their price ended up higher than many of the other bids she passed up. And to make matters worse, they are already two weeks late with delivery of the furniture and are saying it might take another two weeks.

    Drew extends his example to compare how some cloud providers may advertise pricing as low as some amount, only for customers to be surprised later because they did not do their homework on the various fees.

    Note that most reputable cloud providers do not hide their fees; even though there are myths that all cloud vendors have hidden fees, they instead list what those costs are on their sites. However, that means the smart shopper or person procuring cloud services needs to go look for those fees and what they mean to avoid surprises. On the other hand, if you cannot find what the extra fees would be, along with what is or is not included in a cloud service price, to quote Jenny’s line in the movie Forrest Gump, "…Run, Forrest! Run!…".

    In Drew’s piece he mentions five general areas to keep an eye on pertaining to cloud storage costs, including:

    • Be Duly Diligent
    • Trace Out Application Interaction
    • Avoid Fixed Usage Rates
    • Beware Lowballing
    • Demand Enterprise Visibility

    Beware Lowballing

    In Drew’s piece, he includes a comment from me, shown below.

    Just as in the moving business, lowballing is alive and well in cloud pricing. Greg Schulz, an analyst with StorageIO Group, warned users to pay attention to services that have very low-cost per GByte/TByte yet have extra fees and charges for use, activity or place service caps. Compare those with other services that have higher base fees and attempt to price it based on your real storage and usage patterns.

    “Watch out for usage and activity fees with lower cost services where you may get charged for looking at or visiting your data, not to mention for when you actually need to use it,” said Schulz. “Also be aware of limits or caps on performance that may apply to a particular class of service.”

    As a follow-up to Drew’s good article, I put together the following thoughts that appeared earlier this year over at InfoStor titled Cloud storage: Is It All About Cost? that you can read here. In that article I start out with the basic question of:

    So what is your take on cloud storage, and in what context?

    Is cloud storage all about removing cost, cost cutting, free storage?

    Or perhaps even getting something else in addition to free storage?

    I routinely talk with different people from various backgrounds and environments around the world, and the one consistency I hear when it comes to cloud services, including storage, is that there is no consistency.

    What I mean by this is that there are the cloud crowd cheerleaders who cheer for anything cloud related; some of them actually use the cloud vs. simply cheering.

    What does this have to do with cloud costs

    Simple, how do you know if cloud is cheaper or more expensive if you do not know your own costs?

    How do you know if cloud storage is available, reliable, durable if you do not have a handle on your environment?

    Are you making apples-to-oranges comparisons, or simply trading on or leveraging hype and FUD for or against?

    Similar to regular storage, the best practices you apply when choosing how to use and configure on-site traditional storage for high availability, performance and security, among others, should be applied to cloud solutions. After all, only you can prevent cloud (or on-premises) data loss, granted it is a shared responsibility. Shared responsibility means your service provider or system vendor needs to deliver a quality, robust solution that you then take responsibility for configuring and using with resiliency.

    For some of you perhaps cloud might be about lowering, reducing or cutting storage costs, perhaps even getting some other service(s) in addition to free storage.

    On the other hand, some of you might be

    Yet another class of cloud storage (e.g. AWS EBS) is intended or optimized to be accessed from within a cloud via cloud servers or compute instances (e.g. AWS EC2 among others), vs. storage that is optimized for access from both inside and outside the cloud (e.g. AWS S3 or Glacier with costs shown here). I am using AWS examples; however, you could use Microsoft Azure (pricing shown here), Google (including their new Nearline service with costs shown here), Rackspace (calculator here or other cloud files pricing here), HP Cloud (costs shown here), IBM Softlayer (object storage costs here) and many others.

    Not all types of cloud storage are the same, which is similar to the traditional storage you may be using or have used in your environment in the past. For example, there is high-capacity low-cost storage, including magnetic tape for data protection and archiving of inactive data, along with near-line hard disk drives (HDD). There are different types of HDDs, as well as fast solid-state devices (SSD) along with hybrid or SSHD storage used for different purposes. This is where some would say the topic of cloud storage is highly complex.

    Where to learn more

    Data Protection Diaries
    Cloud Conversations: AWS overview and primer
    Only you can prevent cloud data loss
    Is Computer Data Storage Complex? It Depends
    Eight Ways to Avoid Cloud Storage Pricing Surprises
    Cloud and Object Storage Center
    Cloud Storage: Is It All About Cost?
    Cloud conversations: Gaining cloud confidence from insights into AWS outages (Part II)
    Given outages, are you concerned with the security of the cloud?
    Is the cost of cloud storage really cheaper than traditional storage?
    Are more than five nines of availability really possible?
    What should I look for in an enterprise file sync-and-share app?
    How do primary storage clouds and cloud for backup differ?
    What should I consider when using SSD cloud?
    What’s most important to know about my cloud privacy policy?
    Data Archiving: Life Beyond Compliance
    My copies were corrupted: The 3-2-1 rule
    Take a 4-3-2-1 approach to backing up data

    What this means

    In my opinion there are cheap clouds (products, services, solutions), there are low-cost options, and there are value and premium offerings. Avoid confusing value with cheap or low cost: something might have a higher cost yet include more capabilities or bundle in fees that would otherwise be extra, which, if useful, can provide more value. Look beyond the up-front cost aspects of clouds, also considering the ongoing recurring fees for actually using a service or solution.

    If you can find low-cost storage at or below a penny per GByte per month, that could be a good value if it also includes free access, such as retrieval GETs along with HEADs and LISTs for management or reporting. On the other hand, if you find a service that is at or below a penny per GByte per month, however it charges for any access including retrieval, as well as network bandwidth fees along with reporting, that might not be as good of a value.
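    A quick way to compare the two cases is to model a month of your own usage against each price list. The Python sketch below does that; both services and every rate in it are hypothetical, chosen only to show how per-request and bandwidth fees can outweigh a lower capacity price.

```python
def monthly_cost(gb_stored, gets, lists, gb_out, rates):
    """Estimate one month's bill from capacity, request and bandwidth usage.
    `rates` keys (all hypothetical): per_gb, per_1000_get, per_1000_list, per_gb_out."""
    return (gb_stored * rates["per_gb"]
            + gets / 1000 * rates["per_1000_get"]
            + lists / 1000 * rates["per_1000_list"]
            + gb_out * rates["per_gb_out"])


# Hypothetical service A: below a penny per GByte, but every access is billed.
cheap = {"per_gb": 0.007, "per_1000_get": 0.01, "per_1000_list": 0.05, "per_gb_out": 0.05}
# Hypothetical service B: a penny per GByte, with access and bandwidth included.
value = {"per_gb": 0.01, "per_1000_get": 0.0, "per_1000_list": 0.0, "per_gb_out": 0.0}

usage = dict(gb_stored=10_000, gets=2_000_000, lists=200_000, gb_out=1_000)
for name, rates in (("cheap", cheap), ("value", value)):
    print(name, round(monthly_cost(rates=rates, **usage), 2))
```

    With these made-up numbers the "cheap" service comes to about 150 a month versus 100 for the "value" one. Drop the access and bandwidth to zero and the ranking flips, which is why the comparison needs your own real usage pattern.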

    Look beyond the basic price and watch out for statements like "…as low as…" to understand what is required to get that "…as low as…" price. Also understand what the extra fees are; most reputable providers list them on their sites, granted you have to look for them. If you are already using cloud services, pay attention to your monthly invoices and track what you are paying for to avoid surprises.

    From my InfoStor piece:

    For cloud storage, instead of simply focusing on lowest cost of storage per capacity, look for value, along with ability to configure or use with as much resiliency as you need. Value will mean different things depending on your needs and cloud storage servers, yet the solution should be cost-effective with availability including durability, secure and applicable performance.

    Shopping for cloud servers and storage is similar to acquiring regular servers and storage in that you need to understand what you are acquiring, along with up-front and recurring fees, to understand the total cost of ownership and cost of operations, not to mention making apples-to-apples vs. apples-to-oranges comparisons.

    Btw, instead of simply using lower cost cloud services to cut cost, why not also use those capabilities to create or park another copy of your important data somewhere else just to be safe…

    What say you about cloud costs?

    Ok, nuff said, for now…

    Cheers gs



    How to test your HDD SSD or all flash array (AFA) storage fundamentals

    How to test your HDD SSD AFA Hybrid or cloud storage


    Updated 2/14/2018

    Over at BizTech Magazine I have a new article, 4 Ways to Performance Test Your New HDD or SSD, that provides a quick guide to verifying or learning what speed characteristics your new storage device is capable of.

    An out-take from the article used by BizTech as a "tease" is:

    These four steps will help you evaluate new storage drives. And … psst … we included the metrics that matter.

    Building off the basics, server storage I/O benchmark fundamentals

    The four basic steps in the article are:

    • Plan what and how you are going to test (what’s applicable for you)
    • Decide on a benchmarking tool (learn about various tools here)
    • Test the test (find bugs, errors before a long running test)
    • Focus on metrics that matter (what’s important for your environment)
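    As an illustration of the "test the test" step, here is a small Python sketch that writes and then reads back a scratch file and reports rough throughput. It is not a substitute for a real benchmarking tool: the sizes and block size are arbitrary assumptions, and the reads will likely be served from the operating system's page cache, which is exactly the kind of artifact a short trial run helps you spot before committing to a long test.

```python
import os
import tempfile
import time
from typing import Optional


def quick_io_check(size_mb: int = 64, block_kb: int = 64,
                   directory: Optional[str] = None) -> dict:
    """Sequentially write, then read back, a scratch file and report rough
    MB/s and operations per second for each phase."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)  # scratch file on the device under test
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to flush data to the device before stopping the clock
        write_s = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(block)):
                pass  # reads may well be served from the page cache
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)

    mb = blocks * block_kb / 1024
    return {
        "write_MBps": mb / write_s,
        "read_MBps": mb / read_s,
        "write_ops_per_s": blocks / write_s,
        "read_ops_per_s": blocks / read_s,
    }


# "Test the test": run a tiny pass first to shake out errors, then scale up.
print(quick_io_check(size_mb=8))
```

    If the read figure looks far faster than the device could plausibly deliver, that is a hint the test is measuring memory rather than storage.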


    Where To Learn More

    View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.


    What This All Means

    To some, the above (read the full article here) may seem like common-sense tips that everybody should know; on the other hand, there are many people who are new to servers, storage, I/O networking, hardware, software, cloud and virtual, along with various applications, not to mention different tools.

    Thus the above is a refresher for some (e.g. déjà vu), while for others it might be new and revolutionary or simply helpful. If you are interested in HDDs and SSDs, as well as other server storage I/O performance topics along with benchmarking tools, techniques and trends, check out the collection of links here (Server and Storage I/O Benchmarking and Performance Resources).

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    I/O, I/O how well do you know good bad ugly server storage I/O iops?

    How well do you know good bad ugly I/O iops?


    Updated 2/10/2018

    There are many different types of server storage I/O operations associated with various environments, applications and workloads. Some I/O activity is measured in IOPS, other activity in transactions per second (TPS), files or messages per unit of time (hour, minute, second), gets, puts or other operations. The best IO is the one you do not have to do.

    What about all the cloud, virtual, software defined and legacy based application that still need to do I/O?

    If no IO operation is the best IO, then the second best IO is the one that can be done as close to the application and processor as possible with the best locality of reference.

    Also keep in mind that aggregation (e.g. consolidation) can cause aggravation (server storage I/O performance bottlenecks).

    Example of aggregation (consolidation) causing aggravation (server storage i/o blender bottlenecks)

    And the third best?

    It’s the one that can be done in less time, or at the least cost or impact to the requesting application, which means moving further down the memory and storage stack.

    Leveraging flash SSD and cache technologies to find and fix server storage I/O bottlenecks

    On the other hand, any IOPS figure, whether for block, file or object storage, that comes with some context is better than one without, particularly when it involves metrics that matter (here, here and here [webinar]).

    Server Storage I/O optimization and effectiveness

    The problem with IOs is that they are basic operations for getting data into and out of a computer or processor, so there is no way to avoid all of them, unless you have a very large budget. Even if you have a large budget that can afford an all-flash SSD solution, you may still encounter bottlenecks or other barriers.

    IOs require CPU or processor time and memory to set up and then process the results, as well as IO and networking resources to move data to its destination or retrieve it from where it is stored. While IOs cannot be eliminated, their impact can be greatly improved or optimized by, among other techniques, doing fewer of them via caching and by grouping reads or writes (pre-fetch, write-behind).
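    Grouping writes (write-behind) can be illustrated with a short Python sketch. The class below is hypothetical, not any particular product's cache: it buffers small application writes in memory and flushes them to a backing store as larger writes, counting the resulting back-end I/Os. Until a flush happens the buffered data exists only in memory, which is why real implementations protect such caches (e.g. mirrored or battery backed).

```python
class WriteBehindBuffer:
    """Coalesce small writes into fewer, larger back-end writes."""

    def __init__(self, backend_write, flush_bytes: int = 64 * 1024):
        self.backend_write = backend_write  # callable that performs the real I/O
        self.flush_bytes = flush_bytes      # flush once this much data is buffered
        self.pending = []
        self.pending_bytes = 0
        self.backend_ios = 0

    def write(self, data: bytes) -> None:
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.backend_write(b"".join(self.pending))  # one larger I/O
            self.backend_ios += 1
            self.pending.clear()
            self.pending_bytes = 0


store = bytearray()
buf = WriteBehindBuffer(store.extend, flush_bytes=64 * 1024)
for _ in range(1024):
    buf.write(b"x" * 512)  # 1,024 application writes of 512 bytes each
buf.flush()
print(buf.backend_ios, len(store))  # 8 524288
```

    The trade-off mirrors the errand analogy that follows: fewer, larger trips to the back end, at the cost of holding data longer before it is written.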


    Think of it this way: Instead of going on multiple errands, sometimes you can group multiple destinations together making for a shorter, more efficient trip. However, that optimization may also mean your drive will take longer. So, sometimes it makes sense to go on a couple of quick, short, low-latency trips instead of one larger one that takes half a day even as it accomplishes many tasks. Of course, how far you have to go on those trips (i.e., their locality) makes a difference about how many you can do in a given amount of time.

    Locality of reference (or proximity)

    What is locality of reference?

    This refers to how close data is to where it is needed (being referenced) for use. For example, the best locality of reference in a computer would be registers in the processor core, ready to be acted on immediately. This would be followed by the level 1, 2, and 3 (L1, L2, and L3) onboard caches, followed by main memory, or DRAM. After that comes solid-state memory, typically NAND flash, either on PCIe cards or accessible via direct attached storage (DAS), SAN, or NAS devices.


    Even though a PCIe NAND flash card is close to the processor, there still remains the overhead of traversing the PCIe bus and associated drivers. To help offset that impact, PCIe cards use DRAM as cache or buffers for data along with meta or control information to further optimize and improve locality of reference. In other words, this information is used to help with cache hits, cache use, and cache effectiveness vs. simply boosting cache use.
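    Cache effectiveness is usually judged by hit ratio. The following Python sketch, a simple least-recently-used (LRU) read cache in front of a hypothetical slow device, shows how the same cache size yields very different hit ratios depending on the locality of the access pattern. The LRU policy and the access patterns are assumptions for illustration; real devices use a variety of caching policies.

```python
from collections import OrderedDict


class LRUReadCache:
    def __init__(self, capacity_blocks: int, read_from_device):
        self.capacity = capacity_blocks
        self.read_from_device = read_from_device  # slow path taken on a miss
        self.blocks = OrderedDict()
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_from_device(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(pattern, capacity=100):
    cache = LRUReadCache(capacity, read_from_device=lambda b: b"block-%d" % b)
    for b in pattern:
        cache.read(b)
    return cache.hit_ratio()


hot = [i % 50 for i in range(10_000)]      # good locality: 50 hot blocks fit in cache
scan = [i % 1_000 for i in range(10_000)]  # poor locality: cyclic scan of 1,000 blocks
print(run(hot), run(scan))  # 0.995 0.0
```

    The scan case is the classic LRU worst case: each block is evicted just before it is needed again, so adding cache without understanding the access pattern may not help.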

    SSD to the rescue?

    What can you do to cut the impact of IOs?

    There are many steps one can take, starting with establishing baseline performance and availability metrics.

    The metrics that matter include IOPS, latency, bandwidth, and availability. Then, leverage those metrics to gain insight into your application’s performance.
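    Those metrics are related, which is one reason an IOPS number without context can mislead. Two standard relationships are shown in this small Python sketch: bandwidth equals IOPS times I/O size, and, by Little's Law, the average number of outstanding I/Os (queue depth) equals IOPS times average response time. The workload numbers are hypothetical.

```python
def bandwidth_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput implied by an IOPS rate at a given I/O size (1 MB = 1024 KB here)."""
    return iops * io_size_kb / 1024


def avg_outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average concurrency = arrival rate x average time in system."""
    return iops * latency_ms / 1000


# The same 10,000 IOPS tells very different stories depending on context:
print(bandwidth_mb_s(10_000, 4))    # 39.0625 MB/s with small 4 KB I/Os
print(bandwidth_mb_s(10_000, 256))  # 2500.0 MB/s with large 256 KB I/Os
print(avg_outstanding_ios(10_000, 0.5), avg_outstanding_ios(10_000, 20))  # 5.0 200.0
```

    Sustaining 10,000 IOPS at 20 ms response time requires far more concurrency than at 0.5 ms, which is why latency belongs next to any IOPS figure.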

    Understand that IOs are a fact of applications doing work (storing, retrieving, managing data) no matter whether systems are virtual, physical, or running up in the cloud. But it’s important to understand just what a bad IO is, along with its impact on performance. Try to identify those that are bad, and then find and fix the problem, either with software, application, or database changes. Perhaps you need to throw more software caching tools, hypervisors, or hardware at the problem. Hardware may include faster processors with more DRAM and faster internal busses.

    Leveraging local PCIe flash SSD cards for caching or as targets is another option.

    You may want to use storage systems or appliances that rely on intelligent caching and storage optimization capabilities to help with performance, availability, and capacity.

    Where to gain insight into your server storage I/O environment

    There are many tools that can be used to gain insight into your server storage I/O environment across cloud, virtual, software defined and legacy, as well as from different layers (e.g. applications, database, file systems, operating systems, hypervisors, server, storage, I/O networking). Many applications along with databases have either built-in or optional tools from their provider, third parties, or other sources that can give information about the work activity being done. Likewise, there are tools to dig deeper into the various data infrastructure layers to see what is happening, as shown in the following figures.

    Gaining application and operating system level performance insight via different tools

    Insight and awareness via operating system tools on Windows and Linux

    In the above example, Spotlight on Windows (SoW), which you can download for free from Dell here, along with Ubuntu utilities are shown. You could also use other tools to look at server storage I/O performance, including Windows Perfmon among others.
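    On Linux, tools such as iostat derive their numbers from /proc/diskstats. The Python sketch below shows the idea: take two samples a known interval apart and turn the counter deltas into read and write IOPS and throughput. It relies on the documented field layout (after major, minor and device name come reads completed, reads merged, sectors read, time reading, writes completed, writes merged, sectors written, and so on), with sectors always counted as 512 bytes. The device name and sample values in the test are hypothetical.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def parse_diskstats(text: str) -> dict:
    """Map device name -> (reads completed, sectors read, writes completed, sectors written)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:  # skip malformed or unexpected lines
            continue
        stats[parts[2]] = tuple(int(parts[i]) for i in (3, 5, 7, 9))
    return stats


def rates(before: str, after: str, interval_s: float, device: str) -> dict:
    """Turn two diskstats samples taken interval_s apart into per-second rates."""
    r0, sr0, w0, sw0 = parse_diskstats(before)[device]
    r1, sr1, w1, sw1 = parse_diskstats(after)[device]
    return {
        "read_iops": (r1 - r0) / interval_s,
        "write_iops": (w1 - w0) / interval_s,
        "read_MBps": (sr1 - sr0) * SECTOR_BYTES / interval_s / 1_048_576,
        "write_MBps": (sw1 - sw0) * SECTOR_BYTES / interval_s / 1_048_576,
    }


def sample(path: str = "/proc/diskstats") -> str:
    with open(path) as f:
        return f.read()


# On a Linux host (device name is an example):
#   import time
#   a = sample(); time.sleep(1); b = sample()
#   print(rates(a, b, 1.0, "sda"))
```

    Dividing sectors-per-second by IOPS also gives the average I/O size, one of the pieces of context that makes an IOPS number meaningful.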

    Hypervisor performance using VMware ESXi / vsphere built-in tools

    Using Visual ESXtop to dig deeper into virtual server storage I/O performance

    Gaining insight into virtual server storage I/O cache performance

    Wrap up and summary

    There are many approaches to address (e.g. find and fix) vs. simply move or mask data center and server storage I/O bottlenecks. Having insight into and awareness of how your environment and its applications behave is important for knowing where to focus resources. Also keep in mind that a bit of flash SSD or DRAM cache in the applicable place can go a long way, while a lot of cache will also cost you cash. Even if you can’t eliminate I/Os, look for ways to decrease their impact on your applications and systems.


    What This All Means

    Keep in mind: SSD including flash and DRAM among others are in your future, the question is where, when, with what, how much and whose technology or packaging.

    Ok, nuff said, for now.

    Gs



    Revisiting RAID data protection remains relevant resource links

    Revisiting RAID data protection remains relevant and resources


    Updated 2/10/2018

    RAID data protection remains relevant including erasure codes (EC), local reconstruction codes (LRC) among other technologies. If RAID were really not relevant anymore (e.g. actually dead), why do some people spend so much time trying to convince others that it is dead or to use a different RAID level or enhanced RAID or beyond raid with related advanced approaches?

    When you hear RAID, what comes to mind?

    A legacy monolithic storage system that supports narrow 4, 5 or 6 drive wide stripe sets, or a modern system supporting dozens of drives in a RAID group with different options?

    RAID means many things, likewise there are different implementations (hardware, software, systems, adapters, operating systems) with various functionality, some better than others.

    For example, which of the items in the following figure come to mind, or perhaps are new to your RAID vocabulary?

    RAID questions

    There are many variations of RAID storage, some for the enterprise, some for SMB, SOHO or consumer. Some have better performance than others; some have poor performance, for example by causing extra writes, which leads to the perception that all parity-based RAID does extra writes (some actually do write gathering and optimization).

    Some hardware and software implementations use write-back cache (WBC) that is mirrored or battery backed (BBU), along with the ability to group writes together in memory (cache) to do full-stripe writes. The result can be fewer back-end writes compared to other systems. Hence, not all RAID implementations in either hardware or software are the same. Likewise, just because a RAID definition shows a particular theoretical implementation approach does not mean all vendors have implemented it that way.
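    The difference write gathering can make is visible by counting back-end I/Os for a RAID 5 stripe. This Python sketch uses the textbook behavior, which is an assumption since, as noted above, vendors implement it differently, and it simplifies by treating each small write independently. A small write that updates one chunk needs a read-modify-write (read old data and old parity, then write new data and new parity), while a full-stripe write computes parity from the new data alone, so nothing is read first.

```python
def raid5_small_write_ios(chunk_writes: int) -> int:
    """Read-modify-write: for each chunk updated on its own, read old data and
    old parity, then write new data and new parity (2 reads + 2 writes)."""
    return 4 * chunk_writes


def raid5_full_stripe_write_ios(data_drives: int) -> int:
    """Full-stripe write: parity is computed from the new data held in cache,
    so there are no reads; write every data chunk plus one parity chunk."""
    return data_drives + 1


# A 4+1 RAID 5 group receiving four chunk-sized writes that together fill one stripe:
print(raid5_small_write_ios(4))        # 16 back-end I/Os if handled one at a time
print(raid5_full_stripe_write_ios(4))  # 5 back-end I/Os if gathered into a full stripe
```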

    RAID is not a replacement for backup; rather, it is part of an overall approach to providing data availability and accessibility.


    What’s the best RAID level? The one that meets YOUR needs

    There are different RAID levels and implementations (hardware, software, controller, storage system, operating system, adapter among others) for various environments (enterprise, SME, SMB, SOHO, consumer) supporting primary, secondary and tertiary (backup/data protection, archiving) storage.

    General RAID comparisons

    Thus one size or approach does not fit all solutions; likewise, RAID rules of thumb or guides need context. Context means that a RAID rule or guide for consumer, SOHO or SMB might be different for enterprise and vice versa, not to mention depending on the type of storage system, number of drives, drive type and capacity among other factors.

    General basic RAID comparisons

    Thus the best RAID level is the one that meets your specific needs in your environment. What is best for one environment and application may be different from what is applicable to your needs.

    Key points and RAID considerations include:

    · Not all RAID implementations are the same; some are very much alive and evolving while others are in need of a rest or rewrite. So it is often not the technology or techniques that are the problem, rather how they are implemented and then deployed.

    · It may not be RAID that is dead, rather the solution that uses it; hence, if you think a particular storage system, appliance, product or software is old and dead along with its RAID implementation, then just say that product or vendor’s solution is dead.

    · RAID can be implemented in hardware controllers, adapters or storage systems and appliances as well as via software and those have different features, capabilities or constraints.

    · Long or slow drive rebuilds are a reality with larger disk drives and parity-based approaches; however, you have options on how to balance performance, availability, capacity, and economics.

    · RAID can be single, dual or multiple parity or mirroring-based.

    · Erasure and other coding schemes leverage parity schemes and guess what umbrella parity schemes fall under.

    · RAID may not be cool, sexy or a fun topic and technology to talk about, however many trendy tools, solutions and services actually use some form or variation of RAID as part of their basic building blocks. This is an example of using new and old things in new ways to help each other do more without increasing complexity.

    ·  Even if you are not a fan of RAID and think it is old and dead, at least take a few minutes to learn more about what it is that you do not like to update your dead FUD.
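    Single parity is, at its core, an XOR across the data chunks in a stripe, and it is the basic idea that erasure codes generalize. The minimal Python sketch below (the chunk contents are arbitrary examples) shows how a lost chunk is rebuilt from the surviving chunks plus parity.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: List[bytes]) -> bytes:
    """Single (RAID 4/5 style) parity: XOR across all data chunks in a stripe."""
    return reduce(xor_bytes, chunks)


def rebuild(surviving: List[bytes], parity_chunk: bytes) -> bytes:
    """Recover the single missing data chunk by XORing parity with the survivors."""
    return reduce(xor_bytes, surviving, parity_chunk)


stripe = [b"AAAA", b"BBBB", b"CCCC"]  # three data chunks on three drives
p = parity(stripe)                     # stored on a fourth drive (or rotated, as in RAID 5)
lost = stripe.pop(1)                   # the drive holding b"BBBB" fails
assert rebuild(stripe, p) == lost      # A ^ C ^ (A ^ B ^ C) == B
```

    Single parity survives one failure per stripe. Dual parity (RAID 6) and erasure codes add further independent check chunks computed with Galois-field arithmetic rather than plain XOR, which this sketch does not show.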

    Wait, Isn’t RAID dead?

    There is some dead marketing that paints a broad picture that RAID is dead to prop up something new, which in some cases may be a derivative variation of parity RAID.

    Data dispersal and durability

    RAID continues to evolve with rapid rebuilds for some systems

    On the other hand, there are some specific products, technologies and implementations that may be end of life or actually dead. Likewise, what might be dead, dying or simply not in vogue are specific RAID implementations or packaging. Certainly there is a lot of buzz around object storage, cloud storage, forward error correction (FEC) and erasure coding, including messages about how they eliminate the need for RAID. The catch is that some object storage solutions are overlaid on top of lower-level file systems that do things such as RAID 6, granted they are out of sight, out of mind.

    General RAID parity and erasure code/FEC comparisons

    Then there are advanced parity protection schemes, including FEC and erasure codes, that while not your traditional RAID levels, have characteristics including chunking or sharding data and spreading it out over multiple devices with multiple parity (or derivatives of parity) protection.
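    The trade-off among mirroring, parity RAID and wider erasure codes can be summarized with simple arithmetic. A scheme with k data chunks and m protection chunks stores (k + m)/k times the usable data and survives the loss of any m chunks, assuming a code that can rebuild from any k of the k + m chunks (as Reed-Solomon style codes can). The configurations in this Python sketch are illustrative examples only.

```python
from fractions import Fraction


def protection(k: int, m: int) -> dict:
    """k data chunks + m protection chunks (mirroring is k=1, m=copies-1)."""
    return {
        "raw_per_usable": Fraction(k + m, k),  # capacity overhead
        "efficiency": Fraction(k, k + m),      # usable / raw
        "failures_tolerated": m,               # assumes any-k-of-(k+m) recovery
    }


schemes = {
    "2-way mirror": (1, 1),
    "3-way mirror": (1, 2),
    "RAID 5 (4+1)": (4, 1),
    "RAID 6 (8+2)": (8, 2),
    "EC 10+4": (10, 4),
}
for name, (k, m) in schemes.items():
    p = protection(k, m)
    print(f"{name:14} {float(p['raw_per_usable']):.2f}x raw, "
          f"tolerates {p['failures_tolerated']} failure(s)")
```

    Wider codes lower capacity overhead relative to mirroring, but rebuilding a lost chunk generally means reading k surviving chunks, which ties back to the rebuild time considerations above.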

    Bottom line is that for some environments, different RAID levels may be more applicable and alive than for others.

    Via BizTech – How to Turn Storage Networks into Better Performers

    • Maintain Situational Awareness
    • Design for Performance and Availability
    • Determine Networked Server and Storage Patterns
    • Make Use of Applicable Technologies and Techniques

    If RAID is alive, what to do with it?

    If you are new to RAID, learn more about its past, present and future, keeping context in mind. Keeping context in mind means that there are different RAID levels and implementations for various environments. Not all RAID 0, 1, 1/0, 10, 2, 3, 4, 5, 6 or other variations (past, present and emerging) are the same for consumer vs. SOHO vs. SMB vs. SME vs. enterprise, nor are the usage cases. Some need performance for reads, others for writes, and some need high capacity with low performance, using hardware or software. RAID rules of thumb are ok and useful, however keep them in context with what you are doing as well as using.

    What to do next?

    Take some time to learn, and ask questions including what to use when, where, why and how, as well as whether an approach or recommendation is applicable to your needs. Check out the following links to read some extra perspectives about RAID, and keep in mind that what might apply to enterprise may not be relevant for consumer or SMB and vice versa.

    Some advise needed on SSD’s and Raid (Via Spiceworks)
    RAID 5 URE Rebuild Means The Sky Is Falling (Via BenchmarkReview)
    Double drive failures in a RAID-10 configuration (Via SearchStorage)
    Industry Trends and Perspectives: RAID Rebuild Rates (Via StorageIOblog)
    RAID, IOPS and IO observations (Via StorageIOBlog)
    RAID Relevance Revisited (Via StorageIOBlog)
    HDDs Are Still Spinning (Rust Never Sleeps) (Via InfoStor)
    When and Where to Use NAND Flash SSD for Virtual Servers (Via TheVirtualizationPractice)
    What’s the best way to learn about RAID storage? (Via Spiceworks)
    Design considerations for the host local FVP architecture (Via Frank Denneman)
    Some basic RAID fundamentals and definitions (Via SearchStorage)
    Can RAID extend nand flash SSD life? (Via StorageIOBlog)
    I/O Performance Issues and Impacts on Time-Sensitive Applications (Via CMG)
    The original RAID white paper (PDF) by Patterson, Gibson and Katz which, while over 20 years old, provides a basis, foundation and some history
    Storage Interview Series (Via Infortrend)
    Different RAID methods (Via RAID Recovery Guide)
    A good RAID tutorial (Via TheGeekStuff)
    Basics of RAID explained (Via ZDNet)
    RAID and IOPs (Via VMware Communities)

    Where To Learn More

    View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

    Additional learning experiences, along with common questions (and answers) as well as tips, can be found in the Software Defined Data Infrastructure Essentials book.


    What This All Means

    What is my favorite or preferred RAID level?

    That depends: for some things it's RAID 1, for others RAID 10, yet for others RAID 4, 5, 6 or DP, and still other situations could be a fit for RAID 0 or erasure codes and FEC. Instead of focusing on just one or two RAID levels as the solution for different problems, I prefer to look at the environment (consumer, SOHO, small or large SMB, SME, enterprise), type of usage (primary, secondary or data protection), performance characteristics (reads, writes), type and number of drives, among other factors. What is a fit for one environment may not be a fit for others; thus my preferred RAID level, along with where it is implemented, is the one that meets the needs of the given situation. However, also keep in mind tying RAID into an overall data protection strategy; remember, RAID is not a replacement for backup.
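    To make the capacity and resiliency side of that trade-off a bit more concrete, here is a simplified Python sketch of usable capacity and guaranteed drive-failure tolerance for several of the RAID levels mentioned, plus a generic k+m erasure code. It assumes equal-size drives, ignores hot spares, metadata overhead and rebuild behavior, treats RAID DP like RAID 6 for capacity purposes, and uses assumed typical minimum drive counts; the names raid_layout, erasure_code_layout and Layout are hypothetical. It deliberately says nothing about performance, usage or environment, which the answer above points out matter just as much.

```python
from dataclasses import dataclass

# Typical minimum drive counts (assumed here; some products differ).
MIN_DRIVES = {"RAID0": 2, "RAID1": 2, "RAID10": 4, "RAID4": 3,
              "RAID5": 3, "RAID6": 4, "RAIDDP": 4}


@dataclass(frozen=True)
class Layout:
    usable_tb: float          # capacity left for data after mirroring/parity
    failures_tolerated: int   # drive failures survivable in the worst case


def raid_layout(level: str, drives: int, drive_tb: float) -> Layout:
    """Simplified capacity and fault tolerance for equal-size drives (no spares)."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "RAID0":                 # striping only, no redundancy
        return Layout(drives * drive_tb, 0)
    if level == "RAID1":                 # n-way mirror: one drive's worth of capacity
        return Layout(drive_tb, drives - 1)
    if level == "RAID10":                # stripe across two-way mirror pairs
        if drives % 2:
            raise ValueError("RAID10 needs an even number of drives")
        # Worst case is losing both drives of one mirror pair, so only 1 failure is guaranteed.
        return Layout(drives // 2 * drive_tb, 1)
    if level in ("RAID4", "RAID5"):      # one drive's worth of parity
        return Layout((drives - 1) * drive_tb, 1)
    # RAID6 and RAIDDP (treated alike here): two drives' worth of parity
    return Layout((drives - 2) * drive_tb, 2)


def erasure_code_layout(k: int, m: int, drive_tb: float) -> Layout:
    """Generic k data + m parity shards, one shard per drive."""
    return Layout(k * drive_tb, m)


if __name__ == "__main__":
    # Hypothetical example: six 4 TB drives.
    for level in ("RAID0", "RAID10", "RAID5", "RAID6"):
        print(level, raid_layout(level, 6, 4.0))
    print("EC 4+2", erasure_code_layout(4, 2, 4.0))
```

    For six 4 TB drives this works out to 24 TB with no protection for RAID 0, 12 TB for RAID 10, 20 TB for RAID 5, and 16 TB for either RAID 6 or a 4+2 erasure code; which of those is "best" still depends on the workload and environment.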


    Like other technologies that have been declared dead for years or decades, aka Zombie technologies (i.e., dead yet still alive), RAID continues to be used while the technology evolves. Specific products, implementations and even RAID levels have faded away or are declining in some environments, yet remain alive in others. RAID and its variations are still alive; however, how RAID is used or deployed in conjunction with other technologies is also evolving.

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com. Any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    SNIA announces Cloud Data Management Interface (CDMI) V1.1

    In case you missed it, the Storage Networking Industry Association (SNIA) recently released version 1.1 of its Cloud Data Management Interface (CDMI) specification.

    Highlights of CDMI version 1.1 include:

  • New functionality to ease CDMI implementation with other cloud APIs (e.g. AWS S3, OpenStack Swift, etc.)
  • Expanded cloud data services, along with backward compatibility with earlier versions, among other enhancements.
  • Check out the full specification here.

    Speaking of SNIA and CDMI, check out this podcast post on CDMI, a conversation with Wayne Adams and David Dale of SNIA.

    Ok, nuff said

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio
