Congratulations to IBM for releasing XIV SPC results

Over the past several years I have done an annual post about IBM and their XIV storage system and this is the fourth in what has become a series. You can read the first one here, the second one here, and last year's here and here following the announcement of the IBM V7000.

IBM XIV Gen3
IBM recently announced the generation 3 or Gen3 version of XIV and, for the first time, released public performance comparison benchmarks using the Storage Performance Council (SPC) throughput-centric SPC2 workload.

The XIV Gen3 is positioned by IBM as having up to four (4) times the performance of earlier generations of the storage system. In terms of speeds and feeds, the Gen3 XIV supports up to 180 2TB SAS hard disk drives (HDD) providing up to 161TB of usable storage capacity. For connectivity, the Gen3 XIV supports up to 24 8Gb Fibre Channel (8GFC) ports or, for iSCSI, 22 1Gb Ethernet (1GbE) ports, with a total of up to 360GBytes of system cache. In addition to the large cache to boost performance, other enhancements include leveraging multi core processors along with an internal InfiniBand network to connect nodes, replacing the former 1GbE interconnect. Note that InfiniBand is only used to interconnect the various nodes in the XIV cluster and is not used for attachment to application servers, which is handled via iSCSI and Fibre Channel.

IBM and SPC storage performance history
IBM has a strong history, if not an industry-leading one, of benchmarking and workload simulation of their storage systems, including Storage Performance Council (SPC) results among others. The exception for IBM over the past couple of years has been the lack of SPC benchmarks for XIV. Last year when IBM released their new V7000 storage system, benchmarks including SPC were available close to if not at the product launch. I have in the past commented about IBM's lack of SPC benchmarks for XIV to confirm their marketing claims given their history of publishing results for all of their other storage systems. Now that IBM has recently released SPC2 results for the XIV, it is only fitting that I compliment them for doing so.

Benchmark brouhaha
Performance workload simulation results can often lead to apples and oranges comparisons, benchmark brouhaha battles or storage performance games. For example, a few years back NetApp submitted an SPC performance result on behalf of their competitor EMC. Now to be clear on something, I'm not saying that SPC is the best or definitive benchmark or comparison tool for storage or other purposes, as it is not. However it is representative, and most storage vendors have released some SPC results for their storage systems in addition to TPC and Microsoft ESRP among others. SPC2 is focused on streaming such as video, backup or other throughput centric applications, where SPC1 is centered around IOPS or transactional activity. The metrics for SPC2 are Megabytes per second (MBps) for large file processing (LFP), large database query (LDQ) and video on demand delivery (VOD) for a given price and protection level.

What is the best benchmark?
Simple: your own application running as close to your actual workload activity as possible. If that is not possible, then a simulation or workload benchmark that most closely resembles your needs.

Does this mean that XIV is still relevant?
Yes

Does this mean that XIV G3 should be used for every environment?
Generally speaking no. However its performance enhancements should allow it to be considered for more applications than in the past. Plus with the public comparisons now available, that should help to silence questions (including those from me) about what the systems can really do vs. marketing claims.

How does XIV compare to some other IBM storage systems using SPC2 comparisons?

System   | SPC2 MBps | Cost per SPC2 MBps | Storage GBytes | Price as tested | Discount | Protection
DS5300   | 5,634.17  | $74.13             | 16,383         | $417,648        | 0%       | RAID 5
V7000    | 3,132.87  | $71.32             | 29,914         | $223,422        | 38-39%   | RAID 5
XIV G3   | 7,467.99  | $152.34            | 154,619        | $1,137,641      | 63-64%   | Mirror
DS8800   | 9,705.74  | $270.38            | 71,537         | $2,624,257      | 40-50%   | RAID 5

In the above comparisons, the DS5300 (NetApp/Engenio based) is a dual controller system (4GB of cache per controller) with 128 x 146.8GB 15K HDDs configured as RAID 5 with no discount applied to the price submitted. The V7000 system, which is based on the IBM SVC along with other enhancements, consists of dual controllers each with 8GB of cache and 120 x 10K 300GB HDDs configured as RAID 5 with just under a 40% discount off list price for the system tested. For the XIV Gen3 system tested, the discount off list price for the submission is about 63%, with 15 nodes, a total of 360GB of cache and 180 x 2TB 7.2K SAS HDDs configured as mirrors. The DS8800 system with dual controllers has 256GB of cache and 768 x 146GB 15K HDDs configured in RAID 5 with a discount between 40 to 50% off of list.
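To see how discounting changes the comparison, here is a minimal sketch (Python) that recomputes cost per SPC2 MBps as submitted and at an estimated list price. The discount midpoints are my assumptions taken from the ranges in the table; it is illustrative only and the full disclosure reports remain the authoritative source.

```python
# Minimal sketch: recompute cost per SPC2 MBps from the table above and
# estimate what the numbers look like at list price (before discount).
# The discount midpoints are assumptions based on the ranges submitted.

systems = {
    # name: (SPC2 MBps, price as tested USD, assumed discount midpoint)
    "DS5300": (5634.17, 417648, 0.00),
    "V7000":  (3132.87, 223422, 0.385),
    "XIV G3": (7467.99, 1137641, 0.635),
    "DS8800": (9705.74, 2624257, 0.45),
}

for name, (mbps, price, discount) in systems.items():
    cost_per_mbps = price / mbps                # as-submitted metric
    est_list_price = price / (1.0 - discount)   # rough list-price estimate
    list_cost_per_mbps = est_list_price / mbps
    print(f"{name:8s} ${cost_per_mbps:7.2f}/MBps as tested, "
          f"~${list_cost_per_mbps:7.2f}/MBps at estimated list")
```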

What the various metrics do not show is the benefit of various features and functionality, which should be weighed against your particular needs. Likewise, if your applications are not centered around bandwidth or throughput, then the above performance comparisons would not be relevant. Also note that the systems above have various discounted prices as submitted, which can be a hint to a smart shopper as to where to begin negotiations. You can also do some analysis of the various systems based on their performance, configuration, physical footprint, functionality and cost; the links below take you to the complete reports with more information.

DS8800 SPC2 executive summary and full disclosure report

XIV SPC2 executive summary and full disclosure report

DS5300 SPC2 executive summary and full disclosure report

V7000 SPC2 executive summary and full disclosure report

Bottom line, benchmarks and performance comparisons are just that, comparisons that may or may not be relevant to your particular needs. Consequently they should be used as a tool combined with other information to see how a particular solution might be a fit for your specific needs. The best benchmark, however, is your own application running as close to a realistic workload as possible to get a representative perspective of a system's capabilities.

Ok, nuff said
Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

StorageIO Momentus Hybrid Hard Disk Drive (HHDD) Moments

This is the third in a series of posts that I have done about Hybrid Hard Disk Drives (HHDDs) along with pieces about Hard Disk Drives (HDD) and Solid State Devices (SSDs). Granted the HDD received its AARP card several years ago when it turned 50 and is routinely declared dead (or read here), even though it continues to evolve along with SSD maturing, both expanding into different markets as well as usage roles.

For those who have not read previous posts about Hybrid Hard Disk Drives (HHDDs) and the Seagate Momentus XT you can find them here and here.

Since my last post, I have been using the HHDDs extensively and recently installed the latest firmware. The release of new HHDD firmware by Seagate for the Momentus XT (SD25), like its predecessor SD24, cleaned up some annoyances and improved overall stability. Here is a Seagate post by Mark Wojtasiak discussing SD25 and feedback obtained via the Momentus XT forum from customers.

If you have never done a HDD firmware update, it's not as bad or intimidating as might be expected. The Seagate firmware update tools make it very easy, that is, assuming you have a recent good backup of your data (one that can be restored) and about 10 to 15 minutes of time for a couple of reboots.

Speaking of stability, the Momentus XT HHDDs have been performing well helping to speed up accessing large documents on various projects including those for my new book. Granted an SSD would be faster across the board, however the large capacity at the price point of the HHDD is what makes it a hybrid value proposition. As I have said in previous posts, if you have the need for speed all of the time and time is money, get an SSD. Likewise if you need as much capacity as you can get and performance is not your primary objective, then leverage the high capacity HDDs. On the other hand, if you need a balance of some performance boost with capacity boost and a good value, then check out the HHDDs.

Image of Momentus XT courtesy of www.Seagate.com

Let's shift gears from the product or technology to common questions that I get asked about HHDDs.

Common questions I get asked about HHDDs include:

What is a Hybrid Hard Disk Drive?

A Hybrid Hard Disk Drive includes a combination of a rotating HDD, solid state flash persistent memory and volatile dynamic random access memory (DRAM) in an integrated package or product. The value proposition and benefit is a balance of performance and capacity at a good price for those environments, systems or applications that do not need all-SSD performance (and cost) but do need some performance boost in addition to large capacity.

How does the Seagate Momentus XT differ from other hybrid disks?
One approach is to take a traditional HDD and pair it with an SSD using a controller, packaged in various ways. For example, on a large scale, HDDs and SSDs coexist in the same tiered storage system, managed by the controllers, storage processors or nodes in the solution, including automated tiering and cache promotion or demotion. The main difference between other storage systems' tiering and pairing and HHDDs is that in the case of the Momentus XT, the HDD, SLC flash (SSD functionality), RAM cache and their management are all integrated within the disk drive enclosure.

Do I use SSDs and HDDs or just HHDDs?
I have HHDDs installed internally in my laptops. I also have HDDs installed in servers, NAS and disk to disk (D2D) backup devices and Digital Video Recorders (DVRs), along with external SSD and Removable Hard Disk Drives (RHDDs). The RHDDs are used for archive and master or gold copy data protection that go offsite, complementing how I also use cloud backup services as part of my data protection strategy.

What are the technical specifications of a HHDD such as the Seagate Momentus XT?
3Gbps SATA interface, 2.5 inch 500GB 7,200 RPM HDD with 32MB RAM cache and integrated 4GByte SLC flash, all managed via an internal drive processor. Power consumption varies depending on what the device is doing such as initial power up, idle, normal or other operating modes. You can view the Seagate Momentus XT 500GB (ST95005620AS, which is what I have) specifications here as well as the product manual here.


One of my HHDDs on a note pad (paper) and other accessories

Do you need a special controller or management software?
Generally speaking no; the HHDDs that I have been using plugged and played into my existing laptop's internal bay, replacing the HDD that came with those systems. No extra software was needed for Windows, and no data movement or migration tools were needed other than when initially copying from the source HDD to the new HHDD. The HHDDs do their own caching, read ahead and write behind independent of the operating system or controller. Now the reason I say generally speaking is that, like many devices, some operating systems or controllers may be able to leverage advanced features, so check your particular system capabilities.

How come the storage system vendors are not talking about these HHDDs?
Good question, which I assume has a lot to do with the investment (people, time, engineering, money and marketing) that they have made or are making in controller and storage system software functionality to effectively create hybrid tiered storage systems using SSDs and HDDs on different scales. There have been some packaged HHDD systems or solutions brought to market by different vendors that combine HDD and SSD into a single physical package glued together with some software and controllers or processors to appear as a single system. I would not be surprised to see discrete HHDDs (where the HDD, flash SSD and RAM are all one integrated product) appear in lower end NAS or multifunction storage systems as well as in backup, dedupe or other systems that require large amounts of capacity along with a performance boost now and then.

Why do I think this? Simple: say you have five HHDDs, each with 500GB of capacity, configured as a RAID5 set resulting in 2TBytes of usable capacity. Using the Momentus XT as a hypothetical example, that also yields 5 x 4GByte or 20GByte of flash cache, which helps accelerate write operations during data dumps, backups or other updates (see the sketch below). Granted that is an overly simplified example and storage systems can be found with hundreds of GBytes of cache, however think in terms of value or low cost, balancing performance and capacity against cost for different usage scenarios. For example, applications such as bulk or scale out file and object storage including cloud or big data, entertainment, server (Citrix/Xen, Microsoft/HyperV, VMware/vSphere) and desktop virtualization or VDI, disk to disk (D2D) backup, and business analytics among others. The common tenet of those applications and usage scenarios is a combination of I/O and storage consolidation in a cost effective manner addressing the continuing storage capacity to I/O performance gap.
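Here is a minimal sketch of that arithmetic, assuming the hypothetical five-drive example above; drive capacity, flash size and RAID level are the only inputs and nothing vendor specific is implied.

```python
# Minimal sketch of the hypothetical five-HHDD RAID5 example above.
# Inputs are assumptions taken from the text, not vendor specifications.

drives = 5
capacity_per_drive_gb = 500   # HDD capacity per HHDD
flash_per_drive_gb = 4        # integrated SLC flash per HHDD

# RAID5 usable capacity is (N - 1) drives worth; one drive's worth goes to parity.
usable_gb = (drives - 1) * capacity_per_drive_gb
aggregate_flash_gb = drives * flash_per_drive_gb

print(f"Usable RAID5 capacity: {usable_gb} GB (~{usable_gb/1000:.1f} TB)")
print(f"Aggregate flash cache: {aggregate_flash_gb} GB across {drives} drives")
```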

Data Center and I/O Bottlenecks

Storage and I/O performance gap

Do you have to backup HHDDs?
Yes, just as you would want to back up or protect any SSD or HDD device or system.

How does data get moved between the SSD and the HDD?
Other than the initial data migration from the old HDD (or SSD) to the HHDD, unless you are starting with a new system, once your data and applications exist on the HHDD, the internal processor of the device automatically manages the RAM, flash and HDD activity. Unlike in a tiered storage system where data blocks or files may be moved between different types of storage devices, inside the HHDD all data gets written to the HDD, while the flash and RAM are used as buffers for caching depending on activity needs. If you have sat through or listened to a NetApp or HDS discussion on using cache for tiering, what the HHDDs do is similar in concept, however on a smaller scale at the device level, potentially even in a complementary mode in the future. Other functions performed inside the HHDD by its processor include reading and writing, managing the caches, bad block replacement or re-vectoring on the HDD, wear leveling of the SLC flash and other routine tasks such as integrity checks and diagnostics. Unlike paired storage solutions where data gets moved between tiers or types of devices, once data is stored on the HHDD it is managed by the device, similar to how an SSD or HDD would move blocks of data to and from the specific media along with leveraging RAM cache as a buffer.
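To illustrate the caching concept, here is a minimal, purely hypothetical sketch of how a hybrid drive might promote frequently read blocks into a small flash cache while all writes land on the HDD. This is not Seagate's actual algorithm (which is not published); it is just a toy model of the behavior described above.

```python
# Hypothetical illustration of hybrid-drive read caching; this is NOT
# Seagate's actual algorithm, just a toy model of the concept described above.
from collections import OrderedDict

class ToyHybridDrive:
    def __init__(self, flash_blocks=4):
        self.hdd = {}                       # block_id -> data (backing store)
        self.flash = OrderedDict()          # small LRU-style read cache
        self.flash_blocks = flash_blocks

    def write(self, block_id, data):
        # All writes land on the HDD; keep any cached copy consistent.
        self.hdd[block_id] = data
        if block_id in self.flash:
            self.flash[block_id] = data

    def read(self, block_id):
        if block_id in self.flash:          # flash hit: fast path
            self.flash.move_to_end(block_id)
            return self.flash[block_id]
        data = self.hdd[block_id]           # flash miss: read from HDD
        self.flash[block_id] = data         # promote into flash (LRU evict)
        if len(self.flash) > self.flash_blocks:
            self.flash.popitem(last=False)
        return data

drive = ToyHybridDrive()
drive.write(1, "chapter-draft")
drive.read(1)   # first read comes from the HDD and is promoted
drive.read(1)   # repeat reads are served from flash
```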

Where is the controller that manages the SSD and HDD?
The HHDD itself is the controller, per se, in that the internal processor that manages the HDD also directly accesses the RAM and flash.

What type of flash is used and will it wear out?
The XT uses SLC (single level cell) flash, which with wear leveling has a good duty cycle (life span) and is what is typically found in higher end flash SSD solutions vs. lower cost MLC (multi level cell) flash.

Have I lost any data from it yet?
No, at least nothing that was not my own fault from saving the wrong file in the wrong place and having to recover from one of my recent D2D copies or the cloud. Oh, regarding what I have done with the HDDs that were replaced by the HHDDs: they are now an extra gold master backup copy as of a particular point in time and are being kept in a safe secure facility, encrypted of course.

Have you noticed a performance improvement?
Yes. Performance will vary, however in many cases I have seen performance comparable to SSD on both reads and writes as long as the HDD keeps up with the flash and RAM cache. Even as larger amounts of data are written, I have seen better performance compared to HDDs. The caveat however is that initially you may see little to marginal performance improvement; over time, particularly on the same files, performance tends to improve. Working on large tens to hundreds of MByte size documents, I noticed good performance when doing saves compared to working with them on a HDD.

What do the HHDDs cost?
Amazon.com has the 500GB model for about $100 which is about $40 to $50 less than when I bought my most recent one last fall. I have heard from other people that you can find them at even lower prices at other venues. In the theme of disclosures, I bought one of my HHDDs from Amazon and Seagate gave me one to test.

Will I buy more HHDDs or switch to SSDs?
Where applicable I will add SSDs as well as HDDs, however where possible and practical, I will also add HHDDs perhaps even replacing the HDDs in my NAS system with HHDDs at some time or maybe trying them in a DVR.

What is the down side to the HHDDs?
I'm generating and saving more data on the devices at a faster rate, which means that when I installed them I wondered if I would ever fill up a 500GB drive. I still have hundreds of GBytes free or available for use, however I am also able to carry more reference data or information than in the past. In addition to more reference data including videos, audio, images, slide decks and other content, I have also been able to keep more versions or copies of documents, which has been handy on the book project. Data that changes gets backed up D2D as well as to my cloud provider, including while traveling. Leveraging compression and dedupe, given that many chapters or other content are similar, not as much data actually gets transmitted when doing cloud backups, which has been handy when doing a backup from an airplane flying over the clouds. A wish I have for the XT type of HHDD is for vendors such as Seagate to add Self Encrypting Disk (SED) capabilities along with applying continued intelligent power management (IPM) enhancements.

Why do I like the HHDD?
Simple: it solves both business and technology challenges while being an enabler. It gives me a balance of performance for productivity and capacity in a cost effective manner while being transparent to the systems it works with.

Here are some related links to additional material:
Data Center I/O Bottlenecks Performance Issues and Impacts
Has SSD put Hard Disk Drives (HDDs) On Endangered Species List?
Seagate Momentus XT SD 25 firmware
Seagate Momentus XT SD25 firmware update coming this week
A Storage I/O Momentus Moment
Another StorageIO Hybrid Momentus Moment
As the Hard Disk Drive (HDD) continues to spin
Funeral for a Friend
Seagate Momentus XT product specifications
Seagate Momentus XT product manual
Technology Tiering, Servers Storage and Snow Removal
Self Encrypting Disks (SEDs)

Ok, nuff said for now

Cheers Gs

Greg Schulz – Author The Green and Virtual Data Center (CRC), Resilient Storage Networks (Elsevier) and coming summer 2011 Cloud and Virtual Data Storage Networking (CRC)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

As the Hard Disk Drive HDD continues to spin


Updated 2/10/2018

Despite having been repeatedly declared dead at the hands of some new emerging technology over the past several decades, the Hard Disk Drive (HDD) continues to spin and evolve as it moves towards its 60th birthday.

More recently HDDs have been declared dead due to flash SSD which, according to some predictions, should have caused the HDD to be extinct by now.

Meanwhile, having not yet died, in addition to having qualified for its AARP membership a few years ago, the HDD continues to evolve in capacity, smaller form factors, performance, reliability and density along with cost improvements.

Back in 2006 I did an article titled Happy 50th, hard drive, but will you make it to 60?

IMHO it is safe to say that the HDD will be around for at least a few more years if not another decade (or more).

This is not to say that the HDD has outlived its usefulness or that there are not other tiered storage mediums to do specific jobs or tasks better (there are).

Instead, the HDD continues to evolve and is complemented by flash SSD in a way similar to how HDDs are complementing magnetic tape (another declared dead technology), each finding new roles to support more data being stored for longer periods of time.

After all, there is no such thing as a data or information recession!

The importance of this is technology tiering and resource alignment, matching the applicable technology to the task at hand.

Technology tiering (Servers, storage, networking, snow removal) is about aligning the applicable resource that is best suited to a particular need in a cost as well as productive manner. The HDD remains a viable tiered storage medium that continues to evolve while taking on new roles coexisting with SSD and tape along with cloud resources. These and other technologies have their place which ideally is finding or expanding into new markets instead of simply trying to cannibalize each other for market share.

Here is a link to a good story by Lucas Mearian on the history or evolution of the hard disk drive (HDD), including how a 1TB device that costs about $60 today would have cost about a trillion dollars back in the 1950s. FWIW, IMHO the 1 trillion dollars is low and should be more like 2 to 5 trillion for the one TByte if you apply common costs for management, people, care and feeding, power, cooling, backup, BC, DR and other functions.

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

IMHO, it is safe to say that the HDD is here to stay for at least a few more years (if not decades) or at least until someone decides to try a new creative marketing approach by declaring it dead (again).

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

What is DFR or Data Footprint Reduction?


Updated 10/9/2018

What is DFR or Data Footprint Reduction?

Data Footprint Reduction (DFR) is a collection of techniques, technologies, tools and best practices that are used to address data growth management challenges. Dedupe is currently the industry darling for DFR particularly in the scope or context of backup or other repetitive data.

However DFR has a broader scope, addressing expanding data footprints and their impact across primary, secondary and offline data that ranges from high performance to inactive high capacity.

Consequently the focus of DFR is not just on reduction ratios; it is also about meeting time or performance rates and data protection windows.

This means DFR is about using the right tool for the task at hand to effectively meet business needs, and cost objectives while meeting service requirements across all applications.

Examples of DFR technologies include Archiving, Compression, Dedupe, Data Management and Thin Provisioning among others.

Read more about DFR in Part I and Part II of a two part series found here and here.

Where to learn more

Learn more about data footprint reduction (DFR), data footprint overhead and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

That is all for now; I hope you find this ongoing series of current or emerging Industry Trends and Perspectives posts of interest.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

SSD and Storage System Performance

Jacob Gsoedl has a new article over at SearchStorage titled How to add solid-state storage to your enterprise data storage systems.

In his article which includes some commentary by me, Jacob lays out various options on where and how to deploy solid state devices (SSD) in and with enterprise storage systems.

While many vendors have jumped on the latest SSD bandwagon adding flash based devices to storage systems, where and how they implement the technologies varies.

Some vendors take a simplistic approach of qualifying flash SSD devices for attachment to their storage controllers similar to how any other Fibre Channel, SAS or SATA hard disk drive (HDD) would be.

Yet others take a more in-depth approach, including optimizing controller software, firmware or microcode to leverage flash SSD devices along with addressing wear leveling and read and write performance among other capabilities.

Performance is another area where on paper a flash SSD device might appear to be fast and enable a storage system to be faster.

However, systems that are not optimized for higher throughput and/or increased IOPS needing lower latency may end up placing restrictions on the number of flash SSD devices or other configuration constraints. Even worse is when expected performance improvements are not realized; after all, fast controllers need fast devices, and fast devices need fast controllers.

RAM and flash based SSD are great enabling technologies for boosting performance, productivity and enabling a green, efficient environment, however do your homework.

Look at how various vendors implement and support SSD particularly flash based products with enhancements to storage controllers for optimal performance.

Likewise check out the activity of the SNIA Solid State Storage Initiative (SSSI) among other industry trade group or vendor initiatives around enhancements and best practices for SSD.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

How to win approval for upgrades: Link them to business benefits

Drew Robb has another good article over at Processor.com about various tips and strategies on how to gain approval for hardware (or software) purchases, with some comments by yours truly.

My tips and advice quoted in the story include linking technology resources to business needs and impact, which may be common sense, however it is still a time tested and effective technique.

Instead of speaking tech talk such as performance, capacity, availability, IOPS, bandwidth, GHz, frames or packets per second, VMs to PM or dedupe ratios, map them to business speak, that is, things that finance, accountants, MBAs or other management personnel understand.

For example, how many transactions at a given response time can be supported by a given type of server, storage or networking device.

Or, put a different way, with a given device, how much work can be done and what is the associated monetary or business benefit.
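As a minimal sketch of that translation, you can express a device in business terms by dividing its cost over the work it does within a response time target. All of the numbers below are hypothetical placeholders for illustration, not figures from the article.

```python
# Hypothetical example of translating tech specs into business terms.
# All numbers are made-up placeholders for illustration only.

device_cost = 50_000.00          # purchase price of a server/storage device (USD)
useful_life_years = 3
transactions_per_second = 2_000  # sustained rate at the required response time
seconds_per_year = 365 * 24 * 3600

transactions_over_life = transactions_per_second * seconds_per_year * useful_life_years
cost_per_million_txn = device_cost / (transactions_over_life / 1_000_000)

print(f"Roughly ${cost_per_million_txn:.4f} per million transactions "
      f"over a {useful_life_years}-year life")
```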

Likewise, if you do not have a capacity plan for servers, storage, I/O and networking along with software and facilities covering performance, availability, capacity and energy demands now is the time to put one in place.

More on capacity and performance planning later; however for now, if you want to learn more, check Chapter 10 (Performance and Capacity Planning) in my book Resilient Storage Networks: Designing Flexible and Scalable Data Infrastructures (Elsevier).

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Optimize Data Storage for Performance and Capacity Efficiency

This post builds on a recent article I did that can be read here.

Even with tough economic times, there is no such thing as a data recession! Thus the importance of optimizing data storage efficiency, addressing both performance and capacity without impacting availability, in a cost effective way to do more with what you have.

What this means is that even though budgets are tight or have been cut resulting in reduced spending, overall net storage capacity is up year over year by double digits if not higher in some environments.

Consequently, there is continued focus on stretching available IT and storage related resources or footprints further while eliminating barriers or constraints. IT footprint constraints can be physical in a cabinet or rack as well as floorspace, power or cooling thresholds and budget among others.

Constraints can be due to lack of performance (bandwidth, IOPS or transactions), poor response time or lack of availability for some environments. Yet for other environments, constraints can be lack of capacity, limited primary or standby power or cooling constraints. Other constraints include budget, staffing or lack of infrastructure resource management (IRM) tools and time for routine tasks.

Look before you leap
Before jumping into an optimization effort, gain insight if you do not already have it as to where the bottlenecks exist, along with the cause and effect of moving or reconfiguring storage resources. For example, boosting capacity use to more fully use storage resources can result in a performance issue or data center bottlenecks for other environments.

An alternative scenario is that in the quest to boost performance, storage is seen as being under-utilized, yet when capacity use is increased, lo and behold, response time deteriorates. The result can be a vicious cycle, hence the need to address the issue as opposed to moving problems, by using tools to gain insight on resource usage, both space and activity or performance.

Gaining insight means looking at capacity use along with performance and availability activity and how they use power, cooling and floor-space. Consequently an important step is to gain insight and knowledge of how your resources are being used to deliver various levels of service.

Tools include storage or system resource management (SRM) tools that report on storage space capacity usage, performance and availability with some tools now adding energy usage metrics along with storage or system resource analysis (SRA) tools.

Cooling Off
Power and cooling are commonly talked about as constraints, either from a cost standpoint, or availability of primary or secondary (e.g. standby) energy and cooling capacity to support growth. Electricity is essential for powering IT equipment including storage enabling devices to do their specific tasks of storing data, moving data, processing data or a combination of these attributes.

Thus, power gets consumed, some work or effort to move and store data takes place, and the byproduct is heat that needs to be removed. In a typical IT data center, cooling on average can account for about 50% of energy used, with some sites using less.

With cooling being a large consumer of electricity, a small percentage change to how cooling consumes energy can yield large results. Addressing cooling energy consumption can be to discuss budget or cost issues, or to enable cooling capacity to be freed up to support installation of extra storage or other IT equipment.

Keep in mind that effective cooling relies on removing heat from as close to the source as possible to avoid over cooling, which requires more energy. If you have not done so, have a facilities review or assessment performed; these can range from a quick walk around to a more in-depth review and thermal airflow analysis. Means of removing heat close to the source include techniques such as intelligent, precision or smart cooling, also known by other marketing names.

Powering Up, or, Powering Down
Speaking of energy or power, in addition to addressing cooling, there are a couple of ways of addressing power consumption by storage equipment (Figure 1). The most popular approach discussed towards efficiency is energy avoidance, involving powering down storage when not in use, such as first generation MAID, at the cost of performance.

For off-line storage, tape and other removable media give low-cost capacity per watt with low to no energy needed when not in use. Second generation (e.g. MAID 2.0) solutions with intelligent power management (IPM) capabilities have become more prevalent enabling performance or energy savings on a more granular or selective basis often as a standard feature in common storage systems.

Figure 1:  How various RAID levels and configuration impact or benefit footprint constraints

Another approach to energy efficiency, seen in Figure 1, is doing more work for active applications per watt of energy to boost productivity. This can be done by using the same amount of energy while doing more work, or doing the same amount of work with less energy.

For example, instead of using larger capacity disks to improve capacity per watt metrics, active or performance sensitive storage should be looked at on an activity basis such as IOPS, transactions, videos, emails or throughput per watt (see the sketch below). Hence, a fast disk drive doing work can be more energy-efficient in terms of productivity than a higher capacity slower disk drive for active workloads, while for idle or inactive data the inverse should hold true.
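Here is a minimal sketch of the two metrics side by side. The drive figures are rough, hypothetical numbers for a fast 15K enterprise drive and a high capacity 7.2K drive, used only to show how the comparison flips depending on whether activity or capacity is the goal.

```python
# Hypothetical drive figures to contrast activity-per-watt vs capacity-per-watt.
# The numbers are rough placeholders, not any vendor's specifications.

drives = {
    # name: (capacity_GB, IOPS_under_active_workload, watts_active)
    "fast 15K 450GB":       (450, 180, 15.0),
    "capacity 7.2K 2000GB": (2000, 80, 11.0),
}

for name, (capacity_gb, iops, watts) in drives.items():
    print(f"{name:22s} {iops / watts:6.1f} IOPS/watt   "
          f"{capacity_gb / watts:7.1f} GB/watt")

# For active workloads the fast drive wins on IOPS/watt; for idle or
# capacity-centric data the large drive wins on GB/watt.
```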

On a go forward basis, the trend already being seen with some servers and storage systems is to do more work while using less energy. Thus a larger gap between useful work (for active or non idle storage) and the amount of energy consumed yields a better efficiency rating, or take the inverse if that is your preference for smaller numbers.

Reducing Data Footprint Impact
Data footprint impact reduction tools or techniques for both on-line as well as off-line storage include archiving, data management, compression, deduplication, space-saving snapshots, thin provisioning along with different RAID levels among other approaches. From a storage access standpoint, you can also include bandwidth optimization, data replication optimization, protocol optimizers along with other network technologies including WAFS/WAAS/WADM to help improve efficiency of data movement or access.

Thin provisioning for capacity centric environments can be used to achieve a higher effective storage use level by essentially overbooking storage, similar to how airlines oversell seats on a flight. If you have good historical information and insight into how storage capacity is used and over allocated, thin provisioning enables improved effective storage use for some applications.

However, with thin provisioning, avoid introducing performance bottlenecks by leveraging solutions that work closely with tools that provide historical trending information (capacity and performance).
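As a minimal sketch of the overbooking idea (volume sizes, usage and the alert threshold below are hypothetical), the key numbers to watch are the overcommit ratio and how close actual written data is getting to the physical pool.

```python
# Hypothetical thin-provisioning sketch: allocated (promised) capacity can
# exceed the physical pool, so track the overcommit ratio and real usage.

physical_pool_tb = 100.0
volumes_allocated_tb = [40.0, 60.0, 50.0, 30.0]   # sizes presented to hosts
actually_written_tb = [12.0, 25.0, 10.0, 8.0]     # data hosts have written

allocated = sum(volumes_allocated_tb)
written = sum(actually_written_tb)

overcommit_ratio = allocated / physical_pool_tb
pool_used_pct = 100.0 * written / physical_pool_tb

print(f"Allocated {allocated:.0f} TB against a {physical_pool_tb:.0f} TB pool "
      f"(overcommit {overcommit_ratio:.1f}:1)")
print(f"Pool actually used: {pool_used_pct:.0f}%")
if pool_used_pct > 80:
    print("Warning: nearing physical capacity; add capacity or migrate volumes")
```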

For a technology that some have tried to declare dead to prop up other new or emerging solutions, RAID remains relevant given its widespread deployment and the transparent reliance upon it in organizations of all sizes. RAID also plays a role in addressing storage performance, availability, capacity and energy constraints as well as being a relief tool.

The trick is to align the applicable RAID configuration to the task at hand meeting specific performance, availability, capacity or energy along with economic requirements. For some environments a one size fits all approach may be used while others may configure storage using different RAID levels along with number of drives in RAID sets to meet specific requirements.


Figure 2:  How various RAID levels and configuration impact or benefit footprint constraints

Figure 2 shows a summary and tradeoffs of various RAID levels. In addition to the RAID level, the number of disks can also have an impact on performance or capacity; for example, by creating a larger RAID 5 or RAID 6 group, the parity overhead can be spread out, however there is a tradeoff. Tradeoffs can be performance bottlenecks on writes or during drive rebuilds along with potential exposure to drive failures.

All of this comes back to a balancing act of aligning to your specific needs: some will go with a RAID 10 stripe and mirror to avoid risks, even going so far as to do triple mirroring along with replication. On the other hand, some will go with RAID 5 or RAID 6 to meet cost or availability requirements, or, as some I have talked with do, even run RAID 0 for data and applications that need raw speed yet can be restored rapidly from some other medium. The sketch below shows how the capacity overhead compares across these levels.
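Here is a minimal sketch of the capacity side of those tradeoffs; the drive count and size are arbitrary examples, and write penalties and rebuild exposure are not modeled.

```python
# Minimal sketch of usable capacity and protection overhead by RAID level
# for a given drive count and size. Rebuild and write-penalty behavior are
# not modeled; this only shows the capacity side of the tradeoff.

def usable_capacity(level, drives, drive_tb):
    raw = drives * drive_tb
    if level == "RAID 0":          # striping, no protection
        return raw
    if level == "RAID 5":          # one drive's worth of parity
        return (drives - 1) * drive_tb
    if level == "RAID 6":          # two drives' worth of parity
        return (drives - 2) * drive_tb
    if level == "RAID 10":         # mirrored pairs
        return raw / 2
    raise ValueError(f"unknown level {level}")

drives, drive_tb = 8, 2.0
for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10"):
    usable = usable_capacity(level, drives, drive_tb)
    overhead = 100 * (1 - usable / (drives * drive_tb))
    print(f"{level:7s} {usable:5.1f} TB usable, {overhead:4.0f}% capacity overhead")
```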

Let's bring it all together with an example
Figure 3 shows a generic example of a before and after optimization for a mixed workload environment; granted, you can increase or decrease the applicable capacity and performance to meet your specific needs. In Figure 3, the storage configuration consists of one storage system set up for high performance (left) and another for high-capacity secondary (right), disk to disk backup and other near-line needs; again, you can scale the approach up or down to your specific needs.

For the performance side (left), 192 x 146GB 15K RPM (28TB raw) disks provide good performance, however with low capacity use. This translates into a low capacity per watt value however with reasonable IOPs per watt and some performance hot spots.

On the capacity centric side (right), there are 192 x 1TB disks (192TB raw) with good space utilization, however some performance hot spots or bottlenecks, constrained growth, and low IOPS per watt with reasonable capacity per watt. In the before scenario, the joint power draw (both arrays) is about 15kW (15,000 watts), which translates to about $16,000 in annual energy costs (cooling excluded) assuming an energy cost of 12 cents per kWh.
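The annual energy cost figure above works out as follows; this is a minimal sketch using the stated 15kW draw and 12 cents per kWh, with cooling overhead excluded as in the example.

```python
# Annual energy cost for the "before" scenario described above.
power_kw = 15.0            # combined draw of both arrays
hours_per_year = 24 * 365
cost_per_kwh = 0.12        # USD per kWh, as assumed in the example

annual_kwh = power_kw * hours_per_year
annual_cost = annual_kwh * cost_per_kwh
print(f"{annual_kwh:,.0f} kWh/year -> about ${annual_cost:,.0f} per year (cooling excluded)")
```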

Note, your specific performance, availability, capacity and energy mileage will vary based on particular vendor solution, configuration along with your application characteristics.


Figure 3: Baseline before and after storage optimization (raw hardware) example

Building on the example in Figure 3, a combination of techniques along with technologies yields a net performance, capacity and perhaps feature functionality (depending on the specific solution) increase. In addition, floor-space, power, cooling and associated footprints are also reduced. For example, the resulting solution shown (middle) comprises 4 x 250GB flash SSD devices, along with 32 x 450GB 15.5K RPM and 124 x 2TB 7200RPM drives, enabling a 53TB (raw) capacity increase along with a performance boost.

The previous examples are based on raw or baseline capacity metrics, meaning that further optimization techniques should yield improved benefits. These examples should also help address the question or myth that it costs more to power storage than to buy it, to which the answer should be: it depends.

If you can buy the above solution for, say, under $50,000 (its cost to power), or let alone $100,000 (power and cooling) over three years, which would also be a good acquisition, then the claim that powering is more expensive than buying holds true. However, if a solution as described above costs more, then the story changes, along with other variables including energy costs for your particular location, reinforcing the notion that your mileage will vary.

Another tip is that more is not always better.

That is, more disks, ports, processors, controllers or cache do not always equate to better performance. Performance is the sum of how those and other pieces work together in a demonstrable way, ideally with your specific application workload compared to what is on a product data sheet.

Additional general tips include:

  • Align the applicable tool, technique or technology to task at hand
  • Look to optimize for both performance and capacity, active and idle storage
  • Consolidated applications and servers need fast servers
  • Fast servers need fast I/O and storage devices to avoid bottlenecks
  • For active storage use an activity per watt metric such as IOP or transaction per watt
  • For in-active or idle storage, a capacity per watt per footprint metric would apply
  • Gain insight and control of how storage resources are used to meet service requirements

It should go without saying, however sometimes what is understood needs to be restated.

In the quest to become more efficient and optimized, avoid introducing performance, quality of service or availability issues by moving problems.

Likewise, look beyond storage space capacity also considering performance as applicable to become efficient.

Finally, it is all relative in that what might be applicable to one environment or application need may not apply to another.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

SPC and Storage Benchmarking Games

Storage I/O trends

There is a post over in one of the LinkedIn discussion forums about Storage Performance Council (SPC) benchmarks being misleading that I just did a short response to. Here's the full post, as LinkedIn has a short post response limit.

While the SPC is far from perfect, it is at least for block, arguably better than doing nothing.

For the most part, SPC has become a de facto standard for at least block storage benchmarks, independent of using IOmeter or other tools or vendor specific simulations, similar to how MSFT ESRP is for Exchange, TPC for databases, and SPEC for NFS and so forth. In fact, SPC even recently rather quietly rolled out a new set of what could be considered the basis for green storage benchmarks. I would argue that SPC results in themselves are not misleading, particularly if you take the time to look at both the executive and full disclosures and look beyond the summary.

Some vendors have taken advantage of the SPC results, playing games with discounting on prices (something that is allowed under SPC rules) to make apples to oranges comparisons on cost per IOP or other ploys. This practice is nothing new to the IT industry or other industries for that matter, hence benchmark games.

Where the misleading SPC issue can come into play is for those who simply look at what a vendor is claiming without looking at the rest of the story, or who do not take the time to look at the results and make apples to apples comparisons instead of believing the apples to oranges comparison. After all, the results are there for a reason. That reason is for those really interested to dig in and sift through the material, granted not everyone wants to do that.

For example, some vendors can show a highly discounted list price to get a better IOP per cost on an apples to oranges basis; however, when prices are normalized, the results can be quite different. However here's the real gem for those who dig into the SPC results, including looking at the configurations, and that is that latency under workload is also reported.

The reason that latency is a gem is that generally speaking, latency does not lie.

What this means is that if vendor A doubles the amount of cache, doubles the number of controllers, doubles the number of disk drives, plays games with actual storage utilization (ASU), and utilizes fast interfaces from 10GbE iSCSI to 8Gb FC or FCoE or SAS to get a better cost per IOP number with discounting, look at the latency numbers. There have been some recent examples of this where vendor A has a better cost per IOP while achieving a higher number of IOPS at a lower cost compared to vendor B, which is what is typically reported in a press release or news story. (See a blog entry that also points to a CMG presentation discussion around this topic here.)

Then go and look at the two results: vendor B may be at list price while vendor A is severely discounted, which is not a bad thing, as that then becomes the starting price from which customers should begin negotiations. However, to be fair, normalize the pricing for fun, look at how much more equipment vendor A may need while having to discount to offset the increased amount of hardware, and then look at latency, as sketched below.
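Here is a minimal, purely hypothetical sketch of that normalization; the vendor A and B numbers are made up to illustrate the mechanics and are not taken from any actual SPC submission.

```python
# Hypothetical vendor A vs vendor B normalization; all figures are invented
# to show the mechanics of comparing discounted vs list-price SPC submissions.

vendors = {
    # name: (IOPS, price as submitted, discount applied, avg latency ms)
    "Vendor A": (200_000, 800_000, 0.60, 8.5),   # heavily discounted, more hardware
    "Vendor B": (150_000, 750_000, 0.00, 3.2),   # list price, fewer components
}

for name, (iops, price, discount, latency_ms) in vendors.items():
    as_submitted = price / iops
    at_list = (price / (1.0 - discount)) / iops   # normalize back to list price
    print(f"{name}: ${as_submitted:.2f}/IOP as submitted, "
          f"~${at_list:.2f}/IOP at list, {latency_ms} ms latency under load")

# Vendor A looks cheaper per IOP as submitted, yet at list price and on
# latency under workload the picture can reverse.
```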

In some of the recent record reported results, the latency results are actually better for vendor B than for vendor A. And why does latency matter? Beyond showing what a controller can actually do in terms of leveraging the number of disks, cache and interface ports and so forth, the big kicker is for those talking about SSD (RAM or FLASH), in that SSD is generally about latency. To fully and effectively utilize SSD, which is a low latency device, you would want a controller that can do a decent job of handling IOPS; however you also need a controller that can do a decent job of handling IOPS with low latency under heavy workload conditions.

Thus the SPC, again while far from perfect, at least as a thumbnail sketch and comparison is not necessarily misleading; more often than not it's how the results are utilized that is misleading. Now, in the quest for the SPC administrators to try and gain more members and broader industry participation and thus secure their own future, is the SPC organization or administration opening itself up to being used more and more as a marketing tool in ways that potentially compromise its credibility (I know, some will dispute the validity of SPC, however that's reserved for a different discussion ;) )?

There is a bit of déjà vu here for those involved with RAID and storage who recall how the RAID Advisory Board (RAB), in its quest to gain broader industry adoption and support, succumbed to marketing pressures and use, or what some would describe as misuse, and is now a member of the “Where are they now” club!

Don’t get me wrong here; I like the SPC tests/results/format, and there is a lot of good information in the SPC. The various vendor folks who work very hard behind the scenes to make the SPC actually work and continue to evolve it all deserve a great big kudos, an “atta boy” or “atta girl” for the fine work that they have been doing, work that I hope does not become lost in the quest to gain market adoption for the SPC.

Ok, so then this should all beg the question of what is the best benchmark. Simple: the one that most closely resembles your actual applications, workload, conditions, configuration and environment.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Server Storage I/O Network Virtualization Whats Next?

Updated 9/28/18

There are many faces and thus functionalities of virtualization beyond the one most commonly discussed, which is consolidation or aggregation. Other common forms of virtualization include emulation (which is part of enabling consolidation), which can be in the form of a virtual tape library for storage to bridge new disk technology to old software technology, processes, procedures and skill sets. Other forms of virtualization functionality, for life beyond consolidation, include abstraction for transparent movement of applications or operating systems on servers, or data on storage, to support planned and unplanned maintenance, upgrades, BC/DR and other activities.

So the gist is that there are many forms of virtualization technologies and techniques for servers, storage and even I/O networks to address different issues including life beyond consolidation. However the next wave of consolidation could and should be that of reducing the number of logical images, or, the impact of the multiple operating systems and application images, along with their associated management costs.

This may be easier said than done. However, for those looking to cut costs even further than what can be realized by reducing physical footprints (e.g. going from 10 to 1 or from 250 to 25 physical servers), there could be upside, however it will come at a cost. The cost is like that of reducing data and storage footprint impacts with data management and archiving.

Savings can be realized by archiving and deleting data via data management, however that is easier said than done given the cost in terms of people time and the ability to decide what to archive, even for non-compliance data, along with the associated business rules and policies that need to be defined (for automation), plus the hardware, software and services involved (managed services, consulting and/or cloud and SaaS).

Where To Learn More

View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

The Many Faces of Solid State Devices/Disks (SSD)

Storage I/O trends

Here’s a link to a recent article I wrote for Enterprise Storage Forum titled “Not a Flash in the PAN” providing a synopsis of the many faces, implementations and forms of SSD based technologies that includes several links to other related content.

A popular topic over the past year or so has been SSD with FLASH based storage for laptops, also sometimes referred to as hybrid disk drives along with announcements late last year by companies such as Texas Memory Systems (TMS) of a FLASH based storage system combining DRAM for high speed cache in their RAMSAN-500 and more recently EMC adding support for FLASH based SSD devices in their DMX4 systems as a tier-0 to co-exist with other tier-1 (fast FC) and tier-2 (SATA) drives.

Solid State Disks/Devices (SSD) or memory based storage mediums have been around for decades, and they continue to evolve using different types of memory ranging from volatile dynamic random access memory (DRAM) to persistent or non-volatile RAM (NVRAM) and various derivatives of NAND FLASH among others. Likewise, the capacity cost points, performance, reliability, packaging, interfaces and power consumption all continue to improve.

SSD in general is a technology that has been misunderstood over the decades, particularly when simply compared on a cost per capacity (e.g. dollar per GByte) basis, which is an unfair comparison. The more appropriate comparison is to look at how much work or activity, for example transactions per second, NFS operations per second, IOPS or email messages that can be processed in a given amount of time, and then compare the amount of power and number of devices required to achieve a desired level of performance. Granted, SSD and in particular DRAM based systems cost more on a GByte or TByte basis than magnetic hard disk drives, however it also requires more HDDs and controllers to achieve the same level of performance, not to mention more power and cooling than a typical SSD based device.
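Here is a minimal sketch of that comparison using rough, hypothetical device figures (not any specific product) to show why dollars per GByte and the cost to reach a given activity level tell two very different stories.

```python
# Hypothetical comparison of $/GB vs cost to reach an IOPS target.
# Device figures are rough placeholders, not specific product specifications.

target_iops = 40_000

devices = {
    # name: (capacity_GB, IOPS_per_device, cost_per_device_USD, watts)
    "15K HDD":   (300, 180, 250, 15.0),
    "flash SSD": (146, 20_000, 3_000, 8.0),
}

for name, (gb, iops, cost, watts) in devices.items():
    dollars_per_gb = cost / gb
    needed = -(-target_iops // iops)   # ceiling division: devices to hit the target
    print(f"{name:9s} ${dollars_per_gb:6.2f}/GB; {needed:4d} devices "
          f"(${needed * cost:,} and {needed * watts:,.0f} W) to reach {target_iops:,} IOPS")
```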

The many faces of SSD range from low cost consumer grade products based on consumer FLASH to high performance DRAM based caches and devices for enterprise storage applications. Over the past year or so, SSDs have re-emerged for those who are familiar with the technology, and emerged or appeared for those new to the various implementations and technologies, leading to another upswing in the historic up and down cycles of SSD adoption and technology evolution in the industry.

This time around, a few things are different and I believe that SSD in general, that is, the many different faces of SSD, will have staying power and not fade away into the shadows only to re-emerge a few years later as has been the case in the past.

The reason I have this opinion is based on two basic premises: economics and ecology. Given the focus on reducing or containing costs, doing more with what you have, and environmental or ecological awareness in the race to green the data center and green storage, improving the economics with more energy efficient storage, that is, enabling your storage to do more work with less energy as opposed to simply avoiding energy consumption, has the byproduct of improved economics (cost savings, improved resource utilization and better service delivery) along with ecological benefits (better use of energy or less use of energy).

Current implementations of SSD based solutions are addressing energy efficiency in ways ranging from maximizing battery life to boosting performance while drawing less power. Consequently we are now seeing SSDs in general not only being used for boosting performance, but also as one of many different tools to address power, cooling, floor space and environmental or green storage issues.

Here’s a link to a StorageIO industry trends and perspectives white paper at www.storageio.com/xreports.htm.

Here’s the bottom line: there are many faces to SSD. SSD (FLASH or DRAM) based solutions and devices have a place in a tiered storage environment as a tier-0, or as an alternative in some laptops or other servers where appropriate. SSD complements other technologies and benefits from being paired with them, including high performance storage for tier-1 and near-line or tier-2 storage implementing intelligent power management (IPM).

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved