How well do you know good bad ugly I/O iops?
Updated 2/10/2018
There are many different types of server storage I/O iops associated with various environments, applications and workloads. Some I/Os activity are iops, others are transactions per second (TPS), files or messages per time (hour, minute, second), gets, puts or other operations. The best IO is one you do not have to do.
What about all the cloud, virtual, software defined and legacy based application that still need to do I/O?
If no IO operation is the best IO, then the second best IO is the one that can be done as close to the application and processor as possible with the best locality of reference.
Also keep in mind that aggregation (e.g. consolidation) can cause aggravation (server storage I/O performance bottlenecks).
Example of aggregation (consolidation) causing aggravation (server storage i/o blender bottlenecks)
And the third best?
It’s the one that can be done in less time or at least cost or effect to the requesting application, which means moving further down the memory and storage stack.
Leveraging flash SSD and cache technologies to find and fix server storage I/O bottlenecks
On the other hand, any IOP regardless of if for block, file or object storage that involves some context is better than those without, particular involving metrics that matter (here, here and here [webinar] )
Server Storage I/O optimization and effectiveness
The problem with IO’s is that they are a basic operations to get data into and out of a computer or processor, so there’s no way to avoid all of them, unless you have a very large budget. Even if you have a large budget that can afford an all flash SSD solution, you may still meet bottlenecks or other barriers.
IO’s require CPU or processor time and memory to set up and then process the results as well as IO and networking resources to move data too their destination or retrieve them from where they are stored. While IO’s cannot be eliminated, their impact can be greatly improved or optimized by, among other techniques, doing fewer of them via caching and by grouping reads or writes (pre-fetch, write-behind).
Think of it this way: Instead of going on multiple errands, sometimes you can group multiple destinations together making for a shorter, more efficient trip. However, that optimization may also mean your drive will take longer. So, sometimes it makes sense to go on a couple of quick, short, low-latency trips instead of one larger one that takes half a day even as it accomplishes many tasks. Of course, how far you have to go on those trips (i.e., their locality) makes a difference about how many you can do in a given amount of time.
Locality of reference (or proximity)
What is locality of reference?
This refers to how close (i.e., its place) data exists to where it is needed (being referenced) for use. For example, the best locality of reference in a computer would be registers in the processor core, ready to be acted on immediately. This would be followed by levels 1, 2, and 3 (L1, L2, and L3) onboard caches, followed by main memory, or DRAM. After that comes solid-state memory typically NAND flash either on PCIe cards or accessible on a direct attached storage (DAS), SAN, or NAS device.
Even though a PCIe NAND flash card is close to the processor, there still remains the overhead of traversing the PCIe bus and associated drivers. To help offset that impact, PCIe cards use DRAM as cache or buffers for data along with meta or control information to further optimize and improve locality of reference. In other words, this information is used to help with cache hits, cache use, and cache effectiveness vs. simply boosting cache use.
SSD to the rescue?
What can you do the cut the impact of IO’s?
There are many steps one can take, starting with establishing baseline performance and availability metrics.
The metrics that matter include IOP’s, latency, bandwidth, and availability. Then, leverage metrics to gain insight into your application’s performance.
Understand that IO’s are a fact of applications doing work (storing, retrieving, managing data) no matter whether systems are virtual, physical, or running up in the cloud. But it’s important to understand just what a bad IO is, along with its impact on performance. Try to identify those that are bad, and then find and fix the problem, either with software, application, or database changes. Perhaps you need to throw more software caching tools, hypervisors, or hardware at the problem. Hardware may include faster processors with more DRAM and faster internal busses.
Leveraging local PCIe flash SSD cards for caching or as targets is another option.
You may want to use storage systems or appliances that rely on intelligent caching and storage optimization capabilities to help with performance, availability, and capacity.
Where to gain insight into your server storage I/O environment
There are many tools that you can be used to gain insight into your server storage I/O environment across cloud, virtual, software defined and legacy as well as from different layers (e.g. applications, database, file systems, operating systems, hypervisors, server, storage, I/O networking). Many applications along with databases have either built-in or optional tools from their provider, third-party, or via other sources that can give information about work activity being done. Likewise there are tools to dig down deeper into the various data information infrastructure to see what is happening at the various layers as shown in the following figures.
Gaining application and operating system level performance insight via different tools
Insight and awareness via operating system tools on Windows and Linux
In the above example, Spotlight on Windows (SoW) which you can download for free from Dell here along with Ubuntu utilities are shown, You could also use other tools to look at server storage I/O performance including Windows Perfmon among others.
Hypervisor performance using VMware ESXi / vsphere built-in tools
Using Visual ESXtop to dig deeper into virtual server storage I/O performance
Gaining insight into virtual server storage I/O cache performance
Wrap up and summary
There are many approaches to address (e.g. find and fix) vs. simply move or mask data center and server storage I/O bottlenecks. Having insight and awareness into how your environment along with applications is important to know to focus resources. Also keep in mind that a bit of flash SSD or DRAM cache in the applicable place can go along way while a lot of cache will also cost you cash. Even if you cant eliminate I/Os, look for ways to decrease their impact on your applications and systems.
Where To Learn More
View additional NAS, NVMe, SSD, NVM, SCM, Data Infrastructure and HDD related topics via the following links.
- Can we get a side of context with them IOPS and other storage metrics?
- WHEN AND WHERE TO USE NAND FLASH SSD FOR VIRTUAL SERVERS
- Revisiting RAID storage remains relevant and resources
- NVMe overview and primer – Part I
- Part 1 of HDD for content servers series Trends and Content Application Servers
- Part 2 of HDD for content servers series Content application server decisions and testing plans
- Part 3 of HDD for content servers series Test hardware and software configuration
- Part 4 of HDD for content servers series Large file I/O processing
- Part 5 of HDD for content servers series Small file I/O processing
- Part 6 of HDD for content servers series General I/O processing
- Part 7 of HDD for content servers series How HDD continue to evolve over different generations and wrap up
- As the platters spin, HDD’s for cloud, virtual and traditional storage environments
- How many IOPS can a HDD, HHDD or SSD do?
- Hard Disk Drives (HDD) for Virtual Environments
- Server and Storage I/O performance and benchmarking tools
- Server storage I/O performance benchmark workload scripts Part I and Part II
- How to test your HDD, SSD or all flash array (AFA) storage fundamentals
- What is the best server storage I/O workload benchmark? It depends
- I/O, I/O how well do you know about good or bad server and storage I/Os?
- Big Files Lots of Little File Processing Benchmarking with Vdbench
- Part II – NVMe overview and primer (Different Configurations)
- Part III – NVMe overview and primer (Need for Performance Speed)
- Part IV – NVMe overview and primer (Where and How to use NVMe)
- Part V – NVMe overview and primer (Where to learn more, what this all means)
- PCIe Server I/O Fundamentals
- If NVMe is the answer, what are the questions?
- NVMe Wont Replace Flash By Itself
- Via Computerweekly – NVMe discussion: PCIe card vs U.2 and M.2
- Intel and Micron unveil new 3D XPoint Non Volatie Memory (NVM) for servers and storage
- Part II – Intel and Micron new 3D XPoint server and storage NVM
- Part III – 3D XPoint new server storage memory from Intel and Micron
- Server storage I/O benchmark tools, workload scripts and examples (Part I) and (Part II)
- Data Infrastructure Overview, Its Whats Inside of Data Centers
- All You Need To Know about Remote Office/Branch Office Data Protection Backup (free webinar with registration)
- Software Defined, Converged Infrastructure (CI), Hyper-Converged Infrastructure (HCI) resources
- The SSD Place (SSD, NVM, PM, SCM, Flash, NVMe, 3D XPoint, MRAM and related topics)
- The NVMe Place (NVMe related topics, trends, tools, technologies, tip resources)
- Data Protection Diaries (Archive, Backup/Restore, BC, BR, DR, HA, RAID/EC/LRC, Replication, Security)
- Software Defined Data Infrastructure Essentials (CRC Press 2017) including SDDC, Cloud, Container and more
- Various Data Infrastructure related events, webinars and other activities
- www.objectstoragecenter.com and Software Defined, Cloud, Bulk and Object Storage Fundamentals
- Server Storage I/O Network PCIe Fundamentals
Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.
What This All Means
>Keep in mind: SSD including flash and DRAM among others are in your future, the question is where, when, with what, how much and whose technology or packaging.
Ok, nuff said, for now.
Gs
Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.
All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.