ToE NVMeoF TCP Performance Line Boost Performance Reduce Costs

ToE NVMeoF TCP Performance Line Boost Performance Reduce Costs.

Yes, you read that correctly: leverage TCP Offload Engines (TOEs) to boost the performance of TCP-based NVMeoF (NVMe over Fabrics) while reducing costs. Keep in mind that there is a difference between cutting costs (which tends to cause or move problems and complexities elsewhere) and reducing or removing costs (that is, finding, fixing, and removing complexities).

Reducing or cutting costs can be easy: simply swap items for lower-priced ones and accept performance bottlenecks or some other compromise. Likewise, boosting performance can be addressed by throwing (deploying) more hardware (and/or software) at the problem, resulting in higher costs or some other compromise.

On the other hand, as mentioned above, finding, fixing, and removing complexity and overhead results in cost savings while doing the same work, or enables more work to be done for the same cost, maximizing the value of hardware, software, and network spending. In other words, a better return on investment (ROI) and a lower total cost of ownership (TCO).

Software Defined Storage and Networks Need Hardware

With the continued shift toward software-defined data centers, software-defined data infrastructures, software-defined storage, software-defined networking, and software-defined everything, all of those have something in common: they need hardware-based compute processing.

In the case of software-defined storage, including standalone, shared fabric- or network-based, converged infrastructure (CI), or hyper-converged infrastructure (HCI) deployment models, there is a need for CPU compute, memory, and I/O, in addition to storage devices. This means that the software to create, manage, and perform storage tasks needs to run on a server’s CPU, along with I/O networking software stacks.

However, it should be evident that sometimes the obvious needs to be restated: software-defined anything requires hardware somewhere in the solution stack. Likewise, depending on how the software is implemented, it may require more hardware resources, including server compute, memory, I/O, and network and storage capabilities.

Keep in mind that networking stacks, including upper- and lower-level protocols and interfaces, leverage software to implement their functionality. Therefore, the value proposition of using standard networks such as Ethernet and TCP is the ability to leverage lower-cost network interface cards (or chips), also known as NICs, combined with server-based software stacks.

On the one hand, costs can be reduced by using less expensive NICs and using generally available server CPU compute capabilities to run the TCP and other networking stack software. On systems with lower application or other software performance demands, this can work out ok. However, for workloads and systems using software-defined storage and other applications that compete for server resources (CPU, memory, I/O), this can result in performance bottlenecks and problems.

Many Server Storage I/O Networking Bottlenecks Are CPU Problems

There is a classic saying that the best I/O is the one that you do not have to do. Likewise, the second-best I/O is the one with the least overhead (and cost) and the best performance. Another saying is that many application, database, server, and storage I/O problems are actually due to CPU bottlenecks. Fast storage devices need fast applications on fast servers with fast networks. This means finding and removing blockages, including offloading the server CPU from performing network I/O processing using TOEs.

Wait a minute, isn’t the value proposition of using software-defined storage or networking to use low-cost general-purpose servers instead of more expensive hardware devices? Yup, with some caveats: understand how much server CPU is being used to run the software-defined storage and networking software stacks and handle upper-level functionality. Supporting higher performance or larger workloads can mean putting in larger (scale-up) and more (scale-out) servers, along with their increased connectivity and management overhead.

This is where TOEs come into play, leveraging the best of both worlds: run software-defined storage (and networking) stacks, along with other software and applications, on general-purpose compute servers. The benefit is that TCP network I/O processing gets offloaded from the server CPU to the TOE, thereby freeing up the server CPU to do more work or enabling a smaller, lower-cost CPU to be used.

After all, many server, storage, and I/O networking problems are often server CPU problems. An example is running the TCP networking software stack using CPU cycles on a host server, where it competes with other software and applications. In addition, as an application does more I/O, for example issuing read and write requests to network- and fabric-based storage, the server’s CPUs become busier with the added overhead of running the lower-layer TCP and networking stack.
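
As a quick way to see this on a Linux host (an assumption; other operating systems expose similar counters differently), you can watch how much network receive and transmit soft-interrupt processing the host CPUs do while a workload runs. Below is a minimal Python sketch; it only observes generic kernel counters and is not tied to any particular NIC or TOE.

# Minimal sketch: sample Linux NET_RX/NET_TX softirq counters to see how much
# network receive/transmit processing the host CPUs are doing while a workload runs.
# Assumes a Linux host with /proc/softirqs (not specific to any vendor NIC or TOE).
import time

def read_net_softirqs(path="/proc/softirqs"):
    totals = {}
    with open(path) as f:
        next(f)  # skip the header row of CPU column labels
        for line in f:
            fields = line.split()
            name = fields[0].rstrip(":")
            if name in ("NET_RX", "NET_TX"):
                totals[name] = sum(int(v) for v in fields[1:])
    return totals

if __name__ == "__main__":
    before = read_net_softirqs()
    time.sleep(10)  # run or observe your I/O workload during this window
    after = read_net_softirqs()
    for name in sorted(before):
        delta = after[name] - before[name]
        print(f"{name}: {delta} softirq events in 10s "
              f"({delta / 10:.0f}/sec of host-CPU network processing)")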

The result is that server resources (CPU, memory) run at higher utilization, but with more overhead. Higher resource utilization with low or no overhead, low latency, and high productivity is a good thing, resulting in a lower cost per unit of work done. On the other hand, high CPU utilization, server operating system or kernel-mode overhead, poor latency, and low productivity are not good things, resulting in a higher cost per unit of work done.

This means there is a loss of productivity as more time is spent waiting, and the cost to do a unit of work, for example an I/O or transaction, increases (there is more overhead). Thus, offload engines (chips, cards, adapters) come into play to shift some software processing from the server CPU to a specialized processor. The result is lower server CPU overhead, leaving more server resources for the main application or software-defined storage (and networking) while boosting performance and lowering overall costs.
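
As a back-of-the-envelope illustration of cost per unit of work (all numbers below are made-up assumptions for the sketch, not measured or vendor results):

# Back-of-the-envelope sketch: cost per unit of work with and without offload.
# All numbers are illustrative assumptions, not measured or vendor-supplied results.

def cost_per_transaction(server_cost_per_hour, transactions_per_hour):
    return server_cost_per_hour / transactions_per_hour

# Assume a server whose fully burdened cost (hardware, software, power, admin)
# is $2.00/hour and that drives 1,000,000 transactions/hour while spending
# roughly 30% of its CPU on the TCP/network stack.
baseline = cost_per_transaction(2.00, 1_000_000)

# Assume a TOE adds $0.20/hour to the bill of materials but returns most of that
# CPU to the application, lifting throughput to roughly 1,250,000 transactions/hour.
with_offload = cost_per_transaction(2.20, 1_250_000)

print(f"baseline:     ${baseline * 1_000_000:.2f} per million transactions")
print(f"with offload: ${with_offload * 1_000_000:.2f} per million transactions")
# Lower cost per unit of work means better ROI and lower TCO for the same workload.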

Graphics, Compute, Network, TCP Offload Engines

Offload engines are not new; they have been around for a while and, in some cases, are more common than some realize, going by different names. For example, Graphics Processing Units (GPUs) are used for offloading graphics- and compute-intensive tasks to special chips and adapter cards. Other examples of offload processors include network offloads such as the TCP Offload Engine (TOE), along with compression and storage processing, among others.

The basic premise of offload engines is to move or shift the processing of specific functions from software running on a general-purpose server CPU to a specialized processor (ASIC, FPGA, adapter, or mezzanine card). By moving the processing of those functions to the offload or specialized processing device, performance can be boosted while freeing up a server’s primary processor (CPU) to do other useful (and productive) work.

There is a cost associated with leveraging offloads and specialized processors; however, the business benefit should offset it by reducing primary server compute expenses or doing more work with available resources while driving network bandwidth toward line-rate performance. The above should result in a net TCO reduction and boost your ROI for a given system or bill of materials, including hardware, software, networking, and management.


Fast Storage Needs Fast Servers and I/O Networks

Ethernet network TOEs became popular in the industry back in the early 2000s, focusing on networked storage and storage networks that relied on TCP (e.g., iSCSI).

Fast forward to today, and there is continued use of networked (ok, fabric) storage over various interfaces, including Ethernet supporting different protocols. One of those protocols is NVMe, in the form of NVMe over Fabrics (NVMeoF) using TCP and underlying Ethernet-based networks for accessing fast Solid State Devices (SSDs).
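
For readers who have not tried NVMe/TCP, the following is a minimal sketch of what attaching a fabric namespace looks like on a Linux host using the standard in-kernel initiator and nvme-cli. The target address, port, and NQN are hypothetical placeholders, and this shows the generic software path rather than any vendor-specific offload setup.

# Minimal sketch of attaching an NVMe over TCP (NVMe/TCP) namespace on a Linux host
# using the standard nvme-cli tool (run as root). The target address, port, and NQN
# below are hypothetical placeholders; substitute the values for your own target.
import subprocess

TARGET_ADDR = "192.0.2.10"        # example/documentation address, not a real target
TARGET_PORT = "4420"              # common NVMe/TCP service id (trsvcid)
TARGET_NQN = "nqn.2014-08.org.example:nvme-subsystem-1"  # placeholder subsystem NQN

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Load the NVMe/TCP initiator module (no-op if already loaded).
run(["modprobe", "nvme-tcp"])

# Discover subsystems exported by the target over TCP.
run(["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT])

# Connect to the subsystem; the namespace then shows up locally as /dev/nvmeXnY.
run(["nvme", "connect", "-t", "tcp", "-a", TARGET_ADDR,
     "-s", TARGET_PORT, "-n", TARGET_NQN])

# List NVMe devices to confirm the fabric-attached namespace is visible.
run(["nvme", "list"])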

Chelsio Communications T6 TOE for NVMeoF

An example of server storage I/O network TOEs, including those that support NVMeoF, are those from Chelsio Communications, such as the T6 25/100Gb devices. Chelsio announced today server storage I/O benchmark proof points for TCP-based NVMe over Fabrics (NVMeoF) TOE-accelerated performance. StorageIO had the opportunity to look at the performance-boosting ability and CPU savings benefit of the Chelsio T6 prior to today’s announcement.

After reviewing and validating the Chelsio proof points, test methodology, and results, it is clear that the T6 TOE-enabled solution boosts server storage I/O performance while reducing host server CPU usage. The Chelsio T6 solution, combined with Storage Performance Development Kit (SPDK) software, provides local-like performance for network fabric distributed NVMe (using TCP-based NVMeoF) attached SSD storage while reducing host server CPU consumption.

“Boosting application performance, efficiency, and effectiveness of server CPUs are key priorities for legacy and software defined datacenter environments,” said Greg Schulz, Sr. Analyst Server Storage. “The Chelsio NVMe over Fabrics 100GbE NVMe/TCP (TOE) demonstration provides solid proof of how high-performance NVMe SSDs can help datacenters boost performance and productivity, while getting the best return on investment of datacenter infrastructure assets, not to mention optimizing cost of ownership at the same time. It’s like getting a three-for-one bonus: your server CPUs, your network, and your applications all perform better. Now that’s a trifecta!”

You can read more about the technical and business benefits of the Chelsio T6 TOE enabled solution along with associated proof points (benchmarks) in the PDF white paper found here and their Press Release here. Note that the best measure, benchmark, proof point, or test is your application and workload, so contact Chelsio to arrange an evaluation of the T6 using your workload, software, and platform.

Where to learn more

Learn more about TOE, server, compute, GPU, ASIC, FPGA, storage, I/O networking, TCP, data infrastructure and software defined and related topics, trends, techniques, tools via the following links:

Chelsio Communications T6 Performance Press Release (PDF)
Chelsio Communications T6 TOE White Paper (PDF)
Application Data Value Characteristics Everything Is Not the Same
PACE your Infrastructure decision-making, it’s about application requirements
Data Infrastructure Server Storage I/O Tradecraft Trends
Data Infrastructure Overview, Its What’s Inside of Data Centers
Data Infrastructure Management (Insight and Strategies)
Hyper-V and Windows Server 2025 Enhancements

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

The large hyperscale web services and other large environments leverage offload engines and specialized processing technologies (chips, ASICs, FPGAs, GPUs, adapters) to boost performance while reducing server compute costs or getting more value out of a given server platform. If it works for the large hyperscalers, it can also work for your environment or your software-defined platform.

The benefits include reducing the number and cost of items in your software-defined platform bill of materials (BoM). Another benefit is freeing up server CPU cycles to run your storage, network, or other software to get more performance and work done. Yet another benefit is the ability to further stretch your software license investments, getting more work done per software license unit.

Have a look at the Chelsio Communications T6 line of TOEs for NVMeoF and other workloads to boost performance, reduce CPU usage, and lower costs. See for yourself the TOE NVMeoF TCP Performance Line Boost Performance Reduce Costs benefit.

Ok, nuff said, for now.

Cheers GS

Greg Schulz – Microsoft MVP Cloud and Data Center Management, previous 10 time VMware vExpert. Author of Software Defined Data Infrastructure Essentials (CRC Press), Data Infrastructure Management (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Chelsio Storage over IP and other Networks Enable Data Infrastructures

Chelsio Storage over IP Enable Data Infrastructures

server storage I/O data infrastructure trends

Chelsio and Storage over IP (SoIP) continue to enable data infrastructures from legacy to software-defined virtual, container, cloud, and converged. This past week I had a chance to visit with Chelsio to discuss data infrastructures and server storage I/O networking, along with other related topics. More on Chelsio later in this post; however, for now let’s take a quick step back and refresh what SoIP (Storage over IP) is, along with storage over Ethernet (among other networks).

Data Infrastructures Protect Preserve Secure and Serve Information
Various IT and Cloud Infrastructure Layers including Data Infrastructures

Server Storage over IP Revisited

There are many variations of SoIP, from network attached storage (NAS) file-based access including NFS and SAMBA/SMB (aka Windows file sharing) among others, to various block protocols such as SCSI over IP (e.g., iSCSI), along with object access via HTTP/HTTPS, not to mention the buzzword bingo list of RoCE, iSER, iWARP, RDMA, DDPK, FTP, FCoE, iFCP, and SMB3 Direct, to name a few.

Who is Chelsio

For those who are not aware or need a refresher, Chelsio is involved with enabling server storage I/O by creating ASICs (Application Specific Integrated Circuits) that perform various functions, offloading them from the host server processor. For some, this is a throwback to the early 2000s TCP Offload Engine (TOE) era, when various processing to handle regular TCP traffic, along with iSCSI and other storage over Ethernet and IP, could be accelerated.

Chelsio data infrastructure focus

Chelsio ecosystem across different data infrastructure focus areas and application workloads

As seen in the image above, there is certainly a server and storage I/O network play with Chelsio, along with traffic management, packet inspection, security (encryption, SSL, and other offloads), and traditional, commercial, web, high-performance compute (HPC) along with high-profit or productivity compute (the other HPC) workloads. Chelsio also enables data infrastructures that are part of physical bare metal (BM), software-defined virtual, container, cloud, and serverless environments, among others.

Chelsio server storage I/O focus

The above image shows how Chelsio enables initiators on server and storage appliances as well as targets via various storage over IP (or Ethernet) protocols.

Chelsio enabling various data center resources

Chelsio also plays in several different sectors, from *NIX to Windows, cloud to containers, and various processor architectures and hypervisors.

Chelsio ecosystem

Besides diverse server storage I/O enabling capabilities across various data infrastructure environments, what caught my eye with Chelsio is how far they, and storage over IP, have progressed over the past decade (or more). Granted, there are faster underlying networks today; however, the offload and specialized chipsets (e.g., ASICs) have also progressed, as seen in the above and following series of images via Chelsio.

The above shows TCP and UDP acceleration; the following shows Microsoft SMB 3.1.1 performance, something important for doing Storage Spaces Direct (S2D) and Windows-based Converged Infrastructure (CI) along with Hyper-Converged Infrastructure (HCI) deployments.

Chelsio software environments

Something else that caught my eye was iSCSI performance, which in the following shows four initiators accessing a single target doing about 4 million IOPS (reads and writes) across various sizes and configurations. Granted, that is with a 100Gb network interface; however, it also shows that potential bottlenecks are removed, enabling that faster network to be used more effectively.
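
As a rough sanity check for numbers like these, it helps to relate IOPS, I/O size, and link line rate; the following sketch uses idealized wire math that ignores protocol and framing overhead.

# Rough sanity-check sketch relating link line rate, I/O size, and IOPS.
# Idealized math that ignores TCP/iSCSI/NVMe protocol and framing overhead.

def max_iops(line_rate_gbps, io_size_bytes):
    bytes_per_sec = line_rate_gbps * 1e9 / 8   # Gb/s -> bytes/s
    return bytes_per_sec / io_size_bytes

for io_size in (512, 4096, 8192):
    print(f"100GbE, {io_size:>5} byte I/O: "
          f"~{max_iops(100, io_size) / 1e6:.1f} million IOPS at line rate")

# The output shows that millions of small-block IOPS can approach a single
# 100GbE link's line rate, which is why removing host CPU bottlenecks matters:
# otherwise the server, not the wire, becomes the limit.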

Chelsio server storage I/O performance

Moving on from TCP, UDP, and iSCSI, NVMe and in particular NVMe over Fabrics (NVMeoF) have become popular industry topics, so check out the following. One of my comments to Chelsio was to add host or server CPU usage to the following chart to help show the story and value proposition of NVMe in general: doing more I/O activity while consuming fewer server-side resources. Let’s see what they put out in the future.

Chelsio

Ok, so Chelsio does storage over IP, storage over Ethernet, and other interfaces, accelerating both storage and regular TCP and UDP activity. One of the other benefits of what Chelsio and others are doing with their ASICs (or FPGAs for some) is to also offload processing for security, among other functions. Given the increased focus on server storage I/O and data infrastructure security, from encryption to SSL and related usage that requires more resources, these new ASICs such as those from Chelsio help to offload various specialized processing from the server.

The customer benefit is that more productive application work can be done by their servers (or storage appliances). For example, if you have a database server, that means more productive database transactions per second per software license. Put another way, want to get more value out of your Oracle, Microsoft, or other vendor’s software licenses? Simple: get more work done per licensed server by offloading and eliminating waits or other bottlenecks.

Using offloads and removing server bottlenecks might seem like common sense; however, I’m still amazed at the number of organizations who are more focused on getting extra value out of their hardware vs. getting value out of their software licenses (which might be more expensive).

Chelsio

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

Data Infrastructures Protect Preserve Secure and Serve Information
Various IT and Cloud Infrastructure Layers including Data Infrastructures

What This All Means

Data Infrastructures exist to protect, preserve, secure and serve information along with the applications and data they depend on. With more data being created at a faster rate, along with the size of data becoming larger, increased application functionality to transform data into information means more demands on data infrastructures and their underlying resources.

This means more server I/O to storage system and other servers, along with increased use of SoIP as well as storage over Ethernet and other interfaces including NVMe. Chelsio (and others) are addressing the various application and workload demands by enabling more robust, productive, effective and efficient data infrastructures.

Check out Chelsio and how they are enabling Storage over IP (SoIP) to enable data infrastructures from legacy to software-defined virtual, container, cloud, and converged. Oh, and thanks Chelsio for being able to use the above images.

Ok, nuff said, for now.
Gs

Greg Schulz – Multi-year Microsoft MVP Cloud and Data Center Management, VMware vExpert (and vSAN). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.

Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO. All Rights Reserved.

Part II: Seagate 1200 12Gbs Enterprise SAS SSD StorageIO lab review

This is the second post of a two-part series; read the first post here.

Earlier this year I had the opportunity to test drive some Seagate 1200 12Gbs Enterprise SAS SSDs as a follow-up to some earlier activity trying their Enterprise TurboBoost drives. Disclosure: Seagate has been a StorageIO client and was also the sponsor of this white paper and the associated proof-points mentioned in this post.

The Server Storage I/O Blender Effect Bottleneck

The earlier proof-points focused on SSD as a target or storage device. In the following proof-points, the Seagate Enterprise 1200 SSD is used as a shared read cache (write-through). Using a write-through cache enables a given amount of SSD to give a performance benefit to other local and networked storage devices.
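
To clarify the behavior being described, here is a minimal, generic sketch of a write-through read cache (an illustration of the concept only, not the actual Virtunet implementation): reads are served from the cached copy when present, while writes always go through to the backing HDD, so the cache never holds dirty data.

# Minimal, generic sketch of write-through read-cache behavior (illustration only;
# not the actual Virtunet/Virtucache implementation). Reads are served from the
# cache on a hit; writes always go through to the backing store, so the cache
# never holds dirty data and can be lost without losing writes.
from collections import OrderedDict

class WriteThroughReadCache:
    def __init__(self, backing_store, capacity_blocks):
        self.backing = backing_store            # dict-like: block number -> data
        self.capacity = capacity_blocks
        self.cache = OrderedDict()              # LRU order: oldest first

    def read(self, block):
        if block in self.cache:                 # cache hit: serve from SSD copy
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.backing[block]              # cache miss: read the HDD
        self._insert(block, data)               # populate cache for future reads
        return data

    def write(self, block, data):
        self.backing[block] = data              # write-through: HDD is always current
        if block in self.cache:
            self.cache[block] = data            # keep any cached copy coherent
            self.cache.move_to_end(block)

    def _insert(self, block, data):
        self.cache[block] = data
        if len(self.cache) > self.capacity:     # evict the least recently used block
            self.cache.popitem(last=False)

# Usage sketch: hdd = {n: b"data" for n in range(1000)}; c = WriteThroughReadCache(hdd, 100)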

traditional server storage I/O
Non-virtualized servers with dedicated storage and I/O paths.

Aggregation causes aggravation in the form of I/O bottlenecks because of consolidation using server virtualization. The figure above shows non-virtualized servers with their own dedicated physical machine (PM) and I/O resources. When various servers are virtualized and hosted by a common host (physical machine), their various workloads compete for I/O and other resources. In addition to competing for I/O performance resources, these different servers also tend to have diverse workloads.

virtual server storage I/O blender
Virtual server storage I/O blender bottleneck (aggregation causes aggravation)

The figure above shows aggregation causing aggravation, with the result being I/O bottlenecks as various applications’ performance needs converge and compete with each other. The aggregation and consolidation result is a blend of random, sequential, large, small, read, and write characteristics. These different storage I/O characteristics are mixed up and need to be handled by the underlying I/O capabilities of the physical machine and hypervisor. As a result, a common deployment for SSD, in addition to being a target device for storing data, is as a cache to cut bottlenecks for traditional spinning HDDs.

In the following figure, a solution is shown introducing I/O caching with SSD to help mitigate or cut the effects of server consolidation causing performance aggravations.

Creating a server storage I/O blender bottleneck

Addressing the VMware Server Storage I/O blender with cache

Addressing server storage I/O blender and other bottlenecks

For these proof-points, the goal was to create an I/O bottleneck resulting from multiple VMs in a virtual server environment performing application work. In this proof-point, multiple competing VMs, including a SQL Server 2012 database and an Exchange server, shared the same underlying storage I/O infrastructure including HDDs. The 6TB (Enterprise Capacity) HDD was configured as a VMware datastore and allocated as virtual disks to the VMs. Workloads were then run concurrently to create an I/O bottleneck for both cached and non-cached results.

Server storage I/O with virtualization proof-point configuration topology

The following figure shows two sets of proof points, cached (top) and non-cached (bottom), with three workloads. The workloads consisted of concurrent Exchange and SQL Server 2012 (TPC-B and TPC-E) running on separate virtual machines (VMs), all on the same physical machine host (SUT), with database transactions being driven by two separate servers. In these proof-points, the application data was placed onto the 6TB SAS HDD to create a bottleneck, and a portion of the SSD was used as a cache. Note that the Virtunet cache software allows you to use part of an SSD device for cache, with the balance used as a regular storage target should you want to do so.

If you have paid attention to the earlier proof-points, you might notice that some of the results below are not as good as those seen in the Exchange, TPC-B, and TPC-E results above. The reason is simply that the earlier proof-points were run without competing workloads, and the database along with log or journal files were placed on separate drives for performance. In the following proof-point, as part of creating a server storage I/O blender bottleneck, the Exchange, TPC-B, and TPC-E workloads were all running concurrently with all data on the 6TB drive (something you normally would not want to do).

storage I/O blender solved
Solving the VMware Server Storage I/O blender with cache

The cached and non-cached mixed workloads shown above prove how an SSD-based read cache can help to reduce I/O bottlenecks. This is an example of addressing the aggravation caused by aggregation of different competing workloads that are consolidated with server virtualization.

For the workloads shown above, all data (database tables and logs) were placed on VMware virtual disks created from a datastore using a single 7.2K 6TB 12Gbps SAS HDD (e.g., Seagate Enterprise Capacity).

The guest VM system disks, which included paging, applications, and other data files, were virtual disks using a separate datastore mapped to a single 7.2K 1TB HDD. Each workload ran for eight hours, with the TPC-B and TPC-E having 50 simulated users. For the TPC-B and TPC-E workloads, two separate servers were used to drive the transaction requests to the SQL Server 2012 database.

For the cached tests, a Seagate Enterprise 1200 400GB 12Gbps SAS SSD was used as the backing store for the cache software (Virtunet Systems Virtucache) that was installed and configured on the VMware host.

During the cached tests, the physical HDD for the data files (e.g. 6TB HDD) and system volumes (1TB HDD) were read cache enabled. All caching was disabled for the non-cached workloads.

Note that this was only a read cache, which has the side benefit of off-loading those activities, enabling the HDD to focus on writes or read-ahead. Also note that while the combined TPC-E, TPC-B, and Exchange databases, logs, and associated files represented over 600GB of data, there was also the combined space, and thus cache, impact of the two system volumes and their data. This simple workload and configuration is representative of how SSD caching can complement high-capacity HDDs.

Seagate 6TB 12Gbs SAS high-capacity HDD

While the star and focus of this series of proof-points is the Seagate 1200 Enterprise 12Gbs SAS SSD, the caching software (Virtunet) and Enterprise TurboBoost drives also play key supporting and favorable roles. However, the 6TB 12Gbs SAS high-capacity drive caught my attention from a couple of different perspectives. Certainly the space capacity was interesting, along with a 12Gbs SAS interface well suited for near-line, high-capacity, and dense tiered storage environments. However, for a high-capacity drive, its performance is what really caught my attention, both in the standard Exchange, TPC-B, and TPC-E workloads, as well as when combined with SSD and cache software.

This opens the door for a great combination: leveraging some amount of high-performance flash-based SSD (or TurboBoost drives) combined with cache software and high-capacity drives such as the 6TB device (Seagate now has larger versions available). Something else to mention is that the 6TB HDD, in addition to being available in either 12Gbs SAS, 6Gbs SAS, or 6Gbs SATA, also has enhanced durability with a Read Bit Error Rate of 10^15 (i.e., on average no more than one unrecoverable read error per 10^15 bits read) and an AFR (annual failure rate) of 0.63% (see more speeds and feeds here). Hence, if you are concerned about using large-capacity HDDs and them failing, make sure you go with those that have a better (higher exponent) Read Bit Error Rate and a low AFR, which are more common with enterprise-class vs. lower-cost commodity or workstation drives. Note that these high-capacity enterprise HDDs are also available with Self-Encrypting Drive (SED) options.
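
To put those reliability numbers in perspective, here is a quick worked calculation based on the figures quoted above (treating the error rate as one unrecoverable read error per 10^15 bits read, on average).

# Worked arithmetic for the reliability numbers quoted above (6TB drive,
# unrecoverable read error rate of 1 in 10^15 bits, 0.63% annual failure rate).

drive_capacity_bytes = 6e12            # 6 TB (decimal)
bits_per_error = 1e15                  # 1 unrecoverable read error per 10^15 bits read
afr = 0.0063                           # 0.63% annual failure rate

bits_per_full_read = drive_capacity_bytes * 8
full_reads_per_expected_error = bits_per_error / bits_per_full_read
print(f"Full-capacity reads per expected unrecoverable error: "
      f"~{full_reads_per_expected_error:.0f}")

# AFR viewed across a population of drives:
drives = 1000
print(f"Expected drive failures per year across {drives} drives: ~{drives * afr:.0f}")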

Summary

Read more in this StorageIO Industry Trends and Perspective (ITP) white paper, compliments of Seagate, and visit the Seagate Enterprise 1200 12Gbs SAS SSD page here. Moving forward, there is the notion that flash SSD will be everywhere. There is a difference between all data being on flash SSD vs. having some amount of SSD involved in preserving, serving, and protecting (storing) information.

Key themes to keep in mind include:

  • Aggregation can cause aggravation which SSD can alleviate
  • A relative small amount of flash SSD in the right place can go a long way
  • Fast flash storage needs fast server storage I/O access hardware and software
  • Locality of reference with data close to applications is a performance enabler
  • Flash SSD everywhere does not mean everything has to be SSD based
  • Having some amount of flash in different places is important for flash everywhere
  • Different applications have various performance characteristics
  • SSD as a storage device or persistent cache can speed up IOPS and bandwidth

Flash and SSD are in your future; this comes back to the questions of how much flash SSD you need, along with where to put it, how to use it, and when.

Ok, nuff said (for now).

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved