PCIe Fundamentals Server Storage I/O Network Essentials

Updated 8/31/19


PCIe fundamentals data infrastructure trends

This piece looks at PCIe fundamentals for server, storage, and I/O network data infrastructure environments. Peripheral Component Interconnect (PCI) Express, aka PCIe, is a fundamental server, storage, and I/O networking component. This post is an excerpt from chapter 4 (Chapter 4: Servers: Physical, Virtual, Cloud, and Containers) of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available via Amazon.com and other global venues. In this post, we look at various PCIe fundamentals to learn, expand, or refresh your server, storage, I/O, and networking tradecraft skills and experience.

PCIe fundamentals Server Storage I/O Fundamentals

PCIe fundamental common server I/O component

Common to all servers is some form of a main system board, which can range from a few square meters in supercomputers, through data center rack, tower, and micro-tower servers (converged or standalone), down to small Intel NUC (Next Unit of Compute), MSI, and Kepler-47 footprint or Raspberry Pi-type desktop servers and laptops. Likewise, PCIe is commonly found in storage and networking systems and appliances, among other devices.

For example, a blade server will have multiple server blades or modules, each with its own motherboard, which share a common backplane for connectivity. Another variation is a large server such as an IBM “Z” mainframe, Cray, or another supercomputer that consists of many specialized boards that function similarly to a smaller-sized motherboard, only on a larger scale.

Some motherboards also have mezzanine or daughter boards for attachment of additional I/O networking or specialized devices. The following figure shows a generic example of a two-socket server architecture with eight memory channels.

PCIe fundamentals SDDC, SDI, SDDI Server fundamentals
Generic computer server hardware architecture. Source: Software Defined Data Infrastructure Essentials (CRC Press 2017)

The above figure shows several PCIe, USB, SAS, SATA, 10 GbE LAN, and other I/O ports. Different servers will have various combinations of processor and Dual Inline Memory Module (DIMM) Dynamic RAM (DRAM) sockets, along with other features. What will also vary are the type and number of I/O and storage expansion ports, power and cooling, along with management tools or included software.

PCIe, Including Mini-PCIe, NVMe, U.2, M.2, and GPU

At the heart of many server I/O and connectivity solutions is the PCIe industry-standard interface (see PCIsig.com). PCIe is used to communicate between CPUs and the outside world of I/O networking devices. A faster and more efficient PCIe bus matters because it supports more data moving in and out of servers while accessing fast external networks and storage.

For example, a server with a 40-GbE NIC or adapter would need a PCIe port capable of 5 GB per second (40 Gb/s divided by 8 bits per byte). If multiple 40-GbE ports are attached to a server, you can see where the need for faster PCIe interfaces comes into play.
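To make the 40-GbE arithmetic concrete, here is a minimal sketch that converts NIC line rate into bytes per second and then picks the smallest PCIe link width that covers it. The per-lane figures are the rounded one-way values from the generation table later in this post, and the function names (`nic_gbytes_per_sec`, `min_link_width`) are hypothetical helpers for illustration only; real-world results will vary with protocol overhead and platform.

```python
import math

# Approximate usable one-way bandwidth per PCIe lane in GB/s (rounded values
# from the generation comparison table; an assumption for sizing, not a spec).
LANE_GBYTES_PER_SEC = {1: 0.25, 2: 0.5, 3: 1.0, 4: 2.0, 5: 4.0}

# Link widths as listed in this post.
LINK_WIDTHS = (1, 4, 8, 16, 32)


def nic_gbytes_per_sec(port_gbits, ports=1):
    """Convert NIC line rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return port_gbits * ports / 8


def min_link_width(required_gbytes, generation):
    """Smallest listed link width whose one-way bandwidth covers the need."""
    lanes = math.ceil(required_gbytes / LANE_GBYTES_PER_SEC[generation])
    for width in LINK_WIDTHS:
        if width >= lanes:
            return width
    raise ValueError("requirement exceeds an x32 link for this generation")


if __name__ == "__main__":
    for ports in (1, 2):
        need = nic_gbytes_per_sec(40, ports)
        for gen in (2, 3, 4):
            width = min_link_width(need, gen)
            print(f"{ports} x 40 GbE needs {need:.0f} GB/s -> Gen {gen} x{width}")
```

Under these assumptions a single 40-GbE port (5 GB/s) lands on a Gen 3 x8 link, while two ports (10 GB/s) need Gen 3 x16, which is why multiple fast ports quickly push you toward wider or newer-generation slots.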

As more virtual machines (VMs) are consolidated onto physical machines (PMs), and as applications place more performance demand in terms of either bandwidth or activity (IOPS, frames, or packets) per second, more 10-GbE adapters will be needed until the price of 40-GbE (also 25, 50, or 100 GbE) becomes affordable. It is not if, but rather when you will grow into the performance needs on either a bandwidth/throughput basis or to support more activity and lower latency per interface.

PCIe is a serial interface that specifies how servers communicate between CPUs, memory, and motherboard-mounted as well as add-in card (AiC) devices. This communication includes support for attachment of onboard and host bus adapter (HBA) server storage I/O networking devices such as Ethernet, Fibre Channel, InfiniBand, RapidIO, NVMe (cards, drives, and fabrics), SAS, and SATA, among other interfaces.

In addition to supporting attachment of traditional LAN, SAN, MAN, and WAN devices, PCIe is also used for attaching GPU and video cards to servers. Traditionally, PCIe has been focused on being used inside of a given server chassis. Today, however, PCIe is being deployed on servers spanning nodes in dual, quad, or CiB, CI, and HCI or Software Defined Storage (SDS) deployments. Another variation of PCIe today is that multiple servers in the same rack or proximity can attach to shared devices such as storage via PCIe switches.

PCIe components (hardware and software) include:

  • Hardware chipsets, cabling, connectors, endpoints, and adapters
  • Root complex and switches, risers, extenders, retimers, and repeaters
  • Software drivers, BIOS, and management tools
  • HBAs, RAID, SSD, drives, GPU, and other AiC devices
  • Mezzanine, mini-PCIe, M.2, NVMe U.2 (8639 drive form factor)

There are many different implementations of PCIe, corresponding to generations representing speed improvements as well as physical packaging options. PCIe can be deployed in various topologies, including a traditional model where an AiC such as a GbE or Fibre Channel HBA connects the server to a network or storage device.

Another variation is for a server to connect to a PCIe switch, or in a shared PCIe configuration between two or more servers. In addition to different generations and topologies, there are also various PCIe form factors and physical connectors (see the following figure), ranging from AiC of various length and height, as well as M.2 small-form-factor devices and U.2 (8639) drive form-factor device for NVMe, among others.

Note that the presence of M.2 does not guarantee PCIe NVMe, as it also supports SATA.

Likewise, different NVMe devices run at various PCIe speeds based on the number of lanes. For example, in the following figure, the U.2 (8639) device (looks like a SAS device) shown is a PCIe x4.

SDDC, SDI, SDDI PCIe NVMe U.2 8639 drive fundamentals
PCIe devices NVMe U.2, M.2, and NVMe AiC. (Source: StorageIO Labs.)

PCIe leverages multiple serial unidirectional point-to-point links, known as lanes, compared to traditional PCI, which used a parallel bus design. PCIe interfaces can have one (x1), four (x4), eight (x8), sixteen (x16), or thirty-two (x32) lanes for data movement. Those PCIe lanes can be full-duplex, meaning data is sent and received at the same time, providing improved effective performance.

PCIe cards are upward-compatible, meaning that an x4 can work in an x8, an x8 in an x16, and so forth. Note, however, that the cards will not perform any faster than their specified speed; an x4 in an x8 slot will only run at x4. PCIe cards can also have single, dual, or multiple external ports and interfaces. Also, note that there are still some motherboards with legacy PCI slots that are not interoperable with PCIe cards and vice versa.

Note that PCIe cards and slots can be mechanically x1, x4, x8, x16, or x32, yet electrically (or signal) wired to a slower speed, based on the type and capabilities of the processor sockets and corresponding chipsets being used. For example, you can have a PCIe x16 slot (mechanical) that is wired for x8, which means it will only run at x8 speed.

In addition to the differences between electrical and mechanical slots, also pay attention to what generation the PCIe slots are, such as Gen 2 or Gen 3 or higher. Also, some motherboards or servers will advertise multiple PCIe slots, but those are only active with a second or additional processor socket occupied by a CPU. For example, a PCIe card that has dual x4 external PCIe ports requiring full PCIe bandwidth will need at least PCIe x8 attachment in the server slot. In other words, for full performance, the external ports on a PCIe card or device need to match the external electrical and mechanical card type and vice versa.
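The mechanical vs. electrical rules above can be sketched as a couple of small checks. This is a simplified model, not a description of PCIe link training: it assumes a card only fits when the slot is mechanically at least as wide as the card (open-ended slots are ignored), that both sides are the same generation, and that the link settles on the smaller of the card's lanes and the slot's wired lanes. The function names are hypothetical.

```python
def negotiated_lanes(card_lanes, slot_mechanical, slot_electrical):
    """Return the lane count a card can use in a slot, or 0 if it does not fit.

    Simplifying assumptions: the slot must be mechanically at least as wide
    as the card, and the link uses the smaller of the card's lanes and the
    lanes the slot is electrically wired for.
    """
    if card_lanes > slot_mechanical:
        return 0
    return min(card_lanes, slot_electrical)


def ports_get_full_bandwidth(port_lanes, ports, usable_lanes):
    """True when the host link has at least as many lanes as all external ports."""
    return usable_lanes >= port_lanes * ports


if __name__ == "__main__":
    # x4 card in an x8 slot: runs at x4.
    print(negotiated_lanes(4, 8, 8))
    # x16 card in an x16 mechanical slot wired for x8: runs at x8.
    print(negotiated_lanes(16, 16, 8))
    # Dual x4 external ports behind an x8 card in an x8 slot wired for x4.
    usable = negotiated_lanes(8, 8, 4)
    print(ports_get_full_bandwidth(4, 2, usable))
```

The last case is the one to watch for: the card seats fine in the slot, yet the dual x4 ports cannot both run at full speed because only four lanes reach the processor.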

Recall big “B” as in Bytes vs. little “b” as in bits; for example, a PCIe Gen 3 x4 electrical could provide up to 4 GB/s bandwidth (your mileage and performance will vary), which translates to 8 × 4 GB/s or 32 Gbits/s. In the following table, there is a mix of big “B” Bytes per second and small “b” bits per second.

Each generation of PCIe has improved on the previous one by increasing the effective speed of the links. Some of the speed improvements have come from faster clock rates while implementing lower overhead encoding (e.g., from 8 b/10 b to 128 b/130 b).

For example, the PCIe Gen 3 raw bit or line rate is 8 GT/s, or 8 Gbps per lane, which works out to about 1 GBps in each direction (about 2 GBps full duplex) by using a 128 b/130 b encoding scheme that is very efficient compared to PCIe Gen 2 or Gen 1, which used an 8 b/10 b encoding scheme. With 8 b/10 b, there is a 20% overhead vs. a 1.5% overhead with 128 b/130 b (i.e., of 130 bits sent, 128 bits contain data, and 2 bits are for overhead).

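|                             | PCIe Gen 1 | PCIe Gen 2 | PCIe Gen 3  | PCIe Gen 4  | PCIe Gen 5  |
|-----------------------------|------------|------------|-------------|-------------|-------------|
| Raw bit rate                | 2.5 GT/s   | 5 GT/s     | 8 GT/s      | 16 GT/s     | 32 GT/s     |
| Encoding                    | 8 b/10 b   | 8 b/10 b   | 128 b/130 b | 128 b/130 b | 128 b/130 b |
| x1 Lane bandwidth           | 2 Gb/s     | 4 Gb/s     | 8 Gb/s      | 16 Gb/s     | 32 Gb/s     |
| x1 Single lane (one-way)    | ~250 MB/s  | ~500 MB/s  | ~1 GB/s     | ~2 GB/s     | ~4 GB/s     |
| x16 Full duplex (both ways) | ~8 GB/s    | ~16 GB/s   | ~32 GB/s    | ~64 GB/s    | ~128 GB/s   |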

Above Table: PCIe Generation and Sample Lane Comparison
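As a rough cross-check of the generation table, the following sketch derives effective bandwidth from the raw bit rate and the encoding scheme. It only accounts for encoding overhead (no packet or protocol overhead), so treat the results as upper bounds; the dictionary and function names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

# generation -> (raw rate in GT/s, data bits per block, total bits per block)
GENERATIONS = {
    1: (Fraction(5, 2), 8, 10),
    2: (Fraction(5), 8, 10),
    3: (Fraction(8), 128, 130),
    4: (Fraction(16), 128, 130),
    5: (Fraction(32), 128, 130),
}


def lane_gbits(generation):
    """Effective one-way Gb/s for a single lane after encoding overhead."""
    raw, data_bits, total_bits = GENERATIONS[generation]
    return raw * data_bits / total_bits


def link_gbytes(generation, lanes, full_duplex=False):
    """Effective GB/s for a link; full duplex counts both directions."""
    one_way = lane_gbits(generation) * lanes / 8
    return one_way * 2 if full_duplex else one_way


if __name__ == "__main__":
    for gen in GENERATIONS:
        print(
            f"Gen {gen}: x1 {float(lane_gbits(gen)):.2f} Gb/s, "
            f"x1 one-way {float(link_gbytes(gen, 1)) * 1000:.0f} MB/s, "
            f"x16 full duplex {float(link_gbytes(gen, 16, True)):.1f} GB/s"
        )
```

With 8 b/10 b the Gen 1 numbers come out exactly as in the table (2 Gb/s per lane, 8 GB/s for x16 full duplex). For Gen 3 the calculation gives about 7.88 Gb/s per lane and about 31.5 GB/s for x16 full duplex, which the table rounds to 8 Gb/s and ~32 GB/s.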

Note that PCIe Gen 3 is the currently generally available shipping technology with PCIe Gen 4 appearing in the not so distant future, with PCIe Gen 5 in the wings appearing a few more years down the road.

By comparison, older generations of Fibre Channel and Ethernet also used 8 b/10 b, and switched over to 64 b/66 b encoding at 10 Gb and higher. PCIe, like other serial interfaces and protocols, can support full-duplex mode, meaning that data can be sent and received concurrently.

PCIe Bit Rate, Encoding, Giga Transfers, and Bandwidth

Let’s clarify something about data transfer or movement both internal and external to a server. At the core of a server, there is data movement within the sockets of the processors and its cores, as well as between memory and other devices (internal and external). For example, the QPI bus is used for moving data between some Intel processors whose performance is specified in giga transfers (GT).

PCIe is used for moving data between processors, memory, and other devices, including internal and external facing devices. Devices include host bus adapters (HBAs), host channel adapters (HCAs), converged network adapters (CNAs), network interface cards (NICs) or RAID cards, and others. PCIe performance is specified in multiple ways, given that it has a server processor focus which involves GT for raw bit rate as well as effective bandwidth per lane.

Keep in perspective PCIe mechanical as well as electrical lanes: a card or slot may be advertised as, say, x8 mechanical (i.e., its physical slot form factor) yet only be x4 electrical (how many of those lanes are used or enabled). Also, in the case of an adapter that has two or more ports, if the device is advertised as x8, does that mean it is x8 per port, or x4 per port with an x8 connection to the PCIe bus?

Effective bandwidth per lane can be specified as half- or full-duplex (data moving in one or both directions for send and receive). Also, effective bandwidth can be specified as a single lane (x1), four lanes (x4), eight lanes (x8), sixteen lanes (x16), or 32 lanes (x32), as shown in the above table. The difference in speed or bits moved per second between the raw bit or line rate, and the effective bandwidth per lane in a single direction (i.e., half-duplex) is the encoding that is common to all serial data transmissions.

When data gets transmitted, the serializer/deserializer, or serdes, converts the bytes into a bit stream via encoding. There are different types of encoding, ranging from 8 b/10 b to 64 b/66 b and 128 b/130 b, shown in the following table.

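| Encoding Scheme | Overhead | Single 1542-byte frame: Data Bits | Single 1542-byte frame: Encoding Bits | Single 1542-byte frame: Bits Transmitted | 64 × 1542-byte frames: Data Bits | 64 × 1542-byte frames: Encoding Bits | 64 × 1542-byte frames: Bits Transferred |
|-----------------|----------|-----------------------------------|---------------------------------------|------------------------------------------|----------------------------------|--------------------------------------|-----------------------------------------|
| 8 b/10 b        | 20%      | 12,336                            | 3,084                                 | 15,420                                   | 789,504                          | 197,376                              | 986,880                                 |
| 64 b/66 b       | 3%       | 12,336                            | 386                                   | 12,738                                   | 789,504                          | 24,672                               | 814,176                                 |
| 128 b/130 b     | 1.5%     | 12,336                            | 194                                   | 12,610                                   | 789,504                          | 12,336                               | 801,840                                 |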

Above Table: Low-Level Serial Encoding Data Transmit Efficiency

In these encoding schemes, the smaller number represents the amount of data being sent, and the difference is the overhead. Note that this is different yet related to what occurs at a higher level with the various network protocols such as TCP/IP (IP). With IP, there is a data payload plus addressing and other integrity and management features in a given packet or frame.

The 8-b/10-b, 64-b/66-b or 128-b/130-b encoding is at the lower physical layer. Thus, a small change there has a big impact and benefit when optimized. Table 4.2 in the book (the above table) shows comparisons of various encoding schemes using the example of moving a single 1542-byte packet or frame, as well as sending (or receiving) 64 packets or frames that are 1542 bytes in size.
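The figures in the encoding efficiency table can be reproduced with a few lines of arithmetic. The sketch below assumes that a partially filled final block is padded out to a full block, which is the assumption that makes the single-frame rows add up (for example, 193 blocks of 66 bits for 64 b/66 b). The names `SCHEMES` and `encode_cost` are hypothetical.

```python
FRAME_BYTES = 1542

# scheme name -> (data bits per block, total bits per block)
SCHEMES = {"8b/10b": (8, 10), "64b/66b": (64, 66), "128b/130b": (128, 130)}


def encode_cost(payload_bytes, data_bits, block_bits):
    """Return (data bits, encoding bits, bits on the wire) for a payload.

    Assumes a partially filled final block is padded to a full block.
    """
    payload_bits = payload_bytes * 8
    blocks = -(-payload_bits // data_bits)  # integer ceiling division
    overhead_bits = blocks * (block_bits - data_bits)
    return payload_bits, overhead_bits, blocks * block_bits


if __name__ == "__main__":
    for frames in (1, 64):
        print(f"{frames} x {FRAME_BYTES}-byte frame(s)")
        for name, (data_bits, block_bits) in SCHEMES.items():
            data, overhead, wire = encode_cost(FRAME_BYTES * frames, data_bits, block_bits)
            print(f"  {name:>9}: data {data:,}  encoding {overhead:,}  wire {wire:,}")
```

Running it for a single frame and for 64 frames yields the same data, encoding, and transmitted bit counts as the table, which makes it easy to try other frame sizes (such as jumbo frames) and see how the overhead scales.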

Why 1542? That is a standard IP packet including data and protocol framing without using jumbo frames (MTU or maximum transmission units).

What does this have to do with PCIe? GbE, 10-GbE, 40-GbE, and other physical interfaces that are used for moving TCP/IP packets and frames interface with servers via PCIe.

This encoding is important as part of server storage I/O tradecraft regarding understanding the impact of performance and network or resource usage. It also means understanding why there are fewer bits per second of effective bandwidth (independent of compression or deduplication) vs. line rate in either half- or full-duplex mode.

Another item to note is that looking at encoding such as the example given in the above table shows how a relatively small change can have a big effective benefit at large scale. If the bits and bytes encoding efficiency and effectiveness scenario in the above table does not make sense, then try imagining 13 MINI Cooper automobiles, each with eight people in it (yes, that would be a tight fit), end to end on the same road.

Now imagine a large bus that takes up much less length on the road than the 13 MINI Coopers. The bus holds 128 people, who would still be crowded but nowhere near as cramped as eight people in a MINI, plus 24 additional people can be carried on the bus. That is an example of applying basic 8-b/10-b encoding (the MINI) vs. applying 128-b/130-b encoding (the bus) and is also similar to PCIe G3 and G4, which use 128-b/130-b encoding for data movement.

PCIe Topologies

The basic PCIe topology configuration has one or more devices attached to the root complex shown in the following figure via an AiC or onboard device connector. Examples of AiC and motherboard-mounted devices that attach to PCIe root include LAN or SAN HBA, networking, RAID, GPU, NVM or SSD, among others. At system start-up, the server initializes the PCIe bus and enumerates the devices found with their addresses.

PCIe devices attach (shown in the following figure) to a bus that communicates with the root complex, which connects with processor CPUs and memory. At the other end of a PCIe device is an end-point target, or a PCIe switch that in turn has end-point targets attached. From a software standpoint, hypervisor or operating system device drivers communicate with the PCIe devices, which in turn send or receive data or perform other functions.

SDDC, SDI, SDDI PCIe fundamentals
Basic PCIe root complex with a PCIe switch or expander.

Note that in addition to PCIe AiC such as HBAs, GPU, and NVM SSD, among others that install into PCIe slots, servers also have converged storage or disk drive enclosures that support a mix of SAS, SATA, and PCIe. These enclosure backplanes have a connector that attaches to a SAS or SATA onboard port, or a RAID card, as well as to a PCIe riser card or motherboard connector. Depending on what type of drive is installed in the connector, either the SAS, SATA, or NVMe (AiC, U.2, and M.2) PCIe communication paths are used.

In addition to traditional and switched PCIe, using PCIe switches as well as nontransparent bridging (NTB), various other configurations can be deployed. These include server to server for clustering, failover, or device sharing as well as fabrics. Note that this also means that while traditionally found inside a server, PCIe can today be extended across servers within a rack or cabinet using extenders, retimers, and repeaters.

A nontransparent bridge (NTB) is a point-to-point connection between two PCIe-based systems that provides electrical isolation yet functions as a transport bridge between two different address domains. Hosts on either side of the NTB see their respective memory or I/O address space. The NTB presents an endpoint exposed to the local system where writes are mirrored to memory on the remote system to allow the systems to communicate and share devices using associated device drivers. For example, in the following figure, two servers, each with a unique PCIe root complex, address, and memory map, are shown using NTB to enable communication between the systems while maintaining data integrity.

SDDC, SDI, SDDI PCIe two server fundamentals
PCIe dual server example using NTB along with switches.

General PCIe considerations (slots and devices) include:

  • Power consumption (and heat dissipation)
  • Physical and software plug-and-play (good interoperability)
  • Drivers (in-the-box, built into the OS, or add-in)
  • BIOS, UEFI, and firmware being current versions
  • Power draw per card or adapters
  • Type of processor, socket, and support chip (if not an onboard processor)
  • Electrical signal (lanes) and mechanical form factor per slot
  • Nontransparent bridge and root port (RP)
  • PCI multi-root (MR), single-root (SR), and hot plug
  • PCIe expansion chassis (internal or external)
  • External PCIe shared storage

Various operating system and hypervisor commands are available for viewing and managing PCIe devices. For example, on Linux, the “lspci” and “lshw -c pci” commands display PCIe devices and associated information. On a VMware ESXi host, the “esxcli hardware pci list” command will show various PCIe devices and information, while on Microsoft Windows systems, “device manager” (GUI) or “devcon” (command line) will show similar information.
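On Linux, the negotiated link speed and width can also be read from sysfs, which is a handy way to spot a card running below its maximum (for example, an x8 card in a slot wired for x4). The following is a minimal sketch, assuming a kernel that exposes the `current_link_speed`, `current_link_width`, `max_link_speed`, and `max_link_width` attributes under `/sys/bus/pci/devices`; devices without those attributes are skipped, and the helper names are hypothetical.

```python
from pathlib import Path


def read_attr(device_dir, name):
    """Return a sysfs attribute as stripped text, or None if it is unavailable."""
    try:
        return (device_dir / name).read_text().strip()
    except OSError:
        return None


def pcie_links(sysfs_root="/sys/bus/pci/devices"):
    """Yield (address, current speed, current width, max speed, max width)."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return
    for dev in sorted(root.iterdir()):
        cur_width = read_attr(dev, "current_link_width")
        if cur_width is None:  # attribute not exposed for this device
            continue
        yield (
            dev.name,
            read_attr(dev, "current_link_speed"),
            cur_width,
            read_attr(dev, "max_link_speed"),
            read_attr(dev, "max_link_width"),
        )


if __name__ == "__main__":
    for addr, cur_speed, cur_width, max_speed, max_width in pcie_links():
        below = (cur_speed, cur_width) != (max_speed, max_width)
        note = "  <-- below device maximum" if below else ""
        print(f"{addr}: x{cur_width} @ {cur_speed} (max x{max_width} @ {max_speed}){note}")
```

A link reported below its maximum is not necessarily a problem, since some devices lower link speed to save power when idle, but it is a useful prompt to check the slot's electrical wiring and generation.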

Who Are Some PCIe Fundamentals Vendors and Service Providers

While not an exhaustive list, here is a sampling of vendors and service providers involved in various ways with PCIe, from solutions to components to services to trade groups: Amphenol (connectors and cables), AWS (cloud data infrastructure services), Broadcom (PCIe components), Cisco (servers), DataOn (servers), Dell EMC (servers, storage, software), E8 (storage software), Excelero (storage software), HPE (storage, servers), Huawei (storage, servers), IBM, Intel (storage, servers, adapters), and Keysight (test equipment and tools).

Others include Lenovo (servers), Liqid (composable data infrastructure), Mellanox (server and storage adapters), Micron (storage devices), Microsemi (PCIe components), Microsoft (Cloud and Software including S2D), Molex (connectors, cables), NetApp, NVMexpress.org (NVM Express trade group organization), Open Compute Project (server, storage, I/O network industry group), Oracle, PCISIG (PCIe industry trade group), Samsung (storage devices), ScaleMP (composable data infrastructure), Seagate (storage devices), SNIA (industry trade group), Supermicro (servers), Tidal (composable data infrastructure), Hitachi Vantara (formerly known as HDS), VMware (Software including vSAN), and WD, among others.

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

PCIe fundamentals are resources for building legacy and software-defined data infrastructures (SDDI), software-defined infrastructures (SDI), data centers and other deployments from laptop to large scale, hyper-scale cloud service providers. Learn more about Servers: Physical, Virtual, Cloud, and Containers in chapter 4 of my new book Software Defined Data Infrastructure Essentials (CRC Press 2017) Available via Amazon.com and other global venues. Meanwhile, PCIe fundamentals continues to evolve as a Server, Storage, I/O networking fundamental component.

Ok, nuff said, for now.
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.

Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO. All Rights Reserved.

Chelsio Storage over IP and other Networks Enable Data Infrastructures

Chelsio Storage over IP Enable Data Infrastructures

server storage I/O data infrastructure trends

Chelsio and Storage over IP (SoIP) continue to enable Data Infrastructures from legacy to software defined virtual, container, cloud as well as converged. This past week I had a chance to visit with Chelsio to discuss data infrastructures, server storage I/O networking along with other related topics. More on Chelsio later in this post; for now, let's take a quick step back and refresh what SoIP (Storage over IP) is, along with Storage over Ethernet (among other networks).

Data Infrastructures Protect Preserve Secure and Serve Information
Various IT and Cloud Infrastructure Layers including Data Infrastructures

Server Storage over IP Revisited

There are many variations of SoIP, from network attached storage (NAS) file-based processing including NFS and SAMBA/SMB (aka Windows file sharing), among others. In addition, there are various block protocols such as SCSI over IP (e.g., iSCSI), along with object access via HTTP/HTTPS, not to mention the buzzword bingo list of RoCE, iSER, iWARP, RDMA, DPDK, FTP, FCoE, iFCP, and SMB3 Direct, to name a few.

Who is Chelsio

For those who are not aware or need a refresher, Chelsio is involved with enabling server storage I/O by creating ASICs (Application Specific Integrated Circuits) that perform various functions, offloading them from the host server processor. What this means for some is a throwback to the early 2000s TCP Offload Engine (TOE) era, when various processing to handle regular TCP/IP along with iSCSI and other storage over Ethernet and IP could be accelerated.

Chelsio data infrastructure focus

Chelsio ecosystem across different data infrastructure focus areas and application workloads

As seen in the image above, certainly there is a server and storage I/O network play with Chelsio, along with traffic management, packet inspection, security (encryption, SSL and other offload), traditional, commercial, web, high performance compute (HPC) along with high profit or productivity compute (the other HPC). Chelsio also enables data infrastructures that are part of physical bare metal (BM), software defined virtual, container, cloud, serverless among others.

Chelsio server storage I/O focus

The above image shows how Chelsio enables initiators on server and storage appliances as well as targets via various storage over IP (or Ethernet) protocols.

Chelsio enabling various data center resources

Chelsio also plays in several different sectors from *NIX to Windows, Cloud to Containers, Various processor architectures and hypervisors.

Chelsio ecosystem

Besides diverse server storage I/O enabling capabilities across various data infrastructure environments, what caught my eye with Chelsio is how far they, and storage over IP have progressed over the past decade (or more). Granted there are faster underlying networks today, however the offload and specialized chip sets (e.g. ASICs) have also progressed as seen in the above and next series of images via Chelsio.

The above shows TCP and UDP acceleration, while the following shows Microsoft SMB 3.1.1 performance, something important for Storage Spaces Direct (S2D) and Windows-based Converged Infrastructure (CI) along with Hyper-Converged Infrastructure (HCI) deployments.

Chelsio software environments

Something else that caught my eye was iSCSI performance, which in the following shows 4 initiators accessing a single target doing about 4 million IOPS (reads and writes) across various sizes and configurations. Granted, that is with a 100Gb network interface; however, it also shows that potential bottlenecks are removed, enabling that faster network to be more effectively used.

Chelsio server storage I/O performance

Moving on from TCP, UDP and iSCSI, NVMe and in particular NVMe over Fabric (NVMeoF) have become popular industry topics, so check out the following. One of my comments to Chelsio is to add host or server CPU usage to the following chart to help show the story and value proposition of NVMe in general: doing more I/O activity while consuming fewer server-side resources. Let's see what they put out in the future.

Chelsio

Ok, so Chelsio does storage over IP, storage over Ethernet and other interfaces, accelerating performance as well as regular TCP and UDP activity. One of the other benefits of what Chelsio and others are doing with their ASICs (or FPGAs, for some) is to also offload processing for security, among other topics. Given the increased focus around server storage I/O and data infrastructure security, from encryption to SSL and related usage that requires more resources, these new ASICs such as those from Chelsio help to offload various specialized processing from the server.

The customer benefit is that more productive application work can be done by their servers (or storage appliances). For example, if you have a database server, that means more productive database transactions per second per licensed software. Put another way, want to get more value out of your Oracle, Microsoft, or other vendors' software licenses? Simple: get more work done per licensed server by offloading and eliminating waits or other bottlenecks.

Using offloads and removing server bottlenecks might seem like common sense; however, I’m still amazed at the number of organizations that are more focused on getting extra value out of their hardware vs. getting value out of their software licenses (which might be more expensive).

Chelsio

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

Data Infrastructures Protect Preserve Secure and Serve Information
Various IT and Cloud Infrastructure Layers including Data Infrastructures

What This All Means

Data Infrastructures exist to protect, preserve, secure and serve information along with the applications and data they depend on. With more data being created at a faster rate, along with the size of data becoming larger, increased application functionality to transform data into information means more demands on data infrastructures and their underlying resources.

This means more server I/O to storage system and other servers, along with increased use of SoIP as well as storage over Ethernet and other interfaces including NVMe. Chelsio (and others) are addressing the various application and workload demands by enabling more robust, productive, effective and efficient data infrastructures.

Check out Chelsio and how they are enabling storage over IP (SoIP) to enable Data Infrastructures from legacy to software defined virtual, container, cloud as well as converged. Oh, and thanks to Chelsio for letting me use the above images.

Ok, nuff said, for now.
Gs

Greg Schulz – Multi-year Microsoft MVP Cloud and Data Center Management, VMware vExpert (and vSAN). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.


March 2017 Server StorageIO Data Infrastructure Update Newsletter

Volume 17, Issue III

Hello and welcome to the March 2017 issue of the Server StorageIO update newsletter.

First, a reminder: World Backup (and Recovery) Day is on March 31. Following up from the February Server StorageIO update newsletter that had a focus on data protection, this edition includes some additional posts, articles, tips and commentary below.

Other data infrastructure (and tradecraft) topics in this edition include cloud, virtual, server, storage and I/O including NVMe as well as networks. Industry trends include new technology and services announcements, cloud services, HPE buying Nimble among other activity. Check out the Converged Infrastructure (CI), Hyper-Converged (HCI) and Cluster in Box (or Cloud in Box) coverage including a recent SNIA webinar I was invited to be the guest presenter for, along with companion post below.

In This Issue

Enjoy this edition of the Server StorageIO update newsletter.

Cheers GS

Data Infrastructure and IT Industry Activity Trends

Some recent Industry Activities, Trends, News and Announcements include:

Dell EMC has discontinued the NVMe direct attached shared DSSD D5 all flash array. At about the same time that Dell EMC is shutting down the DSSD D5 product, it has also signaled that it will leverage the various technologies, including NVMe, across its broad server storage portfolio in different ways moving forward. While Dell EMC is shutting down DSSD D5, they are also bringing additional NVMe solutions to the market, including those they have been shipping for years (e.g. on the server-side). Learn more about DSSD D5 here and here including perspectives of how it could have been used (plays for playbooks).

Meanwhile NVMe industry activity continues to expand with different solutions from startups and established vendors such as E8, Excelero, Everspin, Intel, Mellanox, Micron, Samsung and WD SanDisk among others. Also keep in mind, if the answer is NVMe, then what were and are the questions to ask, as well as what are some easy to use benchmark scripts (using fio, diskspd, vdbench, iometer)?

Speaking of NVMe, flash and SSDs, Amazon Web Services (AWS) has added new Elastic Compute Cloud (EC2) storage and I/O optimized i3 instances. These new instances are available in various configurations with different amounts of vCPU (cores or logical processors), memory and NVMe SSD capacities (and quantity) along with price.

Note that the price per i3 instance varies not only by its configuration, but also by image and the region it is deployed in. The flash SSD capacities range from an entry-level (i3.large) with 2 vCPU (logical processors), 15.25GB of RAM and a single 475GB NVMe SSD that, for example, in the US East Region was recently priced at $0.156 per hour. At the high-end there is the i3.16xlarge with 64 vCPU (logical processors), 488GB RAM and 8 x 1900GB NVMe SSDs with a recent US East Region price of $4.992 per hour. Note that vCPU refers to the number of logical processors available and not necessarily cores or sockets.

Also note that your performance will vary, and while NVMe protocol tends to use less CPU per I/O, if generating a large number of I/Os you will need some CPU. What this means is that if you find your performance limited compared to expectations with the lower end i3 instances, move up to a larger instance and see what happens. If you have a Windows-based environment, you can use a tool such as Diskspd to see what happens with I/O performance as you decrease the number of CPUs used.

Chelsio has announced they are now Microsoft Azure Stack Certified with their iWARP RDMA host adapter solutions, as well as for converged infrastructure (CI), hyper-converged (HCI) and legacy server storage deployments. As part of the announcement, Chelsio is also offering a 30 day no cost trial of their adapters for Microsoft Azure Stack, Windows Server 2016 and Windows 10 client environments. Learn more about the Chelsio trial offer here.

Everspin (the MRAM Spintorque, persistent RAM folks) has announced a new Storage Class Memory (SCM) NVMe accessible family (nvNITRO) of storage accelerator devices (PCIe AiC, U.2). What's interesting about Everspin is that they are using NVMe for accessing their persistent RAM (e.g. MRAM), making it easily plug compatible with existing operating systems or hypervisors. This means using standard out of the box NVMe drivers where the Everspin SCM appears as a block device (for compatibility) functioning as a low latency, high performance persistent write cache.

Something else interesting, besides making the new memory compatible with the existing server CPU complex via PCIe, is how Everspin is demonstrating that NVMe as a general access protocol is not exclusive to nand flash-based SSDs. What this means is that instead of using non-persistent DRAM, or slower NAND flash (or 3D XPoint SCM), Everspin nvNITRO enables a high endurance write cache with persistence to complement existing NAND flash as well as emerging 3D XPoint based storage. Keep an eye on Everspin as they are doing some interesting things for future discussions.

Google Cloud Services has added additional regions (cloud locations) and other enhancements.

HPE continued buying into server storage I/O data infrastructure technologies, announcing an all cash (e.g. no stock) acquisition of Nimble Storage (NMBL). The cash acquisition for a little over $1B USD amounts to $12.50 USD per Nimble share, double what it had traded at. As a refresh, or overview, Nimble is an all flash shared storage system that leverages NAND flash solid state device (SSD) performance. Note that Nimble also partners with Cisco and Lenovo platforms that compete with HPE servers for converged systems. View additional perspectives here.

Riverbed has announced the release of Steelfusion 5. While its name implies physical hardware metal, the solution is available as tin wrapped software (e.g. a hardware appliance). The solution is also available for deployment as a VMware virtual appliance for remote office branch office (ROBO) among others. Enhancements include converged functionality such as NAS support, along with network latency and bandwidth improvements among other features.

Check out other industry news, comments, trends perspectives here.

Server StorageIOblog Posts

Recent and popular Server StorageIOblog posts include:

View other recent as well as past StorageIOblog posts here

Server StorageIO Commentary in the news

Recent Server StorageIO industry trends perspectives commentary in the news.

Via InfoStor: 8 Big Enterprise SSD Trends to Expect in 2017
Watch for increased capacities at lower cost, differentiation awareness of high-capacity, low-cost and lower performing SSDs versus improved durability and performance along with cost capacity enhancements for active SSD (read and write optimized). You can also expect increased support for NVMe both as a back-end storage device with different form factors (e.g., M.2 gum sticks, U.2 8639 drives, PCIe cards) as well as front-end (e.g., storage systems that are NVMe-attached) including local direct-attached and fiber-attached. This means more awareness around NVMe both as front-end and back-end deployment options.

Via SearchITOperations: Storage performance bottlenecks
Sometimes it takes more than an aspirin to cure a headache. There may be a bottleneck somewhere else, in hardware, software, storage system architecture or something else.

Via SearchDNS: Parsing through the software-defined storage hype
Beyond scalability, SDS technology aims for freedom from the limits of proprietary hardware.

Via InfoStor: Data Storage Industry Braces for AI and Machine Learning
AI could also lead to untapped hidden or unknown value in existing data that has no or little perceived value

Via SearchDataCenter: New options to evolve data backup recovery

View more Server, Storage and I/O trends and perspectives comments here

Various Tips, Tools, Technology and Tradecraft Topics

Recent Data Infrastructure Tradecraft Articles, Tips, Tools, Tricks and related topics.

Via ComputerWeekly: Time to restore from backup: Do you know where your data is?
Via IDG/NetworkWorld: Ensure your data infrastructure remains available and resilient
Via IDG/NetworkWorld: Whats a data infrastructure?

Check out Scott Lowe @Scott_Lowe of VMware fame who while having a virtual networking focus has a nice roundup of related data infrastructure topics cloud, open source among others.

Want to take a break from reading or listening to tech talk? Check out some of the fun videos, including aerial drone footage (and some technology topics), at www.storageio.tv.

View more tips and articles here

Events and Activities

Recent and upcoming event activities.

May 8-10, 2017 – Dell EMCworld – Las Vegas

April 3-7, 2017 – Seminars – Dutch workshop seminar series – Nijkerk Netherlands

March 15, 2017 – Webinar – SNIA/BrightTalk HyperConverged and Storage – 10AM PT

January 26 2017 – Seminar – Presenting at Wipro SDx Summit London UK

See more webinars and activities on the Server StorageIO Events page here.


Cheers
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert (and vSAN). Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier) and twitter @storageio. Watch for the spring 2017 release of his new book Software-Defined Data Infrastructure Essentials (CRC Press).

Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO. All Rights Reserved.

NVMe Place NVM Non Volatile Memory Express Resources

Updated 8/31/19
NVMe place server Storage I/O data infrastructure trends

Welcome to NVMe place NVM Non Volatile Memory Express Resources. NVMe place is about Non Volatile Memory (NVM) Express (NVMe) with Industry Trends Perspectives, Tips, Tools, Techniques, Technologies, News and other information.

Disclaimer

Please note that this NVMe place resources site is independent of the industry trade and promoters group NVM Express, Inc. (e.g. www.nvmexpress.org). NVM Express, Inc. is the sole owner of the NVM Express specifications and trademarks.

NVM Express Organization
Image used with permission of NVM Express, Inc.

Visit the NVM Express industry promoters site here to learn more about their members, news, events, product information, software driver downloads, and other useful NVMe resources content.

 

The NVMe Place resources and NVM including SCM, PMEM, Flash

NVMe place covers Non Volatile Memory (NVM), including nand flash, storage class memories (SCM) and persistent memories (PM), which are storage memory mediums, while NVM Express (NVMe) is an interface for accessing NVM. This NVMe resources page is a companion to The SSD Place which has a broader Non Volatile Memory (NVM) focus including flash among other SSD topics. NVMe is a new server storage I/O access method and protocol for fast access to NVM based storage and memory technologies. NVMe is an alternative to existing block based server storage I/O access protocols such as AHCI/SATA and SCSI/SAS commonly used for accessing Hard Disk Drives (HDD) along with SSD among other things.

Server Storage I/O NVMe PCIe SAS SATA AHCI
Comparing AHCI/SATA, SCSI/SAS and NVMe all of which can coexist to address different needs.

Leveraging the standard PCIe hardware interface, NVMe based devices (that have an NVMe controller) can be accessed via various operating systems (and hypervisors such as VMware ESXi) with either in the box drivers or optional third-party device drivers. Devices that support NVMe can be packaged in a 2.5″ drive format that uses a converged 8637/8639 connector (e.g. PCIe x4) coexisting with SAS and SATA devices, as well as PCIe add-in cards (AIC) supporting x4, x8 and other implementations. Initially, NVMe is being positioned as a back-end interface for servers (or storage systems) to access fast flash and other NVM based devices.

NVMe as back-end storage
NVMe as a “back-end” I/O interface for NVM storage media

NVMe as front-end server storage I/O interface
NVMe as a “front-end” interface for servers or storage systems/appliances

NVMe has also been shown to work over low latency, high-speed RDMA based network interfaces including RoCE (RDMA over Converged Ethernet) and InfiniBand (read more here, here and here involving Mangstor, Mellanox and PMC among others). What this means is that, like SCSI based SAS which can be both a back-end drive (HDD, SSD, etc) access protocol and interface, NVMe can be used for back-end access as well as a front-end server to storage interface, like how Fibre Channel SCSI_Protocol (aka FCP), SCSI based iSCSI, SCSI RDMA Protocol via InfiniBand (among others) are used.

NVMe features

Main features of NVMe include among others:

  • Lower latency due to improved drivers and increased queues (and queue sizes)
  • Lower CPU used to handle a larger number of I/Os (more CPU available for useful work)
  • Higher I/O activity rates (IOPs) to boost productivity and unlock the value of fast flash and NVM
  • Bandwidth improvements leveraging various fast PCIe interface and available lanes
  • Dual-pathing of devices like what is available with dual-path SAS devices
  • Unlock the value of more cores per processor socket and software threads (productivity)
  • Various packaging options, deployment scenarios and configuration options
  • Appears as a standard storage device on most operating systems
  • Plug-play with in-box drivers on many popular operating systems and hypervisors
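As an illustration of NVMe devices showing up through in-box drivers, the following Python sketch lists NVMe controllers on a Linux system by reading the attributes the kernel driver exposes under /sys/class/nvme. It assumes a Linux host with the standard nvme driver loaded; the function name is my own, and on other operating systems (or with no NVMe devices present) it simply returns an empty list.

```python
from pathlib import Path


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return model, serial and firmware details for each NVMe controller found."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, driver not loaded, or no NVMe devices
    controllers = []
    for ctrl in sorted(root.iterdir()):
        info = {"name": ctrl.name}
        for attr in ("model", "serial", "firmware_rev"):
            attr_file = ctrl / attr
            # Attributes are small text files; missing ones are reported as None.
            info[attr] = attr_file.read_text().strip() if attr_file.is_file() else None
        controllers.append(info)
    return controllers


for controller in list_nvme_controllers():
    print(controller)
```

The namespaces behind each controller then appear as regular block devices (for example /dev/nvme0n1), which is what lets existing file systems and tools use them without changes.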

Shared external PCIe using NVMe
NVMe and shared PCIe (e.g. shared PCIe flash DAS)

NVMe related content and links

The following are some of my tips, articles, blog posts, presentations and other content, along with material from others pertaining to NVMe. Keep in mind that the question should not be if NVMe is in your future, rather when, where, with what, from whom and how much of it will be used as well as how it will be used.

  • How to Prepare for the NVMe Server Storage I/O Wave (Via Micron.com)
  • Why NVMe Should Be in Your Data Center (Via Micron.com)
  • NVMe U2 (8639) vs. M2 interfaces (Via Gamersnexus)
  • Enmotus FuzeDrive MicroTiering (StorageIO Lab Report)
  • EMC DSSD D5 Rack Scale Direct Attached Shared SSD All Flash Array Part I (Via StorageIOBlog)
  • Part II – EMC DSSD D5 Direct Attached Shared AFA (Via StorageIOBlog)
  • NAND, DRAM, SAS/SCSI & SATA/AHCI: Not Dead, Yet! (Via EnterpriseStorageForum)
  • Non Volatile Memory (NVM), NVMe, Flash Memory Summit and SSD updates (Via StorageIOblog)
  • Microsoft and Intel showcase Storage Spaces Direct with NVM Express at IDF ’15 (Via TechNet)
  • MNVM Express solutions (Via SuperMicro)
  • Gaining Server Storage I/O Insight into Microsoft Windows Server 2016 (Via StorageIOblog)
  • PMC-Sierra Scales Storage with PCIe, NVMe (Via EEtimes)
  • RoCE updates among other items (Via InfiniBand Trade Association (IBTA) December Newsletter)
  • NVMe: The Golden Ticket for Faster Flash Storage? (Via EnterpriseStorageForum)
  • What should I consider when using SSD cloud? (Via SearchCloudStorage)
  • MSP CMG, Sept. 2014 Presentation (Flash back to reality – Myths and Realities – Flash and SSD Industry trends perspectives plus benchmarking tips)– PDF
  • Selecting Storage: Start With Requirements (Via NetworkComputing)
  • PMC Announces Flashtec NVMe SSD NVMe2106, NVMe2032 Controllers With LDPC (Via TomsITpro)
  • Exclusive: If Intel and Micron’s “Xpoint” is 3D Phase Change Memory, Boy Did They Patent It (Via Dailytech)
  • Intel & Micron 3D XPoint memory — is it just CBRAM hyped up? Curation of various posts (Via Computerworld)
  • How many IOPS can a HDD, HHDD or SSD do (Part I)?
  • How many IOPS can a HDD, HHDD or SSD do with VMware? (Part II)
  • I/O Performance Issues and Impacts on Time-Sensitive Applications (Via CMG)
  • Via EnterpriseStorageForum: 5 Hot Storage Technologies to Watch
  • Via EnterpriseStorageForum: 10-Year Review of Data Storage

Non-Volatile Memory (NVM) Express (NVMe) continues to evolve as a technology for enabling and improving server storage I/O for NVM including nand flash SSD storage. NVMe streamlines performance, enabling more work to be done (e.g. IOPs) and more data to be moved (bandwidth) at a lower response time using less CPU.

NVMe and SATA flash SSD performance

The above figure is a quick look comparing nand flash SSD being accessed via SATA III (6Gbps) on the left and NVMe (x4) on the right. As with any server storage I/O performance comparison there are many variables, so take the results with a grain of salt. While IOPs and bandwidth are often discussed, keep in mind that the new protocol, drivers and device controllers with NVMe streamline I/O, so less CPU is needed.
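For a rough sense of the interface ceilings behind that comparison, here is a back of the envelope calculation in Python. It assumes the NVMe device sits on PCIe Gen3 (8 GT/s per lane with 128b/130b encoding), which is my assumption as the generation is not stated above, and compares that with SATA III (6Gbps with 8b/10b encoding). These are theoretical line rates before protocol overhead, not what any particular device will deliver.

```python
def usable_mb_per_s(gigatransfers_per_s, payload_bits, encoded_bits, lanes=1):
    """Theoretical usable bandwidth in MB/s after line encoding overhead."""
    bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / encoded_bits * lanes
    return bits_per_s / 8 / 1e6


# SATA III: 6 Gbps line rate, 8b/10b encoding, single link
sata3 = usable_mb_per_s(6.0, 8, 10)

# Assumption: PCIe Gen3, 8 GT/s per lane, 128b/130b encoding, x4 lanes
pcie3_x4 = usable_mb_per_s(8.0, 128, 130, lanes=4)

print(f"SATA III    : {sata3:.0f} MB/s")
print(f"PCIe Gen3 x4: {pcie3_x4:.0f} MB/s")
print(f"Ratio       : {pcie3_x4 / sata3:.1f}x")
```

Under these assumptions the x4 PCIe link has roughly six and a half times the raw bandwidth of a SATA III link, which is headroom on top of the lower CPU overhead per I/O from the NVMe protocol and drivers.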

Additional NVMe Resources

Also check out the Server StorageIO companion micro sites landing pages including thessdplace.com (SSD focus), data protection diaries (backup, BC/DR/HA and related topics), cloud and object storage, and server storage I/O performance and benchmarking here.

If you are into the real bits and bytes details, such as device driver level content, check out the Linux NVMe reflector forum. The linux-nvme forum is a good source if you are a developer who wants to stay up on what is happening in and around device drivers and associated topics.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

Wrap Up

Watch for updates with more content, links and NVMe resources to be added here soon.

Ok, nuff said (for now)

Cheers
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.