PCIe Fundamentals Server Storage I/O Network Essentials

Updated 8/31/19


PCIe fundamentals data infrastructure trends

This piece looks at PCIe fundamentals for server, storage, and I/O network data infrastructure environments. Peripheral Component Interconnect (PCI) Express, aka PCIe, is a server, storage, and I/O networking fundamental component. This post is an excerpt from Chapter 4 (Servers: Physical, Virtual, Cloud, and Containers) of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available via Amazon.com and other global venues. In this post, we look at various PCIe fundamentals to learn and expand or refresh your server, storage, I/O, and networking tradecraft skills and experience.

PCIe fundamentals Server Storage I/O Fundamentals

PCIe fundamental common server I/O component

Common to all servers is some form of main system board, which can range from a few square meters in supercomputers, data center rack, tower, and micro-tower systems (converged or standalone), to small Intel NUC (Next Unit of Compute), MSI, and Kepler-47 footprints, or Raspberry Pi-type desktop servers and laptops. Likewise, PCIe is commonly found in storage and networking systems and appliances, among other devices.

For example, a blade server will have multiple server blades or modules, each with its own motherboard, which share a common backplane for connectivity. Another variation is a large server such as an IBM "Z" mainframe, Cray, or another supercomputer that consists of many specialized boards that function similarly to a smaller-sized motherboard on a larger scale.

Some motherboards also have mezzanine or daughter boards for attachment of additional I/O networking or specialized devices. The following figure shows a generic example of a two-socket, with eight-memory-channel-type server architecture.

PCIe fundamentals SDDC, SDI, SDDI Server fundamentals
Generic computer server hardware architecture. Source: Software Defined Data Infrastructure Essentials (CRC Press 2017)

The above figure shows several PCIe, USB, SAS, SATA, 10 GbE LAN, and other I/O ports. Different servers will have various combinations of processor sockets and Dual Inline Memory Module (DIMM) Dynamic RAM (DRAM) slots, along with other features. The type and number of I/O and storage expansion ports, power and cooling, and the management tools or included software will also vary.

PCIe, Including Mini-PCIe, NVMe, U.2, M.2, and GPU

At the heart of many servers' I/O and connectivity solutions is the industry-standard PCIe interface (see PCIsig.com). PCIe is used to communicate between CPUs and the outside world of I/O networking devices. The importance of a faster and more efficient PCIe bus is to support more data moving in and out of servers while accessing fast external networks and storage.

For example, a server with a 40-GbE NIC or adapter needs a PCIe port capable of 5 GB per second. If multiple 40-GbE ports are attached to a server, you can see where the need for faster PCIe interfaces comes into play.

As more virtual machines (VMs) are consolidated onto physical machines (PMs), and as applications place more performance demands either in terms of bandwidth or activity (IOPS, frames, or packets per second), more 10-GbE adapters will be needed until the price of 40 GbE (also 25, 50, or 100 GbE) becomes affordable. It is not a question of if, but rather when, you will grow into the performance needs, either on a bandwidth/throughput basis or to support more activity and lower latency per interface.
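To illustrate the sizing math, here is a minimal sketch (Python; the per-lane rates are approximations I am assuming for illustration, and lanes_needed is a hypothetical helper) that estimates how many PCIe Gen 3 lanes a given Ethernet port speed needs so that the slot is not the bottleneck:

```python
# Rough sketch: estimate how many PCIe lanes a network adapter needs so the
# slot is not the bottleneck. Per-lane rates are approximate usable bandwidth
# per direction after encoding overhead (assumed values, not vendor specs).
PCIE_LANE_GBPS = {   # GB/s per lane, one direction (approximate)
    "Gen1": 0.25,
    "Gen2": 0.5,
    "Gen3": 0.985,
    "Gen4": 1.969,
}

def lanes_needed(nic_gbit_per_sec: float, gen: str) -> int:
    """Return the smallest common lane width (x1/x2/x4/x8/x16) that can carry
    the NIC line rate in one direction. Protocol overheads are ignored."""
    nic_gbyte = nic_gbit_per_sec / 8.0      # little b (bits) to big B (bytes)
    per_lane = PCIE_LANE_GBPS[gen]
    for width in (1, 2, 4, 8, 16):
        if width * per_lane >= nic_gbyte:
            return width
    raise ValueError("NIC is faster than a single x16 slot of this generation")

if __name__ == "__main__":
    for speed in (10, 25, 40, 100):         # GbE port speeds
        print(f"{speed} GbE -> Gen3 x{lanes_needed(speed, 'Gen3')}")
    # 40 GbE (~5 GB/s) needs Gen3 x8; 100 GbE (~12.5 GB/s) needs Gen3 x16
```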

PCIe is a serial interface specified for how servers communicate between CPUs, memory, and motherboard-mounted as well as AiC devices. This communication includes supporting the attachment of onboard and host bus adapter (HBA) server storage I/O networking devices such as Ethernet, Fibre Channel, InfiniBand, RapidIO, NVMe (cards, drives, and fabrics), SAS, and SATA, among other interfaces.

In addition to supporting attachment of traditional LAN, SAN, MAN, and WAN devices, PCIe is also used for attaching GPU and video cards to servers. Traditionally, PCIe has been focused on being used inside of a given server chassis. Today, however, PCIe is being deployed on servers spanning nodes in dual, quad, or CiB, CI, and HCI or Software Defined Storage (SDS) deployments. Another variation of PCIe today is that multiple servers in the same rack or proximity can attach to shared devices such as storage via PCIe switches.

PCIe components (hardware and software) include:

  • Hardware chipsets, cabling, connectors, endpoints, and adapters
  • Root complex and switches, risers, extenders, retimers, and repeaters
  • Software drivers, BIOS, and management tools
  • HBAs, RAID, SSD, drives, GPU, and other AiC devices
  • Mezzanine, mini-PCIe, M.2, NVMe U.2 (8639 drive form factor)

There are many different implementations of PCIe, corresponding to generations representing speed improvements as well as physical packaging options. PCIe can be deployed in various topologies, including a traditional model where an AiC such as a GbE or Fibre Channel HBA connects the server to a network or storage device.

Another variation is for a server to connect to a PCIe switch, or to be in a shared PCIe configuration between two or more servers. In addition to different generations and topologies, there are also various PCIe form factors and physical connectors (see the following figure), ranging from AiC of various lengths and heights, to M.2 small-form-factor devices and U.2 (8639) drive form-factor devices for NVMe, among others.

Note that the presence of M.2 does not guarantee PCIe NVMe, as it also supports SATA.

Likewise, different NVMe devices run at various PCIe speeds based on the number of lanes. For example, in the following figure, the U.2 (8639) device (looks like a SAS device) shown is a PCIe x4.

SDDC, SDI, SDDI PCIe NVMe U.2 8639 drive fundamentals
PCIe devices NVMe U.2, M.2, and NVMe AiC. (Source: StorageIO Labs.)

PCIe leverages multiple serial unidirectional point-to-point links, known as lanes, compared to traditional PCI, which used a parallel bus design. PCIe interfaces can have one (x1), four (x4), eight (x8), sixteen (x16), or thirty-two (x32) lanes for data movement. Those PCIe lanes can be full-duplex, meaning data is sent and received at the same time, providing improved effective performance.

PCIe cards are upward-compatible, meaning that an x4 can work in an x8 slot, an x8 in an x16, and so forth. Note, however, that the cards will not perform any faster than their specified speed; an x4 card in an x8 slot will still only run at x4. PCIe cards can also have single, dual, or multiple external ports and interfaces. Also, note that there are still some motherboards with legacy PCI slots that are not interoperable with PCIe cards and vice versa.

Note that PCIe cards and slots can be mechanically x1, x4, x8, x16, or x32, yet electrically (or signal) wired to a slower speed, based on the type and capabilities of the processor sockets and corresponding chipsets being used. For example, you can have a PCIe x16 slot (mechanical) that is wired for x8, which means it will only run at x8 speed.

In addition to the differences between electrical and mechanical slots, also pay attention to what generation the PCIe slots are, such as Gen 2, Gen 3, or higher. Also, some motherboards or servers will advertise multiple PCIe slots, but some of those are only active when a second or additional processor socket is occupied by a CPU. For example, a PCIe card that has dual x4 external PCIe ports requiring full PCIe bandwidth will need at least a PCIe x8 attachment in the server slot. In other words, for full performance, the external ports on a PCIe card or device need to be matched by the electrical (and mechanical) capability of the slot it is installed in, and vice versa.
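One way to reason about mechanical vs. electrical slots and card widths is the "lesser of the two sides" rule that a link trains down to. The following is a minimal sketch (hypothetical helper, not from the book) of that rule:

```python
# Minimal sketch: the negotiated PCIe link is limited by whichever side is
# narrower or slower. A mechanically x16 slot wired (electrically) as x8 with
# an x8 card installed trains to x8; an x4 card in that same slot runs at x4.
def negotiated_link(card_lanes: int, slot_electrical_lanes: int,
                    card_gen: int, slot_gen: int):
    """Return (lanes, generation) the link trains to: the lesser of each side."""
    return min(card_lanes, slot_electrical_lanes), min(card_gen, slot_gen)

print(negotiated_link(4, 8, 3, 3))   # x4 card, x8 electrical slot -> (4, 3)
print(negotiated_link(8, 4, 3, 3))   # x8 card, x16 mechanical/x4 electrical -> (4, 3)
print(negotiated_link(8, 8, 4, 3))   # Gen 4 card in a Gen 3 slot -> (8, 3)
```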

Recall big “B” as in Bytes vs. little “b” as in bits; for example, a PCIe Gen 3 x4 electrical could provide up to 4 GB/s of bandwidth (your mileage and performance will vary), which translates to 8 × 4, or 32 Gbits/s. In the following table, there is a mix of big “B” Bytes per second and small “b” bits per second.

Each generation of PCIe has improved on the previous one by increasing the effective speed of the links. Some of the speed improvements have come from faster clock rates while implementing lower overhead encoding (e.g., from 8 b/10 b to 128 b/130 b).

For example, the PCIe Gen 3 raw bit or line rate is 8 GT/s (8 Gbps per lane), or roughly 1 GB/s of effective bandwidth per lane in each direction, using a 128 b/130 b encoding scheme that is very efficient compared to PCIe Gen 2 and Gen 1, which used an 8 b/10 b encoding scheme. With 8 b/10 b there is a 20% overhead, vs. about a 1.5% overhead with 128 b/130 b (i.e., of 130 bits sent, 128 bits contain data and 2 bits are overhead).
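The per-lane figures in the table that follows fall out of that raw-rate-times-encoding-efficiency math. Here is a minimal sketch (Python, values rounded; the dictionary of rates is my own summary of the table, not from the book) of the calculation:

```python
# Minimal sketch: effective one-way bandwidth per PCIe lane derived from the
# raw transfer rate and the encoding scheme (data_bits / total_bits on wire).
GENERATIONS = {      # (raw GT/s per lane, data bits, total bits on the wire)
    "Gen1": (2.5, 8, 10),
    "Gen2": (5.0, 8, 10),
    "Gen3": (8.0, 128, 130),
    "Gen4": (16.0, 128, 130),
    "Gen5": (32.0, 128, 130),
}

for gen, (gt_s, data_bits, total_bits) in GENERATIONS.items():
    gbit_per_lane = gt_s * data_bits / total_bits   # effective Gb/s per lane
    gbyte_per_lane = gbit_per_lane / 8              # little b to big B
    print(f"{gen}: ~{gbit_per_lane:.1f} Gb/s (~{gbyte_per_lane:.2f} GB/s) per lane, one way")
# Gen1 ~2.0 Gb/s (~0.25 GB/s), Gen3 ~7.9 Gb/s (~0.98 GB/s), Gen5 ~31.5 Gb/s (~3.94 GB/s)
```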

|                             | PCIe Gen 1 | PCIe Gen 2 | PCIe Gen 3  | PCIe Gen 4  | PCIe Gen 5  |
|-----------------------------|------------|------------|-------------|-------------|-------------|
| Raw bit rate                | 2.5 GT/s   | 5 GT/s     | 8 GT/s      | 16 GT/s     | 32 GT/s     |
| Encoding                    | 8 b/10 b   | 8 b/10 b   | 128 b/130 b | 128 b/130 b | 128 b/130 b |
| x1 Lane bandwidth           | 2 Gb/s     | 4 Gb/s     | 8 Gb/s      | 16 Gb/s     | 32 Gb/s     |
| x1 Single lane (one-way)    | ~250 MB/s  | ~500 MB/s  | ~1 GB/s     | ~2 GB/s     | ~4 GB/s     |
| x16 Full duplex (both ways) | ~8 GB/s    | ~16 GB/s   | ~32 GB/s    | ~64 GB/s    | ~128 GB/s   |

Above Table: PCIe Generation and Sample Lane Comparison

Note that PCIe Gen 3 is the current generally available, shipping technology, with PCIe Gen 4 appearing in the not-so-distant future and PCIe Gen 5 waiting in the wings a few more years down the road.

By contrast, older generations of Fibre Channel and Ethernet also used 8 b/10 b, having switched over to 64 b/66 b encoding with 10 Gb and higher. PCIe, like other serial interfaces and protocols, can support full-duplex mode, meaning that data can be sent and received concurrently.

PCIe Bit Rate, Encoding, Giga Transfers, and Bandwidth

Let’s clarify something about data transfer or movement both internal and external to a server. At the core of a server, there is data movement within the processor sockets and their cores, as well as between memory and other devices (internal and external). For example, the QPI bus is used for moving data between some Intel processors, and its performance is specified in gigatransfers per second (GT/s).

PCIe is used for moving data between processors, memory, and other devices, including internal- and external-facing devices. Devices include host bus adapters (HBAs), host channel adapters (HCAs), converged network adapters (CNAs), network interface cards (NICs), RAID cards, and others. PCIe performance is specified in multiple ways, given its server processor focus: GT/s for the raw bit rate as well as effective bandwidth per lane.

Keep PCIe mechanical vs. electrical lanes in perspective: a card or slot may be advertised as, say, x8 mechanical (its physical slot form factor) yet be only x4 electrical (how many of those lanes are actually wired or enabled). Also, in the case of an adapter that has two or more ports, if the device is advertised as x8, does that mean x8 per port, or x4 per port with an x8 connection to the PCIe bus?

Effective bandwidth per lane can be specified as half- or full-duplex (data moving in one or both directions for send and receive). Also, effective bandwidth can be specified as a single lane (x1), four lanes (x4), eight lanes (x8), sixteen lanes (x16), or 32 lanes (x32), as shown in the above table. The difference in speed or bits moved per second between the raw bit or line rate, and the effective bandwidth per lane in a single direction (i.e., half-duplex) is the encoding that is common to all serial data transmissions.

When data gets transmitted, the serializer/deserializer, or serdes, converts the bytes into a bit stream via encoding. There are different types of encoding, ranging from 8 b/10 b to 64 b/66 b and 128 b/130 b, as shown in the following table.

| Encoding Scheme | Overhead | Data Bits (single 1542-byte frame) | Encoding Bits | Bits Transmitted | Data Bits (64 × 1542-byte frames) | Encoding Bits | Bits Transferred |
|-----------------|----------|------------------------------------|---------------|------------------|-----------------------------------|---------------|------------------|
| 8 b/10 b        | 20%      | 12,336                             | 3,084         | 15,420           | 789,504                           | 197,376       | 986,880          |
| 64 b/66 b       | 3%       | 12,336                             | 386           | 12,738           | 789,504                           | 24,672        | 814,176          |
| 128 b/130 b     | 1.5%     | 12,336                             | 194           | 12,610           | 789,504                           | 12,336        | 801,840          |

Above Table: Low-Level Serial Encoding Data Transmit Efficiency

In these encoding schemes, the smaller number represents the amount of data being sent, and the difference is the overhead. Note that this is different yet related to what occurs at a higher level with the various network protocols such as TCP/IP (IP). With IP, there is a data payload plus addressing and other integrity and management features in a given packet or frame.

The 8-b/10-b, 64-b/66-b, or 128-b/130-b encoding is at the lower physical layer. Thus, a small change there has a big impact and benefit when optimized. The above table shows comparisons of various encoding schemes using the example of moving a single 1542-byte packet or frame, as well as sending (or receiving) 64 packets or frames that are 1542 bytes in size.

Why 1542? That is a standard IP packet including data and protocol framing without using jumbo frames (MTU or maximum transmission units).
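As a sanity check on the 64-frame columns in the table above, here is a minimal sketch that recomputes the bits transmitted for each encoding scheme (simple ratio math; it ignores any padding or control characters a real link layer may add):

```python
# Minimal sketch: on-the-wire bits for 64 frames of 1542 bytes each under
# different low-level encodings (data_bits carried per total_bits transmitted).
FRAME_BYTES = 1542
FRAMES = 64
data_bits = FRAME_BYTES * FRAMES * 8        # 789,504 data bits

for name, (data, total) in {"8b/10b": (8, 10),
                            "64b/66b": (64, 66),
                            "128b/130b": (128, 130)}.items():
    wire_bits = data_bits * total // data   # bits actually transmitted
    overhead = wire_bits - data_bits        # encoding bits added
    print(f"{name}: {wire_bits:,} bits on the wire ({overhead:,} encoding overhead)")
# 8b/10b: 986,880 (197,376); 64b/66b: 814,176 (24,672); 128b/130b: 801,840 (12,336)
```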

What does this have to do with PCIe? GbE, 10-GbE, 40-GbE, and other physical interfaces that are used for moving TCP/IP packets and frames interface with servers via PCIe.

This encoding is important as part of server storage I/O tradecraft for understanding the impact on performance and network or resource usage. It also means understanding why there are fewer bits per second of effective bandwidth (independent of compression or deduplication) vs. line rate in either half- or full-duplex mode.

Another item to note is that looking at encoding, such as the example given in the above table, shows how a relatively small change at a large scale can have a big effective impact and benefit. If the bits and bytes encoding efficiency and effectiveness scenario in the above table does not make sense, then try imagining 13 MINI Cooper automobiles, each with eight people in it (yes, that would be a tight fit), end to end on the same road.

Now imagine a large bus that takes up much less length on the road than the 13 MINI Coopers. The bus holds 128 people, who would still be crowded but nowhere near as cramped as eight people in a MINI, plus 24 additional people can be carried on the bus. That is an example of applying basic 8-b/10-b encoding (the MINI) vs. applying 128-b/130-b encoding (the bus) and is also similar to PCIe G3 and G4, which use 128-b/130-b encoding for data movement.

PCIe Topologies

The basic PCIe topology configuration has one or more devices attached to the root complex shown in the following figure via an AiC or onboard device connector. Examples of AiC and motherboard-mounted devices that attach to PCIe root include LAN or SAN HBA, networking, RAID, GPU, NVM or SSD, among others. At system start-up, the server initializes the PCIe bus and enumerates the devices found with their addresses.

PCIe devices attach (shown in the following figure) to a bus that communicates with the root complex that connects with processor CPUs and memory. At the other end of a PCIe connection is an endpoint target, or a PCIe switch that in turn has endpoint targets attached. From a software standpoint, hypervisor or operating system device drivers communicate with the PCIe devices, which in turn send or receive data or perform other functions.

SDDC, SDI, SDDI PCIe fundamentals
Basic PCIe root complex with a PCIe switch or expander.

Note that in addition to PCIe AiC such as HBAs, GPU, and NVM SSD, among others that install into PCIe slots, servers also have converged storage or disk drive enclosures that support a mix of SAS, SATA, and PCIe. These enclosure backplanes have a connector that attaches to a SAS or SATA onboard port or a RAID card, as well as to a PCIe riser card or motherboard connector. Depending on what type of drive is installed in the connector, either the SAS or SATA path, or the NVMe (AiC, U.2, and M.2) path using PCIe, is used.

In addition to traditional and switched PCIe, using PCIe switches as well as nontransparent bridging (NTB), various other configurations can be deployed. These include server-to-server configurations for clustering, failover, or device sharing, as well as fabrics. Note that this also means that while traditionally found inside a server, PCIe can today use extenders, retimers, and repeaters to be extended across servers within a rack or cabinet.

A nontransparent bridge (NTB) is a point-to-point connection between two PCIe-based systems that provides electrical isolation yet functions as a transport bridge between two different address domains. Hosts on either side of the NTB see their respective memory or I/O address space. The NTB presents an endpoint exposed to the local system where writes are mirrored to memory on the remote system to allow the systems to communicate and share devices using associated device drivers. For example, in the following figure, two servers, each with a unique PCIe root complex, address, and memory map, are shown using NTB to enable communication between the systems while maintaining data integrity.

SDDC, SDI, SDDI PCIe two server fundamentals
PCIe dual server example using NTB along with switches.

General PCIe considerations (slots and devices) include:

  • Power consumption (and heat dissipation)
  • Physical and software plug-and-play (good interoperability)
  • Drivers (in-the-box, built into the OS, or add-in)
  • BIOS, UEFI, and firmware being current versions
  • Power draw per card or adapters
  • Type of processor, socket, and support chip (if not an onboard processor)
  • Electrical signal (lanes) and mechanical form factor per slot
  • Nontransparent bridge and root port (RP)
  • PCI multi-root (MR), single-root (SR), and hot plug
  • PCIe expansion chassis (internal or external)
  • External PCIe shared storage

Various operating system and hypervisor commands are available for viewing and managing PCIe devices. For example, on Linux, the "lspci" and "lshw -c pci" commands display PCIe devices and associated information. On a VMware ESXi host, the "esxcli hardware pci list" command will show various PCIe devices and information, while on Microsoft Windows systems, "device manager" (GUI) or "devcon" (command line) will show similar information.
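As one illustration, here is a minimal sketch of wrapping the Linux tool mentioned above from Python (it assumes the pciutils package that provides lspci is installed; the bus:device.function value shown is a hypothetical placeholder, so use whatever your own lspci output reports):

```python
# Minimal sketch (Linux, assumes "lspci" from pciutils is installed): list PCI/
# PCIe devices and pull verbose detail for one device. In the verbose output,
# the LnkCap and LnkSta lines show the capable vs. negotiated speed and width.
import subprocess

def list_pci_devices() -> str:
    """Return the one-line-per-device summary from lspci."""
    return subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout

def link_details(bdf: str) -> str:
    """Return verbose output for one device (bus:device.function, e.g. '3b:00.0',
    a placeholder address); search it for 'LnkCap' and 'LnkSta'."""
    return subprocess.run(["lspci", "-s", bdf, "-vv"],
                          capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(list_pci_devices())
```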

Who Are Some PCIe Fundamentals Vendors and Service Providers?

While not an exhaustive list, here is a sampling of vendors and service providers involved in various ways with PCIe, from solutions to components to services to trade groups: Amphenol (connectors and cables), AWS (cloud data infrastructure services), Broadcom (PCIe components), Cisco (servers), DataOn (servers), Dell EMC (servers, storage, software), E8 (storage software), Excelero (storage software), HPE (storage, servers), Huawei (storage, servers), IBM, Intel (storage, servers, adapters), and Keysight (test equipment and tools).

Others include Lenovo (servers), Liqid (composable data infrastructure), Mellanox (server and storage adapters), Micron (storage devices), Microsemi (PCIe components), Microsoft (cloud and software including S2D), Molex (connectors, cables), NetApp, NVMexpress.org (NVM Express trade group organization), Open Compute Project (server, storage, I/O network industry group), Oracle, PCISIG (PCIe industry trade group), Samsung (storage devices), ScaleMP (composable data infrastructure), Seagate (storage devices), SNIA (industry trade group), Supermicro (servers), Tidal (composable data infrastructure), Vantara (formerly known as HDS), VMware (software including vSAN), and WD, among others.

Where To Learn More

Learn more about related technology, trends, tools, techniques, and tips with the following links.

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What This All Means

PCIe is a fundamental resource for building legacy and software-defined data infrastructures (SDDI), software-defined infrastructures (SDI), data centers, and other deployments from laptops to large-scale, hyper-scale cloud service providers. Learn more about Servers: Physical, Virtual, Cloud, and Containers in Chapter 4 of my new book Software Defined Data Infrastructure Essentials (CRC Press 2017), available via Amazon.com and other global venues. Meanwhile, PCIe continues to evolve as a fundamental server, storage, and I/O networking component.

Ok, nuff said, for now.
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio.

Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO. All Rights Reserved.

Broadcom aka Avago aka LSI announces SAS SATA NVMe Adapters with RAID

server storage I/O trends

Broadcom aka Avago aka LSI announces SAS SATA NVMe Adapters with RAID

In case you missed it, Broadcom, formerly known as Avago, which bought the LSI adapter and RAID card business, announced that it is shipping new SAS, SATA, and NVMe devices.

While SAS and SATA are well established and continue to be deployed for both HDDs as well as flash SSDs, NVMe continues to evolve with a bright future. Likewise, while there is a focus on software-defined storage (SDS), software-defined data centers (SDDC), and software-defined data infrastructures (SDDI), along with advanced parity RAID including erasure codes and object storage among other technologies, there is still a need for adapter cards, including traditional RAID.

Keep in mind that while probably not meeting the definition of some software-defined aficionados, the many different variations, permutations, and derivatives of RAID, from mirror and replication to basic parity to advanced erasure codes (some based on Reed-Solomon, aka RAID 2), rely on software. Granted, some of that software runs on regular primary server processors, while some is packaged in silicon via ASICs or FPGAs, systems on chip (SoC), or RAID on Chip (RoC), along with BIOS, firmware, drivers, and management tools.

SAS, SATA and NVMe adapters

For some environments, cards such as those announced by Broadcom are used in pass-through mode, effectively as adapters for attaching SAS, SATA, and NVMe storage devices to servers. Those servers may be deployed as converged infrastructures (CI), hyper-converged infrastructures (HCI), or Cluster or Cloud in Box (CiB), among other variations. To name names, you might find the above (now or in the not-so-distant future) in VMware vSAN or regular vSphere-based environments, Microsoft Windows Server, Storage Spaces Direct (S2D) or Azure Stack, and OpenStack, among other deployments (check your vendor's Hardware Compatibility Lists, aka HCLs). In some cases these cards may be used in pass-through mode, or using their RAID features (support varies by software stack). Meanwhile, in other environments the more traditional RAID features are still used, spanning Windows to Linux among others.

Who Is Broadcom?

Some of you may know of Broadcom as having been around for many years with a focus on networking-related technologies. However, some may not realize that Avago bought Broadcom and changed its own name to Broadcom. There is a history here that includes more recent acquisitions such as Brocade, PLX, and Emulex, as well as LSI. Some of you may recall Avago buying the LSI business (SAS, SATA, and PCIe HBA, RAID, and components) that was not sold to NetApp as part of Engenio. Also recall that Avago sold the LSI flash SSD business unit to Seagate a couple of years ago as part of its streamlining. That is how we get to where we are today, with Broadcom (formerly known as Avago), which bought the LSI adapter and RAID business, announcing new SAS, SATA, and NVMe cards.

What Was Announced?

Broadcom has announced cards that are multi-protocol, supporting Serial Attached SCSI (SAS), SATA/AHCI, as well as NVM Express (NVMe), as basic adapters for attaching storage (HDD, SSD, storage systems) along with optional RAID as well as cache support. These cards can be used in application servers for traditional as well as virtualized SDDC environments, and in storage systems or appliances for software-defined storage, among other uses. The basic functionality of these cards is to provide high performance (IOPS and other activity, as well as bandwidth) along with low latency, combined with data protection and dense connectivity.

Specific features include:

  • Broadcom’s Tri-Mode SerDes Technology enables the operation of NVMe, SAS or SATA devices in a single drive bay, allowing for endless design flexibility.
  • Management software including LSI Storage Authority (LSA), StorCLI, HII (UEFI)
  • Optional CacheVault(R) flash cache protection
  • Physical dimension Low Profile 6.127” x 2.712”
  • Host bus type x8 lane PCI Express 3.1
  • Data transfer rates: SAS-3 12 Gb/s; NVMe up to 8 GT/s PCIe Gen 3
  • Various OS and hypervisors host platform support
  • Warranty 3 yrs, free 5×8 phone support, advanced replacement option
  • RAID levels 0, 1, 5, 6, 10, 50, and 60

Note that some of the specific feature functionality may be available at a later date; check with your preferred vendor's HCL.

| Specification | 9480 8i8e | 9440 8i | 9460 8i | 9460 16i |
|---|---|---|---|---|
| Internal Ports | 8 |  | 8 | 16 |
| Internal Connectors | 2 x Mini-SAS HD x4 SFF-8643 | 2 x Mini-SAS HD x4 SFF-8643 | 2 x Mini-SAS HD x4 SFF-8643 | 4 x Mini-SAS HD x4 SFF-8643 |
| External Ports | 8 |  |  |  |
| External Connectors | 2 x Mini-SAS HD SFF-8644 |  |  |  |
| Cache Protection | CacheVault CVPM05 |  | CacheVault CVPM05 | CacheVault CVPM05 |
| Cache Memory | 2GB 2133 MHz DDR4 SDRAM |  | 2GB 2133 MHz DDR4 SDRAM | 4GB 2133 MHz DDR4 SDRAM |
| Devices Supported | SAS/SATA: 255, NVMe: 4 x4, up to 24 x2 or x4* | SAS/SATA: 63, NVMe: 4 x4, up to 24 x2 or x4* | SAS/SATA: 255, NVMe: 4 x4, up to 24 x2 or x4* | SAS/SATA: 255, NVMe: 4 x4, up to 24 x2 or x4* |
| I/O Processors (SAS Controller) | SAS3516 dual-core RAID-on-Chip (ROC) | SAS3408 I/O controller (IOC) | SAS3508 dual-core RAID-on-Chip (ROC) | SAS3516 dual-core RAID-on-Chip (ROC) |

In case you need a refresher on SFF cable types, click on the following two images which take you to Amazon.com where you can learn more, as well as order various cable options. PC Pit Stop has a good selection of cables (See other SFF types), connectors and other accessories that I have used, along with those from Amazon.com and others.

Available via Amazon.com sff 8644 8643 sas mini hd cable
Left: SFF 8644 Mini SAS HD (External), Right SFF-8643 Mini SAS HD (internal) Image via Amazon.com

Available via Amazon.com sff 8644 8642 sas mini hd cable
Left: SFF 8643 Mini SAS HD (Internal), Right SFF-8642 SATA with power (internal) Image via Amazon.com

Wait, Doesn't NVMe Use PCIe?

For those who are not familiar with NVMe, and in particular U.2 aka SFF-8639 based devices, physically they look (almost) the same as a SAS device connector. The slight variation is that if you look at a SAS drive, there is a small tab to prevent plugging it into a SATA port (recall you can plug SATA into SAS). On SAS drives that tab is blank; however, on the NVMe 8639 aka U.2 drives (below left), that tab has several connectors which carry PCIe x4 (single or dual path).

What this means is that the PCIe x4 bus electrical signals are carried via a connector to the chassis backplane, to the 8639 drive slot, and then to the drive. Those same 8639 drive slots can also have a SAS or SATA connection using their traditional connectors, enabling a converged or hybrid drive slot, so to speak. Learn more about NVMe here (If the Answer is NVMe, then what were and are the questions?) as well as at www.thenvmeplace.com.

NVMe U.2 8639 driveNVMe U.2 8639 sas sata nvme drive
Left NVMe U.2 drive showing PCIe x4 connectors, right, NVMe U.2 8639 connector

Who Is This For?

These cards are applicable for general-purpose IT and other data infrastructure environments in traditional servers, among other uses. They are also applicable for systems builders, integrators, and OEMs whom you may be buying your current systems from, or future ones.

Where to Learn More

The following are additional resources to learn more about SAS, SATA, NVMe, vSAN, and related technologies.

What this all means

Even as the industry continues to talk about and move towards a more software-defined focus, even for environments that are serverless, there is still a need for hardware somewhere. These adapters are a good sign of the continued maturing cycle of NVMe, positioning it well into the next decade and beyond while also being relevant today. Likewise, even though the future involves NVMe, there is still a place for SAS along with SATA to coexist in many environments. For some environments there is a need for traditional RAID, while for others there is simply the need for attachment of SAS, SATA, and NVMe devices. Overall, a good set of updates, enhancements, and new technology for today and tomorrow. Now, when do I get some to play with? ;)

Ok, nuff said (for now…).

Cheers
Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert (and vSAN). Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Watch for the spring 2017 release of his new book "Software-Defined Data Infrastructure Essentials" (CRC Press).

Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2023 Server StorageIO(R) and UnlimitedIO. All Rights Reserved.

I/O Virtualization (IOV) Revisited

Is I/O Virtualization (IOV) a server topic, a network topic, or a storage topic (See previous post)?

Like server virtualization, IOV involves servers, storage, network, operating system, and other infrastructure resource management areas and disciplines. The business and technology value proposition or benefits of converged I/O networks and I/O virtualization are similar to those for server and storage virtualization.

Additional benefits of IOV include:

    • Doing more with the resources (people and technology) that already exist, or reducing costs
    • Single (or pair for high availability) interconnect for networking and storage I/O
    • Reduction of power, cooling, floor space, and other green efficiency benefits
    • Simplified cabling and reduced complexity for server network and storage interconnects
    • Boosting server performance while maximizing use of limited I/O or mezzanine slots
    • Reduction of I/O and data center bottlenecks
    • Rapid re-deployment to meet changing workload and I/O profiles of virtual servers
    • Scaling I/O capacity to meet high-performance and clustered application needs
    • Leveraging common cabling infrastructure and physical networking facilities

Before going further, let's take a step backward for a few moments.

To say that I/O and networking demands and requirements are increasing is an understatement. The amount of data being generated, copied, and retained for longer periods of time is elevating the importance of the role of data storage and infrastructure resource management (IRM). Networking and input/output (I/O) connectivity technologies (Figure 1) tie together facilities, servers, storage, tools for measurement and management, and best practices on a local and wide area basis to enable an environmentally and economically friendly data center.

TIERED ACCESS FOR SERVERS AND STORAGE
There is an old saying that the best I/O, whether local or remote, is an I/O that does not have to occur. I/O is an essential activity for computers of all shapes, sizes, and focus to read and write data in and out of memory (including external storage) and to communicate with other computers and networking devices. This includes communicating on a local and wide area basis for access to or over Internet, cloud, XaaS, or managed services providers such as shown in figure 1.

PCI SIG IOV (C) 2009 The Green and Virtual Data Center (CRC)
Figure 1 The Big Picture: Data Center I/O and Networking

The challenge of I/O is that some form of connectivity (logical and physical), along with associated software, is required, and time delays occur while waiting for reads and writes to complete. I/O operations that are closest to the CPU or main processor should be the fastest and occur most frequently for access to main memory, using internal local CPU-to-memory interconnects. In other words, fast servers or processors need fast I/O in terms of low latency, I/O operations (IOPS), and bandwidth capabilities.

PCI SIG IOV (C) 2009 The Green and Virtual Data Center (CRC)
Figure 2 Tiered I/O and Networking Access

Moving out and away from the main processor, I/O remains fairly fast with distance but is more flexible and cost effective. An example is the PCIe bus and I/O interconnect shown in Figure 2, which is slower than processor-to-memory interconnects but is still able to support attachment of various device adapters with very good performance in a cost effective manner.

Farther from the main CPU or processor, various networking and I/O adapters can attach to PCIe, PCIx, or PCI interconnects for backward compatibility to support various distances, speeds, types of devices, and cost factors.

In general, the faster a processor or server is, the more prone to a performance impact it will be when it has to wait for slower I/O operations.

Consequently, faster servers need better-performing I/O connectivity and networks. Better performing means lower latency, more IOPS, and improved bandwidth to meet application profiles and types of operations.

Peripheral Component Interconnect (PCI)
Having established that computers need to perform some form of I/O to various devices, at the heart of many I/O and networking connectivity solutions is the Peripheral Component Interconnect (PCI) interface. PCI is an industry standard that specifies the chipsets used to communicate between CPUs and memory and the outside world of I/O and networking device peripherals.

Figure 3 shows an example of multiple servers or blades, each with dedicated Fibre Channel (FC) and Ethernet adapters (there could be two or more for redundancy). Simply put, the more servers and devices there are to attach to, the more adapters, cabling, and complexity, particularly for blade servers and dense rack-mount systems.
PCI SIG IOV (C) 2009 The Green and Virtual Data Center (CRC)
Figure 3 Dedicated PCI adapters for I/O and networking devices

Figure 4 shows an example of a PCI implementation including various components such as bridges, adapter slots, and adapter types. PCIe leverages multiple serial unidirectional point to point links, known as lanes, in contrast to traditional PCI, which used a parallel bus design.

PCI SIG IOV (C) 2009 The Green and Virtual Data Center (CRC)

Figure 4 PCI IOV Single Root Configuration Example

In traditional PCI, bus width varied from 32 to 64 bits; in PCIe, the number of lanes combined with the PCIe version and signaling rate determines performance. PCIe interfaces can have 1, 2, 4, 8, 16, or 32 lanes for data movement, depending on card or adapter format and form factor. For example, PCI and PCIx performance can be up to 528 MB per second with a 64-bit, 66 MHz signaling rate, and PCIe is capable of over 4 GB per second (e.g., 32 Gbit/s) in each direction using 16 lanes for high-end servers.
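For reference, here is a minimal sketch showing where those bandwidth figures come from (approximate values that ignore protocol overheads):

```python
# Minimal sketch: recomputing the parallel-PCI vs. PCIe numbers cited above.
# Parallel PCI-X: bus width (bits) x clock rate, converted to bytes per second.
pcix_mbytes = 64 * 66_000_000 / 8 / 1_000_000   # 64-bit @ 66 MHz -> 528 MB/s
# PCIe Gen 1 x16: ~250 MB/s per lane per direction (2.5 GT/s with 8b/10b).
pcie_x16_gbytes = 16 * 0.25                     # -> 4 GB/s in each direction
print(f"PCI-X 64-bit/66 MHz: {pcix_mbytes:.0f} MB/s")
print(f"PCIe Gen1 x16: {pcie_x16_gbytes:.0f} GB/s per direction (~32 Gbit/s)")
```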

The importance of PCIe and its predecessors is a shift from multiple vendors’ different proprietary interconnects for attaching peripherals to servers. For the most part, vendors have shifted to supporting PCIe or early generations of PCI in some form, ranging from native internal on laptops and workstations to I/O, networking, and peripheral slots on larger servers.

The most current version of PCI, as defined by the PCI Special Interest Group (PCISIG), is PCI Express (PCIe). Backwards compatibility exists by bridging previous generations, including PCIx and PCI, off a native PCIe bus or, in the past, bridging a PCIe bus to a PCIx native implementation. Beyond speed and bus width differences for the various generations and implementations, PCI adapters also are available in several form factors and applications.

Traditional PCI was generally limited to a main processor or was internal to a single computer, but current generations of PCI Express (PCIe) include support for PCI Special Interest Group (PCI) I/O virtualization (IOV), enabling the PCI bus to be extended to distances of a few feet. Compared to local area networking, storage interconnects, and other I/O connectivity technologies, a few feet is very short distance, but compared to the previous limit of a few inches, extended PCIe provides the ability for improved sharing of I/O and networking interconnects.

I/O VIRTUALIZATION (IOV)
On a traditional physical server, the operating system sees one or more instances of Fibre Channel and Ethernet adapters even if only a single physical adapter, such as an InfiniBand HCA, is installed in a PCI or PCIe slot. In the case of a virtualized server, for example Microsoft Hyper-V or VMware ESX/vSphere, the hypervisor is able to see and share a single physical adapter, or multiple adapters for redundancy and performance, with guest operating systems. The guest systems see what appears to be a standard SAS, FC, or Ethernet adapter or NIC using standard plug-and-play drivers.

Virtual HBAs or virtual network interface cards (NICs) and switches are, as their names imply, virtual representations of a physical HBA or NIC, similar to how a virtual machine emulates a physical machine. With a virtual HBA or NIC, physical adapter resources are carved up and allocated much as they are for virtual machines, but instead of hosting a guest operating system like Windows, UNIX, or Linux, a SAS or FC HBA, FCoE converged network adapter (CNA), or Ethernet NIC is presented.

In addition to virtual or software-based NICs, adapters, and switches found in server virtualization implementations, virtual LAN (VLAN), virtual SAN (VSAN), and virtual private network (VPN) are tools for providing abstraction and isolation or segmentation of physical resources. Using emulation and abstraction capabilities, various segments or sub networks can be physically connected yet logically isolated for management, performance, and security purposes. Some form of routing or gateway functionality enables various network segments or virtual networks to communicate with each other when appropriate security is met.

PCI-SIG IOV
PCI SIG IOV consists of a PCIe bridge attached to a PCI root complex along with an attachment to a separate PCI enclosure (Figure 5). Other components and facilities include address translation services (ATS), single-root IOV (SR IOV), and multi-root IOV (MR IOV). ATS enables performance to be optimized between an I/O device and a server's I/O memory management. Single-root IOV (SR IOV) enables multiple guest operating systems to access a single I/O device simultaneously, without having to rely on a hypervisor for a virtual HBA or NIC.
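As a present-day illustration of single-root IOV, not something covered in the original text, here is a minimal sketch of how a Linux host exposes and enables SR IOV virtual functions through sysfs (it assumes an SR IOV capable adapter, a modern kernel, and root privileges; the PCI address is a hypothetical placeholder):

```python
# Minimal sketch (Linux): inspect and set the number of SR-IOV virtual
# functions (VFs) a physical function (PF) exposes via the sysfs interface.
from pathlib import Path

def sriov_info(bdf: str = "0000:3b:00.0") -> None:   # bdf is a placeholder address
    dev = Path("/sys/bus/pci/devices") / bdf
    total = int((dev / "sriov_totalvfs").read_text())
    current = int((dev / "sriov_numvfs").read_text())
    print(f"{bdf}: {current} of {total} VFs enabled")

def enable_vfs(bdf: str, count: int) -> None:
    """Write the desired VF count; each VF then appears as its own PCIe
    function that can be passed through to a guest without hypervisor I/O
    emulation (requires root and an SR-IOV capable device/driver)."""
    (Path("/sys/bus/pci/devices") / bdf / "sriov_numvfs").write_text(str(count))
```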

PCI SIG IOV (C) 2009 The Green and Virtual Data Center (CRC)

Figure 5 PCI SIG IOV

The benefit is that physical adapter cards, located in a physically separate enclosure, can be shared within a single physical server without having to incur any potential I/O overhead via virtualization software infrastructure. MR IOV is the next step, enabling a PCIe or SR IOV device to be accessed through a shared PCIe fabric across different physically separated servers and PCIe adapter enclosures. The benefit is increased sharing of physical adapters across multiple servers and operating systems, not to mention simplified cabling, reduced complexity, and improved resource utilization.

PCI SIG IOV (C) 2009 The Green and Virtual Data Center (CRC)
Figure 6 PCI SIG MR IOV

Figure 6 shows an example of a PCIe switched environment, where two physically separate servers or blade servers attach to an external PCIe enclosure or card cage for attachment to PCIe, PCIx, or PCI devices. Instead of the adapter cards physically plugging into each server, a high-performance, short-distance cable connects each server's PCI root complex via a PCIe bridge port to a PCIe bridge port in the enclosure device.

In figure 6, either SR IOV or MR IOV can take place, depending on specific PCIe firmware, server hardware, operating system, devices, and associated drivers and management software. For a SR IOV example, each server has access to some number of dedicated adapters in the external card cage, for example, InfiniBand, Fibre Channel, Ethernet, or Fibre Channel over Ethernet (FCoE) and converged networking adapters (CNA) also known as HBAs. SR IOV implementations do not allow different physical servers to share adapter cards. MR IOV builds on SR IOV by enabling multiple physical servers to access and share PCI devices such as HBAs and NICs safely with transparency.

The primary benefit of PCI IOV is to improve utilization of PCI devices, including adapters or mezzanine cards, as well as to enable performance and availability for slot-constrained and physical footprint or form factor-challenged servers. Caveats of PCI IOV are distance limitations and the need for hardware, firmware, operating system, and management software support to enable safe and transparent sharing of PCI devices. Examples of PCIe IOV vendors include Aprius, NextIO and Virtensys among others.

InfiniBand IOV
InfiniBand-based IOV solutions are an alternative to Ethernet-based solutions. Essentially, InfiniBand approaches are similar, if not identical, to converged Ethernet approaches including FCoE, with the difference being InfiniBand as the network transport. InfiniBand HCAs with special firmware are installed into servers, which then see a Fibre Channel HBA and an Ethernet NIC from a single physical adapter. The InfiniBand HCA also attaches to a switch or director that in turn attaches to Fibre Channel SAN or Ethernet LAN networks.

The value of InfiniBand converged networks is that they exist today, and they can be used for consolidation as well as to boost performance and availability. InfiniBand IOV also provides an alternative for those who choose not to deploy Ethernet.

From a power, cooling, floor-space or footprint standpoint, converged networks can be used for consolidation to reduce the total number of adapters and the associated power and cooling. In addition to removing unneeded adapters without loss of functionality, converged networks also free up or allow a reduction in the amount of cabling, which can improve airflow for cooling, resulting in additional energy efficiency. An example of a vendor using InfiniBand as a platform for I/O virtualization is Xsigo.

General takeaway points include the following:

  • Minimize the impact of I/O delays to applications, servers, storage, and networks
  • Do more with what you have, including improving utilization and performance
  • Consider latency, effective bandwidth, and availability in addition to cost
  • Apply the appropriate type and tiered I/O and networking to the task at hand
  • I/O operations and connectivity are being virtualized to simplify management
  • Convergence of networking transports and protocols continues to evolve
  • PCIe IOV is complementary to converged networking including FCoE

Moving forward, a revolutionary new technology may emerge that finally eliminates the need for I/O operations. However until that time, or at least for the foreseeable future, several things can be done to minimize the impacts of I/O for local and remote networking as well as to simplify connectivity.


Learn more about IOV, converged networks, LAN, SAN, MAN and WAN related topics in Chapter 9 (Networking with your servers and storage) of The Green and Virtual Data Center (CRC) as well as in Resilient Storage Networks: Designing Flexible Scalable Data Infrastructures (Elsevier).

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Brocade to Buy Foundry Networks – Prelude to Upcoming Converged Ethernet and FCoE Battle

Storage I/O trends

The emerging and maturing Fibre Channel over Ethernet (FCoE) and Converged Ethernet (aka Data Center Ethernet, Converged Enhanced Ethernet, or Enterprise Ethernet, among other marketing names) activity is picking up. Today Brocade took a major step to shore up its already announced FCoE and converged Ethernet story, which includes new directors and converged host bus adapters, by announcing its intention to buy Ethernet high-performance switching vendor Foundry Networks in a deal valued at around $3B USD and some change.

Not a bad deal for Foundry; some would say an expensive deal for Brocade, perhaps paying too much, although consider some of the recent storage and networking related deals. For example, IBM spent around $300M for a startup called XIV, which claims to have shipped a few storage systems to a few customers; Dell spent about $1.3B to buy EqualLogic, which had a few thousand customers (could be the deal of the century for Dell compared to IBM and XIV, however time will tell); EMC made recent purchases like RSA and Avamar, along with bargains like WysDM, Mozy, and Iomega; not to mention Cisco, which has not been bashful about dropping some serious coin on standalone companies like NuSpeed (where are they now) for iSCSI, as well as Andiamo and, more recently, Nuova. Regardless of whether Mike Klayko (Brocade CEO) paid too much or not, he did what he had to do as part of his continuing activities to re-invent Brocade and leverage its core DNA and business focus of data infrastructures.

Brocade could probably have made a nice business for a few more years, like some of the companies they have recently acquired tried to do, including McData, CNT, INRANGE, and so forth. However, the reality is that sooner or later, they too (Brocade) would probably have been acquired by someone. With the acquisition of Foundry Networks, along with previous announcements of FCoE technologies and their existing products for NAS or file-based storage management and iSCSI solutions, Brocade is signaling that they want to fight for survival as opposed to circling the wagons and guarding their installed base and wheelhouse.

The upcoming Converged Ethernet and FCoE battle royal is shaping up to start in about 12 to 18 months; sooner for the early adopters who like to test and kick around technology early, for those who want to go right to 10 GbE today instead of 8Gb Fibre Channel, or for those who like bleeding-edge solutions. The reality is that even with recent proof-of-life plug-fest demos and claims of being ready for prime time, core Brocade customers, particularly at the high end of the market, tend to be rather risk-averse and cautious with their data infrastructure, and thus move at a slower pace. For them, upgrading to 8Gb Fibre Channel may be the near-term future while watching FCoE and converged Ethernet or Converged Enhanced Ethernet evolve, and then transitioning in a couple of years. For these risk-averse customers, bleeding-edge technology means having a blood bank nearby and on call, as downtime and disruption are not an option.

Rest assured, with Cisco pushing hard to stimulate the FCoE market and get people to skip 8Gb FC and switch over to 10 GbE, there will be plenty more plug fests and proof-of-life demos, and plenty of trash talking by both sides that will rival some of the best heavyweight match-ups.

Buyers beware: do your homework, and if being an early adopter of FCoE and converged networks is right for you, do some due-diligence testing to see how everything really works in your environment, from storage systems to adapters, switches, protocol converters and gateways, and management and diagnostic software. How does the whole ecosystem that matches your environment work for your scenario? If you are not comfortable with where the FCoE and converged Ethernet technologies, and more importantly the supporting ecosystem, are at, take your time and monitor the situation as it unfolds over the next year or so leading up to the big battle royal between Brocade and Cisco.

Something that I think is interesting is that here we have Brocade and Cisco squaring off in a convergence battle between a general networking vendor (Cisco) and a storage-centric networking vendor (Brocade), both of whom have been built on organic growth as well as acquisitions. What's even more interesting is that around 10 years ago, back when Brocade was just getting started and Cisco was still trying to figure out Fibre Channel and iSCSI, 3COM had, at the time, the foresight to put together an alliance of storage-related partners to get into the then-emerging SAN marketplace. The alliance was to include various storage vendors, switch and HBA as well as router or gateway vendors, along with data and backup software vendors. Before the program could be officially launched, it was canceled, just as all of the promotional material was about to be distributed, due to the poor financial health of 3COM. With a few exceptions, most of the participants in that early program, which was probably a year or two ahead of its time, have either been bought or disappeared altogether. 3COM could have been a major force in a converged LAN and SAN marketplace; instead, it is now watching Brocade and Cisco from the sidelines.

For now, congratulations to Mike Klayko and crew for demonstrating that they want to put up a fight and provide their customers an alternative to Cisco, and that they are serious about being a contender in the data infrastructure solution provider fight. For Cisco, it looks like two of your competitors have now become one. Good luck and best wishes to both sides, Brocade and Cisco; I will be watching this battle from ringside as both parties line up and re-align their partner ecosystems.

Cheers
gs