little data Archives

November 14, 2013December 4, 2019

Part II: EMC announces XtremIO General Availability, speeds and feeds

Storage I/O trends

XtremIO flash SSD more than storage I/O speed

Following up part I of this two-part series, here are more more details, insights and perspectives about EMC XtremIO and it’s generally availability that were announced today.

XtremIO the basics

All flash Solid State Device (SSD) based solution
Cluster of up to four X-Brick nodes today
X-Bricks available in 10TB increments today, 20TB in January 2014
25 eMLC SSD drives per X-Brick with redundant dual processor controllers
Provides server-side iSCSI and Fibre Channel block attachment
Integrated data footprint reduction (DFR) including global dedupe and thin provisioning
Designed for extending duty cycle, minimizing wear of SSD
Removes need for dedicated hot spare drives
Capable of sustained performance and availability with multiple drive failure
Only unique data blocks are saved, others tracked via in-memory meta data pointers
Reduces overhead of data protection vs. traditional small RAID 5 or RAID 6 configurations
Eliminates overhead of back-end functions performance impact on applications
Deterministic storage I/O performance (IOPs, latency, bandwidth) over life of system

When would you use XtremIO vs. another storage system?

If you need all enterprise like data services including thin provisioning, dedupe, resiliency with deterministic performance on an all-flash system with raw capacity from 10-40TB (today) then XtremIO could be a good fit. On the other hand, if you need a mix of SSD based storage I/O performance (IOPS, latency or bandwidth) along with some HDD based space capacity, then a hybrid or traditional storage system could be the solution. Then there are hybrid scenarios where a hybrid storage system, array or appliance (mix of SSD and HDD) are used for most of the applications and data, with an XtremIO handling more tasks that are demanding.

How does XtremIO compare to others?

EMC with XtremIO is taking a different approach than some of their competitors whose model is to compare their faster flash-based solutions vs. traditional mid-market and enterprise arrays, appliances or storage systems on a storage I/O IOP performance basis. With XtremIO there is improved performance measured in IOPs or database transactions among other metrics that matter. However there is also an emphasis on consistent, predictable, quality of service (QoS) or what is known as deterministic storage I/O performance basis. This means both higher IOPs with lower latency while doing normal workload along with background data services (snapshots, data footprint reduction, etc).

Some of the competitors focus on how many IOPs or work they can do, however without context or showing impact to applications when back-ground tasks or other data services are in use. Other differences include how cluster nodes are interconnected (for scale out solutions) such as use of Ethernet and IP-based networks vs dedicated InfiniBand or PCIe fabrics. Host server attachment will also differ as some are only iSCSI or Fibre Channel block, or NAS file, or give a mix of different protocols and interfaces.

An industry trend however is to expand beyond the flash SSD need for speed focus by adding context along with QoS, deterministic behavior and addition of data services including snapshots, local and remote replication, multi-tenancy, metering and metrics, security among other items.

Storage I/O trends

Who or what are XtremIO competition?

To some degree vendors who only have PCIe flash SSD cards might place themselves as the alternative to all SSD or hybrid mixed SSD and HDD based solutions. FusionIO used to take that approach until they acquired NexGen (a storage system) and now have taken a broader more solution balanced approach of use the applicable tool for the task or application at hand.

Other competitors include the all SSD based storage arrays, systems or appliance vendors which includes legacy existing as well as startups vendors that include among others IBM who bought TMS (flashsystems), NetApp (EF540), Solidfire, Pure, Violin (who did a recent IPO) and Whiptail (bought by Cisco). Then there are the hybrid which is a long list including Cloudbyte (software), Dell, EMCs other products, HDS, HP, IBM, NetApp, Nexenta (Software), Nimble, Nutanix, Oracle, Simplivity and Tintri among others.

What’s new with this XtremIO announcement

10TB X-Bricks enable 10 to 40TB (physical space capacity) per cluster (available on 11/19/13). 20TB X-Bricks (larger capacity drives) will double the space capacity in January 2014. If you are doing the math, that means either a single brick (dual controller) system, or up to four bricks (nodes, each with dual controllers) configurations. Common across all system configurations are data features such as thin provisioning, inline data footprint reduction (e.g. dedupe) and XtremIO Data Protection (XDP).

What does XtremIO look like?

XtremIO consists of up to four nodes (today) based on what EMC calls X-Bricks.
EMC XtremIO X-Brick
25 SSD drive X-Brick

Each 4U X-Brick has 25 eMLC SSD drives in a standard EMC 2U DAE (disk enclosure) like those used with the VNX and VMAX for SSD and Hard Disk Drives (HDD). In addition to the 2U drive shelve, there are a pair of 1U storage processors (e.g. controllers) that give redundancy and shared access to the storage shelve.

XtremIO Architecture
XtremIO X-Brick block diagram

XtremIO storage processors (controllers) and drive shelve block diagram. Each X-Brick and their storage processors or controllers communicate with each other and other X-Bricks via a dedicated InfiniBand using Remote Direct Memory Access (RDMA) fabric for memory to memory data transfers. The controllers or storage processors (two per X-Brick) each have dual processors with eight cores for compute, along with 256GB of DRAM memory. Part of each controllers DRAM memory is set aside as a mirror its partner or peer and vise versa with access being over the InfiniBand fabric.

XtremIO fabric
XtremIO X-Brick four node fabric cluster or instance

How XtremIO works

Servers access XtremIO X-Bricks using iSCSI and Fibre Channel for block access. A responding X-Brick node handles the storage I/O request and in the case of a write updates other nodes. In the case of a write, the handling node or controller (aka storage processor) checks its meta data map in memory to see if the data is new and unique. If so, the data gets saved to SSD along with meta data information updated across all nodes. Note that data gets ingested and chunked or sharded into 4KB blocks. So for example if a 32KB storage I/O request from the server arrives, that is broken (e.g. chunk or shard) into 8 4KB pieces each with a mathematical unique fingerprint created. This fingerprint is compared to what is known in the in memory meta data tables (this is a hexadecimal number compare so a quick operation). Based on the comparisons if unique the data is saved and pointers created, if already exists, then pointers are updated.

In addition to determining if unique data, the fingerprint is also used for generate a balanced data dispersal plan across the nodes and SSD devices. Thus there is the benefit of reducing duplicate data during ingestion, while also reducing back-end IOs within the XtremIO storage system. Another byproduct is the reduction in time spent on garbage collection or other background tasks commonly associated with SSD and other storage systems.

Meta data is kept in memory with a persistent copied written to reserved area on the flash SSD drives (think of as a vault area) to support and keep system state and consistency. In between data consistency points the meta data is kept in a log journal like how a database handles log writes. What’s different from a typical database is that XtremIO XIOS platform software does these consistency point writes for persistence on a granularity of seconds vs. hours or minutes.

Storage I/O trends

What about rumor that XtremIO can only do 4KB IOPs?

Does this mean that the smallest storage I/O or IOP that XtremIO can do is 4GB?

That is a rumor or some fud I have heard floated by a competitor (or two or three) that assumes if only 4KB internal chunk or shard being used for processing, that must mean no IOPs smaller than 4KB from a server.

XtremIO can do storage I/O IOP sizes of 512 bytes (e.g. the standard block size) as do other systems. Note that the standard server storage I/O block or IO size is 512 bytes or multiples of that unless the new 4KB advanced format (AF) block size being used which based on my conversations with EMC, AF is not supported, yet. (Updated 11/15/13 EMC has indicated that host (front-end) 4K AF support, along with 512 byte emulation modes are available now with XIOS). Also keep in mind that since XtremIO XIOS internally is working with 4KB chunks or shards, that is a stepping stone for being able to eventually leverage back-end AF drive support in the future should EMC decide to do so (Updated 11/15/13 Waiting for confirmation from EMC about if back-end AF support is now enabled or not, will give more clarity as it is recieved).

What else is EMC doing with XtremIO?

VCE Vblock XtremIO systems for SAP HANA (and other databases) in memory databases along with VDI optimized solutions.
VPLEX and XtremIO for extended distance local, metro and wide area HA, BC and DR.
EMC PowerPath XtremIO storage I/O path optimization and resiliency.
Secure Remote Support (aka phone home) and auto support integration.

Boosting your available software license minutes (ASLM) with SSD

Another use of SSD has been in the past the opportunity to make better use of servers stretching their usefulness or delaying purchase of new ones by improving their effective use to do more work. In the past this technique of using SSDs to delay a server or CPU upgrade was used when systems when hardware was more expensive, or during the dot com bubble to fill surge demand gaps. This has the added benefit of stretching database and other expensive software licenses to go further or do more work. The less time servers spend waiting for IOP’s means more time for doing useful work and bringing value of the software license. Otoh, the more time spent waiting is lot available software minutes which is cost overhead.

Think of available software licence minutes (ASLM) in terms of available software license minutes where if doing useful work your software is providing value. On the other hand if those minutes are not used for useful work (e.g. spent waiting or lost due to CPU or server or IO wait, then they are lost). This is like airlines and available seat miles (ASM) metric where if left empty it’s a lost opportunity, however if used, then value, not to mention if yield management applied to price that seat differently. To make up for that loss many organizations have to add extra servers and thus more software licensing costs.

Storage I/O trends

Can we get a side of context with them metrics?

EMC along with some other vendors are starting to give more context with their storage I/O performance metrics that matter than simple IOP’s or Hero Marketing Metrics. However context extends beyond performance to also availability and space capacity which means data protection overhead. As an example, EMC claims 25% for RAID 5 and 20% for RAID 6 or 30% for RAID 5/RAID 6 combo where a 25 drive (SSD) XDP has a 8% overhead. However this assumes a 4+1 (5 drive) RAID , not apples to apples comparison on a space overhead basis. For example a 25 drive RAID 5 (24+1) would have around an 4% parity protection space overhead or a RAID 6 (23+2) about 8%.

Granted while the space protection overhead might be more apples to apples with the earlier examples to XDP, there are other differences. For example solutions such as XDP can be more tolerant to multiple drive failures with faster rebuilds than some of the standard or basic RAID implementations. Thus more context and clarity would be helpful.

StorageIO would like see vendors including EMC along with startups who give data protection space overhead comparisons without context to do so (and applaud those who provide context). This means providing the context for data protection space overhead comparisons similar to performance metrics that matter. For example simply state with an asterisk or footnote comparing a 4+1 RAID 5 vs. a 25 drive erasure or forward error correction or dispersal or XDP or wide stripe RAID for that matter (e.g. can we get a side of context). Note this is in no way unique to EMC and in fact quite common with many of the smaller startups as well as established vendors.

General comments

My laundry list of items which for now would be nice to have’s, however for you might be need to have would include native replication (today leverages Recover Point), Advanced Format (4KB) support for servers (Updated 11/15/13 Per above, EMC has confirmed that host/server-side (front-end) AF along with 512 byte emulation modes exist today), as well as SSD based drives, DIF (Data Integrity Feature), and Microsoft ODX among others. While 12Gb SAS server to X-Brick attachment for small in the cabinet connectivity might be nice for some, more practical on a go forward basis would be 40GbE support.

Now let us see what EMC does with XtremIO and how it competes in the market. One indicator to watch in the industry and market of the impact or presence of EMC XtremIO is the amount of fud and mud that will be tossed around. Perhaps time to make a big bowl of popcorn, sit back and enjoy the show…

Ok, nuff said (for now).

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)

November 14, 2013January 30, 2020

EMC announces XtremIO General Availability (Part I)

Storage I/O trends

EMC announces XtremIO flash SSD General Availability

EMC announced today the general availability (GA) if the all flash Solid State Device (SSD) XtremIO that they acquired a little over a year ago. Earlier this year EMC announced directed availability (DA) of the EMC version of XtremIO as part of other SSD hardware and software updates (here and here). The XtremIO GA announcement also follows that of the VNX2 or MCx released in September of this year that also has flash SSD enhancements along with doing more with available resources.

EMC XtremIO flash SSD boosting storage I/O performance

As an industry trend, the question is not if SSD is in your future, rather where, when, how much, what to use along with coexistence to complement Hard Disk Drive (HDD) based solutions in some environments. This also means that SSD is like real estate where location matters, not to mention having different types of technologies, packaging, solutions to meet various needs (and price points). This all ties back to the best server and storage I/O or IOP is the one that you do know have to do, the second best is the one with the least impact and best application benefit.

From industry adoption to customer deployment

EMC has evolved the XtremIO platform from a pre-acquisition solution to an first EMC version that was offered to an early set of customers e.g. DA.

I suspect that the DA was as much a focus on getting early customer feedback, addressing immediate needs or opportunities as wells as getting the EMC sales and marketing teams messaging, marching orders aligned and deployed. The latter would be rather important to decrease or avoid the temptation to cannibalize existing product sales with the shiny new technology (SNT). Likewise, it would be important for EMC to not create isolated pockets or fenced off products as some other vendors often do.

EMC XtremIO X-Brick
25 SSD drive X-Brick

What is being announced?

General availability vs. directed or limited availability
Version 2.2 of the XIOS platform software
Integrating with EMC support and service tools

Let us get back go this announcement and XtremIO of which EMC has indicated that they have several customers who have now done either $1M or $5M USD deals. EMC has claimed over 1.5 PBytes have been booked and deployed, or with data footprint reduction (DFR) including dedupe over 10PB effective capacity. Note that for those who are focused on dedupe or DFR reduction ratios 10:1.5 may not be as impressive as seen with some backup solutions, however keep in mind that this is for primary high performance storage vs. secondary or tertiary storage devices.

As part of this announcement, EMC has also release V2.2 of the XtremIO platform software (XIOS). Hence a normal new product should start with a version 1.0 at launch, however as explained this is both a new version of the technology as well as the initial GA by EMC.

Also as part of this announcement, EMC is making available XtremIO 10TB X-Bricks with 25 eMLC SSD drives each, along with dual controllers (storage processors). EMC has indicated that it will make available a 20TB X-Brick using larger capacity SSD drives in January 2014. Note that the same type of SSD drives must be used in the systems. Currently there can be up to four X-Bricks per XtremIO cluster or instance that are interconnected using a dedicated InfiniBand Fabric. Application servers access the XtremIO X-Bricks using standard Fibre Channel or Ethernet and IP based iSCSI. In addition to the hardware platform items, the XtremIO platform software (XIOS) includes built-in on the fly data footprint reduction (DFR) using global dedupe during data ingestion and placement. Other features include thin provisioning, VMware VAII, data protection and self-balancing data placement.

Storage I/O trends

Who or what applications are XtremIO being positioned for?

Some of XtremIO industry sectors include:

Financial and insurance services
Medical, healthcare and life sciences
Manufacturing, retail and warehouse management
Government and defense
Media and entertainment

Application and workload focus:

VDI including replacing linked clones with ability to do full clone without overhead
Server virtualization where aggregation causes aggravation with many mixed IOPs
Database for reducing latency, boosting IOPs as well as improving software license costs.

Databases such as IBM DB2, Oracle RAC, Microsoft SQLserver and MySQL among others have traditionally for decades been a prime opportunity for SSD (DRAM and flash). This also includes newer NoSQL or key value stores and meta data repositories for object such as Mongo, Hbase, Cassandra, Riak among others. Typical focus includes placing entire instances, or specific files and objects such as indices, journals and redo logs, import/export temp or scratch space, message queries and high activity tables among others.

What about overlap with other EMC products?

If you simply looked at the above list of sectors (among others) or applications, you could easily come to a conclusion that there is or would be overlap. Granted in some environments there will be which means XtremIO (or other vendors solutions) may be the primary storage solution. On the other hand since everything is not the same in most data centers or information factories, there will be a mix of storage systems handling various tasks. This is where EMC will need to be careful learning what they did during DA on where to place XtremIO and how to positing to complement when and where needed other solutions, or as applicable being a replacement.

XtremIO Announcement Summary

All flash SSD storage solution with iSCSI and Fibre Channel server attachment
Scale out and scale up performance while keeping latency low and deterministic
Enhanced flash duty cycle (wear leveling) to increase program / erase (P/E) cycles durability
Can complement other storage systems, arrays or appliances or function as a standalone
Coexists and complements host side caching hardware and software
Inline always on data footprint reduction (DFR) including dedupe (global dedupe without performance compromise), space saving snapshots and copies along with thin provisioning

Storage I/O trends

Some General Comment and Perspectives

Overall, XtremIO gives EMC and their customers, partners and prospects a new technology to use and add to their toolbox for addressing various challenges. SSD is in your future, when, where, with what and how are questions not to mention how much. After all, a bit of flash SSD in the right location used effectively can have a large impact. On the other hand, a lot of flash SSD in the wrong place or not used effectively will cost you lots of cash. Key for EMC and their partners will be to articulate clearly, where XtremIO fits vs. other solutions without adding complexity.

Checkout part II of this series to learn more about XtremIO including what it is, how it works, competition and added perspectives.

Ok, nuff said (for now).

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)

August 20, 2013January 23, 2019

IBM Server Side Storage I/O SSD Flash Cache Software

Storage I/O trends

IBM Server Side Storage I/O SSD Flash Cache Software

As I often say, the best server storage I/O or IOP is the one that you do not have to do. The second best storage I/O or IOP is the one with least impact or that can be done in a cost-effective way. Likewise the question is not if solid-state device (SSD) including nand flash are in your future, rather when, where, why, with what, how much along with from whom. Also location matters when it comes to SSD including nand flash with different environments and applications leveraging different placement (locality) options, not to mention how much performance do you need vs. want?

As part of their $1 billion USD (to be spent over three years, or $333.3333 million per year) flash ahead initiative IBM has announced their Flash Cache Storage Accelerator (FCSA) server software. While IBM did not use the term, (congratulations and thank you btw) some creative marketer might want to try calling this Software Defined Cache (SDC) or Software Defined SSD (SDSSD) which if that occurs, apologies in advance ;). Keep in mind that it was about a year ago this time when IBM announced that they were acquiring SSD industry veteran Texas Memory Systems (TMS).

What was announced, introducing Flash Cache Storage Acceleration or FCSA

With this announcement of FCSA slated for customer general availability by end of August, IBM joins EMC and NetApp among other storage systems vendors who developed their own, or have collaborated on server-side IO optimization and cache software. Some of the other startup and established vendors who have IO optimization, performance acceleration and caching software include DataRam (Ramdisk), FusionIO, Infinio (NFS for VMware), Pernix (block for VMware), Proximal and SANdisk (bought flashsoft) among others.

Read more about IBM Flash Cache Software (FCSA) including various questions and perspectives in part two of this two-part post located here.

Ok, nuff said (for now)

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)

August 12, 2013November 21, 2019

Server and Storage IO Memory: DRAM and nand flash

Storage I/O trends

DRAM, DIMM, DDR3, nand flash memory, SSD, stating what’s often assumed

Often what’s assumed is not always the case. For example in along with around server, storage and IO networking circles including virtual as well as cloud environments terms such as nand (Negated AND or NOT And) flash memory aka (Solid State Device or SSD), DRAM (Dynamic Random Access Memory), DDR3 (Double Data Rate 3) not to mention DIMM (Dual Inline Memory Module) get tossed around with the assumption everybody must know what they mean.

On the other hand, I find plenty of people who are not sure what those among other terms or things are, sometimes they are even embarrassed to ask, particular if they are a self-proclaimed expert.

So for those who need a refresh or primer, here you go, an excerpt from Chapter 7 (Servers – Physical, Virtual and Software) from my book "The Green and Virtual Data Center" (CRC Press) available at Amazon.com and other global venues in print and ebook formats.

7.2.2 Memory

Computers rely on some form of memory ranging from internal registers, local on-board processor Level 1 (L1) and Level 2 (L2) caches, random accessible memory (RAM), non-volatile RAM (NVRAM) or nand Flash (SSD) along with external disk storage. Memory, which includes external disk storage, is used for storing operating system software along with associated tools or utilities, application programs and data. Main memory or RAM, also known as dynamic RAM (DRAM) chips, is packaged in different ways with a common form being dual inline memory modules (DIMMs) for notebook or laptop, desktop PC and servers.

RAM main memory on a server is the fastest form of memory, second only to internal processor or chip based registers, L1, L2 or local memory. RAM and processor based memories are volatile and non-persistent in that when power is removed, the contents of memory are lost. As a result, some form of persistent memory is needed to keep programs and data when power is removed. Read only memory (ROM) and NVRAM are both persistent forms of memory in that their contents are not lost when power is removed. The amount of RAM that can be installed into a server will vary with specific architecture implementation and operating software being used. In addition to memory capacity and packaging format, the speed of memory is also important to be able to move data and programs quickly to avoid internal bottlenecks. Memory bandwidth performance increases with the width of the memory bus in bits and frequency in MHz. For example, moving 8 bytes on a 64 bit buss in parallel at the same time at 100MHz provides a theoretical 800MByte/sec speed.

To improve availability and increase the level of persistence, some servers include battery backed up RAM or cache to protect data in the event of a power loss. Another technique to protect memory data on some servers is memory mirroring where twice the amount of memory is installed and divided into two groups. Each group of memory has a copy of data being stored so that in the event of a memory failure beyond those correctable with standard parity and error correction code (ECC) no data is lost. In addition to being fast, RAM based memories are also more expensive and used in smaller quantities compared to external persistent memories such as magnetic hard disk drives, magnetic tape or optical based memory medias.

Memory diagram
Memory and Storage Pyramid

The above shows a tiered memory model that may look familiar as the bottom part is often expanded to show tiered storage. At the top of the memory pyramid is high-speed processor memory followed by RAM, ROM, NVRAM and FLASH along with many forms of external memory commonly called storage. More detail about tiered storage is covered in chapter 8 (Data Storage – Data Storage – Disk, Tape, Optical, and Memory). In addition to being slower and lower cost than RAM based memories, disk storage along with NVRAM and FLASH based memory devices are also persistent.

By being persistent, when power is removed, data is retained on the storage or memory device. Also shown in the above figure is that on a relative basis, less energy is used for power storage or memory at the bottom of the pyramid than for upper levels where performance increases. From a PCFE (Power, Cooling, Floor space, Economic) perspective, balancing memory and storage performance, availability, capacity and energy to a given function, quality of service and service level objective for a given cost needs to be kept in perspective and not considering simply the lowest cost for the most amount of memory or storage. In addition to gauging memory on capacity, other metrics include percent used, operating system page faults and page read/write operations along with memory swap activity as well memory errors.

Base 2 versus base 10 numbering systems can account for some storage capacity that appears to “missing” when real storage is compared to what is expected to be seen. Disk drive manufacturers use base 10 (decimal) to count bytes of data while memory chip, server and operating system vendors typically use base 2 (binary) to count bytes of data. This has led to confusion when comparing a disk drive base 10 GB with a chip memory base 2 GB of memory capacity, such as 1,000,000,000 (10^9) bytes versus 1,073,741,824 (2^30) bytes. Nomenclature based on the International System of Units uses MiB, GiB and TiB to denote million, billion and trillion bytes for base 2 numbering with base 10 using MB, TB and GB . Most vendors do document how many bytes, sometimes in both base 2 and base 10, as well as the number of 512 byte sectors supported on their storage devices and storage systems, though it might be in the small print.

Ok, nuff said (for now).

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier).

May 29, 2013November 26, 2023

Web chat Thur May 30th: Hot Storage Trends for 2013 (and beyond)

Storage I/O trends

Join me on Thursday May 30, 2013 at Noon ET (9AM PT) for a live web chat at the 21st Century IT (21cit) site (click here to register, sign-up, or view earlier posts). This will be an online web chat format interactive conversation so if you are not able to attend, you can visit at your convenience to view and give your questions along with comments. I have done several of these web chats with 21cit as well as other venues that are a lot of fun and engaging (time flies by fast).

For those not familiar, 21cIT is part of the Desum/UBM family of sites including Internet Evolution, SMB Authority, and Enterprise Efficiency among others that I do article posts, videos and live chats for.

Some things keep going around, Seagate ships 2 Billion HDD’s

Seagate (@Seagate) announced today that it reached a milestone of having shipped 2 Billion hard disk drives (HDD’s), something that is round stores data that keeps growing. As part of their announcement, Seagate has a good info graphics and facts page here going back to 1979 when it was founded as Shugart Technology (read about Al Shugart here).

By coincidence, just a few years before Seagate was founded, McDonalds (who makes round things as well) announced that they had served over 20 billion hamburgers. Thus McDonald feeds the appetites of consumers hungry for a quick meal while Seagate feeds the information demands, perhaps while stopping for a quick breakfast, lunch, coffee or dinner. Speaking of things that go around (like HDD’s), check out what NAS, NASA and NASCAR have in common all of which are also involved in big data as well as little data.

Storage I/O industry trends image

Both Seagate and McDonalds have also expanded their menu of offerings over the years maintaining their core products while expanding into new and adjacent areas given different appetites and preferences. After all, in the data cloud, virtual or physical data center also known as an information factory not everything is the same either.

Cloud

Granted Seagate is helping to feed or fuel the internet along with traditional hungry demand for data, not to mention people and data are living longer, as well as getting larger.

Cloud, virtual server, big data and little data storage I/O image

In the case of Seagate and other driver manufactures of which have consolidated down to three (Toshiba, Seagate and Western Digital), the physical devices are getting smaller, however capacities are increasing.

Storage I/O

Why the continued growth? As mentioned data is getting larger (big data and little data) and living longer, there is also no such thing as a data or information recession. Consequently data storage is an important pillar or part of cloud, virtual and traditional information services with HDD’s remaining popular along side nand flash solid state devices (SSD).

The Seagate info graphic page can be seen here and is a good walk back in time for some, perhaps a history lesson for others. It goes back to the Sony Walkman which some might remember, launch of the PC and Apple Macintosh in the 80s, Linux and the web in the 90s and moving forward from then to now.

HDD
A few of my HDD’s, different types for various tasks.

If you think or believe HDD’s are a dead technology, take a few minutes to view the info graphic to update your insight on what has been an important aspect of computing and remains popular in cloud environments. Otoh, if you believe that HDD’s are still a core piece of computing and will remain so including in roles in the future, have a look to see how things have progressed, maybe some Dejavu.

Oh, for those who are thinking that the HDD did not begin in 1979, you are absolutely correct as it dates back into the 1950s. Here is a link to something that I wrote a few years ago on the HDD’s 50th birthday and looks like it will easily celebrate 60 and beyond.

Additional related reading:
In the data center or information factory, not everything is the same
Hard Disk Drives (HDDs) for virtual and physical environments
Happy 50th, hard drive. But will you make it to 60?
Seagate to say goodbye to Cayman Islands, Hello Ireland
More Storage IO momentus HHDD and SSD moments part II
In the data center or information factory, not everything is the same
The Human Face of Big Data, a Book Review

Congratulations to Seagate, now how long until the 3 billion served, excuse me, shipped HDD occurs?

Disclosure: Its been almost a month since my last visit to McDonalds or buying another HDD (or SSD) from Amazon.com.

Ok, nuff said

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

February 21, 2013December 24, 2020

NetApp EF540, something familiar, something new

StorageIO Industry trends and perspectives image

NetApp announced the other day a new all nand flash solid-state devices (SSD) storage system called the EF540 that is available now. The EF540 has something’s new and cool, along with some things familiar, tried, true and proven.

What is new is that the EF540 is an all nand flash multi-level cell (MLC) SSD storage system. What is old is that the EF540 is based on the NetApp E-Series (read more here and here) and SANtricity software with hundreds of thousands installed systems. As a refresher, the E-Series are the storage system technologies and solutions obtained via the Engenio acquisition from LSI in 2011.

Image of NetApp EF540 via ntapgeek.com
Image via www.ntapgeek.com

The EF540 expands the NetApp SSD flash portfolio which includes products such as FlashCache (read cache aka PAM) for controllers in ONTAP based storage systems. Other NetApp items in the NetApp flash portfolio include FlashPool SSD drives for persistent read and write storage in ONTAP based systems. Complimenting FlashCache and FlashPool is the server-side PCIe caching card and software FlashAccel. NetApp is claiming to have revenue shipped 36PB of flash complimenting over 3 Exabytes (EB) of storage while continuing to ship a large amount of SAS and SATA HDD’s.

NetApp also previewed its future FlashRay storage system that should appear in beta later in 2013 and general availability in 2014.

In addition to SSD and flash related announcements, NetApp also announced enhancements to its ONTAP FAS/V6200 series including the FAS/V6220, FAS/V6250 and FAS/V6290.

Some characteristics of the NetApp EF540 and SANtricity include:

Two models with 12 or 24 x 6Gbs SAS 800GB MLC SSD devices
Up to 9.6TB or 19.2TB physical storage in a 2U (3.5 inch) tall enclosure
Dual controllers for redundancy, load-balancing and availability
IOP performance of over 300,000 4Kbyte random 100% reads under 1ms
6GByte/sec performance of 512Kbyte sequential reads, 5.5Gbyte/sec random reads
Multiple RAID levels (0, 1, 10, 3, 5, 6) and flexible group sizes
12GB of DRAM cache memory in each controller (mirrored)
4 x 8GFC host server-side ports per controller
Optional expansion host ports (6Gb SAS, 8GFC, 10Gb iSCSI, 40Gb IBA/SRP)
Snapshots and replication (synchronous and asynchronous) including to HDD systems
Can be used for traditional IOP intensive little-data, or bandwidth for big-data
Proactive SSD wear monitoring and notification alerts
Utilizes SANtricity version 10.84

Poll, Are large storage arrays day’s numbered?

EMC and NetApp (along with other vendors) continue to sell large numbers of HDD’s as well as large amounts of SSD. Both EMC and NetApp are taking similar approaches of leveraging PCIe flash cards as cache adding software functionality to compliment underlying storage systems. The benefit is that the cache approach is less disruptive for many environments while allowing improved return on investment (ROI) of existing assets.

	EMC	NetApp
Storage systems with HDD and SSD	VMAX, VNX	FAS/V, E-Series
Storage systems with SSD cache	FastCache,	FlashCache
All SSD based storage	VMAX, VNX	EF540
All new SSD system in development	Project X	FlashRay
Server side PCIe SSD cache	VFCache	FlashAcell
Partner ecosystems	Yes	Yes

The best IO is the one that you do not have to do, however the next best are those that have the least cost or affect which is where SSD comes into play. SSD is like real estate in that location matters in terms of providing benefit, as well as how much space or capacity is needed.

What does this all mean?
The NetApp EF540 based on the E-Series storage system architecture is like one of its primary competitors (e.g. EMC VNX also available as an all-flash model). The similarity is that both have been competitors, as well as have been around for over a decade with hundreds of thousands of installed systems. The similarities are also that both continue to evolve their code base leveraging new hardware and software functionality. These improvements have resulted in improved performance, availability, capacity, energy effectiveness and cost reduction.

Whats your take on RAID still being relevant?

From a performance perspective, there are plenty of public workloads and benchmarks including Microsoft ESRP and SPC among others to confirm its performance. Watch for NetApp to release EF540 SPC results given their history of doing so with other E-Series based systems. With those or other results, compare and contrast to other solutions looking not just at IOPS or MB/sec (bandwidth), also latency, functionality and cost.

What does the EF540 compete with?
The EF540 competes with all flash-based SSD solutions (Violin, Solidfire, Purestorage, Whiptail, Kaminario, IBM/TMS, up-coming EMC Project “X” (aka XtremeIO)) among others. Some of those systems use general-purpose servers combined SSD drives, PCIe cards along with management software where others leverage customized platforms with software. To a lesser extent, competition will also be mixed mode SSD and HDD solutions along with some PCIe target SSD cards for some situations.

What to watch and look for:
It will be interesting to view and contrast public price performance results using SPC or Microsoft ESRP among others to see how the EF540 compares. In addition, it will be interesting to compare other storage based, as well as SSD systems beyond the number of IOPS. What will be interesting is to keep an eye on latency, as well as bandwidth, feature functionality and associated costs.

Given that the NetApp E-Series are OEM or sold by third parties, let’s see if something looking similar or identical to the EF540 appear at any of those or new partners. This includes traditional general purpose and little-data environments, along with cloud, managed service provider, high performance compute and high productivity compute (HPC), super computer (SC), big data and big bandwidth among others.

Poll, Have SSD been successful in traditional storage systems and arrays

The EF540 could also appear as a storage or IO accelerator for large-scale out, clustered, grid and object storage systems for meta data, indices, key value stores among other uses either direct attached to servers, or via shared iSCSI, SAS, FC and InfiniBand (IBA) SCSI Remote Protocol (SRP).

Keep an eye on how the startups that have been primarily Just a Bunch Of SSD (JBOS) in a box start talking about adding new features and functionality such as snapshots, replication or price reductions. Also, keep an eye and ear open to what EMC does with project “X” along with NetApp FlashRay among other improvements.

For NetApp customers, prospects, partners, E-Series OEMs and their customers with the need for IO consolidation, or performance optimization for big-data, little-data and related applications the EF540 opens up new opportunities and should be good news. For EMC competitors, they now have new competition which also signals an expanding market with new opportunities in adjacent areas for growth. This also further signals the need for diverse ssd portfolios and product options to meet different customer application needs, along with increased functionality vs. lowest cost for high capacity fast nand SSD storage.

Some related reading:

Disclosure: NetApp, Engenio (when LSI), EMC and TMS (now IBM) have been clients of StorageIO.

Ok, nuff said

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

January 15, 2013November 3, 2024

Thanks for viewing StorageIO content and top 2012 viewed posts

StorageIO industry trends cloud, virtualization and big data

2012 was a busy year (it was our 7th year in business) along with plenty of activity on StorageIOblog.com as well as on the various syndicate and other sites that pickup our content feed (https://storageioblog.com/RSSfull.xml).

Excluding traditional media venues, columns, articles, web casts and web site visits (StorageIO.com and StorageIO.TV), StorageIO generated content including posts and pod casts have reached over 50,000 views per month (and growing) across StorageIOblog.com and our partner or syndicated sites. Including both public and private, there were about four dozen in-person events and activities not counting attending conferences or vendor briefing sessions, along with plenty of industry commentary. On the twitter front, plenty of activity there as well closing in on 7,000 followers.

Thank you to everyone who have visited the sites where you will find StorageIO generated content, along with industry trends and perspective comments, articles, tips, webinars, live in person events and other activities.

In terms of what was popular on the StorageIOblog.com site, here are the top 20 viewed posts in alphabetical order.

Amazon cloud storage options enhanced with Glacier
Announcing SAS SANs for Dummies book, LSI edition
Are large storage arrays dead at the hands of SSD?
AWS (Amazon) storage gateway, first, second and third impressions
EMC VFCache respinning SSD and intelligent caching
Hard product vs. soft product
How much SSD do you need vs. want?
Oracle, Xsigo, VMware, Nicira, SDN and IOV: IO IO its off to work they go
Is SSD dead? No, however some vendors might be
IT and storage economics 101, supply and demand
More storage and IO metrics that matter
NAD recommends Oracle discontinue certain Exadata performance claims
New Seagate Momentus XT Hybrid drive (SSD and HDD)
PureSystems, something old, something new, something from big blue
Researchers and marketers dont agree on future of nand flash SSD
Should Everything Be Virtualized?
SSD, flash and DRAM, DejaVu or something new?
What is the best kind of IO? The one you do not have to do
Why FC and FCoE vendors get beat up over bandwidth?
Why SSD based arrays and storage appliances can be a good idea

Moving beyond the top twenty read posts on StorageIOblog.com site, the list quickly expands to include more popular posts around clouds, virtualization and data protection modernization (backup/restore, HA, BC, DR, archiving), general IT/ICT industry trends and related themes.

I would like to thank the current StorageIOblog.com site sponsors Solarwinds (management tools including response time monitoring for physical and virtual servers) and Veeam (VMware and Hyper-V virtual server backup and data protection management tools) for their support.

Thanks again to everyone for reading and following these and other posts as well as for your continued support, watch for more content on the above and other related and new topics or themes throughout 2013.

Btw, if you are into Facebook, you can give StorageIO a like at facebook.com/storageio (thanks in advance) along with viewing our newsletter here.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

January 5, 2013October 18, 2024

The Human Face of Big Data, a Book Review

My copy of the new book The Human Face of Big Data created by Rick Smolan and Jennifer Erwitt arrived yesterday compliments of EMC (the lead sponsor). In addition to EMC, the other sponsors of the book are Cisco, VMware, FedEx, Originate and Tableau software.

To say this is a big book would be an understatement, then again, big data is a big topic with a lot of diversity if you open your eyes and think in a pragmatic way, which once you open and see the pages you will see. This is physically a big book (11x 14 inches) with lots of pictures, texts, stories, factoids and thought stimulating information of the many facets and dimensions of big data across 224 pages.

While Big Data as a buzzword and industry topic theme might be new, along with some of the related technologies, techniques and focus areas, other as aspects have been around for some time. Big data means many things to various people depending on their focus or areas of interest ranging from analytics to images, videos and other big files. A common theme is the fact that there is no such thing as an information or data recession, and that people and data are living longer, getting larger, and we are all addicted to information for various reasons.

Big data needs to be protected and preserved as it has value, or its value can increase over time as new ways to leverage it are discovered which also leads to changing data access and life cycle patterns. With many faces, facets and areas of interests applying to various spheres of influence, big data is not limited to programmatic, scientific, analytical or research, yet there are many current and use cases in those areas.

Big data is not limited to videos for security surveillance, entertainment, telemetry, audio, social media, energy exploration, geosciences, seismic, forecasting or simulation, yet those have been areas of focus for years. Some big data files or objects are millions of bytes (MBytes), billion of bytes (GBytes) or trillion of bytes (TBytes) in size that when put into file systems or object repositories, add up to Exabytes (EB – 1000 TBytes) or Zettabytes (ZB – 1000 EBs). Now if you think those numbers are far-fetched, simply look back to when you thought a TByte, GByte let alone a MByte was big or far-fetched future. Remember, there is no such thing as a data or information recession, people and data are living longer and getting larger.

Big data is more than hadoop, map reduce, SAS or other programmatic and analytical focused tool, solution or platform, yet those all have been and will be significant focus areas in the future. This also means big data is more than data warehouse, data mart, data mining, social media and event or activity log processing which also are main parts have continued roles going forward. Just as there are large MByte, GByte or TByte sized files or objects, there are also millions and billions of smaller files, objects or pieces of information that are part of the big data universe.

You can take a narrow, product, platform, tool, process, approach, application, sphere of influence or domain of interest view towards big data, or a pragmatic view of the various faces and facets. Of course you can also spin everything that is not little-data to be big data and that is where some of the BS about big data comes from. Big data is not exclusive to the data scientist, researchers, academia, governments or analysts, yet there are areas of focus where those are important. What this means is that there are other areas of big data that do not need a data science, computer science, mathematical, statistician, Doctoral Phd or other advanced degree or training, in other words big data is for everybody.

Cover image of Human Face of Big Data Book

Back to how big this book is in both physical size, as well as rich content. Note the size of The Human Face of Big Data book in the adjacent image that for comparison purposes has a copy of my last book Cloud and Virtual Data Storage Networking (CRC), along with a 2.5 inch hard disk drive (HDD) and a growler. The Growler is from Lift Bridge Brewery (Stillwater, MN), after all, reading a big book about big data can create the need for a big beer to address a big thirst for information ;).

The Human Face of Big Data is more than a coffee table or picture book as it is full of with information, factoids and perspectives how information and data surround us every day. Check out the image below and note the 2.5 inch HDD sitting on the top right hand corner of the page above the text. Open up a copy of The Human Face of Big Data and you will see examples of how data and information are all around us, and our dependence upon it.

A look inside the book The Humand Face of Big Data image

Book Details:
Copyright 2012
Against All Odds Productions
ISBN 978-1-4549-0827-2
Hardcover 224 pages, 11 x 0.9 x 14 inches
4.8 pounds, English

There is also an applet to view related videos and images found in the book at HumanFaceofBigData.com/viewer in addition to other material on the companion site www.HumanFacesofBigData.com.

Get your copy of
The Human Face of Big Data at Amazon.com by clicking here or at other venues including by clicking on the following image (Amazon.com).

Some added and related material:
Little data, big data and very big data (VBD) or big BS?
How many degrees separate you and your information?
Hardware, Software, what about Valueware?
Changing Lifecycles and Data Footprint Reduction (Data doesnt have to lose value over time)
Garbage data in, garbage information out, big data or big garbage?
Industry adoption vs. industry deployment, is there a difference?
Is There a Data and I/O Activity Recession?
Industry trend: People plus data are aging and living longer
Supporting IT growth demand during economic uncertain times
No Such Thing as an Information Recession

For those who can see big data in a broad and pragmatic way, perhaps using the visualization aspect this book brings forth the idea that there are and will be many opportunities. Then again for those who have a narrow or specific view of what is or is not big data, there is so much of it around and various types along with focus areas you too will see some benefits.

Do you want to play in or be part of a big data puddle, pond, or lake, or sail and explore the oceans of big data and all the different aspects found in, under and around those bigger broader bodies of water.

Bottom line, this is a great book and read regardless of if you are involved with data and information related topics or themes, the format and design lend itself to any audience. Broaden your horizons, open your eyes, ears and thinking to the many facets and faces of big data that are all around us by getting your copy of The Human Face of Big Data (Click here to go to Amazon for your copy) book.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

November 24, 2012January 23, 2019

Garbage data in, garbage information out, big data or big garbage?

Do you know the computer technology saying, garbage data in results in garbage information out?

In other words even with the best algorithms and hardware, bad, junk or garbage data put in results in garbage information delivered. Of course, you might have data analysis and cleaning software to look for, find and remove bad or garbage data, however that’s for a different post on another day.

If garbage data in results in garbage information out, does garbage big data in result in big garbage out?

I’m sure my sales and marketing friends or their surrogates will jump at the opportunity to tell me why and how big data is the solution to the decades old garbage data in problem.

Likewise they will probably tell me big data is the solution to problems that have not even occurred or been discovered yet, yeah right.

However garbage data does not discriminate or show preference towards big data or little data, in fact it can infiltrate all types of data and systems.

Lets shift gears from big and little data to how all of that information is protected, backed up, replicated, copied for HA, BC, DR, compliance, regulatory or other reasons. I wonder how much garbage data is really out there and many garbage backups, snapshots, replication or other copies of data exist? Sounds like a good reason to modernize data protection.

If we don’t know where the garbage data is, how can we know if there is a garbage copy of the data for protection on some other tape, disk or cloud. That also means plenty of garbage data to compact (e.g. compress and dedupe) to cut its data footprint impact particular with tough economic times.

Does this mean then that the cloud is the new destination for garbage data in different shapes or forms, from online primary to back up and archive?

Does that then make the cloud the new virtual garbage dump for big and little data?

Hmm, I think I need to empty my desktop trash bin and email deleted items among other digital house keeping chores now.

On the other hand, just had a thought about orphaned data and orphaned storage, however lets leave those sleeping dogs lay where they rest for now.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

November 24, 2012April 27, 2025

SSD, flash and DRAM, DejaVu or something new?

Recently I was in Europe for a couple of weeks including stops at Storage Networking World (SNW) Europe in Frankfurt, StorageExpo Holland, Ceph Day in Amsterdam (object and cloud storage), and Nijkerk where I delivered two separate 2 day, and a single 1 day seminar.

Image of Frankfurt transtation Image of inside front of ICE train going from Frankfurt to Utrecht

At the recent StorageExpo Holland event in Utrecht, I gave a couple of presentations, one on cloud, virtualization and storage networking trends, the other taking a deeper look at Solid State Devices (SSD’s). As in the past, StorageExpo Holland was great in a fantastic venue, with many large exhibits and great attendance which I heard was over 6,000 people over two days (excluding exhibitor vendors, vars, analysts, press and bloggers) which was several times larger than what was seen in Frankfurt at the SNW event.

Image of Ilja Coolen (twitter @@iCoolen) who was session host for SSD presentation in Utrecht Image of StorageExpo Holland exhibit show floor in Utrecht

Both presentations were very well attended and included lively interactive discussion during and after the sessions. The theme of my second talk was SSD, the question is not if, rather what to use where, how and when which brings us up to this post.

For those who have been around or using SSD for more than a decade outside of cell phones, camera, SD cards or USB thumb drives, that probably means DRAM based with some form of data persistency mechanisms. More recently mention SSD and that implies nand flash-based, either MLC or eMLC or SLC or perhaps emerging mram or PCM. Some might even think of NVRAM or other forms of SSD including emerging mram or mem-resistors among others, however lets stick to nand flash and dram for now.

Often in technology what is old can be new, what is new can be seen as old, if you have seen, experienced or done something before you will have a sense of DejaVu and it might be evolutionary. On the other hand, if you have not seen, heard, experienced, or found a new audience, then it can be revolutionary or maybe even an industry first ;).

Technology evolves, gets improved on, matures, and can often go in cycles of adoption, deployment, refinement, retirement, and so forth. SSD in general has been an on again, off again type cycle technology for the past several decades except for the past six to seven years. Normally there is an up cycle tied to different events, servers not being fast enough or affordable so use SSD to help address performance woes, or drives and storage systems not being fast enough and so forth.

Btw, for those of you who think that the current SSD focused technology (nand flash) is new, it is in fact 25 years old and still evolving and far from reaching its full potential in terms of customer deployment opportunities.

Nand flash memory has helped keep SSD practical for the past several years riding the similar curve that is keeping hard disk drives (HDD’s) that they were supposed to replace alive. That is improved reliability, endurance or duty cycle, better annual failure rate (AFR), larger space capacity, lower cost, and enhanced interfaces, packaging, power and functionality.

DRAM historically at least for enterprise has been the main option for SSD based solutions using some form of data persistency. Data persistency options include battery backup combined with internal HDD’s to de stage information from the DRAM before power was lost. TMS (recently bought by IBM) was one of the early SSD vendors from the DRAM era that made the transition to flash including being one of the first many years ago to combine DRAM as a cache layer over nand flash as a persistency or de-stage layer. This would be an example of if you were not familiar with TMS back then and their capacities, you might think or believe that some more recent introductions are new and revolutionary, and perhaps they are in their own right or with enough caveats and qualifiers.

An emerging trend, which for some will be Dejavu, is that of using more DRAM in combination with nand flash SSD.

Oracle is one example of a vendor who IMHO rather quietly (intentionally or accidentally) has done this in the 7000 series storage systems as well as ExaData based database storage systems. Rest assured they are not alone and in fact many of the legacy large storage vendors have also piled up large amounts of DRAM based cache in their storage systems. For example EMC with 2TByte of DRAM cache in their VMAX 40K, or similar systems from Fujitsu HP, HDS, IBM and NetApp (including recent acquisition of DRAM based CacheIQ) among others. This has also prompted the question of if SSD has been successful in traditional storage arrays, systems or appliances as some would have you believe not, click here to learn more and cast your vote.

So is the future in the past? Some would say no, some will say yes, however IMHO there are lessons to learn and leverage from the past while looking and moving forward.

Early SSD’s were essentially RAM disks, that is a portion of main random access memory (RAM) or what we now call DRAM set aside as a non persistent (unless battery backed up) cache or device. Using a device driver, applications could use the RAM disk as though it were a normal storage system. Different vendors springing up with drivers for various platforms and disappeared as their need were reduced with faster storage systems, interfaces and ram disks drives supplied by vendors, not to mention SSD devices.

Oh, for you tech trivia types, there was also database machines from the late 80’s such as Briton Lee that would offload your database processing functions to a specialized appliance. Sound like Oracle ExaData I, II or III to anybody?

Ok, so we have seen this movie before, no worries, old movies or shows get remade, and unless you are nostalgic or cling to the past, sure some of the remakes are duds, however many can be quite good.

Same goes with the remake of some of what we are seeing now. Sure there is a generation that does not know nor care about the past, its full speed ahead and leverage what will get them there.

Thus we are seeing in memory databases again, some of you may remember the original series (pick your generation, platform, tool and technology) with each variation getting better. With 64 bit processor, 128 bit and beyond file system and addressing, not to mention ability for more DRAM to be accessed directly, or via memory address extension, combined with memory data footprint reduction or compression, there is more space to put things (e.g. no such thing as a data or information recession).

Lets also keep in mind that the best IO is the IO that you do not have to do, and that SSD which is an extension of the memory map plays by the same rules of real estate. That is location matters.

Thus, here we go again for some of you (DejaVu), while for others get ready for a new and exciting ride (new and revolutionary). We are back to the future with in memory database which while for a time will take some pressure from underlying IO systems until they once again out grow server memory addressing limits (or IT budgets).

However for those who do not fall into a false sense of security, no fear, as there is no such thing as a data or information recession. Sure as the sun rises in the east and sets in the west, sooner or later those IO’s that were or are being kept in memory will need to be de-staged to persistent storage, either nand flash SSD, HDD or somewhere down the road PCM, mram and more.

There is another trend that with more IOs being cached, reads are moving to where they should resolve which is closer to the application or via higher up in the memory and IO pyramid or hierarchy (shown above).

Thus, we could see a shift over time to more writes and ugly IOs being sent down to the storage systems. Keep in mind that any cache historically provides temporal relieve, question is how long of a temporal relief or until the next new and revolutionary or DejaVu technology shows up.

Ok, go have fun now, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

November 21, 2012November 3, 2024

IBM vs. Oracle, NAD intervenes, again

With HP announcing that they were sold a bogus deal with Autonomy (read here, here and here among others) and the multi billion write off (loss), or speculation of who will be named the new CEO of Intel in 2013, don’t worry if you missed the latest in the ongoing IBM vs. Oracle campaign. The other day the NAD (National Advertising Directive) part of the Better Business Bureau (BBB) issued yet another statement about IBM and Oracle (read here and posted below).

NAD BBB logo

In case you had not heard, earlier this year, Oracle launched an advertising promotion touting how much faster their solutions are vs. IBM. Perhaps you even saw the advertising billboards along highways or in airports making the Oracle claims.

Big Blue (e.g. IBM) being the giant that they are was not going take the Oracle challenge sitting down and stepped up and complained to the better business bureau (BBB). As a result, the NAD issued a decision for Oracle to stop the ads (read more here). Oracle at 37.1B (May 2012 annual earnings) is about a third the size of IBM at 106.9B (2011 earnings), thus neither is exactly a small business.

Lets get back to the topic at hand the NAD issued yet another directive. In the latest spat, after the first Ads, Oracle launched the 10M challenge (you can read about that here).

Once again the BBB and the NAD weighs in for IBM and issued the following statement (mentioned above):

For Immediate Release
Contact: Linda Bean
212.705.0129

NAD Determines Oracle Acted Properly in Discontinuing Performance Claim Couched in ‘Contest’ Language

New York, NY – Nov. 20, 2012 – The National Advertising Division has determined that Oracle Corporation took necessary action in discontinuing advertising that stated its Exadata server is “5x Faster Than IBM … Or you win $10,000,000.”

The claim, which appeared in print advertising in the Wall Street Journal and other major newspapers, was challenged before NAD by International Business Machines Corporation.

NAD is an investigative unit of the advertising industry system of self-regulation and is administered by the Council of Better Business Bureaus.

As an initial matter, NAD considered whether or not Oracle’s advertisement conveyed a comparative performance claim – or whether the advertisement simply described a contest.

In an NAD proceeding, the advertiser is obligated to support all reasonable interpretations of its advertising claims, not just the message it intended to convey. In the absence of reliable consumer perception evidence, NAD uses its judgment to determine what implied messages, if any, are conveyed by an advertisement.

Here, NAD found that, even accounting for a sophisticated target audience, a consumer would be reasonable to take away the message that all Oracle Exadata systems run five times as fast as all IBM’s Power computer products. NAD noted in its decision that the fact that the claim was made in the context of a contest announcement did not excuse the advertiser from its obligation to provide substantiation.

The advertiser did not provide any speed performance tests, examples of comparative system speed superiority or any other data to substantiate the message that its Exadata computer systems run data warehouses five times as fast as IBM Power computer systems.

Accordingly, NAD determined that the advertiser’s decision to permanently discontinue this advertisement was necessary and appropriate. Further, to the extent that Oracle reserves the right to publish similar advertisements in the future, NAD cautioned that such performance claims require evidentiary support whether or not the claims are couched in a contest announcement.

Oracle, in its advertiser’s statement, said it disagreed with NAD’s findings, but would take “NAD’s concerns into account should it disseminate similar advertising in the future.”

###

NAD’s inquiry was conducted under NAD/CARU/NARB Procedures for the Voluntary Self-Regulation of National Advertising. Details of the initial inquiry, NAD’s decision, and the advertiser’s response will be included in the next NAD/CARU Case Report.

About Advertising Industry Self-Regulation: The Advertising Self-Regulatory Council establishes the policies and procedures for advertising industry self-regulation, including the National Advertising Division (NAD), Children’s Advertising Review Unit (CARU), National Advertising Review Board (NARB), Electronic Retailing Self-Regulation Program (ERSP) and Online Interest-Based Advertising Accountability Program (Accountability Program.) The self-regulatory system is administered by the Council of Better Business Bureaus.

Self-regulation is good for consumers. The self-regulatory system monitors the marketplace, holds advertisers responsible for their claims and practices and tracks emerging issues and trends. Self-regulation is good for advertisers. Rigorous review serves to encourage consumer trust; the self-regulatory system offers an expert, cost-efficient, meaningful alternative to litigation and provides a framework for the development of a self-regulatory to emerging issues.

To learn more about supporting advertising industry self-regulation, please visit us at: www.asrcreviews.org.

Linda Bean Director, Communications,
Advertising Self-Regulatory Council

Tel: 212.705.0129
Cell: 908.812.8175
lbean@asrc.bbb.org

112 Madison Ave.
3rd Fl.
New York, NY
10016

Not surprisingly, IBM sent the following email to highlight their latest news:

Greg,

For the third time in eight months Oracle has agreed to kill a misleading advertisement targeting IBM after scrutiny from the Better Business Bureau’s National Advertising Division.

Oracle’s ‘$10 Million Challenge’ ad claimed that its Exadata server was ‘Five Times Faster than IBM Power or You Win $10,000,000.’ The advertising council just issued a press release announcing that the claim was not supported by the evidence in the record, and that Oracle has agreed to stop making the claim. ‘[Oracle] did not provide speed performance tests, examples of comparative systems speed superiority or any other data to substantiate its message,’ the BBB says in the release: The ads ran in The Wall Street Journal, The Economist, Chief Executive Magazine, trade publications and online.

The National Advertising Division reached similar judgments against Oracle advertising on two previous occasions this year. Lofty and unsubstantiated claims about Oracle systems being ‘Twenty Times Faster than IBM’ and ‘Twice as Fast Running Java’ were both deemed to be unsubstantiated and misleading. Oracle quietly shelved both campaigns.

If you follow Oracle’s history of claims, you won’t be surprised that the company issues misleading ads until they’re called out in public and forced to kill the campaign. As far back as 2001, Oracle’s favorite tactic has been to launch unsubstantiated attacks on competitors in ads while promising prize money to anyone who can disprove the bluff. Not surprisingly, no prize money is ever paid as the campaigns wither under scrutiny. They are designed to generate publicity for Oracle, nothing more. You may be familiar with their presentation, ‘Ridding the Market of Competition,’ which they issued to the Society of Competitive Intelligence Professionals laying out their strategy.

The repeated rulings by the BBB even caused analyst Rob Enderle to comment that, ‘there have been significant forced retractions and it is also apparent that increasingly the only people who could cite these false Oracle performance advantages with a straight face were Oracle’s own executives, who either were too dumb to know they were false or too dishonest to care.’

Let me know if you’re interested in following up on this news. You won’t hear anything about it from Oracle.

Best,

Chris

Christopher Rubsamen
Worldwide Communications for PureSystems and Cloud Computing
IBM Systems & Technology Group
aim: crubsamen
twitter: @crubsamen

Wow, I never knew however I should not be surprised that there is a Society of Competitive Intelligence Professionals.

Now Oracle is what they are, aggressive and have a history of doing creative or innovative (e.g. stepping out-of-bounds) in sales and marketing campaigns, benchmarking and other activities. On the other hand has IBM been victimized at the hands of Oracle and thus having to resort to using the BBB and NAD as part of its new sales and marketing tool to counter Oracle?

Does anybody think that the above will cause Oracle to retreat, repent, and tone down how they compete on the field of sales and marketing of servers, storage, database and related IT, ICT, big and little data, clouds?

Anyone else have a visual of a group of IBMers sitting around a table at an exclusive country club enjoying a fine cigar along with glass of cognac toasting each other on their recent success in having the BBB and NAD issue another ruling against Oracle. Meanwhile perhaps at some left coast yacht club, the Oracle crew are high fiving, congratulating each other on their commission checks while spraying champagne all over the place like they just won the Americas cup race?

How about it Oracle, IBM says Im not going to hear anything from you, is that true?

Ok, nuff said (for now).

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

November 3, 2012April 27, 2025

Ben Woo on Big Data Buzzword Bingo and Business Benefits

Now also available via

This is a new episode in the continuing StorageIO industry trends and perspectives pod cast series (you can view more episodes or shows along with other audio and video content here) as well as listening via iTunes or via your preferred means using this RSS feed )

In this episode, In this episode, Im joined in Frankfurt Germany by Ben Woo (@benwoony) of Neuralytix.com. Our conversation includes cloud; big data and how buzzword bingo technology focused discussions can result in missed business benefits for both vendors and customers. We also reminisce about MTI where we worked together along with protecting home storage.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Ben and myself.

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts pod casts and other upcoming events. Also be sure to heck out other related pod casts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode with Ben Woo talking big data and business benefits vs. buzzword bingo.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

October 28, 2012November 26, 2023

Little data, big data and very big data (VBD) or big BS?

This is an industry trends and perspective piece about big data and little data, industry adoption and customer deployment.

If you are in any way associated with information technology (IT), business, scientific, media and entertainment computing or related areas, you may have heard big data mentioned. Big data has been a popular buzzword bingo topic and term for a couple of years now. Big data is being used to describe new and emerging along with existing types of applications and information processing tools and techniques.

I routinely hear from different people or groups trying to define what is or is not big data and all too often those are based on a particular product, technology, service or application focus. Thus it should be no surprise that those trying to police what is or is not big data will often do so based on what their interest, sphere of influence, knowledge or experience and jobs depend on.

Not long ago while out traveling I ran into a person who told me that big data is new data that did not exist just a few years ago. Turns out this person was involved in geology so I was surprised that somebody in that field was not aware of or working with geophysical, mapping, seismic and other legacy or traditional big data. Turns out this person was basing his statements on what he knew, heard, was told about or on sphere of influence around a particular technology, tool or approach.

Fwiw, if you have not figured out already, like cloud, virtualization and other technology enabling tools and techniques, I tend to take a pragmatic approach vs. becoming latched on to a particular bandwagon (for or against) per say.

Not surprisingly there is confusion and debate about what is or is not big data including if it only applies to new vs. existing and old data. As with any new technology, technique or buzzword bingo topic theme, various parties will try to place what is or is not under the definition to align with their needs, goals and preferences. This is the case with big data where you can routinely find proponents of Hadoop and Map reduce position big data as aligning with the capabilities and usage scenarios of those related technologies for business and other forms of analytics.

Not surprisingly the granddaddy of all business analytics, data science and statistic analysis number crunching is the Statistical Analysis Software (SAS) from the SAS Institute. If these types of technology solutions and their peers define what is big data then SAS (not to be confused with Serial Attached SCSI which can be found on the back-end of big data storage solutions) can be considered first generation big data analytics or Big Data 1.0 (BD1 ;) ). That means Hadoop Map Reduce is Big Data 2.0 (BD2 ;) ;) ) if you like, or dislike for that matter.

Funny thing about some fans and proponents or surrogates of BD2 is that they may have heard of BD1 like SAS with a limited understanding of what it is or how it is or can be used. When I worked in IT as a performance and capacity planning analyst focused on servers, storage, network hardware, software and applications I used SAS to crunch various data streams of event, activity and other data from diverse sources. This involved correlating data, running various analytic algorithms on the data to determine response times, availability, usage and other things in support of modeling, forecasting, tuning and trouble shooting. Hmm, sound like first generation big data analytics or Data Center Infrastructure Management (DCIM) and IT Service Management (ITSM) to anybody?

Now to be fair, comparing SAS, SPSS or any number of other BD1 generation tools to Hadoop and Map Reduce or BD2 second generation tools is like comparing apples to oranges, or apples to pears.

Lets move on as there is much more to what is big data than simply focus around SAS or Hadoop.

Another type of big data are the information generated, processed, stored and used by applications that result in large files, data sets or objects. Large file, objects or data sets include low resolution and high-definition photos, videos, audio, security and surveillance, geophysical mapping and seismic exploration among others. Then there are data warehouses where transactional data from databases gets moved to for analysis in systems such as those from Oracle, Teradata, Vertica or FX among others. Some of those other tools even play (or work) in both traditional e.g. BD1 and new or emerging BD2 worlds.

This is where some interesting discussions, debates or disagreements can occur between those who latch onto or want to keep big data associated with being something new and usually focused around their preferred tool or technology. What results from these types of debates or disagreements is a missed opportunity for organizations to realize that they might already be doing or using a form of big data and thus have a familiarity and comfort zone with it.

By having a familiarity or comfort zone vs. seeing big data as something new, different, hype or full of FUD (or BS), an organization can be comfortable with the term big data. Often after taking a step back and looking at big data beyond the hype or fud, the reaction is along the lines of, oh yeah, now we get it, sure, we are already doing something like that so lets take a look at some of the new tools and techniques to see how we can extend what we are doing.

Likewise many organizations are doing big bandwidth already and may not realize it thinking that is only what media and entertainment, government, technical or scientific computing, high performance computing or high productivity computing (HPC) does. I’m assuming that some of the big data and big bandwidth pundits will disagree, however if in your environment you are doing many large backups, archives, content distribution, or copying large amounts of data for different purposes that consume big bandwidth and need big bandwidth solutions.

Yes I know, that’s apples to oranges and perhaps stretching the limits of what is or can be called big bandwidth based on somebody’s definition, taxonomy or preference. Hopefully you get the point that there is diversity across various environments as well as types of data and applications, technologies, tools and techniques.

What about little data then?

I often say that if big data is getting all the marketing dollars to generate industry adoption, then little data is generating all the revenue (and profit or margin) dollars by customer deployment. While tools and technologies related to Hadoop (or Haydoop if you are from HDS) are getting industry adoption attention (e.g. marketing dollars being spent) revenues from customer deployment are growing.

Where big data revenues are strongest for most vendors today are centered around solutions for hosting, storing, managing and protecting big files, big objects. These include scale out NAS solutions for large unstructured data like those from Amplidata, Cray, Dell, Data Direct Networks (DDN), EMC (e.g. Isilon), HP X9000 (IBRIX), IBM SONAS, NetApp, Oracle and Xyratex among others. Then there flexible converged compute storage platforms optimized for analytics and running different software tools such as those from EMC (Greenplum), IBM (Netezza), NetApp (via partnerships) or Oracle among others that can be used for different purposes in addition to supporting Hadoop and Map reduce.

If little data is databases and things not generally lumped into the big data bucket, and if you think or perceive big data only to be Hadoop map reduce based data, then does that mean all the large unstructured non little data is then very big data or VBD?

Of course the virtualization folks might want to if they have not already corner the V for Virtual Big Data. In that case, then instead of Very Big Data, how about very very Big Data (vvBD). How about Ultra-Large Big Data (ULBD), or High-Revenue Big Data (HRBD), granted the HR might cause some to think its unique for Health Records, or Human Resources, both btw leverage different forms of big data regardless of what you see or think big data is.

Does that then mean we should really be calling videos, audio, PACs, seismic, security surveillance video and related data to be VBD? Would this further confuse the market, or the industry or help elevate it to a grander status in terms of size (data file or object capacity, bandwidth, market size and application usage, market revenue and so forth)?

Do we need various industry consortiums, lobbyists or trade groups to go off and create models, taxonomies, standards and dictionaries based on their constituents needs and would they align with those of the customers, after all, there are big dollars flowing around big data industry adoption (marketing).

What does this all mean?

Is Big Data BS?

First let me be clear, big data is not BS, however there is a lot of BS marketing BS by some along with hype and fud adding to the confusion and chaos, perhaps even missed opportunities. Keep in mind that in chaos and confusion there can be opportunity for some.

IMHO big data is real.

There are different variations, use cases and types of products, technologies and services that fall under the big data umbrella. That does not mean everything can or should fall under the big data umbrella as there is also little data.

What this all means is that there are different types of applications for various industries that have big and little data, virtual and very big data from videos, photos, images, audio, documents and more.

Big data is a big buzzword bingo term these days with vendor marketing big dollars being applied so no surprise the buzz, hype, fud and more.

Ok, nuff said, for now.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

XtremIO flash SSD more than storage I/O speed

XtremIO the basics

When would you use XtremIO vs. another storage system?

How does XtremIO compare to others?

Who or what are XtremIO competition?

What’s new with this XtremIO announcement

What does XtremIO look like?

How XtremIO works

What about rumor that XtremIO can only do 4KB IOPs?

What else is EMC doing with XtremIO?

Boosting your available software license minutes (ASLM) with SSD

Can we get a side of context with them metrics?

General comments

Share this:

EMC announces XtremIO flash SSD General Availability

What is being announced?

Who or what applications are XtremIO being positioned for?

What about overlap with other EMC products?

XtremIO Announcement Summary

Some General Comment and Perspectives

Share this:

IBM Server Side Storage I/O SSD Flash Cache Software

What was announced, introducing Flash Cache Storage Acceleration or FCSA

Share this:

DRAM, DIMM, DDR3, nand flash memory, SSD, stating what’s often assumed

Share this:

Share this:

Share this:

Share this:

Share this:

The Human Face of Big Data, a Book Review

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: