Little data, big data and very big data (VBD) or big BS?

StorageIO industry trends cloud, virtualization and big data

This is an industry trends and perspective piece about big data and little data, industry adoption and customer deployment.

If you are in any way associated with information technology (IT), business, scientific, media and entertainment computing or related areas, you may have heard big data mentioned. Big data has been a popular buzzword bingo topic for a couple of years now. The term is used to describe new and emerging, along with existing, types of applications and information processing tools and techniques.

I routinely hear from different people or groups trying to define what is or is not big data, and all too often those definitions are based on a particular product, technology, service or application focus. Thus it should be no surprise that those trying to police what is or is not big data will often do so based on their own interests, sphere of influence, knowledge, experience and what their jobs depend on.

Not long ago while out traveling I ran into a person who told me that big data is new data that did not exist just a few years ago. It turns out this person was involved in geology, so I was surprised that somebody in that field was not aware of or working with geophysical, mapping, seismic and other legacy or traditional big data. He was basing his statements on what he knew, had heard or had been told, and on his sphere of influence around a particular technology, tool or approach.

FWIW, if you have not figured it out already, as with cloud, virtualization and other technology enabling tools and techniques, I tend to take a pragmatic approach vs. becoming latched on to a particular bandwagon (for or against) per se.

Not surprisingly there is confusion and debate about what is or is not big data, including whether it only applies to new vs. existing and old data. As with any new technology, technique or buzzword bingo topic, various parties will try to place what is or is not under the definition to align with their needs, goals and preferences. This is the case with big data, where you can routinely find proponents of Hadoop and MapReduce positioning big data as aligning with the capabilities and usage scenarios of those related technologies for business and other forms of analytics.

SAS software for big data

Not surprisingly the granddaddy of all business analytics, data science and statistical analysis number crunching is the Statistical Analysis System (SAS) software from the SAS Institute. If these types of technology solutions and their peers define what is big data, then SAS (not to be confused with Serial Attached SCSI, which can be found on the back-end of big data storage solutions) can be considered first generation big data analytics or Big Data 1.0 (BD1 ;) ). That means Hadoop MapReduce is Big Data 2.0 (BD2 ;) ;) ) if you like, or dislike for that matter.

Funny thing about some fans and proponents or surrogates of BD2 is that they may have heard of BD1 tools like SAS yet have a limited understanding of what they are or how they are or can be used. When I worked in IT as a performance and capacity planning analyst focused on servers, storage, network hardware, software and applications, I used SAS to crunch various streams of event, activity and other data from diverse sources. This involved correlating data and running various analytic algorithms on it to determine response times, availability, usage and other things in support of modeling, forecasting, tuning and troubleshooting. Hmm, sounds like first generation big data analytics or Data Center Infrastructure Management (DCIM) and IT Service Management (ITSM) to anybody?
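For those who have not done this kind of number crunching, here is a minimal sketch of the idea in Python rather than SAS. It is not the SAS code I used back then; the record layout, source names and values are made up purely for illustration. It groups event records by source, then derives response time and availability figures of the sort used for capacity planning.

```python
from statistics import mean

# Hypothetical event records: (source, response_time_ms, succeeded).
# Names and values are invented for illustration only.
events = [
    ("server-a", 12.0, True),
    ("server-a", 18.0, True),
    ("server-a", 30.0, False),
    ("storage-1", 5.0, True),
    ("storage-1", 7.0, True),
]

def summarize(records):
    """Group records by source, then report response time and availability."""
    by_source = {}
    for source, response_ms, ok in records:
        by_source.setdefault(source, []).append((response_ms, ok))

    summary = {}
    for source, rows in by_source.items():
        times = [t for t, _ in rows]
        summary[source] = {
            "mean_ms": mean(times),
            "max_ms": max(times),
            # Fraction of events that completed successfully.
            "availability": sum(1 for _, ok in rows if ok) / len(rows),
        }
    return summary

report = summarize(events)
for source, stats in sorted(report.items()):
    print(source, stats)
```

In practice the records would come from logs, monitors and other diverse sources, and the interesting work is in correlating them, but the basic shape (collect, group, compute) is the same.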

Now to be fair, comparing SAS, SPSS or any number of other BD1 generation tools to Hadoop and Map Reduce or BD2 second generation tools is like comparing apples to oranges, or apples to pears.

Let's move on, as there is much more to big data than simply a focus on SAS or Hadoop.

Another type of big data is the information generated, processed, stored and used by applications that result in large files, data sets or objects. Large files, objects or data sets include low resolution and high-definition photos, videos, audio, security and surveillance, geophysical mapping and seismic exploration among others. Then there are data warehouses, where transactional data from databases gets moved for analysis in systems such as those from Oracle, Teradata, Vertica or FX among others. Some of those other tools even play (or work) in both traditional (e.g. BD1) and new or emerging BD2 worlds.

This is where some interesting discussions, debates or disagreements can occur between those who latch onto or want to keep big data associated with being something new and usually focused around their preferred tool or technology. What results from these types of debates or disagreements is a missed opportunity for organizations to realize that they might already be doing or using a form of big data and thus have a familiarity and comfort zone with it.

By having a familiarity or comfort zone vs. seeing big data as something new, different, hype or full of FUD (or BS), an organization can be comfortable with the term big data. Often after taking a step back and looking at big data beyond the hype or FUD, the reaction is along the lines of: oh yeah, now we get it, sure, we are already doing something like that, so let's take a look at some of the new tools and techniques to see how we can extend what we are doing.

Likewise many organizations are doing big bandwidth already and may not realize it, thinking that it is only what media and entertainment, government, technical or scientific computing, high performance computing or high productivity computing (HPC) does. I’m assuming that some of the big data and big bandwidth pundits will disagree, however if in your environment you are doing many large backups, archives, content distribution, or copying large amounts of data for different purposes, then you are consuming big bandwidth and need big bandwidth solutions.

Yes I know, that’s apples to oranges and perhaps stretching the limits of what is or can be called big bandwidth based on somebody’s definition, taxonomy or preference. Hopefully you get the point that there is diversity across various environments as well as types of data and applications, technologies, tools and techniques.

What about little data then?

I often say that if big data is getting all the marketing dollars to generate industry adoption, then little data is generating all the revenue (and profit or margin) dollars by customer deployment. While tools and technologies related to Hadoop (or Haydoop if you are from HDS) are getting industry adoption attention (e.g. marketing dollars being spent), revenues from customer deployment are growing.

Big data revenues for most vendors today are strongest around solutions for hosting, storing, managing and protecting big files and big objects. These include scale out NAS solutions for large unstructured data like those from Amplidata, Cray, Dell, Data Direct Networks (DDN), EMC (e.g. Isilon), HP X9000 (IBRIX), IBM SONAS, NetApp, Oracle and Xyratex among others. Then there are flexible converged compute storage platforms optimized for analytics and running different software tools, such as those from EMC (Greenplum), IBM (Netezza), NetApp (via partnerships) or Oracle among others, that can be used for different purposes in addition to supporting Hadoop and MapReduce.

If little data is databases and things not generally lumped into the big data bucket, and if you think or perceive big data only to be Hadoop MapReduce based data, then does that mean all the large unstructured non-little data is very big data or VBD?

Of course the virtualization folks might want to, if they have not already, corner the V for Virtual Big Data. In that case, instead of Very Big Data, how about very very Big Data (vvBD)? How about Ultra-Large Big Data (ULBD), or High-Revenue Big Data (HRBD)? Granted, the HR might cause some to think it's unique to Health Records or Human Resources, both of which btw leverage different forms of big data regardless of what you see or think big data is.

Does that then mean we should really be calling videos, audio, PACS, seismic, security surveillance video and related data VBD? Would this further confuse the market or the industry, or help elevate it to a grander status in terms of size (data file or object capacity, bandwidth, market size and application usage, market revenue and so forth)?

Do we need various industry consortiums, lobbyists or trade groups to go off and create models, taxonomies, standards and dictionaries based on their constituents' needs, and would those align with the needs of customers? After all, there are big dollars flowing around big data industry adoption (marketing).

What does this all mean?

Is Big Data BS?

First let me be clear: big data is not BS, however there is a lot of marketing BS by some, along with hype and FUD, adding to the confusion and chaos and perhaps even to missed opportunities. Keep in mind that in chaos and confusion there can be opportunity for some.

IMHO big data is real.

There are different variations, use cases and types of products, technologies and services that fall under the big data umbrella. That does not mean everything can or should fall under the big data umbrella as there is also little data.

What this all means is that there are different types of applications for various industries that have big and little data, virtual and very big data from videos, photos, images, audio, documents and more.

Big data is a big buzzword bingo term these days, with big vendor marketing dollars being applied, so the buzz, hype, FUD and more should be no surprise.

Ok, nuff said, for now.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Are large storage arrays dead at the hands of SSD?

Storage I/O trends

An industry trends and perspective.

Are large storage arrays dead at the hands of SSD? Short answer: no, not yet.
There is still a place for traditional storage arrays or appliances, particularly those with extensive features, functionality and reliability, availability and serviceability (RAS). In other words, there is still a place for large (and small) storage arrays or appliances, including those with SSDs.

Is there a place for newer flash SSD storage systems, appliances and architectures? Yes
This is similar to how traditional midrange storage arrays or appliances have found their roles vs. traditional higher end so-called enterprise arrays. Think as an example EMC CLARiiON/VNX or HP EVA/P6000 or HDS AMS/HUS or NetApp FAS or IBM DS5000 or IBM V7000 among others vs. EMC Symmetrix/DMX/VMAX or HP P10000/3PAR or HDS VSP/USP or IBM DS8000. In addition to traditional enterprise or high-end storage systems and midrange systems (also known as modular), there are also specialized appliances or targets such as those for backup/restore and archiving. Also do not forget the IO performance SSD appliances like those from TMS among others that have been around for a while.

Is the role of large storage systems changing or evolving? Yes
Given their scale and ability to do large amounts of work in a dense footprint, for some the role of these systems is still mission critical tier 1 application and data support. For other environments, their role continues to evolve being used for high-density tier 2 bulk or even near-line storage for on-line access at scale.

Does this mean there is competition between the old and new systems? Yes
In some circumstances, as we have seen already with SSD solutions. Some will be positioned as competitors or replacements while others as complementing. For example in the PCIe flash SSD card segment, EMC VFCache is positioned as complementing Dell, EMC, HDS, HP, IBM, NetApp, Oracle or other vendors' storage vs. FusionIO, who positions its cards as a replacement for the above and others. Another scenario is how some SSD vendors have positioned and continue to position their all-flash SSD arrays, using either drives or PCIe cards, to complement and coexist with other storage systems in an environment (e.g. data center level tiering) vs. as a replacement. Also keep in mind SSD solutions that support a mix of flash devices and traditional HDDs for capacity and cost savings, or cloud access, in the same solution.

Does this mean that the industry has adopted all SSD appliances as the state of the art?
Avoid confusing industry adoption or talk with industry and customer deployment. They are similar, however one is focused on what the industry talks about or discusses as the state of the art or the future, while the other is what customers are actually doing. Certainly some of the new flash SSD appliance and storage startups such as Solidfire, Nexgen, Violin, Whiptail or veteran TMS among others have promising futures, and some of them may actually be in play with the current SSD market shakeout and consolidation.

Does that mean everybody is going SSD?
SSD customer adoption and deployment continues to grow, however so too does the deployment of high-capacity HDDs.

Do SSDs need HDDs, do HDDs need SSDs? Yes
Granted there are environments where needs can be addressed by all of one or the other. However at least near term, there is a very strong market for tiering and a mix of SSD, some fast HDDs and lots of high-capacity HDDs to meet various needs including performance, availability, capacity, energy and economics. After all, there is no such thing as a data or information recession, yet budgets are tight or being reduced. Likewise, people and data are living longer.
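To put some rough arithmetic behind the tiering point, here is a small Python sketch that compares an all-SSD configuration with a tiered mix. The device capacities, IOPS and prices are hypothetical round numbers chosen only to show the trade-off; they are not vendor figures or measurements, so plug in your own.

```python
# Hypothetical per-device figures, for illustration only (not vendor data).
DEVICES = {
    "ssd":          {"capacity_gb": 400,  "iops": 20000, "cost": 2000},
    "fast_hdd":     {"capacity_gb": 600,  "iops": 200,   "cost": 400},
    "capacity_hdd": {"capacity_gb": 3000, "iops": 80,    "cost": 300},
}

def totals(mix):
    """Sum capacity, IOPS and cost for a mix given as {device_name: count}."""
    result = {"capacity_gb": 0, "iops": 0, "cost": 0}
    for name, count in mix.items():
        for key in result:
            result[key] += DEVICES[name][key] * count
    return result

# Two ways of getting to roughly 40 TB of raw capacity.
all_ssd = totals({"ssd": 100})
tiered = totals({"ssd": 4, "fast_hdd": 8, "capacity_hdd": 12})

for label, t in (("all SSD", all_ssd), ("tiered", tiered)):
    print(f"{label}: {t['capacity_gb']} GB, {t['iops']} IOPS, ${t['cost']}")
```

With these made-up numbers the tiered mix delivers similar raw capacity at a fraction of the cost, while the all-SSD configuration delivers far more IOPS than many environments would use. Which one is the better fit depends on how much of the data is actually active.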

What does this mean?
If there were no such thing as a data recession and budgets were a non-issue, perhaps everything could move to all flash SSD storage systems. However, we also know that people and data are living longer, along with changing data life-cycle patterns. There is also the need for performance to close the traditional data center gap between IO performance and space capacity, and to remove bottlenecks, as well as the need to store and keep data longer.

There will continue to be a need for a mix of high capacity and high performance. More IO will continue to gravitate towards the IO appliances, however more data will settle in for longer-term retention and continued access as data life-cycles continue to evolve. Watch for more SSD and cache in the large systems, along with higher density SAS-NL (SAS Near Line, e.g. high capacity) type drives appearing in those systems.

If you like shiny new toys or technology (SNTs) to buy, sell or talk about, there will be plenty of those to continue industry adoption, while for those who are focused on industry deployment, there will be a mix of new technology and continued evolution for implementation.

Related links
Industry adoption vs. industry deployment, is there a difference?

Industry trend: People plus data are aging and living longer

No Such Thing as an Information Recession

Changing Lifecycles & Data Footprint Reduction
What is the best kind of IO? The one you do not have to do
Is SSD dead? No, however some vendors might be
Speaking of speeding up business with SSD storage
Are Hard Disk Drives (HDD’s) getting too big?
IT and storage economics 101, supply and demand
Has SSD put Hard Disk Drives (HDD’s) On Endangered Species List?
Why SSD based arrays and storage appliances can be a good idea (Part I)
Researchers and marketers don’t agree on future of nand flash SSD
EMC VFCache respinning SSD and intelligent caching (Part I)
SSD options for Virtual (and Physical) Environments Part I: Spinning up to speed on SSD

Ok, nuff said for now

Cheers Gs


HDS buys BlueArc, any surprises here?

Technically here in the northern hemisphere it is still summer, so there is another summer wedding to announce.

The other day Hitachi Data Systems (aka HDS) announced that they finally tied the knot, buying their Network Attached Storage (NAS) partner BlueArc, with whom they have been in an OEM premarital arrangement for the last five years or so (wow, was that a long engagement or what?). With HDS being a subsidiary of Hitachi Ltd., a Japanese company, it should be no surprise that they operate in a cool, calculated, conservative manner with products that have over the past several decades been known for delivering resiliency, functionality, performance and value.

To those in IT and specifically the data storage industry, myself included, the only surprise about HDS buying BlueArc should be what took them so long. With unstructured data, big data, high performance computing, high productivity computing (aka HPC) and big bandwidth needs expanding, it only makes sense that HDS finally ties the knot by formally acquiring BlueArc, signaling what I hope are a few things for their collective future together.

Things that I hope HDS can accomplish with their acquisition of BlueArc include among others:

  • Leverage the BlueArc hardware and performance combined with the HDS software suite to expand further upstream (and downstream) as well as into different adjacent markets, leveraging their success over the long courtship where both parties got to know each other more.
  • Signal to the industry that they are truly committed to a long term NAS product solution strategy. HDS has been doing a good job of sticking with BlueArc for the past five or so years having had several previous NAS partner relationships including with NetApp, NSS and others besides their own internal projects.
  • Expand their focus to lead with NAS pulling storage with it, in addition to using NAS to accessorize (or bling, aka Mr. T starter kit to go with Mr. T storage videos) storage systems. This of course means going more directly toe to toe with the likes of former partner NetApp, EMC, HP (with IBRIX), IBM and Dell among many others. Ironically, former HDS partner NetApp acquired the Engenio storage group from LSI, whose products competed with HDS in some spaces, while BlueArc was an Engenio partner.
  • Continue to develop both the hardware and software feature functionality around the BlueArc products in addition to further integration across the joint product lines for both traditional, as well as clustered, scale out, bulk, big data, big bandwidth and HPC environments.
  • Sharpen their NAS message and solution offerings including providing the support, tools and programs to enable both their joint direct sales forces as well as their partner value added reseller (VAR) and channel networks.

Check out some additional comments and perspectives by Ray Lucchesi (aka twitter @raylucchesi) over on his blog pertaining to HDS buying BlueArc.

Congratulations to both HDS and BlueArc along with best wishes; this is a deal that is good for both. Now, or once the honeymoon is over, let's see how this is executed upon, building on their prior joint success to expand into new market opportunities on a global basis. HDS has the tools and people to move into and leverage these new as well as existing opportunities. Let's see how they can execute on those, hopefully not spending too much time or money on the honeymoon while their competitors are out being busy in some of those same accounts in this last month of an important sales quarter (all quarters are important when it comes to sales).

Disclosure for those interested and FWIW: BlueArc had been a client of StorageIO a few years ago, however is not currently. HDS is not, nor have they been, a client of StorageIO, however in a prior life I was a customer of theirs in addition to being a partner and supplier when I was on the vendor side of the table.

 

Ok, nuff said for now.

Cheers gs

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved