December 2012 StorageIO Update newsletter

Welcome to the December 2012 year-end edition of the StorageIO Update newsletter, which includes a new format and added content.

You can get access to this newsletter via various social media venues (some are shown below) in addition to StorageIO web sites and subscriptions.

Click on the following links to view the December 2012 edition as a brief version (short HTML sent via email), or as the full HTML or PDF versions.

Visit the newsletter page to view previous editions of the StorageIO Update.

You can subscribe to the newsletter by clicking here.

Enjoy this edition of the StorageIO Update newsletter, and let me know your comments and feedback.

Nuff said for now

Cheers
Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Ceph Day Amsterdam 2012 (Object and cloud storage)

StorageIO industry trends cloud, virtualization and big data

Recently while I was in Europe presenting some sessions at conferences and doing some seminars, I was invited by Ed Saipetch (@edsai) of Inktank.com to attend the first Ceph Day in Amsterdam.

Ceph day image

As luck or fate would have it, I was in Nijkerk, which is about an hour's train ride from Amsterdam Central station, and I had a free day in my schedule. After a morning train ride and a nice walk from Amsterdam Central, I arrived at the Tobacco Theatre (a former tobacco trading venue) where Ceph Day was underway, in time for a lunch of a kroketten sandwich.

Attendees at Ceph Day

Let's take a quick step back and address, for those not familiar, what Ceph (short for cephalopod) is and why it was worth spending a day to attend this event. Ceph is an open source, distributed, scale-out object storage software platform (e.g. cluster or grid) running on industry standard hardware.

Dell server supporting Ceph demo
Sketch of Ceph demo configuration

Ceph is used for deploying object storage, cloud storage and managed services, and general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial) workloads, along with backup or data protection and archiving destinations. Other software similar in functionality or capabilities to Ceph includes OpenStack Swift, Basho Riak CS, Cleversafe, Scality and Caringo among others. There are also the tin wrapped software (e.g. appliances or pre-packaged) solutions such as Dell DX (Caringo), DataDirect Networks (DDN) WOS, EMC ATMOS and Centera, Amplidata and HDS HCP among others. From a service standpoint, these solutions can be used to build services similar to Amazon S3 and Glacier, Rackspace Cloud Files and Cloud Block Storage, DreamHost DreamObjects and HP Cloud storage among others.

Ceph cloud and object storage architecture image

At the heart of Ceph is RADOS, a distributed object store that consists of peer nodes functioning as object storage devices (OSDs). Data can be accessed via REST (Amazon S3-like) APIs, libraries, CephFS and gateways, with information being spread across nodes and OSDs using a CRUSH-based algorithm (note that Sage Weil is one of the authors of CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data). Ceph scales in terms of performance, availability and capacity by adding extra nodes with hard disk drives (HDDs) or solid state devices (SSDs). One of the presentations pertained to DreamHost, which was an early adopter of Ceph and uses it to build its DreamObjects (cloud storage) offering.
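To make the idea of algorithmic placement more concrete, here is a minimal sketch of how an object name can be mapped to a set of OSDs by computation rather than by a central lookup table. This is not CRUSH itself: it uses a much simpler technique (rendezvous, or highest-random-weight, hashing) as a stand-in, and it ignores the device weights and failure-domain hierarchy that CRUSH takes into account. The function name, OSD names and object name are all hypothetical.

```python
import hashlib

def place_object(object_name, osds, replicas=3):
    """Pick `replicas` distinct OSDs for an object using rendezvous hashing.

    Simplified stand-in for CRUSH (an assumption for illustration only):
    every client can compute the same placement from the object name and
    the list of OSDs, with no central directory to consult.
    """
    def score(osd):
        # Hash the (object, OSD) pair; the highest scores win.
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest()
        return int(digest, 16)

    ranked = sorted(osds, key=score, reverse=True)
    return ranked[:replicas]

# Hypothetical cluster with 12 OSDs
osds = [f"osd.{i}" for i in range(12)]
print(place_object("photo.jpg", osds))
```

A useful property of this style of placement is that removing an OSD that does not hold a given object leaves that object's placement unchanged, so only the data on the affected device needs to move.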

Ceph cloud and object storage deployment image

In addition to storage nodes, there is also an odd number of monitor nodes to coordinate and manage the Ceph cluster, along with optional gateways for file access. In the above figure (via DreamHost), load balancers sit in front of gateways that interact with the storage nodes. The storage node in this example is a physical server with 12 x 3TB HDDs, each configured as an OSD.
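Why an odd number of monitors? A small sketch, assuming the monitors need a strict majority to agree on cluster state (the usual rule for majority-based consensus): adding a fourth monitor to a group of three does not let the cluster survive any more monitor failures. The function names here are illustrative only.

```python
def quorum_size(monitors):
    """Smallest strict majority of the monitor count."""
    return monitors // 2 + 1

def tolerated_failures(monitors):
    """How many monitors can fail while a majority remains."""
    return monitors - quorum_size(monitors)

for n in range(1, 6):
    print(f"{n} monitors: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

With three monitors or four, one failure is tolerated; it takes five to tolerate two, which is why the even counts add cost without adding resilience.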

Ceph dreamhost dreamobject cloud and object storage configuration image

In the DreamHost example above, there are 90 storage nodes plus 3 management nodes. The total raw storage capacity (no RAID) is about 3PB (12 x 3TB = 36TB per node; 36TB x 90 nodes = 3.24PB). Instead of using RAID or mirroring, each object's data is replicated or copied to three (e.g. N=3) different OSDs (on separate nodes), where N is adjustable for a given level of data protection, for a usable storage capacity of about 1PB (3.24PB / 3 = 1.08PB).

Note that N could be set lower for more usable capacity at the cost of lower availability, while a larger value of N would give more durability or data protection at a higher storage capacity overhead cost. In addition to using JBOD configurations with replication, Ceph can also be configured with a combination of RAID and replication, providing more flexibility for larger environments to balance performance, availability, capacity and economics.
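The capacity arithmetic in the DreamHost example, and the effect of changing N, can be sketched as follows. This is a back-of-the-envelope calculation only: it assumes decimal units (1PB = 1000TB) and ignores file system overhead and the free-space headroom a real cluster would need. The function names are illustrative.

```python
def raw_capacity_tb(nodes, drives_per_node, drive_tb):
    """Total raw capacity in TB across all storage nodes (no RAID)."""
    return nodes * drives_per_node * drive_tb

def usable_capacity_tb(raw_tb, replicas):
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas

# Figures from the DreamHost example: 90 nodes, 12 x 3TB HDDs each
raw = raw_capacity_tb(nodes=90, drives_per_node=12, drive_tb=3)

for n in (2, 3, 4):
    usable = usable_capacity_tb(raw, n)
    print(f"N={n}: raw {raw / 1000:.2f} PB, usable {usable / 1000:.2f} PB")
```

For N=3 this gives the roughly 1PB usable figure from the example; dropping to N=2 raises usable capacity to about 1.6PB, while N=4 lowers it to about 0.8PB.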

Ceph dreamhost and dreamobject cloud and object storage deployment image

One of the benefits of Ceph is the flexibility to configure it how you want or need for different applications. This can be a cost-effective, hardware-light configuration using JBOD or internal HDDs in small form factor, generally available servers, or high density servers and storage enclosures with optional RAID adapters along with SSDs. This flexibility differs from some cloud and object storage systems or software tools that take a stance of avoiding RAID, rather than providing options and the flexibility to configure and use the technology how you see fit.

Here are some links to presentations from Ceph Day:
Introduction and Welcome by Wido den Hollander
Ceph: A Unified Distributed Storage System by Sage Weil
Ceph in the Cloud by Wido den Hollander
DreamObjects: Cloud Object Storage with Ceph by Ross Turk
Cluster Design and Deployment by Greg Farnum
Notes on Librados by Sage Weil

Presentations during ceph day

While at Ceph Day, I was able to spend a few minutes with Sage Weil, Ceph creator and founder of inktank.com, to record a podcast (listen here) about what Ceph is, where and when to use it, along with other related topics. Also while at the event I had a chance to sit down with Curtis (aka Mr. Backup) Preston, where we did a simulcast video and podcast. The simulcast involved Curtis recording this video with me as a guest discussing Ceph, cloud and object storage, backup, data protection and related themes while I recorded this podcast.

One of the interesting things I heard at the Ceph Day event, compared with what I tend to hear at related conferences such as SNW, was a focus on where and how to use, configure and deploy Ceph along with its various configuration options and replication or copy modes. What I did not hear was people going off on erasure codes or other tangents. In other words, instead of focusing on data protection protocols and algorithms, or on what is wrong with the competition or other architectures, Ceph Day focused on removing cloud and object storage objections and on enablement.

Where do you get Ceph? You can get it here, as well as via 42on.com and inktank.com.

Thanks again to Sage Weil for taking time out of his busy schedule to record a podcast talking about Ceph, as well as to 42on.com and Inktank for hosting and for the invitation to attend the first Ceph Day in Amsterdam.

View of downtown Amsterdam on way to train station to return to Nijkerk
Returning to Amsterdam central station after Ceph Day

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

Seven databases in seven weeks, a book review of NoSQL databases

Seven Databases in Seven Weeks (A Guide to Modern Databases and the NoSQL Movement) is a book written by Eric Redmond (@coderoshi) and Jim Wilson (@hexlib), part of The Pragmatic Programmers (@pragprog) series, that takes a look at several non-SQL based database systems.

Cover image of seven databases in seven weeks book image

Coverage includes PostgreSQL, Riak, Apache HBase, MongoDB, Apache CouchDB, Neo4J and Redis with plenty of code and architecture examples. Also covered are relational vs. key value, columnar and document based systems among others.

The details: Seven Databases in Seven Weeks
Paperback: 352 pages
Publisher: Pragmatic Bookshelf (May 18, 2012)
Language: English
ISBN-10: 1934356921
ISBN-13: 978-1934356920
Product Dimensions: 7.5 x 0.8 x 9 inches

Buzzwords (or keywords) include availability, consistency, performance and related themes. Others include MongoDB, Cassandra, Redis, Neo4J, JSON, CouchDB, Hadoop, HBase, Amazon Dynamo, Map Reduce, Riak (Basho) and Postgres along with data models including relational, key value, columnar, document and graph along with big data, little data, cloud and object storage.

While this book is not a how-to tutorial or installation guide, it does give a deep dive into the different databases covered. The benefit is gaining an understanding of what the different databases are good for, their strengths and weaknesses, and where and when to use or choose them for various needs.

Look inside seven databases in seven weeks book image
A look inside my copy of Seven Databases in Seven Weeks

Who should read this book? Application developers, programmers, cloud, big data and IT/ICT architects, planners and designers, along with database, server, virtualization and storage professionals. What I like about the book is that it is a great intro and overview with sufficient depth to understand what these different solutions can and cannot do, and when, where and why to use these tools for different situations, in a quick-read format with plenty of detail.

Would I recommend buying it? Yes. I bought a copy myself on Amazon.com; get your copy by clicking here.

Ok, nuff said

Cheers gs

Data Center Infrastructure Management (DCIM) and IRM

There are many business drivers and technology reasons for adopting data center infrastructure management (DCIM) and infrastructure resource management (IRM) techniques, tools and best practices. Today’s agile data centers need updated management systems, tools, and best practices that allow organizations to plan, run at a low cost, and analyze for workflow improvement. After all, there is no such thing as an information recession, which drives the need to move, process and store more data. With budget and other constraints, organizations need to be able to stretch available resources further while reducing costs, including for physical space and energy consumption.

The business value proposition of DCIM and IRM includes:

DCIM, Data Center, Cloud and storage management figure

Data Center Infrastructure Management (DCIM), also known as IRM, focuses, as the names describe, on managing resources in the data center or information factory. IT resources include physical floor and cabinet space, power and cooling, networks and cabling, physical (and virtual) servers and storage, and other hardware and software management tools. For some organizations, DCIM will have a more facilities oriented view focusing on physical floor space, power and cooling. Other organizations will have a converged view crossing hardware, software and facilities, along with how those are used to effectively deliver information services in a cost-effective way.

Common to all DCIM and IRM practices are metrics and measurements along with other related information of available resources for gaining situational awareness. Situational awareness enables visibility into what resources exist, how they are configured and being used, by what applications, and their performance, availability, capacity and economic effectiveness (PACE) to deliver a given level of service. In other words, DCIM enabled with metrics and measurements that matter allows you to avoid flying blind and to make prompt and effective decisions.

DCIM, Data Center and Cloud Metrics Figure

DCIM comprises the following:

  • Facilities, power (primary and standby, distribution), cooling, floor space
  • Resource planning, management, asset and resource tracking
  • Hardware (servers, storage, networking)
  • Software (virtualization, operating systems, applications, tools)
  • People, processes, policies and best practices for management operations
  • Metrics and measurements for analytics and insight (situational awareness)

The evolving DCIM model is built around elasticity, multi-tenancy, scalability and flexibility, and is metered and service-oriented. Service-oriented means being able to rapidly deliver new services while keeping customer experience and satisfaction in mind. Also part of being focused on the customer is enabling organizations to be competitive with outside service offerings while focusing on being more productive and economically efficient.

DCIM, Data Center and Cloud E2E management figure

While specific technology domain areas or groups may be focused on their respective areas, interdependencies across IT resource areas are a matter of fact for efficient virtual data centers. For example, provisioning a virtual server relies on configuration and security of the virtual environment, physical servers, storage and networks along with associated software and facility related resources.

You can read more about DCIM, ITSM and IRM in this white paper that I did, as well as in my books Cloud and Virtual Data Storage Networking (CRC Press) and The Green and Virtual Data Center (CRC Press).

Ok, nuff said, for now.

Cheers gs

Podcast: vBrownbags, vForums and VMware vTraining with Alastair Cooke

This is a new episode in the continuing StorageIO industry trends and perspectives podcast series. You can view more episodes or shows along with other audio and video content here, and you can listen via iTunes or via your preferred means using this RSS feed (https://storageio.com/StorageIO_Podcast.xml).

In this episode, we go virtual, both with the topic (virtualization) and communicating around the world via Skype. My guest is Alastair Cooke (@DemitasseNZ) who joins me from New Zealand to talk about VMware education, training and social networking. Some of the topics that we cover include vForums, vBrownbags, VMware VCDX certification, VDI, AutoLab, Professional vBrownbag tech talks, coffee and more. If you are into server virtualization or virtual desktop infrastructures (VDI), or need to learn more, Alastair talks about some great resources. Check out Alastair's site www.demitasse.co.nz for more information about the AutoLab, VMware training and education, along with the vBrownbag podcasts that are also available on iTunes as well as the APAC Virtualisation podcasts.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Alastair and myself.

StorageIO podcast

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts, podcasts and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode vBrownbags, vForums and VMware vTraining with Alastair Cooke.

Ok, nuff said.

Cheers gs

SSD past, present and future with Jim Handy

In this episode, I talk with SSD NAND flash and DRAM chip analyst Jim Handy of Objective Analysis at the LSI AIS (Accelerating Innovation Summit) 2012 in San Jose. Our conversation includes SSD past, present and future, market and industry trends, who is doing what, and things to keep an eye and ear open for, along with server, storage and memory convergence.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Jim and myself.

StorageIO podcast

Enjoy this episode SSD Past, Present and Future with Jim Handy.

Ok, nuff said.

Cheers gs

Have SSDs been unsuccessful with storage arrays (with poll)?

Storage I/O Industry Trends and Perspectives

I hear people talking about how Solid State Devices (SSDs) have not been successful with or for vendors of storage arrays, particularly legacy storage systems. Some people have also asserted that large storage arrays are dead at the hands of new purpose-built SSD appliances or storage systems (read more here).

As a reference, legacy storage systems include those from EMC (VMAX and VNX), IBM (DS8000, DCS3700, XIV, and V7000), and NetApp FAS along with those from Dell, Fujitsu, HDS, HP, NEC and Oracle among others.

Granted, EMC has launched new SSD based solutions in addition to buying startup XtremIO (aka Project X), and IBM bought SSD industry veteran TMS. IMHO, neither of those actions by either vendor signals an early retirement for their legacy storage solutions; instead they open up new markets, giving customers more options for addressing data center and IO performance challenges. Keep in mind that the best IO is the one that you do not have to do, with the second best being the one that has the least impact on applications in a cost-effective way.

SSD, IO, memory and storage hierarchy

Sometimes I even hear people citing or using some other person or source to make their assertions sound authoritative. You know the game: according to XYZ, or ABC said blah blah blah blah. Of course if you say or repeat something often enough, or hear it again and again, it can become self-convincing (e.g. industry adoption vs. customer deployments). Likewise, the more degrees of separation that exist between you and the information you get, the more it can change from what it originally was.

So what about it: has SSD not been successful for legacy storage system vendors, and is the only place that SSD has had success with startups or non-array based solutions?

While there have been some storage systems (arrays and appliances) that may not perform up to their claimed capabilities due to various internal architecture or implementation bottlenecks, for the most part the large vendors including EMC, HP, HDS, IBM, NetApp and Oracle have done very well shipping SSD drives in their solutions. Likewise some of the clean sheet new design based startup systems, as well as some of the startups with hybrid solutions combining HDDs and SSDs, have done well while others are still emerging.

Where SSD can be used and options

This could also be an example where myth becomes reality based on industry adoption vs. customer deployment. That is, the myth that it is the startups that are having success vs. the legacy vendors takes hold from an industry adoption conversation standpoint and is thus believed by some.

Another myth is that vendors such as EMC or NetApp have not had success with their arrays and SSD, yet their customer deployments prove otherwise. There is also a myth that only PCIe based SSD can be of value and that drive based SSDs are not worth using, and I have a good idea where that myth comes from.

IMHO it depends; however, it is safe to say from what I have seen directly that there are some vendors of storage arrays, including so-called legacy systems, that have had very good success with SSD. Likewise, I have seen where some startups have done ok with their new clean sheet designs, including EMC (Project X). Oh, at least for now I am not a believer that with the all SSD based Project “X” over at EMC, the venerable VMAX, formerly known as DMX, and its predecessor Symmetrix have finally hit the end of the line. Rather they will be positioned and play to different markets for some time yet.

Over at IBM I don’t think the DS8000 or XIV or V7000 and SVC folks are winding things down now that they bought SSD vendor TMS, who has SSD appliances and PCIe cards. Rest assured there has been success by PCIe flash card vendors both as targets (Fusion-io) and cache or hybrid cache and target systems such as those from Intel, LSI, Micron, and TMS (now IBM) among others. Oh, and if you have not noticed, check out what QLogic, Emulex and some of the other traditional HBA vendors have done with and around SSD caching.

So where does the FUD that storage systems have not had success with SSD come from?

I suspect from those who would rather not see or hear about those who have had success taking away attention from them or their markets. In other words, using Fear, Uncertainty and Doubt (FUD) or some community peer pressure, there is a belief by some that if you hear enough times that something is dead or not of benefit, you will look at the alternatives.

Care to guess what the preferred alternative is for some? If you guessed a PCIe card or SSD based appliance from your favorite startup that would be a fair assumption.

On the other hand, my educated guess (ok, it's much more informed than a guess ;) ) is that if you ask a vendor such as EMC or NetApp they would disagree, while at the same time articulating the benefits of different approaches and tools. Likewise, my educated guess is that if you ask some others, they will say mixed things, and of course if you talk with the pure plays, take a wild yet educated guess what they will say.

Here is my point.

SSD, DRAM, PCM and storage adoption timeline

The SSD market, including DRAM, NAND flash (SLC or MLC or any other xLC), emerging PCM or future MRAM among other technologies and packaging options, is still in its relative infancy. Yes, I know there has been significant industry adoption and many early customer deployments; however, talking with IT organizations of all sizes as well as with vendors and VARs, customer deployment of SSD is far from reaching its full potential, meaning a bright future.

Simply putting an SSD, card or drive into a solution does not guarantee results.

Likewise having a new architecture does not guarantee things will be faster.

Fast storage systems need fast devices (HDDs, HHDDs and SSDs) along with fast interfaces to connect with fast servers. Put a fast HDD, HHDD or SSD into a storage system that has bottlenecks (hardware, software, architectural design) and you may not see the full potential of the technology. Likewise, put fast ports or interfaces on a storage system that has fast devices but also a bottleneck in its controller or system architecture, and you will not realize the full potential of that solution.
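The bottleneck point can be illustrated with a minimal sketch. It assumes a simple model in which data flows through each stage of the storage path in turn, so sustained throughput is capped by the slowest stage. The stage names and MB/s figures are hypothetical, chosen only to show the effect, and are not measurements of any product.

```python
def find_bottleneck(stage_mb_per_s):
    """Return the slowest stage and its throughput.

    Simplified serial-pipeline model (an assumption): the end-to-end
    sustained throughput equals that of the slowest stage.
    """
    name = min(stage_mb_per_s, key=stage_mb_per_s.get)
    return name, stage_mb_per_s[name]

# Hypothetical figures in MB/s for a system with fast SSDs and ports
storage_path = {
    "ssd_devices": 4000,
    "controller": 1200,
    "host_ports": 3200,
    "server_hba": 1600,
}

stage, limit = find_bottleneck(storage_path)
print(f"Bottleneck: {stage}, limiting throughput to about {limit} MB/s")
```

In this made-up example the controller caps the system at a fraction of what the SSDs could deliver, so upgrading the devices or ports alone would not help.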

This is not unique to legacy or traditional storage systems, arrays or appliances as it is also the case with new clean sheet designs.

There are many new solutions that are or should be as fast as their touted marketing stories present; however, just because something looks impressive in a YouTube video or slide deck or WebEx does not mean it will be fast in your environment. Some of these new design SSD based solutions will displace some legacy storage systems or arrays while many others will find new opportunities. Similar to how previous generation SSD storage appliances found roles complementing traditional storage systems, so too will many of this new generation of products.

What this all means is that you need to navigate your way through the various marketing and architecture debates, benchmark battles, claims and counter claims to understand what fits your needs and requirements.

What say you?

Ok, nuff said

Cheers
Gs

Mr. Backup (Curtis Preston) goes back to Ceph School

In this episode, I am at the Ceph Day event in Amsterdam, Holland, at the Tobacco Theatre, hosted by 42on.com and inktank.com.

Ceph Day Amsterdam 2012

My guest for this episode is Curtis (Mr. Backup) Preston (@wcpreston) of Backup School and Backup Central fame. We discuss what Ceph and object storage are, cloud storage, file systems, backup and data protection, along with a dinner we had at an Indonesian restaurant.

Dinner Restaurant Blauw Utrecht Netherlands
Mr Backup getting ready to compress and dedupe dinner

The dinner we are referring to was at Restaurant Blauw in Utrecht, Holland (click here), where Curtis and I were joined by Hans De Leenher (@hansdeleenher) of Veeam (thanks again for the dinner, that was a disclosure btw ;) ).

Note that this is a special episode in that while I’m recording the podcast, Curtis is recording a video of our discussion for his truebit.tv site that you can view here.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Curtis and myself.

StorageIO podcast

Also check out the companion to this podcast where I meet up with Ceph creator Sage Weil while at Ceph Day.

Enjoy this episode Mr. Backup (Curtis Preston) goes back to Ceph School.

Ok, nuff said.

Cheers gs

Ben Woo on Big Data Buzzword Bingo and Business Benefits

In this episode, I'm joined in Frankfurt, Germany by Ben Woo (@benwoony) of Neuralytix.com. Our conversation includes cloud, big data and how buzzword bingo technology focused discussions can result in missed business benefits for both vendors and customers. We also reminisce about MTI, where we worked together, along with protecting home storage.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Ben and myself.

StorageIO podcast

Enjoy this episode with Ben Woo talking big data and business benefits vs. buzzword bingo.

Ok, nuff said.

Cheers gs

Ceph Day in Amsterdam and Sage Weil on Object Storage

In this episode, I am at the Ceph Day event in Amsterdam, Holland, at the Tobacco Theatre. My guest for this episode is Ceph (short for cephalopod) creator Sage Weil, who is also the founder of inktank.com, which provides services and support for the open source based Ceph project.

For those not familiar with Ceph, it is an open source, distributed, scale-out object storage software platform that can be used for deploying cloud and managed services, general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial), along with backup or data protection and archiving destinations.

During our conversation Sage presents an overview of what Ceph is (e.g. Ceph for non Dummies), where and how it can be used, some history of the project and how it fits in with or provides an alternative to other solutions. Sage also talks about the business or commercial considerations for open source based projects, importance of community and having good business mentors and partners as well as staying busy with his young family.

If you are a Ceph fan, gain more insight into Sage along with Ceph Day sponsors Inktank and 42on. On the other hand, if you are new to object storage, open source storage software or cloud storage, listen in to gain perspectives on where technology such as Ceph fits for public, private, hybrid or traditional environments.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Sage and myself.

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts, podcasts and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy this episode, Ceph Day in Amsterdam with Sage Weil.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

Little data, big data and very big data (VBD) or big BS?

This is an industry trends and perspective piece about big data and little data, industry adoption and customer deployment.

If you are in any way associated with information technology (IT), business, scientific, media and entertainment computing or related areas, you may have heard big data mentioned. Big data has been a popular buzzword bingo topic and term for a couple of years now. Big data is being used to describe new and emerging along with existing types of applications and information processing tools and techniques.

I routinely hear from different people or groups trying to define what is or is not big data, and all too often those definitions are based on a particular product, technology, service or application focus. Thus it should be no surprise that those trying to police what is or is not big data will often do so based on their own interests, sphere of influence, knowledge or experience, and what their jobs depend on.

Not long ago while out traveling I ran into a person who told me that big data is new data that did not exist just a few years ago. Turns out this person was involved in geology, so I was surprised that somebody in that field was not aware of or working with geophysical, mapping, seismic and other legacy or traditional big data. Turns out this person was basing his statements on what he knew, had heard or was told about, or on his sphere of influence around a particular technology, tool or approach.

Fwiw, if you have not figured it out already, as with cloud, virtualization and other technology enabling tools and techniques, I tend to take a pragmatic approach vs. becoming latched on to a particular bandwagon (for or against) per se.

Not surprisingly there is confusion and debate about what is or is not big data, including whether it only applies to new vs. existing and old data. As with any new technology, technique or buzzword bingo topic theme, various parties will try to place what is or is not under the definition to align with their needs, goals and preferences. This is the case with big data, where you can routinely find proponents of Hadoop and MapReduce positioning big data as aligning with the capabilities and usage scenarios of those related technologies for business and other forms of analytics.

Not surprisingly the granddaddy of all business analytics, data science and statistical analysis number crunching is the Statistical Analysis System (SAS) from the SAS Institute. If these types of technology solutions and their peers define what is big data, then SAS (not to be confused with Serial Attached SCSI, which can be found on the back-end of big data storage solutions) can be considered first generation big data analytics or Big Data 1.0 (BD1 ;) ). That means Hadoop MapReduce is Big Data 2.0 (BD2 ;) ;) ) if you like, or dislike for that matter.

Funny thing about some fans and proponents or surrogates of BD2 is that they may have heard of BD1 tools like SAS while having a limited understanding of what they are or how they are or can be used. When I worked in IT as a performance and capacity planning analyst focused on servers, storage, network hardware, software and applications, I used SAS to crunch various data streams of event, activity and other data from diverse sources. This involved correlating data and running various analytic algorithms on the data to determine response times, availability, usage and other things in support of modeling, forecasting, tuning and troubleshooting. Hmm, sounds like first generation big data analytics or Data Center Infrastructure Management (DCIM) and IT Service Management (ITSM) to anybody?

Now to be fair, comparing SAS, SPSS or any number of other BD1 generation tools to Hadoop and MapReduce or BD2 second generation tools is like comparing apples to oranges, or apples to pears.

Let's move on, as there is much more to what is big data than simply a focus on SAS or Hadoop.

Another type of big data is the information generated, processed, stored and used by applications that result in large files, data sets or objects. Large files, objects or data sets include low resolution and high-definition photos, videos, audio, security and surveillance, geophysical mapping and seismic exploration among others. Then there are data warehouses, where transactional data from databases gets moved for analysis in systems such as those from Oracle, Teradata, Vertica or FX among others. Some of those other tools even play (or work) in both traditional (e.g. BD1) and new or emerging BD2 worlds.

This is where some interesting discussions, debates or disagreements can occur between those who latch onto or want to keep big data associated with being something new and usually focused around their preferred tool or technology. What results from these types of debates or disagreements is a missed opportunity for organizations to realize that they might already be doing or using a form of big data and thus have a familiarity and comfort zone with it.

By having a familiarity or comfort zone vs. seeing big data as something new, different, hype or full of FUD (or BS), an organization can be comfortable with the term big data. Often after taking a step back and looking at big data beyond the hype or FUD, the reaction is along the lines of: oh yeah, now we get it, sure, we are already doing something like that, so let's take a look at some of the new tools and techniques to see how we can extend what we are doing.

Likewise many organizations are doing big bandwidth already and may not realize it, thinking that is only what media and entertainment, government, technical or scientific computing, high performance computing or high productivity computing (HPC) does. I’m assuming that some of the big data and big bandwidth pundits will disagree, however if in your environment you are doing many large backups, archives, content distribution, or copying large amounts of data for different purposes, then you are consuming big bandwidth and need big bandwidth solutions.

Yes I know, that’s apples to oranges and perhaps stretching the limits of what is or can be called big bandwidth based on somebody’s definition, taxonomy or preference. Hopefully you get the point that there is diversity across various environments as well as types of data and applications, technologies, tools and techniques.

What about little data then?

I often say that if big data is getting all the marketing dollars to generate industry adoption, then little data is generating all the revenue (and profit or margin) dollars by customer deployment. While tools and technologies related to Hadoop (or Haydoop if you are from HDS) are getting industry adoption attention (e.g. marketing dollars being spent), revenues from customer deployment are growing.

Big data revenues for most vendors today are strongest around solutions for hosting, storing, managing and protecting big files and big objects. These include scale out NAS solutions for large unstructured data like those from Amplidata, Cray, Dell, Data Direct Networks (DDN), EMC (e.g. Isilon), HP X9000 (IBRIX), IBM SONAS, NetApp, Oracle and Xyratex among others. Then there are flexible converged compute storage platforms optimized for analytics and running different software tools, such as those from EMC (Greenplum), IBM (Netezza), NetApp (via partnerships) or Oracle among others, that can be used for different purposes in addition to supporting Hadoop and MapReduce.

If little data is databases and things not generally lumped into the big data bucket, and if you think or perceive big data only to be Hadoop MapReduce based data, then does that mean all the large unstructured non little data is then very big data or VBD?

Of course the virtualization folks might want to, if they have not already, corner the V for Virtual Big Data. In that case, instead of Very Big Data, how about very very Big Data (vvBD)? How about Ultra-Large Big Data (ULBD), or High-Revenue Big Data (HRBD)? Granted the HR might cause some to think it's unique to Health Records or Human Resources, both of which btw leverage different forms of big data regardless of what you see or think big data is.

Does that then mean we should really be calling videos, audio, PACS, seismic, security surveillance video and related data VBD? Would this further confuse the market or the industry, or help elevate it to a grander status in terms of size (data file or object capacity, bandwidth, market size and application usage, market revenue and so forth)?

Do we need various industry consortiums, lobbyists or trade groups to go off and create models, taxonomies, standards and dictionaries based on their constituents' needs, and would they align with those of the customers? After all, there are big dollars flowing around big data industry adoption (marketing).

What does this all mean?

Is Big Data BS?

First let me be clear, big data is not BS, however there is a lot of marketing BS by some along with hype and FUD adding to the confusion and chaos, perhaps even missed opportunities. Keep in mind that in chaos and confusion there can be opportunity for some.

IMHO big data is real.

There are different variations, use cases and types of products, technologies and services that fall under the big data umbrella. That does not mean everything can or should fall under the big data umbrella as there is also little data.

What this all means is that there are different types of applications for various industries that have big and little data, virtual and very big data from videos, photos, images, audio, documents and more.

Big data is a big buzzword bingo term these days with big vendor marketing dollars being applied, so the buzz, hype, FUD and more are no surprise.

Ok, nuff said, for now.

Cheers gs

Networking with Bruce Ravid and Bruce Rave

This is the eighth (here is the first, second, third, fourth, fifth, sixth, and seventh) in a series of StorageIO industry trends and perspectives audio blog and podcast discussions from Storage Networking World (SNW) Fall 2012 in Santa Clara, California.

In this episode, my co-host Bruce Rave aka Bruce Ravid of Ravid and Associates (twitter @brucerave) and I recap the recent SNW conference and series of podcasts. Our conversation also covers the importance of networking and career tips (Bruce is an executive recruiter aka career advisory consultant) for those of you who are new and upcoming, as well as those of you who are seasoned veterans, to stand out in a crowd.

Bruce also talks about his internet music radio show called Go Deep on moheak.com along with up and coming bands to keep an eye and ear open for in 2013. Check out Bruce's sites at ravid.com and godeepmusic.net, and listen to his internet radio show that airs weekly on Sunday evenings from 7 to 9PM PT on moheak.com.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Bruce and myself.

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts and podcasts from SNW and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy listening to Networking with Bruce Ravid.

Ok, nuff said.

Cheers gs

Industry trends and perspectives: Ray Lucchesi on Storage and SNW

This is the sixth (here is the first, second, third, fourth and fifth) in a series of StorageIO industry trends and perspectives audio blog and podcast discussions from Storage Networking World (SNW) Fall 2012 in Santa Clara, California.

Given how conversations at conferences tend to occur in the hallways, lobbies and bar areas of venues, what better place to have candid conversations with people from throughout the industry, some you know, some you will get to know better?

In this episode, my co-host Bruce Rave aka Bruce Ravid of Ravid and Associates (twitter @brucerave) and I meet up with Ray Lucchesi (@RayLucchesi) of Silverton Consulting and the Ray on Storage blog in the Santa Clara Hyatt (event venue) lobby bar area. Our conversation covers past and present SNWs and other industry conferences, shows and events, along with social networking, technology, being a soccer dad with teenage kids who are aspiring actors and more.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Ray, Bruce and myself.

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts and podcasts from SNW and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com and StorageIOblog.com.

Enjoy listening to Ray on storage and SNW from the Fall SNW 2012 pod cast.

Ok, nuff said.

Cheers gs

Industry trends and perspectives: Learning with Leo Leger of SNIA

This is the fifth (here is the first, second, third and fourth) in a series of StorageIO industry trends and perspectives audio blog and podcast discussions from Storage Networking World (SNW) Fall 2012 in Santa Clara, California.

In this episode, while I’m on a plane flying home above the clouds, my co-host Bruce Rave aka Bruce Ravid of Ravid and Associates (twitter @brucerave) meets up with SNIA executive director Leo Leger. Some of you may know or know of Leo. For those who do not, he is the person behind the scenes who puts SNW together and coordinates many other SNIA activities and events in conjunction with chair Wayne Adams (listen to Wayne's World here) and other SNIA members and staff.

Click here (right-click to download MP3 file) or on the microphone image to listen to the conversation with Leo and Bruce.

Watch (and listen) for more StorageIO industry trends and perspectives audio blog posts and podcasts from SNW and other upcoming events. Also be sure to check out other related podcasts, videos, posts, tips and industry commentary at StorageIO.com, StorageIOblog.com and StorageIO.tv.

Enjoy listening to Learning with Leo Leger from the Fall SNW 2012 podcast.

Ok, nuff said.

Cheers gs
