January 2017 Server StorageIO Update Newsletter

Volume 17, Issue I

Hello and welcome to first 2017 issue of the Server StorageIO update newsletter.

Now that we are past the holidays, year-end crunch, post new years activity including NFL football playoffs, its time to get back on track for the new year and new things.

There is a lot going on, in and around data infrastructure server, storage, and I/O networking connectivity from a hardware, software, and services perceptive. From consumer to small/medium business (SMB), enterprise to web-scale and cloud-managed service providers, physical to virtual, spanning structured database (aka “little data”) to unstructured big data and very big fast data, a lot is happening today.

Watch for more coverage involving data infrastructures as well as other related topics in future newsletters, at StorageIOblog.com as well as in different venues and events.

In This Issue

  • Commentary in the news
  • Tips and Articles
  • StorageIOblog posts
  • Events and Webinars
  • Industry Activity Trends
  • Resources and Links
  • Connect and Converse With Us
  • About Us
  • Enjoy this edition of the Server StorageIO update newsletter.

    Cheers GS

    Industry Activity Trends

    Recent Industry News and Activity includes:

    Broadcom buying Brocade for $5.5B USD (if you missed last fall)
    Cavium QLogic expands 10GbE connectivity for server and storage I/O
    HPE announces enhancements to flash-ready HPE StoreVirtual 3200
    HPE buying scaleable HCI vendor Simplivity for $650 million USD (Cash)
    LinBit and SUSE providing open source high availability (HA) solutions
    StorageCraft (data protection software) acquires Exablox (object storage)
    Teradata has launched their big data database on Azure

     

    StorageIOblog Posts

    Recent and popular Server StorageIOblog posts include:

    In case you missed it:

  • PCIe Server Storage I/O Network Fundamentals
  • If NVMe is the answer, what are the questions?
  • Fixing the Microsoft Windows 10 1709 post upgrade restart loop
  • Data Infrastructure server storage I/O network Recommended Reading
  • Introducing Windows Subsystem for Linux WSL Overview
  • IT transformation Serverless Life Beyond DevOps with New York Times CTO Nick Rockwell Podcast
  • HPE Announces AMD Powered Gen 10 ProLiant DL385 For Software Defined Workloads
  • AWS Announces New S3 Cloud Storage Security Encryption Features
  • NVM Non Volatile Memory Express NVMe Place
  • Data Protection Fundamental Topics Tools Techniques Technologies Tips
  • View other recent as well as past StorageIOblog posts here

     

    StorageIO Commentary in the news

    Recent Server StorageIO industry trends perspectives commentary in the news.

    Via InfoStor: 10 Top Data Storage Applications
    Via InfoStor: Cloud Storage Concerns, Considerations and Trends
    Via InfoStor: 10 Top Data Storage Applications
    Via InfoStor: SSD Trends, Tips and Topics
    Via HPE: Decision guide: Public cloud versus on-prem storage
    Via InfoStor: Six Ways to Boost Data Storage Performance

    View more Server, Storage and I/O trends and perspectives comments here

     

    StorageIO Tips and Articles

    Recent and past Server StorageIO articles appearing in different venues include:

    Via FutureReadyOEM:  When to implement ultra-dense storage
    Via InfoStor: Cloud Storage Concerns, Considerations and Trends
    Via InfoStor: SSD Trends, Tips and Topics

    Check out these resources techniques, trends and tools. View more tips and articles here

     

    Events and Activities

    Recent and upcoming event activities.

    April 3-7, 2017 – Seminars – Dutch workshop seminar series – Nijkerk Netherlands

    March 15, 2017 – Webinar – SNIA/BrightTalkHyperConverged (HCI) and Storage – 10AM PT

    January 26 2017 – Seminar – Presenting at Wipro SDx Summit London UK

    January 11, 2017 Webinar – Redmond Magazine
    Dell Software – Presenting – Tailor Your Backup Data Repositories to Fit Your Needs

    December 13 VMware webinar – vSAN, HCIBench, vSAN Observer and healthcheck

    December 7, 2016 11AM PT – BrightTalk Webinar: Hyper-Converged Infrastructure

    See more webinars and activities on the Server StorageIO Events page here.

     

    Server StorageIO Industry Resources and Links

    Useful links and pages:
    Microsoft TechNet – Various Microsoft related from Azure to Docker to Windows
    storageio.com/links – Various industry links (over 1,000 with more to be added soon)
    objectstoragecenter.com – Cloud and object storage topics, tips and news items
    OpenStack.org – Various OpenStack related items
    storageio.com/protect – Various data protection items and topics
    thenvmeplace.com – Focus on NVMe trends and technologies
    thessdplace.com – NVM and Solid State Disk topics, tips and techniques
    storageio.com/performance – Various server, storage and I/O benchmark and tools
    VMware Technical Network – Various VMware related items

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    Data Infrastructure Primer Overview (Its Whats Inside The Data Center)

    Data Infrastructure Primer Overview

    Data Infrastructure Primer Overview

    Updated 1/17/2018

    Data Infrastructure Primer Overview looks at the resources that combine to support business, cloud and information technology (IT) among other applications that transform data into information or services. The fundamental role of data infrastructures is to provide a platform environment for applications and data that is resilient, flexible, scalable, agile, efficient as well as cost-effective. Put another way, data infrastructures exist to protect, preserve, process, move, secure and serve data as well as their applications for information services delivery. Technologies that make up data infrastructures include hardware, software, cloud or managed services, servers, storage, I/O and networking along with people, processes, policies along with various tools spanning legacy, software-defined virtual, containers and cloud.

    Various Types and Layers of Infrastructures

    Depending on your role or focus, you may have a different view than somebody else of what is infrastructure, or what an infrastructure is. Generally speaking, people tend to refer to infrastructure as those things that support what they are doing at work, at home, or in other aspects of their lives. For example, the roads and bridges that carry you over rivers or valleys when traveling in a vehicle are referred to as infrastructure.

    Similarly, the system of pipes, valves, meters, lifts, and pumps that bring fresh water to you, and the sewer system that takes away waste water, are called infrastructure. The telecommunications network. This includes both wired and wireless, such as cell phone networks, along with electrical generating and transmission networks are considered infrastructure. Even the airplanes, trains, boats, and buses that transport us locally or globally are considered part of the transportation infrastructure. Anything that is below what you do, or that supports what you do is considered infrastructure.

    Software Defined Data Infrastructure overview

    Figure 1 Business, IT Information, Data and other Infrastructures

    This is also the situation with IT systems and services where, depending on where you sit or use various services, anything below what you do may be considered infrastructure. However, that also causes a context issue in that infrastructure can mean different things. For example in figure 1, the user, customer, client, or consumer who is accessing some service or application may view IT in general as infrastructure, or perhaps as business infrastructure.

    Those who develop, service, and support the business infrastructure and its users or clients may view anything below them as infrastructure, from desktop to database, servers to storage, network to security, data protection to physical facilities. Moving down a layer (lower altitude) in figure 1 is the information infrastructure which, depending on your view, may also include servers, storage, and I/O hardware and software.

    To help make a point, let’s think of the information infrastructure as the collection of databases, key-value stores, repositories, and applications along with development tools that support the business infrastructure. This is where you may find developers who maintain and create real business applications for the business infrastructure. Those in the information infrastructure usually refer to what’s below them as infrastructure. Meanwhile, those lower in the stack shown in figure 1 may refer to what’s above them as the customer, user, or application, even if the real user is up another layer or two.

    Whats inside a data infrastructure
    Context matters in the discussion of infrastructure. So for our of server storage I/O fundamentals, the data infrastructures support the databases and applications developers as well as things above, while existing above the physical facilities infrastructure, leveraging power, cooling, and communication network infrastructures below.

    SDDI and Data Infrastructure building blocks

    Figure 2 Data Infrastructure fundamental building blocks (hardware, software, services).

    Figure 2 shows the fundamental pillars or building blocks for a data infrastructure, including servers for computer processing, I/O networks for connectivity, and storage for storing data. These resources including both hardware and software as well as services and tools. The size of the environment, organization, or application needs will determine how large or small the data infrastructure is or can be.

    For example, at one extreme you can have a single high-performance laptop with a hypervisor running OpenStack; along with various operating systems along with their applications leveraging flash SSD and high-performance wired or wireless networks powering a home lab or test environment. On the other hand, you can have a scenario with tens of thousands (or more) servers, networking devices, and hundreds of petabytes (PBs) of storage (or more).

    In figure 2 the primary data infrastructure components or pillar (server, storage, and I/O) hardware and software resources are packaged and defined to meet various needs. Software-defined storage management includes configuring the server, storage, and I/O hardware and software as well as services for use, implementing data protection and security, provisioning, diagnostics, troubleshooting, performance analysis, and other activities. Server storage and I/O hardware and software can be individual components, prepackaged as bundles or application suites and converged, among other options.

    Figure 3 shows a deeper look into the data infrastructure shown at a high level in figure 2. The lower left of figure 2 shows the common-to-all-environments hardware, software, people, processes, and practices that include tradecraft (experiences, skills, techniques) and “valueware”. Valueware is how you define the hardware and software along with any customization to create a resulting service that adds value to what you are doing or supporting. Also shown in figure 3 are common application and services attributes including performance, availability, capacity, and economics (PACE), which vary with different applications or usage scenarios.

    Data Infrastructure components

    Figure 3 Data Infrastructure server storage I/O hardware and software components.

    Applications are what transform data into information. Figure 4 shows how applications, which are software defined by people and software, consist of algorithms, policies, procedures, and rules that are put into some code to tell the server processor (CPU) what to do.

    SDDI and SDDC server storage I/O

    Figure 4 How data infrastructure resources transform data into information.

    Application programs include data structures (not to be confused with infrastructures) that define what data looks like and how to organize and access it using the “rules of the road” (the algorithms). The program algorithms along with data structures are stored in memory, together with some of the data being worked on (i.e., the active working set). Additional data is stored in some form of extended memory storage devices such as Non-Volatile Memory (NVM) solid-state devices (SSD), hard disk drives (HDD), or tape, among others, either locally or remotely. Also shown in figure 4 are various devices that do input/output (I/O) with the applications and server, including mobile devices as well as other application servers.

    Bringing IT All Together (for now)

    Software Defined Data Infrastructure overview

    Figure 5 Data Infrastructure  fundamentals “big picture”

    A fundamental theme is that servers process data using various applications programs to create information; I/O networks provide connectivity to access servers and storage; storage is where data gets stored, protected, preserved, and served from; and all of this needs to be managed. There are also many technologies involved, including hardware, software, and services as well as various techniques that make up a server, storage, and I/O enabled data infrastructure.

    Server storage I/O and data infrastructure fundamental focus areas include:

    • Organizations: Markets and industry focus, organizational size
    • Applications: What’s using, creating, and resulting in server storage I/O demands
    • Technologies: Tools and hard products (hardware, software, services, packaging)
    • Trade craft: Techniques, skills, best practices, how managed, decision making
    • Management: Configuration, monitoring, reporting, troubleshooting, performance, availability, data protection and security, access, and capacity planning

    Where To Learn More

    View additional Data Infrastructure and related topics via the following links.

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    Whether you realize it or not, you may already be using, rely upon, affiliated with, support or otherwise involved with data infrastructures. Granted what you or others generically refer to as infrastructure or the data center may, in fact, be the data infrastructure. Watch for more discussions and content about as well as related technologies, tools, trends, techniques and tradecraft in future posts as well as other venues, some of which involve legacy, others software-defined, cloud, virtual, container and hybrid.

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    S3motion Buckets Containers Objects AWS S3 Cloud and EMCcode

    Storage I/O trends

    S3motion Buckets Containers Objects AWS S3 Cloud and EMCcode

    It’s springtime in Kentucky and recently I had the opportunity to have a conversation with Kendrick Coleman to talk about S3motion, Buckets, Containers, Objects, AWS S3, Cloud and Object Storage, node.js, EMCcode and open source among other related topics which are available in a podcast here, or video here and available at StorageIO.tv.

    In this Server StorageIO industry trends perspective podcast episode, @EMCcode (Part of EMC) developer advocate Kendrick Coleman (@KendrickColeman) joins me for a conversation. Our conversation spans spring-time in Kentucky (where Kendrick lives) which means Bourbon and horse racing as well as his blog (www.kendrickcoleman.com).

    Btw, in the podcast I refer to Captain Obvious and Kendrick’s beard, for those not familiar with who or what @Captainobvious is that is made reference to, click here to learn more.


    @Kendrickcoleman
    & @Captainobvious

    What about Clouds Object Storage Programming and other technical stuff?

    Of course we also talk some tech including what is EMCcode, EMC Federation, Cloud Foundry, clouds, object storage, buckets, containers, objects, node.js, Docker, Openstack, AWS S3, micro services, and the S3motion tool that Kendrick developed.

    Cloud and Object Storage Access
    Click to view video

    Kendrick explains the motivation behind S3motion along with trends in and around objects (including GET, PUT vs. traditional Read, Write) as well as programming among related topic themes and how context matters.

    S3motion for AWS S3 Google and object storage
    Click to listen to podcast

    I have used S3motion for moving buckets, containers and objects around including between AWS S3, Google Cloud Storage (GCS) and Microsoft Azure as well as to/from local. S3motion is a good tool to have in your server storage I/O tool box for working with cloud and object storage along with others such as Cloudberry, S3fs, Cyberduck, S3 browser among many others.

    You can get S3motion free from git hub here.

    Amazon Web Services AWS

    Where to learn more

    Here are some links to learn more about AWS S3, Cloud and Object Storage along with related topics

    Also available on

    What this all means and wrap-up

    Context matters when it comes to many things particular about objects as they can mean different things. Tools such as S3motion make it easy for moving your buckets or containers along with objects from one cloud storage system, solution or service to another. Also check out EMCcode to see what they are doing on different fronts from supporting new and greenfield development with Cloud Foundry and PaaS to Openstack to bridging current environments to the next generation of platforms. Also check out Kendricks blog site as he has a lot of good technical content as well as some other fun stuff to learn about. Look forward to having Kendrick on as a guest again soon to continue our conversations. In the meantime, check out S3motion to see how it can fit into your server storage I/O tool box.

    Ok, nuff said, for now..

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    March 2015 Server StorageIO Update Newsletter

     

     

    Volume 15, Issue III

    Hello and welcome to this March 2015 Server and StorageIO update newsletter. Here in the northern hemisphere at least by the calendar spring is here, weather wise winter continues to linger in some areas. March also means in the US college university sports tournaments with many focused on their NCAA men’s basketball championship brackets.

    Besides various college championships, March also has a connection to back up and data protection. Thus this months newsletter has a focus on data protection, after all March 31 is World Backup Day which means it should also be World Restore test day!

    Focus on Data Protection

    Data protection including backup/restore, business continuance (BC), disaster recovery (DR), business resiliency (BR) and archiving across physical, virtual and cloud environments.

    Data Protection Fundamentals

    A reminder on the importance of data protection including backup, BC, DR and related technologies is to make sure they are occuring as planned. Also test your copies and remember the 4 3 2 1 rule or guide.

    4 – Versions (different time intervals)
    3 – Copies of critical data (including versions)
    2 – Different media, devices or systems
    1 – Off-site (cloud or elsewhere)

    The above means having at least four (4) different versions from various points in time of your data. Having three (3) copies including various versions protects against one or more copies being corrupt or damaged. Placing those versions and copies on at least two (2) different storage systems, devices or media if something happens.

    While it might be common sense, a bad April Fools recovery joke would be finding out all of your copies were on the same device which is damaged. That might seem obvious however sometimes the obvious needs to be stated. Also make sure that at least one (1) of your copies is off-site either on off-line media (tape, disk, ssd, optical) or cloud.

    Take a few moments and to verify that your data protection strategy is being implemented and practiced as intended. Also test what is being copied including not only restore the data from cloud, disk, ssd or tape, also make sure you can actually read or use the data being protected. This means make sure that your security credentials including access certificates and decryption occur as expected.

    Watch for more news, updates industry trends perspectives commentary, tips, articles and other information at Storageio.com, StorageIOblog.com, various partner venues as well as in future newsletters.

    StorageIOblog posts

    Data Protection Diaries
    Are restores ready for World Backup Day?
    In case you forgot or did not know, World Backup Day is March 31 2015 (@worldbackupday) so now is a good time to be ready. The only challenge that I have with the World Backup Day (view their site here) that has gone on for a few years know is that it is a good way to call out the importance of backing up or protecting data.
    world backup day test your restore

    However it’s also time to put more emphasis and focus on being able to make sure those backups or protection copies actually work.

    By this I mean doing more than making sure that your data can be read from tape, disk, SSD or cloud service actually going a step further and verifying that restored data can actually be used (read, written, etc).

    The problem, issue and challenges are simple, are your applications, systems and data protected as well as can you use those protection copies (e.g. backups, snapshots, replicas or archives) when as well as were needed? Read more here about World Backup Day and what I’m doing as well as various tips to be ready for successful recovery and avoid being an April 1st fool ;).

    Cloud Conversations
    AWS S3 Cross Region Replication
    Amazon Web Services (AWS) announced several enhancements including a new Simple Storage Service (S3) cross-region replication of objects from a bucket (e.g. container) in one region to a bucket in another region.

    AWS also recently enhanced Elastic Block Storage (EBS) increasing maximum performance and size of Provisioned IOPS (SSD) and General Purpose (SSD) volumes. EBS enhancements included ability to store up to 16 TBytes of data in a single volume and do 20,000 input/output operations per second (IOPS). Read more about EBS and other AWS server, storage I/O  enhancements here.
    AWS regions and availability zones (AZ)
    Example of some AWS Regions and AZs

    AWS S3 buckets and objects are stored in a specific region designated by the customer or user (AWS S3, EBS, EC2, Glacier, Regions and Availability Zone primer can be found here). The challenge being addressed by AWS with S3 replication is being able to move data (e.g. objects) stored in AWS buckets in one region to another in a safe, secure, timely, automated, cost-effective way.

    Continue reading more here about AWS S3 bucket and object replication feature along with related material.

    Additional March StorageIOblog posts include:

    Server Storage I/O performance (Image licensed from Shutterstock by StorageIO)

     

     

    View other recent as well as past blog posts here

    In This Issue

    • Industry Trends Perspectives News
    • Commentary in the news
    • Tips and Articles
    • StorageIOblog posts
    • Events and Webinars
    • Recommended Reading List
    • StorageIOblog posts
    • Server StorageIO Lab reports
    • Resources and Links

     

    Industry News and Activity

    Recent Industry news and activity

    EMC sets up cloudfoundry Dojo
    AWS S3, EBS IOPs and other updates
    New backup/data protection vendor Rubrik
    Google adds nearline Cloud Storage
    AWS and Microsoft Cloud Price battle

    View other recent and upcoming events here

    StorageIO Commentary in the news

    StorageIO news (image licensed for use from Shutterstock by StorageIO)
    Recent Server StorageIO commentary and industry trends perspectives about news, activities and announcements.

    Processor: Enterprise Backup Solution Tips
    Processor: Failed & Old Drives
    EnterpriseStorageForum: Disk Buying Guide
    ChannelProNetwork: 2015 Tech and SSD
    Processor: Detect & Avoid Drive Failures

    View more trends comments here

    StorageIO Tips and Articles

    So you have a new storage device or system. How will you test or find its performance? Check out this quick-read tip on storage benchmark and testing fundamentals over at BizTech.

    Keeping with this months theme of data protection including backup/restore, BC, DR, BR and archiving, here are some more tips. These tips span server storage I/O networking hardware, software, cloud, virtual, performance, data protection applications and related themes including:

    • Test your data restores, can you read and actually use the data? Is you data decrypted, proper security certificates applied?
    • Remember to back up or protect your security encryption keys, certificates and application settings!
    • Revisit what format your data is being saved in including how will you be able to use data saved to the cloud. Will you be able to do a restore to a cloud server or do you need to make sure a copy of your backup tools are on your cloud server instances?

    Check out these resources and links on server storage I/O performance and benchmarking tools. View more tips and articles here

    Various Industry Events

    EMCworld – May 4-6 2015

    Interop – April 29 2015 (Las Vegas)

    Presenting Smart Shopping for Your Storage Strategy

    NAB – April 14-15 2015

    SNIA DSI Event – April 7-9

    View other recent and upcoming events here

    Webinars

    December 11, 2014 – BrightTalk
    Server & Storage I/O Performance

    December 10, 2014 – BrightTalk
    Server & Storage I/O Decision Making

    December 9, 2014 – BrightTalk
    Virtual Server and Storage Decision Making

    December 3, 2014 – BrightTalk
    Data Protection Modernization

    Videos and Podcasts

    StorageIO podcasts are also available via and at StorageIO.tv

    From StorageIO Labs

    Research, Reviews and Reports

    Datadynamics StorageX
    Datadynamics StorageX

    More than a data mover migration tool, StorageX is a tool for adding management and automation around unstructured local and distributed NAS (NFS, CIFS, DFS) file data. Read more here.

    View other StorageIO lab review reports here

    Recommended Reading List

    This is a new section being introduced in this edition of the Server StorageIO update mentioning various books, websites, blogs, articles, tips, tools, videos, podcasts along with other things I have found interesting and want to share with you.

      • Introducing s3motion (via EMCcode e.g. opensource) a tool for copying buckets and objects between public, private and hybrid clouds (e.g. AWS S3, GCS, Microsoft Azure and others) as well as object storage systems. This is a great tool which I have added to my server storage I/O cloud, virtual and physical toolbox. If you are not familiar with EMCcode check it out to learn more…
    • Running Hadoop on Ubuntu Linux (Series of tutorials) for those who want to get their hands dirty vs. using one of the All In One (AIO) appliances.
      • Yellow-bricks (Good blog focused on virtualization, VMware and other related themes) by Duncan Epping @duncanyb

    Resources and Links

    Check out these useful links and pages:
    storageio.com/links
    objectstoragecenter.com
    storageioblog.com/data-protection-diaries-main/

    storageperformance.us
    thessdplace.com
    storageio.com/raid
    storageio.com/ssd

    Enjoy this edition of the Server and StorageIO update newsletter and watch for new tips, articles, StorageIO lab report reviews, blog posts, videos and podcasts along with in the news commentary appearing soon.

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Collecting Transaction Per Minute from SQL Server and HammerDB

    Storage I/O trends

    Collecting Transaction Per Minute from SQL Server and HammerDB

    When using benchmark or workload generation tools such as HammerDB I needed a way to capture and log performance activity metrics such as transactions per minute. For example using HammerDB to simulate an application making database requests performing various transactions as part of testing an overall system solution including server and storage I/O activity. This post takes a look at the problem or challenge I was looking to address, as well as creating a solution after spending time searching for one (still searching btw).

    The Problem, Issue, Challenge, Opportunity and Need

    The challenge is to collect application performance such as transactions per minute from a workload using a database. The workload or benchmark tool (in this case HammerDB) is the System Test Initiator (STI) that drives the activity (e.g. database requests) to a System Under Test (SUT). In this example the SUT is a Microsoft SQL Server running on a Windows 2012 R2 server. What I need is to collect and log into a file for later analysis the transaction rate per minute while the STI is generating a particular workload.

    Server Storage I/O performance

    Understanding the challenge and designing a strategy

    If you have ever used benchmark or workload generation tools such as Quest Benchmark Factory (part of the Toad tools collection) you might be spoiled with how it can be used to not only generate the workload, as well as collect, process, present and even store the results for database workloads such as TPC simulations. In this situation, Transaction Processing Council (TPC) like workloads need to be run and metrics on performance collected. Lets leave Benchmark Factory for a future discussion and focus instead on a free tool called HammerDB and more specifically how to collection transactions per minute metrics from Microsoft SQL Server. While the focus is SQL Server, you can easily adapt the approach for MySQL among others, not to mention there are tools such as Sysbench, Aerospike among other tools.

    The following image (created using my Livescribe Echo digital pen) outlines the problem, as well as sketches out a possible solution design. In the following figure, for my solution I’m going to show how to grab every minute for a given amount of time the count of transactions that have occurred. Later in the post processing (you could also do in the SQL Script) I take the new transaction count (which is cumulative) and subtract the earlier interval which yields the transactions per minute (see examples later in this post).

    collect TPM metrics from SQL Server with hammerdb
    The problem and challenge, a way to collect Transactions Per Minute (TPM)

    Finding a solution

    HammerDB displays results via its GUI, and perhaps there is a way or some trick to get it to log results to a file or some other means, however after searching the web, found that it was quicker to come up with solution. That solution was to decide how to collect and report the transactions per minute (or you could do by second or other interval) from Microsoft SQL Server. The solution was to find what performance counters and metrics are available from SQL Server, how to collect those and log them to a file for processing. What this means is a SQL Server script file would need to be created that ran in a loop collecting for a given amount of time at a specified interval. For example once a minute for several hours.

    Taking action

    The following is a script that I came up with that is far from optimal however it gets the job done and is a starting point for adding more capabilities or optimizations.

    In the following example, set loopcount to some number of minutes to collect samples for. Note however that if you are running a workload test for eight (8) hours with a 30 minute ramp-up time, you would want to use a loopcount (e.g. number of minutes to collect for) of 480 + 30 + 10. The extra 10 minutes is to allow for some samples before the ramp and start of workload, as well as to give a pronounced end of test number of samples. Add or subtract however many minutes to collect for as needed, however keep this in mind, better to collect a few extra minutes vs. not have them and wished you did.

    -- Note and disclaimer:
    -- 
    -- Use of this code sample is at your own risk with Server StorageIO and UnlimitedIO LLC
    -- assuming no responsibility for its use or consequences. You are free to use this as is
    -- for non-commercial scenarios with no warranty implied. However feel free to enhance and
    -- share those enhancements with others e.g. pay it forward.
    -- 
    DECLARE @cntr_value bigint;
    DECLARE @loopcount bigint; # how many minutes to take samples for
    
    set @loopcount = 240
    
    SELECT @cntr_value = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'transactions/sec'
     AND object_name = 'MSSQL$DBIO:Databases'
     AND instance_name = 'tpcc' ; print @cntr_value;
     WAITFOR DELAY '00:00:01'
    -- 
    -- Start loop to collect TPM every minute
    -- 
    
    while @loopcount <> 0
    begin
    SELECT @cntr_value = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'transactions/sec'
     AND object_name = 'MSSQL$DBIO:Databases'
     AND instance_name = 'tpcc' ; print @cntr_value;
     WAITFOR DELAY '00:01:00'
     set @loopcount = @loopcount - 1
    end
    -- 
    -- All done with loop, write out the last value
    -- 
    SELECT @cntr_value = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'transactions/sec'
     AND object_name = 'MSSQL$DBIO:Databases'
     AND instance_name = 'tpcc' ; print @cntr_value;
    -- 
    -- End of script
    -- 

    The above example has loopcount set to 240 for a 200 minute test with a 30 minute ramp and 10 extra minutes of samples. I use the a couple of the minutes to make sure that the system test initiator (STI) such as HammerDB is configured and ready to start executing transactions. You could also put this along with your HammerDB items into a script file for further automation, however I will leave that exercise up to you.

    For those of you familiar with SQL and SQL Server you probably already see some things to improve or stylized or simply apply your own preference which is great, go for it. Also note that I’m only selecting a certain variable from the performance counters as there are many others which you can easily discovery with a couple of SQL commands (e.g. select and specify database instance and object name. Also note that the key is accessing the items in sys.dm_os_performance_counters of your SQL Server database instance.

    The results

    The output from the above is a list of cumulative numbers as shown below which you will need to post process (or add a calculation to the above script). Note that part of running the script is specifying an output file which I show later.

    785
    785
    785
    785
    37142
    1259026
    2453479
    3635138
    

    Implementing the solution

    You can setup the above script to run as part of a larger automation shell or batch script, however for simplicity I’m showing it here using Microsoft SQL Server Studio.

    SQL Server script to collect TPM
    Microsoft SQL Server Studio with script to collect Transaction Per Minute (TPM)

    The following image shows how to specify an output file for the results to be logged to when using Microsoft SQL Studio to run the TPM collection script.

    Specify SQL Server tpm output file
    Microsoft SQL Server Studio specify output file

    With the SQL Server script running to collect results, and HammerDB workload running to generate activity, the following shows Quest Spotlight on Windows (SoW) displaying WIndows Server 2012 R2 operating system level performance including CPU, memory, paging and other activity. Note that this example had about the system test initiator (STI) which is HammerDB and the system under test (SUT) that is Microsoft SQL Server on the same server.

    Spotlight on Windows while SQL Server doing tpc
    Quest Spotlight on Windows showing Windows Server performance activity

    Results and post-processing

    As part of post processing simple use your favorite tool or script or what I often do is pull the numbers into Excel spreadsheet, and simply create a new column of numbers that computes and shows the difference between each step (see below). While in Excel then I plot the numbers as needed which can also be done via a shell script and other plotting tools such as R.

    In the following example, the results are imported into Excel (your favorite tool or script) where I then add a column (B) that simple computes the difference between the existing and earlier counter. For example in cell B2 = A2-A1, B3 = A3-A2 and so forth for the rest of the numbers in column A. I then plot the numbers in column B to show the transaction rates over time that can then be used for various things.

    Hammerdb TPM results from SQL Server processed in Excel
    Results processed in Excel and plotted

    Note that in the above results that might seem too good to be true they are, these were cached results to show the tools and data collection process as opposed to the real work being done, at least for now…

    Where to learn more

    Here are some extra links to have a look at:

    How to test your HDD, SSD or all flash array (AFA) storage fundamentals
    Server and Storage I/O Benchmarking 101 for Smarties
    Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I)
    The SSD Place (collection of flash and SSD resources)
    Server and Storage I/O Benchmarking and Performance Resources
    I/O, I/O how well do you know about good or bad server and storage I/Os?

    What this all means and wrap-up

    There are probably many ways to fine tune and optimize the above script, likewise there may even be some existing tool, plug-in, add-on module, or configuration setting that allows HammerDB to log the transaction activity rates to a file vs. simply showing on a screen. However for now, this is a work around that I have found for when needing to collect transaction activity performance data with HammerDB and SQL Server.

    Ok, nuff said, for now…

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Microsoft Diskspd (Part II): Server Storage I/O Benchmark Tools

    Microsoft Diskspd (Part II): Server Storage I/O Benchmark Tools

    server storage I/O trends

    This is part-two of a two-part post pertaining Microsoft Diskspd.that is also part of a broader series focused on server storage I/O benchmarking, performance, capacity planning, tools and related technologies. You can view part-one of this post here, along with companion links here.

    Microsoft Diskspd StorageIO lab test drive

    Server and StorageIO lab

    Talking about tools and technologies is one thing, installing as well as trying them is the next step for gaining experience so how about some quick hands-on time with Microsoft Diskspd (download your copy here).

    The following commands all specify an I/O size of 8Kbytes doing I/O to a 45GByte file called diskspd.dat located on the F: drive. Note that a 45GByte file is on the small size for general performance testing, however it was used for simplicity in this example. Ideally a larger target storage area (file, partition, device) would be used, otoh, if your application uses a small storage device or volume, then tune accordingly.

    In this test, the F: drive is an iSCSI RAID protected volume, however you could use other storage interfaces supported by Windows including other block DAS or SAN (e.g. SATA, SAS, USB, iSCSI, FC, FCoE, etc) as well as NAS. Also common to the following commands is using 16 threads and 32 outstanding I/Os to simulate concurrent activity of many users, or application processing threads.
    server storage I/O performance
    Another common parameter used in the following was -r for random, 7200 seconds (e.g. two hour) test duration time, display latency ( -L ) disable hardware and software cache ( -h), forcing cpu affinity (-a0,1,2,3). Since the test ran on a server with four cores I wanted to see if I could use those for helping to keep the threads and storage busy. What varies in the commands below is the percentage of reads vs. writes, as well as the results output file. Some of the workload below also had the -S option specified to disable OS I/O buffering (to view how buffering helps when enabled or disabled). Depending on the goal, or type of test, validation, or workload being run, I would choose to set some of these parameters differently.

    diskspd -c45g -b8K -t16 -o32 -r -d7200 -h -w0 -L -a0,1,2,3 F:\diskspd.dat >> SIOWS2012R203_Eiscsi_145_noh_write000.txt

    diskspd -c45g -b8K -t16 -o32 -r -d7200 -h -w50 -L -a0,1,2,3 F:\diskspd.dat >> SIOWS2012R203_Eiscsi_145_noh_write050.txt

    diskspd -c45g -b8K -t16 -o32 -r -d7200 -h -w100 -L -a0,1,2,3 F:\diskspd.dat >> SIOWS2012R203_Eiscsi_145_noh_write100.txt

    diskspd -c45g -b8K -t16 -o32 -r -d7200 -h -S -w0 -L -a0,1,2,3 F:\diskspd.dat >> SIOWS2012R203_Eiscsi_145_noSh_test_write000.txt

    diskspd -c45g -b8K -t16 -o32 -r -d7200 -h -S -w50 -L -a0,1,2,3 F:\diskspd.dat >> SIOWS2012R203_Eiscsi_145_noSh_write050.txt

    diskspd -c45g -b8K -t16 -o32 -r -d7200 -h -S -w100 -L -a0,1,2,3 F:\diskspd.dat >> SIOWS2012R203_Eiscsi_145_noSh_write100.txt

    The following is the output from the above workload command.
    Microsoft Diskspd sample output
    Microsoft Diskspd sample output part 2
    Microsoft Diskspd sample output part 3

    Note that as with any benchmark, workload test or simulation your results will vary. In the above the server, storage and I/O system were not tuned as the focus was on working with the tool, determining its capabilities. Thus do not focus on the performance results per say, rather what you can do with Diskspd as a tool to try different things. Btw, fwiw, in the above example in addition to using an iSCSI target, the Windows 2012 R2 server was a guest on a VMware ESXi 5.5 system.

    Where to learn more

    The following are related links to read more about server (cloud, virtual and physical) storage I/O benchmarking tools, technologies and techniques.

    Drew Robb’s benchmarking quick reference guide
    Server storage I/O benchmarking tools, technologies and techniques resource page
    Server and Storage I/O Benchmarking 101 for Smarties.
    Microsoft Diskspd download and Microsoft Diskspd overview (via Technet)
    I/O, I/O how well do you know about good or bad server and storage I/Os?
    Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I and Part II)

    Comments and wrap-up

    What I like about Diskspd (Pros)

    Reporting including CPU usage (you can’t do server and storage I/O without CPU) along with IOP’s (activity), bandwidth (throughout or amount of data being moved), per thread and total results along with optional reporting. While a GUI would be nice particular for beginners, I’m used to setting up scripts for different workloads so having an extensive options for setting up different workloads is welcome. Being associated with a specific OS (e.g. Windows) the CPU affinity and buffer management controls will be handy for some projects.

    Diskspd has the flexibility to use different storage interfaces and types of storage including files or partitions should be taken for granted, however with some tools don’t take things for granted. I like the flexibility to easily specify various IO sizes including large 1MByte, 10MByte, 20MByte, 100MByte and 500MByte to simulate application workloads that do large sequential (or random) activity. I tried some IO sizes (e.g. specified by -b parameter larger than 500MB however, I received various errors including "Could not allocate a buffer bytes for target" which means that Diskspd can do IO sizes smaller than that. While not able to do IO sizes larger than 500MB, this is actually impressive. Several other tools I have used or with have IO size limits down around 10MByte which makes it difficult for creating workloads that do large IOP’s (note this is the IOP size, not the number of IOP’s).

    Oh, something else that should be obvious however will state it, Diskspd is free unlike some industry de-facto standard tools or workload generators that need a fee to get and use.

    Where Diskspd could be improved (Cons)

    For some users a GUI or configuration wizard would make the tool easier to get started with, on the other hand (oth), I tend to use the command capabilities of tools. Would also be nice to specify ranges as part of a single command such as stepping through an IO size range (e.g. 4K, 8K, 16K, 1MB, 10MB) as well as read write percentages along with varying random sequential mixes. Granted this can easily be done by having a series of commands, however I have become spoiled by using other tools such as vdbench.

    Summary

    Server and storage I/O performance toolbox

    Overall I like Diskspd and have added it to my Server Storage I/O workload and benchmark tool-box

    Keep in mind that the best benchmark or workload generation technology tool will be your own application(s) configured to run as close as possible to production activity levels.

    However when that is not possible, the an alternative is to use tools that have the flexibility to be configured as close as possible to your application(s) workload characteristics. This means that the focus should not be as much on the tool, as opposed to how flexible is a tool to work for you, granted the tool needs to be robust.

    Having said that, Microsoft Diskspd is a good and extensible tool for benchmarking, simulation, validation and comparisons, however it will only be as good as the parameters and configuration you set it up to use.

    Check out Microsoft Diskspd and add it to your benchmark and server storage I/O tool-box like I have done.

    Ok, nuff said (for now)

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Docker for Smarties (e.g. non-dummies) from VMworld 2014

    Docker for Smarties (e.g. non-dummies) from VMworld 2014

    In this Industry Trends Perspectives video pod cast episode (On YouTube) I had a chance to visit with Nathan LeClaire of docker.com at the recent VMworld 2014 in San Francisco for a quick overview of docker and containers are about, what you need to know and where to find more information. Check out this StorageIO Industry Trends Perspective episode "Docker for Smarties" aka not for dummies via YouTube by clicking here or on the image below.

    storage i/o video

    StorageIO docker for smarties from VMworld 2014

    For those not familiar with docker.YouTube videos about server and storage I/O

    Server storage I/O docker for non-dummies
    Docker overview

    What to know about docker
    Three things to know about docker

    key points and where to learn more about docker

    Checkout the Docker for non-dummies storage i/o videovideo here.

    What’s your take, is docker in your future or are you already using it?

    Ok, nuff said (for now)

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Chat with Cash Coleman talking ClearDB, cloud database and Johnny Cash

    Podcast with Cash Coleman talking ClearDB, cloud database and Johnny Cash

    audio

    In this episode from the SNIA DSI 2014 event I am joined by Cashton Coleman (@Cash_Coleman).

    cash coleman cleardb

    Introducing Cashton (Cash) Coleman and ClearDB

    Cashton (Cash) is a Software architect, product mason, family bonder, life builder, idea founder along with Founder & CEO of SuccessBricks, Inc., makers of ClearDB. ClearDB is a provider of MySQL database software tools for cloud and physical environments. In our conversation talk about ClearDB, what they do and whom they do it with including deployments in cloud’s as well as onsite. For example if you are using some of the Microsoft Azure cloud services using MySQL, you may already be using this technology. However, there is more to the story and discussion including how Cash got his name, how to speed up databases for little and big data among other topics.

    If you are a database person, you will want to listen to what Cash has to say about boosting performance and getting more value out of your physical hardware or cloud services. On the other hand if you are a storage person, listen in to get some insight and ideas on to address database performance and resiliency. For others who just like to listen to new trends, technology talk, or hear about emerging companies to keep an eye on, you wont want to miss the podcast conversation.

    Topics and themes discussed:

  • Traditional and Cloud Database
  • MySQL and Database as a Service (DaaS)
  • Microsoft Azure and Amazon Web Services (AWS)
  • Little Data, Big Data and Big Data Databases
  • Boosting database performance with less hardware
  • Getting more value out of fast SSD hardware
  • Database performance and resiliency
  • What’s the Johnny Cash and Cloud Connection
  • Check out ClearDB and listen in to the conversation with Cash podcast here.

    Also available via 

    Ok, nuff said.

    Cheers
    Gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Welcome to the Cloud Bulk Object Storage Resources Center

    Updated 8/31/19

    Cloud Bulk Big Data Software Defined Object Storage Resources

    server storage I/O trends Object Storage resources

    Welcome to the Cloud, Big Data, Software Defined, Bulk and Object Storage Resources Center Page objectstoragecenter.com.

    This object storage resources, along with software defined, cloud, bulk, and scale-out storage page is part of the server StorageIOblog microsite collection of resources. Software-defined, Bulk, Cloud and Object Storage exist to support expanding and diverse application data demands.

    Other related resources include:

  • Software Defined, Cloud, Bulk and Object Storage Fundamentals
  • Software Defined Data Infrastructure Essentials book (CRC Press)
  • Cloud, Software Defined, Scale-Out, Object Storage News Trends
  •  Object storage SDDC SDDI
    Via Software Defined Data Infrastructure Essentials (CRC Press 2017)

    Bulk, Cloud, Object Storage Solutions and Services

    There are various types of cloud, bulk, and object storage including public services such as Amazon Web Services (AWS) Simple Storage Service (S3), Backblaze, Google, Microsoft Azure, IBM Softlayer, Rackspace among many others. There are also solutions for hybrid and private deployment from Cisco, Cloudian, CTERA, Cray, DDN, Dell EMC, Elastifile, Fujitsu, Vantera/HDS, HPE, Hedvig, Huawei, IBM, NetApp, Noobaa, OpenIO, OpenStack, Quantum, Rackspace, Rozo, Scality, Spectra, Storpool, StorageCraft, Suse, Swift, Virtuozzo, WekaIO, WD, among many others.

    Bulk Cloud Object storage SDDC SDDI
    Via Software Defined Data Infrastructure Essentials (CRC Press 2017)

    Cloud products and services among others, along with associated data infrastructures including object storage, file systems, repositories and access methods are at the center of bulk, big data, big bandwidth and little data initiatives on a public, private, hybrid and community basis. After all, not everything is the same in cloud, virtual and traditional data centers or information factories from active data to in-active deep digital archiving.

    Object Context Matters

    Before discussing Object Storage lets take a step back and look at some context that can clarify some confusion around the term object. The word object has many different meanings and context, both inside of the IT world as well as outside. Context matters with the term object such as a verb being a thing that can be seen or touched as well as a person or thing of action or feeling directed towards.

    Besides a person, place or physical thing, an object can be a software-defined data structure that describes something. For example, a database record describing somebody’s contact or banking information, or a file descriptor with name, index ID, date and time stamps, permissions and access control lists along with other attributes or metadata. Another example is an object or blob stored in a cloud or object storage system repository, as well as an item in a hypervisor, operating system, container image or other application.

    Besides being a verb, an object can also be a noun such as disapproval or disagreement with something or someone. From an IT context perspective, an object can also refer to a programming method (e.g. object-oriented programming [oop], or Java [among other environments] objects and classes) and systems development in addition to describing entities with data structures.

    In other words, a data structure describes an object that can be a simple variable, constant, complex descriptor of something being processed by a program, as well as a function or unit of work. There are also objects unique or with context to specific environments besides Java or databases, operating systems, hypervisors, file systems, cloud and other things.

    The Need For Bulk, Cloud and Object Storage

    There is no such thing as an information recession with more data being generated, moved, processed, stored, preserved and served, granted there are economic realities. Likewise as a society our dependence on information being available for work or entertainment, from medical healthcare to social media and all points in between continues to increase (check out the Human Face of Big Data).

    In addition, people and data are living longer, as well as getting larger (hence little data, big data and very big data). Cloud products and services along with associated object storage, file systems, repositories and access methods are at the center of big data, big bandwidth and little data initiatives on a public, private, hybrid and community basis. After all, not everything is the same in cloud, virtual and traditional data centers or information factories from active data to in-active deep digital archiving.

    Click here to view (and hear) more content including cloud and object storage fundamentals

    Click here to view software defined, bulk, cloud and object storage trend news

    cloud object storage

    Where to learn more

    The following resources provide additional information about big data, bulk, software defined, cloud and object storage.



    Via InfoStor: Object Storage Is In Your Future
    Via FujiFilm IT Summit: Software Defined Data Infrastructures (SDDI) and Hybrid Clouds
    Via MultiChannel: After ditching cloud business, Verizon inks Virtual Network Services deal with Amazon
    Via MultiChannel: Verizon Digital Media Services now offers integrated Microsoft Azure Storage
    Via StorageIOblog: AWS EFS Elastic File System (Cloud NAS) First Preview Look
    Via InfoStor: Cloud Storage Concerns, Considerations and Trends
    Via InfoStor: Object Storage Is In Your Future
    Via Server StorageIO: April 2015 Newsletter Focus on Cloud and Object storage
    Via StorageIOblog: AWS S3 Cross Region Replication storage enhancements
    Cloud conversations: AWS EBS, Glacier and S3 overview
    AWS (Amazon) storage gateway, first, second and third impressions
    Cloud and Virtual Data Storage Networking (CRC Book)

    View more news, trends and related cloud object storage activity here.

    Videos and podcasts at storageio.tv also available via Applie iTunes.

    Human Face of Big Data
    Human Face of Big Data (Book review)

    Seven Databases in Seven weeks Seven Databases in Seven Weeks (Book review)

    Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

    Software Defined Data Infrastructure Essentials Book SDDC

    What This All Means

    Object and cloud storage are in your future, the questions are when, where, with what and how among others.

    Watch for more content and links to be added here soon to this object storage center page including posts, presentations, pod casts, polls, perspectives along with services and product solutions profiles.

    Ok, nuff said, for now.

    Gs

    Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

    A Pivotal or cloudy moment for EMC and VMware?

    Storage I/O cloud virtual and big data perspectives

    EMC and VMware (who is majority owned by EMC) have announced a new joint initiative called Pivotal (read more here and here) as part of their software defined data center strategies and architecture.

    Image of EMC and VMware Pivotal PaaS cloud

    Is this a pivotal moment for both EMC and VMware signaling that they will be going head to head (via their new initiative based company) with Amazon Web Services (AWS), Microsoft Azure, HP Cloud services, Rackspace and a long list of others?

    Part of the answer to that question would be based on what is meant by going head to head, and which aspects of those services. For Cloud Platform as a Service (PaaS) along with big data analytics related I would say yes. In terms of other Cloud AaaS or SaaS or IaaS probably not as much so at this time.

    On the surface Pivotal appears to at least initially be more of a Platform as a Service (PaaS) play vs. Software as a Service (SaaS) or Application as a Service (AaaS) or Infrastructure as a Service (IaaS) play. Thus it will be interesting to see how Pivotal pivots and evolves into other directions beyond first cloud and big data applications development assistance.

    This will not be the first initiative or company jointly formed with VMware following on the heals of VCE that also includes Cisco and Intel as partners.

    Pivotal will be headed up by Paul Maritz who has been EMC Chief Strategist and formerly CEO of VMware as well as having spent time at Microsoft. EMC will have 69% ownership with VMware having the balance, it is estimated that about $400 Million US dollars will need to be invested.

    The new company or initiative is slated to launch on or about April 1, 2013 (April Fools day) with target 2013 revenues of about $300 Million. Projections are for an annual revenue of around $1 Billion in five years. That revenue will come from the existing assets and business being brought together along with probably some net new business. Doing some quick back of the napkin based math shows an average straight line growth of about 36% over five years.

    VMware intellectual property and assets contributed:
    Cloudfoundry
    Spring source
    Cetas

    EMC intellectual property and assets continued:
    Pivotal labs
    Greenplum big data solutions

    Thus is this a Pivotal move signaling the entry into new areas that could further disrupt and cloud that status of VMware and EMC as technology suppliers?

    Or this clear the clouds a bit to bring clarity to what EMC and VMware are doing along with leveraging various acquisitions?

    By clarity, this in theory should help place both EMC and VMware with their customer, partners and prospects as technology (along with associated services) supplier (what some refer to as arms merchants) vs. competing with those entities.

    Storage I/O cloud virtual and big data perspectives

    IMHO this is pivotal in that it helps to bring clarity for some of the different technologies and business that EMC and VMware has acquired. That clarity will help its own sales teams along with partners avoid creation of revenue prevention teams impacting sales of other solutions.

    Likewise there should be good synergy around the various tools, technology and offerings around big data, little data and application development with pivotal. That synergy is a combination of tools, technologies, development techniques. The combination of the tools and new techniques should enable customers to leverage new technologies in new ways, vs. trying to use and deploy in old ways.

    Btw, anybody notice Mozy or the lack of that mention keeping in mind that technology was brought back into the EMC backup group fold, while still being operated as a service. Also keep in mind that Mozy was bought by EMC and then transferred to VMware a couple of years ago.

    Ok, nuff said (for now).

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

    Ceph Day Amsterdam 2012 (Object and cloud storage)

    StorageIO industry trends cloud, virtualization and big data

    Recently while I was in Europe presenting some sessions at conferences and doing some seminars, I was invited by Ed Saipetch (@edsai) of Inktank.com to attend the first Ceph Day in Amsterdam.

    Ceph day image

    As luck or fate would turn out, I was in Nijkerk which is about an hour train ride from Amsterdam central station plus a free day in my schedule. After a morning train ride and nice walk from Amsterdam Central I arrived at the Tobacco Theatre (a former tobacco trading venue) where Ceph Day was underway, and in time for lunch of Krokettens sandwich.

    Attendees at Ceph Day

    Lets take a quick step back and address for those not familiar what is Ceph (Cephalanthera) and why it was worth spending a day to attend this event. Ceph is an open source distributed object scale out (e.g. cluster or grid) software platform running on industry standard hardware.

    Dell server supporting ceph demoSketch of ceph demo configuration

    Ceph is used for deploying object storage, cloud storage and managed services, general purpose storage for research, commercial, scientific, high performance computing (HPC) or high productivity computing (commercial) along with backup or data protection and archiving destinations. Other software similar in functionality or capabilities to Ceph include OpenStack Swift, Basho Riak CS, Cleversafe, Scality and Caringo among others. There are also the tin wrapped software (e.g. appliances or pre-packaged) solutions such as Dell DX (Caringo), DataDirect Networks (DDN) WOS, EMC ATMOS and Centera, Amplidata and HDS HCP among others. From a service standpoint, these solutions can be used to build services similar Amazon S3 and Glacier, Rackspace Cloud files and Cloud Block, DreamHost DreamObject and HP Cloud storage among others.

    Ceph cloud and object storage architecture image

    At the heart of Ceph is RADOS a distributed object store that consists of peer nodes functioning as object storage devices (OSD). Data can be accessed via REST (Amazon S3 like) APIs, Libraries, CEPHFS and gateway with information being spread across nodes and OSDs using a CRUSH based algorithm (note Sage Weil is one of the authors of CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data). Ceph is scalable in terms of performance, availability and capacity by adding extra nodes with hard disk drives (HDD) or solid state devices (SSDs). One of the presentations pertained to DreamHost that was an early adopter of Ceph to make their DreamObjects (cloud storage) offering.

    Ceph cloud and object storage deployment image

    In addition to storage nodes, there are also an odd number of monitor nodes to coordinate and manage the Ceph cluster along with optional gateways for file access. In the above figure (via DreamHost), load balancers sit in front of gateways that interact with the storage nodes. The storage node in this example is a physical server with 12 x 3TB HDDs each configured as a OSD.

    Ceph dreamhost dreamobject cloud and object storage configuration image

    In the DreamHost example above, there are 90 storage nodes plus 3 management nodes, the total raw storage capacity (no RAID) is about 3PB (12 x 3TB = 36TB x 90 = 3.24PB). Instead of using RAID or mirroring, each objects data is replicated or copied to three (e.g. N=3) different OSDs (on separate nodes), where N is adjustable for a given level of data protection, for a usable storage capacity of about 1PB.

    Note that for more usable capacity and lower availability, N could be set lower, or a larger value of N would give more durability or data protection at higher storage capacity overhead cost. In addition to using JBOD configurations with replication, Ceph can also be configured with a combination of RAID and replication providing more flexibility for larger environments to balance performance, availability, capacity and economics.

    Ceph dreamhost and dreamobject cloud and object storage deployment image

    One of the benefits of Ceph is the flexibility to configure it how you want or need for different applications. This can be in a cost-effective hardware light configuration using JBOD or internal HDDs in small form factor generally available servers, or high density servers and storage enclosures with optional RAID adapters along with SSD. This flexibility is different from some cloud and object storage systems or software tools which take a stance of not using or avoiding RAID vs. providing options and flexibility to configure and use the technology how you see fit.

    Here are some links to presentations from Ceph Day:
    Introduction and Welcome by Wido den Hollander
    Ceph: A Unified Distributed Storage System by Sage Weil
    Ceph in the Cloud by Wido den Hollander
    DreamObjects: Cloud Object Storage with Ceph by Ross Turk
    Cluster Design and Deployment by Greg Farnum
    Notes on Librados by Sage Weil

    Presentations during ceph day

    While at Ceph day, I was able to spend a few minutes with Sage Weil Ceph creator and founder of inktank.com to record a pod cast (listen here) about what Ceph is, where and when to use it, along with other related topics. Also while at the event I had a chance to sit down with Curtis (aka Mr. Backup) Preston where we did a simulcast video and pod cast. The simulcast involved Curtis recording this video with me as a guest discussing Ceph, cloud and object storage, backup, data protection and related themes while I recorded this pod cast.

    One of the interesting things I heard, or actually did not hear while at the Ceph Day event that I tend to hear at related conferences such as SNW is a focus on where and how to use, configure and deploy Ceph along with various configuration options, replication or copy modes as opposed to going off on erasure codes or other tangents. In other words, instead of focusing on the data protection protocol and algorithms, or what is wrong with the competition or other architectures, the Ceph Day focused was removing cloud and object storage objections and enablement.

    Where do you get Ceph? You can get it here, as well as via 42on.com and inktank.com.

    Thanks again to Sage Weil for taking time out of his busy schedule to record a pod cast talking about Ceph, as well 42on.com and inktank for hosting, and the invitation to attend the first Ceph Day in Amsterdam.

    View of downtown Amsterdam on way to train station to return to Nijkerk
    Returning to Amsterdam central station after Ceph Day

    Ok, nuff said.

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

    Seven databases in seven weeks, a book review of NoSQL databases

    StorageIO industry trends cloud, virtualization and big data

    Seven Databases in Seven Weeks (A Guide to Modern Databases and the NoSQL Movement) is a book written Eric Redmond (@coderoshi) and Jim Wilson (@hexlib), part of The Pragmatic Programmers (@pragprog) series that takes a look at several non SQL based database systems.

    Cover image of seven databases in seven weeks book image

    Coverage includes PostgreSQL, Riak, Apache HBase, MongoDB, Apache CouchDB, Neo4J and Redis with plenty of code and architecture examples. Also covered include relational vs. key value, columnar and document based systems among others.

    The details: Seven Databases in Seven Weeks
    Paperback: 352 pages
    Publisher: Pragmatic Bookshelf (May 18, 2012)
    Language: English
    ISBN-10: 1934356921
    ISBN-13: 978-1934356920
    Product Dimensions: 7.5 x 0.8 x 9 inches

    Buzzwords (or keywords) include availability, consistency, performance and related themes. Others include MongoDB, Cassandra, Redis, Neo4J, JSON, CouchDB, Hadoop, HBase, Amazon Dynamo, Map Reduce, Riak (Basho) and Postgres along with data models including relational, key value, columnar, document and graph along with big data, little data, cloud and object storage.

    While this book is not a how to tutorial or installation guide, it does give a deep dive into the different databases covered. The benefit is gaining an understanding of what the different databases are good for, strengths, weakness, where and when to use or choose them for various needs.

    Look inside seven databases in seven weeks book image
    A look inside my copy of Seven Databases in Seven Days

    Who should this book includes applications developers, programmers, Cloud, big data and IT/ICT architects, planners and designers along with database, server, virtualization and storage professionals. What I like about the book is that it is a great intro and overview along with sufficient depth to understand what these different solutions can and cannot do, when, where and why to use these tools for different situations in a quick read format and plenty of detail.

    Would I recommend buying it: Yes, I bought a copy myself on Amazon.com, get your copy by clicking here.

    Ok, nuff said

    Cheers gs

    Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

    twitter @storageio

    All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved