Application Data Value Characteristics Everything Is Not The Same (Part I)

Application Data Value Characteristics Everything Is Not The Same

Application Data Value Characteristics Everything Is Not The Same

Application Data Value Characteristics Everything Is Not The Same

This is part one of a five-part mini-series looking at Application Data Value Characteristics Everything Is Not The Same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017). available at Amazon.com and other global venues. In this post, we start things off by looking at general application server storage I/O characteristics that have an impact on data value as well as access.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Everything is not the same across different organizations including Information Technology (IT) data centers, data infrastructures along with the applications as well as data they support. For example, there is so-called big data that can be many small files, objects, blobs or data and bit streams representing telemetry, click stream analytics, logs among other information.

Keep in mind that applications impact how data is accessed, used, processed, moved and stored. What this means is that a focus on data value, access patterns, along with other related topics need to also consider application performance, availability, capacity, economic (PACE) attributes.

If everything is not the same, why is so much data along with many applications treated the same from a PACE perspective?

Data Infrastructure resources including servers, storage, networks might be cheap or inexpensive, however, there is a cost to managing them along with data.

Managing includes data protection (backup, restore, BC, DR, HA, security) along with other activities. Likewise, there is a cost to the software along with cloud services among others. By understanding how applications use and interact with data, smarter, more informed data management decisions can be made.

IT Applications and Data Infrastructure Layers
IT Applications and Data Infrastructure Layers

Keep in mind that everything is not the same across various organizations, data centers, data infrastructures, data and the applications that use them. Also keep in mind that programs (e.g. applications) = algorithms (code) + data structures (how data defined and organized, structured or unstructured).

There are traditional applications, along with those tied to Internet of Things (IoT), Artificial Intelligence (AI) and Machine Learning (ML), Big Data and other analytics including real-time click stream, media and entertainment, security and surveillance, log and telemetry processing among many others.

What this means is that there are many different application with various character attributes along with resource (server compute, I/O network and memory, storage requirements) along with service requirements.

Common Applications Characteristics

Different applications will have various attributes, in general, as well as how they are used, for example, database transaction activity vs. reporting or analytics, logs and journals vs. redo logs, indices, tables, indices, import/export, scratch and temp space. Performance, availability, capacity, and economics (PACE) describes the applications and data characters and needs shown in the following figure.

Application and data PACE attributes
Application PACE attributes (via Software Defined Data Infrastructure Essentials)

All applications have PACE attributes, however:

  • PACE attributes vary by application and usage
  • Some applications and their data are more active than others
  • PACE characteristics may vary within different parts of an application

Think of applications along with associated data PACE as its personality or how it behaves, what it does, how it does it, and when, along with value, benefit, or cost as well as quality-of-service (QoS) attributes.

Understanding applications in different environments, including data values and associated PACE attributes, is essential for making informed server, storage, I/O decisions and data infrastructure decisions. Data infrastructures decisions range from configuration to acquisitions or upgrades, when, where, why, and how to protect, and how to optimize performance including capacity planning, reporting, and troubleshooting, not to mention addressing budget concerns.

Primary PACE attributes for active and inactive applications and data are:

P – Performance and activity (how things get used)
A – Availability and durability (resiliency and data protection)
C – Capacity and space (what things use or occupy)
E – Economics and Energy (people, budgets, and other barriers)

Some applications need more performance (server computer, or storage and network I/O), while others need space capacity (storage, memory, network, or I/O connectivity). Likewise, some applications have different availability needs (data protection, durability, security, resiliency, backup, business continuity, disaster recovery) that determine the tools, technologies, and techniques to use.

Budgets are also nearly always a concern, which for some applications means enabling more performance per cost while others are focused on maximizing space capacity and protection level per cost. PACE attributes also define or influence policies for QoS (performance, availability, capacity), as well as thresholds, limits, quotas, retention, and disposition, among others.

Performance and Activity (How Resources Get Used)

Some applications or components that comprise a larger solution will have more performance demands than others. Likewise, the performance characteristics of applications along with their associated data will also vary. Performance applies to the server, storage, and I/O networking hardware along with associated software and applications.

For servers, performance is focused on how much CPU or processor time is used, along with memory and I/O operations. I/O operations to create, read, update, or delete (CRUD) data include activity rate (frequency or data velocity) of I/O operations (IOPS). Other considerations include the volume or amount of data being moved (bandwidth, throughput, transfer), response time or latency, along with queue depths.

Activity is the amount of work to do or being done in a given amount of time (seconds, minutes, hours, days, weeks), which can be transactions, rates, IOPs. Additional performance considerations include latency, bandwidth, throughput, response time, queues, reads or writes, gets or puts, updates, lists, directories, searches, pages views, files opened, videos viewed, or downloads.
 
Server, storage, and I/O network performance include:

  • Processor CPU usage time and queues (user and system overhead)
  • Memory usage effectiveness including page and swap
  • I/O activity including between servers and storage
  • Errors, retransmission, retries, and rebuilds

the following figure shows a generic performance example of data being accessed (mixed reads, writes, random, sequential, big, small, low and high-latency) on a local and a remote basis. The example shows how for a given time interval (see lower right), applications are accessing and working with data via different data streams in the larger image left center. Also shown are queues and I/O handling along with end-to-end (E2E) response time.

fundamental server storage I/O
Server I/O performance fundamentals (via Software Defined Data Infrastructure Essentials)

Click here to view a larger version of the above figure.

Also shown on the left in the above figure is an example of E2E response time from the application through the various data infrastructure layers, as well as, lower center, the response time from the server to the memory or storage devices.

Various queues are shown in the middle of the above figure which are indicators of how much work is occurring, if the processing is keeping up with the work or causing backlogs. Context is needed for queues, as they exist in the server, I/O networking devices, and software drivers, as well as in storage among other locations.

Some basic server, storage, I/O metrics that matter include:

  • Queue depth of I/Os waiting to be processed and concurrency
  • CPU and memory usage to process I/Os
  • I/O size, or how much data can be moved in a given operation
  • I/O activity rate or IOPs = amount of data moved/I/O size per unit of time
  • Bandwidth = data moved per unit of time = I/O size × I/O rate
  • Latency usually increases with larger I/O sizes, decreases with smaller requests
  • I/O rates usually increase with smaller I/O sizes and vice versa
  • Bandwidth increases with larger I/O sizes and vice versa
  • Sequential stream access data may have better performance than some random access data
  • Not all data is conducive to being sequential stream, or random
  • Lower response time is better, higher activity rates and bandwidth are better

Queues with high latency and small I/O size or small I/O rates could indicate a performance bottleneck. Queues with low latency and high I/O rates with good bandwidth or data being moved could be a good thing. An important note is to look at several metrics, not just IOPs or activity, or bandwidth, queues, or response time. Also, keep in mind that metrics that matter for your environment may be different from those for somebody else.

Something to keep in perspective is that there can be a large amount of data with low performance, or a small amount of data with high-performance, not to mention many other variations. The important concept is that as space capacity scales, that does not mean performance also improves or vice versa, after all, everything is not the same.

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Keep in mind that with Application Data Value Characteristics Everything Is Not The Same across various organizations, data centers, data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. However all applications have some element (high or low) of performance, availability, capacity, economic (PACE) along with various similarities. Likewise data has different value at various times. Continue reading the next post (Part II Application Data Availability Everything Is Not The Same) in this five-part mini-series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Availability 4 3 2 1 Data Protection

Application Data Availability 4 3 2 1 Data Protection

4 3 2 1 data protection Application Data Availability Everything Is Not The Same

Application Data Availability 4 3 2 1 Data Protection

This is part two of a five-part mini-series looking at Application Data Value Characteristics everything is not the same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017). available at Amazon.com and other global venues. In this post, we continue looking at application performance, availability, capacity, economic (PACE) attributes that have an impact on data value as well as availability.

4 3 2 1 data protection  Book SDDC

Availability (Accessibility, Durability, Consistency)

Just as there are many different aspects and focus areas for performance, there are also several facets to availability. Note that applications performance requires availability and availability relies on some level of performance.

Availability is a broad and encompassing area that includes data protection to protect, preserve, and serve (backup/restore, archive, BC, BR, DR, HA) data and applications. There are logical and physical aspects of availability including data protection as well as security including key management (manage your keys or authentication and certificates) and permissions, among other things.

Availability = accessibility (can you get to your application and data) + durability (is the data intact and consistent). This includes basic Reliability, Availability, Serviceability (RAS), as well as high availability, accessibility, and durability. “Durable” has multiple meanings, so context is important. Durable means how data infrastructure resources hold up to, survive, and tolerate wear and tear from use (i.e., endurance), for example, Flash SSD or mechanical devices such as Hard Disk Drives (HDDs). Another context for durable refers to data, meaning how many copies in various places.

Server, storage, and I/O network availability topics include:

  • Resiliency and self-healing to tolerate failure or disruption
  • Hardware, software, and services configured for resiliency
  • Accessibility to reach or be reached for handling work
  • Durability and consistency of data to be available for access
  • Protection of data, applications, and assets including security

Additional server I/O and data infrastructure along with storage topics include:

  • Backup/restore, replication, snapshots, sync, and copies
  • Basic Reliability, Availability, Serviceability, HA, fail over, BC, BR, and DR
  • Alternative paths, redundant components, and associated software
  • Applications that are fault-tolerant, resilient, and self-healing
  • Non disruptive upgrades, code (application or software) loads, and activation
  • Immediate data consistency and integrity vs. eventual consistency
  • Virus, malware, and other data corruption or loss prevention

From a data protection standpoint, the fundamental rule or guideline is 4 3 2 1, which means having at least four copies consisting of at least three versions (different points in time), at least two of which are on different systems or storage devices and at least one of those is off-site (on-line, off-line, cloud, or other). There are many variations of the 4 3 2 1 rule shown in the following figure along with approaches on how to manage technology to use. We will go into deeper this subject in later chapters. For now, remember the following.

large version application server storage I/O
4 3 2 1 data protection (via Software Defined Data Infrastructure Essentials)

4    At least four copies of data (or more), Enables durability in case a copy goes bad, deleted, corrupted, failed device, or site.
3    The number (or more) versions of the data to retain, Enables various recovery points in time to restore, resume, restart from.
2    Data located on two or more systems (devices or media/mediums), Enables protection against device, system, server, file system, or other fault/failure.

1    With at least one of those copies being off-premise and not live (isolated from active primary copy), Enables resiliency across sites, as well as space, time, distance gap for protection.

Capacity and Space (What Gets Consumed and Occupied)

In addition to being available and accessible in a timely manner (performance), data (and applications) occupy space. That space is memory in servers, as well as using available consumable processor CPU time along with I/O (performance) including over networks.

Data and applications also consume storage space where they are stored. In addition to basic data space, there is also space consumed for metadata as well as protection copies (and overhead), application settings, logs, and other items. Another aspect of capacity includes network IP ports and addresses, software licenses, server, storage, and network bandwidth or service time.

Server, storage, and I/O network capacity topics include:

  • Consumable time-expiring resources (processor time, I/O, network bandwidth)
  • Network IP and other addresses
  • Physical resources of servers, storage, and I/O networking devices
  • Software licenses based on consumption or number of users
  • Primary and protection copies of data and applications
  • Active and standby data infrastructure resources and sites
  • Data footprint reduction (DFR) tools and techniques for space optimization
  • Policies, quotas, thresholds, limits, and capacity QoS
  • Application and database optimization

DFR includes various techniques, technologies, and tools to reduce the impact or overhead of protecting, preserving, and serving more data for longer periods of time. There are many different approaches to implementing a DFR strategy, since there are various applications and data.

Common DFR techniques and technologies include archiving, backup modernization, copy data management (CDM), clean up, compress, and consolidate, data management, deletion and dedupe, storage tiering, RAID (including parity-based, erasure codes , local reconstruction codes [LRC] , and Reed-Solomon , Ceph Shingled Erasure Code (SHEC ), among others), along with protection configurations along with thin-provisioning, among others.

DFR can be implemented in various complementary locations from row-level compression in database or email to normalized databases, to file systems, operating systems, appliances, and storage systems using various techniques.

Also, keep in mind that not all data is the same; some is sparse, some is dense, some can be compressed or deduped while others cannot. Likewise, some data may not be compressible or dedupable. However, identical copies can be identified with links created to a common copy.

Economics (People, Budgets, Energy and other Constraints)

If one thing in life and technology that is constant is change, then the other constant is concern about economics or costs. There is a cost to enable and maintain a data infrastructure on premise or in the cloud, which exists to protect, preserve, and serve data and information applications.

However, there should also be a benefit to having the data infrastructure to house data and support applications that provide information to users of the services. A common economic focus is what something costs, either as up-front capital expenditure (CapEx) or as an operating expenditure (OpEx) expense, along with recurring fees.

In general, economic considerations include:

  • Budgets (CapEx and OpEx), both up front and in recurring fees
  • Whether you buy, lease, rent, subscribe, or use free and open sources
  • People time needed to integrate and support even free open-source software
  • Costs including hardware, software, services, power, cooling, facilities, tools
  • People time includes base salary, benefits, training and education

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Keep in mind that with Application Data Value Characteristics Everything Is Not The Same across various organizations, data centers, data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. All applications have some element of performance, availability, capacity, economic (PACE) needs as well as resource demands. There is often a focus around data storage about storage efficiency and utilization which is where data footprint reduction (DFR) techniques, tools, trends and as well as technologies address capacity requirements. However with data storage there is also an expanding focus around storage effectiveness also known as productivity tied to performance, along with availability including 4 3 2 1 data protection. Continue reading the next post (Part III Application Data Characteristics Types Everything Is Not The Same) in this series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Characteristics Types Everything Is Not The Same

Application Data Characteristics Types Everything Is Not The Same

Application Data Characteristics Types Everything Is Not The Same

Application Data Characteristics Types Everything Is Not The Same

This is part three of a five-part mini-series looking at Application Data Value Characteristics everything is not the same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017). available at Amazon.com and other global venues. In this post, we continue looking at application and data characteristics with a focus on different types of data. There is more to data than simply being big data, fast data, big fast or unstructured, structured or semistructured, some of which has been touched on in this series, with more to follow. Note that there is also data in terms of the programs, applications, code, rules, policies as well as configuration settings, metadata along with other items stored.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Various Types of Data

Data types along with characteristics include big data, little data, fast data, and old as well as new data with a different value, life-cycle, volume and velocity. There are data in files and objects that are big representing images, figures, text, binary, structured or unstructured that are software defined by the applications that create, modify and use them.

There are many different types of data and applications to meet various business, organization, or functional needs. Keep in mind that applications are based on programs which consist of algorithms and data structures that define the data, how to use it, as well as how and when to store it. Those data structures define data that will get transformed into information by programs while also being stored in memory and on data stored in various formats.

Just as various applications have different algorithms, they also have different types of data. Even though everything is not the same in all environments, or even how the same applications get used across various organizations, there are some similarities. Even though there are different types of applications and data, there are also some similarities and general characteristics. Keep in mind that information is the result of programs (applications and their algorithms) that process data into something useful or of value.

Data typically has a basic life cycle of:

  • Creation and some activity, including being protected
  • Dormant, followed by either continued activity or going inactive
  • Disposition (delete or remove)

In general, data can be

  • Temporary, ephemeral or transient
  • Dynamic or changing (“hot data”)
  • Active static on-line, near-line, or off-line (“warm-data”)
  • In-active static on-line or off-line (“cold data”)

Data is organized

  • Structured
  • Semi-structured
  • Unstructured

General data characteristics include:

  • Value = From no value to unknown to some or high value
  • Volume = Amount of data, files, objects of a given size
  • Variety = Various types of data (small, big, fast, structured, unstructured)
  • Velocity = Data streams, flows, rates, load, process, access, active or static

The following figure shows how different data has various values over time. Data that has no value today or in the future can be deleted, while data with unknown value can be retained.

Different data with various values over time

Application Data Value across sddc
Data Value Known, Unknown and No Value

General characteristics include the value of the data which in turn determines its performance, availability, capacity, and economic considerations. Also, data can be ephemeral (temporary) or kept for longer periods of time on persistent, non-volatile storage (you do not lose the data when power is turned off). Examples of temporary scratch include work and scratch areas such as where data gets imported into, or exported out of, an application or database.

Data can also be little, big, or big and fast, terms which describe in part the size as well as volume along with the speed or velocity of being created, accessed, and processed. The importance of understanding characteristics of data and how their associated applications use them is to enable effective decision-making about performance, availability, capacity, and economics of data infrastructure resources.

Data Value

There is more to data storage than how much space capacity per cost.

All data has one of three basic values:

  • No value = ephemeral/temp/scratch = Why keep it?
  • Some value = current or emerging future value, which can be low or high = Keep
  • Unknown value = protect until value is unlocked, or no remaining value

In addition to the above basic three, data with some value can also be further subdivided into little value, some value, or high value. Of course, you can keep subdividing into as many more or different categories as needed, after all, everything is not always the same across environments.

Besides data having some value, that value can also change by increasing or decreasing in value over time or even going from unknown to a known value, known to unknown, or to no value. Data with no value can be discarded, if in doubt, make and keep a copy of that data somewhere safe until its value (or lack of value) is fully known and understood.

The importance of understanding the value of data is to enable effective decision-making on where and how to protect, preserve, and cost-effectively store the data. Note that cost-effective does not necessarily mean the cheapest or lowest-cost approach, rather it means the way that aligns with the value and importance of the data at a given point in time.

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software-defined data center (SDDC), software-defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Data has different value at various times, and that value is also evolving. Everything Is Not The Same across various organizations, data centers, data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. Continue reading the next post (Part IV Application Data Volume Velocity Variety Everything Not The Same) in this series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Volume Velocity Variety Everything Is Not The Same

Application Data Volume Velocity Variety Everything Not The Same

Application Data Volume Velocity Variety Everything Not The Same

This is part four of a five-part mini-series looking at Application Data Value Characteristics everything is not the same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017). available at Amazon.com and other global venues. In this post, we continue looking at application and data characteristics with a focus on data volume velocity and variety, after all, everything is not the same, not to mention many different aspects of big data as well as little data.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Volume of Data

More data is growing at a faster rate every day, and that data is being retained for longer periods. Some data being retained has known value, while a growing amount of data has an unknown value. Data is generated or created from many sources, including mobile devices, social networks, web-connected systems or machines, and sensors including IoT and IoD. Besides where data is created from, there are also many consumers of data (applications) that range from legacy to mobile, cloud, IoT among others.

Unknown-value data may eventually have value in the future when somebody realizes that he can do something with it, or a technology tool or application becomes available to transform the data with unknown value into valuable information.

Some data gets retained in its native or raw form, while other data get processed by application program algorithms into summary data, or is curated and aggregated with other data to be transformed into new useful data. The figure below shows, from left to right and front to back, more data being created, and that data also getting larger over time. For example, on the left are two data items, objects, files, or blocks representing some information.

In the center of the following figure are more columns and rows of data, with each of those data items also becoming larger. Moving farther to the right, there are yet more data items stacked up higher, as well as across and farther back, with those items also being larger. The following figure can represent blocks of storage, files in a file system, rows, and columns in a database or key-value repository, or objects in a cloud or object storage system.

Application Data Value sddc
Increasing data velocity and volume, more data and data getting larger

In addition to more data being created, some of that data is relatively small in terms of the records or data structure entities being stored. However, there can be a large quantity of those smaller data items. In addition to the amount of data, as well as the size of the data, protection or overhead copies of data are also kept.

Another dimension is that data is also getting larger where the data structures describing a piece of data for an application have increased in size. For example, a still photograph was taken with a digital camera, cell phone, or another mobile handheld device, drone, or other IoT device, increases in size with each new generation of cameras as there are more megapixels.

Variety of Data

In addition to having value and volume, there are also different varieties of data, including ephemeral (temporary), persistent, primary, metadata, structured, semi-structured, unstructured, little, and big data. Keep in mind that programs, applications, tools, and utilities get stored as data, while they also use, create, access, and manage data.

There is also primary data and metadata, or data about data, as well as system data that is also sometimes referred to as metadata. Here is where context comes into play as part of tradecraft, as there can be metadata describing data being used by programs, as well as metadata about systems, applications, file systems, databases, and storage systems, among other things, including little and big data.

Context also matters regarding big data, as there are applications such as statistical analysis software and Hadoop, among others, for processing (analyzing) large amounts of data. The data being processed may not be big regarding the records or data entity items, but there may be a large volume. In addition to big data analytics, data, and applications, there is also data that is very big (as well as large volumes or collections of data sets).

For example, video and audio, among others, may also be referred to as big fast data, or large data. A challenge with larger data items is the complexity of moving over the distance promptly, as well as processing requiring new approaches, algorithms, data structures, and storage management techniques.

Likewise, the challenges with large volumes of smaller data are similar in that data needs to be moved, protected, preserved, and served cost-effectively for long periods of time. Both large and small data are stored (in memory or storage) in various types of data repositories.

In general, data in repositories is accessed locally, remotely, or via a cloud using:

  • Object and blobs stream, queue, and Application Programming Interface (API)
  • File-based using local or networked file systems
  • Block-based access of disk partitions, LUNs (logical unit numbers), or volumes

The following figure shows varieties of application data value including (left) photos or images, audio, videos, and various log, event, and telemetry data, as well as (right) sparse and dense data.

Application Data Value bits bytes blocks blobs bitstreams sddc
Varieties of data (bits, bytes, blocks, blobs, and bitstreams)

Velocity of Data

Data, in addition to having value (known, unknown, or none), volume (size and quantity), and variety (structured, unstructured, semi structured, primary, metadata, small, big), also has velocity. Velocity refers to how fast (or slowly) data is accessed, including being stored, retrieved, updated, scanned, or if it is active (updated, or fixed static) or dormant and inactive. In addition to data access and life cycle, velocity also refers to how data is used, such as random or sequential or some combination. Think of data velocity as how data, or streams of data, flow in various ways.

Velocity also describes how data is used and accessed, including:

  • Active (hot), static (warm and WORM), or dormant (cold)
  • Random or sequential, read or write-accessed
  • Real-time (online, synchronous) or time-delayed

Why this matters is that by understanding and knowing how applications use data, or how data is accessed via applications, you can make informed decisions. Also, having insight enables how to design, configure, and manage servers, storage, and I/O resources (hardware, software, services) to meet various needs. Understanding Application Data Value including the velocity of the data both for when it is created as well as when used is important for aligning the applicable performance techniques and technologies.

Where to learn more

Learn more about Application Data Value, application characteristics, performance, availability, capacity, economic (PACE) along with data protection, software-defined data center (SDDC), software-defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Data has different value, size, as well as velocity as part of its characteristic including how used by various applications. Keep in mind that with Application Data Value Characteristics Everything Is Not The Same across various organizations, data centers, data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. Continue reading the next post (Part V Application Data Access life cycle Patterns Everything Is Not The Same) in this series here.

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Application Data Access Lifecycle Patterns Everything Is Not The Same

Application Data Access Life cycle Patterns Everything Is Not The Same(Part V)

Application Data Access Life cycle Patterns Everything Is Not The Same

Application Data Access Life cycle Patterns Everything Is Not The Same

This is part five of a five-part mini-series looking at Application Data Value Characteristics everything is not the same as a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017). available at Amazon.com and other global venues. In this post, we look at various application and data lifecycle patterns as well as wrap up this series.

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

Active (Hot), Static (Warm and WORM), or Dormant (Cold) Data and Lifecycles

When it comes to Application Data Value, a common question I hear is why not keep all data?

If the data has value, and you have a large enough budget, why not? On the other hand, most organizations have a budget and other constraints that determine how much and what data to retain.

Another common question I get asked (or told) it isn’t the objective to keep less data to cut costs?

If the data has no value, then get rid of it. On the other hand, if data has value or unknown value, then find ways to remove the cost of keeping more data for longer periods of time so its value can be realized.

In general, the data life cycle (called by some cradle to grave, birth or creation to disposition) is created, save and store, perhaps update and read with changing access patterns over time, along with value. During that time, the data (which includes applications and their settings) will be protected with copies or some other technique, and eventually disposed of.

Between the time when data is created and when it is disposed of, there are many variations of what gets done and needs to be done. Considering static data for a moment, some applications and their data, or data and their applications, create data which is for a short period, then goes dormant, then is active again briefly before going cold (see the left side of the following figure). This is a classic application, data, and information life-cycle model (ILM), and tiering or data movement and migration that still applies for some scenarios.

Application Data Value
Changing data access patterns for different applications

However, a newer scenario over the past several years that continues to increase is shown on the right side of the above figure. In this scenario, data is initially active for updates, then goes cold or WORM (Write Once/Read Many); however, it warms back up as a static reference, on the web, as big data, and for other uses where it is used to create new data and information.

Data, in addition to its other attributes already mentioned, can be active (hot), residing in a memory cache, buffers inside a server, or on a fast storage appliance or caching appliance. Hot data means that it is actively being used for reads or writes (this is what the term Heat map pertains to in the context of the server, storage data, and applications. The heat map shows where the hot or active data is along with its other characteristics.

Context is important here, as there are also IT facilities heat maps, which refer to physical facilities including what servers are consuming power and generating heat. Note that some current and emerging data center infrastructure management (DCIM) tools can correlate the physical facilities power, cooling, and heat to actual work being done from an applications perspective. This correlated or converged management view enables more granular analysis and effective decision-making on how to best utilize data infrastructure resources.

In addition to being hot or active, data can be warm (not as heavily accessed) or cold (rarely if ever accessed), as well as online, near-line, or off-line. As their names imply, warm data may occasionally be used, either updated and written, or static and just being read. Some data also gets protected as WORM data using hardware or software technologies. WORM (immutable) data, not to be confused with warm data, is fixed or immutable (cannot be changed).

When looking at data (or storage), it is important to see when the data was created as well as when it was modified. However, you should avoid the mistake of looking only at when it was created or modified: Instead, also look to see when it was the last read, as well as how often it is read. You might find that some data has not been updated for several years, but it is still accessed several times an hour or minute. Also, keep in mind that the metadata about the actual data may be being updated, even while the data itself is static.

Also, look at your applications characteristics as well as how data gets used, to see if it is conducive to caching or automated tiering based on activity, events, or time. For example, there is a large amount of data for an energy or oil exploration project that normally sits on slower lower-cost storage, but that now and then some analysis needs to run on.

Using data and storage management tools, given notice or based on activity, which large or big data could be promoted to faster storage, or applications migrated to be closer to the data to speed up processing. Another example is weekly, monthly, quarterly, or year-end processing of financial, accounting, payroll, inventory, or enterprise resource planning (ERP) schedules. Knowing how and when the applications use the data, which is also understanding the data, automated tools, and policies, can be used to tier or cache data to speed up processing and thereby boost productivity.

All applications have performance, availability, capacity, economic (PACE) attributes, however:

  • PACE attributes vary by Application Data Value and usage
  • Some applications and their data are more active than others
  • PACE characteristics may vary within different parts of an application
  • PACE application and data characteristics along with value change over time

Read more about Application Data Value, PACE and application characteristics in Software Defined Data Infrastructure Essentials (CRC Press 2017).

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI) and related topics via the following links:

SDDC Data Infrastructure

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Keep in mind that Application Data Value everything is not the same across various organizations, data centers, data infrastructures, data and the applications that use them.

Also keep in mind that there is more data being created, the size of those data items, files, objects, entities, records are also increasing, as well as the speed at which they get created and accessed. The challenge is not just that there is more data, or data is bigger, or accessed faster, it’s all of those along with changing value as well as diverse applications to keep in perspective. With new Global Data Protection Regulations (GDPR) going into effect May 25, 2018, now is a good time to assess and gain insight into what data you have, its value, retention as well as disposition policies.

Remember, there are different data types, value, life-cycle, volume and velocity that change over time, and with Application Data Value Everything Is Not The Same, so why treat and manage everything the same?

Ok, nuff said, for now.

Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2017 (vSAN and vCloud). Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

Garbage data in, garbage information out, big data or big garbage?

StorageIO industry trends cloud, virtualization and big data

Do you know the computer technology saying, garbage data in results in garbage information out?

In other words even with the best algorithms and hardware, bad, junk or garbage data put in results in garbage information delivered. Of course, you might have data analysis and cleaning software to look for, find and remove bad or garbage data, however that’s for a different post on another day.

If garbage data in results in garbage information out, does garbage big data in result in big garbage out?

I’m sure my sales and marketing friends or their surrogates will jump at the opportunity to tell me why and how big data is the solution to the decades old garbage data in problem.

Likewise they will probably tell me big data is the solution to problems that have not even occurred or been discovered yet, yeah right.

However garbage data does not discriminate or show preference towards big data or little data, in fact it can infiltrate all types of data and systems.

Lets shift gears from big and little data to how all of that information is protected, backed up, replicated, copied for HA, BC, DR, compliance, regulatory or other reasons. I wonder how much garbage data is really out there and many garbage backups, snapshots, replication or other copies of data exist? Sounds like a good reason to modernize data protection.

If we don’t know where the garbage data is, how can we know if there is a garbage copy of the data for protection on some other tape, disk or cloud. That also means plenty of garbage data to compact (e.g. compress and dedupe) to cut its data footprint impact particular with tough economic times.

Does this mean then that the cloud is the new destination for garbage data in different shapes or forms, from online primary to back up and archive?

Does that then make the cloud the new virtual garbage dump for big and little data?

Hmm, I think I need to empty my desktop trash bin and email deleted items among other digital house keeping chores now.

On the other hand, just had a thought about orphaned data and orphaned storage, however lets leave those sleeping dogs lay where they rest for now.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO and UnlimitedIO All Rights Reserved

The blame game: Does cloud storage result in data loss?

I recently came across a piece by Carl Brooks over at IT Tech News Daily that caught my eye, title was Cloud Storage Often Results in Data Loss. The piece has an effective title (good for search engine: SEO optimization) as it stood out from many others I saw on that particular day.

Industry Trend: Cloud storage

What caught my eye on Carls piece is that it reads as if the facts based on a quick survey point to clouds resulting in data loss, as opposed to being an opinion that some cloud usage can result in data loss.

Data loss

My opinion is that if not used properly including ignoring best practices, any form of data storage medium or media could result or be blamed for data loss. For some people they have lost data as a result of using cloud storage services just as other people have lost data or access to information on other storage mediums and solutions. For example, data has been lost on tape, Hard Disk Drives (HDDs), Solid State Devices (SSD), Hybrid HDDs (HHDD), RAID and non RAID, local and remote and even optical based storage systems large and small. In some cases, there have been errors or problems with the medium or media, in other cases storage systems have lost access to, or lost data due to hardware, firmware, software, or configuration including due to human error among other issues.

Data loss

Technology failure: Not if, rather when and how to decrease impact
Any technology regardless of what it is or who it is from along with its architecture design and implementation can fail. It is not if, rather when and how gracefully along with what safeguards to decrease the impact, in addition to containing or isolating faults differentiates various products or solutions. How they automatically repair and self heal to keep running or support accessibility and maintain data integrity are important as is how those options are used. Granted a failure may not be technology related per say, rather something associated with human intervention, configuration, change management (or lack thereof) along with accidental or intentional activities.

Walking the talk
I have used public cloud storage services for several years including SaaS and AaaS as well as IaaS (See more XaaS here) and knock on wood, have not lost any data yet, loss of access sure, however not data being lost.

I follow my advice and best practices when selecting cloud providers looking for good value, service level agreements (SLAs) and service level objectives (SLOs) over low cost or for free services.

In the several years of using cloud based storage and services there has been some loss of access, however no loss of data. Those service disruptions or loss of access to data and services ranged from a few minutes to a little over an hour. In those scenarios, if I could not have waited for cloud storage to become accessible, I could have accessed a local copy if it were available.

Had a major disruption occurred where it would have been several days before I could gain access to that information, or if it were actually lost, I have a data insurance policy. That data insurance policy is part of my business continuance (BC) and disaster recovery (DR) strategy. My BC and DR strategy is a multi layered approach combining local, offline and offsite as along with online cloud data protection and archiving.

Assuming my cloud storage service could get data back to a given point (RPO) in a given amount of time (RTO), I have some options. One option is to wait for the service or information to become available again assuming a local copy is no longer valid or available. Another option is to start restoration from a master gold copy and then roll forward changes from the cloud services as that information becomes available. In other words, I am using cloud storage as another resource that is for both protecting what is local, as well as complimenting how I locally protect things.

Minimize or cut data loss or loss of access
Anything important should be protected locally and remotely meaning leveraging cloud and a master or gold backup copy.

To cut the cost of protecting information, I also leverage archives, which mean not all data gets protected the same. Important data is protected more often reducing RPO exposure and speed up RTO during restoration. Other data that is not as important is protected, however on a different frequency with other retention cycles, in other words, tiered data protection. By implementing tiered data protection, best practices, and various technologies including data footprint reduction (DFR) such as archive, compression, dedupe in addition to local disk to disk (D2D), disk to disk to cloud (D2D2C), along with routine copies to offline media (removable HDDs or RHDDs) that go offsite,  Im able to stretch my data protection budget further. Not only is my data protection budget stretched further, I have more options to speed up RTO and better detail for recovery and enhanced RPOs.

If you are looking to avoid losing data, or loss of access, it is a simple equation in no particular order:

  • Strategy and design
  • Best practices and processes
  • Various technologies
  • Quality products
  • Robust service delivery
  • Configuration and implementation
  • SLO and SLA management metrics
  • People skill set and knowledge
  • Usage guidelines or terms of service (ToS)

Unfortunately, clouds like other technologies or solutions get a bad reputation or blamed when something goes wrong. Sometimes it is the technology or service that fails, other times it is a combination of errors that resulted in loss of access or lost data. With clouds as has been the case with other storage mediums and systems in the past, when something goes wrong and if it has been hyped, chances are it will become a target for blame or finger pointing vs. determining what went wrong so that it does not occur again. For example cloud storage has been hyped as easy to use, don’t worry, just put your data there, you can get out of the business of managing storage as the cloud will do that magically for you behind the scenes.

The reality is that while cloud storage solutions can offload functions, someone is still responsible for making decisions on its usage and configuration that impact availability. What separates various providers is their ability to design in best practices, isolate and contain faults quickly, have resiliency integrated as part of a solution along with various SLAs aligned to what the service level you are expecting in an easy to use manner.

Does that mean the more you pay the more reliable and resilient a solution should be?
No, not necessarily, as there can still be risks including how the solution is used.

Does that mean low cost or for free solutions have the most risk?
No, not necessarily as it comes down to how you use or design around those options. In other words, while cloud storage services remove or mask complexity, it still comes down to how you are going to use a given service.

Shared responsibility for cloud (and non cloud) storage data protection
Anything important enough that you cannot afford to lose, or have quick access to should be protected in different locations and on various mediums. In other words, balance your risk. Cloud storage service provider toned to take responsibility to meet service expectations for a given SLA and SLOs that you agree to pay for (unless free).

As the customer you have the responsibility of following best practices supplied by the service provider including reading the ToS. Part of the responsibility as a customer or consumer is to understand what are the ToS, SLA and SLOs for a given level of service that you are using. As a customer or consumer, this means doing your homework to be ready as a smart educated buyer or consumer of cloud storage services.

If you are a vendor or value added reseller (VAR), your opportunity is to help customers with the acquisition process to make informed decision. For VARs and solution providers, this can mean up selling customers to a higher level of service by making them aware of the risk and reward benefits as opposed to focus on cost. After all, if a order taker at McDonalds can ask Would you like to super size your order, why cant you as a vendor or solution provider also have a value oriented up sell message.

Additional related links to read more and sources of information:

Choosing the Right Local/Cloud Hybrid Backup for SMBs
E2E Awareness and insight for IT environments
Poll: What Do You Think of IT Clouds?
Convergence: People, Processes, Policies and Products
What do VARs and Clouds as well as MSPs have in common?
Industry adoption vs. industry deployment, is there a difference?
Cloud conversations: Loss of data access vs. data loss
Clouds and Data Loss: Time for CDP (Commonsense Data Protection)?
Clouds are like Electricity: Dont be scared
Wit and wisdom for BC and DR
Criteria for choosing the right business continuity or disaster recovery consultant
Local and Cloud Hybrid Backup for SMBs
Is cloud disaster recovery appropriate for SMBs?
Laptop data protection: A major headache with many cures
Disaster recovery in the cloud explained
Backup in the cloud: Large enterprises wary, others climbing on board
Cloud and Virtual Data Storage Networking (CRC Press, 2011)
Enterprise Systems Backup and Recovery: A Corporate Insurance Policy

Poll:  Who is responsible for cloud storage data loss?

Taking action, what you should (or not) do
Dont be scared of clouds, however do your homework, be ready, look before you leap and follow best practices. Look into the service level agreements (SLAs) associated with a given cloud storage product or service. Follow best practices about how you or someone else will protect what data is put into the cloud.

For critical data or information, consider having a copy of that data in the cloud as well as at or in another place, which could be in a different cloud or local or offsite and offline. Keep in mind the theme for critical information and data is not if, rather when so what can be done to decrease the risk or impact of something happening, in other words, be ready.

Data put into the cloud can be lost, or, loss of access to it can occur for some amount of time just as happens with using non cloud storage such as tape, disk or ssd. What impacts or minimizes your risk of using traditional local or remote as well as cloud storage are the best practices, how configured, protected, secured and managed. Another consideration is the type and quality of the storage product or cloud service can have a big impact. Sure, a quality product or service can fail; however, you can also design and configure to decrease those impacts.

Wrap up
Bottom line, do not be scared of cloud storage, however be ready, do your homework, review best practices, understand benefits and caveats, risk and reward. For those who want to learn more about cloud storage (public, private and hybrid) along with data protection, data management, data footprint reduction among other related topics and best practices, I happen to know of some good resources. Those resources in addition to the links provided above are titled Cloud and Virtual Data Storage Networking (CRC Press) that you can learn more about here as well as find at Amazon among other venues. Also, check out Enterprise Systems Backup and Recovery: A Corporate Insurance Policy by Preston De Guise (aka twitter @backupbear ) which is a great resource for protecting data.

Ok, nuff said for now

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

Supporting IT growth demand during economic uncertain times

Doing more with less, doing more with what you have or reducing cost have been the mantra for the past several years now.

Does that mean as a trend, they are being adopted as the new way of doing business, or simply a cycle or temporary situation?

Reality is that many if not most IT organizations are and will remain under pressure to stretch their budgets further for the immediate future. Over the past year or two some organizations saw increases in their budgets however also increased demand while others saw budgets fixed or reduced while having to support growth. On the other hand, there is no such thing as an information recession with more data being generated, moved, processed, stored and retained for longer periods of time.

Industry trend: No such thing as a data recession

Something has to give as shown in the following figure which is that on one curve there is continued demand and growth, while another curve shows need to reduce costs while another reflects the importance of maintaining or enhancing service level objectives (SLOs) and quality of service (QoS).

Enable growth while removing complexity and cost without compromising service levels

One way to reduce costs is to inhibit growth while another is to support growth by sacrificing QoS including performance, response time or availability as a result of over consolidation, excessive utilization or instability as a result of stretching resources to far. Where innovation comes into play is finding and fixing problems vs. moving or masking them or treating symptoms vs. the real issue and challenge. Innovation also comes into play by identifying both near term tactical as well as longer term strategic means of taking complexity and cost out of service delivery and the resources needed to support them. For example determining the different resources and processes involved in delivering an email box of a given size and reliability. Another being supporting a virtual machine (VM) with a given performance and capacity capability. Yet another scenario is a file share or home directory of a specific size and availability. By streamlining work flows, leveraging automation and other tools to enforce polices as well as adopting new best practices complexity and thereby costs can be reduced. The net rest is a lower cost to provide a given service to a specific level which when multiplied out over many users or instances, results in cost savings however also productivity gains.

The above is all good and well for longer term strategic and where you want to go or get to, however what can be done right now today?

Here are a few tips to do more with what you have while supporting growth demands

If you have service level agreements (SLAs) and SLOs as part of your service category, review with your users as to what they need vs. what they would like to have. What you may find is that your users want or expect a given level of service, yet would be happy and ok with moving to a cloud service that had lower SLO and SLA expectations if lower cost. The previous scenario would be an indicator that you users want and thus you give them a higher level of service, yet their requirements are actually lower than what is expected. On the other hand if you do not have SLOs and SLAs aligned with cost for the services then set them up and review customer or client expectations, needs vs. wants on a regular basis. You might find out that you can stretch your budget by delivering a lower (or higher) class of services to meet different users requirements than what was assumed to be the case. In the case of supporting a better class of service, if you can use an SSD enabled solution to reduce latency or wait times and boost productivity, more transactions or page views or revenue per hour, that could prompt a client to request that capability to meet their business needs.

Reduce your data footprint impact in order to support growth using the ABCDs of data footprint reduction (DFR), that is Archive (email, file, database), Backup modernization, Compression and consolidation, Data management and dedupe, storage tiering among other techniques.

Storage, server virtualization and optimization using capacity consolidation where practical and IO consolidation to fast storage and SSD where possible. Also review storage configuration including RAID and allocation to identity if any relatively easy changes can improve performance, availability, capacity and energy impact.

Investigate available upgrades and enhancements to your existing hardware, software and services that can be applied to provide breathing room within current budgets while evaluating new technologies.

Find and fix problems vs. chasing false positives that provide near term relief only to have the real issue reappear. Maximize your budgets by identifying where people time and other resources are being spent due to processes, work flows, technology configuration complexity or bottlenecks and address those.

Enhance and leverage existing management measurements to gain more insight along with implementing new metrics for End to End (E2E) situational awareness of your environment which will enable effective decision making. For example you may be told to move some function to the cloud as it will be cheaper, yet if you do not have metrics to indicate one way or the other, how can that be an informed decision? If you have metrics that show your cost for the same service being moved to a cloud or managed service provider as well as QoS, SLO, SLA, RTO, RPO and other TLAs, then you can make informed decisions. That decision may still be to move functions to a cloud or other service even if in fact it is more expensive compared to what you can provide it for in order that your resources can be directed to supporting other important internal functions.

Look for ways to reduce cost of a service delivered as opposed to simply cutting costs. They sound like one and the same, however if you have metrics and measurements providing situational awareness to know what the cost of a service is, you can also then look at how to streamline those services, remove complexity, reduce workflow, leverage automation there by removing cost. The goal is the same, however how you go about removing cost can have an impact on your return on innovation not to mention customer satisfaction.

Also be an informed shopper, have a forecast or plan on what you will need and when, along with what you must have (core requirements) vs. what you would like to have or want. When looking at options, balance what is needed and then if you can get what you want or would like for little or no extra cost if they add value or enable other initiatives. Part of being an informed shopper is having support of the business to be able to procure what you want or need which means aligning technology resources and their cost to delivery of business functions and services.

What you need vs. what you want
In a recent interview with the associated press (AP) the reporter wanted to know my comments about spending vs. saving during economic tough times (you can read the story here). Basically my comments were to spend within your means by identifying what you need vs. what you want, what is required to keep the business running or improve productivity and remove cost as opposed to acquiring nice to have things that can wait. Sure I would like to have a new 85 to 120" 3D monitor for my workstation that could double as a TV, however I do not need or require it.

On the other hand, I recently upgraded an existing workstation adding a Hybrid Hard Disk Drive (HHDD) and some additional memory, about a $200USD investment that is already paying for itself via increased productivity. That is instead of enjoying a cup of dunkin donut coffee while waiting for some tasks to complete on that system, Im able to get more done in a given amount of time boosting productivity.

For IT environments this means looking at expenditures to determine what is needed or required to keep things running while supporting near term strategic and tactical initiatives or pet projects.

For vendors and vars, if things have not been a challenge yet, now they will need to refine their messages to show more value, return on innovation (ROI) in terms of how to help their customers or prospects stretch resources (budgets, people, skill sets, products, services, licenses, power and cooling, floor space) further to support growth, while removing costs without compromising on service delivery. This also means a shift in thinking of short term or tactical cost cutting to longer term strategic approaches of reducing costs to deliver a service or resources.

Related links pertaining to stretching your resources, doing more with what you have, increasing productivity and maximizing your budget to support growth without compromising on customer service.

Saving Money with Green IT: Time To Invest In Information Factories
Storage Efficiency and Optimization – The Other Green
Shifting from energy avoidance to energy efficiency
Saving Money with Green Data Storage Technology
Green IT Confusion Continues, Opportunities Missed!
Storage Efficiency and Optimization – The Other Green
PUE, Are you Managing Power, Energy or Productivity?
Cloud and Virtual Data Storage Networking
Is There a Data and I/O Activity Recession?
More Data Footprint Reduction (DFR) Material

What is your take?

Are you and your company going into a spending freeze mode, or are you still spending, however placing or having constraints put on discretionary spending?

How are you stretching your IT budget to go further?

 

Ok, nuff said for now.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

Industry trend: People plus data are aging and living longer

Lets face it, people and information are living longer and thus there are more of each along with a strong interdependency by both.

People living and data being retained longer should not be a surprise, take a step back and look at the bigger picture. There is no such thing as an information recession with more data being generated, processed, moved and stored for longer periods of time not to mention that a data object is also getting larger.

Industry trend and performance

By data objects getting larger, think about a digital photo taken on a typical camera ten years ago which whose resolution was lower and thus its file size would have been measured in kilo bytes (thousands). Today megapixel resolutions are common from cell phones, smart phones, PDAs and even larger with more robust digital and high definition (HD) still and video cameras. This means that a photo of the same object that resulted in a file of hundreds of Kbytes ten years ago would be measured in Megabytes today. With three dimensional (3D) cameras appearing along with higher resolution, you do not need to be a rocket scientist or industry pundit to figure out what that growth trend trajectory looks like.

However it is not just the size of the data that is getting larger, there are also more instances along with copies of those files, photos, videos and other objects being created, stored and retained. Similar to data, there are more people now than ten years ago and some of those have also grown larger, or at least around the waistline. This means that more people are creating and relying on larger amounts of information being available or accessible when and where needed. As people grow older, the amount of data that they generate will naturally increase as will the information that they consume and rely upon.

Where things get interesting is that looking back in history, that is more than ten or even a hundred years, the trend is that there are more people, they are living longer, and they are generating larger amounts of data that is taking on new value or meaning. Heck you can even go back from hundreds to thousands of years and see early forms of data archiving and storage with drawings on walls of caves or other venues. I Wonder if had the cost (and ease of use) to store and keep data had been lower back than would there have been more information saved? Or was it a case of being too difficult to use the then state of art data and information storage medium combined with limited capacities so they simply ran out of storage and retention mediums (e.g. walls and ceilings)?

Lets come back to the current for a moment which is another trend of data that in the past would have been kept offline or best case near line due to cost and limits or constraints are finding their way online either in public or private venues (or clouds if you prefer).

Thus the trend of expanding data life cycles with some types of data being kept online or readily accessible as its value is discovered.

Evolving data life cycle and access patterns

Here is an easy test, think of something that you may have googled or searched for a year or two ago that either could not be found or was very difficult to find. Now take that same search or topic query and see if anything appears and if it does, how many instances of it appear. Now make a note to do the same test again in a year or even six months and compare the results.

Now back to the future however with an eye to the past and things get even more interesting in that some researchers are saying that in centuries to come, we should expect to see more people not only living into their hundreds, however even longer. This follows the trend of the average life expectancy of people continues to increase over decades and centuries.

What if people start to live hundreds of years or even longer, what about the information they will generate and rely upon and its later life cycle or span?

More information and data

Here is a link to a post where a researcher sees that very far down the road, people could live to be a thousand years old which brings up the question, what about all the data they generate and rely upon during their lifetime.

Ok, now back to the 21st century and it is safe to say that there will be more data and information to process, move, store and keep for longer periods of time in a cost effective way. This means applying data footprint reduction (DFR) such as archiving, backup and data protection modernization, compression, consolidation where possible, dedupe and data management including deletion where applicable along with other techniques and technologies combined with best practices.

Will you out live your data, or will your data survive you?

These are among other things to ponder while you enjoy your summer (northern hemisphere) vacation sitting on a beach or pool side enjoying a cool beverage perhaps gazing at the passing clouds reflecting on all things great and small.

Clouds: Dont be scared, however look before you leap and be prepared

Ok, nuff said for now.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

Using Removable Hard Disk Drives (RHDDs)

Removable hard disk drives (RHDD) are a form of removable media which includes magnetic tape that address many common use cases. Usage scenarios include enabling bulk data portability for larger environments or for D2D backup where the media needs to be physically moved offsite for small and mid sized environments. RHDDs include among others those from Imation such as the Odyssey (which is what I use) and the Prostor RDX (OEMed by Imation and others). RHDD, tape along with other forms of portable media including those that use flash by being removable and portable presumable should have some extra packaging protection to safeguard against static shock in addition to supporting encryption capabilities.

Compared to disks including RHDD, tape for most and particularly larger environments should have an overall lower media cost for parking, preserving and when needed serving inactive or archived data (e.g. the changing roll of tape from day to back up to archive). Of course your real costs will vary by use in addition to how combined with data footprint reduction and other technologies.

A big benefit of RHDDs is that they are random meaning data can be searched and found quickly vs. tape media which has great sequential or streaming capabilities if you have a system that can support that ability. The other benefit of RHDD is that depending on their implementation, they should plug and play with your systems appearing as disk without any extra drivers or configuration or software tools making for ease of use. Being removable they can be used for portability such as sending data to a cloud or MSP as part of an initial bulk copy, or sending data offset or taking home as part of an offsite backup, data protection or BC/DR strategy as well as being used for archiving. The warning with RHDD is their cost per TByte will generally be higher than compared to tape as well as having to have a docking station or specific drive interface depending on specific product and configuration.

RHDD are a great compliment to traditional fixed or non removable disk, Hybrid Hard Disk Drive (HHDD) and Solid State Device (SSD) based storage as well as coexist with cloud or MSP backup and archive solutions. The smaller the environment the more affordable using RHDD become vs. tape for backup and archive operations or when portability is required. Even if using a cloud or managed service provider (MSP) backup provider, network bandwidth costs, availability or performance may limit the amount of data that can be moved in a cost effective way. For example placing an archive and gold or master copy of your static data on a RHDD that may be kept on site in a safe off-site place and then sending data that is routinely changed to the cloud or MSP provider (think full local and offsite plus partial full and incremental in the cloud).

By leveraging archiving and data footprint reduction (DFR) techniques including dedupe and compression, you can stretch your budget by sending less data to cloud or MSP services while using removable media for data protection. You would be surprised how many TBytes of data can be kept in a safe deposit box. For my own business, I have used RHDDs for several years to keep gold master copies as well as archives offsite as part of a disk to disk (D2D) or D2D2RHDD strategy. The data protection strategy is also complimented by sending active data to a cloud backup MSP (encrypted of course). It might be belt and suspenders, however it is also eating my own dog food practicing what I talk about and the approach has proven itself a few times.

Here are some related links to more material:
Removable disk drives vs. tape storage for small businesses
The pros and cons of removable disk storage for small businesses
Removable storage media appealing to SMBs, but with caveats
StorageIO Momentus Hybrid Hard Disk Drive (HHDD) Moments

Ok, nuff said for now

Cheers Gs

Greg Schulz – Author The Green and Virtual Data Center (CRC), Resilient Storage Networks (Elsevier) and coming summer 2011 Cloud and Virtual Data Storage Networking (CRC)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2011 StorageIO and UnlimitedIO All Rights Reserved

What is DFR or Data Footprint Reduction?

What is DFR or Data Footprint Reduction?

What is DFR or Data Footprint Reduction?

Updated 10/9/2018

What is DFR or Data Footprint Reduction?

Data Footprint Reduction (DFR) is a collection of techniques, technologies, tools and best practices that are used to address data growth management challenges. Dedupe is currently the industry darling for DFR particularly in the scope or context of backup or other repetitive data.

However DFR expands the scope of expanding data footprints and their impact to cover primary, secondary along with offline data that ranges from high performance to inactive high capacity.

Consequently the focus of DFR is not just on reduction ratios, its also about meeting time or performance rates and data protection windows.

This means DFR is about using the right tool for the task at hand to effectively meet business needs, and cost objectives while meeting service requirements across all applications.

Examples of DFR technologies include Archiving, Compression, Dedupe, Data Management and Thin Provisioning among others.

Read more about DFR in Part I and Part II of a two part series found here and here.

Where to learn more

Learn more about data footprint reducton (DFR), data footprint overhead and related topics via the following links:

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

Software Defined Data Infrastructure Essentials Book SDDC

What this all means

That is all for now, hope you find these ongoing series of current or emerging Industry Trends and Perspectives posts of interest.

Ok, nuff said, for now.

Cheers Gs

Greg Schulz – Microsoft MVP Cloud and Data Center Management, VMware vExpert 2010-2018. Author of Software Defined Data Infrastructure Essentials (CRC Press), as well as Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press), Resilient Storage Networks (Elsevier) and twitter @storageio. Courteous comments are welcome for consideration. First published on https://storageioblog.com any reproduction in whole, in part, with changes to content, without source attribution under title or without permission is forbidden.

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO. All Rights Reserved. StorageIO is a registered Trade Mark (TM) of Server StorageIO.

March Metric Madness: Fun with Simple Math

Its March and besides being spring in north America, it also means tournament season including the NCAA basket ball series among others known as March Madness.

Given the office pools and other forms of playing with numbers tied to the tournaments and real or virtual money, here is a quick timeout looking at some fun with math.

The fun is showing how simple math can be used to show relative growth for IT resources such as data storage. For example, say that you have 10Tbytes of storage or data and that it is growing at only 10 percent per year, in five years with simple math yields 14.6Tbytes.

Now lets assume growth rate is 50 percent per year and in the course of five years, instead of having 10Tbytes, that now jumps to 50.6Tbytes. If you have 100Tbytes today and at 50 percent growth rate, that would yield 506.3 Tbytes or about half of a petabyte in 5 years. If by chance you have say 1Pbyte or 1,000Tbytes today, at 25% year of year growth you would have 2.44Pbytes in 5 years.
Basic Storage Forecast
Figure 1 Fun with simple math and projected growth rates

Granted this is simple math showing basic examples however the point is that depending on your growth rate and amount of either current data or storage, you might be surprised at the forecast or projected needs in only five years.

In a nutshell, these are examples of very basic primitive capacity forecasts that would vary by other factors including if the data is 10Tbytes and your policies is for 25 percent free space, that would require even more storage than the base amount. Go with a different RAID level, some extra space for replication, snapshots, disk to disk backups and replication not to mention test development and those numbers go up even higher.

Sure those amounts can be offset with thin provisioning, dedupe, archiving, compression and other forms of data footprint reduction, however the point here is to realize how simple math can portray a very basic forecast and picture of growth.

Read more about performance and capacity in Chapter 10 – Performance and capacity planning for storage networks – Resilient Storage Networks (Elsevier) as well as at www.cmg.org (Computer Measurement Group)..

And that is all I have to say about this for now, enjoy March madness and fun with numbers.

Ok, nuff said.

Cheers gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press), The Green and Virtual Data Center (CRC Press) and Resilient Storage Networks (Elsevier)
twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2024 Server StorageIO and UnlimitedIO LLC All Rights Reserved

Missing Dedupe Debate Detail!

Storage I/O trends

The de-dupe vendors like to debate details of their solutions, ranging from compression or de-dupe ratios, to hashing and caching algorithms, to processor vs. disk vs. memory, to in-band vs. out-of-band, pre or post processing among other items. At times the dedupe debates can get more lively than a political debate or even the legendary storage virtualization debates of yester year.

However one item that an IT professional recently mentioned that is not being addressed or talked about during the de-dupe debates is how IT customers will get around vendor lock-in. Never mind the usual lock-in debates of whose back-end storage or disk drives, whose server a de-dupe appliance software runs and so forth.

The real concern is how data in the future will be recoverable from a de-dupe solution similar to how data can be recovered from tape today. Granted this is an apple to oranges comparison at best. The only real similarity is that a backup or archive solution sends a data stream in a tar-ball or backup or archive save set or perhaps in a file format to the tape or de-dupe appliance. Then, the VTL or de-dupe appliance software puts the data into yet another format.

Granted not all tape media can be interchanged between different tape drives given format, generations and of course using the proper backup or archive application to un-pack the data for use. Probably a more applicable apple to oranges comparison would be how will IT personal get data back from a VTL (non de-duping) disk based storage system compared to getting data back from a VTL or de-dupe appliance.

Today and for the foreseeable future the answer is simple, if your pain point is severe and you need the benefits of de-dupe, then the de-dupe software and appliance is your point of vendor lock-in. If vendor lock-in is a main concern, take your time, do your homework and due diligence for solutions that reduce lock-in or at least give a reasonable strategy for data access in the future.

Welcome to the world of virtualized data and virtualized data protection. Here?s the golden rule for de-dupe and that is like virtualization, who ever controls the software and management meta data controls the vendor lock-in, good, bad or in-different, that?s the harsh reality.

For the record, I like de-dupe technology in general as part of an overall data footprint reduction strategy combined with archiving and real-time compression for on-line and off-line data. I see a very bright future for it moving forward. I also see many of the heavy thinking and heavy lifting issues to support large-scale deployments and processing getting addressed over time allowing de-dupe to move from mid markets to large-scale mainstream adoption.

Now, back to your regularly scheduled de-dupe debate drama!

Cheers
gs