This is the second episode of “The Journey to a Hybrid Software Defined Storage Infrastructure”. It is an IBM TEC study by Angelo Bernasconi, PierLuigi Buratti, Luca Polichetti, Matteo Mascolo, and Francesco Perillo.
More episodes will follow, and there may be a Season 2 next year as well.
To read the previous episode, check here: https://ilovemystorage.wordpress.com/2016/11/23/the-journey-to-a-hybrid-software-defined-storage-infrastructure-s01e01/
Enjoy your reading!
Both analyst predictions and client surveys show a movement toward Hybrid Cloud:
Source: IDC 2016 Predictions
Service providers will see two significant opportunities here:
- Service Providers are becoming a key part of Enterprise IT
- Clients (existing and new) are looking to extend their IT with Cloud services, especially DRaaS & Backup/Restore
More specifically, analysts see strong demand for future cloud-based storage service investments in disaster recovery, collaboration, and more, as follows:
Source: IDC 2016 Predictions
Analyst forecasts try to predict what will happen in the short, medium, and long term; in the meantime, customers realize that traditional infrastructure will no longer be able to react correctly and quickly to new business demands.
The only solution is to move towards a new IT infrastructure that will match the specific characteristics of:
- Velocity and Simplicity
This new infrastructure will easily match the above characteristics if deployed in the Cloud.
It is important to highlight that creating this new infrastructure does not mean that the traditional infrastructure should no longer exist. In the future there will be a coexistence between the traditional storage infrastructure and the new Cloud storage environment. This coexistence will probably never end, since not every workload will match the Storage Cloud requirements, and not every workload will be in a position to benefit from the new infrastructure. That is the main reason why we talk about a Software Defined Hybrid Storage infrastructure.
Investment and new infrastructure growth will mostly go in the Cloud direction, depending on business requirements.
The infrastructure could be deployed in a Private, Public, or Hybrid Cloud.
To match the above characteristics, the new IT storage infrastructure will use new Software Defined Storage technologies, creating a new Software Defined Hybrid Storage infrastructure.
The information in this chapter refers to a 2013 TEC study about Software Defined Environments and aims to give you an overview of what a Software Defined Storage infrastructure is.
Traditional storage posed many challenges:
- Constrained business agility
- Time that is required to deploy new or upgraded business function
- Downtime that is required for data migration and technology refresh
- Unplanned storage capacity acquisitions
- Staffing limitations
- Suboptimal utilization of IT resources
- Difficulty predicting future capacity and service level needs
- Peaks and valleys in resource requirements
- Over-provisioning and under-provisioning of IT resources
- Extensive capacity planning effort needed to plan for varying future demand
- Organizational constraints
- Project-oriented infrastructure funding
- Constrained operational budgets
- Difficulty implementing resource sharing
- No chargeback or show-back mechanism as incentive for IT resource conservation
- IT resource management
- Rapid capacity growth
- Cost control
- Service-level monitoring and support (performance, availability, capacity, security, retention, and so on)
- Architectural open standardization
The new Smart Data Center needs the following storage functionalities:
- Dynamic scaling/provisioning (elasticity)
- No longer confined to a single storage box or subsystem, but able to scale and provision space in a click.
- Faster deployment of storage resources
- With a single reference storage architecture, storage deployment can be faster
- Reduced cost of managing storage
- With a single reference storage architecture, it is possible to reduce TCO by leveraging people’s skills
- Greener data centers
- Consolidation based on storage virtualization (the foundation of SDS) is a key factor in space utilization and optimization, contributing to a green data center.
- Multi-user file sharing
- Make data available to different end users or platforms
- Self-service user portal
- Make end users aware of their storage provisioning; the process can be easily monitored and is faster.
- Integrated storage and service management
- Improved efficiency of data management
- Faster time to market
Software Defined Storage is characterized by several key architectural elements and capabilities that differentiate it from traditional infrastructure.
- Commodity Hardware
- All the intelligence in software-defined storage (SDS) is in the software layer
- Scale-Out Architecture
- Hardware in SDS must enable flexible, elastic configuration of storage resources through software, using a building-block approach that dynamically adds and removes resources.
- Resource Pooling
- The available storage resources are pooled into a unified logical entity that can be managed centrally
- Physical storage resources are virtualized and presented to the control plane, which can then be configured and delivered as tiered storage services.
- The storage layer provides extensive automation that enables it to deliver one-click, policy-based provisioning of storage. The system automatically configures and delivers storage as needed on the fly.
- The real power of Software Defined Storage lies in the ability to integrate itself with other layers of the infrastructure to build end-to-end application-focused automation.
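As a minimal sketch of the resource-pooling and policy-based provisioning ideas above, consider the following Python toy. The `StoragePool` class, the tier names, and the gold/silver/bronze policies are all invented for illustration; they are not an actual SDS API.

```python
# Toy sketch of SDS-style resource pooling and policy-based
# provisioning. All names (StoragePool, tiers, policies) are
# hypothetical, invented for illustration only.

class StoragePool:
    """A unified logical pool built from heterogeneous backends."""

    def __init__(self):
        # tier name -> free capacity in GB, from virtualized backends
        self.tiers = {"flash": 0, "disk": 0, "tape": 0}

    def add_backend(self, tier, capacity_gb):
        # Virtualize a physical resource into the shared pool.
        self.tiers[tier] += capacity_gb

    def provision(self, size_gb, policy):
        """'One-click' provisioning: the policy, not the admin,
        decides which tier serves the request."""
        # A policy maps a service level to an ordered tier preference.
        preferences = {
            "gold":   ["flash", "disk"],
            "silver": ["disk", "flash"],
            "bronze": ["tape", "disk"],
        }[policy]
        for tier in preferences:
            if self.tiers[tier] >= size_gb:
                self.tiers[tier] -= size_gb
                return tier  # volume carved from this tier
        raise RuntimeError("pool exhausted for policy " + policy)

pool = StoragePool()
pool.add_backend("flash", 100)
pool.add_backend("disk", 1000)
print(pool.provision(50, "gold"))  # -> flash
print(pool.provision(80, "gold"))  # -> disk (remaining flash too small)
```

The point of the sketch is the separation of planes: backends are virtualized into one pool (data plane), while a policy table drives placement decisions (control plane), so adding a backend or changing a service level never touches the consumer.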
Following are some common fallacies that customers and providers need to be wary of.
- “You can’t be software-defined storage unless you sell storage as just software”
- Some storage vendors that sell software-only solutions have tried to argue that “software only” is the same as “software-defined.” There is a big difference between storage software and Software Defined Storage: the former is a technology delivery model, while the latter is an architecture for how storage is deployed, provisioned, and managed. All storage systems require hardware, whether the software is installed in the field or before the product is shipped.
- A Software Defined Storage system must run the storage controller in a virtual environment
- Some storage vendors run their storage controllers in virtual machines. This trend developed independently from Software Defined Storage, and it offers interesting possibilities such as virtual controller redundancy and the ability to dynamically convert a server with disks into a virtual storage appliance. But running the storage controller in a VM is by no means a requirement for software-defined storage; it is simply a delivery method for software.
New requirements are surfacing, and a new paradigm will be needed to deliver a Software Defined Hybrid Storage infrastructure.
Gone are the days where IT’s primary focus is to “keep the lights on” and save money. As technology plays an increasingly important role in all aspects of business, IT is now in the driver’s seat to help the business differentiate with new products, services, and routes to market.
IT agility for the modern datacenter has a direct impact on business agility.
A few major motivations to move to Hybrid Cloud, in terms of the infrastructure’s “agility”, are:
- It turns agile development into a truly parallel activity (unlimited testing/staging environments)
- It enhances continuous integration and delivery (+ it eases code branching and merging)
- It encourages innovation and experimentation
- It lowers the impact of outages and upgrades and provides disaster preparedness
- It enables virtually unlimited scalability
Cost efficiency: a Hybrid Cloud storage infrastructure, which combines cost-effective but inflexible private resources with flexible but premium-priced public cloud services, allows organizations to operate cost-efficiently under demand-volume uncertainty.
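A back-of-the-envelope sketch of this trade-off, with all prices and demand figures invented purely for illustration:

```python
# Toy cost model: private capacity is cheap per TB but must be sized
# up front; public cloud is pay-per-use but pricier per TB. All
# numbers are invented for illustration only.

PRIVATE_COST_PER_TB = 10   # fixed, paid whether used or not
PUBLIC_COST_PER_TB = 30    # elastic, paid only when used

def monthly_cost(private_tb, demand_tb):
    # Demand beyond private capacity "bursts" to the public cloud.
    overflow = max(0, demand_tb - private_tb)
    return private_tb * PRIVATE_COST_PER_TB + overflow * PUBLIC_COST_PER_TB

demand = [80, 100, 90, 300, 100, 95]  # six months, one spike

all_private = sum(monthly_cost(300, d) for d in demand)  # sized for peak
hybrid      = sum(monthly_cost(100, d) for d in demand)  # burst to cloud
print(all_private, hybrid)  # -> 18000 12000
```

Under spiky demand, sizing private capacity for the peak wastes money most months, while the hybrid setup pays the public-cloud premium only for the rare overflow; with steady demand the comparison can flip, which is exactly the "demand volume uncertainty" point above.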
If the IT department is not able to keep up, business owners will resort to other unofficial channels (a.k.a. “shadow IT”) to get their applications set up quickly to drive revenue and ensure time-to-market goals are achieved. If the business uses other non-sanctioned IT alternatives, then IT loses control of the user experience but is still responsible for ongoing support and security. That’s not good at all.
A few major motivations to move to Hybrid Cloud, in terms of the infrastructure’s “efficiency”, are:
- It lowers the impact of outages and upgrades and provides disaster preparedness
- It allows HA and DR solutions at lower costs
- It scales up (and, when needed, down) to continuously adapt to business needs (cost-effectiveness)
- The infrastructure is always up to date (firmware and software upgrades)
- Dynamic resource allocation and scheduling in line with various requirements
This statement is true in general, although it depends on the specific case.
A few major motivations to move to Hybrid Cloud, in terms of the infrastructure’s “ease of use”, are:
- Ease of management and maintenance of the general infrastructure
- No need for deep technical knowledge to provision and implement new hardware/software; only cloud skills are needed to allow integration with the traditional IT environment
- It takes advantage of the cloud providers’ technical knowledge to keep the infrastructure always up to date, optimized, and fully efficient
- Enables deeper insights on data and workloads (when needed)
Considering the Cloud for new applications or business processes as business needs evolve can significantly reduce time to market when rolling out new software or processes.
Take the case of a new Customer Relationship Management (CRM) system, for example, where a typical in-house CRM application deployment could require 4-6 weeks of user requirements analysis, 4-5 weeks of vendor selection, and another 12-18 months of customization, development, and implementation.
By comparison, a cloud-based solution can have an organization operational in a little over two months.
This allows companies to focus on business logic rather than on the “how-to”.
Finally, “Cognitive” is probably the most important and trendy topic. Today’s storage needs to be “Cognitive”, hence:
- Reacting to major IT issues is not enough.
- Keeping mission-critical applications available constantly requires the ability to anticipate potential problems and resolve them quickly before major issues occur.
- Self optimizing based on workload awareness, performance analysis and data content consciousness.
Cognitive capabilities in Hybrid Cloud environments go in this direction.
The idea is based on a metric called data value, which is analogous to determining the value of a piece of art: the higher the demand and the rarer the piece, the higher its value, typically requiring tighter security.
For example, if 1,000 employees are accessing the same files every day, the value of that data set should be very high, just like a priceless Van Gogh. A cognitive storage system would learn this and store those files on fast media like flash. In addition, the system would automatically back up these files multiple times. Lastly, the files may need extra security so they cannot be accessed without authorization.
Of course, there is also the opposite. A data set, which is rarely accessed, like PDF files of 20-year-old tax documents, should be stored on cold media like tape and only available upon request. A cognitive storage system would also know that tax records need to be kept for at least 7 years and that they can be deleted after that period.
In many situations, data value can also change over time and a cognitive storage system can also adapt.
One way to determine its value is to track the access patterns of the data or the frequency it is used. Individuals can also add metadata tags to the data to help train the system, depending on the context in which the data is used. For example, an astronomer may tag a data set coming from the Andromeda galaxy as highly important or less important.
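The access-frequency idea can be sketched as a toy model. The scoring weights, tier thresholds, and file paths below are assumptions made for illustration, not details of any actual IBM system:

```python
# Toy sketch of a "data value" score driven by access frequency and
# user-supplied metadata tags. All weights, thresholds, and paths
# are invented for illustration only.

from collections import Counter

access_log = Counter()   # file path -> observed access count
importance_tags = {}     # file path -> user-assigned tag weight

def record_access(path):
    access_log[path] += 1

def data_value(path):
    # Frequently accessed or explicitly tagged data is more valuable.
    return access_log[path] + 10 * importance_tags.get(path, 0)

def placement(path):
    # Map the value score to a storage tier.
    value = data_value(path)
    if value >= 100:
        return "flash"   # hot: fast media, extra replicas, tight ACLs
    elif value >= 10:
        return "disk"
    return "tape"        # cold: archive media, retrieved on request

for _ in range(1000):
    record_access("/hr/payroll.db")
record_access("/tax/2003/return.pdf")
print(placement("/hr/payroll.db"))        # -> flash
print(placement("/tax/2003/return.pdf"))  # -> tape
```

Because the score is recomputed from the live access log, a file's placement naturally drifts between tiers as its value changes over time, which is the adaptivity described above.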
As detailed in the paper published in the IEEE journal Computer, IBM scientists have tested cognitive storage using 1.77 million files across seven users. They used a simple ranking into classes 1, 2, and 3 based on metadata including user ID, group ID, file size, file permissions, date and time of creation, file extension, and directories in the path. They then split the server data into per-user data, as each user could define different classes of files they deem important.
The result: a data-value prediction accuracy of nearly 100% for the smaller class set.
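In the same spirit, a toy per-user classifier over file metadata might look like the following. The features, training samples, and the simple majority-vote scheme are invented simplifications; the actual study used far richer metadata, 1.77 million files, and proper learning models:

```python
# Toy per-user classifier over file metadata, loosely in the spirit
# of the experiment described above. Training data and features are
# invented for illustration only.

from collections import Counter, defaultdict

def train(samples):
    """samples: list of (metadata dict, class label). Learns, for
    each observed feature value, the classes seen with it."""
    votes = defaultdict(Counter)
    for meta, label in samples:
        for feature, value in meta.items():
            votes[(feature, value)][label] += 1
    return votes

def predict(votes, meta):
    # Each known feature value votes for its most common class.
    tally = Counter()
    for feature, value in meta.items():
        if (feature, value) in votes:
            tally[votes[(feature, value)].most_common(1)[0][0]] += 1
    # Unseen metadata defaults to the least important class.
    return tally.most_common(1)[0][0] if tally else 3

# One astronomer's (invented) training data: extension + top directory.
training = [
    ({"ext": "fits", "dir": "andromeda"}, 1),
    ({"ext": "fits", "dir": "andromeda"}, 1),
    ({"ext": "fits", "dir": "calibration"}, 2),
    ({"ext": "log",  "dir": "tmp"}, 3),
]
model = train(training)
print(predict(model, {"ext": "fits", "dir": "andromeda"}))  # -> 1
```

Training one such model per user mirrors the per-user split in the study: the same extension can signal class 1 for one user and class 3 for another, so the class definitions stay personal.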
This is the end of Episode #2. The next episode will come shortly.
Thank you for reading…Stay tuned!