10/01/2017

Storage ready for the post-cloud era


A few weeks ago I watched this presentation from Peter Levine (a partner at the Andreessen Horowitz VC firm). He anticipates the next wave of computing "after the cloud", when we’ll be moving towards a decentralized world with powerful networks of smart devices that can decentralize data collection, compute and storage.

These devices are smart, connected to each other and equipped with some compute power, so they can act as a micro-datacenter… in your car, in your house, office or wherever.

Your IoT-based datacenter

No matter how your car or house is connected to the internet, these kinds of systems will need to process more and more information locally. The amount of data generated will be astounding. To give an example: a connected car will generate 25GB of data per hour, and it’s easy to envision the sort of problems we are likely to have in the near future if all of this data has to be uploaded to the cloud (and processed) immediately while millions of these cars are on the road. And what happens if you lose that connection? Even just for a few minutes?
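To put the 25GB per hour figure in perspective, here is a quick back-of-the-envelope sketch. The fleet size of one million cars is an assumption made only for illustration ("millions" is all we really know), and decimal units (1GB = 10^9 bytes) are assumed as well.

```python
GB = 10**9  # decimal gigabyte in bytes (assumption)


def sustained_mbit_per_s(gb_per_hour: float) -> float:
    """Average uplink rate needed to upload data as fast as it is produced."""
    return gb_per_hour * GB * 8 / 3600 / 10**6


def fleet_pb_per_hour(cars: int, gb_per_hour: float) -> float:
    """Total data produced by the whole fleet, in petabytes per hour."""
    return cars * gb_per_hour * GB / 10**15


per_car = sustained_mbit_per_s(25)        # figure quoted in the text
fleet = fleet_pb_per_hour(1_000_000, 25)  # hypothetical fleet of one million cars

print(f"per car: {per_car:.1f} Mbit/s sustained")
print(f"fleet:   {fleet:.0f} PB per hour")
```

Under these assumptions each car would need a sustained uplink of roughly 56 Mbit/s, and the fleet as a whole would produce about 25PB every hour.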

Some data will be needed for real-time operations, other data for diagnostics and analytics. Everything will be stored for a certain period of time, and some of it for longer… But you don't need (or want) to send everything immediately to the cloud: that would be unnecessary, inefficient and risky.

Your IoT-based micro-datacenter will have enough power to do what is necessary to reduce the data footprint (by eliminating redundancies, normalizing data, discarding unnecessary information and so on) and to upload only what is needed, asynchronously, when a connection is available.
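The reduce-then-upload idea can be sketched in a few lines. This is only an illustration under stated assumptions: `EdgeBuffer`, its field names and the `upload` callback are hypothetical and do not correspond to any existing product API.

```python
import hashlib
import json
from collections import deque


class EdgeBuffer:
    """Hypothetical local buffer: normalize, filter, dedupe, upload later."""

    def __init__(self, keep_fields):
        self.keep_fields = set(keep_fields)
        self.seen = set()       # digests of readings already queued
        self.pending = deque()  # readings waiting for a connection

    def add(self, reading: dict) -> bool:
        """Queue a reading; return False if it duplicates an earlier one."""
        # Normalize key case and discard fields that are not needed upstream.
        slim = {k.lower(): v for k, v in reading.items()
                if k.lower() in self.keep_fields}
        digest = hashlib.sha256(
            json.dumps(slim, sort_keys=True).encode()).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        self.pending.append(slim)
        return True

    def flush(self, upload, online: bool) -> int:
        """Send queued readings if a connection is available."""
        if not online:
            return 0
        sent = 0
        while self.pending:
            upload(self.pending[0])  # the reading stays queued if upload raises
            self.pending.popleft()
            sent += 1
        return sent


buf = EdgeBuffer(["sensor", "value"])
first = buf.add({"Sensor": "temp", "Value": 21, "debug": "x"})
second = buf.add({"sensor": "temp", "value": 21})  # same content, dropped
offline_sent = buf.flush(print, online=False)      # no connection, nothing sent
uploaded = []
online_sent = buf.flush(uploaded.append, online=True)
```

A real device would also have to bound the size of `seen` and persist `pending` across reboots; both are left out to keep the sketch short.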

From the cloud to IoT

Last month, at AWS re:Invent, Amazon announced an interesting toolset: AWS Greengrass. The idea is to bring some of the programming constructs you already use on its cloud to IoT devices… including AWS Lambda and the ability to store data locally.

At the same time, IoT devices are obviously becoming ever more sophisticated and powerful, generating larger amounts of data with each new generation, but it’s becoming evident that they lack proper storage subsystems. What’s more, data is not shared locally but only through an upload operation to the cloud which, again, poses many challenges in terms of accessibility, security and reliability of the whole infrastructure.

A new way of thinking about object storage

It's not about the media: flash memory exists at any price and in any size. It’s more about the absence of proper data protection, resiliency and availability and, above all, the fact that storage is not shared. We are just at the beginning of the IoT era and in the vast majority of cases, especially in the consumer space, these limitations haven’t had an impact yet. But if you look to the future and to complex industrial systems, the absence of an adequate distributed storage layer will become a major limitation, reducing the efficiency and capabilities of the whole system. At the end of the day, is it possible to think about a datacenter without a storage infrastructure…? This is why your IoT-based micro-datacenter needs one.

Shared storage for your IoT network

Shared storage as we know it doesn't work for IoT devices. NAS and SAN are just too complex and, even though most IoT devices are based on Linux, far too many additional components (drivers, file systems, etc.) would be needed to make them work, and security could become an issue. Object storage is the way to go: it can be accessed directly via native APIs or HTTP, which makes it easier to use from the application.

IoT storage must be distributed. You can't think in terms of a single storage device; on the contrary, a multitude of devices, each with a small amount of storage, can easily be part of a large distributed storage system. Think about 1,000 Raspberry Pis, for example, each one of them with 300GB available. That would be 300TB of raw capacity (100TB usable with three-way replication)!
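The capacity figures above are easy to check. The small sketch below assumes decimal units (1TB = 1,000GB) and ignores metadata and filesystem overhead; the function name is made up for this example.

```python
def cluster_capacity_tb(nodes: int, gb_per_node: float, replicas: int):
    """Return (raw, usable) capacity in TB for a fully replicated cluster."""
    raw_tb = nodes * gb_per_node / 1000  # 1TB = 1,000GB (assumption)
    usable_tb = raw_tb / replicas        # every object is stored `replicas` times
    return raw_tb, usable_tb


raw, usable = cluster_capacity_tb(nodes=1000, gb_per_node=300, replicas=3)
print(f"raw: {raw:.0f}TB, usable with 3 copies: {usable:.0f}TB")
```

Changing `replicas` shows how quickly the protection level eats into the usable space of such small nodes.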

It’s a compelling idea, but this approach has its challenges. Thousands of nodes for just hundreds of TB of storage? It means massive scalability, a lot of rebalancing whenever a node disappears, and complex node discovery and management that could impact performance. These are all problems that could make the system unusable pretty quickly. And while this type of issue is most challenging in this particular (and futuristic) scenario, small ARM-based servers are also attracting growing interest from hyper-scalers and large organizations, driving up the number of servers and posing identical challenges for storage infrastructures.

Challenging, but not for OpenIO

Last month we launched our first hardware appliance. We call it ServerLess Storage (SLS-4U96). But what does a 96-disk appliance have to do with the internet of things and the future of IT? Much more than you might think!

This appliance is unique. Its architecture is based on nano-nodes: small Linux-powered devices with a two-core ARM CPU, RAM, flash memory, a high-capacity 7.2K RPM SAS disk (or an SSD) and two high-speed Ethernet ports. Nano-nodes are all connected to two 40Gb/s switches, and the 4U chassis has N+1 cooling units and power supplies. It's a remarkable design that reduces the failure domain to a single nano-node (one disk).

This appliance also has a great $/GB, as well as good performance thanks to the high parallelism and bandwidth available. Yes, it could be a good storage system for many use cases, including IoT, but this doesn't tell the whole story… our software, SDS, does.

SDS is a scale-out object storage platform that runs on nano-nodes and… does all the magic. It doesn't require a lot of resources (a Raspberry Pi can easily run SDS, and our nano-nodes have even fewer CPU cores!). In fact, it was designed from day one to be lightweight, to fit the kind of resources you would find in an IoT device… does it ring a bell? What’s more, SDS doesn't work like other object storage systems. It isn't based on distributed hash tables, and load balancing is dynamic, designed around what we call Conscience technology: a set of advanced algorithms that measure cluster resources and make them available in real time. Adding and removing nodes happens very quickly, and failures are managed in a matter of seconds.
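To illustrate the general idea of placing data according to measured resources rather than a fixed hash, here is a toy sketch. It is not the Conscience algorithm: the `NodeStats` fields, the scoring formula and the 300GB node capacity are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical metrics a node reports periodically."""
    name: str
    free_gb: float
    cpu_idle: float  # fraction between 0.0 and 1.0
    alive: bool = True


def score(node: NodeStats, capacity_gb: float = 300.0) -> float:
    """Toy score from 0 to 100: more free space and idle CPU rank higher."""
    if not node.alive:
        return 0.0
    return 100 * min(node.free_gb / capacity_gb, 1.0) * node.cpu_idle


def pick_targets(nodes, replicas: int = 3):
    """Choose the best-scored live nodes for the copies of a new object."""
    ranked = sorted((n for n in nodes if score(n) > 0), key=score, reverse=True)
    if len(ranked) < replicas:
        raise RuntimeError("not enough healthy nodes")
    return [n.name for n in ranked[:replicas]]


nodes = [
    NodeStats("a", free_gb=250, cpu_idle=0.90),
    NodeStats("b", free_gb=100, cpu_idle=0.50),
    NodeStats("c", free_gb=290, cpu_idle=0.80),
    NodeStats("d", free_gb=200, cpu_idle=0.95),
]
before = pick_targets(nodes)  # c, a and d have the highest scores
nodes[2].alive = False        # node "c" disappears
after = pick_targets(nodes)   # new writes simply go to a, d and b
```

In this sketch a failed node only changes where new objects land; nothing has to be recomputed for the rest of the cluster, which is the property that a placement scheme based on live measurements is after.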

Conscience technology enables another feature: Grid for Apps. By knowing how many resources are available and where, SDS can run applications directly inside the storage system, triggered by events and without any additional orchestration tool. This is a compelling characteristic of our system: it keeps compute and storage close to each other, so applications run where the data resides! For example, think about running image recognition software on one of the devices on the network that has an ARM GPU available every time a new image is taken, adding metadata to the object and then making it accessible to other applications in the micro-datacenter.
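The image recognition example follows a simple event-driven pattern, sketched below. Everything here is hypothetical (the `EventBus` class, the `object.created` event name and the stand-in classifier) and it does not represent the Grid for Apps API.

```python
from collections import defaultdict


class EventBus:
    """Minimal hypothetical dispatcher: handlers run when an event is emitted."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def emit(self, event_type, obj):
        for handler in self.handlers[event_type]:
            handler(obj)


def fake_classifier(data: bytes) -> str:
    """Stand-in for real image recognition: labels by payload size only."""
    return "large-image" if len(data) > 4 else "thumbnail"


def tag_image(obj: dict) -> None:
    """Handler: enrich the stored object with metadata in place."""
    obj["metadata"]["label"] = fake_classifier(obj["data"])


bus = EventBus()
bus.on("object.created", tag_image)

photo = {"name": "cam1/0001.jpg", "data": b"\xff\xd8\xff\xe0\x00", "metadata": {}}
bus.emit("object.created", photo)  # storing the image triggers the handler
print(photo["metadata"])
```

The point of the pattern is that the application never polls for new images: the arrival of the object is what triggers the processing, and the result is written back as metadata next to the data.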

Bottom line

With SLS we’ve already demonstrated that we can run 96 nano-nodes in 4 RU (or 960 in 40 rack units!). It works very smoothly (and at a highly competitive price BTW!).

Thanks to Conscience technology, OpenIO SDS, with its lightweight design and other unique characteristics, can be installed on any type of hardware infrastructure, including containers, the smallest of devices or larger x86 servers, and create a resilient storage layer for a large number of use cases… and with Grid for Apps we can leverage unused networked resources to run applications where needed, where data resides!

This idea of the micro-datacenter based on IoT devices is really intriguing, but we aren’t quite there yet. It is exciting, though, that we have a technology that is ready for what could be the next wave of compute (if it ever happens) and that can already give our customers, today, the greatest freedom of choice.
