07/02/2018

The Real Power of Computational Storage

Today’s CPUs are extremely fast, and one of the problems we have faced in recent years is feeding them properly to minimize wait time, get results faster, and keep overall system efficiency at an optimum level. If nothing else, the positive aspect of the Spectre and Meltdown bugs was to show how modern CPUs work, their level of optimization, and how data hungry they are.

Traditional ways to improve data flow and ensure its consistency to and from the CPU have reached some limits, and it is interesting to note that there are more and more initiatives to make storage platforms capable of running some compute tasks.

Bring Data Closer to the CPU

For many years, because of the technology available at the time, the storage industry worked to speed up storage access through caching and automatic tiering. This was a compromise between cost and overall efficiency, but latency was measured in milliseconds and throughput was limited, with CPU, RAM, and the network craving more.

Flash memory changed the game. Latency, throughput, and IOPS are less of a problem now, and data is much closer to the CPU than in the past. For example, NVMe removed the complexity and limitations of legacy interfaces, and storage devices can now be attached directly to a server's PCIe bus.

Lately, with the introduction of new classes of persistent memory devices, latency is reduced even more, and CPUs are finally getting what they ask for. Problem solved, right? Not exactly.

The problem is that we are creating much more data than in the past, and we are moving it around, back and forth, from large, shared devices, creating new bottlenecks (in the network, for example). All sorts of issues occur, especially when things are not going as expected, such as when hardware fails. And scale-out architectures are becoming more common, increasing latency and making data movements much more complex and expensive.

Bring CPU Next to Data

The concept of offloading some CPU tasks to the storage infrastructure is not entirely new. This has been the case since the rise of RAID controllers and the data services they provide. The most vivid example comes from volume cloning, or similar features designed to eliminate traffic between the server and the storage array by making copies of data directly within the array itself. 


Now, thanks to the power of modern CPUs, their efficiency, and smaller SoC (system on chip) designs, it is possible to do more and bring the CPU closer to data instead of the opposite. By doing so, each device can perform some computational tasks that were, in the past, carried out on the host CPU. Data is not moved anymore, which saves bandwidth, improves parallelism, and enhances overall system efficiency.

Yes, you can't compare the latest Intel x86 CPU with a small 2-core ARM device, but in the same rack footprint, you can have dozens of these smaller CPUs. With an adequate amount of RAM and connectivity, they can perform a lot of simple tasks. Those operations, executed by storage devices, make it possible to keep data local, minimizing latency, while improving overall parallelization and system efficiency. At the same time, the failure domain is smaller, and this improves the scalability of the entire infrastructure.
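To make the bandwidth argument concrete, here is a minimal sketch in Python. It is a toy model, not OpenIO code: the StorageNode class, its read_all and run_local methods, and the log-like records are all hypothetical names invented for the illustration. It compares how many bytes leave the storage nodes when the host does the filtering with how many leave when each node filters its own shard.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical model: each node is a storage device with a small CPU that
# holds its own shard of records. None of this is a real storage API.
class StorageNode:
    def __init__(self, name, records):
        self.name = name
        self.records = records  # list of bytes objects stored on this node

    def read_all(self):
        """Host-side processing: every record crosses the network."""
        return list(self.records)

    def run_local(self, predicate):
        """Device-side processing: only matching records leave the node."""
        return [r for r in self.records if predicate(r)]


def bytes_moved(records):
    """Total payload size of the records that left the storage nodes."""
    return sum(len(r) for r in records)


def is_error(record):
    return record.startswith(b"ERROR")


# Four small nodes, each holding two filler records and one error record.
nodes = [
    StorageNode(
        f"node-{i}",
        [b"INFO  ok" * 4, b"ERROR disk %d" % i, b"INFO  ok" * 4],
    )
    for i in range(4)
]

# Traditional path: pull everything to the host, then filter there.
pulled = [r for node in nodes for r in node.read_all()]
host_result = [r for r in pulled if is_error(r)]

# Computational storage path: each node filters its own shard in parallel.
with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
    per_node = list(pool.map(lambda node: node.run_local(is_error), nodes))
device_result = [r for shard in per_node for r in shard]

moved_host = bytes_moved(pulled)
moved_device = bytes_moved(device_result)
print(f"host-side filter moved {moved_host} bytes")
print(f"device-side filter moved {moved_device} bytes")
```

Both paths return the same records, but in the second one only the matches travel over the network, and the work is spread across as many small CPUs as there are nodes. The thread pool only stands in for that parallelism; in a real deployment each filter would run on the device itself.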

We at OpenIO have been working a lot on computational storage; nano-nodes are just an example. And now our research is going even further (but I can't talk about it here, sorry). As a company focused on open source software that runs on commodity hardware, our primary goal remains putting as much attention as we can on this storage+CPU model, and working with hardware vendors to make it happen.

In recent months, we began a collaboration with hardware startups, and the fruits of this work will be available soon. Most of the work is still focused on hybrid nano-nodes, but it is interesting to note that flash density and speed now require more CPU to achieve a balanced architecture. The same problems we saw in the past will occur again. Some startups are already designing computational storage devices (here and here, for example) aimed at accelerating specific workloads, but this is only the beginning.

Serverless is Key to Computational Storage

In order to make computational storage available to everyone, we have to make it simple. And the simplest programming model available today is the latest: serverless computing.

Functions are small bits of code designed to run for a very short time, often triggered by events. They are stateless and they don't know anything about the underlying hardware. Object storage, for its part, is the perfect fit for serverless computing, and we shouldn't underestimate the effort of hard disk and flash manufacturers, which are increasingly bringing to market devices that work as key-value stores accessible through APIs (KV stores and object stores are not that different). If you start to connect the dots, it becomes easier to understand the power of computational storage.

The process is very simple. Every new bit of data ingested (or read, or deleted) by these devices can create events that can trigger functions. Functions live temporarily in an ephemeral container spun up on the device when needed, and perform a simple task such as checking the validity of data, filtering unnecessary information, scanning for specific patterns, or even more complex operations like image recognition, image/video resampling, and so on. The range of applications is huge, and this could solve many scalability issues by hiding the complexity of the infrastructure.

The developer just needs to write the few lines of code that will be associated with a particular file type, event, or operation, and the storage infrastructure will deploy this code when and where it is needed to do the job.
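As a rough illustration of this model, the following Python sketch shows how a few lines of developer code could be bound to an event and a content type and then triggered by the storage layer. Everything in it is an assumption made for the example: the Event, FunctionRegistry, and Device classes and the validate_json function are hypothetical and do not represent the API of any real product. The ephemeral container is also left out; the function is simply called in-process.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # "put", "get" or "delete"
    key: str           # object name
    content_type: str  # e.g. "application/json"
    data: bytes        # object payload


class FunctionRegistry:
    """Maps (event kind, content type) to the functions to run."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, kind, content_type):
        # Decorator used by the developer to attach a function to an event.
        def decorator(func):
            self._handlers[(kind, content_type)].append(func)
            return func
        return decorator

    def dispatch(self, event):
        handlers = self._handlers.get((event.kind, event.content_type), [])
        return [handler(event) for handler in handlers]


registry = FunctionRegistry()


# The "few lines of code" a developer would write: check that every
# JSON object written to the device actually parses.
@registry.on("put", "application/json")
def validate_json(event):
    try:
        json.loads(event.data)
        return (event.key, "valid")
    except ValueError:
        return (event.key, "invalid")


class Device:
    """Hypothetical storage device that fires events on every write."""

    def __init__(self, registry):
        self.objects = {}
        self.registry = registry

    def put(self, key, data, content_type):
        self.objects[key] = data
        event = Event("put", key, content_type, data)
        return self.registry.dispatch(event)


device = Device(registry)
print(device.put("good.json", b'{"sensor": 1}', "application/json"))
print(device.put("bad.json", b"{not json", "application/json"))
print(device.put("notes.txt", b"hello", "text/plain"))
```

The function is stateless and knows nothing about the hardware: it only sees the event. Writes with a content type that has no registered function trigger nothing, which is why the last call returns an empty list.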

You can see now why bringing the CPU closer to data is quite different than bringing data closer to the CPU!

This Isn’t the Future, It Is Happening Now!

These devices are already available, and the software that makes the magic is open source.

I'm strongly convinced that the combination of object storage and serverless computing is the answer to the problems of scalability, optimization, and efficiency. With the former bringing a reliable and distributed storage layer, and the latter performing operations on data, locally, in a lightweight and hyperscalable manner, this is beyond hyperconvergence!

I'm a fan of the model I described above, but it only applies to unstructured data. It is interesting that some startups are taking similar approaches, working at a smaller scale on primary storage devices with embedded computing capabilities.

Closing the Circle

Computational storage is not for everybody yet, and its benefits, for now, are visible only in large-scale infrastructures. But the number of applications is growing rapidly. Industrial IoT and edge computing, for example, need infrastructures that are simple and robust, yet with persistent, reliable storage for sensor data, and with CPUs to validate data, optimize it, or make decisions on it while deciding what should be sent to the cloud for long-term storage, big data analytics, or other operations.
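The edge scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the process_batch function, its thresholds, and the shape of the result are hypothetical, and the readings are made-up temperature values. The idea is that the node validates a batch of sensor data, keeps the valid readings locally, and forwards only a compact summary plus any alerts to the cloud.

```python
from statistics import mean


def process_batch(readings, low=-40.0, high=85.0, alert_above=70.0):
    """Validate readings on the edge node and decide what goes upstream.

    The limits are illustrative assumptions: readings outside
    [low, high] are treated as sensor faults and dropped.
    """
    valid = [r for r in readings if low <= r <= high]
    summary = {
        "count": len(valid),
        "rejected": len(readings) - len(valid),
        "mean": mean(valid) if valid else None,
        "max": max(valid) if valid else None,
    }
    alerts = [r for r in valid if r > alert_above]
    return {
        # Full-resolution data stays on the local, persistent storage.
        "store_locally": valid,
        # Only the reduced view is sent to the cloud.
        "send_to_cloud": {"summary": summary, "alerts": alerts},
    }


result = process_batch([20.0, 22.0, 999.0, 75.0, 21.0])
print(result["send_to_cloud"])
```

The raw batch never leaves the device; what travels to the cloud is a handful of numbers, which is the kind of local decision described above.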

In the near future, with the shrinking price and growing density of flash memory, I'm sure we will see more and more computational storage devices and serverless frameworks: nano-nodes, micro-servers, and specialized PCI cards are just the beginning.

Ready to know more about next-generation object storage and serverless computing? Click here to get in touch!


Written by Enrico Signoretti