This week I came across three interesting IoT projects, and, even though they are totally different from each other, they present the same challenge: data storage. This isn’t news, but things are a bit more complicated than they appear.
Edge computing comes from the necessity to have some compute operations take place close to where data is created. Not everything can be sent to the cloud to be processed, and sometimes you need immediate results. This creates new challenges, especially on the storage layer.
Data Created at the Edge
There are at least three types of data that can be created by a remote IoT device.
- Instant data: Sensors, cameras, and other sources constantly create data that is necessary for making decisions immediately. Think about an autonomous car, for example; it continuously scans the environment around it and makes decisions according to speed, road signals, other vehicles and people nearby, and more. Most of the data created has to be computed instantly, and the result is used as input during the entire driving process. You don't need to send any of this data to the cloud, but you need to store some of it for a short time.
- Short-term data: Some of the data collected by sensors needs to be saved and computed when needed. It might be necessary to sync it to the cloud, but not in real time. Back to the car example, think about a dash cam: you store all the video coming from your ride locally and it can be automatically uploaded to the cloud via Wi-Fi when you get home, and stored there for a short period of time. You may need to check it later for legal reasons, such as if you got a speeding ticket, or if you witnessed a crash. This type of data loses its value over time, and occupies more space than instant data. It is also highly likely that you won't access it frequently; you may, in fact, never need to access it.
- Long-term data: Some of the data created will last for a very long time, or even forever. Any piece of information that may be needed for further analysis, after real-time or short-term diagnostics, or for historical reasons, can be sent to the cloud to feed big data analytics systems that help the car manufacturer develop future car models, for example.
This means that you need an integrated storage infrastructure that spans from the remote device at the edge to the cloud.
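To make the three data classes above more concrete, here is a minimal sketch of how a device might attach a retention policy to each one. The class names, fields, and durations are hypothetical values chosen for illustration; they do not come from any of the projects mentioned.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataClass(Enum):
    INSTANT = "instant"
    SHORT_TERM = "short_term"
    LONG_TERM = "long_term"


@dataclass(frozen=True)
class RetentionPolicy:
    keep_local_seconds: int               # how long the device keeps a copy
    sync_to_cloud: bool                   # whether the data is ever uploaded
    keep_in_cloud_seconds: Optional[int]  # None means "keep indefinitely"


# Hypothetical durations, for illustration only.
POLICIES = {
    # Used immediately, kept briefly on the device, never uploaded.
    DataClass.INSTANT: RetentionPolicy(60, False, 0),
    # Dash-cam style data: kept locally, uploaded later, expires in the cloud.
    DataClass.SHORT_TERM: RetentionPolicy(7 * 24 * 3600, True, 30 * 24 * 3600),
    # Analytics data: uploaded and kept in the cloud with no expiry.
    DataClass.LONG_TERM: RetentionPolicy(24 * 3600, True, None),
}


def should_evict_locally(data_class: DataClass, age_seconds: int) -> bool:
    """Return True when a local copy is older than its class allows."""
    return age_seconds > POLICIES[data_class].keep_local_seconds
```

With a table like this, the storage layer rather than the application can decide when a local copy may be dropped and whether an upload is needed at all.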
Cloud storage for IoT devices is practical because the device itself becomes stateless, and therefore expendable. If you live in a perfect world with inexpensive connectivity, unlimited bandwidth, and very low latency, connecting your devices directly to the cloud compute and storage infrastructure is the way to go. But reality is different. In projects with devices dispersed all over the world, in underdeveloped areas, or in any other scenario where you can't afford a service interruption, the only choice is to store data locally.
With the exception of bare connected sensors, most IoT devices run on a standard operating system; in many cases this is an optimized version of a common OS (like Linux or Windows) tweaked to deal with a limited amount of resources. Unfortunately, saving data locally on the file system is not a viable solution:
- The device has no redundancies,
- It doesn't have embedded data replication, or some sort of sync-with-the-cloud mechanism,
- Applications should be independent from the device and the OS,
- Security should be implemented at a lower level, not passed off to the application.
The solution could come from an abstraction layer that implements all the storage functions described above, simplifying application logic and overall infrastructure. Object storage could be the key: it can be accessed via APIs, it is easy to use, needs few resources, and can store unstructured and semi-structured data easily. The same functionalities available on large data center clusters can be implemented in a remote IoT network. This means that data can be automatically replicated between remote devices that are close to each other, which addresses the lack of redundancy on any single device.
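As a rough illustration of replication between nearby devices, the sketch below models each device as a tiny in-memory object store that copies every write to its peers. `EdgeObjectStore` and its methods are hypothetical names invented for this example; a real object store would add persistence, networking, authentication, and failure handling.

```python
import hashlib
from typing import Dict, List, Optional


class EdgeObjectStore:
    """Toy in-memory object store for one edge device (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}
        self._peers: List["EdgeObjectStore"] = []

    def add_peer(self, peer: "EdgeObjectStore") -> None:
        """Register a nearby device that should receive copies of writes."""
        self._peers.append(peer)

    def put(self, key: str, data: bytes, replicate: bool = True) -> str:
        """Store an object, copy it to peers, and return its SHA-256 digest."""
        self._objects[key] = data
        if replicate:
            for peer in self._peers:
                # replicate=False stops peers from forwarding the copy again.
                peer.put(key, data, replicate=False)
        return hashlib.sha256(data).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        """Return the object, or None if this device does not hold it."""
        return self._objects.get(key)
```

If one device is lost, an application can still read the object from a peer through the same `get` call, which is the point of pushing redundancy below the application.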
A similar object storage architecture, implemented at the core, could simplify data movements. Remote devices could sync to a central repository using a common protocol, and all these tasks could be automated and delegated to the infrastructure, allowing developers to focus on their edge or cloud applications.
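One way to picture the sync to a central repository is a queue of pending objects that is drained whenever connectivity allows. This is only a sketch: `SyncQueue` is a hypothetical helper, and the `upload` callable stands in for whatever protocol the central object store actually exposes.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class SyncQueue:
    """Holds objects on the device until they can be uploaded (illustrative)."""

    def __init__(self, upload: Callable[[str, bytes], bool]) -> None:
        # `upload` returns True on success and False when the link is down.
        self._upload = upload
        self._pending: Deque[Tuple[str, bytes]] = deque()

    def enqueue(self, key: str, data: bytes) -> None:
        self._pending.append((key, data))

    def drain(self) -> int:
        """Upload pending objects in order; stop at the first failure."""
        sent = 0
        while self._pending:
            key, data = self._pending[0]
            if not self._upload(key, data):
                break  # keep the object queued and retry on the next drain
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Because an object is removed from the queue only after a successful upload, a dropped connection delays the sync but does not lose data.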
The Role of Serverless in IoT
Another key component for this type of infrastructure could be a serverless computing framework. By delegating most of the data management to the object store, it becomes possible to catch events when data is ingested, updated, or deleted, and to trigger specific functions that raise alarms, validate and optimize data, or even start a sync process when necessary.
Functions are abstracted from the hardware or OS, and are stateless, making them the perfect companion for disposable, insecure IoT devices, while improving the development process of applications and minimizing the impact of upgrading devices over time.
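The event-driven pattern described in this section can be sketched as a store that emits an event on every ingest, update, or delete and calls any functions registered for it. `EventedStore`, the event names, and `make_alarm_handler` are all hypothetical and do not reflect the API of any particular serverless framework.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str, bytes], None]


class EventedStore:
    """Toy object store that triggers functions on data events."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a stateless function for 'ingest', 'update' or 'delete'."""
        self._handlers[event].append(handler)

    def _emit(self, event: str, key: str, data: bytes) -> None:
        for handler in self._handlers[event]:
            handler(key, data)

    def put(self, key: str, data: bytes) -> None:
        event = "update" if key in self._objects else "ingest"
        self._objects[key] = data
        self._emit(event, key, data)

    def delete(self, key: str) -> None:
        data = self._objects.pop(key)  # raises KeyError if the key is unknown
        self._emit("delete", key, data)


def make_alarm_handler(threshold: float, alarms: List[str]) -> Handler:
    """Build a function that records keys whose numeric payload is too high."""
    def handler(key: str, data: bytes) -> None:
        if float(data.decode()) > threshold:
            alarms.append(key)
    return handler
```

The handler keeps no state of its own between calls; everything it needs arrives with the event, which is what lets the same function run on any device or in the cloud.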
IoT and, more generally, edge computing are introducing several new challenges.
The cloud is good when data is concentrated in a single place and accessed by many applications and devices concurrently. But if you are dealing with a relatively small amount of data in a remote location, it is highly probable that you'll need local compute and storage resources. It is also clear that cloud and edge can't live without each other anymore.
A strong cloud-edge integration is necessary to make things work properly without increasing complexity to an unsustainable level. We are at the beginning of a new era, and we will see an exponential increase in enriched data coming from the edge that will feed applications and services in the cloud. Think, for example, of the next generation of mobile processors: they will have embedded AI functionalities (like image recognition) to depend less on cloud services, but at the same time they'll be able to send more valuable, optimized information back to the cloud, because they did part of the work locally.
We, at OpenIO, have the technology to make this happen. Our lightweight open source object store (OpenIO SDS), associated with the Grid for Apps, our serverless computing framework, can be installed on devices as small as a Raspberry Pi Zero (1 ARM core and 512MB RAM) at the edge, but, at the same time, has already proven itself, scaling to dozens of petabytes in the cloud. Conscience technology and other core design elements allow customers to mix servers and devices of different generations and take advantage of all their resources, while our R&D effort with nano-nodes and computational storage completes our vision by making this kind of device commoditized and available for both datacenter and edge use cases.