Cloud-Based Time Series Data Repositories: Microsoft Azure Time Series Insights
Producer companies in upstream oil & gas and in power & utilities are increasingly looking to run analytics on asset data in the cloud. This trend began with coarse, low-frequency data sets, including laboratory sample results (LIMS data), asset reliability information and downtime tracking / lost opportunity data, traditionally drawn from back-end relational databases.
iSolutions has been following a more recent development in which companies are beginning to host high-resolution time series data in the cloud. This trend began with the emergence of IOT devices and sensors that are not directly connected to control systems and that use the cloud to historize simple data streams (air and water quality sensors, for instance). More recently, a number of customers have asked iSolutions to look at supplementing, and ultimately transitioning, traditional site data historian use cases to the cloud. Drivers of this trend include cost considerations, a desire to access cloud-based analytics capabilities (e.g. streaming queries and notifications, machine learning, cloud-based business intelligence) and, in some cases, performance limitations of existing on-premises data historians.
Until very recently, the primary impediment to unlocking analytics and visualization use cases has been the cleansing, standardization and online storage of time series data streams in the cloud in a manner efficient enough to support queries over a broad base of assets and time frames. We at iSolutions have had some success in tailoring offerings such as HDFS and InfluxDB for this purpose, but had yet to encounter an off-the-shelf solution supported by a large vendor. The introduction of Microsoft Azure Time Series Insights marks an attempt to mitigate these traditional performance issues and gives companies with a large number of assets an opportunity to manage their data efficiently in the cloud.
What is ‘Time Series Insights’?
Time Series Insights is a cloud-based, native storage repository for (analog) time series data, hosted within Microsoft’s Azure offering. The repository serves as a publication end-point for IOT devices that need to stream data to the cloud, using Azure Event Hubs as the intermediary data aggregator. Microsoft has tailored the offering from the ground up to support large numbers of connected devices streaming millions of data samples per day. Received information is catalogued and persisted with enough metadata to identify source devices and sensor parameters, so that analytics can be performed on the aggregated sensor streams.
A Time Series Insights repository is fed by an Azure Event Hub and supports a handful of data transfer protocols (such as MQTT) out of the box. Data publisher protocols can be extended to OPC UA data sources by using Microsoft’s Azure IOT Suite Connected Factory offering. Beyond this, custom data publisher adapters would be required to access non-OPC data servers (Energy Management Systems, for instance).
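To make the idea of "enough metadata to identify source devices and sensor parameters" concrete, the following is a minimal sketch of how a publisher adapter might shape a sensor reading before handing it to an event hub client. The field names (deviceId, parameter, value, unit, timestamp), the sample device and the helper functions are our own illustrative assumptions, not a schema prescribed by Microsoft, and the actual send call is deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_sensor_event(device_id, parameter, value, unit, timestamp=None):
    """Build one sensor reading with the metadata needed to trace its source.

    All field names here are hypothetical; use whatever your event source
    and repository are configured to expect.
    """
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    return {
        "deviceId": device_id,      # which device produced the reading
        "parameter": parameter,     # which sensor parameter on that device
        "value": float(value),      # the analog sample itself
        "unit": unit,               # engineering unit of the sample
        "timestamp": timestamp.isoformat(),  # ISO 8601, UTC
    }


def serialize_batch(events):
    """Encode a list of readings as compact UTF-8 JSON, ready to publish."""
    return json.dumps(events, separators=(",", ":")).encode("utf-8")


# Hypothetical reading from a hypothetical pump sensor.
sample = build_sensor_event(
    "pump-017", "discharge_pressure", 412.5, "kPa",
    timestamp=datetime(2017, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
)
payload = serialize_batch([sample])
print(payload.decode("utf-8"))
```

Keeping the device and parameter identifiers in every event is what allows readings from many assets to be grouped and compared once they are aggregated in the repository.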
At the time of this writing, Microsoft advises that the new engine allows for up to 300 GB of online storage spanning 300 million events, with online retention of up to 100 days. Details on the internal storage algorithms of the repository are not known, though Microsoft indicates that no compression of the sensor data is performed during data ingestion.
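A quick back-of-the-envelope calculation helps put these limits in perspective. The sketch below is ours, not Microsoft's sizing guidance: it assumes that each sensor sample counts as one event, that 1 GB is 10^9 bytes, and that data arrives at a steady rate over the full retention window. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sustained_event_rate(max_events, retention_days):
    """Average events per second that exactly fills the event cap over the retention window."""
    return max_events / (retention_days * SECONDS_PER_DAY)


def average_event_budget_bytes(max_storage_gb, max_events):
    """Average bytes available per event if both caps are reached together (1 GB = 10**9 bytes)."""
    return max_storage_gb * 1_000_000_000 / max_events


def max_streams(max_events, retention_days, sample_interval_s):
    """Number of sensor streams that fit, with each stream sampled every sample_interval_s seconds."""
    events_per_stream = retention_days * SECONDS_PER_DAY / sample_interval_s
    return int(max_events // events_per_stream)


# Limits quoted above: 300 GB, 300 million events, 100 days of retention.
rate = sustained_event_rate(300_000_000, 100)
budget = average_event_budget_bytes(300, 300_000_000)
print(f"sustained rate: {rate:.1f} events/s")
print(f"average event budget: {budget:.0f} bytes")
print(f"streams at 1 s sampling: {max_streams(300_000_000, 100, 1)}")
print(f"streams at 60 s sampling: {max_streams(300_000_000, 100, 60)}")
```

Under these assumptions the caps work out to roughly 35 events per second sustained, about 1 KB per event, and on the order of 34 one-second streams or about 2,000 one-minute streams held for the full 100 days. Shorter retention or coarser sampling raises the number of streams proportionally.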
Out of the box, Time Series Insights offers a basic line series trend and a heat map control. These controls can be trellised together to display data from multiple related assets.
Generated visualizations can be persisted across sessions as user queries so that common analytics scenarios can be re-used over time. Based on experimentation in a demo environment, it appears that Microsoft is positioning the visualization component of Time Series Insights as a quick tool for identifying simple process issues rather than as a full-blown visualization engine.
While Time Series Insights does not provide native capability for stream analytics or machine learning, the repository is directly accessible to analytics tools within the Azure environment such as Cortana & Stream Analytics. We believe that surfacing Time Series Insights data to these tools from within Azure will be the primary use case.
Pricing for the offering is quite reasonable and will allow Microsoft to compete with traditional historian-based offerings. For instance, a large compute unit is offered at USD $1,305 per month.
iSolutions has invested time with a demonstration environment of Time Series Insights. Our impression is that the storage engine is reasonably fast at surfacing persisted time series data, retrieving 30 days of sensor data for a given stream in under 5 seconds. The data visualization capabilities, while basic, are easy to use and will help users identify step changes in the behavior of the sensors they are monitoring. While we were not able to test this independently, Microsoft states that data generated by IOT / plant devices will be stored and available for analytics within 1 minute of hitting the Event Hub. This, in conjunction with the ability to retain data online for up to 100 days, will hit a number of sweet spots in IOT device sensor and analytics scenarios and warrants further investigation.
In the near term, the sub-optimal (slow) mechanism of using OPC UA as a bridge to control systems, together with the limited online retention of Time Series Insights, makes it a tough sell as a replacement for site-based historian solutions. Our sense is that Microsoft and its partner ecosystem will move quickly to broaden these capabilities, as we believe there is demand for the cloud or hybrid-cloud model.
iSolutions’ founding partners have over 70 years’ combined experience working with Operations Technology data sets, with a focus on traditional time series data historian platforms from a number of vendors. More recently, we have extended our practice into the areas of cloud-based analytics and business intelligence to allow Operations Technology datasets to be analyzed on a fleet-wide, long-term basis.
In the near term, the new class of IOT devices will bypass site historian-based solutions and use cloud-based time series repositories for data persistence, visualization and analytics. In the mid-term, we believe the data storage market is at a transition point: many existing point solutions for time series repositories that focus on site-level (single facility) aggregation of SCADA and DCS data sets are undergoing intense commoditization and will ultimately be replaced by 1) multi-site / enterprise-scale corporate data historians such as those offered by OSIsoft, and 2) cloud-based offerings such as Azure Time Series Insights. This change is being driven by cost dynamics, performance limitations in many site-based historian offerings, and the increasing need to interface Operations Technology data sets with enterprise-class analytics and visualization toolsets. iSolutions is actively building bench strength in cloud-based storage and analysis of time series data sets (using both Microsoft Azure and Hadoop / open-source technologies) to service customer needs and drive cost savings in our delivered solutions and integrations.
iSolutions is working with a number of clients to develop concrete proof-of-concept use cases around cloud-based analytics for time series data sets in the upstream oil & gas and power & utilities spaces, using both Azure Time Series Insights and open-source technologies such as Apache Kafka, HDFS and InfluxDB. Contact us to discuss your particular use case or to learn more about where this space is headed and where some near-term opportunities may lie for your company.
iSolutions Inc. is a Data Management consulting company that specializes in designing, implementing and supporting real-time, historian-based reporting and integration solutions for Oil & Gas and Power Generation & Transmission companies. Our implementations allow clients to make educated, site-specific production and operational decisions based on the analysis of real-time data from integrated field and plant systems.
Azure IOT Suite Connected Factory Announcement
Azure Time Series Insights Announcement
Azure Stream Analytics Primer