- May 3, 2021
- Posted by: Abhay Das
- Category: Data engineering
The global streaming analytics market is growing at a Compound Annual Growth Rate (CAGR) of 25.2% and is expected to grow from USD 12.5 billion in 2020 to USD 38.6 billion by 2025. One of the key growth drivers for real-time data is the need to forecast trends accurately for faster decision-making. However, one of the bottlenecks to streaming analytics is inadequate system integrity.
While businesses have access to large volumes of data thanks to the growth in IoT-based devices, cloud, enterprise systems, and so on, they face two problems. First, the data is in raw format. Second, it is spread across multiple systems and stored in multiple formats. As a result, businesses need a solution that can pull structured and unstructured data into one place and convert it into a unified format that acts as the single source of truth.
A Gartner survey for the data integration tools market, titled ‘Adopt Stream Data Integration to Meet Your Real-Time Data Integration and Analytics Requirements’ and published in March 2019, indicates that 47% of organizations require streaming data that can help them build a digital business platform. However, only 12% had an integrated streaming data solution for their data and analytics requirements.
Traditionally, businesses depended on the ETL (Extract, Transform and Load) model, but ETL runs as periodic batch jobs, so the data is already outdated when it arrives and is of limited use for some use cases.
At a time when businesses have to make quick decisions and respond promptly to changing external and internal conditions to remain competitive, depending on ETL can limit growth.
Real-Time Data Movement with In-Flight Processing
The face of data has changed tremendously. Data is no longer only what is stored in tables; it also includes text and documents in different formats, held in document stores such as MongoDB, Amazon DynamoDB, Couchbase Server and Azure Cosmos DB. An ETL tool can transfer data from one database to another, but for unstructured documents held in these data stores, businesses need in-flight processing and built-in delivery validation along with real-time data movement.
MongoDB, for instance, is a document store, while many of the sources that feed it are relational, flat, or unstructured. It therefore requires a real-time, continuous data processing solution such as Striim to create the document structure required by the target database.
Striim Features for Data Movement
Striim is an end-to-end, in-memory platform that collects, filters, transforms, enriches, aggregates, analyzes, and delivers big data in real-time. Designed especially for stream data integration, it uses low-impact change data capture (CDC) to extract real-time data from sources such as IoT devices, document stores, cloud applications, log files, and message queues, and delivers it in the format needed. It can both deliver to and extract from MongoDB (or an equivalent document store) as required.
With CDC, Striim delivers one collection per source table to MongoDB, inserting, updating, and deleting documents based on the CDC operation, the row tuple contents and their metadata, and the fields that contain the data elements.
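The one-collection-per-table behavior can be illustrated with a minimal Python sketch. This is not Striim's implementation or API: the event layout (`table`, `op`, `key`, `data`, `meta`), the `apply_cdc_event` function, and the in-memory `store` dictionary standing in for MongoDB are all assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a document store:
# {collection_name: {primary_key: document}}
Store = Dict[str, Dict[Any, Dict[str, Any]]]


def apply_cdc_event(store: Store, event: Dict[str, Any]) -> None:
    """Apply one change event to the collection named after its source table."""
    collection = store.setdefault(event["table"], {})
    key = event["key"]
    op = event["op"]
    if op == "INSERT":
        # New document: row contents plus the change metadata.
        collection[key] = {"_id": key, **event["data"], "_meta": event.get("meta", {})}
    elif op == "UPDATE":
        # Granular update: only the changed fields are overwritten.
        doc = collection.setdefault(key, {"_id": key})
        doc.update(event["data"])
        doc["_meta"] = event.get("meta", {})
    elif op == "DELETE":
        collection.pop(key, None)
    else:
        raise ValueError(f"unknown CDC operation: {op}")


# Assumed sample change stream from two source tables.
store: Store = {}
events = [
    {"table": "customers", "op": "INSERT", "key": 1,
     "data": {"name": "Ada", "city": "Pune"}, "meta": {"ts": 1}},
    {"table": "customers", "op": "UPDATE", "key": 1,
     "data": {"city": "Chennai"}, "meta": {"ts": 2}},
    {"table": "orders", "op": "INSERT", "key": 10,
     "data": {"customer_id": 1, "total": 250}, "meta": {"ts": 3}},
    {"table": "orders", "op": "DELETE", "key": 10, "data": {}, "meta": {"ts": 4}},
]
for e in events:
    apply_cdc_event(store, e)
```

After the loop, `store` holds a `customers` collection with the updated document and an empty `orders` collection, mirroring the insert, update, and delete operations in the change stream.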
Striim supports filtering and transforming data using SQL. Data can be enriched by joining it with external data held in caches. The query output or custom processors determine the JSON document structure.
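In Striim this logic is written in SQL; the sketch below only mirrors the idea in Python so the data flow is easy to follow. The event fields, the `CUSTOMER_CACHE` lookup, and the `filter_and_enrich` function are hypothetical names, not part of any Striim API.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical external reference data loaded into an in-memory cache.
CUSTOMER_CACHE: Dict[int, Dict[str, Any]] = {
    1: {"segment": "enterprise"},
    2: {"segment": "retail"},
}


def filter_and_enrich(
    events: Iterable[Dict[str, Any]],
    cache: Dict[int, Dict[str, Any]],
    min_total: float,
) -> Iterator[Dict[str, Any]]:
    """Drop small orders, then join each remaining event with cached data."""
    for event in events:
        if event["total"] < min_total:
            continue  # filter step (a WHERE clause in SQL terms)
        extra = cache.get(event["customer_id"], {})  # enrichment lookup
        yield {**event, **extra}  # the merged dict is the output document


# Assumed sample stream of order events.
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250},
    {"order_id": 11, "customer_id": 2, "total": 40},
    {"order_id": 12, "customer_id": 3, "total": 500},
]
documents = list(filter_and_enrich(orders, CUSTOMER_CACHE, min_total=100))
```

Here the shape of each yielded dictionary plays the role of the query output that decides the JSON document structure. An event with no cache entry passes through without the extra fields.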
Complex cases can be handled with custom transformations implemented as custom processors. In addition to granular document updates, Striim can move data from master/detail-related tables into a document hierarchy.
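As a rough illustration of moving master/detail tables into a document hierarchy, the following Python sketch nests detail rows under their master row. The table names, column names, and the `build_order_documents` function are assumptions chosen for the example and do not reflect how Striim implements this.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_order_documents(
    orders: List[Dict[str, Any]],
    line_items: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Nest each order's line items (detail) inside the order (master)."""
    items_by_order: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for item in line_items:
        # The foreign key is redundant once the item is nested, so drop it.
        detail = {k: v for k, v in item.items() if k != "order_id"}
        items_by_order[item["order_id"]].append(detail)
    return [
        {**order, "items": items_by_order.get(order["order_id"], [])}
        for order in orders
    ]


# Assumed sample rows from a master table and its detail table.
orders = [{"order_id": 10, "customer": "Ada"}, {"order_id": 11, "customer": "Lin"}]
line_items = [
    {"order_id": 10, "sku": "A-1", "qty": 2},
    {"order_id": 10, "sku": "B-7", "qty": 1},
]
order_docs = build_order_documents(orders, line_items)
```

Each resulting document carries its detail rows in an `items` array, so a reader of the target collection does not need a join to see the full order.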
Benefits of Striim
Some of the key features of Striim that enable businesses to improve operational efficiency and deliver from and to document stores in real-time with in-flight processing for data integrity include:
- Low-impact change data capture from enterprise databases that allows for continuous and non-intrusive ingestion of high-volume data. It supports data warehouses such as Oracle Exadata, Amazon Redshift and Teradata; and databases such as MongoDB, Oracle, SQL Server, HPE NonStop, MySQL, PostgreSQL, Amazon RDS for Oracle, and Amazon RDS for MySQL. It also collects data in real-time from a variety of other sources such as logs, sensors, Hadoop and message queues to enable real-time analytics.
- Non-stop data processing and delivery through inline transformations such as denormalization, filtering, aggregation, and enrichment, so that only the relevant data is stored, in the required format. A hub-and-spoke architecture is supported through real-time data subsetting, and delivery is optimized in both streaming and batch modes.
- Built-in monitoring and validation allow for non-stop verification of the consistency of the source and target databases. In addition to interactive, live dashboards for streaming data pipelines, it also enables real-time alerts via web, text or email.
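The idea behind verifying source and target consistency can be sketched in a few lines of Python. This is a generic illustration, not Striim's validation mechanism: the `fingerprint` and `is_consistent` functions and the sample rows are hypothetical, and a production check would typically run incrementally rather than over full snapshots.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, Tuple


def fingerprint(rows: Iterable[Dict[str, Any]]) -> Tuple[int, str]:
    """Return (row count, order-independent checksum) for a set of rows."""
    digests = sorted(
        hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode()).hexdigest()
    return len(digests), combined


def is_consistent(
    source_rows: Iterable[Dict[str, Any]],
    target_rows: Iterable[Dict[str, Any]],
) -> bool:
    """True when both sides hold the same rows, regardless of order."""
    return fingerprint(source_rows) == fingerprint(target_rows)


# Assumed snapshots of a source table and its target collection.
source = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Chennai"}]
target_ok = [{"id": 2, "city": "Chennai"}, {"id": 1, "city": "Pune"}]
target_bad = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]

in_sync = is_consistent(source, target_ok)
drifted = not is_consistent(source, target_bad)
```

Sorting the per-row digests before combining them makes the comparison independent of row order, which matters because a source table and a target collection rarely return rows in the same sequence.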
Striim makes it possible for businesses to upgrade from ETL solutions to streaming data integration at an extreme scale by providing a wide range of supported sources. Any data can be made available in platforms such as MongoDB in real-time, in the required format to leverage scalable document storage and analysis.
Some of the key benefits of Striim include continuous data movement from a variety of sources with sub-second latency; non-intrusive collection of real-time data from production systems with minimal disruption; and in-flight denormalization and other transformations of data.
Indium – A Striim Partner
Indium Software is a strategic partner of Striim, empowering businesses to make data-driven decisions by leveraging the real-time Big Data Analytics platform. Indium offers innovative data pipeline solutions for the continuous ingestion of real-time data from different databases, cloud applications, and other sources, leveraging Striim’s highly scalable, reliable, and secure end-to-end architecture, which enables the seamless integration of a variety of relational databases. Indium’s expertise in Big Data, coupled with the capabilities of the Striim platform, enables us to offer solutions that meet the transformation and in-flight processing needs of our customers.
To find out how Indium can help you replace your traditional ETL solutions with the next-gen Striim platform for real-time data movement with in-flight processing, contact us now.