Businesses of all kinds are generating ever-increasing amounts of data, but while this should be an invaluable resource in driving decision-making, the sheer volume can create difficulties.
Analyzing data in real time is ideal, but it can be surprisingly difficult to achieve. We spoke to Ariel Assaraf, CEO of the streaming analytics specialist Coralogix, to discover how companies can meet the challenges of real-time analytics.
BN: What challenges do companies face today when analyzing data in real time?
AA: First of all, most companies don’t actually analyze data in real time. Traditional log solutions, for example, rely on indexing to analyze data and surface information. That introduces latency up front: the data has to flow through the ingestion and indexing pipelines before any scheduled queries can run against it.
On top of that, these solutions are notoriously expensive, because all data must be stored before information can be extracted from it. With this approach, every byte of data is treated essentially the same and billed at the same rate, so the cost of analysis grows in direct proportion to the data itself. For modern applications with exponential data growth, that means the cost of analyzing the data can quickly outpace the revenue it helps generate.
These challenges and limitations of real-time analysis mean that companies are generally forced to select which data to analyze, and this of course leads to coverage gaps. This makes it more difficult to identify problems, especially unknown problems. When a problem is identified, it is not uncommon for the data needed to fix the problem to be unavailable or out of context.
Also, real-time data processing alone is not enough, because it misses the value of long-term, stateful analysis of how the data changes over time. This is particularly true for modern applications, which can experience major spikes and fluctuations.
What we really need to overcome these challenges is to combine real-time analytics with stateful, long-term analytics, and to decouple analysis from storage so that we can reduce costs and improve performance.
BN: What are the advantages and disadvantages of analyzing data in real time versus storing large amounts of data in, say, data lakes?
AA: Teams need the ability to monitor their data at a granular level. This involves both performance trends over time and real-time alerts to quickly resolve issues.
Data lakes are great for long-term, ad-hoc queries, but not for the high query concurrency that alerting and data-enrichment use cases demand. And unlike data warehouses, data lakes ingest both structured and unstructured data, which makes low-latency analytics challenging.
For the most part, companies that store data in a data lake employ subject matter experts who can extract information when needed. Still, compared to traditional analytics solutions that index and store data, this can be a cheaper option.
Real-time analytics are crucial for proactive incident response and immediate problem resolution, but without data trending, real-time analytics can only take us so far.
For example, it’s great to know that I have X latency between data calls right now, but knowing that it doubled in the last six months adds a lot more value and context (but this is a more expensive query).
The best way to achieve high performance system monitoring is to combine real-time analytics with state transformations that allow us to track data trends over time. With effective anomaly detection, we can immediately alert you when system behavior changes. This can dramatically reduce the time it takes to identify and resolve problems.
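As a minimal sketch of that idea in Python (the class name, window size, and threshold are my own illustration, not Coralogix's implementation): keep a rolling baseline of a metric over time, and alert the moment a new real-time value deviates sharply from it.

```python
from collections import deque

class LatencyAnomalyDetector:
    """Toy sketch: flag a metric when it drifts far from its rolling baseline."""

    def __init__(self, window=100, threshold=2.0):
        self.window = deque(maxlen=window)  # recent latency samples (ms)
        self.threshold = threshold          # alert when value > threshold * baseline

    def observe(self, latency_ms):
        # Baseline is the mean of recent samples; it is the "state" tracked over time
        baseline = sum(self.window) / len(self.window) if self.window else None
        self.window.append(latency_ms)
        # Alert only once a baseline exists and the new value exceeds it sharply
        return baseline is not None and latency_ms > self.threshold * baseline

detector = LatencyAnomalyDetector(window=5)
for v in [100, 105, 98, 102, 101]:
    detector.observe(v)       # normal traffic builds the baseline; no alerts
detector.observe(230)         # latency more than doubled -> returns True (alert)
```

The point of combining the two halves is exactly what the example shows: the real-time value on its own says little, but against the accumulated state it immediately signals that system behavior has changed.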
BN: What kinds of applications and use cases require real-time data? How has the need for immediacy changed in recent years?
AA: For the most part, we see the need for real-time data in modern cloud-native software or internet companies, where every minute counts and thousands of customers notice every little delay or incident.
That said, these days, companies in almost every industry are turning into software companies. Traditional industries like insurance and finance, for example, are investing in technology in a big way.
With the number of people depending on these companies and the implications if something goes wrong with their systems, there is a significant emphasis on reducing time to identify and resolve problems, as well as eliminating or reducing latency in all processes. Both goals depend on real-time data.
BN: How can companies ensure that they can analyze data in a timely and cost-effective manner? What are some of the best practices?
AA: Many companies are now working on new approaches that solve the problem of exponential data growth. The approach that exists in today’s market is to use storage tiers. But then you have to compromise on the quality and speed of the analysis. In that case, you must decide where you want to store the data before you really know what information it might contain.
What we are doing at Coralogix is using Streama, our streaming analytics engine, to ingest and analyze everything in real time, including transformations and stateful analytics. Then only frequently searched data is sent to hot storage, and the rest can be sent to an archive. Data in the archive can still be queried at any time with relatively low latency; we can return query results from the archive in about a minute.
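Streama's internals are not public, so the following is only a generic illustration of the "analyze everything first, then choose a storage tier" pattern described above; every name in it is hypothetical.

```python
# Illustrative sketch of analyze-first, tier-later routing (not Streama's actual API).

HOT, ARCHIVE = "hot", "archive"

def analyze(record):
    """Placeholder for real-time work: alerting, enrichment, stateful aggregation."""
    return {"ok": True, "size": len(str(record))}

def route(record, frequently_searched_fields):
    """Every record is analyzed in-stream; the tier is chosen only afterwards."""
    insights = analyze(record)  # analysis happens for ALL data, regardless of tier
    tier = HOT if record["field"] in frequently_searched_fields else ARCHIVE
    return insights, tier

# A frequently queried field lands in hot storage; everything else is archived,
# but both were fully analyzed before the storage decision was made.
insights, tier = route({"field": "error.level", "value": "warn"}, {"error.level"})
```

The design choice the sketch highlights is the ordering: because analysis runs before (and independently of) the storage decision, cheap archive placement no longer means giving up real-time insight on that data.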
It’s basically about decoupling data analytics from storage, which improves performance and is more cost-effective.