A short guide to real-time behavioral analytics managed by TerrariumDB

Real-time analytics covers many technologies that fall into several categories, each serving a distinct purpose and calling for a different approach. Examples include stream processing, time series databases, and real-time monitoring and alerting. In this article, we will explore how our customers classify Synerise Real-time Analytics, which is built on TerrariumDB, our proprietary database engine developed from scratch for real-time behavioral analytics.

Figure 1 TerrariumDB is specifically designed to execute real-time queries, fully accessible to external users.

Events: The Building Blocks of Behavioral Data

In the realm of behavioral data analytics, the fundamental building block is the touchpoint, represented by events. An event is a discrete unit of unstructured data that captures a specific interaction between a user and a system. These events are pivotal because they serve as direct indicators of user behavior and preferences. For example, an event can represent a user viewing a product in a mobile application, which is a direct interaction indicating interest. Similarly, making a transaction is another event type, showcasing a different level of engagement and decision-making by the user. Each event is mapped to real-world interactions, providing a granular view of user behavior that can be analyzed to draw meaningful insights.

Figure 2 Events, stemming from a diverse array of interactions, each produce unique strands of unstructured data.

To refine this definition, it helps to look more closely at what an event contains. Events are not just simple triggers; they carry additional internal data, often referred to as parameters, which are inherently unstructured and may be nested. These parameters can include timestamps, geographical locations, device types, and more nuanced user engagement metrics. This richness allows for multi-dimensional analysis of user interactions, but unstructured and nested data is also harder to process and analyze, and it requires technology designed for that purpose.

{
  "eventType": "pageVisit",
  "timestamp": "2024-03-06T12:34:56Z",
  "user": {
    "userId": "12345",
    "sessionId": "67890",
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64)",
    "ipAddress": "192.168.1.1",
    "location": {
      "country": "USA",
      "city": "New York"
    }
  },
  "pageDetails": {
    "url": "https://www.example.com/products/new-arrivals",
    "title": "New Arrivals - Example.com",
    "referrerUrl": "https://www.google.com/"
  },
  "interaction": {
    "timeOnPage": null,
    "actionsTaken": [],
    "objectId": null
  }
}

The provided pageVisit event example highlights the complexity of processing multidimensional data across various event types. TerrariumDB adopts a schema-less approach, treating incoming data as an unstructured stream, which offers the flexibility needed to address the dynamic nature of event data without predefined schemas. This strategy underscores TerrariumDB's innovative capacity for adaptive data management and analysis.
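To make the idea of schema-less handling of nested parameters more concrete, here is a minimal Python sketch. It is not TerrariumDB's actual implementation, which this article does not describe. It only illustrates one common way to work with nested event data without a predefined schema: flattening it into dotted paths. The function name `flatten_event` and the path notation are assumptions made for this example.

```python
import json
from typing import Any, Dict


def flatten_event(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by dotted paths.

    Illustrative only: no schema is declared up front, so any parameter
    that shows up in an event simply becomes a new path.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict) and value:
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten_event(child, path))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            flat.update(flatten_event(child, f"{prefix}[{index}]"))
    else:
        # Scalars, nulls, and empty containers are stored as-is.
        flat[prefix] = value
    return flat


# A trimmed version of the pageVisit event shown above.
raw_event = """
{
  "eventType": "pageVisit",
  "user": {"userId": "12345", "location": {"city": "New York"}},
  "interaction": {"timeOnPage": null, "actionsTaken": []}
}
"""

flat = flatten_event(json.loads(raw_event))
for path, param in flat.items():
    print(path, "=", param)
```

Because every parameter is addressed by its path, two events of the same type can carry different parameters without any schema migration; a query simply finds a path present in one event and absent in the other.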

How Does Real-Time Analytics Look in the Context of Behavioral Data?

According to Gartner:

"Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly. For some use cases, real time simply means the analytics is completed within a few seconds or minutes after the arrival of new data."

In the competitive arena of data analytics platforms, several of our competitors attempt to devise similar solutions using modern OLAP technologies. These efforts typically involve materializing audiences or segments overnight and updating them as new events occur, coupled with the construction of external services to monitor the status of each segment. Commonly, many providers advise customers to anticipate delays of up to 24 hours for the computation of audience segments and provide regular progress updates. However, this approach not only demands considerable effort but also compromises accuracy due to the inherent delays in processing and updating data. Furthermore, this strategy significantly increases the total cost of ownership (TCO) for businesses. The need for additional external services, along with the complexities of managing and maintaining these systems, escalates operational costs, making it a less economically viable option in the long term.

As a representative of Synerise, I am convinced that real-time technology is not just an addition to our operational capabilities. It is the cornerstone of our strategy to minimize the distance between us and our end users. Because data is processed the moment it arrives, we can respond to user actions, preferences, and feedback without delay, which creates a highly personalized and engaging user experience. This responsiveness lets us understand, anticipate, and cater to the evolving needs of our users in ways that were previously unattainable.

TerrariumDB: Powering Real-Time Insights for Diverse User Needs

Our technology serves a diverse user base, each group with its own demands and expectations. Business Professionals are a primary user group. They aim to increase revenue by monetizing data collected from various sources, such as e-commerce transactions, clickstreams, interactions with their brands, and third-party services. In many business scenarios, the ability to operate on fresh, real-time data is vital. In a loyalty program with a limited pool of rewards, for example, accurate insight into customer behavior is essential: the business must monitor the remaining resources and end the program promptly once the limit is reached. This data must be available instantaneously.
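The limited-reward scenario can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Synerise's implementation: the class `RewardPool`, its fields, and the method `on_redeem_event` are hypothetical names, and a production system would also have to handle concurrency and persistence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RewardPool:
    """Hypothetical loyalty program with a fixed number of rewards."""

    total_rewards: int
    winners: List[str] = field(default_factory=list)

    @property
    def remaining(self) -> int:
        # Rewards still available, computed from the events seen so far.
        return self.total_rewards - len(self.winners)

    @property
    def active(self) -> bool:
        # The program ends as soon as the pool is exhausted.
        return self.remaining > 0

    def on_redeem_event(self, user_id: str) -> bool:
        """Grant a reward if any are left; return False once the pool is empty."""
        if not self.active:
            return False
        self.winners.append(user_id)
        return True


pool = RewardPool(total_rewards=2)
results = [pool.on_redeem_event(user) for user in ("u1", "u2", "u3")]
print(results, pool.remaining, pool.active)
```

The third request is rejected only because the first two are already visible when it arrives. If the count of granted rewards lagged behind the events, the program would hand out more rewards than it has.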

The second key user group is the End-Users themselves, who benefit from immediate access to data, such as the number of points earned from their latest transaction. This requires our database to ingest data immediately post-transaction to calculate totals and, in some cases, unlock new coupons or offers. This user group has the potential to generate huge traffic with spikes that can reach up to 10 times the normal amount.
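As a rough illustration of this flow, the following Python sketch updates a points total on each transaction and reports which coupons that transaction unlocked. All names here (`PointsLedger`, `on_transaction`, the coupon names, and the thresholds) are assumptions invented for the example; they do not come from Synerise's product.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical thresholds: points required to unlock each coupon.
COUPON_THRESHOLDS: Dict[str, int] = {
    "5_PERCENT_OFF": 100,
    "FREE_SHIPPING": 250,
}


class PointsLedger:
    """Keeps a running points total per user, updated on every transaction."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)
        self.unlocked: Dict[str, Set[str]] = defaultdict(set)

    def on_transaction(self, user_id: str, points: int) -> List[str]:
        """Apply a transaction and return the coupons it newly unlocked."""
        self.totals[user_id] += points
        newly_unlocked = [
            name
            for name, threshold in COUPON_THRESHOLDS.items()
            if self.totals[user_id] >= threshold
            and name not in self.unlocked[user_id]
        ]
        self.unlocked[user_id].update(newly_unlocked)
        return newly_unlocked


ledger = PointsLedger()
first = ledger.on_transaction("12345", 80)    # total 80, nothing unlocked yet
second = ledger.on_transaction("12345", 40)   # total 120, crosses the first threshold
third = ledger.on_transaction("12345", 150)   # total 270, crosses the second threshold
print(first, second, third, ledger.totals["12345"])
```

The end user sees the new total and any unlocked coupon right after the purchase only if the transaction is ingested and queryable immediately, which is the requirement described above.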

The third category includes external consumers that use Synerise-collected data for their own purposes, such as BI tools, AI models, or data scientists. Unlike the first two groups, these users do not depend as heavily on real-time data, but they do expect a modern stack with excellent performance. Their work is focused more on batch processing and data engineering, which brings a different set of requirements and challenges.

Figure 3 TerrariumDB manages various types of workloads, catering to the distinct needs and requirements of different end-users.

The needs of the first two user groups are what make a real-time database engine such as TerrariumDB necessary. Our commitment to providing immediate access to information drives the development of technologies like TerrariumDB, ensuring our users have the most up-to-date data for their decision-making processes. This approach not only enhances the user experience but also contributes significantly to the success and operational efficiency of our clients.

Conclusion

As a company with over 10 years of experience working with behavioral data analytics, we understand the immense value that data can offer when it’s presented effectively. Our technology is built from scratch without any trade-offs. We recognize that our customers value the ability to minimize the time it takes to act on incoming data, in line with their specific requirements and business scenarios. We also recognize that new data has a short shelf life in the context of creating engaging interactions with end users.

Figure 4 In our scenarios, data that is rapidly ingested and instantly available for analytics represents the most valuable asset, driving immediate insights and actions.

Our technology, TerrariumDB, supports a wide range of scenarios:

  • Querying entire data sets to uncover valuable insights.
  • Querying data about specific end users to deliver personalized content (user-facing queries and analytics).
  • Executing all database operations (reads and writes) in real time, with response times ranging from subsecond for user-facing queries to just a few seconds for analytics queries that look for insights across the whole data set.

TerrariumDB empowers seamless real-time querying and analytics of comprehensive datasets to deliver personalized content and valuable insights, offering response times ranging from subsecond to just a few seconds. Specifically tailored to handle the workload of event streaming generated by real-world scenarios, TerrariumDB adeptly manages traffic spikes and characteristics inherent to behavioral data. This capability underlines our commitment to innovation and the development of robust, in-house storage solutions, ensuring swift and effective data operations that cater precisely to the dynamic nature of user interactions.

Author: Miłosz Baluś | Chief Database Architect, co-founder at Synerise