In this episode of the Industrial IoT Spotlight Podcast, we interviewed Bert Baeck, Co-founder and CEO of Timeseer.AI. Timeseer.AI is revolutionizing time-series data operations by automating data orchestration across organizations. Bert shared his journey from a background in data science to becoming a serial entrepreneur, with successful ventures like TrendMiner and now Timeseer.AI.
We discussed the impacts of data downtime and strategies to ensure data quality, empowering data teams to detect, prioritize, and investigate data quality issues before they affect operations.
Key Discussion Points:
The critical role of data quality in industrial IoT and OT environments.
How to establish metrics for measuring data quality.
Real-world examples where data quality significantly impacts business operations.
Timeseer.AI’s philosophy of scanning data where it resides, whether on the edge, in storage units, or in the cloud.
The interplay between IT and OT teams in ensuring data quality.
To learn more about our guest, you can find him at:
Website: https://www.timeseer.ai/
LinkedIn: https://www.linkedin.com/in/bertbaeck/?originalSubdomain=be
Q&A Summary.
What inspired you to dive deeply into time-series data and establish Timeseer.AI, particularly given your broad involvement in multiple ventures?
My journey started with my academic background in data science, at a time when the term 'data mining' was more common than AI or machine learning. Early in my career at Bayer, now Covestro, I was struck by the enormous untapped potential in operational technology (OT) and IoT data, much of which was merely collected but rarely used effectively. This insight sparked my first entrepreneurial venture, TrendMiner, aimed specifically at empowering process engineers with tools for efficient root cause analysis and troubleshooting.
Following TrendMiner's successful exit to Software AG, I moved into venture capital. Here, I observed an intriguing rise in data quality issues, particularly as organizations scaled up their data storage and analytics efforts. Data quality, although often seen as mundane, emerged as critical for truly unlocking the value of big data and advanced analytics applications.
Yet, existing solutions at the time were primarily built around relational databases and were ill-equipped for IoT time-series data. Recognizing this gap, I founded Timeseer.AI, dedicated specifically to addressing the nuanced challenges of IoT and OT data quality. Timeseer.AI is my focused commitment and passion today, uniquely positioned to solve these specialized problems.
Could you explain how Timeseer.AI fits into the existing technology stack used by industrial companies, and who typically benefits from your solutions?
Timeseer.AI occupies a unique middleware position, bridging the gap between data storage solutions like data historians or IoT platforms and advanced analytics or AI applications. Initially, organizations invest significantly in data storage infrastructure, ensuring vast quantities of data can be reliably collected. Subsequently, they move towards consuming this data through analytics dashboards and predictive models, where data quality quickly becomes a bottleneck.
Our solutions specifically target this gap, enhancing the data right at the source and ensuring its integrity before it reaches downstream applications. This positioning greatly benefits IT teams responsible for data governance and DataOps, ensuring smoother workflows and reduced errors.
Moreover, operational technology (OT) professionals also see direct benefits from Timeseer's solutions. Given their deep understanding of physical devices and sensors, OT teams find great value in our ability to maintain data integrity and sensor health.
Ultimately, while budgets typically originate from Chief Data Officers or digital transformation leaders, the effective implementation and success of our solutions rely heavily on close collaboration between both IT and OT teams.
Why does time-series data, specifically from industrial IoT and OT sources, pose unique challenges compared to other data types?
Time-series data from IoT and OT environments differs significantly from other forms of structured data because of its physical origins. The sensors that generate it are continuously exposed to environmental fluctuations, mechanical stresses, and calibration variances, producing inherently noisy data. Unlike relational databases or structured business data, sensor outputs can experience unpredictable anomalies, signal drift, and transient errors.
Additionally, the lineage of IoT and OT data introduces complexity. This data typically originates at the edge, in environments with limited computational resources and stringent connectivity constraints. Traditional data quality tools, designed for structured, centralized databases, struggle to handle these conditions effectively.
Because of these unique challenges, specialized tools like Timeseer.AI are necessary. Our solutions are specifically designed to handle the variability, real-time demands, and distributed nature of IoT and OT environments, ensuring data quality and reliability throughout the data lifecycle.
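To make the contrast concrete, below is a minimal sketch of the kinds of checks that matter for sensor streams but rarely appear in relational data quality tooling: out-of-spec values, stuck (flatlined) sensors, and transmission gaps. The function names, thresholds, and pandas-based approach are illustrative assumptions, not Timeseer.AI's actual API.

```python
import pandas as pd

def check_out_of_range(series: pd.Series, low: float, high: float) -> pd.Series:
    """Flag samples outside the sensor's physical specification."""
    return (series < low) | (series > high)

def check_flatline(series: pd.Series, window: int = 12) -> pd.Series:
    """Flag a stuck sensor: no variation over the last `window` samples."""
    return series.rolling(window).std(ddof=0) == 0

def check_gaps(index: pd.DatetimeIndex, expected: pd.Timedelta) -> pd.Series:
    """Flag samples that arrive later than the expected sampling interval."""
    return index.to_series().diff() > expected
```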
How do you operationalize data quality checks at different levels, from sensors at the edge to centralized cloud systems?
Our operational model employs a distributed, hybrid approach. Similar to antivirus agents, we deploy small, intelligent agents directly where the data originates—on edge devices, sensors, and local gateways. These agents perform initial data quality checks, minimizing the need to move sensitive or costly-to-transfer data.
This distributed approach has several advantages. Primarily, it addresses common enterprise concerns about data security, latency, and cost associated with centralized data management. By conducting quality checks in situ, we significantly reduce the overhead associated with data migration.
The results of these quality checks are then propagated up to a centralized, cloud-based control tower. Here, stakeholders across the enterprise gain comprehensive visibility into data health and can intervene proactively. This hybrid model combines local agility with centralized oversight, ensuring robust and secure data governance.
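A rough sketch of that hybrid pattern is shown below: the agent computes a small quality summary locally and ships only that summary upstream, so raw measurements never leave the edge. The endpoint URL, payload shape, and scoring logic are invented for illustration and are not Timeseer.AI's protocol.

```python
import json
import urllib.request
import pandas as pd

# Placeholder endpoint: the real control tower, payload shape, and auth
# would be product-specific.
CONTROL_TOWER_URL = "https://control-tower.example.com/quality"

def score_locally(df: pd.DataFrame) -> dict:
    """Tiny stand-in for real checks: fraction of missing and out-of-spec samples."""
    missing = df["value"].isna().mean()
    out_of_spec = ((df["value"] < 0) | (df["value"] > 100)).mean()  # invented limits
    return {"sensor": str(df["sensor_id"].iloc[0]),
            "quality": round(float(1.0 - max(missing, out_of_spec)), 3)}

def report(summary: dict) -> None:
    """Ship only the small quality summary upstream; raw data stays at the edge."""
    req = urllib.request.Request(
        CONTROL_TOWER_URL,
        data=json.dumps(summary).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```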
What are some specific use cases or scenarios where Timeseer.AI's data quality solutions have proven particularly impactful?
One powerful use case is data sharing. Enterprises frequently need to share data with internal stakeholders, suppliers, or customers. In these scenarios, accuracy and consistency are paramount, as errors could lead to substantial liabilities or reputational damage. Timeseer.AI ensures shared data meets strict quality standards, significantly mitigating these risks.
Another critical area is regulatory compliance. Data related to environmental impact, such as emissions or energy consumption, must be rigorously accurate. Our tools provide necessary validation and verification, helping companies avoid compliance penalties and maintain transparency.
Furthermore, Timeseer has become essential in scaling analytics and AI models across enterprises. Many companies struggle to move beyond proof-of-concept due to the manual burden of data preparation. By automating this data prep process, Timeseer dramatically accelerates AI deployment and scaling, significantly reducing time and cost.
Finally, predictive maintenance applications particularly benefit from Timeseer. Often, predictive algorithms generate numerous false alerts because of poor input data quality. By validating sensor inputs in real-time, Timeseer.AI substantially reduces these false positives, improving maintenance efficiency and overall operational uptime.
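As an illustration of inference-time gating, the hypothetical sketch below suppresses a predictive-maintenance alert whenever the input window itself fails basic health checks. It assumes a scikit-learn-style model object and made-up sensor limits.

```python
import pandas as pd

def data_is_healthy(window: pd.Series, low: float, high: float) -> bool:
    """Basic input checks before trusting a prediction on this window."""
    in_spec = window.dropna().between(low, high).all()
    not_stuck = window.nunique() > 1              # flatline detection
    few_dropouts = window.isna().mean() < 0.05    # tolerate small gaps
    return bool(in_spec and not_stuck and few_dropouts)

def maybe_alert(model, window: pd.Series) -> str:
    """Gate the model: a sensor fault should not surface as a maintenance alert."""
    if not data_is_healthy(window, low=0.0, high=250.0):  # invented limits
        return "suppressed: input failed data quality checks"
    return "ALERT" if model.predict([window.tolist()])[0] == 1 else "ok"
```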
Considering the complexity and diversity of industrial data ecosystems, what practical steps should organizations take to improve their IoT data observability and quality?
Organizations should begin by recognizing data quality as an essential business enabler rather than a secondary operational detail. This means adopting specialized tools designed specifically for the nuances of time-series data rather than attempting to apply generic data quality solutions.
Moreover, clear governance policies must be established to define quality standards, assign clear roles for data stewardship, and ensure robust cross-functional collaboration between IT and OT teams. Without clearly defined governance structures, improving data quality is nearly impossible.
Continuous monitoring and automated remediation should be embedded into data workflows. Real-time anomaly detection, validation, and correction processes ensure issues are addressed proactively, minimizing downstream impacts.
Finally, categorizing data by quality tiers—such as bronze, silver, and gold—can also help organizations prioritize efforts and resources effectively. This structured approach ensures the most critical datasets receive the attention they need, significantly enhancing data integrity and overall business outcomes.
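A trivial sketch of such tiering is shown below, assuming a single quality score in [0, 1] has already been computed per dataset; the cutoffs are arbitrary examples rather than a standard.

```python
def quality_tier(score: float) -> str:
    """Map a computed quality score in [0, 1] to a medallion-style tier."""
    if score >= 0.95:
        return "gold"    # fit for billing, compliance, model training
    if score >= 0.80:
        return "silver"  # fit for dashboards and exploration
    return "bronze"      # raw; cleanse before use
```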
How does Timeseer integrate into predictive maintenance scenarios, particularly in large-scale deployments?
Predictive maintenance solutions commonly struggle to scale because deploying them across multiple production lines often turns into separate projects, undermining the business case. Timeseer resolves this by embedding seamlessly as a middleware component. Our technology significantly improves model quality, with results showing models can be up to 70% more accurate because they're trained on verified, high-quality data. This robust data preparation drastically shortens the onboarding process, turning a previously complex scaling challenge into a streamlined operation.
Scaling effectively from one asset to thousands requires uniform and contextualized data. Timeseer facilitates this by creating unified namespaces and data models, making it straightforward to replicate successful models across multiple assets. Additionally, our solution continuously monitors data quality and system health, reducing false alarms during operational inference. In short, Timeseer provides comprehensive support for each stage—training, scaling, and inference—without end-users even realizing it's embedded in their existing dashboards.
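To illustrate the unified-namespace idea, the hypothetical snippet below maps vendor-specific historian tags onto one hierarchical naming convention; both the tag names and the path scheme are invented. The point is that a model trained against the unified names can be replicated to any asset exposing the same structure.

```python
# Hypothetical tag map: raw historian/PLC names -> unified namespace paths.
RAW_TO_UNS = {
    "PLC7.TT_1042": "site1/line3/pump12/temperature",
    "HIST.FI-220B": "site1/line3/pump12/flow",
    "PLC7.VT_0099": "site1/line3/pump12/vibration",
}

def normalize(readings: dict[str, float]) -> dict[str, float]:
    """Re-key raw readings so downstream models see only unified names."""
    return {RAW_TO_UNS[tag]: value
            for tag, value in readings.items() if tag in RAW_TO_UNS}
```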
Can you explain the concept and maturity of DataOps in industrial environments today?
DataOps, originating from the principles of DevOps, is evolving rapidly in industrial settings. Previously seen as peripheral or project-based, DataOps is now recognized by companies as essential to unlocking advanced analytics and AI capabilities. Critical practices such as asset modeling, digital twins, and unified namespaces have increasingly become standardized, driven by the necessity of structured, high-quality data.
However, industrial organizations often face a capability gap, particularly in operational technology (OT) teams that are attempting to reduce reliance on traditional IT departments. This trend towards operational independence highlights the industry's growing maturity in DataOps, although practices still vary significantly between companies.
Where does generative AI fit into the current and future landscape of industrial data management?
Generative AI is already transforming industrial data management, particularly in managing and interpreting time-series data. While traditional use cases of generative AI emphasize language-based models, Timeseer leverages foundational time-series models specifically designed for industrial data. These models can generate synthetic data to fill gaps or correct inconsistencies, dramatically improving data reliability.
Another impactful application is metadata augmentation, where generative AI enhances incomplete or inconsistent metadata, critical for building accurate context models. We're currently integrating these generative AI components to enrich and cleanse data, providing greater confidence in the quality of industrial datasets.
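As a deliberately simplified stand-in for generative gap-filling (a time-series foundational model would impute far more intelligently than time-weighted interpolation), the sketch below shows the contract: expose missing timestamps, synthesize plausible values, and record which points are synthetic. The column name, index type, and frequency are assumptions.

```python
import pandas as pd

def fill_gaps(df: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Assumes a DatetimeIndex and a 'value' column (both assumptions)."""
    full = df.resample(freq).mean()            # expose the missing timestamps
    filled = full.interpolate(method="time")   # stand-in for model-generated values
    filled["imputed"] = full["value"].isna()   # keep lineage: mark synthetic points
    return filled
```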
What specific technologies underpin Timeseer's AI approach, and how are these deployed in practice?
Timeseer's technology stack employs open-source large language models (LLMs), like Llama, for user interaction, providing intuitive and context-aware communications. For actual data augmentation and cleansing, we utilize specialized time-series foundational models capable of federated learning—allowing knowledge gained from one asset to benefit the entire fleet.
While data scanning and basic AI functions are effectively executed at the edge, more complex operations still predominantly utilize cloud infrastructure, including platforms like Databricks, Fabric, or Amazon S3. Although complete edge-based solutions remain a work in progress, significant strides are being made in edge computing capabilities.
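The core step of federated learning can be sketched in a few lines. The snippet below shows only FedAvg-style parameter averaging and says nothing about how Timeseer actually coordinates training across a fleet.

```python
import numpy as np

def federated_average(local_weights: list[np.ndarray]) -> np.ndarray:
    """Federated averaging in its simplest form: combine parameters learned
    on separate assets without ever pooling the assets' raw data."""
    return np.mean(np.stack(local_weights), axis=0)
```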
What industries and applications represent Timeseer's primary market, and how is the product adapted to their needs?
Timeseer primarily serves sectors heavily reliant on time-series data, notably within operational technology (OT) such as manufacturing, utilities, pharmaceuticals, and oil and gas, as well as Internet of Things (IoT) domains, including connected healthcare and smart appliances. Regardless of the specific industry, Timeseer's core offering—comprehensive data cleansing, validation, and verification—remains effective wherever sensors and connected devices are used.
Additionally, OEM manufacturers increasingly adopt Timeseer within their service models (Everything-as-a-Service). Companies like Baker Hughes and Veolia, for example, embed Timeseer to enhance their predictive maintenance offerings, transitioning from capital expenditure to operational expenditure business models, reflecting broader market trends.
How do you see data monetization evolving, and can Timeseer contribute to this process?
Data monetization currently faces regulatory and business-model hurdles, limiting widespread adoption. However, early examples, such as chemical plants sharing analyzer data with sensor manufacturers, demonstrate tangible mutual benefits—like transitioning from scheduled to condition-based maintenance.
Timeseer can facilitate such data-sharing scenarios by ensuring data quality and integrity, crucial for building trust among stakeholders. While widespread monetization beyond direct manufacturer-to-supplier relationships remains limited, Timeseer’s capability to reliably specify and manage shareable data positions it strategically for future expansions into broader data monetization applications.
What's unique about managing data quality for sensors and industrial applications compared to traditional data quality tools?
Sensor data quality management significantly differs from traditional methods. Unlike databases, sensor data lacks established reference models for verifying accuracy. Data originates on edge devices and travels across various pathways before reaching central storage, adding complexity to data lineage management.
Moreover, troubleshooting sensor data typically requires skilled technicians physically inspecting or repairing hardware—vastly different from the IT-driven triage typical in conventional data quality contexts. Recognizing these nuances is essential for developing effective quality management solutions tailored specifically for industrial IoT environments.
Looking ahead, what's next for Timeseer's product development and innovation roadmap?
Given the complexity of our deep-tech AI components, we’re continually working to enhance scalability and robustness, supporting millions of sensors simultaneously. Our recent product release laid a strong foundation, but we’re now aggressively integrating advanced AI concepts like Copilots for results interpretation, smart summarization, automated configurations, and enhanced federated learning.
We see AI as essential not only for improving data quality but also for enabling scalable, meaningful insights across large industrial networks. Over the next year, Timeseer will further develop these innovative features, significantly advancing how companies manage and utilize sensor-generated data.
Transcript.
Erik: Welcome back to the Industrial IoT Spotlight Podcast. I'm your host Erik Walenza, the CEO of IoT One. Our guest today is Bert Baeck, Co-founder and CEO of Timeseer.AI. Timeseer.AI is a time-series data operations software company that automates data orchestration throughout an organization.
In this talk, we discussed the growing challenges of managing time-series data as data volumes grow, and as organizations become increasingly reliant on real-time analysis for automation and decision making. We explored the impacts of data downtime and approaches to make time-series data more trustworthy by empowering data teams to detect, prioritize, and investigate data quality issues before they hit operations. We also discussed how to define data quality metrics and establish building blocks for improving IoT and OT data quality and observability. Finally, we defined the responsibilities of the DataOps team and their fit in the broader IT organization.
If you find these conversations valuable, please leave us a comment and a five-star review. If you'd like to share your company's story or recommend a speaker, you can email us at team@iotone.com. Finally, if you have a research, strategy or innovation initiative that you'd like to discuss, please email me directly at erik.walenza@iotone.com. Thank you.
Welcome to the Industrial IoT Spotlight, your number one spot for insight from industrial IoT thought leaders who are transforming businesses today, with your host Erik Walenza.
(interview)
Erik: Bert, thanks for joining us on the podcast today.
Bert: Hey. Welcome, Erik. Nice to be here.
Erik: Yeah, great. Bert, I'm looking forward to this one. Because you're an entrepreneur — I'd say kind of a serial entrepreneur, in that you had a successful exit. Maybe you can share a little bit about that. But you've also been involved, from what I see on your LinkedIn profile, with, I don't know, 20 quite interesting companies right now as an investor, a director, and an advisor of some sort. So maybe a good point to start would be explaining a little bit: out of all the things that you're involved in right now, why did you choose this particular topic to dedicate this part of your life to?
Bert: Yeah, so that's a long story that I'll keep short. It all started with a background in data science very early on. Back then, nobody talked about AI; I think it was called data mining, and even statistics was what we were trying to do. I have an engineering degree. I ended up in a chemical company called Bayer, a German company, that's now called Covestro. Very early on, I got a spark on the potential of process data — OT, IoT data — where not a lot was happening, and the majority was just stored away. I got a spark on that idea, right? So that was the first transition.
That company became TrendMiner, which is, long story short, a self-service analytics company. It's basically a search tool for OT data — a tool for process engineers where they can do root cause analysis and troubleshooting. That was an interesting ride, because in the end it was a competition, a race between us, TrendMiner, and another company, Seeq. We both did quite nicely. We got acquired in 2018 by Software AG. Seeq is still in the market as a standalone company with private investors. After that exit, I didn't stay that long at Software AG. I was quite motivated to try something else. I came on board as a partner in a venture capital firm in Belgium, SmartFin Capital; they were on my cap table at TrendMiner. The partners said, "Bert, come in. You will be a good partner for AI investments, analytics, industrial software."
Basically, the why of Timeseer is this: three years into the role, I saw a lot of deal flow, so I reviewed a lot of companies. I started to notice deal flow around data quality. Data quality — this is an old category, right? This is boring. Nobody likes data quality. So why was it suddenly so investable? It made sense at a certain point in time. One, companies had solved the problem of data storage at scale; you can store data in the cloud at scale — that's called big data. They can compute at scale too, check the box. So it was exactly the moment in time, what I call the great unlock, where companies start to build data-driven applications. It's also the moment in time where quality really matters. That brings me to Timeseer. Because all the companies I noticed popping up in data quality and data observability were focused on two things: structured, relational data, and mainly data that had already landed in Snowflake or Databricks — you start there. Whereas if you look at IoT data or OT data from factories, it has a completely different lineage, and the technology was not really built for IoT data. So I decided to build a second company after TrendMiner. That became Timeseer: 100% focus on data quality and data observability, but for IoT data. That is the story from start to now. The other companies are more advisory, investor, angel roles. But Timeseer is my full-time commitment and passion.
Erik: Okay. Great. Yeah, the last podcast that I hosted last week was with the CEO of a company called Barbara. I don't know if you — are you familiar with them?
Bert: I know them but never looked in detail at what they're doing. But I know the name.
Erik: So they're providing a different solution than you are, but they're also dealing with IoT solutions in a lot of process industries. I think the trend that they're building around is companies migrating away from cloud-based solutions towards edge solutions, because of some of these data challenges, right? The cloud is great for hosting a lot of data. But of course, that's very expensive. It's great for AWS; it's not necessarily great for the end customer. And so this challenge of cleaning up the data, making sure you're only storing the high-quality data that you need in the cloud, is, I think, an increasingly significant one. Then once you start migrating things to the edge, of course, you need to make sure that you have actual good-quality data that you're working with there.
Where do you fit into this overall solution stack? I guess a lot of companies are going to be familiar with their cloud providers, and they'll be familiar with the end-user applications that they're using. Where would you fit into that ecosystem, and who would be the direct buyer and user of your solutions?
Bert: Good question. Basically, when you say data quality, data quality management is part of a bigger theme, which is data governance. Data governance typically has a driver: we want one version of the truth. We cannot depend on too many different sources that have the same data, with people manipulating them. So data governance is a very important, dominant factor. Then DataOps — data operations — is also part of the data governance market and category. Within DataOps there's data quality management, which takes care of going from raw, uncurated, unverified data to high-quality data: repaired, augmented, in the right format, transformed. That whole transition is what we call data quality management. That's basically where we fit in as a category.
If you look at the solution space — basically, if we wind back — the companies out there were focusing on digital transformation. So our clients first invested in data storage. That typically meant acquiring, buying, and implementing data historians and IoT platforms, making sure data landed somewhere. After they had the data in their stores, the whole ecosystem moved from data storage to data consumption: analytics, dashboards, AI models. So we basically jumped from storage to consumption. What happened, a little bit in hindsight, is that all these things are not really scaling. A lot of AI is also in a state of what we call 'AI sobering,' where it was not fulfilling the desires — stuck at proof of concept, not really scaling from one asset to the thousand assets they have. One of the reasons is indeed data quality. What you now see happening is that some solutions in the AI stack have data quality as a feature, but it's really moving out of the analytics stack and into the middleware. So we sit between storage and data activation, data consumption. It's the data middleware, where data is prepared for usage, leveraging things like the medallion concept from Databricks, where you have bronze, silver, gold tiers of quality. Then all the tools that use data tap into that kind of data catalog. So we really fit in between storage and consumption.
Erik: Okay. So I guess the dynamic is: somebody invests in their infrastructure, then they start deploying applications, then they figure out that the applications aren't scaling or quite doing the job they wish they were doing, and then they start investigating the data quality. Does that mean that you're typically selling directly to the IT team, or is it often some sort of collaboration with the cloud provider or with the key applications they're obtaining data for?
Bert: A combination. Because, for example, if you only talk to IT, you see that they tend to work with their IT tooling. Like, "Oh, we have Databricks. Let's solve it in Databricks." They start building things themselves in a MacGyver way — do-it-yourself is very popular with very good tooling like Databricks, for example. But the moment you talk more to operations, and they know the sensors and the devices, they understand how difficult it is to apply data quality on the edge for devices and sensors. Because you have the physics, the first principles, the sensor specifications — is this value even possible? There are relationships between devices and how they need to behave. So you tend to go very deep into domain expertise, where you need to understand devices and sensors. Then, indeed, you need to glue the OT people to IT. So it's really a combination. But if you look at it from a budget perspective, it's mainly IT. The chief data officer or chief digital officer is typically the driver to get the data into shape. But you need the OT people too, and also the data stewards. Because if data is not good enough and somebody has to validate it, you need people who understand the OT and IoT data. So we see a mixture there, to be honest.
Erik: Got you. It sounds like you're getting pretty close down to the edge. So where would you be deployed? On the sensor, on the hub or the gateway? Would you also be deployed in the cloud, cleaning up data after it arrives or making sure you're not duplicating data there? Where would you be touching the data?
Bert: Mainly, there's a big rejection of moving data, right? For example, if I bring in technology and I first tell you as a client that you need to move your data to Fabric, or to Amazon, or put it in Databricks, there's big resistance to that. So our philosophy from day one was: let the data reside where it's generated. So what we developed for time-series is a sort of agent. You can compare it with virus scanners: back in the day, you scanned for viruses everywhere you had systems. Here, you scan for quality everywhere the data resides, with our agents. That can be the edge, or the storage unit, or the cloud — data moved to the cloud, managed in the cloud. So we are very hybrid in the way we can scan the data. But what we do is propagate our data quality results up to a control tower; that's, of course, mainly cloud, right? It's a very important design criterion that we can scan data where it's generated. That can even be on the edge — on a connected car, or on a sensor, or an aggregator.
Erik: So a hybrid approach, orchestrated, but with certain principles about not moving data. I guess there's a cost perspective, a quality perspective, and a security perspective there. Of course, you're managing the quality of data, but I guess the cost is another issue, right? There's a question of: what data do we save? How long do we save it? So what are the different types of problems that you're focused on solving? Then maybe you can also help us understand the different levels you're operating at — there are probably different sets of problems by level.
Bert: Yeah, so what we see as a common thread is, of course, when you talk sensors and devices, you work with operations, right? A lot of Fortune 5000 companies have sensors in their fleet or in the heart of their operations. That is where we start in the business process. But as for the use cases, there are a couple of them. Of course, data quality should be a must-have. But if it's just, "Yeah, data quality, we understand" — where is it actually impacting us? That's the big question.
So I'll give you a couple of examples where it has no impact and why, and then a couple of examples where data quality has impact and why. Let's start with an industrial factory in mind. If there is, for example, a control room and an operator — a human in the loop — there's a DCS, SCADA, PLCs, a safety system, an alarm system. What we see is that, for factories, data is typically good enough for operations. So they don't care much about quality there, because that's the data they're operating the plant with. So there we don't see a good fit for data quality. But of course, once the data is in your database, it will be used for something.
There are typically use cases where we see it really having impact. One of them is, for example, data sharing. We see a lot of our clients with sensor and device data sharing that data: one, internally with their stakeholders, but also externally with suppliers and customers. Of course, if you share data with your suppliers or your customers, it should be spot on, right? Because you create a liability. So data sharing is a very important use case where we prove to them that data quality should be spot on.
Another one is dashboards that are public — for example, for government — like emissions data, or data on energy consumption. If I'm reporting energy, it should be right. There's also a governance reason why data quality matters for that kind of data product. Then there's a third bucket of use cases where companies, clients of ours, are ramping up with analytics and AI. They got a little bit stuck. They tried something, got stuck at proof of concept, and figured out that if they want to go from 1 asset to 20,000 or 30,000 assets, it will never scale, because the onboarding time, the data prep time, will take too long. They need something that helps them scale up their AI models. There we see data quality as a game-changing element. One, at training, you get better models. Second, you get team efficiency: data teams spend less time on cleansing and prepping data.
The third one is, sometimes these models in production, like predictive maintenance, generate false alerts. These alerts come because the input data, the sensor data, was not trustworthy. That's something we do at inference: we check data integrity at inference so that, for example, when a model is running, you can say, "Don't trust this prediction. It's a false alert, because your sensor is behaving wrong." So we see value there.
A fourth use case is indeed just embedded in data governance. We see a lot of companies buying data governance tools, creating data catalogs. Of course, there also needs to be an IT/OT counterpart in a data catalog. That is something we built for catalogs. So we try to be very ecosystem-friendly, not cannibalizing but just helping to unlock all these use cases.
Erik: So the first two seem quite straightforward. Those are kind of compliance use cases, right? So just hitting a certain accuracy level. The third one, maybe we can drill into that a little bit deeper. Because this is a topic that I've seen a lot of companies that we're working with struggle with. So things like this predictive maintenance. Working on a production line, you have 50 production lines across some plants. They all have similar hardware, so they should be the same more or less. But then, in the end, deploying a predictive maintenance solution ends up being 50 different projects for each production line. Then you kill the business case. So if we use that as a situation, how would you fit into that scenario?
Bert: So it's very important with those predictive maintenance companies: they would be a partner of ours, so we would work with them from an OEM perspective. That means the end customer does not know Timeseer is in the middleware of the PdM, the predictive maintenance solution. I think that's one thing to make clear. What's happening there is, indeed, you just want to move the needle at the different steps of the value chain. As always, the first thing is you have to build a model. And if you can create your model based on verified data, for example by Timeseer, what we've seen from results is much better model quality — sometimes up to 60%-70% better models, because they were trained on well-prepared data. It was already cleansed and properly pre-processed. So data quality there is paramount and moving the needle at training. That's training your model, right? That's the first thing.
The second one is scaling it up. If you don't have one asset but thousands, what we typically see is a struggle with too-long onboarding times: I always have to do the same training over and over. What helps there is data uniformization, a unified namespace. You see conventions there like data contextualization — adding context so you have better data models to train your AI models on. It all helps you move faster from one to a thousand assets, if that's the desire. So that's more the scaling element: how can you scale from one to x assets?
The third one is more about when the model runs in production. At inference, you see a lot of issues with scalability, but also with robustness and accuracy. There you need help moving away from the false alerts. Your model should only come forward when there is a real alarm — something really going on, not a sensor problem. But from a value perspective, we fit in and the end client does not know that Timeseer is used. You see dashboards with health scores of how good the data is. So it's completely integrated end to end. I also listened to a couple of the podcasts here; some of these companies working on predictive maintenance — well, they could embed Timeseer, right? So it's an OEM mindset that we have there.
(break)
If you're still listening, then you must take technology seriously. I'd like to introduce you to our case study database. Think of it as a roadmap that can help you understand which use cases and technologies are being deployed in the world today. We have catalogued more than 9,000 case studies and are adding 500 per month with detailed tagging by industry and function. Our goal is to help you make better investment decisions that are backed by data. Check it out at the link below, and please share your thoughts. We'd love to hear how we can improve it. Thank you. And back to the show.
(interview)
Erik: You mentioned earlier this concept of DataOps. How mature is that? Because I guess the concept stems from DevOps, which is quite well-structured, where people have received training and specialized to some extent in that capability. When you're working with companies today, are you finding a relatively unified understanding of how data should be managed and what DataOps should look like? Or are you finding that different companies are inventing their own processes? What's your sense of how mature that part of the equation is?
Bert: It's for sure maturing. Look at the popularity of use cases like asset models and digital twins, and topics like UNS, the unified namespace. We've seen vendors — automation vendors — adapting and understanding that it's really critical: that, for example, you need a unified namespace and good-quality data if you want to unlock all these use cases we discussed a couple of minutes ago. These are prerequisites. A couple of years ago, companies saw them as, "Yeah, those are projects. We don't have time. We don't have resources." But if you're really blocked in your attempt to scale up data products, to scale up analytics, and you figure out the why — the why being that our data is not good enough and not in the right structure — you need to solve it first, right, if you really want to move forward with your analytics and AI. So we see more and more companies understanding that now.
Erik: The reason I'm asking is that we also see quite a trend of operations building their own IT capability and trying to move away from reliance on the traditional IT department. I think in many companies, that's been seen as somewhat of a bottleneck in terms of rolling out new systems. But then, of course, there's this capability issue on the ops side. How do you see this? Maybe one extension of this could be the topic of generative AI, which I know will probably be late to come to operations. But already, I hear about companies using generative AI in some areas of their applications — as extensions for interpreting time-series data, for example. That would also enable the ops teams to be more independent when it comes to developing applications. So can you share your thoughts on that trend, and then maybe also on the trend of generative AI, if you think there's something interesting to share there?
Bert: Yeah, there's an interesting link to make. I indeed see a lot of generative AI blogs and posts on LinkedIn. When they frame manufacturing or IoT, it's always LLM-based, where you can interrogate and communicate with your results — which completely makes sense, by the way. But there's also a generative AI concept where, in time-series, foundational models are now popping up. You see more and more of these time-series foundational models. They can be used, for example, to create data. For example, if we identify that we have gaps, or we have data that's not good enough and we want to clean it — how will you clean? Well, that's also generative AI, right? You just apply foundational models. What I also see becoming popular is the concept of federated learning where, for example, you learn the behavior of one device across your entire fleet of assets. Of course, the more devices you see, the better models you can train. You can again use them for generative cleansing if there are gaps, for example. So for data cleaning, I see a big part of generative AI being used.
Second is metadata. There's a lot of metadata out there describing your sensor fleet — specifications, configurations — that is not good enough, not complete, or inconsistent. Of course, metadata is very critical if you want to move on to the next step. So for metadata, too, generative AI techniques are applied to start augmenting it, to create better context models, better metadata. So I see it used differently from the language models we typically talk about; I see it really used via time-series foundational models.
Erik: And for your tech stack, have you already started to adopt some of these foundational models?
Bert: Yes, for the augmentation and the cleansing. The traditional LLM stuff is used for communicating with Timeseer — like, what kind of data check do I need to apply, for example, or how do I need to interpret these results? So you can communicate with Timeseer using LLMs. Using the time-series foundational models is more for the augmentation. Because, if you see it like that, if data is used for billing, for arbitrage, or for sharing, and it's not good enough, you want conviction that the version you share is good enough. So you need to augment. It's really a must-have to unlock certain use cases to have a clean version of the data.
Erik: Got you. Are you using the same LLMs that somebody might be using for — I guess, language model, that's going to be separate. So are you using more open source, or what would be the LLMs that you'd be building on?
Bert: Open source, like Llama. We also use concepts like RAG, and we're looking at the tools people use to optimize LLM usage. So we're also experimenting and learning with that. For us, the learning is very critical, because you need to adapt to new assets and learn from other assets and all the prior events. So learning is critical.
Erik: Yeah, okay. Got it. Are you finding that they also deploy fairly well on the edge already or still a work in progress in terms of being able to deploy it?
Bert: Work in progress. Data scanning on the edge is mainstream; that's really possible already with current technologies. But the complete end-to-end cycle — that's still early. I see the majority of applications built on data stores: Databricks-oriented, or Fabric, or Amazon S3. So the majority is still built on cloud infrastructure.
Erik: Okay. Let's talk a little bit more then about the customers that you're working with. I mean, this is, to some extent, a horizontal problem. But are you finding specific industries, specific types of challenges, that end up being kind of the 80-20 or the 80% of your customer base, or is it just a very fragmented market?
Bert: The company's name is Timeseer, so it links with time series. When we started up, it was horizontal time series. But what we figured out is that we needed to narrow it down, and today that's OT and IoT — both. When you talk OT, you work more with manufacturing plants, utilities, pharmaceuticals, biotech, oil and gas, upstream, downstream. That's more the wired space. The moment you look at IoT, you end up more with connected things: connected assets, connected patients, connected athletes, connected washing machines — everything with a sensor on it. So we have a variety of clients, and as long as they have sensors in the heart of their operations, the time-series tech holds. And what we built from the start — first cleansing, then verification and validation — holds as long as we talk sensors and devices.
Erik: Okay. But that's an interesting point, right? You could have an F&B manufacturer who would deploy your solution across their facility, but a lot of equipment OEMs right now are also trying to differentiate their solutions by adding sensors and then capabilities on top of that. And, as much as possible, they want to keep some degree of control over the application. So are you finding a lot of those equipment OEMs or sensor OEMs becoming direct customers of yours?
Bert: Yeah, so that's what I would call XaaS, Everything as a Service — with, indeed, oil and gas companies, for example. We have a couple of names like Baker Hughes or Veolia who have their own assets and IoT devices and sell an application to their clients for a higher degree of service: for getting warnings on, say, empty tanks, or for helping with predictive maintenance. There, indeed, the quality part is critical, and you see Timeseer fit into their OEM solution. That's always linked with Everything as a Service. They're trying to monetize; they're trying to move from CapEx to OpEx, to charge their customers differently and go to an OpEx model. So I see that transition, and we sell, indeed, to these clients.
Erik: Yeah, there's maybe an offshoot that in my experience has been underdeveloped: this offshoot of data sharing, which is data monetization — somebody is developing data that has value for multiple stakeholders, but just sharing the bulk of the data would be challenging for regulatory or other reasons. It strikes me that your solution could also be applicable here, if you're able to specify what data can be shared with which stakeholders. Do you see any movement towards efforts to monetize data? It feels like one of those problems where the technical feasibility is there, but there are still a lot of legal or business challenges that need to be resolved.
Bert: Well, I see first use cases, but not in a way where I can already say, yeah, it's happening at scale. One example I can give is a chemical factory sharing data from their very complex analyzers with the sensor manufacturer. They found a win-win. They're sharing data back to the supplier, and what happens is they go from time-based to condition-based calibration. So that's a win-win in cost saving. The cost saving helps the chemical manufacturer, and the vendor of the sensors can give a better kind of service. So where I see data monetization being used is where sharing generates a win-win. That's what I'm already seeing: chemical companies experimenting with first steps. That's where it's happening.
Erik: Got you. Okay. So it's happening now within these very direct relationships from the operator to the OEM, but maybe there would be third parties that would also find some value in that — let's say financial services companies would love to have direct streams of data. I guess that's maybe a bridge too far right now, although I've heard of some companies that were working on that problem but never, to your point, found the right way to scale it up.
Bert, I feel like we've covered a good bit of ground here. What haven't we touched on yet that's important for folks to understand about time-series data?
Bert: If you analyze the data quality category, you will probably always end up with the horizontal vendors. What is maybe interesting to re-emphasize, or to talk about for a second, is: why is sensor data that different? Sometimes it takes the hard way to figure it out. One of the reasons is that you often don't have reference models. For example, take a relational database with names and addresses. We know what an address should look like. We know postal codes, so we know what's good and what's bad. But if we have time-series data from a sensor, how do we know it's good? Often you don't have that context of what is good and what is bad. You need the people who understand the data to make an interpretation. That is one.
What is also really different with time-series data is the lineage. The data is not in Databricks, right? It started on the edge, generated by a sensor. It came maybe via data storage, and it ends up in the cloud. It has a whole different kind of lineage, a different path it travels. So that is quite different. Also, with normal horizontal data quality solutions, the first triage — how you solve problems — is typically done by IT or similar people. When you talk sensors and there's something wrong, well, your triage is: I need to find an instrument technician who needs to go into the field and repair or check the sensor or do some maintenance. So the concept of first triage is completely different in the sensor world. That is why I see IoT and OT sensor data quality as something completely different from working on top of Databricks and Snowflake.
Erik: Yeah, great, and maybe even becoming more so, right? Because in the past, a lot of that data was either operated on quite simply at the edge, or it was transferred somewhere. Now we're really doing more complex things with AI on the edge, which requires different processes than were required in the past. Maybe a last question then. You guys are, what, about four years old now? I'm sure you have a very ambitious product development roadmap. If we look forward over the next year, what is on the roadmap for you guys?
Bert: What is on the roadmap? So this is a deep-tech product. Sometimes when you talk to investors, they have a SaaS mindset. This is not SaaS, because it's not something you knock together over a weekend. It really has deep AI components embedded, and making that robust and scalable for millions of sensors, not just a couple, takes a long time. We only shipped version one at the end of last year. What's really on the roadmap is incorporating more AI concepts into data quality. What I typically tend to say is: everybody today accepts that AI needs data quality, because without good data, you don't have scalable AI. But tackling data quality at scale also needs AI, and it comes in various forms — Copilots that help you interpret results, make smart summaries, help with configurations, or communicate with results. And the learning: can you learn from similar assets, even across clients? All these AI concepts — we really see they're game-changing and necessary to win this category. That is something we are heavily investing in and launching over the next couple of months.
Erik: Great. Well, it must be a super interesting time right now, right? I mean, AI is evolving very quickly in a number of different ways. So lots of change going on that you can benefit from. Bert, what is the best way for folks to reach out to you or to the team if they're interested in learning more?
Bert: There's, of course, our website, timeseer.ai. There's LinkedIn. We try to post a lot of content, because thought leadership is very important in early categories for educating prospective clients on why data quality is important and what use cases typically come up. We're also working on a product-led growth version that data scientists can use for free, just to play with Timeseer and experience the onboarding. So the best way is to engage directly — the website, contact us, or via LinkedIn, where there's interesting content.
Erik: Yeah, fantastic. Bert, thanks so much for your time today.
Bert: Welcome, Erik. I enjoyed this. Thank you.