ClickHouse

Overview
Headquarters: United States
Founded: 2021
Company type: Private
Revenue: $10-100m
Employees: 51-200
Website:
Twitter handle:

Company Description
ClickHouse is an open-source, column-oriented OLAP database management system that allows users to generate analytical reports using SQL queries in real-time. Its technology works 100-1000x faster than traditional database management systems and processes hundreds of millions to over a billion rows and tens of gigabytes of data per server per second.
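To make this concrete, the following is a minimal sketch of what a real-time analytical query against ClickHouse can look like from Python, using the clickhouse-connect client. The `page_views` table and its columns are illustrative assumptions, not part of the profile above.

```python
# A minimal sketch of a real-time analytical query against ClickHouse.
# Assumes a local ClickHouse server and a hypothetical `page_views` table.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# Aggregate a large table with a single SQL statement; ClickHouse's
# column-oriented storage reads only the columns the query references.
result = client.query(
    """
    SELECT
        toStartOfHour(event_time) AS hour,
        url,
        count() AS views
    FROM page_views
    WHERE event_time >= now() - INTERVAL 1 DAY
    GROUP BY hour, url
    ORDER BY views DESC
    LIMIT 10
    """
)

for row in result.result_rows:
    print(row)
```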
Case Studies.
Case Study
Leveraging ClickHouse for Efficient OpenTelemetry Tracing: A Resmo Case Study
Resmo, a tool that gathers configuration data from cloud and SaaS services via their APIs, faced a significant challenge in managing the large volume of network calls generated by collecting data from thousands of APIs. Traditional logging was too verbose and difficult to query, while aggregated metrics lacked the context needed to detect and diagnose specific issues. Resmo turned to tracing, which gave a clearer view of the flow of requests and their associated responses. However, the volume of spans generated by Resmo's data collection was enormous, and the usual remedy of sampling could create blind spots, making it difficult to catch issues on rarely executed, non-happy paths. Furthermore, many observability vendors charge by the number of ingested events and per gigabyte of data, which becomes costly without sampling, and only a few allow custom SQL queries on the data.
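Although the case study above describes only the challenge, a hedged sketch of the kind of custom SQL this situation calls for is shown below: ad-hoc queries over OpenTelemetry-style spans stored in ClickHouse to surface rare failures that sampling tends to hide. The `otel_spans` table and its columns are hypothetical, not Resmo's actual schema.

```python
# Sketch: ad-hoc SQL over OpenTelemetry-style spans stored in ClickHouse.
# The `otel_spans` table and its columns are illustrative assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# Surface the most frequent failing spans and their p99 latency over the
# last hour, the kind of "non-happy path" signal that sampling hides.
slow_failures = client.query(
    """
    SELECT
        span_name,
        count() AS failures,
        quantile(0.99)(duration_ms) AS p99_ms
    FROM otel_spans
    WHERE status_code = 'ERROR'
      AND start_time >= now() - INTERVAL 1 HOUR
    GROUP BY span_name
    ORDER BY failures DESC
    LIMIT 20
    """
)
print(slow_failures.result_rows)
```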
Case Study
Network Traffic Monitoring and Optimization for Telcos: A Case Study on BENOCS and ClickHouse
BENOCS, a company that provides network traffic optimization and monitoring for some of the world's largest telecommunications providers, faced the challenge of monitoring and analyzing massive amounts of data traffic. The data was not static but constantly moving through cyberspace, requiring the company to factor in time as an additional dimension. BENOCS Flow Analytics users needed to investigate incidents that occurred in specific time frames, necessitating fast access to specific time ranges while ignoring irrelevant data. The company also had to deal with the challenge of analyzing network traffic at high complexity and speeds, especially in diverse environments with asynchronous data feeds. Across different network setups, BENOCS had to unify the data sources and correlate the incoming network information.
Case Study
MessageBird's Transformation with ClickHouse: A Case Study on Enhanced Performance and Cost Efficiency
MessageBird, a cloud communications platform, processes billions of messages, calls, and emails for over 29,000 customers. The company heavily relies on data-driven insights for efficient operations, with ClickHouse, an analytical backend, playing a crucial role since 2017. However, MessageBird faced challenges with its initial setup on MySQL due to scalability and latency issues. The company needed a solution that could handle high-volume data ingestion, provide low response times, and support real-time analytics for customer-facing dashboards and APIs. Additionally, the company required a system that could monitor the delivery performance of SMS messages and promptly identify anomalies. The challenge was to find a solution that could meet these needs while also being cost-effective.
Case Study
Leveraging ClickHouse Kafka Engine for Enhanced Data Collection and Analysis: A Case Study of Superology
Superology, a leading product tech company in the sports betting industry, was faced with the challenge of effectively collecting and analyzing quantitative data to improve customer experience and business operations. The company needed to gather metrics such as app or site visits, customer clicks on specific pages, number of comments and followers in their social section, and various conversion events and bounce rates. The data collected varied in structure, requiring a dynamic approach to data collection and analysis. Superology was using Google Protocol Buffers (Protobuf) to collect this data, but needed a more efficient and scalable solution to handle the large volume of data and its dynamic nature.
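As an illustration of the technique named in the title, here is a hedged sketch of wiring a Kafka topic into ClickHouse with the Kafka table engine and a materialized view. The topic, table, and column names are assumptions; the engine settings shown (kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format, kafka_schema) are standard Kafka engine settings.

```python
# Sketch: consuming Protobuf-encoded events from Kafka into ClickHouse.
# Topic, table, and column names are illustrative assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# 1. A Kafka engine table that consumes Protobuf-encoded click events.
client.command(
    """
    CREATE TABLE IF NOT EXISTS clicks_queue
    (
        user_id String,
        page String,
        event_time DateTime
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'click-events',
             kafka_group_name = 'clickhouse-consumer',
             kafka_format = 'Protobuf',
             kafka_schema = 'click_event.proto:ClickEvent'
    """
)

# 2. A MergeTree table for long-term storage and querying.
client.command(
    """
    CREATE TABLE IF NOT EXISTS clicks
    (
        user_id String,
        page String,
        event_time DateTime
    )
    ENGINE = MergeTree
    ORDER BY (event_time, user_id)
    """
)

# 3. A materialized view that moves rows from the queue into storage.
client.command(
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS clicks_mv TO clicks
    AS SELECT user_id, page, event_time FROM clicks_queue
    """
)
```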
Case Study
Optimizing Customer-Facing Analytics with Luzmo and ClickHouse
Software applications generate terabytes of data that can be used to make informed decisions. However, transforming this data into visual insights for users can be a challenge. The task of delivering analytics in SaaS is two-fold: providing an easy-to-use, interactive, and personalized experience for end-users, and building a coherent and high-performing data architecture with tailored visualizations quickly and painlessly. The more advanced the analytics, the harder it becomes for developers to maintain. Additionally, there is the complexity of data security, ensuring that each user only has access to their personal data. The challenge lies in scaling tailored insights to hundreds or even thousands of users.
Case Study
Boosting Game Performance: ExitLag's Transition from MySQL to ClickHouse
ExitLag, a tool that optimizes the gaming experience for over 1,700 games across more than 900 servers worldwide, was facing performance issues with MySQL. Specific analytical queries for user behavior analysis and network route mapping were hitting bottlenecks and slowing down, especially as data volume increased. In its continuous effort to resolve common connection problems for gamers, ExitLag developed a sophisticated method of sending connection packets from users simultaneously over different routes, increasing the likelihood that each packet is delivered. However, the growing data volume was causing performance issues with their existing MySQL system.
Case Study
Accelerating GraphQL Hive Performance: Migration from Elasticsearch to ClickHouse
GraphQL Hive, an open-source tool for monitoring and analyzing GraphQL APIs, was facing significant scaling issues. The tool, which tracks the history of changes, prevents API breakage, and analyzes API traffic, was initially using Elasticsearch for data storage. However, as the volume of data increased, the average response time began to slow down significantly. Additionally, the indexing process was problematic, with larger users affecting the query performance of smaller users. Despite attempts to improve performance by creating an index per user, the overall speed of Elasticsearch was still below expectations. The team at The Guild, the company behind GraphQL Hive, also found the JSON-based query language of Elasticsearch challenging, as they were more familiar with SQL.
Case Study
Plausible Analytics Leverages ClickHouse for Privacy-Friendly Web Analytics
Plausible Analytics, a privacy-friendly alternative to Google Analytics, faced a significant challenge as it scaled its services. Since its launch in April 2019, the platform had grown to service over 5000 paying subscribers, tracking 28,000 different websites and more than 1 billion page views per month. However, the original architecture using Postgres to store analytics data was unable to handle the platform’s future growth. The loading speed of their dashboards was slow, taking up to 5 seconds, which was not conducive to a good user experience. The team realized that to continue their growth trajectory and maintain customer satisfaction, they needed a more efficient solution.
Case Study
Coinpaprika Enhances Cryptocurrency Data Aggregation with ClickHouse
Coinpaprika, a leading cryptocurrency market data platform, was facing challenges with their existing data management system. They were using InfluxDB for their time-series data and MySQL for transactional data. However, as their data volume grew, they encountered several issues with InfluxDB. The team found it difficult to extract useful metrics from the system, and extending the timeframe for queries often led to server overload. They also experienced problems with response times due to merging data blocks. The open-source version of InfluxDB lacked built-in replication and scalability, which were critical for Coinpaprika's infrastructure. Coinpaprika needed a solution that could handle their increasing data volume, provide useful metrics, and offer improved performance and scalability.
Case Study
ClickHouse: The Backbone of Dassana's Security Data Lake
Modern enterprises are investing heavily in security products due to the increasing cyber risks and their impact on businesses. A typical large enterprise today uses more than a dozen security technologies, which emit data in various shapes and sizes, making it difficult to make sense of the data. Security Information and Event Management (SIEM) systems, designed for immutable time series event data, struggle with the mutable nature of security data. For instance, the state of an alert could change from 'open' to 'closed', and SIEMs cannot update this change. The solution is to re-insert the updated data and query the most recent data, which is challenging on append-only systems like SIEMs. Additionally, SIEM companies have stopped innovating and investing in solving basic problems such as data normalization. Dassana, a security data lake, aims to address these challenges.
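To illustrate the re-insert-and-query-latest pattern described above, here is a minimal sketch using a ReplacingMergeTree table and an argMax query; the `alerts` table and its columns are hypothetical and not taken from Dassana's actual schema.

```python
# Sketch: modeling mutable alert state on an append-only store.
# Re-insert new versions of each alert and read back only the latest one.
# Table and column names are illustrative assumptions.
from datetime import datetime
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command(
    """
    CREATE TABLE IF NOT EXISTS alerts
    (
        alert_id String,
        status String,
        updated_at DateTime
    )
    ENGINE = ReplacingMergeTree(updated_at)
    ORDER BY alert_id
    """
)

# An alert changes state from 'open' to 'closed': append a new row
# rather than updating in place.
client.insert(
    "alerts",
    [
        ["alert-42", "open", datetime(2024, 1, 1, 10, 0)],
        ["alert-42", "closed", datetime(2024, 1, 2, 9, 30)],
    ],
    column_names=["alert_id", "status", "updated_at"],
)

# Query the most recent state per alert, regardless of merge progress.
latest = client.query(
    """
    SELECT alert_id, argMax(status, updated_at) AS current_status
    FROM alerts
    GROUP BY alert_id
    """
)
print(latest.result_rows)
```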
Case Study
DeepL’s Transformation Journey with ClickHouse: A Case Study
DeepL, a language translation service, was looking to enhance its analytics capabilities in a privacy-friendly manner in 2020. The company wanted to self-host a solution that could handle large amounts of data and provide quick query times. They evaluated several options, including the Hadoop world, but found it too maintenance-intensive and time-consuming to set up. DeepL also wanted to automate the process of changing table schemas when frontend developers created new events, which would have otherwise overwhelmed the team. The company needed a system that could handle complex events and queries to understand user interactions, something that traditional tools like Google Analytics couldn't provide. Additionally, DeepL wanted to maintain full control over the data while keeping user privacy in mind.
Case Study
DENIC Improves Query Times by 10x Leveraging ClickHouse
DENIC eG, the administrator and operator of the German namespace on the Internet, was limited in its ability to improve the internet community's user experience because of constraints in its data analytics. The data relevant to their analytics was spread across relational databases, server log data, and various other sources. These sources were already used for monitoring and system improvements, but their analytical features were limited, and cross-evaluations spanning a wide range of sources were costly or not feasible. The initial version of the data science platform was built on a relational DBMS: data from the different sources was consolidated by Python agents running in containers on Kubernetes, and the results were written to target tables in the database. This approach produced a considerable number of target tables and containers, which were difficult to administer and became overcomplicated. Moreover, the relational database scaled poorly to larger data volumes, with a single query taking anywhere from several minutes to hours to process.
Case Study
HIFI's Transition from BigQuery to ClickHouse for Enhanced Music Royalty Data Management
HIFI, a company providing financial and business insights to music creators, was facing challenges with its data management system. The company ingests a significant amount of royalty data, with a single HIFI Enterprise account having over half a gigabyte of associated royalty data representing over 25 million rows of streaming and other transaction data. This data needs to load into the user interface as soon as a customer logs in, and there can be multiple customers logging in simultaneously. Previously, it could take up to 30 seconds to load the data, and sometimes it would not load at all due to timeouts. HIFI was using Google Cloud's BigQuery (BQ) to store royalty data, but the pricing structure of BQ was a major challenge. It discouraged data usage and contradicted HIFI's data-driven values. Google's solution to purchase BQ slots ahead of time was not feasible for HIFI as a startup, as usage patterns could change dramatically week to week.
Case Study
Instabug's Successful Migration to ClickHouse for Enhanced APM Performance
Instabug, an SDK that provides a suite of products for monitoring and debugging performance issues throughout the mobile app development lifecycle, faced significant challenges with its performance metrics. These metrics depend on a very high volume of frequently emitted events, which made receiving and efficiently storing them difficult. In addition, the raw format of performance events was not directly useful to users, requiring heavy business logic for querying and data visualization. Instabug's backend operates at large scale, with APIs averaging approximately 2 million requests per minute and terabytes of data flowing in and out of their services daily. When building their Application Performance Monitoring (APM) product, they realized it would be their largest-scale product in terms of data: roughly 3 billion events stored per day at a rate of about 2 million events per minute. They also had to serve complex data visualizations that depended heavily on filtering large amounts of data and computing complex aggregations quickly enough for a responsive user experience. Initially, they designed APM like their other products, but Elasticsearch ran into performance issues, especially for reads, and writes were also not fast enough to handle their load.
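As a hedged sketch of the write and read patterns such a workload implies (not Instabug's actual implementation), the example below batches events into large inserts and runs a typical dashboard-style aggregation; all table and column names are assumptions.

```python
# Sketch: batching high-frequency events into large inserts, the usual write
# pattern for sustained high throughput in ClickHouse. Names are illustrative.
from datetime import datetime
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command(
    """
    CREATE TABLE IF NOT EXISTS apm_events
    (
        app_id String,
        event_type String,
        duration_ms UInt32,
        event_time DateTime
    )
    ENGINE = MergeTree
    ORDER BY (app_id, event_time)
    """
)

# Accumulate events in memory and flush them as one insert; many small
# single-row inserts would create excessive merge pressure.
batch = [
    ["app-1", "cold_app_launch", 850, datetime.now()],
    ["app-1", "network_request", 120, datetime.now()],
    ["app-2", "ui_hang", 2300, datetime.now()],
]
client.insert(
    "apm_events",
    batch,
    column_names=["app_id", "event_type", "duration_ms", "event_time"],
)

# A typical dashboard aggregation: p95 latency per event type, last 24 hours.
summary = client.query(
    """
    SELECT event_type, quantile(0.95)(duration_ms) AS p95_ms, count() AS events
    FROM apm_events
    WHERE event_time >= now() - INTERVAL 1 DAY
    GROUP BY event_type
    """
)
print(summary.result_rows)
```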
Case Study
Opensee: Harnessing Financial Big Data with ClickHouse
Opensee, a financial technology company, was founded by a team of financial industry and technology experts who were frustrated by the lack of simple big data analytics solutions that could efficiently handle their vast amounts of data. Financial institutions have always stored large amounts of data for decision-making processes and regulatory reasons. However, since the financial crisis, regulators worldwide have significantly increased reporting requirements, insisting on longer historical ranges and deeper granularity. This has led to an exponential increase in data, forcing financial institutions to review and upgrade their infrastructure. Unfortunately, many of the storage solutions, such as data lakes built on a Hadoop stack, were too slow for at-scale analytics. Other solutions like in-memory computing solutions and query accelerators presented issues with scalability, high hardware costs, and loss of granularity. Financial institutions were thus forced into a series of compromises.
Case Study
Building a Unified Data Platform with ClickHouse: A Case Study on Synq
Synq, a data observability platform, faced the challenge of managing the complexity, variety, and increasing volumes of data that powered their software system. The company needed to merge operational and analytical needs into a unified data platform. They were dealing with a continuous stream of data from dozens of systems, with frequent bursts of volume when customers ran large batch processing jobs or when new customers were onboarded. The company had set ambitious performance goals for backfilling data and wanted to provide immediate value to customers as they onboarded their product. They also wanted an infrastructure that could serve their first set of defined use cases and provide functionality to support new use cases quickly. Lastly, they aimed to build a single platform that could store their raw log data and act as a serving layer for most data use cases needed by their applications and APIs.
Case Study
Leveraging Zing Data and ChatGPT for Mobile Querying and Real-Time Alerts in ClickHouse
Many companies use ClickHouse for its ability to power fast queries. However, the process of having an analyst write a query, create a dashboard, and share it throughout the organization can add significant delay to getting questions answered. This challenge is compounded by the fact that many business intelligence (BI) tools require someone at a computer to pre-create dashboards or limit users to certain filters. Furthermore, the need for real-time alerts and the ability to query based on a user's current location are increasingly important in today's fast-paced business environment.
Case Study
Real-Time Analytics Enhancement for Adevinta with ClickHouse Cloud
Adevinta, a global online classifieds specialist, operates over 25 platforms across 11 countries, reaching hundreds of millions of users monthly. Their mission is to provide the best user experience for buying and selling goods and services online. To achieve this, they needed a centralized analytics and dashboarding tool to monitor their sellers' advertisements, track interactions, and improve performance in real-time. The Central Data Products team at Adevinta was tasked with building data and machine learning products to support the various marketplaces. They faced the complex challenge of finding a solution that could scale, deliver end-user-facing analytics with low latency and high throughput, and account for aspects such as reusability and uptime. Adevinta required a user-facing, real-time analytics and dashboarding solution that would let sellers monitor their advertisements in real-time, tracking views, favorites, and likes, and capturing every interaction that occurs on the marketplaces.
Case Study
AdGreetz's Transformation: Processing Millions of Daily Ad Impressions with ClickHouse Cloud
AdGreetz, a leading AdTech and MarTech personalization platform, specializes in creating and distributing millions of intelligent, data-driven, hyper-personalized ads and messages across 26 diverse channels. The company processes millions of ad impressions daily, which necessitates a high-performance, cost-effective solution for their data storage and analytics needs. Initially, AdGreetz used AWS Athena for their data processing needs, but it failed to meet their increasing performance and data demands. They then turned to Snowflake, but the cost proved to be prohibitive for their data volume and query performance. The company needed a solution that could handle their vast data volume, provide quick query times, and fit within their budget.
Case Study
Admixer's Transformation: Handling Over 1 Billion Unique Users a Day with ClickHouse
Admixer, an Ad-Tech company, was facing a significant challenge in managing the increasing load on their advertising exchange platform. Initially, the platform was based on the sale of local inventory by external DSPs, but as it began to aggregate the traffic of external SSPs, the load on their processing and storage increased significantly. By the end of 2016, the share of external inventory had risen from 3% to over 90%, translating to an increase from 100 million to 3 billion requests. The existing relational databases could not handle the massive influx of inserts for statistics records. Furthermore, the company was using Azure Table Storage for storing and issuing statistics, but as the number of transactions and the amount of data increased, this solution became suboptimal due to the charges for the number of transactions and the amount of data. Admixer needed a solution that could display real-time advertising transaction statistics, handle a significant amount of data for insertion, aggregate received data, scale the data warehouse as requests grew, and provide full control over costs.
Case Study
Contentsquare's Successful Migration from Elasticsearch to ClickHouse: A Case Study
Contentsquare, a SaaS company, was facing significant challenges with its existing Elasticsearch setup. The company had 14 Elasticsearch clusters in production, each with 30 nodes. However, they were struggling with horizontal scalability, as they were unable to assemble larger clusters and maintain their stability for their workload. This limitation in cluster size meant that they could not handle any tenant that would not fit into a single cluster, severely restricting their ability to grow. The upper bound on the amount of traffic they could handle was slowing down the company's growth for technical reasons, which was unacceptable. They were left with two options: either find a way to host each tenant efficiently in a multi-cluster setup or migrate to a more scalable technology.
Case Study
ClickHouse: Powering Darwinium's Security and Fraud Analytics
Darwinium, a digital risk platform, was facing several challenges in the security and fraud domain. The platform needed to ingest and process data at high throughput, handle large volumes of data, and support complex analysis. The database backend had to sustain high-speed writes and serve data for analysis as soon as it was ingested. Darwinium's real-time engine continuously profiles and monitors digital assets, producing large volumes of data, and the database needed to analyze that data at scale, potentially processing an entire year's worth at once. Because technical fraud and security investigations require retaining most digital datapoints, and analyzing fraudulent activity demands complex interactive analysis, Darwinium needed a database that could respond within about one second while providing a feature-rich functional toolbox.
Case Study
Integrating ClickHouse and Deepnote for Enhanced Collaborative Analytics
The challenge at hand was to provide a seamless and efficient platform for teams to discover and share insights from their data. The existing systems lacked a central place for collaboration and efficient work on data science projects. Moreover, the transitions between Python and SQL were not smooth, requiring a Python connector. There was also a need for a SQL editor with features like formatting, autocomplete, and linting right in the notebook.
Case Study
High-Speed Content Distribution Analytics for Disney+ with ClickHouse
Disney+'s Observability team was faced with the challenge of processing and analyzing access logs for their content distribution system. The team had to deal with a massive amount of data generated by the users of Disney+, which required a highly scaled and distributed database system. The existing solutions, such as Elasticsearch, Hadoop, and Flink, were not able to handle the volume of data efficiently. Elasticsearch, for instance, required a lot of rebalancing and used a Java virtual machine, adding an unnecessary layer of virtualization. The team was struggling to ingest all the logs due to the size of the data.
Case Study
Highlight.io's Observability Solution Powered by ClickHouse: A Comprehensive Case Study
Highlight.io, an open-source observability platform, initially focused on session replay and frontend web development features. However, as the need for full-stack observability grew, the platform needed to expand its offerings. This expansion was necessary to enable developers to track user experiences within web apps, identify backend errors, and analyze associated logs across their infrastructure. The challenge was to integrate these features into a single-pane view to streamline the troubleshooting process. Furthermore, the platform aimed to add logging capabilities to its stack, powered by ClickHouse, to provide deeper insights into applications by capturing and analyzing server-side logs. The goal was to handle high data ingestion rates and ensure that developers could access up-to-date information in real-time.
Case Study
Harnessing ClickHouse and Materialized Views for High-Performance Analytics: A Case Study of Inigo
Inigo, a pioneering company in the GraphQL API management industry, was in search of a database solution that could handle a high volume of raw data for analytics. They explored various alternatives, including SQLite, Snowflake, and PostgreSQL, but none of these options met their needs. Snowflake was too slow and costly for their needs, especially when handling real-time customer data within a product. PostgreSQL, while an excellent transactional database, was unsuitable for large analytic workloads. The company was able to get it to work with around 100K rows, but past that, the indexes were growing out of control and the cost of running a PostgreSQL cluster didn’t make sense. There was significant performance degradation once they hit the 100K - 1M rows mark. Inigo needed a solution that could handle billions of GraphQL API requests, create aggregated views on high cardinality data, generate alerts, and create dashboards based on extensive data.
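Since the title highlights materialized views, here is a hedged sketch of pre-aggregating high-cardinality request data with a ClickHouse materialized view feeding an AggregatingMergeTree table; the table and column names are illustrative and not Inigo's actual schema.

```python
# Sketch: pre-aggregating high-cardinality request data with a materialized
# view. Raw rows land in `graphql_requests`; the view incrementally maintains
# hourly aggregate states in `graphql_requests_hourly`. Names are illustrative.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command(
    """
    CREATE TABLE IF NOT EXISTS graphql_requests
    (
        operation_name String,
        client_id String,
        duration_ms UInt32,
        event_time DateTime
    )
    ENGINE = MergeTree
    ORDER BY (operation_name, event_time)
    """
)

client.command(
    """
    CREATE TABLE IF NOT EXISTS graphql_requests_hourly
    (
        hour DateTime,
        operation_name String,
        requests AggregateFunction(count),
        clients AggregateFunction(uniq, String)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY (hour, operation_name)
    """
)

client.command(
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS graphql_requests_hourly_mv
    TO graphql_requests_hourly
    AS SELECT
        toStartOfHour(event_time) AS hour,
        operation_name,
        countState() AS requests,
        uniqState(client_id) AS clients
    FROM graphql_requests
    GROUP BY hour, operation_name
    """
)

# Dashboards read the compact aggregate table instead of billions of raw rows.
summary = client.query(
    """
    SELECT
        hour,
        operation_name,
        countMerge(requests) AS requests,
        uniqMerge(clients) AS unique_clients
    FROM graphql_requests_hourly
    GROUP BY hour, operation_name
    ORDER BY hour DESC
    LIMIT 10
    """
)
print(summary.result_rows)
```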
Case Study
Juspay's Real-Time Transaction Analysis and Cost Reduction with ClickHouse
Juspay, an Indian fintech company, is responsible for over 50 million daily transactions for clients such as Amazon, Google, and Vodafone. As a pioneer in the payments industry, Juspay's mission is to streamline online payments for merchants, acting as an intermediary between payment providers and merchant systems. To ensure a seamless transaction environment, Juspay needed to provide monitoring and analytics services to guarantee the performance of the payment system. With a diverse array of merchants and their ever-evolving needs, Juspay had to keep up with frequent releases, often multiple releases daily. They needed a monitoring solution that would enable them to maintain their rapid release pace while ensuring that the latest release did not impact the running payment systems. Furthermore, Juspay was facing high operating costs with their previous solution, BigQuery from GCP, which was costing over a thousand dollars per day.
Case Study
Driving Sustainable Data Management with ClickHouse: Introducing Stash by Modeo
Modeo, a French data engineering firm, faced the challenge of managing increasing data volumes while accurately measuring the real-time carbon emissions generated by data usage and storage. This was part of their Corporate Social Responsibility initiative focusing on climate change. The company needed a solution that would allow their customers to monitor and optimize their data platform's cost, carbon footprint, and usage. The challenge was to balance the growing data volumes with the need to minimize environmental impact, a complex issue in the field of data engineering.
Case Study
Supercolumn: NANO Corp.'s Journey from Experimentation to Production with ClickHouse
NANO Corp., a French startup founded in 2019, was on a mission to revolutionize network probes. They aimed to create versatile, lightweight probes capable of handling bandwidths up to 100GBit/s on commodity hardware. Their vision was to offer a new kind of observability, one that combined network performance and cybersecurity. However, to fully utilize the potential of their network probes, they needed a robust database. The database had to handle fast and constant inserts, run periodic queries for alerting and custom queries launched by multiple users, and manage large volumes of data efficiently. It also needed to have a hot/cold data buffering system, be easy to maintain and deploy, and be efficient in RAM usage. Traditional RDBMS, which their main engineers had used in their previous careers, were not up to the task. They were too reliant on update speed and required clustering when overall performance became an issue. NANO Corp. needed a database as groundbreaking as their probe.
Case Study
OONI's Transformation: Enhancing Internet Censorship Measurement with ClickHouse
The Open Observatory of Network Interference (OONI) is a non-profit organization that provides free software tools to document internet censorship worldwide. Their tools allow users to test their internet connection quality, detect censorship, and measure network interference. However, OONI faced significant challenges in handling the vast amounts of data generated from these tests. They initially used flat files, MongoDB, and PostgreSQL to store metadata from measurement experiments. As the dataset grew into hundreds of millions of rows, performance issues arose, requiring a shift from an OLTP database to an OLAP one. OONI needed a solution that could simplify their architecture while handling complex data visualizations and enabling searches and aggregations on their 1B+ row dataset.
Case Study
QuickCheck's Transformation of Unbanked Financial Services Using ClickHouse
QuickCheck, a Fintech startup based in Lagos, Nigeria, is on a mission to provide financial services to over 60 million Nigerian adults who are excluded from banking services and 100 million who do not have access to credit. The QuickCheck mobile app, which has been downloaded by more than 2 million people and has processed over 4.5 million micro-credit applications, leverages artificial intelligence to offer app-based neo-banking products. However, the company faced challenges in analyzing the vast amount of financial data, fraud analysis, and monitoring data. They needed a solution that could handle hundreds of thousands of rows of data loaded daily for portfolio risk analysis and financial metrics building.
Case Study
TrillaBit Leverages ClickHouse for Enhanced Analytics and Reporting
TrillaBit, a dynamic SaaS platform for reporting and business intelligence, initially used Apache Solr as its data backend. However, they soon encountered several challenges. Solr, designed primarily as a search engine, was better suited to search than to high-volume, non-linear aggregation or to the data compression needed for performance. Its query language was not as mature as SQL, and it did not handle joins effectively. When working with real company data from various sources, TrillaBit found that different scenarios demanded more flexibility. They needed a solution that could be run at low cost and deployed within their own environment for hands-on experience and understanding. Popular contenders like Snowflake, however, were too expensive and did not allow a full on-prem implementation.