Case Studies.
Our Case Study database tracks 22,657 case studies in the global enterprise technology ecosystem.
Filters allow you to explore case studies quickly and efficiently.
Reducing Operational Overhead and Building Value through Improved Visibility
As Arc XP grew, operational overhead increased, and it became more challenging for the Arc XP team to keep track of their environment and meet SLOs. The team needed to find a monitoring tool that could work in their complex environment and allow the organization to focus on building value for its customers. The company’s cloud infrastructure was expanding to multiple regions around the globe, and adequate visibility was vital to ensure a high-quality experience for its customers. Furthermore, the engineering teams wanted a better tool to help them with technical support and IT operations. Engineers had neither a universal, proactive alerting system that could inform them as soon as issues arose nor a solution that could help them diagnose issues quickly. These technical obstacles, as significant as they were, ultimately presented the organization with an even more fundamental business challenge. The more time the Arc XP team spent bogged down in operations and support, the less it was using its strengths to pursue its organization’s founding mission.
Ensuring a Highly Available and Highly Scalable Platform
Braze, a cloud-based company that develops customer relationship management software, was facing challenges in scaling their systems and resolving customer support tickets. The company processes more than 8 billion API requests and sends more than 2.7 billion daily messages to a network of over 2.4 billion monthly active users. With a growing engineering organization, teams had different techniques and tools for evaluating the performance of their applications and scaling their systems. The organization needed a uniform way of determining whether provisioned infrastructure was appropriate for the traffic, forecasting future needs, and investigating performance-related issues. Additionally, many technical customer support tickets were escalated directly to the Product and DevOps engineering teams, effectively bypassing the Global Services and Support team when the questions were about performance, uptime, or throughput. This led to distractions for engineering and deprived Support and Success of the ability to quickly resolve customer tickets.
Toyota deploys at scale faster and more securely by monitoring AWS with Datadog
Toyota Motor North America (TMNA) began using Amazon Web Services (AWS) in 2015 to simplify and standardize application development in the cloud and improve time to market. However, the team lacked a consistent monitoring tool, which created reliability concerns. Some developers used open source tools, others used log management tools, and some didn't use anything. As a result, team members often spent multiple hours trying to get to the bottom of an outage because they didn’t know what to look for or where. With 1,600 total applications (300 in the cloud) and more than 100 teams, that was a challenging task. On top of gaining unified visibility, the cloud platform team also sought to improve mean time to detection (MTTD) and ensure they could meet SLAs for 99.9 percent uptime while simultaneously reducing costs and helping engineers become more efficient.
Easy-to-Use, Scalable Monitoring with Minimal Maintenance
Vistar Media, a company providing a programmatic platform for the digital out-of-home advertising industry, was facing a challenge as its infrastructure grew. They needed a powerful way to monitor their real-time service but didn't want to spend time managing their own monitoring solution. They were looking for a scalable, low-overhead solution that would integrate with their existing services and databases, such as Python, StatsD, Amazon EC2, and Amazon S3. This would allow their teams to focus on serving clients. In their previous setup, they had relied heavily on the open-source project Graphite and had more than 12 servers solely dedicated to collecting metrics for monitoring. As the company grew, they had to expend an increasing amount of time to make sure that the monitoring servers scaled at the same rate as the company.
Growing a Global Company
Fintonic, a personal financial planning and mobile banking service, was experiencing rapid growth and expansion into new markets. They adopted a microservices-based architecture built on Kubernetes to facilitate their global expansion. However, their existing monitoring tools made onboarding slow: new engineers had to learn a specialized query language and manually correlate logs across multiple tools. Teams also struggled with alert coverage and prioritization because their monitoring tools could not target alerts based on tags. To replicate in Chile and Mexico what they had built for Spain, they needed to adopt a stateless architecture with Kubernetes, Terraform, and Ansible.
Charm Industrial uses Datadog to access critical data in real time as they reduce the effects of climate change
Charm Industrial’s goal is to reduce the effects of global warming and climate change. Accomplishing that goal will require Charm to sequester gigatons of carbon dioxide (CO₂) from the atmosphere annually using a fleet of fast, mobile pyrolyzers. Charm will eventually operate tens of thousands of pyrolyzers 24/7. For Edward Young, Head of Software and Electronics/Staff Scientist at Charm, this presented a significant challenge. “When you have tens of thousands of systems, you can’t have operators at every single site,” he says. “To scale the business we needed a way to simultaneously monitor numerous systems in real time remotely.” Charm's pyrolyzer systems use high temperatures to decompose agricultural and forest biomass residue and convert it into bio-oil for use in carbon removal. These systems perform various jobs and have demanding safety standards. Each system includes sensors that measure critical data—such as temperature and pressure—to ensure Charm does not exceed safety thresholds. The team needs to monitor all that data in real time.
Alpiq streamlines MuleSoft monitoring with the Datadog Marketplace
Alpiq, a Swiss energy services provider and electricity producer, was using five separate monitoring tools, which reduced productivity, increased costs, and prevented it from achieving end-to-end visibility. As part of a company-wide cloud migration initiative, the team also needed to migrate from its Tibco on-premises integration platform to the cloud-based MuleSoft platform and enable MuleSoft monitoring within the Datadog environment. Without proper visibility into its integration platform’s performance, Alpiq’s team would struggle to promptly integrate apps and troubleshoot issues, potentially impacting essential functions like trading and power plant operations.
A Proactive Approach to Data-Driven Observability
Compass, a real estate brokerage company, was experiencing exponential growth and needed a sustainable monitoring strategy that could scale along with the engineering organization. They were using a suite of different monitoring products, which meant that engineers typically had to loop through multiple tools to solve one problem. This constant context switching created friction between teams and contributed to engineer burnout. Additionally, their point solution for monitoring the frontend stack introduced excessive administrative overhead due to its poor support for user provisioning and often generated false positives due to poor configuration. As Compass quickly expanded its engineering team, this process became a bottleneck and was no longer acceptable.
Cloud Evolution
As Nordcloud expanded its reach throughout Europe, it faced the challenge of developing its next-generation cloud services. The company recognized that some needs weren’t easily or efficiently tackled in-house. Nordcloud sought Independent Software Vendors (ISVs) with a natural alignment to fill specific gaps in its offerings. It needed a partner to help extend its cloud solutions services, particularly in the area of monitoring. Monitoring a cloud solution is the final stage of a managed service environment and is key to reducing costs and SLA failures. Nordcloud needed to address situations where clients have multitenant environments or need more visibility than open source tools or legacy network monitoring tools could provide.
Game Server Monitoring
EA DICE was preparing for the scale of traffic they expected following Battlefield V’s beta launch. They wanted to plan the launch to ensure stability and low latency and be confident that their customers could enjoy the new game with no interruptions. The game server team was continually on the lookout for a central log management solution to complement their infrastructure monitoring with Datadog, and they had evaluated a number of logging solutions. From the outset, they were only interested in a solution that could provide insight into their logs without their team needing to run and maintain the logging system or incur any other overhead. A second requirement was finding a cost-effective logging solution for game server monitoring due to the large volume of logs. Finally, they wanted a logging solution that integrated with everything in their tech stack.
Partnership Brings Joint Success in LATAM
Econocom, a B2B reseller and technology consulting company, needed a monitoring solution to support the modernization and cloud migration projects of their customers. They required a solution that could deploy quickly and easily, allowing them to focus on growing their high-value-add consultative services. Econocom was looking for a product that could cover any customer’s stack and could be implemented quickly to accelerate their deal cycles. They also needed a modern monitoring platform designed to facilitate DevOps-style communication within ephemeral, cloud-based and Kubernetes-based systems.
Eight Sleep achieves end-to-end observability with Datadog
Eight Sleep, a sleep fitness company based in New York City, was in need of a robust observability solution. The company wanted to better understand how users were experiencing its app and prevent issues before they occurred. At the time, Eight Sleep used a solution that performed uptime testing for a few public endpoints, but it was too basic and lacked configurability. Engineers often got paged in the middle of the night for what ultimately proved to be false alarms. With a small development team, Eight Sleep needed a tool that could help it accomplish tasks quickly and easily. The company's competitors had three to four times as many engineers, so the tool had to deliver on its promises with minimal work required on Eight Sleep's side.
Glovo builds an enterprise-wide culture of security with Datadog Cloud Security Management
Glovo, an on-demand delivery service, was facing the challenge of securing their cloud infrastructure with limited resources. As the company grew its feature set and AWS cloud environment, Eloi Barti, Head of Platform Security at Glovo, wanted to scale security at the same rate. To do so, Barti sought to bring a culture of security to the forefront for engineering teams. However, when it came to security, Barti said Glovo used several different tools, and engineers didn’t know which security tool to look into when an incident occurred and required investigation. Engineers had to search through the various tools, quickly acclimate themselves to the context the tool provided, and manually correlate disjointed fields to understand if alerts were false positives or true security incidents.
Prevent Future Technical Issues by Centralizing Alerts, Events, and Metrics
CircleCI's team was using several patched-together monitoring tools. As CircleCI's application infrastructure scaled, it became tedious to track the health and performance of their servers, databases, and other IT components as they had to spend hours every week manually correlating the outputs of their existing monitoring solutions. The final straw occurred when CircleCI missed an outage that should have been caught early by its monitoring system. Lowe knew then that he had “hit the limit with [their] tools” and needed to implement a more effective and sensitive monitoring solution that would scale automatically with CircleCI’s growth.
Improving Application Performance and DevOps Collaboration with a Unified Monitoring Platform
HashiCorp’s self-hosted monitoring tools had poor usability, which led to a lack of visibility into their systems. This left engineers without real-time feedback on new product features and ill-equipped to effectively troubleshoot issues. The limited access to real-time monitoring and alerting hindered the team’s responses to issues, causing unnecessary delays in incident diagnosis and resolution. Without the ability to track and compare current and historical states, troubleshooting became a reactive, time-consuming, and tedious task.
Complete Observability of IoT Systems
Automotus, a curb management company, needed a robust monitoring solution that would provide visibility into both their IoT devices and their scaling cloud resources. Their manual and reactive approach to monitoring was proving to be inefficient. They were unable to collect important hardware metrics, such as network throughput, I/O load, and memory, which meant they often missed the first signs of degraded device performance. If their devices stopped sending messages, they were forced to SSH into the system and sort through logs by hand, which was an extremely time-consuming process that required all hands on deck. They also didn't have visibility into the management and backend services that are crucial to their system, such as AWS IoT Core. These problems were compounded by the absence of a centralized platform to view and analyze this data in context. The resulting blind spots stymied their troubleshooting process, leaving them to cross their fingers that nothing would go wrong.
Datadog Helps SNCF Take High-Speed Track to Digital Transformation
SNCF, France’s state-owned railway operator, embarked on a major digital transformation initiative in 2016. The goal was to update its IT infrastructure and improve its competitiveness by migrating 90% of its applications to the cloud and embracing PaaS and containerization. However, SNCF discovered that it had no coordinated approach to monitoring. Business units had been adopting monitoring solutions independently, leading to the company using a total of 11 different monitoring tools. This lack of a single, standard monitoring tool severely restricted the scope of what each team monitored, making it difficult for different IT teams to cooperate on shared problems. This was a clear impediment to the organization’s goal to improve its competitiveness and agility. Additionally, SNCF’s existing monitoring tools weren’t cloud-native, leading to user friction and extra administrative overhead.
Building a large-scale, high-throughput platform with Datadog APM and Continuous Profiler
Cvent, a market-leading meetings, events, and hospitality SaaS provider, had to pivot their entire product roadmap strategy to focus on building a new solution for virtual, hybrid, and in-person events due to the global COVID-19 pandemic. They planned to launch their new platform, the Cvent Attendee Hub, at Cvent CONNECT, their annual customer conference, and host the event on it. This meant that its performance had to be impeccable, and they only had six months to deliver. Ian Schell, Site Reliability Architect at Cvent, was tasked with ensuring that the Attendee Hub could accommodate the broad reach and increased registration volume of virtual events, which can often exceed that of in-person events. There were many unknowns surrounding usage patterns and scale, because the product was completely new, and the number of participants could be an order of magnitude higher compared to some in-person events.
Glovo scales on-demand delivery app and eliminates downtime with Datadog Database Monitoring
Glovo, an on-demand courier service operating in 25 countries, was facing a significant challenge as its database resource consumption couldn't keep pace with its projected growth. As the company launched a migration to microservices, it needed better visibility into its databases to reduce CPU usage and prevent costly downtime. With an increased number of databases and queries running, they found databases were provisioned incorrectly and would often reach CPU capacity. This resulted in outages adding up to three or four hours of downtime in 2022. The existing monitoring products Glovo used didn’t provide the insight they needed. Limited access to real-time monitoring and alerting hindered the team’s response to issues. They also lacked the ability to track and compare current and historical performance data, making investigations manual and tedious.
Achieving Operational Visibility for all Development Teams
MercadoLibre, the largest online marketplace in Latin America, was facing challenges with visibility into their distributed applications and dynamic hybrid cloud infrastructure. They had been using various open source tools to monitor their framework, but these disparate solutions made it difficult and time-consuming for them to correlate telemetry data from across their stack. The constant changes being made by separate teams in a shared hybrid cloud environment proved to be too dynamic for these basic monitoring tools to handle. They needed a tool that was purpose-built for monitoring multiple applications in a dynamic hybrid cloud infrastructure.
Improving Staff Productivity by Providing Developers with a Workflow-Oriented Operational Monitoring System
As SimpleReach’s platform grew, teams began spending more time tracking and comparing performance metrics during infrastructure updates. Their existing open source monitoring tools made it difficult to assess the performance implications of frequent changes in the production environment. The underlying problem was a familiar one: a disconnect between development and operations. “The developers didn’t realize how the changes they were making were affecting the production environment,” Lubow recalls. “Some of the impacts were significant, and the need for frequent changes in both application and system software was making the situation untenable.”
E-Commerce Platform Increases Resilience at Scale with Datadog and AWS
Neto was looking to move its legacy infrastructure to the cloud in order to drive automation and support its customers’ growth. However, its existing monitoring tools were unable to scale dynamically and could not track services across ephemeral infrastructure components. This posed a challenge, as the company needed a monitoring solution that could provide real-time visibility across a highly automated environment. Prior to moving to the Amazon public cloud (AWS), maintaining and scaling Neto’s legacy infrastructure was slow, reactive, and prone to technical difficulties. Neto’s infrastructure environments often drifted out of sync, making it hard to increase capacity or deploy changes to production without engaging in manual, time-consuming processes.
Provide a Flexible Solution to Suit a Service-Based Architecture and Scale With a Rapidly Growing Business
Airbnb, a leading community-driven hospitality company, faced the challenge of maintaining the reliability of their services while adapting quickly to new business opportunities. They developed a service-based architecture for some components of the site, while other components continued to be part of their main application. Separate engineering teams were created to support the separate components and features. Over time, they added many different systems for monitoring, some reporting to the central dashboard application, others being more standalone. This approach became difficult to scale, leading them to look for a comprehensive and more holistic operations performance solution.
Partners Find a New Revenue Stream in the Datadog Marketplace
RapDev, a Boston-based technical consulting company, wanted to expand its integration and implementation service offerings to unlock more revenue growth potential. They saw an opportunity in the Datadog Marketplace to augment Datadog's existing monitoring capabilities by providing support for legacy OSs and internal IT. The challenge was to leverage the Datadog platform to implement projects and transformations at scale, diversify their customer base, and create a new revenue stream.
Streaming Live Experiences to Millions, with Confidence
Seven.One Entertainment Group, a leading player in Germany's multi-channel entertainment industry, was facing a highly competitive market with rapidly changing viewer habits. Users were moving away from traditional TV and towards video-on-demand and interactive, second-screen experiences. The company needed to execute with the agility that only DevOps practices could provide. However, each team used its own monitoring solution, so no single tool provided visibility over the entire application or enabled engineers to trace requests across services, which hindered the DevOps mindset. This lack of adequate monitoring also made it challenging for Seven.One Entertainment Group to deliver live interactive shows, which draw up to 10 million simultaneous viewers online.
Arc XP secures applications in production with real-time visibility from Datadog
Arc XP wanted to boost its security monitoring capabilities and its defense-in-depth strategy so it could quickly detect and respond to attacks on its web applications and APIs. As an organization with divisions that operate autonomously, Arc XP wanted a single source of truth that could enable more effective collaboration among its distinct teams. In addition, Arc XP needed to detect suspicious behavior in its customers' code. The Arc XP platform allows customers to run their own code inside the Arc XP application, creating a shared security responsibility model in which Arc XP is responsible for the platform and its customers are responsible for their own code.
Materials Project of Berkeley Lab Uses Datadog Cloud Monitoring to Simplify Observability on AWS
The Materials Project, a research initiative at Berkeley Lab supported by the US Department of Energy, wanted to make its materials research more accessible to a continually growing number of users by updating its monolithic website. The project’s computations drastically reduce the time for researchers to invent new materials, saving months or even years of painstaking work. However, as it scaled to meet US and global demand, its on-premises, monolithic stack strained to power both user and internal needs. The project also lacked insight into service usage and faults. Because the Materials Project is publicly funded, it needed an affordable solution to go along with the modernization of all aspects of its infrastructure stack for a microservice architecture.
Monitoring that Scales with a Decentralizing and Growing Development Team
Bazaarvoice needed a monitoring solution that could support their evolving and decentralized approach to development. Bazaarvoice was already using a number of different monitoring solutions on their legacy systems. Scaling those systems would have required each team not only to host and manage their own monitoring server, but also to acquire domain expertise in infrastructure monitoring. For Bazaarvoice, this would have involved more than 50 different monitoring servers, each with their own custom setup, metrics, and naming conventions. Maintaining easy and effective operations on such a disjointed system would have been nearly impossible for a team that was doubling in size every 6 to 9 months.
Monitoring Scales Alongside Blue State Digital’s Rapidly Growing Infrastructure
Blue State Digital (BSD) had a complex stack with multiple tiers of web services, databases, and load balancers that relied on varied systems including Linux, PHP, MySQL, RabbitMQ, and more. They were also in the process of migrating sections of their infrastructure to Amazon Web Services (AWS) to further support rapid infrastructure growth. As BSD moved to a more dynamic cloud environment that included automated server provisioning, manually updating server counts, instrumentation, and alerts was beginning to take up a lot of time and overhead. They needed a monitoring tool that would easily integrate with their existing technical setup and scale effortlessly alongside their infrastructure.
Monitoring a Complex and Elastically Scaling Cloud Infrastructure to Avoid Performance Issues
GameChanger runs a complex and elastically scaling cloud infrastructure hosted on Amazon Web Services (AWS) to support its mobile and web-based applications. This environment includes multiple databases and services, each of which requires monitoring. Taking data from tens of thousands of sources, transforming it into reader-friendly snippets, and then pushing it to fans in real time means GameChanger has to be ready to handle high traffic and heavy I/O, and to troubleshoot issues at a moment’s notice. GameChanger first built its own infrastructure monitoring tools in-house from the open-source components Graphite and StatsD. These homegrown monitoring tools got the job done but at a steep price: they required an extra $1,000 of AWS resources and more than half an FTE’s hours each month just to keep GameChanger running.