To understand the challenge at Amberdata, you have to start with scale. Today, our platform:
Produces 8–10 TB of data per day
Consumes 24–30 TB daily via Redpanda
Stores 2.6 PB in object storage
Handles 50B messages per day
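To put those figures in perspective, here is a rough back-of-the-envelope calculation. The midpoints are assumptions for illustration, not exact internal numbers, but they show the workload profile: very small messages arriving at very high frequency.

```python
# Rough arithmetic on the published figures above; midpoints are assumptions.
messages_per_day = 50e9                  # ~50B messages/day
consumed_bytes_per_day = 27e12           # midpoint of the 24-30 TB/day range

avg_message_bytes = consumed_bytes_per_day / messages_per_day
messages_per_second = messages_per_day / 86_400

print(f"~{avg_message_bytes:.0f} bytes per message")        # ~540 bytes
print(f"~{messages_per_second:,.0f} messages per second")   # ~578,704
```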
Three years ago, we were operating at roughly 10% of this scale, yet our AWS bills were already unsustainable. Scaling the system as it existed would have broken the business financially. That forced a hard realization: efficiency could not be an afterthought. It had to become a core engineering objective.
“Growth at all costs” is a myth. Growth has to be defensible.
When I joined, we lacked the observability needed to define what “efficient” even meant. Costs were opaque, targets were aspirational, and tradeoffs were implicit. We changed that by treating cost as a first-class engineering metric, on equal footing with uptime and latency. Our North Star became clear: a cost-accountable organization built on strong observability, explicit ownership, and architectures designed to scale economically.
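As a concrete illustration of what treating cost as a first-class metric can look like, here is a minimal sketch (not our production tooling; the metric name and port are placeholders) that pulls daily AWS spend per service from Cost Explorer and exposes it as a Prometheus gauge, so cost lands on the same dashboards as uptime and latency:

```python
import datetime as dt
import time

import boto3
from prometheus_client import Gauge, start_http_server

ce = boto3.client("ce")  # AWS Cost Explorer
DAILY_COST = Gauge("aws_daily_cost_usd", "Unblended daily AWS cost", ["service"])

def refresh_costs() -> None:
    """Fetch yesterday's cost per AWS service and update the gauge."""
    today = dt.date.today()
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": str(today - dt.timedelta(days=1)), "End": str(today)},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    for group in resp["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        DAILY_COST.labels(service=service).set(amount)

if __name__ == "__main__":
    start_http_server(9102)      # Prometheus scrape endpoint (placeholder port)
    while True:
        refresh_costs()
        time.sleep(6 * 3600)     # Cost Explorer data refreshes a few times a day
```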
Early-stage startups optimize for speed and product-market fit. Efficiency usually comes later. We didn’t have the luxury of waiting. Our first step was brutally simple: we stopped paying for things we did not need. Instead of chasing marginal optimizations, we focused on a few governing principles.
We paired these changes with commitment-based savings, such as Compute Savings Plans and the Enterprise Discount Program (EDP). This phase did not make us cheap; it made us stable, buying us time to redesign the system properly.
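The arithmetic behind commitment-based savings is straightforward. The numbers below are purely illustrative assumptions, not our actual rates or discounts, but they show why covering a stable compute baseline with a commitment buys breathing room:

```python
# Illustrative figures only; real Savings Plan discounts vary by term and instance family.
on_demand_monthly = 100_000      # hypothetical steady-state on-demand compute bill
covered_fraction = 0.80          # share of the baseline covered by the commitment
plan_discount = 0.30             # hypothetical effective discount on covered usage

effective_monthly = (
    on_demand_monthly * covered_fraction * (1 - plan_discount)
    + on_demand_monthly * (1 - covered_fraction)
)
print(f"Effective spend: ${effective_monthly:,.0f} "
      f"(saves ${on_demand_monthly - effective_monthly:,.0f}/month)")
```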
The original architecture was well-suited for finding product-market fit, but eventually became a bottleneck for high-volume ingestion. Over three years, we systematically rebuilt the platform without interrupting 24/7 operations. Stabilizing costs in Phase 1 provided the leverage to transition to Redpanda and a Delta Lake foundation.
For a high-frequency data platform, ingestion resiliency and cost-efficiency are paramount. We achieved this by decoupling our ingestion layer from the streaming backbone, migrating our collectors to Redpanda Connect (Benthos) on Kubernetes. This transition significantly accelerated our speed-to-market and established High Availability (HA) by distributing collectors across multiple Availability Zones (AZs). This elastic architecture enables us to scale instantly to meet market volatility while maintaining a lean resource profile.
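Our collectors run as Redpanda Connect (Benthos) pipelines, but the shape of the ingestion path is easy to sketch. The snippet below is a simplified stand-in, with hypothetical broker addresses, topic name, and payload, showing a collector publishing a normalized event over Redpanda’s Kafka-compatible API:

```python
import json

from kafka import KafkaProducer  # Redpanda speaks the Kafka wire protocol

# Hypothetical brokers and topic; real collectors are Redpanda Connect pipelines.
producer = KafkaProducer(
    bootstrap_servers=["redpanda-0.internal:9092", "redpanda-1.internal:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",        # wait for full replication before confirming ingest
    linger_ms=20,      # small batching window to cut per-request overhead
)

trade = {"exchange": "example", "pair": "BTC-USD", "price": 64250.5, "ts": 1700000000}
producer.send("market.trades.raw", value=trade)
producer.flush()
```

Because collectors are stateless and talk only to the streaming backbone, Kubernetes can add or remove replicas across AZs as market volume moves.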
The move from self-managed Kafka to Redpanda was the turning point for our platform’s economics. Redpanda’s streamlined architecture requires a much smaller infrastructure footprint, delivering superior performance with far fewer resources than the legacy setup. By running across multiple AZs, we achieved the redundancy the old system lacked while significantly reducing hardware costs and operational burden.
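Redundancy across AZs comes down to replication. As a minimal sketch, with a hypothetical topic and sizing, and assuming brokers are spread across three AZs with rack awareness configured, creating a topic with three replicas means a single zone can disappear without losing a partition:

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers=["redpanda-0.internal:9092"])

# Hypothetical topic: three replicas so every partition survives the loss of an AZ.
topic = NewTopic(name="market.trades.raw", num_partitions=24, replication_factor=3)
admin.create_topics([topic])
```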
The result is a system capable of handling 10x the throughput with vastly improved cost efficiency and near-zero data gaps.
Running commodity nodes at scale was a significant cost center that did not differentiate our product. By outsourcing standard node infrastructure to specialized partners like StreamingFast and QuickNode, we effectively traded high and unpredictable infrastructure costs for scalable, predictable OpEx. Their specialized streaming primitives allow for massive parallelization at a fraction of the total cost of ownership (TCO) of running equivalent node infrastructure with an in-house platform team.
At the same time, we continue to operate proprietary nodes where they provide a strategic advantage or a direct revenue-generating opportunity. This hybrid approach eliminates the overhead of routine maintenance while ensuring we capture the highest margins on the infrastructure that actually moves the needle for our business.
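Operationally, consuming a hosted node looks like any other JSON-RPC call. The endpoint below is a placeholder rather than a real URL, but the request is the standard Ethereum eth_blockNumber method:

```python
import requests

RPC_URL = "https://example-endpoint.quiknode.pro/<token>/"  # placeholder, not a real endpoint

payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
resp = requests.post(RPC_URL, json=payload, timeout=10)
resp.raise_for_status()

latest_block = int(resp.json()["result"], 16)  # result is a hex-encoded block number
print(f"Latest block: {latest_block}")
```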
Legacy PostgreSQL and TimescaleDB instances did not scale economically. The cost of maintaining enough “always-on” compute and high-performance EBS storage to handle our growing volume was a primary driver of our move to S3.
Today, our entire 2.6 PB footprint lives in our Delta Lake, which has fundamentally changed our cost structure.
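Landing data in the lake is a plain object-store write rather than a provisioned database transaction. A minimal sketch using the delta-rs Python bindings, with a hypothetical bucket, table, and schema, looks like this:

```python
import pandas as pd
from deltalake import write_deltalake

# Hypothetical batch of normalized trades; the real pipeline writes from streaming jobs.
batch = pd.DataFrame(
    {
        "exchange": ["example", "example"],
        "pair": ["BTC-USD", "ETH-USD"],
        "price": [64250.5, 3120.8],
        "trade_date": ["2024-01-01", "2024-01-01"],
    }
)

write_deltalake(
    "s3://example-market-data/delta/trades",  # hypothetical S3 path
    batch,
    mode="append",
    partition_by=["trade_date"],              # partitioning keeps historical scans cheap
)
```

Storage is billed per byte at object-store rates, and compute only exists while a query or job is actually running.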
We decoupled real-time and historical access to avoid resource contention. Traders need low-latency streams, while researchers need high-throughput scans. Forcing both through the same compute path was inefficient and fragile.
Our data now lives in a unified Delta Lakehouse, providing ACID reliability and schema consistency. We are migrating our compute layer to Trino to enable high-concurrency SQL directly on S3-backed Delta tables, reducing reliance on expensive warehouse credits.
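Querying those Delta tables through Trino is ordinary SQL over the Python DB-API client; the coordinator host, catalog, schema, and table names below are hypothetical:

```python
import trino

conn = trino.dbapi.connect(
    host="trino.internal.example",  # hypothetical coordinator address
    port=8080,
    user="research",
    catalog="delta",                # assumed name for the Delta Lake connector catalog
    schema="market_data",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT pair, avg(price) AS avg_price, count(*) AS trades
    FROM trades
    WHERE trade_date = DATE '2024-01-01'
    GROUP BY pair
    ORDER BY trades DESC
    LIMIT 10
    """
)
for pair, avg_price, trades in cur.fetchall():
    print(pair, avg_price, trades)
```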
For sub-second analytical workloads, we continue to use Apache Pinot, where its indexing excels. Real-time delivery leverages WebSockets across horizontally scalable microservices, with Istio enforcing edge traffic controls to protect infrastructure margins.
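From a client’s point of view, the real-time path is simply a WebSocket subscription. The endpoint and subscribe message below are hypothetical placeholders, not our published API:

```python
import asyncio
import json

import websockets

async def stream_trades() -> None:
    uri = "wss://streaming.example.com/v1/ws"  # placeholder endpoint
    async with websockets.connect(uri) as ws:
        # Hypothetical subscription message; real channels and auth differ.
        await ws.send(json.dumps({"action": "subscribe", "channel": "trades", "pair": "BTC-USD"}))
        async for raw in ws:
            event = json.loads(raw)
            print(event)

asyncio.run(stream_trades())
```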
Infrastructure changes alone are not enough. We decentralized cost ownership to domain teams. Cost stopped being a leadership concern and became part of everyday engineering decisions. Every team owns its own spend.
When spending spikes, the owning team investigates within hours and fixes it immediately. The result is a tight feedback loop where engineers see the impact of architectural decisions in near real time.
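That feedback loop relies on spend being attributable. As a sketch, assuming resources carry a hypothetical `team` cost-allocation tag, a simple job can compare yesterday’s spend per team against its trailing average and flag the owner when it jumps:

```python
import datetime as dt
from collections import defaultdict

import boto3

ce = boto3.client("ce")
today = dt.date.today()

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": str(today - dt.timedelta(days=8)), "End": str(today)},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "team"}],   # assumes a "team" cost-allocation tag
)

history = defaultdict(list)
for day in resp["ResultsByTime"]:
    for group in day["Groups"]:
        team = group["Keys"][0]
        history[team].append(float(group["Metrics"]["UnblendedCost"]["Amount"]))

for team, costs in history.items():
    *previous, yesterday = costs
    baseline = sum(previous) / len(previous) if previous else 0.0
    if baseline and yesterday > 1.5 * baseline:   # arbitrary 50% spike threshold
        print(f"ALERT {team}: ${yesterday:,.0f} yesterday vs ${baseline:,.0f} baseline")
```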
Three years ago, our infrastructure was unsustainable. Today, we process ten times the data volume with significantly higher reliability, while reducing our total infrastructure spend by nearly 50% from where we started.
By treating cost as a primary engineering metric, we evolved a fragile startup stack into a resilient, enterprise-grade platform. We’ve proven that speed and efficiency are mutually reinforcing; by architecting for scale, we unlocked faster delivery cycles and redirected our talent from maintenance to delivering the product features our customers care about most.