
Over the last two years, we have completely re-architected our system to meet the evolving demands of customers, digital asset markets, and our business. The broader adoption of crypto assets, recent all-time trading highs, and persistent market volatility have tested the limits of our data infrastructure. To meet these ever-increasing needs, we’ve built a modern architecture designed for scalability, reliability, and agility.

We’re proud to share a few key milestones achieved by the new system:

  1. Processing over $500B in daily notional transaction value
  2. Ingesting around 2 million events per second into Apache Pinot
  3. Handling ~2 TB of new data daily
  4. Storing over 2 petabytes of compressed data in our object store
  5. Serving three billion REST API calls over the past year

Amberdata provides institutional-grade digital asset intelligence. Our customers rely on us for different use cases, from real-time market data and blockchain protocol information to normalized order book events and spot analytics.

Despite this variety, our core capabilities are:

  • Collecting relevant and timely data from diverse sources
  • Normalizing, enriching, and transforming market and blockchain data into proprietary, industry-leading insights
  • Delivering data consistently and accurately across all channels
  • Providing historical data through batch exports and marketplace integrations to clients’ AI and machine learning platforms
  • Supporting discovery and insight through an intuitive UI and APIs

Our goal is simple yet ambitious: build a scalable and resilient platform optimized for adding new datasets, deriving insights, and publishing them across delivery channels on demand, all while maintaining impeccable data quality, high availability, simplicity, and cost-efficiency.

System Architecture

Our distributed system comprises three main technical components:

  1. Redpanda and Redpanda Connect form the core of our high-throughput event ingestion and stream processing architecture.
  2. Databricks and Delta Lake serve as the core infrastructure of our data processing pipeline, supporting both real-time and batch workloads, enforcing centralized governance, and providing a unified view across diverse storage tiers.
  3. Apache Pinot serves as our real-time analytics engine, powering low-latency queries over fresh data.


All services are deployed across multiple availability zones in AWS to ensure fault tolerance and high availability.

Redpanda: High-Throughput Streaming

We rely on Redpanda as the core of our ingestion pipeline, with Redpanda Connect handling data integration and transformation. While Redpanda is Kafka-compatible, it offers faster performance, easier management, and significant cost savings. This enables us to ingest and process millions of events per second from blockchains, exchanges, and third-party sources with high throughput and low latency.

  • Redpanda Connect transforms, enriches, and routes events in-stream.
  • Following a Lambda-style architecture, data is ingested simultaneously into low-latency and long-term storage, ensuring real-time access and historical retention.
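The enrich-and-fan-out step above can be sketched in a few lines. This is a minimal, illustrative Python sketch, not the production pipeline (which runs as a Redpanda Connect configuration); the event fields and sink names are assumptions, with two in-memory lists standing in for the low-latency and long-term topics.

```python
import json
import time

realtime_sink = []  # stand-in for the low-latency analytics topic
archive_sink = []   # stand-in for the long-term storage topic

def enrich(event: dict) -> dict:
    """Add ingest metadata and a derived notional value to a raw event."""
    enriched = dict(event)
    enriched["ingested_at"] = int(time.time() * 1000)
    enriched["notional_usd"] = event["price"] * event["size"]
    return enriched

def route(event: dict) -> None:
    """Lambda-style dual write: every enriched event goes to both sinks."""
    payload = json.dumps(enrich(event))
    realtime_sink.append(payload)  # serves near real-time queries
    archive_sink.append(payload)   # retained for batch reprocessing

route({"symbol": "BTC-USD", "price": 60000.0, "size": 0.5})
```

Because both sinks receive the same enriched payload, real-time views and historical reprocessing never diverge on content, only on retention and query latency.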

Databricks + Delta Lake: Our Unified Data Platform

Delta Lake serves as our single source of truth, following a Medallion Architecture that organizes data across three layers—raw (Bronze), cleaned and enriched (Silver), and business-level, analytics-ready (Gold)—all centrally governed through Unity Catalog. It supports ACID transactions and schema enforcement, ensuring data consistency and reliability throughout the pipeline. Shared notebooks, orchestrated workflows, and interactive dashboards promote collaboration.
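The Medallion flow can be illustrated with a minimal, pure-Python sketch. The real pipeline runs on Databricks against Delta tables governed by Unity Catalog; the field names and cleaning rules below are illustrative assumptions only.

```python
# Bronze: raw trade events exactly as ingested (may contain bad rows).
bronze = [
    {"symbol": "btc-usd", "price": "60000", "size": "0.5"},
    {"symbol": "ETH-USD", "price": "3000", "size": "2"},
    {"symbol": "BTC-USD", "price": None, "size": "1"},  # malformed row
]

def to_silver(rows):
    """Clean and normalize: drop bad rows, cast types, canonicalize symbols."""
    out = []
    for r in rows:
        if r["price"] is None:
            continue  # schema enforcement would reject this at write time
        out.append({
            "symbol": r["symbol"].upper(),
            "price": float(r["price"]),
            "size": float(r["size"]),
        })
    return out

def to_gold(rows):
    """Aggregate to a business-level view: notional volume per symbol."""
    totals = {}
    for r in rows:
        totals[r["symbol"]] = totals.get(r["symbol"], 0.0) + r["price"] * r["size"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer is derived from the one below it, so a bug in a Silver or Gold transformation can always be fixed by replaying from Bronze rather than re-ingesting from the source.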

As the foundation of our data mesh architecture, the data lakehouse facilitates domain ownership, data-as-a-product thinking, and self-serve access. Meanwhile, Unity Catalog enforces fine-grained access control, audit logging, and enterprise-grade security and governance.

Databricks has significantly increased execution speed, allowing our domain teams to ship new datasets faster with less overhead, while simplifying data quality and governance. With the adoption of Unity Catalog’s Iceberg Interface, we gain seamless interoperability with external systems like Snowflake and Trino, further reducing integration overhead and accelerating time to insight.

Apache Pinot: Real-Time Analytics Engine

Pinot powers all our low-latency data needs. Its capability to handle high ingest rates with sub-second query times makes it perfect for near real-time, complex analytical queries. Additionally, its query-side scalability enables us to maintain high performance even as data volumes and user concurrency scale.

With its columnar storage format, innovative indexing strategies (e.g., star-tree index), and real-time ingestion capabilities, Pinot is ideal for scenarios that demand fast filtering, aggregation, and slicing of large volumes of time-series data.
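The shape of query Pinot is built for—filter a large time-series, then aggregate per time bucket—can be sketched as plain Python. In production this would be a single SQL query against an indexed Pinot table; the event fields, exchange names, and bucket size here are assumptions for illustration.

```python
events = [
    {"ts": 1_700_000_005, "exchange": "ex_a", "notional": 100.0},
    {"ts": 1_700_000_042, "exchange": "ex_b", "notional": 250.0},
    {"ts": 1_700_000_065, "exchange": "ex_a", "notional": 50.0},
]

def bucketed_volume(rows, exchange, bucket_seconds=60):
    """Sum notional per time bucket for one exchange (a WHERE + GROUP BY sketch)."""
    buckets = {}
    for r in rows:
        if r["exchange"] != exchange:
            continue  # in Pinot, an inverted index makes this filter cheap
        bucket = r["ts"] - r["ts"] % bucket_seconds
        buckets[bucket] = buckets.get(bucket, 0.0) + r["notional"]
    return buckets

volumes = bucketed_volume(events, "ex_a")
```

Where this loop scans every row, Pinot's columnar layout, indexes, and pre-aggregated star-tree nodes let it skip most of the data, which is what keeps such queries sub-second at billions of rows.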

Accelerating Time to Market

Scalability and reliability are foundational, but the real differentiator is how quickly we can analyze, transform, and generate proprietary insights from a new data source, such as a blockchain or a crypto exchange. Reducing this lead time has been a central focus in designing our new system, enabling faster integration and value delivery.

Architectural simplicity is fundamental—we prioritize straightforward, composable systems that lower operational overhead. Each team manages its domain and shares data via clear contracts, allowing for autonomy and reducing integration friction.

Loosely coupled services allow teams to deploy independently and recover smoothly when issues arise. We have integrated observability, automation, and cost-awareness at every level, enabling us to move quickly while maintaining control.

Concluding Thoughts

Amberdata’s new platform has been in development for two years. It’s now a production-grade system providing institutional-quality insights across the digital asset ecosystem.

By integrating real-time streaming, a governed lakehouse architecture, and low-latency analytics, we’ve established a scalable foundation to support the next wave of growth in crypto and beyond.

Whether you’re developing crypto trading applications, auditing blockchain operations, or assessing liquidity across exchanges, Amberdata provides the infrastructure to support those workloads with confidence.


Stefan Feissli

Stefan Feissli is an accomplished engineering leader with over 18 years of experience and a passion for working on complex problems, delivering measurable business outcomes, and building simple and beautiful products. He has worked for large organizations and for fast-paced startups going through hyper-growth. He enjoys...

Amberdata Engineering Blog
