From 120 TB to 1.7 PB: The Engineering Challenges of Scaling Amberdata
Two years ago, when I joined Amberdata as VP of Engineering, we processed roughly $1 billion in daily notional transaction value. Today, that number exceeds $500 billion.
In the same period, our data lake grew from 120 terabytes to more than 1.7 petabytes of highly compressed data. Every day, we ingest terabytes from centralized crypto exchanges and blockchains to power hundreds of analytical datasets.
This is the reality of a scale-up. While early-stage startups focus on survival and product-market fit, scale-ups face a different challenge: exponential complexity. “Data gravity” is not a buzzword for us. It is a real force that makes collecting, processing, and distributing data increasingly expensive and time-consuming as scale increases.
The Growing Pains
The systems that get a startup to product-market fit rarely survive the transition to growth. Early engineering is rightly optimized for speed and learning, often at the expense of long-term architecture. But once a company scales, that technical debt comes due.
At Amberdata, our core objective is simple: deliver timely, accurate, and relevant digital asset data and analytics. Even though we doubled our customer base last year, our biggest challenge was not customer growth. It was building a platform that could absorb exponential data growth while maintaining exceptional data quality, without runaway costs or latency.
Technical Hurdles
Managing this scale introduced a set of hard, unavoidable problems:
- System reliability at scale: Digital asset markets never sleep. Volatility creates sudden, unpredictable spikes in load. Many centralized exchange datasets are ephemeral. If we miss a block or a tick, it is gone forever. Our collectors must run with near-zero downtime.
- The cost of big data: Processing petabytes is a luxury few startups can afford. We had to design systems that keep compute and storage costs low, even as data volume and customer demand grow rapidly.
- Data freshness and integrity: Our customers rely on our data for real financial decisions. Accuracy and speed are not nice-to-haves. They are the product. Most validation must happen in-stream and in real time (see the sketch after this list).
- High-interest technical debt: In a data-heavy system, the interest rate on technical debt is brutal. A poorly optimized query or schema does not waste milliseconds. It can burn thousands of dollars in compute and storage.
- Real-time versus historical data: We face a dual mandate. Traders need real-time insights to gain an edge, while researchers need instant access to massive historical datasets for backtesting. Both must be fast, reliable, and cost-efficient.
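To make the in-stream validation point concrete, here is a minimal sketch of what per-tick checks can look like. This is an illustrative example only, not our production pipeline: the TradeTick shape, the field names, and the TickValidator class are assumptions made for the sake of the example.

```python
# Illustrative sketch of in-stream validation for exchange trade ticks.
# All names (TradeTick, TickValidator, field names) are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TradeTick:
    exchange: str
    pair: str
    sequence: int      # exchange-provided sequence number
    timestamp_ms: int   # event time in milliseconds
    price: float
    size: float


class TickValidator:
    """Checks ticks as they stream in, before they reach storage."""

    def __init__(self) -> None:
        self._last_sequence: dict[str, int] = {}
        self._last_timestamp: dict[str, int] = {}

    def validate(self, tick: TradeTick) -> Optional[str]:
        """Return an error message if the tick is suspect, else None."""
        key = f"{tick.exchange}:{tick.pair}"

        # Basic sanity: prices and sizes must be positive.
        if tick.price <= 0 or tick.size <= 0:
            return "non-positive price or size"

        # A sequence gap means data may have been dropped upstream.
        last_seq = self._last_sequence.get(key)
        if last_seq is not None and tick.sequence != last_seq + 1:
            return f"sequence gap: expected {last_seq + 1}, got {tick.sequence}"

        # Event time should not move backwards for a given market.
        last_ts = self._last_timestamp.get(key)
        if last_ts is not None and tick.timestamp_ms < last_ts:
            return "timestamp moved backwards"

        self._last_sequence[key] = tick.sequence
        self._last_timestamp[key] = tick.timestamp_ms
        return None
```

In practice, checks like these run inside the stream processor itself, and a failed check triggers an alert or a replay from the source rather than a silent drop, because many of these datasets cannot be re-fetched later.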
Evolving the Culture
Growth is never linear. Having helped scale a company from 300,000 to 3 million users, I learned that once traction hits, the pressure on engineering becomes unpredictable and relentless.
At Amberdata, that forced a cultural shift. We focused on three areas:
- Ruthless prioritization: Engineering must be tightly aligned with business outcomes. Shipping features that do not drive value is one of the most expensive mistakes at scale.
- Observability and operations: We moved from reacting to incidents to deliberately operating the platform, with clear ownership, strong observability, and a fast mean time to repair.
- Automated feedback loops: Early startups rely heavily on qualitative feedback. Scale-ups need automated signals such as product usage, performance, and cost per feature to prioritize effectively.
What’s Next
Scaling is hard, but the results are deeply rewarding. Over the past two years, we focused relentlessly on two goals: continuously shipping products that solve real customer problems and building a high-performing engineering culture that consistently exceeds SLAs.
In the coming months, this blog will go deeper into how we addressed these challenges, including:
- Surviving volatility: How we ingested over one million events per second as Bitcoin crossed $100K.
- Speed and efficiency: How we now collect entire blockchains in hours instead of days and onboard new exchange datasets in a week, at a fraction of the previous cost.
- Performance: How we optimized REST endpoints to deliver datasets up to 17× faster.
- Infrastructure: How we reduced infrastructure costs by 50 percent while improving availability.
- Deployment velocity: How our teams deploy to production more than eight times per day without breaking the build.
Scaling does not happen overnight. It requires the right people, changes to deeply ingrained habits, and hard trade-offs. But looking at how far we have come, managing petabytes of data with a lean and efficient architecture, I am incredibly proud of this organization.
Stefan Feissli
Stefan Feissli is an accomplished Engineering Leader with over 18 years of experience and a passion for working on complex problems, delivering measurable business outcomes, and building simple and beautiful products. He has worked for large organizations and fast-paced startups that went through hyper-growth. He enjoys...