
The Operations Behind 10 Billion Messages a Year

A big number is not the point. The real work is sending all of it on time, at scale.

FlareLane sends over 10 billion messages a year. That means more than 200 million active users and over a million event calls a minute. On some days a single campaign goes out to millions of users at once.

Here is the whole thing on one page first. The top row is the send path a message travels; the bottom row is the data path that keeps the send targets current in real time.

FlareLane large-scale delivery, overview

Sending to millions at once

Push a send to millions and the pressure hits from two sides. Users who tap the message pour into the app at the same moment, and that traffic piles onto the customer's servers. Downstream, the channels that carry push, SMS, and Kakao each have their own limits too. Problems you never saw at small scale show up all at once as the numbers climb.

The pacing itself is simple. FlareLane does not fire requests straight out; it parks them in a message queue first, then caps the rate at which messages are pulled off that queue. A sudden burst of a few million messages waits in the queue, while the actual send goes out at a rate the customer's servers and the outside channels can absorb. All you do is set messages per minute in the console. The queue and the rate-limiting logic are not yours to build.
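To make the pacing idea concrete, here is a minimal sketch of a queue that is drained at a capped rate. It is an illustration only, not FlareLane's implementation: the `PacedQueue` class, its `release` method, and the 600-per-minute figure are all hypothetical.

```python
from collections import deque
from typing import Deque, List


class PacedQueue:
    """Hold messages in a queue and release at most `per_minute` of them per minute."""

    def __init__(self, per_minute: int) -> None:
        if per_minute <= 0:
            raise ValueError("per_minute must be positive")
        self.per_minute = per_minute
        self._pending: Deque[str] = deque()
        self._allowance = 0.0  # how many messages we may still send right now

    def enqueue(self, message: str) -> None:
        self._pending.append(message)

    def release(self, elapsed_seconds: float) -> List[str]:
        """Return the messages allowed out after `elapsed_seconds` more time has passed."""
        earned = elapsed_seconds * self.per_minute / 60.0
        # Unused budget carries over, but never more than one minute's worth.
        self._allowance = min(self._allowance + earned, float(self.per_minute))
        batch: List[str] = []
        while self._pending and self._allowance >= 1.0:
            batch.append(self._pending.popleft())
            self._allowance -= 1.0
        return batch


# Hypothetical usage: a burst of 25 messages, paced at 600 per minute (10 per second).
queue = PacedQueue(per_minute=600)
for i in range(25):
    queue.enqueue(f"msg-{i}")
sizes = [len(queue.release(1.0)) for _ in range(4)]  # four one-second ticks
```

The burst is accepted immediately, but only ten messages leave per one-second tick, so `sizes` comes out as `[10, 10, 5, 0]`. The per-minute cap is the only number the caller has to choose.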

The workers that drain the queue run serverless. Instead of a fixed fleet kept running around the clock, the number of workers handling messages scales up and down with how much is queued. When volume jumps to dozens of times the usual, more workers spin up to match. When the send finishes, they scale back down. So there is no server for you to add and no architecture to stand up. A sudden spike still goes out without backing up.
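The relationship between queue depth and worker count can be sketched in a few lines. This is an assumption-laden illustration: the function name `desired_workers`, the per-worker capacity, and the upper limit are hypothetical, and in practice the serverless platform makes this decision for you.

```python
def desired_workers(queued: int, per_worker: int, max_workers: int) -> int:
    """Pick a worker count proportional to the backlog, bounded by a platform limit."""
    if per_worker <= 0 or max_workers < 0:
        raise ValueError("per_worker must be positive and max_workers non-negative")
    if queued <= 0:
        return 0  # nothing queued: scale all the way down
    needed = -(-queued // per_worker)  # integer ceiling division
    return min(needed, max_workers)


# Hypothetical numbers: each worker handles 1,000 queued messages, capped at 500 workers.
idle = desired_workers(0, per_worker=1_000, max_workers=500)
normal = desired_workers(2_500, per_worker=1_000, max_workers=500)
spike = desired_workers(5_000_000, per_worker=1_000, max_workers=500)
```

An empty queue needs no workers, a small backlog needs a handful, and a spike of millions runs into the platform limit instead of growing without bound.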

FlareLane console, send-rate control: the settings screen for capping messages per minute to ease load on a customer's servers

When an outside channel fails

At scale, an outside channel slowing down or throwing errors is routine. One channel lagging cannot be allowed to stall the whole send. So FlareLane puts a circuit breaker on each channel: when a channel's error rate crosses a threshold, that path is cut for a cooldown, then reconnected automatically once it recovers. Channels with several providers, like SMS and Kakao, fail over to another provider at the routing layer.
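The per-channel breaker described above can be sketched as follows. This is a simplified illustration, not FlareLane's code: the `CircuitBreaker` class, its parameters, and the thresholds are hypothetical, and it reconnects as soon as the cooldown ends, whereas a production breaker would typically probe the channel before fully reopening.

```python
from typing import Optional


class CircuitBreaker:
    """Cut a channel when its error rate crosses a threshold, then reconnect after a cooldown."""

    def __init__(self, error_threshold: float, min_calls: int, cooldown_seconds: float) -> None:
        self.error_threshold = error_threshold
        self.min_calls = min_calls  # do not judge the channel on too few calls
        self.cooldown_seconds = cooldown_seconds
        self._calls = 0
        self._errors = 0
        self._opened_at: Optional[float] = None  # None means the channel is connected

    def allow(self, now: float) -> bool:
        """Return True if a send may go through this channel at time `now`."""
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.cooldown_seconds:
            # Cooldown is over: reconnect and start counting from scratch.
            self._opened_at = None
            self._calls = 0
            self._errors = 0
            return True
        return False

    def record(self, ok: bool, now: float) -> None:
        """Record the outcome of one send and cut the channel if it is failing too often."""
        self._calls += 1
        if not ok:
            self._errors += 1
        if self._calls >= self.min_calls and self._errors / self._calls >= self.error_threshold:
            self._opened_at = now


# Hypothetical usage: four straight failures at t=0 cut the channel for 30 seconds.
breaker = CircuitBreaker(error_threshold=0.5, min_calls=4, cooldown_seconds=30.0)
for _ in range(4):
    breaker.record(ok=False, now=0.0)
during_cooldown = breaker.allow(now=10.0)
after_cooldown = breaker.allow(now=30.0)
```

Each channel would hold its own breaker, so a failing SMS provider is skipped during its cooldown while push and the other channels keep sending.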

Messages that fail are retried, not thrown away. A naive retry risks sending the same message twice, so every message carries a unique key and is handled idempotently: if a key that already went out comes back, that one is skipped. A retry never turns into a double-send, and an alert that should arrive does not quietly go missing.
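A minimal sketch of key-based idempotent retries is shown below. The names `send_once`, `flaky_send`, and `sent_keys` are hypothetical, and the in-memory set stands in for whatever durable store a real system would use to remember delivered keys.

```python
from typing import Callable, List, Set


def send_once(key: str, send: Callable[[str], None], sent_keys: Set[str], max_attempts: int = 3) -> str:
    """Send the message identified by `key`, retrying on failure, but never twice."""
    if key in sent_keys:
        return "skipped"  # this key already went out: do not send it again
    for _ in range(max_attempts):
        try:
            send(key)
        except ConnectionError:
            continue  # transient channel failure: try again
        sent_keys.add(key)  # remember the key only after a successful send
        return "sent"
    return "failed"


# Hypothetical channel that times out on the first attempt, then works.
attempts: List[str] = []


def flaky_send(key: str) -> None:
    attempts.append(key)
    if len(attempts) == 1:
        raise ConnectionError("channel timed out")


sent_keys: Set[str] = set()
first = send_once("msg-001", flaky_send, sent_keys)   # fails once, then succeeds
second = send_once("msg-001", flaky_send, sent_keys)  # same key comes back
```

The first call retries through the failure and delivers the message; the second call sees the recorded key and skips it, so the retry path cannot produce a double-send.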

You never see any of this on a normal day. It surfaces only when one channel fails: the whole campaign keeps running, and that one channel gets skipped for a moment.

Handling an external channel failure

A million events a minute, in real time

Handling more than 200 million users means managing that many push tokens and that much behavioral data. Tokens expire, devices change, subscriptions end. FlareLane captures those changes through change data capture (CDC) and reflects them in a query index right away, so a segment gets built with dead tokens already filtered out. Building and running that yourself costs real money and headcount.
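As a rough sketch of the idea, the snippet below applies a stream of change events to an index of live tokens and then builds a segment from it. The event shape (`op`, `user_id`, `token`) and the function names are assumptions made for illustration; real CDC tools emit richer records, and a production query index is not a Python dict.

```python
from typing import Dict, Iterable, List


def apply_change(index: Dict[str, str], change: dict) -> None:
    """Apply one change event to the index of live tokens (user_id -> token)."""
    op = change["op"]
    if op in ("insert", "update"):
        index[change["user_id"]] = change["token"]
    elif op == "delete":
        index.pop(change["user_id"], None)  # token expired or subscription ended
    else:
        raise ValueError(f"unknown operation: {op}")


def build_segment(index: Dict[str, str], user_ids: Iterable[str]) -> List[str]:
    """Return tokens for the requested users, leaving out anyone without a live token."""
    return [index[user_id] for user_id in user_ids if user_id in index]


# Hypothetical change events, in the order they were captured.
changes = [
    {"op": "insert", "user_id": "u1", "token": "tok-a"},
    {"op": "insert", "user_id": "u2", "token": "tok-b"},
    {"op": "update", "user_id": "u1", "token": "tok-c"},  # u1 changed devices
    {"op": "delete", "user_id": "u2"},                    # u2 unsubscribed
]
index: Dict[str, str] = {}
for change in changes:
    apply_change(index, change)
segment = build_segment(index, ["u1", "u2", "u3"])
```

Because every change lands in the index as it happens, the segment contains only `tok-c`: the replaced token and the unsubscribed user are already gone by the time the send is built.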

The million events a minute are handled the same way. What a user viewed and bought arrives live. Instead of collecting those events and crunching them in a later batch, a streaming pipeline processes each one the moment it lands. A user's latest action is reflected in the next send condition almost immediately. Getting behavior-matched messages out on time depends on that speed.
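The difference from batch processing can be sketched with a generator that updates state on every event and re-evaluates a send condition right away. The event fields, the `process_stream` function, and the cart-reminder condition are hypothetical examples, not FlareLane's pipeline.

```python
from typing import Dict, Iterable, Iterator, List


def process_stream(events: Iterable[dict]) -> Iterator[List[str]]:
    """Handle events one at a time; after each, yield users whose last action was add_to_cart."""
    last_action: Dict[str, str] = {}
    for event in events:
        last_action[event["user_id"]] = event["action"]
        # The send condition is re-evaluated immediately, not in a later batch.
        yield sorted(user for user, action in last_action.items() if action == "add_to_cart")


# Hypothetical events, in arrival order.
events = [
    {"user_id": "u1", "action": "view"},
    {"user_id": "u1", "action": "add_to_cart"},
    {"user_id": "u2", "action": "add_to_cart"},
    {"user_id": "u2", "action": "purchase"},
]
snapshots = list(process_stream(events))
```

Each snapshot is the set of users who would match a cart reminder at that instant. The moment `u2` purchases, `u2` drops out of the condition, so a reminder sent after that event would no longer reach them.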

Realtime event and data flow

Small teams need reliability too

Big numbers can feel like they only matter to big customers. But the queue-based pacing, the circuit breakers, the idempotent retries, and the real-time reporting all work the same regardless of campaign size. On a system built to take millions, a campaign of a few thousand gets its numbers tallied just as fast.

FlareLane analytics: a performance view of sends, open rate, conversion rate, and revenue contribution by channel on one screen

If your sending is climbing fast, it is worth checking now whether your current setup can take that scale steadily. Tell us your send volume and traffic below and request a check.

Is your send volume climbing?

Tell us your current volume and traffic, and we will check whether your setup can handle it reliably.

Contact us
FlareLane

Contents Team, FlareLane (FlareLabs, Inc.)

Written by people who've actually run CRM marketing and growth, not just written about it.


FlareLane is a CRM marketing solution that automatically delivers push, SMS, KakaoTalk, and in-app/in-web messages aligned with each customer's behavior and journey. From startups to enterprises, we help everyone design and run hyper-personalized marketing and customer journey automation with ease.