Overview of the data processing pipeline
- Data pushed to the `Queue` will be retried if a failure occurs.
- A lock is acquired on the user before processing their data (`session`, `pageview`, `order`…), ensuring that the data is not duplicated in case of concurrent processing.
- `Collector`, `Queue`, `API Servers` & `Database` nodes can be scaled horizontally to handle more load.
- The `Queue` will automatically slow down the data ingestion if the `API Servers` are not able to keep up with the load.
- The raw data received by the `Collector` can be used to heal the dataset if needed.
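As a minimal sketch of the per-user locking idea, an in-process mutex keyed by user ID can serialize concurrent data points for the same user. This only illustrates the concept; the platform's actual lock is presumably distributed, and the names below are assumptions:

```ts
// Minimal sketch of the per-user lock: data points for the same user are
// processed one at a time. Illustration only, not the platform's actual
// (distributed) lock implementation.
const userLocks = new Map<string, Promise<unknown>>();

function withUserLock<T>(userId: string, task: () => Promise<T>): Promise<T> {
  const previous = userLocks.get(userId) ?? Promise.resolve();
  // Chain this task after whatever is already running for the user;
  // swallow earlier failures so one bad data point does not block the chain.
  const run = previous.catch(() => {}).then(task);
  const tail = run.catch(() => {}); // keep the chain alive if this task fails
  userLocks.set(userId, tail);
  tail.then(() => {
    // Drop the entry once no newer task has been queued behind this one.
    if (userLocks.get(userId) === tail) userLocks.delete(userId);
  });
  return run;
}

// Example: these two data points for user-123 will not run concurrently.
withUserLock("user-123", async () => { /* process pageview */ });
withUserLock("user-123", async () => { /* process order */ });
```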
All data sources (your website, apps, API…) send data to the `Collector`, which pushes it to the `Queue`, which dispatches it to the `API Servers` for processing.
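To make the flow concrete, here is a hedged sketch of a data source pushing a data point to the `Collector` over HTTP. The endpoint URL and payload shape are assumptions made for this illustration, not the actual SDK or Collector API:

```ts
// Hypothetical illustration of a data source pushing a data point to the
// Collector over HTTP. The endpoint URL and payload are assumptions made
// for this sketch, not the actual SDK or Collector API.
async function sendToCollector(dataPoint: Record<string, unknown>): Promise<void> {
  const response = await fetch("https://collector.example.com/data", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(dataPoint),
  });
  if (!response.ok) {
    // Failures surface to the caller; once accepted, retries are handled
    // downstream by the Queue.
    throw new Error(`Collector returned HTTP ${response.status}`);
  }
}
```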
The `API Server` does the following for every data point:
1. **Persist the data**: the data is persisted in the `Database` to avoid any loss.
2. **Enrich or reject the data**: the `on_validation` hooks are called to enrich or reject the data (e.g. IP-to-location, IP filtering…); a hook sketch follows this list.
3. **Validate the data**
4. **Acquire a lock**
5. **Reconcile the user identity**
6. **Reattribute conversions** for `order` & `sessions` events.
7. **Recompute user segments**
8. **Trigger workflows** (e.g. when an `order` is created).
9. **Forward the data**: the `on_success` hooks are called for further processing (e.g. sending the data to a CRM).
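The enrichment step (step 2) can be pictured as a hook that either returns an enriched data point or rejects it. The signature and field names below are hedged assumptions for illustration, not the platform's actual hook API:

```ts
// Hypothetical shape of an on_validation hook: the names and types are
// assumptions for this sketch, not the platform's actual API.
type DataPoint = Record<string, unknown> & { ip?: string };
type ValidationResult =
  | { action: "accept"; data: DataPoint } // possibly enriched
  | { action: "reject"; reason: string };

// Example hook: filter internal IPs, otherwise enrich with a country code.
function onValidation(data: DataPoint): ValidationResult {
  if (data.ip?.startsWith("192.168.")) {
    return { action: "reject", reason: "internal traffic" };
  }
  return { action: "accept", data: { ...data, country: lookupCountry(data.ip) } };
}

function lookupCountry(ip?: string): string {
  return ip ? "FR" : "unknown"; // placeholder for a real geo-IP lookup
}
```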
Each data point sent to the `Collector` should contain all the contextual information needed to heal the dataset.
In the following example of a web `pageview` hit sent to the `Collector` by the JavaScript SDK, the `user`, `device` and `session` objects are attached to the `pageview`, so that the dataset can be healed if they ever go missing.