Retail Media and Personalization: The Infrastructure Behind the Data

abemon

The promise and the plumbing

Retail media is the fastest-growing advertising segment in Europe. IAB Europe estimates the continental retail media market surpassed EUR 14 billion in 2024, with 25-30% year-over-year growth projected through 2026. The promise is irresistible: monetize your first-party data while delivering relevant personalization to customers. Amazon does it. Walmart Connect does it. Carrefour Links operates across France and is expanding.

Behind the promise sits a data infrastructure problem that most mid-size retailers underestimate. Personalizing in real time requires unifying behavioral, transactional, inventory, and preference data into a system that responds in milliseconds, not hours. It must also do so under GDPR, in an environment where every piece of customer data is a regulatory risk.

The CDP architecture

The Customer Data Platform (CDP) is the centerpiece. It’s not a CRM on steroids (though many vendors sell it that way). It’s a data infrastructure that:

  1. Ingests data from multiple sources: Point of sale, ecommerce, mobile app, loyalty program, in-store interactions (WiFi, beacons), email campaigns.
  2. Unifies profiles: Identity resolution. The same customer who buys online with an email, in-store with a loyalty card, and browses the app with a device ID must resolve to a single profile.
  3. Segments in real time: Not static segments recalculated nightly, but audiences that update with every interaction.
  4. Activates across channels: Pushes the segment or decision to the right channel (email, push, display, retail media) with appropriate latency.

Identity resolution

This is the hardest technical problem and the one that determines whether the CDP works or is just another data lake. Identity resolution connects disparate identifiers (email, phone, cookie ID, device ID, loyalty card, POS transaction ID) into a unified profile.

Two approaches:

Deterministic: Connects identifiers on exact matches (same email, same phone). High precision, low recall. Works well for registered customers.

Probabilistic: Uses indirect signals (same device, same IP, similar behavioral patterns) to infer two identifiers belong to the same user. Higher recall, lower precision. Useful for anonymous visitors.

In practice, a retail CDP needs both: deterministic matching for the known customer base (typically 30-40% of traffic) and probabilistic matching for the rest. The key is a configurable confidence threshold on the probabilistic model: set it too low and you merge profiles that belong to different people; set it too high and you lose coverage.
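As a rough illustration of how the two approaches can share one structure, the sketch below stores identifiers in a union-find graph. Deterministic matches are linked with confidence 1.0, and probabilistic matches are linked only when they clear the threshold. The class name `IdentityGraph`, the `type:value` identifier format, and the 0.9 threshold are assumptions made for this example, not the behavior of any particular CDP.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers such as 'email:ana@example.com' (hypothetical format)."""

    def __init__(self, threshold: float = 0.9):
        self.parent: dict[str, str] = {}
        self.threshold = threshold  # assumed confidence cut-off for probabilistic links

    def find(self, ident: str) -> str:
        """Return the root identifier of the profile that `ident` belongs to."""
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving keeps lookups cheap as profiles grow.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a: str, b: str, confidence: float = 1.0) -> bool:
        """Merge two identifiers if the match confidence clears the threshold."""
        root_a, root_b = self.find(a), self.find(b)  # registers both identifiers
        if confidence < self.threshold:
            return False  # too uncertain: keep the profiles separate
        if root_a != root_b:
            self.parent[root_b] = root_a
        return True

    def profiles(self) -> list[set[str]]:
        """Group every known identifier by its resolved profile."""
        groups: dict[str, set[str]] = defaultdict(set)
        for ident in list(self.parent):
            groups[self.find(ident)].add(ident)
        return list(groups.values())


graph = IdentityGraph(threshold=0.9)
graph.link("email:ana@example.com", "loyalty:12345")         # deterministic match
graph.link("loyalty:12345", "device:abc", confidence=0.95)   # probabilistic, accepted
graph.link("device:abc", "cookie:zzz", confidence=0.60)      # probabilistic, rejected
```

After these three calls the email, loyalty card, and device resolve to one profile, while the cookie stays in a profile of its own. A production system would also need a way to undo a merge that later turns out to be wrong, which plain union-find does not provide.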

Tools like Segment (Twilio), mParticle, or open-source solutions like RudderStack implement identity resolution out of the box. For retailers with specific requirements (offline POS data, legacy system integration), custom development on Apache Spark or Flink is more common than vendors will admit.

Real-time decisioning

The personalization that matters happens in milliseconds: which product to recommend when the customer opens the app, which banner to show during web browsing, which offer to send after cart abandonment. This requires a real-time decision engine combining:

  • Customer profile (purchase history, preferences, segment).
  • Current context (device, time, location, page being viewed).
  • Available inventory (don’t recommend a product that’s out of stock at the nearest store).
  • Business rules (margins, stock to rotate, supplier agreements).
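A minimal sketch of how those four inputs might be combined in one scoring function, assuming the model score has already been computed upstream. The `Candidate` fields, the `decide` function, the margin weight, and the 0.1 preference boost are hypothetical values chosen for illustration; `stock_by_sku` stands in for stock at the store nearest to the customer's current location.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sku: str
    model_score: float  # relevance from an ML model, assumed to be in 0..1
    margin: float       # gross margin as a fraction, 0..1
    category: str


def decide(candidates, profile, stock_by_sku, margin_weight=0.3, top_n=2):
    """Rank in-stock candidates by model score blended with simple business rules."""
    ranked = []
    for c in candidates:
        if stock_by_sku.get(c.sku, 0) <= 0:
            continue  # inventory rule: never recommend an out-of-stock product
        score = (1 - margin_weight) * c.model_score + margin_weight * c.margin
        if c.category in profile.get("preferred_categories", ()):
            score += 0.1  # small boost for the customer's preferred categories
        ranked.append((score, c.sku))
    ranked.sort(reverse=True)
    return [sku for _, sku in ranked[:top_n]]


candidates = [
    Candidate("A", model_score=0.9, margin=0.1, category="shoes"),
    Candidate("B", model_score=0.7, margin=0.5, category="shoes"),
    Candidate("C", model_score=0.8, margin=0.4, category="bags"),
]
profile = {"preferred_categories": {"bags"}}
nearest_store_stock = {"A": 0, "B": 5, "C": 2}

print(decide(candidates, profile, nearest_store_stock))  # ['C', 'B']
```

Product A has the highest model score but is dropped because it is out of stock, which is the kind of rule that a model alone would not enforce.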

The typical architecture:

Event (click, pageview, purchase)
  -> Stream processing (Kafka + Flink/ksqlDB)
  -> Feature store (updated profile)
  -> Decision engine (ML model + rules)
  -> Response (< 100ms)

The feature store deserves attention. It’s the layer maintaining customer features (average spend, purchase frequency, preferred categories, recency) updated in real time and available to the decision engine with low read latency. Feast and Tecton are open-source and SaaS options respectively. Redis as the serving backend is common for sub-10ms latencies.

Why can’t you solve this with a relational database and queries? Because personalization at retail scale means thousands of decisions per second, each combining data from multiple sources. A SQL query that joins 5 tables on every request to compute a user’s recommendation doesn’t scale to 10,000 requests per second within a 100ms budget. The feature store precomputes those features and keeps them ready for low-latency reads.
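To make the precompute-then-read idea concrete, here is a small sketch in which a plain in-memory dictionary stands in for the serving backend (Redis in the stack described above). Each purchase event updates running aggregates, so a read is a key lookup plus trivial arithmetic rather than a join. The class name, field names, and integer timestamps are assumptions for the example.

```python
from collections import Counter, defaultdict


class InMemoryFeatureStore:
    """Toy stand-in for a low-latency serving store keyed by customer ID."""

    def __init__(self):
        self._rows = defaultdict(
            lambda: {
                "orders": 0,
                "total_spend": 0.0,
                "last_purchase_ts": None,
                "categories": Counter(),
            }
        )

    def on_purchase(self, customer_id, amount, category, ts):
        """Write path: update running aggregates when a purchase event arrives."""
        row = self._rows[customer_id]
        row["orders"] += 1
        row["total_spend"] += amount
        row["last_purchase_ts"] = ts
        row["categories"][category] += 1

    def read(self, customer_id, now):
        """Read path: a single lookup, with no joins at decision time."""
        row = self._rows.get(customer_id)  # .get avoids creating an empty row
        if row is None:
            return None
        return {
            "avg_spend": row["total_spend"] / row["orders"],
            "purchase_count": row["orders"],
            "recency_s": now - row["last_purchase_ts"],
            "top_category": row["categories"].most_common(1)[0][0],
        }


store = InMemoryFeatureStore()
store.on_purchase("c1", 40.0, "shoes", ts=1000)
store.on_purchase("c1", 60.0, "shoes", ts=2000)
store.on_purchase("c1", 20.0, "bags", ts=3000)
features = store.read("c1", now=3600)
```

In this run `features` holds an average spend of 40.0, three purchases, a recency of 600 seconds, and "shoes" as the top category. In a streaming setup the write path would be driven by the stream processor rather than called directly.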

Privacy under GDPR

This is where retail media in Europe gets complicated. GDPR establishes clear restrictions on personal data processing, and a CDP unifying customer profiles is, by definition, a large-scale personal data processing system.

Critical points:

Legal basis: For personalization based on browsing behavior, you need explicit consent (opt-in). For personalization based on purchase history with your company, you can argue legitimate interest, but the line is blurry and depends on the national data protection authority. Practical recommendation: obtain consent whenever possible.

Data minimization: GDPR requires collecting only data necessary for the declared purpose. Collecting “everything just in case” is not legal. Define exactly what data each personalization use case needs and discard the rest.

Right of access and erasure: Customers have the right to know what data you hold and to request deletion. Your CDP must be able to export a complete profile and verifiably delete it. This has serious technical implications: deletion must propagate to all downstream systems, not just the primary database.
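One way to think about propagation is a coordinator that knows every system holding customer data and records the outcome per system, so a failed deletion is visible instead of silent. The sketch below assumes each downstream system exposes a delete function returning True when something was removed; `ErasureCoordinator` and the sink names are hypothetical.

```python
class ErasureCoordinator:
    """Fans an erasure request out to every registered downstream system."""

    def __init__(self):
        self._sinks = {}  # system name -> callable(customer_id) -> bool

    def register(self, name, delete_fn):
        self._sinks[name] = delete_fn

    def erase(self, customer_id):
        """Call every sink and return a per-system report that can be audited."""
        report = {}
        for name, delete_fn in self._sinks.items():
            try:
                report[name] = "deleted" if delete_fn(customer_id) else "not_found"
            except Exception as exc:  # a failing sink must not hide the others
                report[name] = f"failed: {exc}"
        return report


# Dictionaries stand in for real systems in this example.
cdp_profiles = {"c1": {"email": "ana@example.com"}}
feature_rows = {"c1": {"avg_spend": 40.0}}


def delete_from_warehouse(customer_id):
    raise RuntimeError("warehouse unreachable")  # simulated outage


coordinator = ErasureCoordinator()
coordinator.register("cdp_profiles", lambda cid: cdp_profiles.pop(cid, None) is not None)
coordinator.register("feature_store", lambda cid: feature_rows.pop(cid, None) is not None)
coordinator.register("warehouse", delete_from_warehouse)

report = coordinator.erase("c1")
```

Here the report marks the first two systems as deleted and the warehouse as failed, which is the signal to retry before the request can be considered complete.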

Data Protection Impact Assessment (DPIA): A retail CDP almost certainly requires a DPIA. Under GDPR Article 35, it’s a legal requirement when processing involves large-scale profiling.

The architecture must incorporate privacy-by-design:

  • Consent management integrated. The CDP must respect consent preferences in real time. If a customer withdraws personalization consent, segmentation must exclude them immediately, not in the next nightly batch.
  • Automated data retention. Browsing data: 90 days max. Purchase history: what the contractual relationship justifies. Inactive profiles: anonymize or delete after the defined period.
  • Encryption at rest and in transit. Pseudonymization of personal identifiers in the data pipeline.
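The three controls above can be sketched as small functions applied inside the pipeline. This is an illustration under assumptions rather than a compliance recipe: the consent lookup is a plain dictionary, the 90-day window is the browsing retention figure from the list, and pseudonymization uses a keyed HMAC-SHA256 so identifiers cannot be reversed without the secret key. The function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

BROWSING_RETENTION = timedelta(days=90)  # browsing data window from the list above


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a personal identifier with a keyed hash before it enters the pipeline."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def eligible_audience(customer_ids, consent):
    """Keep only customers with personalization consent; a missing entry means no consent."""
    return [cid for cid in customer_ids if consent.get(cid, False)]


def purge_expired(events, now):
    """Drop browsing events older than the retention window."""
    cutoff = now - BROWSING_RETENTION
    return [e for e in events if e["ts"] >= cutoff]


key = b"demo-key-rotate-me"  # placeholder; a real key belongs in a secrets manager
token = pseudonymize("Ana@Example.com", key)

consent = {"c1": True, "c2": False}
audience = eligible_audience(["c1", "c2", "c3"], consent)  # ['c1']

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
events = [
    {"ts": now - timedelta(days=10), "page": "/shoes"},
    {"ts": now - timedelta(days=120), "page": "/bags"},
]
kept = purge_expired(events, now)  # only the 10-day-old event remains
```

Checking consent at read time, rather than baking it into a nightly segment, is what lets a withdrawal take effect on the next request.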

The practical stack for mid-size retail

For a mid-size retailer (50-200 stores, ecommerce, app), a realistic stack:

Layer           | Pragmatic option                        | Alternative
Ingestion       | Segment / RudderStack                   | Custom Kafka
CDP / Identity  | Segment Unify / mParticle               | Custom on Spark
Feature store   | Redis + batch refresh                   | Feast
Decision engine | Rules + basic model                     | Personalization SaaS (Dynamic Yield, Bloomreach)
Activation      | APIs to channels (email, push, display) | CDP native
Consent         | CookieBot / OneTrust                    | Custom CMP

Total stack cost for a mid-size retailer runs EUR 80,000-200,000/year in software, plus the data engineering team to operate it (2-4 people).

The most common alternative for retailers who don’t want to build: integrated personalization platforms like Dynamic Yield (Mastercard), Bloomreach, or Algonomy. They cover CDP, decision engine, and activation in a single product. The downside: less control over data, vendor lock-in, and cost that scales with interaction volume.

What’s coming

Retail media is evolving fast. Two trends directly affecting infrastructure:

Data clean rooms: Spaces where retailers and advertisers share anonymized data to measure campaigns without exposing individual records. AWS Clean Rooms, Google Ads Data Hub, and Snowflake Data Clean Room are the main options. For a retailer with retail media ambitions, the ability to participate in clean rooms will be table stakes within 2-3 years.
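One safeguard commonly associated with clean rooms is that queries return only aggregates, and groups that are too small are suppressed so no individual can be singled out. The sketch below illustrates that idea in isolation; the function name, row layout, and `min_group_size` default are assumptions for this example, and real clean room products enforce their own rules and thresholds.

```python
from collections import defaultdict


def aggregate_campaign(rows, min_group_size=50):
    """Aggregate (campaign_id, customer_id, revenue) rows, hiding small groups."""
    buyers = defaultdict(set)
    revenue = defaultdict(float)
    for campaign_id, customer_id, amount in rows:
        buyers[campaign_id].add(customer_id)
        revenue[campaign_id] += amount
    return {
        campaign: {"buyers": len(ids), "revenue": round(revenue[campaign], 2)}
        for campaign, ids in buyers.items()
        if len(ids) >= min_group_size  # suppress groups small enough to identify people
    }


rows = [("spring", f"c{i}", 10.0) for i in range(3)] + [("niche", "c1", 99.0)]
result = aggregate_campaign(rows, min_group_size=3)
# {'spring': {'buyers': 3, 'revenue': 30.0}}
```

The "niche" campaign has a single buyer, so it is left out of the output entirely instead of being reported with a count of one.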

Generative AI for personalized content: Not just deciding which product to recommend, but generating the copy, image, and layout personalized for each segment. Requires an additional content generation layer connected to the decision engine. Still nascent in European retail, but the large players are already experimenting.

For a broader view of how to unify omnichannel data, see our article on data architecture for omnichannel retail. The data infrastructure you build today for personalization will be the foundation for retail media tomorrow. Investing in identity resolution, real-time processing, and privacy-by-design isn’t optional. It’s the minimum bet to compete in a market where first-party data is the most valuable asset.

About the author

abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.