Privacy's impact on addressability is creating a new reality in digital advertising – one powered by publishers and their ability to scale data.
As regulators and browsers increase their focus on privacy in digital advertising, the ability of adtech vendors to collect and apply user data across domains is decreasing. The result is that 70% of audiences on the open web are hidden, as those users do not have a cross-domain identifier.
Brands that spend most of their budgets on the open web will need to seriously consider alternatives: they can reach only 30% of audiences today, diminishing their market share and brand equity.
Publishers are in a very different position. They have a first-party relationship with every user who visits their properties – even one-off, passing visitors – allowing them to address 100% of their users and understand their behaviours and preferences in the moment.
Publishers are becoming the new generation of data providers for audience targeting. This recognition goes beyond publishers’ ability to collect data in a privacy-compliant way and to recognise all users to create endemic and non-endemic audiences. It’s about their ability to model out niche and hard-to-scale datasets, including high-quality, self-declared ones.
Scaling First-Party Data
Modelling isn’t new to the ecosystem. For a long time, traditional data providers have taken high-quality datasets and applied their proprietary modelling to make them scale for digital advertising applications. As those data providers are losing their capability to do so, publishers, equipped with the right tools, are stepping up.
There are three types of modelling that publishers and brands are exploring to scale data and increase audience reach.
Lookalike modelling is the most prevalent modelling technique and is a powerful tool to extend an interest, behavioural or intent-based audience.
There are many use cases for which lookalike models work very well, including when you want to find users that look similar to existing customers, similar to those that have engaged with a specific campaign or those that have shown a specific interest or intent.
An example is Gumtree. Over the past 12 months, the publisher served over 22m impressions to 'lookalike' and 'clicker' audiences (users who clicked on other ads), extending reach and improving campaign performance.
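A minimal sketch of how lookalike modelling works in practice: train a classifier to separate a small seed audience from the general population, then score everyone and keep the highest-scoring non-seed users. The features, dataset sizes, and the top-5% cut-off below are illustrative assumptions, not any publisher's actual setup.

```python
# Lookalike modelling sketch on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic behavioural features (e.g. pageviews, session depth, recency).
population = rng.normal(0.0, 1.0, size=(5000, 3))
seed = rng.normal(1.0, 1.0, size=(200, 3))  # seed users skew higher on each feature

# Label seed users 1 and the general population 0, then train a classifier.
X = np.vstack([seed, population])
y = np.concatenate([np.ones(len(seed)), np.zeros(len(population))])
model = LogisticRegression().fit(X, y)

# Score the general population; the top 5% become the lookalike audience.
scores = model.predict_proba(population)[:, 1]
threshold = np.quantile(scores, 0.95)
lookalikes = population[scores >= threshold]
```

In a real deployment the seed could equally be existing customers or users who engaged with a past campaign, and the cut-off becomes a dial between audience scale and similarity.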
However, there are use cases where brands seek distinct groups (classes). In this case, publishers can use classification models: a single model takes multiple seed audiences and therefore knows all possible classes for a specific category or attribute.
For example, users who are labelled as ‘in a relationship’, users who are labelled as ‘single’ and users for whom no relationship data is available.
This has two main benefits: The model is specifically designed to distinguish between those classes (rather than just finding similarities), and the model makes sure a user is only bucketed into a single class, i.e. not categorised as both in a relationship and single at the same time.
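The relationship-status example above can be sketched as a single multi-class model. The class names, synthetic features, and seed sizes are assumptions for illustration; the point is that one model sees all seed audiences at once and assigns each user exactly one class.

```python
# Classification-model sketch: one model, multiple seed audiences, one class per user.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

classes = ["in_relationship", "single", "unknown"]

# One synthetic seed audience per class, each with slightly different behaviour.
X = np.vstack([rng.normal(loc, 1.0, size=(300, 4)) for loc in (-1.0, 0.0, 1.0)])
y = np.repeat(classes, 300)

model = LogisticRegression(max_iter=1000).fit(X, y)

# predict() returns exactly one class per user - no double-bucketing
# into both 'in_relationship' and 'single' at the same time.
new_users = rng.normal(0.0, 1.0, size=(10, 4))
predictions = model.predict(new_users)
```

Because the model is trained to distinguish between the classes rather than to find similarity to a single seed, the mutually exclusive bucketing comes for free.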
Another powerful type of modelling is event prediction. These models estimate how likely a user is to perform a certain action – clicking on an ad, purchasing a subscription, or clicking on an affiliate link.
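An event-prediction model can be sketched as a binary classifier whose probability output is the prediction itself. The synthetic click data and the choice of a gradient-boosted model below are assumptions standing in for whatever a publisher actually uses.

```python
# Event-prediction sketch: probability that a user clicks, from behavioural features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)

X = rng.normal(0.0, 1.0, size=(2000, 5))  # synthetic behavioural features

# Make the synthetic click outcome depend on the first two features plus noise.
true_click_prob = 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
y = rng.random(2000) < true_click_prob

model = GradientBoostingClassifier().fit(X, y)

# Per-user probability of the event, usable as a targeting or bidding signal.
p_click = model.predict_proba(X[:10])[:, 1]
```

The same pattern applies to subscription purchases or affiliate clicks: only the labelled event changes, not the modelling approach.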
Millions of Users in Real-Time
To take this a step further and operationalise these models – making them scale to millions of users in real time – not only requires machine learning expertise, but also the right system design.
It starts with seed data: A model can only be as good as the data you feed into it. For publishers, this can be interest data that they collect from user interactions; it can be declared data from registered users or surveys, or it can be declared data from data partners.
Generally, the larger the seed dataset is, the better the model can be trained. However, you don’t want to compromise quality for quantity.
Sourcing the right seed data and building the appropriate model are fundamental steps. But scaling the model to millions of users and making real-time predictions is the hard part.
Real-time predictions are crucial for publishers: If you get your predictions in a nightly batch job, you will always be one step behind. This also means you’ve lost your chance to address all those users who have one session and then never return to your site.
If you want to maximise scale and quality, you must make predictions in real time. This requires feature engineering: aggregating raw events into a compact format (a user state) that the model trains against and can then score quickly. Without this aggregation step, it is hard to achieve real-time predictions at web scale.
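The aggregation step can be sketched as follows: fold each raw event into a small per-user state as it arrives, so that a fixed-length feature vector is available instantly, without replaying the whole event history on every request. The event shape and feature choices are illustrative assumptions.

```python
# Feature-engineering sketch: incremental user state for real-time scoring.
from collections import defaultdict

def empty_state():
    return {"pageviews": 0, "clicks": 0, "sections": set(), "last_ts": 0}

user_state = defaultdict(empty_state)

def update_state(event):
    """Fold one raw event into the user's aggregated state."""
    s = user_state[event["user_id"]]
    if event["type"] == "pageview":
        s["pageviews"] += 1
        s["sections"].add(event["section"])
    elif event["type"] == "click":
        s["clicks"] += 1
    s["last_ts"] = max(s["last_ts"], event["ts"])

def feature_vector(user_id):
    """Fixed-length features a model can score immediately."""
    s = user_state[user_id]
    ctr = s["clicks"] / s["pageviews"] if s["pageviews"] else 0.0
    return [s["pageviews"], s["clicks"], ctr, len(s["sections"]), s["last_ts"]]

# Hypothetical event stream for a first-session user.
events = [
    {"user_id": "u1", "type": "pageview", "section": "sport", "ts": 100},
    {"user_id": "u1", "type": "pageview", "section": "news", "ts": 110},
    {"user_id": "u1", "type": "click", "ts": 111},
]
for e in events:
    update_state(e)
```

Because the state is updated incrementally, even a user on their first and only session can be scored before they leave – exactly the case a nightly batch job misses.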
The Opportunity Ahead
Equipped with the right knowledge, tools, and expertise in machine-learned modelling, publishers can derive greater value from their audiences than ever before while paving the way for the future of audience targeting.
As traditional data providers are losing their capability to address audiences on the open web, publishers have the opportunity to receive 100% of the revenue for the value they create.