A common complaint I hear from clients is that they feel like their ads are running on “garbage” publishers when buying programmatically. Setting aside quantitative measures like campaign performance, it does raise an interesting question: How do you know you are buying from legitimate publishers?
In our previous discussions of ad fraud, we examined its many forms, why it exists, who should be responsible for stopping it, and how marketers can protect themselves from it. Up until now, we haven’t discussed the distinction between outright fraud (i.e., ads without a chance of being seen by human eyes) and simply low-quality sites.
Today, I am going to teach you a skill that every programmatic ad buyer must possess: the ability to vet a publisher to determine if they are legitimate. If you can do this, you can blacklist suspicious publishers from your campaigns and optimize ad spend towards legitimate sites. If you can’t, you are effectively at the mercy of any unscrupulous player in the ad tech ecosystem.
As I’ve explained previously, the whole programmatic advertising ecosystem is an ‘open’ one, meaning that almost any advertiser or publisher is free to participate in the marketplace. On the one hand, it’s a democratizing concept. On the other hand, it also creates an opportunity for bad actors to take advantage of the unsuspecting or unskilled.
In this article, we’ll investigate one specific publisher chosen at random, using tools and processes easily available to most ad buyers. The goal is to provide you with a framework to conduct your own investigations going forward. That way, you can take comfort that the publishers you are buying from are legitimate.
Step 1: Discover a site that doesn’t “feel” right
The first step is to look at your campaign reports that break down performance by site or domain. Like, really look at them. Are the sites household names, like cnn.com or msnbc.com? There might still be issues with the impressions you are buying from them (the topic of a future article), but you can at least rest assured that the publishers themselves are “legitimate”.
But, what about a domain like lifehacklane.com? I’ve never heard of it, but according to SiteScout (a self-serve DSP that anyone can sign up for and use immediately, with an Inventory tool that is useful for research and planning), it gets 8.2MM impressions per day. This seems quite high for a site I’ve never heard of.
SiteScout also lets you see what exchanges the publisher is using by typing the name into the search box:
Lifehacklane.com uses Sovrn, Pubmatic and Pulsepoint as their supply side partners. If you have been around the ad tech industry for any length of time, you’ll know that these ad exchanges are considered somewhat second tier. I wonder why a site with that much volume isn’t working with Index, Rubicon or OpenX?
On a side note, if you click on the gear icon in the Actions column in SiteScout, and choose “Placement Details”, you can see a breakdown of the various placements available on the site:
This list provides a good reference for when we finally check the site out ourselves, to see if we can find these placements, and if everything adds up.
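To make the first pass over a site-level report less tedious, the "unfamiliar name, surprisingly high volume" check from this step can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the `SiteRow` structure, the `KNOWN_PUBLISHERS` allowlist, and the one-million-impressions threshold are all hypothetical, and the only figure taken from this article is the 8.2MM daily impressions for lifehacklane.com.

```python
from dataclasses import dataclass

# Hypothetical allowlist: publishers the buyer already recognizes.
KNOWN_PUBLISHERS = {"cnn.com", "msnbc.com"}


@dataclass
class SiteRow:
    domain: str
    daily_impressions: int


def flag_unfamiliar_sites(rows, known=KNOWN_PUBLISHERS, min_daily_impressions=1_000_000):
    """Return rows for unrecognized domains with high volume, largest first.

    The threshold is an arbitrary assumption; tune it to your own reports.
    """
    suspects = [
        row
        for row in rows
        if row.domain.lower() not in known
        and row.daily_impressions >= min_daily_impressions
    ]
    return sorted(suspects, key=lambda row: row.daily_impressions, reverse=True)


# Illustrative report rows. Only the lifehacklane.com figure comes from the article.
report = [
    SiteRow("cnn.com", 50_000_000),
    SiteRow("lifehacklane.com", 8_200_000),
    SiteRow("smallblog.example", 12_000),
]

for row in flag_unfamiliar_sites(report):
    print(f"worth a closer look: {row.domain} ({row.daily_impressions:,} impressions/day)")
```

A flagged domain is not a verdict. It is simply a candidate for the manual checks in the steps that follow.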
Step 2: First impressions (aka the “sniff test”)
Now that we know a little bit about the kind of inventory available from this specific publisher, including their ad placements and their exchange partners, we can visit the site and check it out for ourselves:
At first glance, it looks like a generic click-bait website.
Virtually all of the content on the site is a “listicle” – an article presented in the form of a numbered list. It’s called click-bait because such headlines often get clicked on the most. People love reading lists. So, when these articles get shared on Facebook, for example, they can drive good amounts of traffic.
(Note: The fact that almost every article follows the same format is another clue. Later on in this guide, it will help us explain the business model of this site.)
Aesthetically, the site appears to be using a generic theme. And the logo looks like it was made in less than 5 minutes. This isn’t necessarily indicative of illegitimacy (ever look at drudgereport.com?), but it raises some additional flags.
The next thing I look at is the content of the articles and their meta-data. This is what you see if you click on one of the prominently featured articles:
The author of this article is apparently named Vincent. But he has no last name, and when you hover over his name, it links back to the homepage, which is very odd. Any legitimate publisher would link to an actual author page with a list of their previous articles and a brief bio with, of course, a full name.
Moreover, if you try reading the first paragraph, it’s very clear, to me at least, that the person who wrote it is not a professional writer. Instead, it seems like it was written by a 12-year-old, or by someone for whom English is a second language.
Is this alone indicative of illegitimacy? Not necessarily. But, it does suggest that the site is a cheap content farm, and definitely raises a few more flags.
Moving along, I next check out the “About Us” page. This is oftentimes a dead giveaway that the publisher is questionable.
This page looks like it was written in less than 5 minutes as well. While it’s not a smoking gun, it does tell you a bit about how much was invested in creating the site. It also communicates how much the site owner cares to divulge about themselves and the story behind the site.
Lastly, if we then check out the “Contact Us” page, we finally get some concrete information to go on:
We now know that the company behind lifehacklane.com is ‘eSelling Limited’, based out of the UK.
This information is also found at the bottom of the “Terms and Conditions” page, which is another handy location that often includes either the company name of the owner or their law firm.
Step 3: Conduct some basic due diligence
If we do some quick due diligence on Google, we can find publicly available information about the business and its director(s):
Here we can see that eSelling Limited was registered around 3 years ago, and is owned by Mr. Igor Lugovkin, a Russian national living in London. Great, a real person! That is a tally in the “legitimate” column, right? Well, if we do a quick search on Google for Mr. Lugovkin, we find this:
He is an “entrepreneur” in the UK with 0 connections. What kind of entrepreneur has no connections? Maybe he isn’t a real person, after all…
Doing a WHOIS lookup on the domain you’re investigating is another valuable data point for conducting due diligence. A WHOIS lookup will tell you when, and to whom, a domain name is registered.
If you are on a Mac, simply open up the Terminal app and type in “whois domain.com”, where domain.com is the site you’re investigating. (For Windows users, use the ICANN WHOIS website.)
For lifehacklane.com, we get this:
Seeing “Domains By Proxy” tells us that privacy protection is enabled on the domain. So does the registrant name, “Registration Private”. This isn’t evidence of any wrongdoing; there are lots of legitimate reasons to obscure the identity of a registrant. But, in light of all of the foregoing, this is yet another negative signal in our overall evaluation criteria.
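If you are checking more than a handful of domains, scanning WHOIS output by eye gets old quickly. Below is a minimal sketch that runs the same `whois` command from Python and pulls out lines that hint at privacy protection. The marker strings are the two phrases mentioned above. The function names and the sample text are my own assumptions, and the sample is a made-up excerpt in the general shape of WHOIS output, not the actual record for lifehacklane.com.

```python
import subprocess

# Phrases that suggested privacy protection in the lookup above.
# Other privacy services use other wording, so treat this list as a starting point.
PRIVACY_MARKERS = ("domains by proxy", "registration private")


def run_whois(domain: str) -> str:
    """Run the system `whois` command (macOS/Linux) and return its raw text."""
    result = subprocess.run(
        ["whois", domain], capture_output=True, text=True, timeout=30
    )
    return result.stdout


def privacy_signals(whois_text: str) -> list[str]:
    """Return the WHOIS lines that contain any known privacy marker."""
    hits = []
    for line in whois_text.splitlines():
        lowered = line.lower()
        if any(marker in lowered for marker in PRIVACY_MARKERS):
            hits.append(line.strip())
    return hits


# Hypothetical excerpt for illustration only.
sample = """\
Registrant Name: Registration Private
Registrant Organization: Domains By Proxy, LLC
Registrant Country: (not shown)
"""

for hit in privacy_signals(sample):
    print("privacy signal:", hit)
```

To check a live domain you would call `privacy_signals(run_whois("domain.com"))`. As noted above, a hit is a weak signal on its own and only matters in combination with everything else.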
When you combine this information with the sparse “About Us” page, and lack of author information on articles, it starts to paint a picture of a publisher who does not want to be identified or associated with the site. Why would that be?
Step 4: Audit website for suspicious activity
If this website wasn’t suspicious enough for you by now, let’s take our investigation to the next level by looking for signs of suspicious behavior on the site itself.
First, let’s look for the actual ads. If we load up an article, say, “17 Facts That Could Save Your Life”, and we don’t use an ad blocker, I can count 4 visible ad units on the page. Three of them are banners, two 300×250 placements and one 728×90 placement:
(All highly targeted ads.)
Then we have one 3×3 “native” advertising block at the bottom of the page from Content.ad:
(These ads, not so relevant.)
If we cross-reference this information with the placement stats we gathered from SiteScout earlier in Step 1, something clearly doesn’t add up. Remember these placement stats:
In total, SiteScout shows nine banner ad placements for lifehacklane.com, but we only see three when we view an article. How do we account for the other six?
If we reload the page with an ad blocker running and watch its counter, something strange happens. Within 10 seconds, the number of ads and trackers blocked jumps to hundreds!
A minute later, the number of ads blocked blows past 1000.
I’ve actually never seen this happen before. If we had the time and the inclination, we might want to investigate why this is happening. Who knows what we might discover. But that’s a time-consuming endeavor, and this is meant to be a basic investigation process for ad buyers of every level.
At this point, let’s just say that we have yet another red flag to cause us concern.
Step 5: Examine upstream traffic sources
If we run lifehacklane.com through SimilarWeb search, a few data points stand out as insightful:
Here we can see that display advertising accounts for almost half of their traffic. This means they are paying for a good portion of their traffic — a typical sign of arbitrage (i.e., buying traffic for cheaper than you can sell it). If we go even further, the picture becomes clearer:
Okay, so the majority of that display advertising is actually from content syndication (aka “native”) ad platforms like Taboola, OutBrain, and Yahoo. They are paying to get users to their generic, click-bait site. Upstream site data from Alexa also confirms this:
Now the daily traffic volume starts to make sense. But let’s go deeper…
Most of the top referring websites are also of the “viral” category. These kinds of sites are not known for driving organic referrals, so this could be a mistaken categorization on the part of SimilarWeb. These are probably just websites that use “native” ad platforms like Taboola and OutBrain for monetization.
By looking on diply.com, one of their top referrers, we can confirm this to be true:
The second ad here in the Revcontent widget is a LifeHackLane article. So, the actual percentage of traffic to lifehacklane.com from native ad platforms is probably closer to 60 percent.
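The adjustment I just made, moving some "referral" traffic into the paid column, is simple arithmetic, and it can be written down so you can repeat it for other sites. The sketch below uses entirely hypothetical channel shares, chosen only so that display sits just under half and the adjusted total lands near 60 percent. The real SimilarWeb breakdown is not reproduced here, and the 60 percent reclassification rate for referrals is an assumption.

```python
def estimated_paid_share(channel_shares, paid_channels, partially_paid):
    """Estimate the fraction of visits that were bought.

    channel_shares: channel name -> fraction of total visits (should sum to ~1.0)
    paid_channels:  channels counted as fully paid (e.g. display)
    partially_paid: channel name -> assumed fraction of that channel that is
                    really paid traffic (e.g. native widgets reported as referrals)
    """
    total = sum(channel_shares.values())
    if not 0.99 <= total <= 1.01:
        raise ValueError(f"channel shares sum to {total:.2f}, expected about 1.0")

    paid = sum(channel_shares[name] for name in paid_channels)
    paid += sum(channel_shares[name] * frac for name, frac in partially_paid.items())
    return paid


# Hypothetical traffic mix for illustration only.
shares = {
    "display": 0.48,
    "referrals": 0.20,
    "social": 0.12,
    "direct": 0.12,
    "search": 0.08,
}

paid = estimated_paid_share(
    shares,
    paid_channels=["display"],
    partially_paid={"referrals": 0.6},  # assumption: 60% of referrals are native widgets
)
print(f"estimated paid share of traffic: {paid:.0%}")
```

The useful part is not the exact number but the habit: whenever a "referrer" turns out to be a native ad widget, count that traffic as bought.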
Here is how these “native” or “content syndication” ad platforms work: They charge advertisers (LifeHackLane, in this example) on a cost-per-click (CPC) basis. Whenever CPC is the pricing model, the ads with the highest click-through rates usually receive the most impressions and the cheapest clicks, because they make the ad network (Revcontent, in this example) the most money.
Therefore, it’s not surprising that almost all the ads in these widgets use click-bait headlines and images. In fact, it’s the only way to make the economics feasible for publishers like LifeHackLane.
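To see why click-through rate matters so much under CPC pricing, it helps to convert each ad into what it earns the network per thousand impressions: eCPM = CPC × CTR × 1000. The sketch below ranks two made-up ads that way. It is a simplified textbook model, not a description of how Revcontent or any other network actually ranks ads, and every bid and click-through rate in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NativeAd:
    headline: str
    cpc: float  # what the advertiser pays per click, in dollars
    ctr: float  # click-through rate as a fraction (0.03 means 3%)

    @property
    def ecpm(self) -> float:
        """Network revenue per 1,000 impressions of this ad."""
        return self.cpc * self.ctr * 1000


# Hypothetical ads: a sober headline with a higher bid versus
# a click-bait listicle with a much lower bid.
ads = [
    NativeAd("Quarterly market outlook", cpc=0.10, ctr=0.005),
    NativeAd("17 Facts That Could Save Your Life", cpc=0.03, ctr=0.03),
]

ranked = sorted(ads, key=lambda ad: ad.ecpm, reverse=True)
for ad in ranked:
    print(f"{ad.headline}: eCPM ${ad.ecpm:.2f} at ${ad.cpc:.2f} per click")
```

In this toy example the listicle pays less than a third as much per click, yet earns the network more per impression, so it would be shown more often.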
Remember earlier in Step 2 when we wondered why all the articles on LifeHackLane were listicles? This is why. They have to be listicles in order for them to perform well enough (i.e., have a high enough click-through rate) on these types of ad networks. Consequently, they can acquire traffic in the most cost-effective way possible, so they can extract the most revenue when selling that traffic on their sites.
This is the arbitrageur business model.
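Put in numbers, the arbitrage works whenever the ad revenue from a visit exceeds the cost of the click that bought it. Here is a rough sketch of that calculation. Apart from the four ad units per page counted in Step 4, every input is a hypothetical placeholder: I do not know what LifeHackLane pays per click, how many pages a visitor views, or what CPM its inventory sells for.

```python
def revenue_per_visit(pages_per_visit: float, ads_per_page: int, cpm: float) -> float:
    """Ad revenue from one visit, given a CPM in dollars per 1,000 impressions."""
    impressions = pages_per_visit * ads_per_page
    return impressions * cpm / 1000


def arbitrage_margin(cost_per_click: float, pages_per_visit: float,
                     ads_per_page: int, cpm: float) -> float:
    """Profit (or loss, if negative) on a single purchased visit."""
    return revenue_per_visit(pages_per_visit, ads_per_page, cpm) - cost_per_click


# Hypothetical inputs, except ads_per_page (the four units counted in Step 4).
cost_per_click = 0.02   # assumed price paid to the native ad network
pages_per_visit = 8     # assumed pages viewed per purchased visit
ads_per_page = 4
cpm = 1.00              # assumed average selling price per 1,000 impressions

revenue = revenue_per_visit(pages_per_visit, ads_per_page, cpm)
margin = arbitrage_margin(cost_per_click, pages_per_visit, ads_per_page, cpm)
print(f"revenue per visit: ${revenue:.3f}, margin per visit: ${margin:.3f}")
```

Notice that the break-even click price equals the revenue per visit. That is why the model only works with very cheap clicks, many ad impressions per visit, or both.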
It’s not fraud, unless of course the traffic they acquire is non-human, non-viewable, or uninitiated by the user. (This is oftentimes the case with the cheapest traffic sources, which makes arbitrage a slippery slope.) In today’s case, we are simply dealing with low quality.
We can confirm this if we evaluate lifehacklane.com through automated ad verification tools like Pixalate and Moat. Using these tools, lifehacklane.com returns 4% and 0.6% IVT (invalid traffic), respectively. Both of those numbers are below average levels. Therefore, according to automated ad verification tools, there is nothing fraudulent about the site.
Food For Thought
We’ve completed five investigatory steps, and while there is no smoking gun to indicate that lifehacklane.com is illegitimate, there are enough flags raised that I would be asking the following questions:
- As a marketer, would you want your ads showing up on this site?
- As a demand-side platform, would you want to see a site like this coming from your supply partners?
- As a supply-side platform, is this the kind of site you want to be allowing into the marketplace?
- And for all the players involved: Is this a publisher you want to support financially?
I understand these are value judgements. Objectively, lifehacklane.com, and others like it, may very well perform for marketers from a conversion perspective. They might not. My point is: There are secondary effects, which may not be obvious at first, to having such sites in the advertising ecosystem.
Digital advertising is a zero-sum industry. Sites like this, while they may not be outright fraudulent, are taking a slice of available advertising budgets. And there are dozens, if not hundreds, of sites like these. From whom are they stealing that share? Legitimate, struggling publishers – publishers doing real work, with swaths of employees to pay, among other expenses.
By manufacturing page views under the guise of “viral” and “native” advertising networks, inventory supply is expanding to the point where, naturally, CPM rates are forced downwards. And, as a result of this extraneous inventory, all the struggling publishers – who employ real journalists – are now receiving an even smaller share of the advertiser’s wallet.
Performing this type of qualitative investigation into publishers is not very efficient. It can be very time-consuming and, for a media buyer, unrealistic to perform at scale across hundreds or thousands of sites. However, it can still be valuable to “sample” random publishers sold to you, and do this kind of research.
If you check out 10 or 20 sites that you’ve bought on, and they are all clean, then no worries. But, if a significant number raise flags, then it might be time to have a talk with your programmatic inventory supplier.
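Picking the sample at random, rather than cherry-picking names that already look odd, keeps the exercise honest. Here is a minimal sketch of how you might draw the sample and tally the results. The function names, the placeholder domains, and the 25 percent "time to have a talk" threshold are my own assumptions, since what counts as a significant number is a judgment call.

```python
import random


def sample_sites(domains, k=10, seed=None):
    """Pick up to k distinct domains at random from a campaign report."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    unique = sorted(set(domains))
    return rng.sample(unique, min(k, len(unique)))


def flagged_fraction(review_results):
    """review_results: domain -> True if your manual review raised flags."""
    if not review_results:
        return 0.0
    return sum(review_results.values()) / len(review_results)


# Placeholder domains standing in for a real site-level report.
report_domains = [f"site{i}.example" for i in range(200)]
to_review = sample_sites(report_domains, k=20, seed=7)
print("sites to vet by hand:", to_review)

# After vetting, record the outcome per site (hypothetical results shown).
results = {"site1.example": True, "site2.example": False,
           "site3.example": False, "site4.example": True}
share = flagged_fraction(results)
THRESHOLD = 0.25  # hypothetical cut-off
if share >= THRESHOLD:
    print(f"{share:.0%} of sampled sites raised flags: talk to your supplier")
```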
While you are at it, you may also want to ask your inventory supplier about the types of due diligence they perform when vetting new partners. Naturally, automated methods are superior at scale, but they should be augmented with human eye verification and qualitative investigation. It might not be realistic to vet every site, but this is the promise that SSPs and exchanges are providing, and they need to do their duty.
Ultimately, the investigation method outlined in this article is just a basic guide that anyone can follow. There are more advanced methods of investigation, but communicating them would give too much information to fraudsters, allowing them to adapt, thereby making my job harder. But, if you’re a media buyer looking to vet a publisher and have some questions, feel free to reach out.
P.S. — If LifeHackLane or Mr. Igor Lugovkin are reading this, and believe I am mistaken in my analysis, please reach out to me. I would be happy to present your side, as it will help my readers in their assessment.