What to Do if Infected With SEO Spam on WordPress

At Jetpack, dealing with different types of web threats and attacks is part of our routine. Most of the time, it ranges from collecting a malicious file and finding the attack vector, to providing assistance on restoring a website from the latest backup. But sometimes we enter a different dimension of really creative attacks, a dimension of inexplicable reinfections — we enter … the twilight zone.

Okay, I’m probably being over-dramatic, but bear with me as I set the scene for this mystery tale. Ready? Please join me on this trip to the realm of ghosts, spam, and search engines.

The malicious behavior

We found a website that was under a very interesting kind of attack. It first surfaced as an email sent by the Google Search Console: an uncommon URL (and a very suspicious-looking one at that, with a clickable URL inside) was listed as a top growing page.

Image showing spam inserted into a URL — Google’s Search Engine Tool’s Top growing pages

The website owner was a bit upset since behavior like this is often the result of an infection, but Jetpack hadn’t detected or warned them about anything. Plus, these pages didn’t even exist on the website when they checked, but were being indexed by Google anyway. Twilight zone intensifies.

As we checked for any suspicious files that Jetpack Scan may have missed (no security tool detects 100% of threats), things got even stranger. WordPress Core and plugins were intact: no injected file or scripts on the database. A couple of outdated plugins didn’t have any security fix on them, WordPress was one version behind (5.6), and the latest update didn’t list any big security fixes. There was nothing suspicious at all. No usual suspects, no evidence of attacks; not yet, anyway.

The next logical step is to check the access logs. Perhaps it could shed some light into this mystery. Would we find that we are facing a zero-day attack, or that we finally found a proof for the multiverse theory, and this website is only infected in Universe #1337? To the logs!

Code for a spam request — A request to this strange spam

Another strange spam request. Why does Bing like it? — It seems that Bing likes it too… But why?

As you would expect: nothing strange, other than a bunch of requests to those spam pages as you can see on the screenshots. And they were all returning a `200 OK`. So, the page existed somewhere in the time and space continuum, or … wait a second … do you see it now?

All those pages were pointing to the same location: `/?s=`, meaning that search engines (the problem was noticed by Google, but the requests are coming from Bing) were indexing search result pages. And why is that? The crawler doesn’t perform searches on the page, as far as we know, right?

Explore the benefits of Jetpack

Learn how Jetpack can help you protect, speed up, and grow your WordPress site. Get up to 50% off your first year.

Explore plans

The indexing paradox

The basics of how a search engine works are fairly straightforward if you are in the website business. There’s a robot (or automated script) that crawls web pages, indexes its content, performs some magic, and stores the queryable resources somewhere in the cloud.

With that in mind, we dug the logs a little bit more to see if any of those requests had any other clue, like a referrer, but no luck at all. All logged requests were coming from search engines. Luckily, Google Search Console had one of the referring pages in one of the logs.

Google Search Console review of the page — Google’s Search Console tool gave some hints for us.

Now I think it’s time to switch our Twilight Zone hat for a CSI hat and dig up some website bones to put under the microscope.

For the trained eye, it’s easy to see that the referring page URL belonged to a compromised website; luckily, we have well-trained eyes! The `index.php` directory makes no sense and it was probably added to confuse the website owner. Next, it is followed by another random directory and a PHP file with a random name, which is probably a loader that is getting the final payload: the `cargese4/cca442201.htm`, which is also random. All of these are characteristics of a link-farm malware infection.

A quick search on Google to see what was indexed for the referring site confirmed that it was indeed infected and serving SEO spam for a while. The site is for a food company in India but offering deals on SUVs in Japan — yes, that’s spam.

Japanese search results for spam attack — Search query bringing spam in japanese on indian sites.

But, none of the results linked to our friend’s website, so I decided to find if other sites were affected by this same strange behavior.

In order to look for more victims of this spam attack we, for educational purposes only, used our Google-fu knowledge to craft a search query which would return sites ending with .edu, which had `/?s=` in the URL, and the word “buy” in the title. And we got 22 results. Which is enough for our hunt.

Search results for spam pills — Using .edu and .gov sites to check spam infection filter out domains (like .com) that have been created only for link farming.

This is evidence that the reported site was not the only one affected; it seems to be a more widespread issue. We reflected on what could have made Google index those pages. How did the Googlebot reach them? Next step: backlink checkers.

There are several online tools that provide reports on backlinks to websites; the one we used in this research was Ahrefs, but other tools may reach the same results. Some of the malicious search pages are listed in the results, confirming we were on the right path.

Selecting one of those websites to check what was going on, we saw almost 5,000 spam comments, as you can see in the next screenshot (they should check out Jetpack Anti-spam). Every comment was linking to a website search page with spam in the query.

Comment section linking to spam results — Comments linking to spam

Catching the white rabbit

As I mentioned before, search engine robots don’t perform queries on websites’ pages. But, if it finds a link to it, it’ll be followed. And if the page doesn’t tell the automated script that a particular page is not indexable, it’ll add it.

Example of injected spam — Ye olde injected spam

This is a clever method of “injecting” spam on a website for spamming search engine results and increasing site page rank through low-effort link farming.

Now that we understand the problem, how do we tell search engine bots to avoid following links to search pages (or just refuse to index them)? The best way would be to make a change to WordPress Core, which would help to protect the whole community (if you want to report a bug or want to contribute with the code, please join us).

In order to avoid some unnecessary rework, we checked the WordPress Core trac and found this issue that was solved on version 5.7 but unfortunately didn’t make it to the changelog as a security issue.

I’ll quote the author, who described the issue better than I could (thank you abagtcs for the report):

Web spammers have started to abuse search features of those sites by passing in spam terms and hostnames in hopes of boosting the search rankings of the spammers’ sites.
The spammers place these links in open wikis, blog comments, forums, and other link farms, relying upon search engines crawling their links, and then visiting and indexing the resulting search results pages with spammy content.
This attack is surprisingly quite widespread, affecting many websites around the world. Though some CMS’ and sites powered by custom-written code may be vulnerable to this technique, based on preliminary investigation, it appears that — at least in the .edu space — the most targeted web platform by far is WordPress.”

This isn’t surprising when more than 41% of the biggest sites on the web are WordPress sites.

Closing the case

There are some good lessons to be learned from this incident:

The URL presented on Top Growing Pages is not well sanitized, so the spam URLs you see separated by emojis are actually directly clickable (hi Google friends, that’s on you 😉 ); unaware users could click on them and access unwanted content.
Some tweaks are needed by Google to avoid indexing clearly spammy pages. Based on the tool report, some clear pages were crawled and not indexed, while spam was added.
Attackers will leverage even the smallest opening on your system and we must be vigilant at all times.
Always listen to people and understand their problems. If we only checked the logs from our own tools, we wouldn’t be aware of this issue or be able to help to fix their site.
Keep your software up to date. Always.

At Jetpack, we work hard to make sure your websites are protected from these types of vulnerabilities. To stay one step ahead of any new threats, check out Jetpack Scan, which includes security scanning and automated malware removal.

And a hat tip to Erin Casali for highlighting this issue and helping with the investigation.