Google Penguin Examination & Conclusions

Written on: May 4, 2012

There seems to be a huge and growing confusion over this latest Google update, which reared its head at the end of April 2012.

And it’s taken longer than usual to get a handle on what’s going on, as this is one of the most ‘arbitrary’ updates ever. SEOs everywhere are pontificating over the possible rule changes that have been applied. In many tests, where one SEO could make an argument for one particular set of factors being responsible, another set of tests can completely contradict that conclusion.

The bottom line is that many ‘experts’ are working with far too narrow a sample space, and essentially basing their opinions on minimal data and hearsay. So, in an effort to try and at least clear up some of the confusion, I’ve decided to present some of our data and conclusions. These conclusions run parallel to and are corroborated by several other major SEO testers I regularly work with, who also operate with large-scale data of tens of thousands of sites.

Link Profiles

All the sites/pages we found that lost rankings in the Penguin update had at least 50% of their anchor texts using ‘exact match’ main-term phrases. (We also saw plenty of comparable sites with an equally high concentration of exact-match anchors that didn’t get hit, but this always happens; they just haven’t been unlucky enough to be caught yet! Which goes to prove yet again how ineffective Google really is at detecting these things algorithmically. It’s very much a ‘crap-shoot’ as to whether you’ll be discovered.)

But the overriding factor here is that we could hardly find any negatively hit sites with a truly diverse anchor text profile. The more diverse the profile, the less likely the site was to be hit by Penguin. Where all of a site’s pages had fewer than a third of their anchors using that page’s key term, we saw no obvious ranking drops at all.
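To make the thresholds above concrete, here’s a minimal sketch of how you might measure the exact-match share of a page’s anchor profile yourself. The function name and the sample anchor list are purely illustrative; the 50% and one-third figures come from the observations above.

```python
from collections import Counter

def anchor_concentration(anchors, exact_terms):
    """Return the fraction of backlinks whose anchor text exactly
    matches one of the target 'exact match' terms."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    exact = sum(counts[t.lower()] for t in exact_terms)
    return exact / total if total else 0.0

# Hypothetical backlink profile: 6 exact-match anchors out of 10 links.
anchors = ["best running shoes"] * 6 + [
    "click here", "example.com", "great shoe guide", "shoes",
]
ratio = anchor_concentration(anchors, ["best running shoes"])
print(f"exact-match share: {ratio:.0%}")  # prints "exact-match share: 60%"
```

A 60% exact-match share would sit well above the ~50% danger zone observed here, while a profile below one-third per key term saw no drops at all.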

Considering this, it seems blatantly clear that anchor text profile remains one of the key factors of this update. We saw a lot of damage during Panda 3.3 for the same reason, but the ‘tolerances’ have shifted considerably. It may be that Google ‘turned up the sensitivity dial’ a bit for this update, as many sites that escaped Panda 3.3 were caught by this.

This is probably one of the many reasons we’ve seen so many good-quality ‘white-hat’ sites being hit. If you don’t have that many links and you’ve been using precise/fixed anchor texts as part of your article/press-release strategy, then your percentages may well fail here. It certainly goes to show that sticking to ‘white-hat’ methods is absolutely no guarantee against the vagaries of Google’s changes.


We also saw some evidence that ‘power-links’ from highly relevant ‘themed’ sites now had slightly increased ranking effects. And this makes sense from Google’s perspective. Since anchor texts are so easy to manipulate, it could be that Google has widened the gap in ranking-power between links from non-relevant sites as compared to links from tightly-themed sites. By trying to classify a site into a theme, Google is attempting to assign a different ranking value based on site topic.

This is certainly NOT to say that ‘non-relevant’ links don’t work or aren’t required; it just appears that Google has possibly either ‘turned down’ the ranking-effect of links from sites it determines to be non topic-relevant, or ‘turned up’ the power of closely topic-themed links. We certainly haven’t seen any indication of penalty or negative ranking where links come from non-relevant sources; and this would seem almost impossible to implement, as it would hit virtually everyone (including the big corporate sites, the only customers Google really pays attention to, since they’re its main AdWords revenue stream).

And the bottom line is that there simply aren’t enough topic-relevant sites in existence to provide us with the IP spread, diversity and volume we need in our link-building. There’s no way you can compete commercially with the quantity of links available from on-topic sites. You need these other links to keep your ‘glass’ full!

I believe that content relevance is one of the biggest general misconceptions to come out of the Penguin update. The example shown by Matt Cutts on the Google blog has made some people think that they can’t have links from non-relevant content. (And always bear in mind that they take every chance they can to misinform SEOs and cause confusion.) But the 1,000 sites we track in Backlink Banzai, which have been there for 2-5 years, have been consistently drip-fed links from non-relevant content; none of them have been affected by Panda, and only a handful by Penguin. The small number that dropped with Penguin were all sites we’d been pushing the envelope with after the Panda update, to investigate and test the threshold/limits. So as soon as Penguin hit and ‘moved the bar’, they got slapped a bit. These were part of the indicators for the 50% threshold we discussed above.

Moving Forward

I could now argue, though, that with your more expensive link-building purchases from quality private blog networks etc., you might want to start looking for somewhat topic-relevant sites; preferably sites without too wide an overall keyword/topic distribution, so that Google finds them easier to ‘classify’ by theme. I can see private networks springing up in the near future which are topically themed, for a small quantity of power-links added into your mix.

This also further argues the point for themed, manually-built, asset-quality 2.0’s and buffer sites; and, by inference, for building your own mini private networks of web assets. By feeding volumes of links into these and then using a wide variety of themed keywords to pass the juice on to your key pages, you should benefit from all sides of this update. And ideally, you should use a different outbound anchor on every buffer site you build, so that you don’t run the risk of over-optimisation on your key pages.

But the continuing message here is to keep dealing with things as I recommended in my last update on dealing with Google penalties and ranking drops. Keep building (or ‘start’ building for many of you!) highly diverse links, don’t contact Google, keep adding fresh content & start expanding your web estate incrementally. My last few posts have covered these areas in detail, so I won’t reiterate further.

In terms of Backlink Banzai and Wiki Whirlwind, many people should consider using our 100% Naturalisation option, to help drive down the anchor concentration of their overall link profiles in a shorter time-frame.

Penguin (What Is It With All These Black & White Animals?!)

Understand that Penguin is purely algorithmic in nature; there wasn’t an increase in GWT Notices and Matt Cutts is openly telling people not to file ‘Reconsideration Requests’. This means that there’s no manual ‘penalty’ that can be removed; it’s a fundamental change to Google’s rank ordering algorithm. The good news on the back of this is that any corrective changes you achieve should reflect reasonably ‘soon’ in the results.

If Penguin was really ‘all about content’, and not linking, as some SEOs seem to be postulating, then why are we seeing Amazon and all the big corporate sites ranking all over the place, along with loads of obscure exact-match domains featuring no content? Because these big sites don’t build back-links the way IMers do, so what little linking they have is very natural, and their ranking mostly comes from their overall authority plus on-page content, page titles and headers. The obscure sites we’re seeing have done little or no SEO, so they haven’t been hit by any of the algorithm changes; and since they’re doing nothing ‘wrong’, they don’t get any manual penalties either. They’re now ‘left’ at the top because most of the sites that HAVE been building back-links for the last few years have been doing it according to the old mass keyword-linking rules (which have been completely misunderstood for a long time), so their link profiles are generally awful. In many niches, it’s almost a case of ‘last man standing’!

As another example, how come loads of 2.0’s (and we have tens of thousands of these) are out-ranking loads of ‘proper’ sites at the moment? Many of these are built on reasonable article-quality content, but are tiny in comparison to a proper site. And the link profiles of the tens of thousands we have are predominantly from off-topic content and linking. But they do have hugely varied inbound anchor text profiles; so they pass the Google algorithmic filters and end up out-ranking larger, better sites simply because they have the right link profile and they’re sitting on high-authority domains.

Studying At The University Of Diversity

Diversity is almost always the key, as it creates that natural-looking randomness and chaos that’s required. We aim for 100% variety in our link profile. We never achieve anything like this, of course, but having it as a goal means you never end up with large percentages of the same anchors.

We hear people talking about their ‘diverse’ anchor profiles all the time; and how they use highly varied terms. Then we look and see that what they mean is they’ve used 5 different terms for the page, and the majority of their links were to 1 or 2 of those terms, and maybe they’ve added in a few ‘click here’ generic terms somewhere and a few URL/brand anchors (if they’ve really pushed the boat out!)

When I talk about anchor diversity, I mean starting with 3-5 root keywords and beefing that up to 10-15 good LSI/long-tail terms. Then each of those is multiplied by 100 by adding 50 different prefix modifiers and 50 different suffix modifiers. So now we have 1000-1500 different anchors with a central ‘keyword mass’ themed towards our required terms. Then I add in 100-200 different generic terms, plus the bare URLs and company/brand names etc. And all that gets mixed up and randomly used. Now THAT’s a mixed anchor profile. (By the way, this is what the Naturalisation options in Backlink Banzai and Wiki Whirlwind are for. You just provide the 10 good LSI/long-tail terms.)
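The recipe above can be sketched in a few lines of code. This is an illustrative mock-up (the function name and all the placeholder terms are mine, not from any tool): each LSI/long-tail term gets combined with 50 prefix and 50 suffix modifiers, yielding 100 variants per term, and the generics and brand anchors are mixed in on top.

```python
import random

def build_anchor_pool(lsi_terms, prefixes, suffixes, generics, brand_terms):
    """Combine each LSI/long-tail term with prefix and suffix modifiers,
    then mix in generic and brand/URL anchors, and shuffle the lot."""
    pool = []
    for term in lsi_terms:
        pool.extend(f"{p} {term}" for p in prefixes)   # 50 prefixed variants
        pool.extend(f"{term} {s}" for s in suffixes)   # 50 suffixed variants
    pool.extend(generics)
    pool.extend(brand_terms)
    random.shuffle(pool)
    return pool

# All placeholder data, matching the counts described above.
lsi = [f"keyword {i}" for i in range(10)]       # 10 LSI/long-tail terms
prefixes = [f"pre{i}" for i in range(50)]       # 50 prefix modifiers
suffixes = [f"suf{i}" for i in range(50)]       # 50 suffix modifiers
generics = ["click here", "visit site", "read more"]
brand = ["example.com", "Example Co"]

pool = build_anchor_pool(lsi, prefixes, suffixes, generics, brand)
print(len(pool))  # prints 1005: 10 terms x 100 modifier combos, plus extras
```

With 10-15 terms you land in the 1,000-1,500 anchor range described above, with the ‘keyword mass’ still centred on your required terms.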

And that’s just considering anchor diversity; so, ignoring the obvious topic of IP diversity (which surely everyone must realise by now is pretty damned important), what about variety in the CMS/platform/link-type, or in submission timing?

We see people firing up SENuke and asking whether they should blast 500 or 1,000 links today?! And so we look at the content they’re using, as well as the URLs and anchors. And they’re hitting 3 different URLs with 2-5 different anchors each on a 500-link submission run. And that’s all with one simple spintaxed article with barely any variety in the title or bio. This has ‘footprint’ written ALL over it! Over the coming weeks, Google’s link graph is going to show hundreds of links built within minutes of each other, using thematically similar content and titles, and all using just a few URLs and anchors. You may as well wave a big red flag and shout, “Over here!”

If you want to do larger-scale link-building runs, then you’d better be sure that you’re using loads of URLs and domains with a massive variety of anchors, and that your content, title and any bio is hyper-spun; thus removing the ‘points of connection’ between the individual elements of the blast. This is one of the ways Backlink Banzai and Wiki Whirlwind have a huge advantage: we have a zero submission footprint, as we’re posting with tens of thousands of different URLs and anchor texts each day. There is no point of connection anywhere.

Plus we use over 250 different content structures. And I don’t mean 250 article constructs; I mean 250 different themed paragraph databases, with massive spin constructs built randomly from these.

Old-Style ‘Spinning’ vs Modern ‘Re-Authoring’

You see, some people still see ‘spinning’ as just a bit of spintax – those {curly} braces we all love to hate. But modern spinning has to be SO much more, and we call it ‘re-authoring’ to highlight the differences.

You need multiple paragraph substitution & reordering, and multiple sentence substitution, as well as sentence based word spintax. It’s what we call 3D or hyper-spinning, and it’s the only way of yielding highly readable but massively varied content. We were pioneering and using this method back in 2006.

Using our basic guidelines in the UberToolz ‘UberCubez3‘ module, for example, you yield around 69 billion permutations per paragraph; not per article, but per paragraph. And you should always judge permutations at paragraph level, not article level, as it’s ridiculous to call two 500-word articles ‘different/unique’ if they just have a few minor spintax differences in one or two of the paragraphs; which is what happens once you start to produce hundreds or thousands of outputs. Uniqueness fades with each iteration, and it’s why the standard article ‘uniqueness’ percentage is a completely useless number: it offers no insight into the volume of uniqueness achievable (which is the most critical element). But I’ve written separately on this subject.
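The arithmetic behind paragraph-level permutation counts is simple to verify. This is a generic sketch of the combinatorics (the sentence-variant counts below are hypothetical, not UberCubez3’s actual numbers): per-sentence variant counts multiply together, and if the sentences can also be reordered, you multiply by the number of orderings too.

```python
import math

def paragraph_permutations(sentence_variant_counts, orderings=1):
    """Distinct outputs for one paragraph: the product of per-sentence
    variant counts, times the number of allowed sentence orderings."""
    return math.prod(sentence_variant_counts) * orderings

# Hypothetical paragraph: 6 sentence slots, each spun from a small bank
# of variants, with the sentences free to appear in any order (6! = 720).
counts = [8, 10, 6, 12, 9, 7]
total = paragraph_permutations(counts, orderings=math.factorial(6))
print(f"{total:,}")  # prints 261,273,600 for this single paragraph
```

Even these modest made-up numbers yield hundreds of millions of variants from one paragraph, which is why paragraph-level counting, not a flat article ‘uniqueness’ percentage, is the meaningful measure.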

Some Final Conclusions

One very worrying effect of this update is that it makes negative SEO even easier. Since Google seems to be working to narrower tolerances on anchor-text profiles, it becomes even easier for some unethical person to step in and blast your site with main-term anchors from content-less sources, pushing your percentages into the red! This is one of the main areas I hope Google will address very soon.

But the most glaring effect of this entire update has been yet another reminder of why everyone needs to diversify their sales funnels to protect their income. It has never been more important to build and maintain volumes of micro-sites, 2.0’s and other authority buffer-pages. By channelling the majority of your link-building through these, you massively reduce your chances of any negative feedback, reduce direct-linking (which is where all penalties come from) and protect yourself by virtue of sheer scale.

I, like many people, am hoping that Google will reverse many aspects of the Penguin update, or at least make some pretty major tweaks, but I’m not holding my breath. I personally feel that Penguin has trashed the search results. Not from the perspective of an SEO or a marketer, as I’m NEVER happy about their changes on that front. But as a user, I’ve finally and voluntarily moved over to Bing; not because I want to make a ‘statement’, but because I’m actually getting much better personal search results from it. They’re always relevant and provide a good, diverse range of options and domains.

Over time, I think we’re going to see this more and more. Google have finally started to break the things that really matter; and that’s the relevance and basic ‘usefulness’ of the search results. Users will vote with their searches and start to try other search engines they hear recommended. In the same way that countless other large corporations have in the past, Google have forgotten what their customers want and need, become too big for their boots, and started to think that nothing they do will harm them. Time will tell!

Good luck with your SEO…
