
These are MI5 products. They are both produced using “secret recipes”. Employees involved in their manufacture probably sign a commercial equivalent of the Official Secrets Act with their contract and are sworn never to divulge the ingredients and techniques used in the making of these products.
And now, of course, we have a 21st century online equivalent of these culinary mysteries: the search algorithms used by Google to sift and rank the world’s websites.
Seas of SEO consultants claim to be able to “beat Google” and increase your site’s page rank. They too have their own SEO recipes: buy more back links, get listed in big web directories, optimise your HTML, don’t use java or pictures and remember that meta tags are passé. For some, ‘beating’ Google is an interesting intellectual puzzle. For others, it’s a chance to fleece the unwary. So what is Google’s secret recipe and is it really possible for an SEO consultant to beat it?
My guess is that Google runs what is, in effect, a very complex credit scoring model and applies it to the websites of the world just as a retailer applies credit scoring to prospective card account customers. Think back to when you last applied for a credit or store card. You were asked where you lived, your date of birth, your income, where you worked, whether you owned your home and, if so, for how many years, whether or not you had other credit cards or other debts and so on. This process is about collecting statistical data about you to predict how you are likely to behave as a customer. The store or the credit scoring company doesn’t know you as a person, so it has to collect information that is representative of you, your financial behaviour and, ultimately, your financial trustworthiness.
But collecting this data is just the start of a complex process (you can do a PhD in Credit Scoring in the School of Computer Sciences at Edinburgh University). To start with, not all your answers are treated as equal. For example, where you live may be three times as important as your age in determining your creditworthiness. And your income may be twice as important as where you live. All your answers are individually weighted and then aggregated to give an overall score for your credit application. So if you fail against an unimportant factor, your score may not be significantly reduced. But if you fail against a highly weighted factor, that could fail you regardless of all your other answers.
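To make the weighting idea concrete, here is a minimal sketch in Python. It is purely illustrative: the factor names, the 1 : 3 : 6 weights (taken from the “three times” and “twice” ratios above), the 0-to-1 answer scale and the rule that failing income outright sinks the whole application are all my own assumptions, not anything a real lender or Google has published.

```python
# Hypothetical weights: location counts 3x age, income counts 2x location.
WEIGHTS = {"age": 1.0, "location": 3.0, "income": 6.0}

# Assumption: failing one of these factors outright fails the application.
KNOCKOUT_FACTORS = {"income"}


def score_application(answers, weights=WEIGHTS, knockout=KNOCKOUT_FACTORS):
    """Return a score between 0.0 and 1.0.

    `answers` maps each factor name to a value from 0.0 (fail) to 1.0 (pass).
    """
    # A failed knockout factor overrides every other answer.
    for factor in knockout:
        if answers[factor] == 0.0:
            return 0.0

    # Otherwise, take the weighted average of all the answers.
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * answers[name] for name in weights)
    return weighted_sum / total_weight


# Failing a lightly weighted factor barely dents the score...
print(score_application({"age": 0.0, "location": 1.0, "income": 1.0}))  # 0.9
# ...but failing a heavily weighted one sinks the application.
print(score_application({"age": 1.0, "location": 1.0, "income": 0.0}))  # 0.0
```

A real scorecard would use many more factors and would derive its weights statistically from past customer behaviour rather than picking them by hand.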
There are other factors too. Companies must continually adjust the way they score applicants because the nature of the applicant market ‘pool’ may change, or because the firm may have changing volume targets to deliver from within a fixed pool - necessitating a change in the way they look at those applicants within the pool. Statistical rules may need to change.
New techniques are being developed to reflect the dynamic nature of the task. Models are now capable of “meta learning”, which means that they can run multiple learning applications and then combine the most important lessons from the individual sub-models. This is where we get into the territory of machine learning, soft margins, non-linear regression and hyperplanes. These models are far more complicated than the recipes for certain spicy foods or fizzy drinks.
If you’re a webmaster, then it’s highly likely that your site is being scrutinised by Google using these sorts of data mining techniques. If you come across a search consultant who claims to have discovered how to “beat Google” and “improve your page rank dramatically”, then bear in mind that, for the claim to be true, they would have had to crack the sophisticated statistical model that Google uses to prioritise pages.
So is Google really the equivalent of the HP or Worcestershire sauce of the 21st century? Well, the answer is both yes and no. Yes, it’s certainly secret, but no, it’s far more complicated than the process of manufacturing a spicy sauce.
Update 22 August 2007:
Under the term “Digital Media+Web 2.0” Teqtonic.com is ranking 11 out of 49,000,000 returns on Google. Unfortunately, that keeps us at the top of page 2, which is not ideal. But we’re working on those last 11 places….