SEO and Talk: The Non-Story

In the past few weeks, there have been a number of stories presaging a shift in how Google and other search engines rank content.

The contention is simple enough: Current search engine technology is limited because people game the system. Lately, people have been gaming the system a bit too successfully; observers have documented the unreasonable success of both Demand Media (recent IPO) and The Huffington Post (just sold to AOL).

“Gaming the system” is, in the industry, called “Search Engine Optimization.” And, due to the recent scrutiny, a number of people are taking the opportunity to forecast the death of search engine optimization.

Farhad Manjoo makes the argument in Slate that content farms are doomed to an ignoble death. According to him, the problem is that “Google’s weaknesses aren’t permanent.”

There are a few problems with this forecast. To illustrate them, let me back up a bit and discuss search engines and content farms.

Originally, a search engine existed to answer the question “Which pages on the internet have to do with my query?” But, as their power and success have grown (and by “their” I mean “Google’s”), so has the scope of their ambition.

Now, a search engine exists to provide a (satisfactory) answer to a searcher.

The history of search engines means that this has taken the form of links (10 to a page!) and, occasionally, in-line answers (Bing’s “Instant Answers” and the like). The current form of search engine results as a set of blue links casts the search engine strictly as an intermediary between the searcher and the answer.

The history of search engines also led to the rise of content farms.

From an objective perspective, content farms fulfill the same goals as search engines. That is, a content farm aims to have a page ready for every question an individual may have. Whether it’s How to Train My Hamster, How to Kill Flies in Your House with Mushrooms, or How to Set Up a Super Bowl Party, eHow has a page ready.

How do they know what pages people want? Well, it’s pretty simple: they mine search engine data. (Well, plus some guesswork and interpolation).

It’s a pretty ingenious idea: people are looking for things on Google, so why not create the very things they’re looking for? You already know what they’re looking for, since they’re already searching.

Really, it’s a wonder search engines didn’t think of the idea themselves (Oh, wait, they did).

Now, due to revenue concerns, most of the content on content farms is crap. You pay writers very little, and then rake in the advertising revenue. As advertising revenue scales with how much traffic you get, and traffic comes almost exclusively from search engines, content farms sink a huge amount of effort into (i) finding what search engines consider important, and (ii) doing more of it.

This is problematic for a few reasons, all of which can be summed up by: “Google isn’t perfect.”

More specifically, search engines like Google measure proxies of quality and use them as indicators of quality. This is expressed most clearly in Google’s big breakthrough – PageRank. PageRank assumes that (i) humans link to web pages, and (ii) on average, a link is a vote: someone considers the page valuable, or no one would be linking to it. Certainly, Google uses other variables (Bing uses over a thousand, so we can infer that Google uses at least that many. Sort of.). But (almost) all of them are proxies for value – proxies for the reality of the situation.
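
To make the “a link is a vote” idea concrete, here is a minimal sketch of the textbook PageRank iteration on a toy three-page graph. The graph, damping factor, and iteration count are illustrative assumptions; Google’s production system is far more elaborate.

```python
# Minimal PageRank sketch: each link is a "vote," and votes from pages that
# are themselves well-linked count for more. Toy graph for illustration only.

links = {                      # page -> pages it links to (hypothetical)
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

damping = 0.85                 # standard damping factor from the original paper
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}   # start from a uniform distribution

for _ in range(50):            # iterate until the scores settle
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = rank[page] / len(outgoing)    # a page splits its vote evenly
        for target in outgoing:
            new_rank[target] += damping * share
    rank = new_rank

print(rank)  # pages with more (and better-ranked) incoming links score higher
```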

When content farms engage in search engine optimization, they are “tricking” Google into thinking that their content is higher quality than it actually is.

To an extent, this is purely a trick. Google must respond – and has in the past – by devaluing signals that people begin to manipulate consciously.

But in another sense, this is good: some of the things Google tracks actually do have to do with quality. The announcement last year that Google would begin using page load speed as a ranking factor is an example – sites that load faster are more pleasant to use, so sites that work toward a higher Google ranking will also benefit their users.

In that sense, search engine optimization is similar to grammar. Yes, a writer with poor grammar can still make himself understood – but it’s considerably easier for a reader to interpret someone with a grasp of good grammar.

This explanation of search engines and content farms suggests a couple of things.

First, that search engine optimization isn’t going anywhere (nor has grammar, to the dismay of many).

Second, that as Google becomes better at providing links to high quality answers, content farms will provide higher-quality answers.

It’s a terribly symbiotic relationship. Since content farms subsist on advertising revenue, the more accurately Google (and others) can reflect user judgment, the higher the quality of the content that content farms will produce.

Unless both run out of money, which is rather unlikely – after all, it’s Google (AdSense) that’s paying the bills for the content farms, and it’s Google AdWords which is paying the bills for search.

Deliberate Mistakes: How Science is Invading Business

In 2006, HBR Ideacast (in its fourth podcast) interviewed HBR Senior Editor Gardiner Morse on an article he’d been working on – concerning so-called “deliberate mistakes.”

I found the podcast very interesting – primarily because the idea it explored was something I’d covered extensively in my major.

Let me back up: Mr. Morse explains how, over the past couple of decades, businesses have embraced “experiments” to test whether things will work. However, simply running experiments isn’t enough – a business also has to make “deliberate mistakes,” which reduces to running experiments you believe will fail. The problem with only running experiments that confirm your assumptions is that you become trapped by those assumptions – and you may end up “assuming” things that can cost your company millions of dollars.

AT&T

The example that Paul Schoemaker and Robert Gunther begin their article with is AT&T:

Before the breakup of AT&T’s Bell System, U.S. telephone companies were required to offer service to every household in their regions, no matter how creditworthy. Throughout the United States, there were about 12 million new subscribers each year, with bad debts exceeding $450 million annually. To protect themselves against this credit risk and against equipment theft and abuse by customers, the companies were permitted by law to demand a security deposit from a small percentage of subscribers. Each Bell operating company developed its own complex statistical model for figuring out which customers posed the greatest risk and should therefore be charged a deposit. But the companies never really knew whether the models were right. They decided that the way to test them was to make a deliberate, multimillion-dollar mistake.

For almost a year, the companies asked for no deposit from nearly 100,000 new customers who were randomly selected from among those considered high risks. […] To the companies’ surprise, many of the presumed bad customers paid their bills fully and on time and did not steal or damage the phones. Armed with these new insights, Bell Labs helped the operating companies recalibrate their credit scoring models and institute a much smarter screening strategy, which added, on average, $137 million to the Bell System’s bottom line every year for the next decade. (emphasis added)

Continue article here.

Now, before you ask specifics – why couldn’t AT&T retroactively identify these people; why wasn’t this data used in the creation of the formula (as opposed to credit score, income, neighborhood, etc., which Mr. Morse implied they used); why did they have to exclude people from the deposit requirement to learn this? – the podcast didn’t go into that. Being the generous guy I am, I’m going to assume a “displacement of responsibility” effect – that is, charging the deposit eliminated the social obligation not to damage the equipment, so charging the fee actually increased the amount of damaged equipment. It’s quite plausible, and fits the narrative better.
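
Mechanically, what the Bell companies did amounts to a randomized holdout experiment: waive the deposit for a random sample of “high-risk” customers, then compare what actually happens with what the credit model predicted. Here is a hypothetical sketch of that comparison – the default rates, the sample size, and the idea of a single flat “risk rate” are assumptions for illustration, not figures from the article.

```python
import random

random.seed(42)

# Hypothetical numbers: the credit model predicts 30% of flagged customers
# will default, but suppose reality is closer to 12%.
PREDICTED_DEFAULT_RATE = 0.30   # assumption for illustration
TRUE_DEFAULT_RATE = 0.12        # assumption for illustration

flagged_customers = range(100_000)   # customers the model calls "high risk"

# The "deliberate mistake": randomly waive the deposit for a sample of them.
waived = random.sample(flagged_customers, 10_000)

# Observe what actually happens in the waived group (simulated here).
defaults = sum(1 for _ in waived if random.random() < TRUE_DEFAULT_RATE)
observed_rate = defaults / len(waived)

print(f"model predicted default rate:   {PREDICTED_DEFAULT_RATE:.0%}")
print(f"observed rate in holdout group: {observed_rate:.1%}")

# If the observed rate comes in well below the prediction, the model has been
# demanding deposits from customers who would have paid anyway -- recalibrate.
```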

Enter Science

The state of experimentation in business is baldly stated:

Many managers recognize the value of experimentation, but they usually design experiments to confirm their initial assumptions.

This is where science was in the 1920s.

In the 1920s, the Vienna Circle was in full swing – refining and popularizing the philosophical doctrine of Logical Positivism, which quickly permeated science. The fundamental tenet of logical positivism is that everything can be derived from empirical data (e.g. experiments) and logical inference. It’s a rejection of both theology and metaphysics – “postulating” (or assuming) that reality works in a certain fashion without any direct evidence.

In the 1930s, Karl Popper popularized falsification. Falsification is a logical correction – it points out that the “problem of induction,” popularized by David Hume in 1748, means that it’s impossible to arrive at “true” knowledge by induction (going from evidence to theory). This is the “all swans are white” fallacy that Nassim Nicholas Taleb used to great effect in his book The Black Swan: no matter how many white swans you see, it doesn’t mean a black swan doesn’t exist. The logically valid way to proceed is to create a theory, then derive a hypothesis, then test that hypothesis against the evidence. You only need one disconfirming example (e.g. a single black swan) to disprove the hypothesis; so by falsifying hypotheses you can run a “process of elimination” over hypotheses and theories.

Now, it turns out that (philosophically) there are a few problems with that way of proceeding, one of which is known as the Duhem-Quine thesis (Quine has a nice, if overstated, explanation in Two Dogmas of Empiricism – he later partially retracted his conclusion). In short, it states that a hypothesis cannot be isolated and tested individually – rather, you are testing a bundle of interconnected theses, which makes it very difficult to falsify any single hypothesis. Another problem is underdetermination, which claims that there are n possible (contradicting) theories for any finite amount of evidence, where n > 1 (and usually very large). Thus, (practically speaking) a company can have any amount of evidence and still face multiple contradicting theories – making it very difficult to choose which action to take where the theories contradict.

(On the bright side, the theories are going to overlap a lot, too – they need to agree on the empirical evidence, after all – so the more evidence you have, the better off you’ll be. It’s just not certain).

Practical Reasons

However, these problems are of little practical significance in business, where the goal is not to be “right” but to be “more correct than the next guy” – your competitors. Business is a relative thing, so relative increases in truth have real business value.

In fact, there is a more compelling rationale to use falsification in business than there is in science.

“Savvy executives” – to borrow HBR’s favorite phrase – are considerably more vulnerable than scientists to the cognitive “traps” that psychologists have identified. Why? Because whereas the pressure on scientists is to produce truth, the pressure on executives is to make it work. And boy oh boy, is the list of cognitive biases a mile long. The most significant cognitive errors for executives are:

  1. Confirmation Bias: When someone considers a hypothesis or opinion, they reach back in their memory for instances that confirm the hypothesis – and suppress recollection of instances which disprove the hypothesis.
  2. Regression to the Mean: People tend to ignore probabilities and focus on the most recent event. In probability, an exceptional event – e.g. very good (or very bad) returns – is likely to be followed by a more ordinary event. The classic example is the flight instructor who swore that negative feedback works (it doesn’t) – his explanation was that every time a pilot did exceptionally badly and he yelled at them, they did better the next time. This was true – but after an exceptionally bad landing, a pilot is likely to regress to the mean (of his capabilities) and therefore do better regardless of the yelling. This also applies to, e.g., stock trading and revenue/sales performance (this is part of the reason why it’s very bad to base firing/compensation decisions on just the last year of results); see the simulation sketch after this list.
  3. Hindsight Bias: The tendency to look at a previous event and believe that you saw it coming – when you didn’t – which gives you unwarranted confidence in predicting future events.
  4. Overconfidence effect: For many questions, answers that people rate as being “99% certain” are wrong 40% of the time.
  5. Halo Effect: Judgment about one attribute spills over to other attributes… e.g. Google is doing really well, therefore everything they do is assumed to be really good as well; or, people who are more attractive are assumed to be better at their work.
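
To see how regression to the mean produces the flight instructor’s illusion, here is a small simulation under assumed numbers: each pilot’s skill is constant, every landing is skill plus random luck, and no feedback of any kind is given – yet the landings that follow exceptionally bad ones still look like an “improvement.”

```python
import random

random.seed(0)

# Assumed numbers: a pilot has fixed skill; each landing is skill plus luck.
SKILL = 5.0   # constant ability (arbitrary units)
NOISE = 2.0   # day-to-day randomness

landings = [random.gauss(SKILL, NOISE) for _ in range(10_000)]

# Look at the landing immediately following each exceptionally bad one.
threshold = SKILL - 2 * NOISE
followups = [landings[i + 1] for i in range(len(landings) - 1)
             if landings[i] < threshold]

print(f"average landing score:             {sum(landings) / len(landings):.2f}")
print(f"average score after a bad landing: {sum(followups) / len(followups):.2f}")
# Both averages come out near 5.0: performance after a terrible landing bounces
# back toward the mean on its own, which looks exactly like "the yelling worked."
```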

There are – quite literally – hundreds more; but let me explain what these combine to in terms of falsification.

Obviously, confirmation bias, hindsight bias, and the overconfidence effect combine to make people very bad at predicting what’s going to happen next. Together, they mean that (i) people believe they’re good at understanding or forecasting the situation, and (ii) only remember information which conforms to their opinions.

The halo effect means that people will routinely focus on the wrong things – that is, they will take an indication of one thing as an indication of another, unrelated, thing. AT&T is an example – they assumed that credit risk indicated bad debt and equipment theft/damage – when it did not.

And “regression to the mean” means that people are likely to base their predictions on exceptions as opposed to the underlying reality – essentially, to overestimate the amount of control they can exert on the situation. (Incidentally, the “fundamental attribution error” – the tendency to ascribe people’s performance to their nature as opposed to their situation – is not unrelated).

The combination of (i) focusing on the wrong thing, (ii) being overconfident of their understanding, (iii) not questioning their opinions, and (iv) overestimating the amount of control they can exert is not a good combination.

The practice of falsifying hypotheses is good in two ways: first, it’s humbling to realize when you’re wrong. Second, it provides basic ammunition to hobble people’s initial false conclusions.

The twin benefits of providing both discipline and improved performance are, well, quite compelling.

Inheriting Methodology

The interesting thing is that business seems, slowly, to be picking up the methodology of science.

In some ways, business has had scientific ideas introduced backwards. Consider the idea of a “paradigm shift,” which entered the business lexicon in the 1990s (and was promptly over-used); Kuhn introduced paradigm shifts in science in 1962, nearly thirty years after Karl Popper spoke of falsification. Experimentation has gained considerable traction in the past decade, closely followed by “deliberate mistakes” – just as Popperian falsification followed the Vienna Circle.

The advance of scientific methodology is not limited to philosophical ideas – in 2009, for example, IBM acquired SPSS (Statistical Package for the Social Sciences) and is now selling it as a business intelligence tool. Wall Street is famous for hiring statistics PhDs to mine stock data; Google has become well-known for hiring newly-minted PhDs and giving them room to find the best solution to a problem (as opposed to a “sufficient” solution – or the first one that works).

As business faces increasing competition, making sure you’re doing the right thing matters more and more – room for mistakes, or acting sub-optimally, is quickly disappearing in the modern, distributed, global marketplace. It will be interesting to see how business continues to adopt scientific methodologies to try and reduce the possibility of error (particularly recurrent, expensive error) in the future.

Doom, Gloom, and Education

A college degree is just a piece of paper to me. I worked hard and spent a lot of money on mine. But, it’s not what got me a job.

I find the above quote, from Rachel Esterline, horrifying.

The conclusion that a college degree does not get you a job raises the question of what, precisely, a college degree does get you. There are rosy platitudes about “learning,” or “experiencing life” (really, just living out from under the ready hand of one’s parents for the first time). There are also depressing prognostications claiming that the purpose of college is to indebt Americans so that they are forced to enter the business world to pay off their debt. Indentured servants.

But let’s ignore that for now.

In order to be a marketable job candidate once you graduate, I believe you need to spend hours outside of the classroom to develop your real-world skills

College does not prepare you to get a job; well, then, what does? The author lists a number of things; they reduce to internships (experience), networking, attitude, and skills.

You don’t necessarily need a degree to get a job. You need skills.

Yes, skills. What are skills? “How to use Twitter” or “How to create a PivotTable in Excel?” Or are skills more along the lines of “How to write persuasively,” or “How to think clearly?”

It’s certainly difficult to say, and depends to no little extent on the job you’re pursuing. If you’re trying to get a job as a programmer, experience programming helps – but so does a solid knowledge of theoretical computer science, which you learn in college. And I imagine it’s quite necessary to have an Engineering degree to get a job as an Engineer.

A degree can act as a certification that you have certain skills. You can’t graduate from an Engineering program without a certain minimum baseline of both skills and knowledge.

But let’s step back and look at college a bit more. Broadly speaking, there are two types of college education: the liberal arts, and vocational studies. Vocational studies include Engineering and Welding, but I’d also throw pre-professional programs like Social Work, Art, Theatre, and Business in there.

Liberal arts programs attempt to expose students to a broad array of concepts, methodologies, and techniques to build clear thinking and writing skills. This is necessarily ambiguous, and the results are rather difficult to quantify. As such, its benefits are questionable (and are frequently questioned).

Vocational programs attempt to distill the essential knowledge of a field, and provide that to a student. A Business major entering the workforce will know quite a bit more about how business works than someone lacking that education; he/she will understand marketing, finance, operations, business law, strategy, etc. Certainly, their knowledge will be broad (no one’s going to hire a new grad to spearhead the development of a business strategy) but it will be a good overview of the field of business. That is, they’re unlikely to be tripped up by not understanding a major area of business – such as possible legal impediments when introducing a new product, or exporting a product overseas.

Naturally, we can expect liberal arts students to have difficulty quantifying what they learned, and how that’s beneficial to business (though you’d hope that the nature of the education would give them a clue of where to begin). However, it’s more troubling for students in vocational and pre-professional programs, who one would expect to feel confident in their understanding, and in their ability to contribute to the bottom line.

There’s a possible explanation. College is taught by academics, who are trying to make more academics. To the extent that academia is separate from the concerns of business, people taught by academics won’t understand business concerns. For instance, if Professors in the Business department are interested in understanding how business works, they may neglect teaching the skills to be successful in favor of the understanding of what makes something successful. So a student could be quite good at evaluating whether or not something is going to work, but quite bad at making it work in the first place.

College could teach general theory without teaching applicable skills. The idea, of course, is that the necessary skills change frequently, and there’s a huge selection of skills that people employ; attempting to cover them all has a very poor return on investment. However, the theory of how something works gives students the ability to acquire the skills faster than they would otherwise be able to. A college degree may not certify that you have the skills required – but it could certify that you’re eminently capable of learning the skills in short order.

It would thus make sense that new graduates, upon entering the workforce, feel that nothing they learned is directly applicable to what they’re doing. It’s probably quite galling to spend four years of your life studying something, and then come out the other end faced with the task of learning a bunch more; this goes especially for vocational students, who may expect their education to deal directly with their jobs.

Does College Get You a Job?

The problem, of course, is that the author does not credit college with getting her a job – at all. This is obviously false on one level (the unemployment rate for college graduates is less than half the overall unemployment rate), but that’s not sufficient to dismiss her claims.

The author could mean a few things. First, she could be claiming that college does not improve your chances of getting a job – rather, that the “real-world” skills you build during the time you’re coincidentally at college get you a job. It’s not sufficient to point to the unemployment rate of college graduates to dismiss this; it’s possible that people who are more skilled (or more likely to acquire skills) go to college, and that college itself adds nothing.

Second, she could be claiming that skills improve your chances of getting a job. This is non-controversial: the more skills you have, the more ways people can use you, the more companies can consider hiring you, the more likely you are to be hired. It’s a simple numbers game. And, in a competitive economy, having additional skills can put you ahead of your classmates.

Third, she could be claiming that a college degree is necessary but not sufficient to get a job. That is: companies will not hire you without a college degree, or without a college degree and skills. You need both a degree and skills to get a job (while this possibility is rather undermined by her claim that “you don’t necessarily need a degree to get a job,” different jobs may have different standards; so no one will hire you without skills, some will hire you with skills, and more will hire you with a degree and skills). This option seems to impose a considerable amount of structure on the labor market – more, perhaps, than is warranted. Relaxing the structure reduces this option to a variation of the second, where both skills and degrees factor into your desirability (and/or flexibility) in the labor market.

Still, the first option is, in many ways, the most disturbing. The claim that what you learn in the classroom is unrelated to getting hired means that the benefit of attending college is learning the other skills: the “real-world” ones outside the classroom. Such a claim raises the question of why one would attend college in the first place; there are surely cheaper ways to obtain those same “real-world” skills without paying tuition, which can run as high as $30k/year. Given how much time students spend attending and studying for classes, it’d probably be a lot faster as well.

Let’s assume this is true. Can we reconcile this depiction of college with my musings above? Well, one option is to shift the playing field by claiming that a college education pays dividends later in life. Perhaps, after you’ve had a few jobs, your education allows you to make fewer mistakes or somesuch. It’s not unreasonable to suppose that an entry-level position requires less than the four years of training you received at college; it may require something else altogether.

However, that’s dissatisfying. It simply shifts the value of college to later in life – and if that were true, we would expect to see more people attending college after they’ve worked for a few years. Instead, most people in college come directly from high school.

I prefer to believe that there is some additional value in college that the author is not considering. The question is how to define it, and where we can expect it to manifest. Unfortunately, that’s not something I’m well-equipped to answer…