Saturday, January 3, 2009

Who Writes Wikipedia?

Ever wonder who writes the entries in Wikipedia? "Conventional wisdom," coming mainly from the speeches of Jimmy Wales, the founder of Wikipedia, says that about 1,400 people, aka 1,400 obsessed freaks, make over 73% of the edits.

Aaron Swartz, who blogs at Raw Thought, decided to look into those statistics. He knew that Wikipedia keeps a complete history of every change ever made to every article, as well as who made the change, and that this history is available to the public. Some changes are made anonymously by people who never log in, but those are not the 1,400 mentioned by Jimmy Wales. Wales says that they all know one another and he knows them all. Swartz found just the opposite.

"Curious and skeptical, I decided to investigate. I picked an article at random ("Alan Alda") to see how it was written. Today the Alan Alda page is a pretty standard Wikipedia page: it has a couple photos, several pages of facts and background, and a handful of links. But when it was first created, it was just two sentences: "Alan Alda is a male actor most famous for his role of Hawkeye Pierce in the television series MASH. Or recent work, he plays sensitive male characters in drama movies." How did it get from there to here?

"Edit by edit, I watched the page evolve. The changes I saw largely fell into three groups. A tiny handful -- probably around 5 out of nearly 400 -- were "vandalism": confused or malicious people adding things that simply didn't fit, followed by someone undoing their change. The vast majority, by far, were small changes: people fixing typos, formatting, links, categories, and so on, making the article a little nicer but not adding much in the way of substance. Finally, a much smaller amount were genuine additions: a couple sentences or even paragraphs of new information added to the page.

"Wales seems to think that the vast majority of users are just doing the first two (vandalizing or contributing small fixes) while the core group of Wikipedians writes the actual bulk of the article. But that's not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made less than 50 edits (typically around 10,) usually on related pages. Most never even bothered to create an account."

What Swartz discovered is not really surprising. It would be impossible for 1,400 people, no matter how brilliant, to write 73% of the roughly 2.5 million articles currently in Wikipedia. The truth is that over a thousand people working virtually full time are required simply to edit the articles as they increase and change. Editors, book salesmen, and the price of paper are what made the Encyclopedia Britannica so expensive.

So what does this say about Google's competing "encyclopedia" called Knol? Knol is supposed to be written by experts who are paid from advertising revenue, and only the author is allowed to make changes to their own articles. That way, Google argues, the material can be trusted and can't be vandalized. But the Knol scheme is fatally flawed, as Henry Blodget points out in "Oops, Google's Knol Won't Be Killing Wikipedia After All." The "experts" are not exactly renowned and nobody seems to be making the easiest possible checks for plagiarism. For example, take almost any complete sentence from the Swartz quotation above and stick it into Google. You will immediately find a reference to the source of that sentence or at least a reference with a very low "Kevin Bacon number." And, as Blodget explains, that's the least of the problems with Knol.

Hmm, a Kevin Bacon number for a link is amusing. Stephen Dolan, who appears to be a mathematician, has blogged on that concept. In his article "Six Degrees of Wikipedia," he discusses the links between articles in Wikipedia using graph theory. He looked for the "departure center" of Wikipedia, defined as the Wikipedia article from which it is possible to link to the most articles with the fewest clicks. Not all articles are referenced anywhere else, but he found that, excluding articles that are just lists, years or days of the year, the "real article" closest to the center is United Kingdom. Of the 2,301,486 articles existing on 3 March 2008, 2,111,479 were reachable from some other article. From United Kingdom, you could reach them all in an average of 3.67 clicks. Next came Billie Jean King, oddly enough, at 3.68 clicks, followed by United States at 3.69 clicks. As an aside, he points out that it takes an average of 3.98 clicks to get from Kevin Bacon to anywhere else.
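
For the curious, the "average clicks" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Dolan's actual code: the tiny link table and the function names (average_clicks, find_center) are made up for the example, and I rank pages by how many articles they can reach first, then by the lowest average. A breadth-first search from a page gives the fewest clicks needed to reach every page it links to, directly or indirectly.

```python
from collections import deque

# Hypothetical toy link table: each page maps to the pages it links to.
TOY_LINKS = {
    "Hub": ["A", "B", "C"],
    "A": ["Hub", "D"],
    "B": ["Hub"],
    "C": ["Hub"],
    "D": ["A"],
}


def average_clicks(links, start):
    """Return (mean fewest clicks, number of pages reached) from start."""
    dist = {start: 0}
    queue = deque([start])
    # Breadth-first search: the first time a page is seen is by a shortest path.
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    reached = len(dist) - 1  # do not count the start page itself
    if reached == 0:
        return float("inf"), 0
    return sum(dist.values()) / reached, reached


def find_center(links):
    """Pick the page that reaches the most pages in the fewest average clicks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    scored = {page: average_clicks(links, page) for page in pages}
    # Assumed ranking: most pages reached, then lowest mean, then name.
    return min(scored, key=lambda p: (-scored[p][1], scored[p][0], p))


if __name__ == "__main__":
    center = find_center(TOY_LINKS)
    print(center, average_clicks(TOY_LINKS, center))
```

On this toy table, find_center(TOY_LINKS) picks "Hub", which reaches the other four pages in an average of 1.25 clicks. Running the same search from every one of two million articles is far more expensive, so a real analysis would presumably need something smarter than this brute-force loop.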

That ought to fill our trivia quota for today.
