
Wikipedia's Aaron Halfaker on anonymous editors and sneaky algorithms

“We were trying to figure out why the population of Wikipedia was declining. What we ended up learning was that it was these quality control tools that were maybe the worst of it,” Aaron Halfaker, Senior Research Scientist at Wikimedia, tells me over the phone from the US.

Wikipedia, the free online encyclopaedia, has been in trouble for a while. Owned by the non-profit Wikimedia Foundation and run by volunteers, the site has seen its growth slow steadily over the last few years. In its heyday in 2006, more than 50,000 new articles were added every month; now the figure is just over 20,000 a month. So what’s happened? Some say Wikipedia’s inability to make newcomers feel welcome, clashing editorial styles, and a complex infrastructure are all to blame for this decline.

Aaron Halfaker has been intently studying the reasons for this decline and believes that Wikipedia’s quality control tools have been at the heart of it.

“These tools have template messages that they send to people. It turns out that they would send the exact same template message to someone that is obviously vandalising Wikipedia because they want to hurt the project, as they will to somebody who is only trying to make a positive contribution,” explains Halfaker.

Wikipedia is now deploying a machine learning model designed to detect damaging edits whilst keeping new editors happy.

“By doing this damage detection you could take 100 edits that came into Wikipedia and filter that down to just 10 edits that people will have to review. So that means you are essentially removing 90% of the work in doing quality control in Wikipedia so this machine learning model is really important.”
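To make that arithmetic concrete, here is a minimal sketch (in Python, using scikit-learn) of how a damage classifier could shrink the review queue in the way Halfaker describes. The classifier choice, feature extraction and threshold are assumptions for illustration; this is not Wikimedia’s actual implementation.

```python
# Illustrative sketch only: a hypothetical damage-detection filter in the
# spirit Halfaker describes, not Wikimedia's production code. The feature
# set, threshold and model choice are assumptions.
from sklearn.ensemble import GradientBoostingClassifier

REVIEW_THRESHOLD = 0.5  # assumed cut-off; tuning it trades recall for reviewer workload


def train_damage_model(feature_rows, labels):
    """Fit a classifier on per-edit features (e.g. characters added, is_anonymous)."""
    model = GradientBoostingClassifier()
    model.fit(feature_rows, labels)
    return model


def edits_needing_review(model, incoming_edits, featurize):
    """Keep only edits whose predicted damage probability crosses the threshold.

    If roughly 10 of every 100 incoming edits score above the threshold,
    human patrollers review only that 10% instead of the full stream.
    """
    flagged = []
    for edit in incoming_edits:
        p_damaging = model.predict_proba([featurize(edit)])[0][1]
        if p_damaging >= REVIEW_THRESHOLD:
            flagged.append((edit, p_damaging))
    # Highest-risk edits first, so patrollers see the most likely damage sooner.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```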

Halfaker admits that the machine learning algorithms are not “breaking new ground”, but setting them up has been extremely challenging: it took him around “20,000 lines of code”, drawing on his advanced degree in computer science. Still, he says Wikipedia needs to innovate, and the hope is that the algorithms will minimise vandalism while providing a “reasonable newcomer experience at Wikipedia”.

Halfaker is really keen to improve Wikipedia’s quality control system and he is hoping to encourage tool developers to come forward with suggestions on how to make improvements and help alleviate some of the “newcomer issues”.

Newcomers and Wikipedian culture

When Wikipedia was first launched by Jimmy Wales and Larry Sanger in 2001, it was a surprise to both of them how quickly people took to the idea of an easily editable encyclopaedia. As the site and its content grew, Wikipedians developed a set of workflows and guidelines to maintain it. But as reported by MIT Technology Review, around 2006 “the established editors began to feel control of the site slipping from their grasp”.

As the number of contributors grew, so did vandalism, and the site soon ran into big trouble. Wikipedia’s most active contributors tried to resolve this by introducing more sophisticated software designed to detect vandalism and bad edits – and it worked. But it also deterred newcomers: the strict tools made it ever more likely that their edits would be deleted, and, coupled with tales of elitism and bureaucracy, left new and established editors struggling to co-exist peacefully.

What does Halfaker make of all this?

“I suspect that if you are unusual and if you are not the type of person who is normal within Wikipedia then you are more likely to be spotted through this newcomer phase where people are scrutinising your work very closely. Because our quality control system is so aggressive we are more likely to push you away if you are not part of the typical Wikipedian culture,” Halfaker admits.

But Halfaker believes this type of social issue is inevitable and shows up in most workplaces – much like the ‘boys club’ in technology right now.

“Regardless of how Wikipedians actually operate and who got there first, this is what social systems do. They sort of rally around the first people who were there, who designed the systems that worked for them and the people who show up afterwards then have to deal with the system that wasn’t designed for them.”

Halfaker says that social change cannot happen without technological innovation.

“Wikipedia is essentially a machine, where some gears are people and some are digital technologies. So if the digital technologies don’t spin right, then the people can’t spin right.”

Anonymous editors and accountability

One of the quirks of Wikipedia is the ability to edit the site anonymously, without logging in. There has been much debate over whether anonymous editors should be allowed to edit articles. For some Wikipedians it’s a no-brainer – requiring a registered account would help minimise vandalism on the site. But others believe forcing editors to register would only harm Wikipedia and deter newcomers even more. Furthermore, many Wikipedians believe that a registration requirement goes against Wikipedia’s core philosophy.

Wouldn’t it just be easier to make it a rule for everyone to register?

“I was just doing analysis of this over the last couple of days. Anonymous editors, people who have not registered an account, add about 25% of the new content in Wikipedia, so they are really substantial. There’s a lot of productive stuff that comes from people who, for whatever reason, prefer not to edit while logged in,” Halfaker tells me.

But Halfaker says there are plenty of good arguments for why people shouldn’t be forced to register an account. Not “everyone wants to have a social identity on Wikipedia”, he says, or to have conversations and debates on “what should appear in articles and what shouldn’t”. Some just want to “fix grammar” and leave it at that.

“We've actually run experiments where we have changed the user interface so that it implies very strongly that you do need to register an account in order to edit. In those controlled experiments we saw around a 30% drop in productivity. So just from a practical point of view, Wikipedia would develop far slower if we were to force people to register,” Halfaker adds.
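For illustration only, the 30% figure is the relative drop in productive edits between the experimental interface and the control group. A toy calculation of that ratio, with invented numbers, might look like this:

```python
# Hypothetical illustration of how the productivity drop in such a controlled
# experiment might be computed; the data and field names are invented.

def relative_productivity_drop(control_edits, treatment_edits):
    """Return the fractional drop in edits for the treatment group
    (the UI implying registration is required) relative to control."""
    control_rate = sum(control_edits) / len(control_edits)
    treatment_rate = sum(treatment_edits) / len(treatment_edits)
    return (control_rate - treatment_rate) / control_rate

# e.g. with invented numbers: relative_productivity_drop([10, 12, 11], [7, 8, 8])
# reports a drop of roughly 0.3, i.e. about 30% fewer productive edits.
```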

Power dynamics in algorithms

Halfaker tells me that Wikipedia currently has 110,000 ‘active editors’: those who make at least five edits in a month. But Wikipedia has a major diversity problem. A Wikimedia Foundation survey found that 90% of editors are men. Halfaker believes it is important to think about power dynamics on Wikipedia and how these might influence things in the future.
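As a rough illustration of that metric (not Wikimedia’s own tooling), counting ‘active editors’ from an edit log might look like the following sketch; the data shape is an assumption.

```python
# Illustrative sketch of the 'active editor' metric as described: an editor
# counts as active in a month if they made at least five edits that month.
# The data shape (an iterable of (user, month) edit records) is assumed.
from collections import Counter

ACTIVE_THRESHOLD = 5  # edits per month, per the definition Halfaker cites


def active_editors(edit_log, month):
    """Count editors with >= ACTIVE_THRESHOLD edits in the given month."""
    per_user = Counter(user for user, edit_month in edit_log if edit_month == month)
    return sum(1 for count in per_user.values() if count >= ACTIVE_THRESHOLD)
```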

“I think we should be concerned and interested in how good Wikipedia's coverage is of information that is not part of the Western academic tradition. Who gets to write the content, whether it's female editors or male editors – or people editing from Africa. Algorithms are a space where we need to be really careful about these power dynamics because they can remain hidden. An algorithm can be racist or biased – it can create this disparate impact.”

Google’s non-transparent algorithms

Google’s algorithms caused a lot of controversy in the press this year when its photo app labelled two black people as “gorillas” and searches for “nigger house” pointed to the White House in Google Maps. Google apologised, but because its work on algorithms is largely non-transparent, it is hard to tell whether this was just an example of racism on the web or a fault in the algorithms themselves. Halfaker would like to see more transparency from technology companies like Google and Facebook.

“With Google's search ranking, as they become more open then they're easier to game by spammers. So there's trade-offs. I think that we as humans who are using Google services need to advance our conversation on what we expect. I don't think it's fair that we just let the spammers have free rein, but I don't think it's fair to say that we should know nothing about how they are affecting the content that we see.”

I put it to Halfaker that there are wider repercussions to think about too – what we see in Google’s search results could subliminally shape how we see the world.

“Yes and I would say that this is the same of any sort of space where someone makes changes. You can affect people's behaviour by moving a chair within a room. You can affect a Wikipedian's behaviour by moving a button, changing a colour or making it larger or smaller.”

“Algorithms are sort of the same. But they are sneakier. You don't see how they change. Changing the algorithm will change how you perceive the chair. Maybe the chair didn't move at all,” Halfaker concludes.

 

Ayesha Salim

Ayesha Salim is Staff Writer at IDG Connect
