25 January 2011

They number only a hundred or so. Remarkably few, when we consider the massive black hole they generate within the creative industry. A nearly year-long study conducted by a group of researchers from the Carlos III University of Madrid and Institute IMDEA Networks, headed by Rubén Cuevas, a professor at the university’s Telematic Engineering Department, reveals that barely a hundred “key pirates” are responsible for uploading 66% of all illegal files distributed through BitTorrent, the widely used P2P file-sharing protocol that accounts for a large part of Internet traffic.


Assistant Professor Cuevas explains that the study was born from two questions: “Who is the original source of the content?” and “What are their motives and why do they do it?” The researchers focused their attention on a massively popular protocol (BitTorrent) and on two well-known portals, namely Mininova and The Pirate Bay. From these, they downloaded 55,000 files of all different types, and went on to analyze 40,000 of them in detail. It was an arduous process that took over eight months, and the findings were presented at the close of 2010 at the International Conference on Emerging Network EXperiments and Technologies (CoNEXT).

The results can be summed up in two statements, explains Rubén Cuevas: just a handful of people are responsible for a substantial part of the file sharing structure and they do so for money. “The 40,000 files we analyzed in detail were downloaded 96 million times by 27 million different users. Impressive figures, all the more so when we consider that 66% of these files were posted on-line by just a hundred or so people”.
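As a rough, back-of-the-envelope reading of these figures, the short sketch below works out what they imply per file, per user and per major publisher. It is only an illustration: the variable and function names are hypothetical, and it assumes exactly 100 major publishers, whereas the study speaks of “a hundred or so”.

```python
# Figures quoted in the article.
FILES_ANALYZED = 40_000
DOWNLOADS = 96_000_000
USERS = 27_000_000
TOP_SHARE = 0.66          # share of files posted by the major publishers
TOP_PUBLISHERS = 100      # assumption: the article says "a hundred or so"


def summarize():
    """Derive simple averages from the quoted figures."""
    files_from_top = round(FILES_ANALYZED * TOP_SHARE)
    return {
        "files_from_top": files_from_top,
        "files_per_top_publisher": files_from_top / TOP_PUBLISHERS,
        "downloads_per_file": DOWNLOADS / FILES_ANALYZED,
        "downloads_per_user": DOWNLOADS / USERS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, each major publisher would account for a few hundred of the analyzed files, and each user for between three and four downloads on average. These are averages only; the article does not say how the files or downloads were actually distributed.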

English and Spanish rule supreme

The hundred-odd pirates that rule the Internet operate through servers outside the country where they “work”, creating a huge thorn in the side of the proposed Spanish ‘Sinde’ Act. “English and Spanish rule this market. In many cases, the on-line pirates upload films or games in both languages, although we have noticed that at least twenty of these major users or user groups specifically upload content in Spanish”, adds Professor Cuevas.

The study conducted by Universidad Carlos III has an unequivocal response to the other key question (why do they do it?): money. “The publishers place themselves at risk in exchange for the economic rewards they reap from the associated advertising”. Some of the users, whom the researchers identified through a specific tool, pocket in the region of four thousand dollars a day, although the average for the cases analyzed was closer to four hundred. “We believe that if these users lost their incentive, whether due to slumping revenue from advertising or prohibitively high fines, they would probably cease to offer this kind of content”.

“Antipiracy” agencies

Professor Cuevas’ team also unearthed another interesting finding. 25% of the files analyzed were fakes, meaning they didn’t actually contain the material promised on the search engines or portals such as Mininova. Who uploads these decoys? The research team contends that part comes from “anti-piracy” agencies created by the creative industry to discourage users, who often end up frustrated by the hours and hours they spend downloading fake bits onto their hard drives. The rest, of course, stems from hackers and viruses.

Experts are quick to point out, however, that P2P is a tool that has already peaked, giving way to direct downloads and streaming. The success of Spotify, for example, has meant that many of its fans no longer need to download music from the Internet (or at least download less).

700 gigabytes and eight months of work

The approach
The researchers developed a tool enabling them to compile data on files shared via the BitTorrent protocol, including who posted them, the publishers’ IP addresses and who downloaded them.

In order to protect their identity, many content publishers rent servers from companies specializing in this service and post content from these servers, which are often located in other countries.

Gigabytes and time
The study involved eight months of work and 700 GB of stored data, which is still yielding valuable new research information, particularly in relation to the so-called “fake files”.

