Journalists learn how to web scrape

By Mbalenhle Buthelezi

Web scraping is a useful technique that helps investigative journalists uncover the truth and improve the overall quality of their reporting. Nils Mulvad and Tommy Klass, data journalism editors at Klass and Mulvad, together with Pinar Dag, founder of the Data Journalism Turkey Platform, taught journalists the basics of web scraping during a workshop at the 10th Global Investigative Journalism Conference on Thursday.

They identified web scraping as a valuable tool for generating data and insight for reporting. Web scraping refers to the extraction of large amounts of data from websites.

There are different tools for web scraping, which vary in difficulty and in how much coding ability they require. Mulvad demonstrated the basics of applying scientific methods of data extraction for investigative journalists from around the world. He also showed the audience the basics of using the online scraping tool import.io.

The internet is filled with large volumes of information, much of which is irrelevant to any specific topic. Extracting the data that is particular to one’s work from a large pool of unrelated and unstructured data takes time and effort. This is where web scraping becomes an irreplaceable component of the data journalism process. It is a targeted and strategic information-gathering process.
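For readers curious what such targeted extraction can look like in code, here is a minimal sketch in Python using only the standard library. It is not the approach shown at the workshop, which used import.io. The sample page, the company names and the `TableScraper` helper are all hypothetical, and a real scraper would first download the page from a website.

```python
from html.parser import HTMLParser

# Hypothetical page content (an assumption for illustration only).
SAMPLE_HTML = """
<html><body>
<h1>Municipal tenders</h1>
<p>Unrelated introductory text.</p>
<table>
<tr><th>Company</th><th>Amount</th></tr>
<tr><td>Acme Ltd</td><td>1200</td></tr>
<tr><td>Example Corp</td><td>850</td></tr>
</table>
</body></html>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row, and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being read, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text outside table cells (headings, paragraphs) is skipped.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The sketch keeps only the table cells and discards the heading and the introductory paragraph, which is the "targeted" part of the process in miniature.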

But while tirelessly extracting thousands of lines of code is beneficial for reporters, there are ethical implications to web scraping. “While you are scraping you should also think, what is your limitation, what purpose are you scraping for and where to draw the line?” said Dag. She added that reporters should respect the highest ethical and legal standards, follow the rules set by the websites, and ensure that the scraped data is used with good intentions.
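One common place where websites publish such rules is a robots.txt file. As a rough sketch of how a scraper might consult it before fetching a page, the Python example below uses the standard library's `urllib.robotparser`. The rules, the bot name `newsroom-bot` and the example.org addresses are made up for illustration. A robots.txt file covers only part of a site's terms, so this is one check among several and does not settle the ethical questions Dag raised.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".strip().splitlines()


def load_rules(lines):
    """Parse robots.txt lines into a rule checker without touching the network."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


rules = load_rules(SAMPLE_ROBOTS)

# Ask whether our (hypothetical) bot may fetch each address.
allowed = rules.can_fetch("newsroom-bot", "https://example.org/tenders/2017")
blocked = rules.can_fetch("newsroom-bot", "https://example.org/private/staff")

# Seconds the site asks bots to wait between requests, if it says so.
delay = rules.crawl_delay("newsroom-bot")

print(allowed, blocked, delay)
```

For a live site, `RobotFileParser` can also load the file directly through its `set_url` and `read` methods instead of parsing lines supplied by hand.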

PHOTO: Pinar Dag gives practical lessons on web scraping as a way to collect data at the 2017 Global Investigative Journalism Conference in Johannesburg. Photo: Kayla de Jesus Freitas.