Problems of Data Quality

One of the advantages of working with sports data, you might have thought, is that official sports results are typically good quality data. Following a recent redesign of the Formula One website, the official online source of results is now the FIA website.

As well as publishing timing and classification (results) data as PDF documents, presumably intended for consumption by the press, the FIA also publish “official” results via a web page.

But as I discovered today, if your scraper takes its results from the “official” web page rather than from the official PDF documents, there is no guarantee that what it collects bears any resemblance at all to the actual result.

[Image: Formula One Spanish Grand Prix 2015 qualifying official classification PDF (page 2 of 2) alongside the FIA Session Classifications web page]
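
One way to catch this sort of disagreement automatically is to compare the two sources position by position before trusting either. The following is only a minimal sketch, assuming the PDF and web page classifications have each already been extracted into a list of (position, driver) pairs. The function name `find_mismatches` and the driver codes are made-up placeholders, not real results or part of any actual scraper.

```python
def find_mismatches(pdf_rows, web_rows):
    """Return (position, pdf_driver, web_driver) for each position
    where the two sources disagree, including positions that are
    present in one source but missing from the other."""
    pdf_by_pos = dict(pdf_rows)
    web_by_pos = dict(web_rows)
    mismatches = []
    for pos in sorted(set(pdf_by_pos) | set(web_by_pos)):
        pdf_driver = pdf_by_pos.get(pos)  # None if the PDF has no such row
        web_driver = web_by_pos.get(pos)  # None if the web page has no such row
        if pdf_driver != web_driver:
            mismatches.append((pos, pdf_driver, web_driver))
    return mismatches


# Hypothetical placeholder rows, NOT real classification data.
pdf_rows = [(1, "AAA"), (2, "BBB"), (3, "CCC")]
web_rows = [(1, "AAA"), (2, "CCC"), (3, "BBB")]

for pos, pdf_driver, web_driver in find_mismatches(pdf_rows, web_rows):
    print(f"P{pos}: PDF says {pdf_driver}, web page says {web_driver}")
```

An empty list back from `find_mismatches` would suggest the two sources agree, at least on finishing order. It says nothing about which source is right when they differ.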

Yet another sign that the whole F1 circus is exactly that – an enterprise promoted by clowns…