Engaging in some rally data junkie play yesterday, I started wondering whether I could grab route data out of the rather wonderful rally-maps.com website, a brilliant resource for accessing rally stage maps for a wide range of events.
The site displays its maps using leaflet, so the data must be in there somewhere as a geojson object, right?! ;-)
My first thought was to check the browser developer tools network tab to see if I could spot any geojson data being loaded into the page so that I could just access it directly… But no joy…
Hmmm… a quick View Source, and it seems the geojson data is baked explicitly into the HTML page as a data object referenced by the leaflet map.
So how to get it out again?
At this point I started wondering about accessing the data as a native JSON object somehow.
If you inspect the window object in the browser developer tools console, you can then browse through all the objects loaded into the page…
map.data. Knowing the path to the data, can I then grab it into a Python script?
One of the tricks I’ve started using increasingly for scraping data is browser automation via Selenium and the Python selenium package. Trivially, this allows me to open a page in a web browser, optionally click on things, fill in forms, and so on, and then either grab HTML elements from the browser, or use selenium-wire to capture all the traffic loaded into the page (this traffic might include a whole set of JSON files, for example, that I can then reference at my leisure).
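By way of a rough sketch of the traffic capture route: selenium-wire records the requests a page makes in a driver.requests list, and each item carries a url and (if the request completed) a response with headers. The helper name json_responses is made up here for illustration, and the "looks like JSON" test on the content type is just an assumption about how you might filter things.

```python
def json_responses(driver):
    """Return the URLs of captured requests whose response looks like JSON.

    Assumes `driver` is a selenium-wire webdriver, which keeps the traffic
    it has seen in `driver.requests`; each item has a `.url` and an
    optional `.response` with `.headers`.
    """
    urls = []
    for request in driver.requests:
        response = request.response
        if response is None:
            continue  # no response was captured for this request
        content_type = response.headers.get("Content-Type") or ""
        if "json" in content_type.lower():
            urls.append(request.url)
    return urls
```

With a real browser, the driver would come from something like `from seleniumwire import webdriver` followed by `driver = webdriver.Firefox()` and `driver.get(url)`, after which `json_responses(driver)` lists the candidate JSON URLs.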
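It seems so: simply call the selenium webdriver object's execute_script() method with a JavaScript return statement for the object, and it comes back as a Python dict object. Simples:-)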
Here are the code essentials…
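What follows is a minimal sketch rather than the exact code I ran. It assumes Firefox is available to Selenium, and it uses map.data as a stand-in for the object path (check the developer tools console for the real path on the page you are looking at). The function names get_js_object and scrape_page_object are purely illustrative.

```python
def get_js_object(driver, js_path):
    """Ask the browser for the JavaScript object at `js_path`.

    Selenium's execute_script() converts a returned JavaScript object
    into the equivalent Python structure (dicts, lists, strings, numbers).
    """
    return driver.execute_script(f"return {js_path};")


def scrape_page_object(url, js_path="map.data"):
    """Open `url` in Firefox and return the object found at `js_path`.

    The default `js_path` is an assumption; change it to match the page.
    """
    # Imported here so get_js_object() can be used without selenium installed
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return get_js_object(driver, js_path)
    finally:
        driver.quit()  # always close the browser, even if something fails
```

Calling scrape_page_object() with the URL of a stage map page should then hand back a dict that can be saved with json.dump() or poked at directly. If the page builds its data after loading, a short wait before the execute_script() call may be needed.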
Another one for the toolbox:-)
Plus I can now access lots of rally stage maps data for more rally data junkie fun :-)
PPS it also strikes me that
ipywidgets. (Hmm… I think there’s a push on to support