Tipped off by @brlamb to this Huffington Post story on Pearson ‘Education’ — Who Are These People? (which in turn led me to this SEC filing; I found the risk assessment on pages 8-10 interesting), I started wondering about the web domains owned by Pearson.
Looking up the domain registration details for pearson.com turned up a handful of nameservers (ns.pearson.com, ns2.pearson.com, oldtxdns2.pearsontc.com, usrxdns1.pearsontc.com), which we can use as the basis for a reverse lookup to see what other domains are served by the same nameservers (and which, presumably, therefore relate to Pearson activities).
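Conceptually, a reverse nameserver lookup is just an inverted index: take a table of domain → nameservers and flip it into nameserver → domains. The sketch below shows that idea, assuming you already have such a table; apart from pearson.com and its nameservers listed above, the sample domains are hypothetical placeholders. It is not how Gwebtools or any other service actually builds its index. And sharing a nameserver is only suggestive, since big DNS hosts serve many unrelated customers.

```python
from collections import defaultdict

# Hypothetical sample data: domain -> nameservers it delegates to.
# Only pearson.com and its nameservers come from the post; the other
# domains are made-up placeholders.
DOMAIN_NAMESERVERS = {
    "pearson.com": ["ns.pearson.com", "ns2.pearson.com",
                    "oldtxdns2.pearsontc.com", "usrxdns1.pearsontc.com"],
    "example-one.com": ["usrxdns1.pearsontc.com", "oldtxdns2.pearsontc.com"],
    "example-two.org": ["ns1.unrelated-host.net"],
}


def _norm(name):
    """Lower-case a DNS name and drop any trailing root dot."""
    return name.strip().lower().rstrip(".")


def reverse_ns_index(domain_ns):
    """Invert a domain -> nameservers mapping into nameserver -> domains."""
    index = defaultdict(set)
    for domain, nameservers in domain_ns.items():
        for ns in nameservers:
            index[_norm(ns)].add(_norm(domain))
    return index


def domains_sharing_nameservers(domain, domain_ns):
    """Other domains that use at least one of `domain`'s nameservers."""
    index = reverse_ns_index(domain_ns)
    related = set()
    for ns in domain_ns[domain]:
        related |= index[_norm(ns)]
    related.discard(_norm(domain))  # don't report the domain itself
    return sorted(related)


if __name__ == "__main__":
    print(domains_sharing_nameservers("pearson.com", DOMAIN_NAMESERVERS))
    # ['example-one.com']
```

Building the full index once and then querying it is what lets a service answer "everything on usrxdns1.pearsontc.com" without touching DNS at query time.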
So, for example, Gwebtools turns up a couple of thousand or so domains dangling off usrxdns1.pearsontc.com, but I couldn’t get the Haystax extractor to scrape more than a single page. (I haven’t used it before, so maybe I was doing something wrong? Or maybe Chrome was playing up; too many open tabs again?!) I’m also too tired right now to write a scraper: I’ve been struggling to answer ReCaptchas all night. (I guess that by now they’re completely inaccessible if you have dyslexia? It often takes me 5 or 6 refreshes before I feel confident going for one!) Which is to say, if you scrape the data describing all the domains associated with each of the Pearson nameservers, please post a link to it in the comments ;-)
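For anyone taking up the scraping challenge, the fiddly part is usually walking the result pages rather than parsing any one of them. The sketch below handles only that loop. It assumes a hypothetical, site-specific `fetch_page(n)` function that returns the domains listed on results page `n`; nothing about Gwebtools' real URL scheme or page markup is assumed here. It stops when a page is empty or adds nothing new, because some sites re-serve the last page for out-of-range page numbers.

```python
import time
from typing import Callable, Iterable, List


def collect_paged(fetch_page: Callable[[int], Iterable[str]],
                  max_pages: int = 200,
                  delay_s: float = 1.0) -> List[str]:
    """Collect domains across numbered result pages (1-based).

    `fetch_page` is a hypothetical caller-supplied function; this helper
    only deals with paging, de-duplication and pacing.
    """
    domains: List[str] = []
    seen = set()
    for page in range(1, max_pages + 1):
        batch = [d.strip().lower() for d in fetch_page(page) if d.strip()]
        fresh = [d for d in batch if d not in seen]
        if not fresh:
            # Empty page, or a repeat of pages already seen: assume we're done.
            break
        seen.update(fresh)
        domains.extend(fresh)
        if page < max_pages:
            time.sleep(delay_s)  # pause between requests to go easy on the server
    return domains


if __name__ == "__main__":
    # Stand-in for a real fetcher: two pages of results, then nothing.
    fake_pages = {1: ["a.example", "b.example"], 2: ["c.example"]}
    print(collect_paged(lambda n: fake_pages.get(n, []), delay_s=0))
```

Keeping the fetcher separate means the paging logic can be tested offline, as above, before it is pointed at any live site.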
I don’t remember if I tried grabbing Pearson data from OpenCorporates to do a corporate sprawl graph? I guess I should try to find what trademarks they have registered too?
Which reminds me: is there a free, open source of director listings for UK companies yet? And how’s the lobbyists register campaign (or WhosLobbying scraping) coming along? Is there a reverse lookup by company, so that, for example, we could see who Pearson reps have been chatting to?
I also wonder whether Pearson supports any All-Party Parliamentary Groups…?
PS This was handy, at first… How to Find the other Websites of a Person?
PPS See also A Gust of Wind Blows Across HE, on Pearson’s VUE assessment centres being used for supervised examinations on open online courses.
The URLs (from Gwebtools.com) are in a Google spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0AulLBOiAoaRUdDNiU3FUV2gtR3lzTTlyM0owbEM3RUE
Note that NS appears to include the same URLs as NS2 (I did not do an exact match), while OLDTXDNS2 is almost, but not quite, the same as USRXDNS1.
Also, for what it’s worth, the numbers of domains reported by SpyOnWeb.com do not match what Gwebtools.com lists. According to SpyOnWeb.com, NS and NS2 have more domains than Gwebtools.com lists, and Gwebtools.com lists more domains for OLDTXDNS2 and USRXDNS1 than SpyOnWeb.com reports.
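Doing the exact match is a simple set comparison once each nameserver's domains are exported from the spreadsheet as a plain list. The sketch below assumes exactly that; the sample lists are placeholders, not the real Gwebtools data. It reports the overlap, what each list has that the other lacks, whether the first list contains the second, and a Jaccard similarity. The same function would compare one nameserver's list as reported by Gwebtools.com against SpyOnWeb.com's.

```python
def compare_domain_lists(a, b):
    """Compare two domain lists after normalising case and trailing dots."""
    norm = lambda d: d.strip().lower().rstrip(".")
    a_set = {norm(d) for d in a if d.strip()}
    b_set = {norm(d) for d in b if d.strip()}
    union = a_set | b_set
    return {
        "both": sorted(a_set & b_set),
        "only_a": sorted(a_set - b_set),
        "only_b": sorted(b_set - a_set),
        "a_contains_b": b_set <= a_set,  # e.g. does NS include all of NS2?
        # Jaccard similarity: 1.0 means identical sets (or both empty).
        "jaccard": len(a_set & b_set) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Placeholder lists standing in for, say, the NS and NS2 columns.
    ns_list = ["a.example", "b.example", "c.example"]
    ns2_list = ["B.example", "c.example."]
    print(compare_domain_lists(ns_list, ns2_list))
```

Normalising case and the trailing root dot first matters: otherwise `B.example` and `b.example.` would count as different domains and inflate the apparent mismatch.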
@ed Thanks for doing that digging around. Do you know if SpyOnWeb supports paging of results (so you can see all the domains it claims to know about)?
SpyOnWeb does not appear to support paging of results at this time – it apparently just displays a max of 100 URLs per nameserver. And it asks users to not “use any robot, spider, other automated device or any tool-bar, web-bar, other web-client, device, software, routine or manual process, to monitor or scrap information from this Site or the Service, or bypass any robot exclusion request (either on headers or anywhere else on the Site).”