A few days ago, I posted a recipe showing how the Google geolocation API can be used to look up the geographical co-ordinates of a wifi router from its MAC address. Looking things up in an index is one of the things that Google is exceptionally good at (and it's got a huge index to play with…), but Google also has a range of tools and services for extracting and generating information. The Google Cloud Vision API, for example, offers a set of image analysis services:
- Label Detection: Detect broad sets of categories within an image, ranging from modes of transportation to animals.
- Explicit Content Detection: Detect explicit content like adult content or violent content within an image.
- Logo Detection: Detect popular product logos within an image.
- Landmark Detection: Detect popular natural and manmade structures within an image.
- Optical Character Recognition: Detect and extract text within an image, with support for a broad range of languages, along with support for automatic language identification.
- Face Detection: Detect multiple faces within an image, along with the associated key facial attributes like emotional state or wearing headwear. Facial Recognition is not supported.
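To get a feel for how these services are called, here's a minimal sketch of a Cloud Vision request: you POST a JSON body containing the base64-encoded image and a list of the feature types you want, to the `images:annotate` endpoint. The API key is a placeholder (you need your own from the Google Developers Console), and error handling is omitted:

```python
import base64
import json
import urllib.request

# Placeholder – substitute your own key from the Google Developers Console
API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

def build_annotate_request(image_bytes, features=("LABEL_DETECTION", "LOGO_DETECTION")):
    """Build the JSON body for a Cloud Vision images:annotate call."""
    return {
        "requests": [{
            # Images are passed inline, base64 encoded
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": f, "maxResults": 5} for f in features],
        }]
    }

def annotate(image_path):
    """POST an image file to the Vision API and return the parsed response."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_annotate_request(f.read())).encode("utf-8")
    req = urllib.request.Request(ENDPOINT, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Swapping in `"LANDMARK_DETECTION"`, `"TEXT_DETECTION"` or `"FACE_DETECTION"` as the feature type exercises the other services in the list above.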
(The logo detection service reminded me of a vapourware product pitch that appeared in one of my earliest blog posts!)
Just like the geolocation API, the idea is that you will be able to post a resource, such as an image, to the Google Cloud Vision API and it’ll tell you what landmark it recognises, or which brand logos it has detected.
(Homebrew image analysis options are still available, of course. For example, on the text extraction front, services like Apache Tika allow you to extract text and metadata from a wide range of documents.)
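For the homebrew route, Tika ships as a standalone app jar that you can drive from the command line (it needs a local Java runtime); a simple way to script it from Python is to shell out to it, as in this sketch (the jar filename is a placeholder for wherever you've downloaded it to):

```python
import subprocess

TIKA_JAR = "tika-app.jar"  # download from the Apache Tika site

def tika_command(path, jar=TIKA_JAR, output="--text"):
    """Build the command line for Tika's standalone app jar.

    --text extracts the plain text of a document; --metadata dumps
    its metadata instead."""
    return ["java", "-jar", jar, output, path]

def extract_text(path):
    """Run Tika over a local document and return the extracted text."""
    return subprocess.check_output(tika_command(path)).decode("utf-8")
```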
Audio-to-text services are also available (though I haven’t had a chance to play with them yet) from a variety of providers. For example, the Python SpeechRecognition package [code] has bindings to several services, including the IBM Watson Speech To Text API, the AT&T Speech API and the wit.ai service. The library also wraps a de facto API to the Google Web Speech API. I’m not sure if the Amazon/Alexa Voice Service offers a general purpose speech-to-text API, or whether one is buried deep inside the Alexa service?
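Under the hood, these speech-to-text services tend to follow the same pattern as the Vision API: POST the raw audio bytes with an appropriate content type and credentials, get JSON back. The sketch below follows the basic-auth style of the IBM Watson Speech To Text HTTP interface, but treat the endpoint URL as an assumption and substitute your chosen provider's documented values:

```python
import base64
import urllib.request

# Watson-style endpoint – treat as a placeholder for your provider's URL
STT_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"

def build_stt_request(audio_bytes, username, password,
                      url=STT_URL, content_type="audio/wav"):
    """Build an HTTP request posting raw audio for transcription,
    authenticated with HTTP basic auth."""
    token = base64.b64encode(
        "{}:{}".format(username, password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, data=audio_bytes, headers={
        "Content-Type": content_type,
        "Authorization": "Basic " + token,
    })
```

Sending the request with `urllib.request.urlopen()` would then return a JSON transcription; the SpeechRecognition package wraps this sort of plumbing for you.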
Recognition services are also provided by a range of other providers. For example, for many years, the Thomson Reuters OpenCalais service has offered a named entity extraction service that will extract – and semantically tag – named entities from submitted text. IBM’s AlchemyAPI offers a similarly wide range of text analysis services:
- Entity extraction: identify the proper nouns, i.e. people, companies, locations, etc.
- Sentiment analysis: determine the overall sentiment of a document, or the sentiment towards a specific keyword or entity.
- Keyword extraction: extract the important terms.
- Concept tagging: identify the overall concepts of the text.
- Relation extraction: extract subject-action-object relations.
- Taxonomy Classification: automatically categorize your text, HTML or web-based content into a hierarchical taxonomy.
- Author extraction: identify the author of a blog post or news article.
- Language detection: detect 97+ languages.
- Text extraction: pull out only the important content from a web page.
- Feed detection: extract the ATOM or RSS feeds from a web page.
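These text services are exposed as simple REST calls. The sketch below builds a request in the style of the legacy AlchemyAPI named entity extraction call; treat the hostname and call name as assumptions and check them against the current documentation, and the API key is a placeholder:

```python
import urllib.parse

# AlchemyAPI-style endpoint – hostname and call name are assumptions
BASE = "http://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities"

def entity_extraction_url(text, api_key="YOUR_API_KEY"):
    """Build a GET URL asking for ranked named entities as JSON."""
    params = {"apikey": api_key, "text": text, "outputMode": "json"}
    return BASE + "?" + urllib.parse.urlencode(params)
```

The other services in the list follow the same shape, with a different call name and parameters.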
…and a range of visual analysis services through its Alchemy Vision product:
- Image Link Extraction API: Given any URL, the Image Link Extraction API will scan the designated page to find the most prominent image and directly retrieve the URL for that image.
- Image Tagging API: return keywords summarising scenes, objects and stylistic features, as well as identifying 3D objects.
- Face Detection/Recognition API: return the position, age, gender of people in the photo, along with the identity of any celebrities visible within it.
For anyone who likes logical Lego, the availability of these plug and play services means that in many cases you don’t have to worry about the base technology, at least to get a simple demo running. Instead, the creativity comes in the orchestration of services, and putting them together in interesting ways in order to do useful things with them…