More Recognition/Identification Service APIs – Microsoft Cognitive Services

A couple of months ago, I posted “A Quick Round-Up of Some *-Recognition Service APIs”, which described several off-the-shelf, cloud-hosted services from Google and IBM for processing text, audio and images.

Now it seems that Microsoft Cognitive Services (formerly Project Oxford, in part) brings Microsoft’s tools to the party with a range of free tier and paid/metered services.

So what’s on offer?

Vision

Computer Vision API: extract semantic features from an image, identify famous people (for some definition of “famous” that I can’t fathom), and extract text from images; 5,000 free transactions per month;
Emotion API: extract emotion features from a photo of a person; 30,000 free transactions per month for photos;
Face API: extract face specific information from an image (location of facial features in an image); 30,000 free transactions per month;
Video API: 300 free transactions per month per feature.

Speech

Custom Recognition Intelligent Service (CRIS): customise the acoustic environment for speaker/speech recognition services; free private preview by invitation;
Speaker Recognition API: identify the person speaking in an audio file; 10,000 free transactions per month;
Speech API: speech-to-text and text-to-speech services; 5,000 free transactions per month.

Language

Bing Spell Check API: 5,000 free transactions per month;
Language Understanding Intelligent Service (LUIS): language models for parsing texts; 100,000 free transactions per month;
Linguistic Analysis API: NLP sentence parser, I think… (tokenisation, parts of speech tagging, etc.) It’s dog slow and, from the times I got it to sort of work, this seems to be about the limit of what it can cope with (and even then it takes forever):

5,000 free transactions per month, 120 per minute (but you’d be lucky to get anything done in a minute…);
Text Analytics API: sentiment analysis, topic detection, key phrase detection and language detection; 5,000 free transactions;
Web Language Model API: “wordsplitter” – put in a string of words as a single string with space characters removed, and it’ll try to split the words out; 100,000 free transactions per month.
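
To give a feel for what the “wordsplitter” task involves, here is a minimal sketch of word breaking over a tiny, hand-picked vocabulary. It is not how the Web Language Model API works (that presumably scores candidate splits with a language model rather than a fixed word list); the `VOCAB` set and the `split_words` function are made up purely for illustration.

```python
from typing import List, Optional

# Toy vocabulary, chosen purely for this example (an assumption, not API data).
VOCAB = {"a", "an", "and", "image", "in", "is", "text", "the", "there", "this"}


def split_words(s: str, vocab=VOCAB) -> Optional[List[str]]:
    """Split s into vocabulary words, preferring the split with fewest words.

    Returns None if s cannot be fully covered by vocabulary words.
    """
    # best[i] holds the shortest word list that spells out s[:i], or None.
    best: List[Optional[List[str]]] = [None] * (len(s) + 1)
    best[0] = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            prefix = best[start]
            word = s[start:end]
            if prefix is None or word not in vocab:
                continue
            candidate = prefix + [word]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[len(s)]


print(split_words("thereistextintheimage"))
# ['there', 'is', 'text', 'in', 'the', 'image']
```

A dictionary-only approach like this falls over as soon as a word is missing from the list or a string can be split in several plausible ways, which is where a statistical language model earns its keep.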

Knowledge

Academic Knowledge API: search Microsoft Academic? 10,000 transactions per month;
Entity Linking Intelligence Service: 1,000 free transactions per day;
Knowledge Exploration Service: not sure….?! Up to 10,000 objects, 1,000 transactions free;
Recommendations API: recommender based on your own data? 10,000 free transactions per month.

Search

Bing Autosuggest API: Up to 10,000 transactions per month;
Bing Image Search API: Up to 1,000 transactions per month across all Bing Search APIs;
Bing News Search API: Up to 1,000 free transactions per month across all Bing Search APIs;
Bing Video Search API: Up to 1,000 free transactions per month across all Bing Search APIs;
Bing Web Search API: Up to 1,000 transactions per month across all Bing Search APIs.

There’s also a gallery of demo apps built around the APIs.

It seems, then, that we’ve moved into an era of commodity computing at the level of automated identification and metadata services, though many of them are still pretty ropey… The extent to which they are developed and continue to improve will be the proof of just how useful they will be as utility services.

As far as the free usage caps on the Microsoft services go, there seems to be a reasonable amount of freedom built in for folk who might want to try out some of these services in a teaching or research context. (I’m not sure if there are blocks for these services that can be wired into the experiment flows in the Azure Machine Learning studio?)
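
For anyone planning a class exercise or a small batch job around these caps, a bit of back-of-the-envelope budgeting helps. The sketch below is an illustration rather than anything Microsoft provides: the `Quota` class and its method names are hypothetical, and the numbers are the free-tier figures quoted above for the Linguistic Analysis API (5,000 calls per month, 120 per minute).

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical record of a free-tier allowance."""
    per_month: int
    per_minute: int

    def min_interval_seconds(self) -> float:
        # Smallest gap between calls that respects the per-minute limit.
        return 60.0 / self.per_minute

    def calls_per_day(self, days_in_month: int = 30) -> int:
        # Even daily budget that stays within the monthly cap.
        return self.per_month // days_in_month

    def calls_per_student(self, students: int) -> int:
        # Share of the monthly cap if one key is split across a class.
        return self.per_month // students


linguistic = Quota(per_month=5000, per_minute=120)
print(linguistic.min_interval_seconds())  # 0.5
print(linguistic.calls_per_day())         # 166
print(linguistic.calls_per_student(25))   # 200
```

So a class of 25 sharing one key would get about 200 calls each per month, which is plenty for a demo but not for bulk processing.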

I also wonder whether these are just the sorts of service that libraries should be aware of, and perhaps even work with in an informationista context…?!;-)

PS from the face, emotion and vision APIs, and perhaps entity extraction and sentiment analysis applied to any text extracted from images, I wonder if you could generate a range of stories automagically from a set of images. Would that be “art”? Or just #ds106 style playfulness?!
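
As a very rough sketch of that idea, assume each image has already been reduced to a small record of tags, a dominant emotion and any text extracted from it. The record shape, field names and sample values below are invented for illustration; they are not the actual response format of any of the APIs.

```python
from typing import Dict, List

# Made-up per-image metadata, standing in for combined API results.
IMAGES: List[Dict] = [
    {"tags": ["dog", "beach"], "emotion": "happiness", "text": ""},
    {"tags": ["street", "sign"], "emotion": "surprise", "text": "ROAD CLOSED"},
]


def describe(record: Dict) -> str:
    """Turn one metadata record into a sentence using a fixed template."""
    scene = " and ".join(record["tags"])
    sentence = f"A scene with {scene}, and a mood of {record['emotion']}."
    if record["text"]:
        sentence += f" Some text reads: {record['text']}."
    return sentence


def tell_story(records: List[Dict]) -> str:
    """Join the per-image sentences into a single 'story'."""
    return " Then: ".join(describe(r) for r in records)


print(tell_story(IMAGES))
```

Swapping the fixed template for a handful of randomly chosen ones would give the “range of stories” from the same set of images.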

PPS Nov 2016 for photo-tagging, see also Amazon Rekognition.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...