Jet d’Eau or no

This is a quick demonstration of how the Visual Recognition service on the IBM Watson developer cloud can be used to identify Geneva’s most famous landmark, the Jet d’Eau.

Introduction

Today I attended a Swiss Financial Technology meet-up, part of the excellent series that they organise.  The presentation concerned the “cognitive computing” suite of services that IBM now provide as part of Watson.  We were given an overview of some of the capabilities and possible applications, and the technology was billed as being powerful but very simple to use.  I wanted to try something practical with them myself to get a better idea of what could be done.

After a quick look at the available services I settled on the Visual Recognition service.  This takes images as input and aims to classify them by their content.  This sounded like it might be something where input data could be readily found, and it fits in nicely with an online course I’m currently doing in machine learning which covers various classification techniques (I’ll blog about this and other courses at some point in the future).

Getting going

It took me a short while to work out how to access the service.  I needed to sign up for a Bluemix account (it’s still not entirely clear to me what this is) by following the “Getting service credentials in Bluemix” instructions.

On signing up to the account and verifying my email I was taken to an IBM login asking for my IBM id.  As far as I was aware I didn’t have one, but it turns out this is the email address I’d just given them.

Once signed in I navigated to the Visual Recognition service, and was told that “The organization does not have a space in the ‘US South’ region. When you create a space for your organization, it will be created in this region.”  Not too clear what this was about but I created a space.  I assume this is some sort of virtual area in their cloud associated with my new account.

Then in the Visual Recognition service I hit the Create button in the Add Service section, without changing any defaults.  This returned the credentials I was after to make use of the API, and I could now fire requests at the API from my local PC.

Access to the service is entirely free as it’s in beta.  My free account is only valid for a month; I’m not sure what happens after that, but it looks like there might be a free hobbyist developer option with certain limitations.

The API

The Visual Recognition API allows you to classify images. It does this by training classifiers against known results, and then applying each of those trained classifiers against a test image. So for example, we might train the classifier with aerial pictures of cities. When we then present it with an aerial image of Paris it should identify it as being an image of a city. Each classifier is trained to identify whether an image matches or not, so many classifiers might match a single image. For example, that image of Paris might match both the “city” classifier and the “Eiffel Tower” classifier, but hopefully not the “tomato” or “oak tree” classifiers.

The API is very simple.  You can access it by making GET or POST HTTP calls, and there are only five methods to it. To test that I had access I tried the simplest call, to get back a list of all the classifiers. You can do this from the command line using curl, which is just a way to fire HTTP requests.

curl -u "username":"password" -X GET "https://gateway.watsonplatform.net/visual-recognition-beta/api/v2/classifiers?version=2015-12-02"

This returned a long list of classifiers, here’s a snippet:

        {"classifier_id":"Garbage_Dump_Site","name":"Garbage_Dump_Site"},
        {"classifier_id":"Industrial_Scene","name":"Industrial_Scene"},
        {"classifier_id":"Oilrig","name":"Oilrig"},
        {"classifier_id":"Refinery","name":"Refinery"},
        {"classifier_id":"Substation","name":"Substation"},
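
As a rough sketch of how the same call might be put together in Python, the snippet below builds the request without sending it. It assumes only the standard library, reuses the URL and version string from the curl command above, and the function name is my own invention rather than anything from the IBM documentation.

```python
import base64
import urllib.parse
import urllib.request

# URL and version string copied from the curl command above.
BASE_URL = "https://gateway.watsonplatform.net/visual-recognition-beta/api/v2"
API_VERSION = "2015-12-02"


def build_list_classifiers_request(username, password):
    """Build (but do not send) the GET request that lists the classifiers."""
    query = urllib.parse.urlencode({"version": API_VERSION})
    request = urllib.request.Request(f"{BASE_URL}/classifiers?{query}", method="GET")
    # curl's -u flag sends HTTP Basic authentication; this header is the equivalent.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_list_classifiers_request("username", "password")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Keeping the request building separate from the sending makes it easy to check the URL and credentials before going anywhere near the network.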

Classifying pictures

So the next thing to try was to actually classify a picture. Having seen the list of classifiers I tried asking the service to identify this, to see if it picked up it was an oilrig.
[Image: oilrig]

You do this by making a call as below.

curl -u "username":"password" -X POST -F "images_file=@C:\Users\Nick\Pictures\oilrig.jpg" "https://gateway.watsonplatform.net/visual-recognition-beta/api/v2/classify?version=2015-12-02"

Note the @ sign in front of the filename is needed; it tells curl to upload the contents of the file rather than send the path as text. This gave the result below.

{"images":[
        {"image":"oilrig.jpg",
         "scores":[
                 {"classifier_id":"Vehicle","name":"Vehicle","score":0.687759},
                 {"classifier_id":"Outdoors","name":"Outdoors","score":0.66822},
                 {"classifier_id":"Oilrig","name":"Oilrig","score":0.66334},
                 {"classifier_id":"War_Ship","name":"War_Ship","score":0.656808},
                 {"classifier_id":"Sky_Scene","name":"Sky_Scene","score":0.652685},
                 {"classifier_id":"Sailing_Ship","name":"Sailing_Ship","score":0.631554},
                 {"classifier_id":"Water_Vehicle","name":"Water_Vehicle","score":0.631165},
                 {"classifier_id":"Scene","name":"Scene","score":0.616492},
                 {"classifier_id":"Landmark","name":"Landmark","score":0.582753},
                 {"classifier_id":"Boathouse","name":"Boathouse","score":0.578},
                 {"classifier_id":"Fishing","name":"Fishing","score":0.56068},
                 {"classifier_id":"Sledding","name":"Sledding","score":0.550713},
                 {"classifier_id":"Deep_Sea_Fishing","name":"Deep_Sea_Fishing","score":0.533729},
                 {"classifier_id":"Parkour","name":"Parkour","score":0.53178},
                 {"classifier_id":"Gray_Sky","name":"Gray_Sky","score":0.527717}
]}]
}

So what does this mean? The system has run all the many classifiers against my image of an oilrig. Each classifier outputs a score between 0 and 1 as to how likely it thinks the image is to be a match. The service returns all those that are greater than 0.5, in descending order. So we can see it’s done pretty well, in that the oilrig classifier appears high up the list with a score of 0.66. Some of the other scores here are also what we would want, such as “outdoors”. However, we can also see it has mistakenly identified the image as being likely to contain a ship. This is presumably because classifiers trained to find ships or oilrigs will both have been trained on images containing open water, so the presence of open water in an image of an oilrig is enough for it to be classified as a ship.
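
To make the structure of that response concrete, here is a small Python sketch that reads a shortened copy of it and reports where a given classifier sits in the ranking. The helper name is my own, and “Tomato” is a made-up classifier id used only to show what happens when nothing matched.

```python
import json

# A shortened copy of the response shown above (first four scores only).
response_text = """
{"images":[
  {"image":"oilrig.jpg",
   "scores":[
     {"classifier_id":"Vehicle","name":"Vehicle","score":0.687759},
     {"classifier_id":"Outdoors","name":"Outdoors","score":0.66822},
     {"classifier_id":"Oilrig","name":"Oilrig","score":0.66334},
     {"classifier_id":"War_Ship","name":"War_Ship","score":0.656808}
   ]}
]}
"""


def rank_of(classifier_id, scores):
    """Return the 1-based position of a classifier by score, or None if absent."""
    ordered = sorted(scores, key=lambda entry: entry["score"], reverse=True)
    for position, entry in enumerate(ordered, start=1):
        if entry["classifier_id"] == classifier_id:
            return position
    return None


image = json.loads(response_text)["images"][0]
scores = image["scores"]
print(image["image"], "matched", len(scores), "classifiers")
print("Oilrig rank:", rank_of("Oilrig", scores))
print("Tomato rank:", rank_of("Tomato", scores))  # hypothetical id, not in the response
```

A classifier that scored 0.5 or less simply doesn’t appear in the list, so “absent” is the only signal you get for a non-match.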

A further option of this classification API call is to limit the classifiers to be used. I do this by creating a file containing JSON such as the below.

{"classifier_ids" : ["Amphitheatre", "Oilrig", "Garbage_Dump_Site"]}

Now I can re-run my test to check only if the image shows an amphitheatre, an oilrig or a rubbish dump like this:

curl -u "username":"password" -X POST -F "images_file=@C:\Users\Nick\Pictures\oilrig.jpg" -F "classifier_ids=<c:\temp\classifiers.json" "https://gateway.watsonplatform.net/visual-recognition-beta/api/v2/classify?version=2015-12-02"

In this case I get a similar result to before, but with just the oilrig classifier output.
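
If the list of classifiers changes often, the JSON could be generated rather than typed by hand. This is only a sketch: the function name is mine, and the file name in the comment is just an example.

```python
import json


def classifier_ids_payload(classifier_ids):
    """Return the JSON text that restricts classification to the given ids."""
    return json.dumps({"classifier_ids": list(classifier_ids)})


payload = classifier_ids_payload(["Amphitheatre", "Oilrig", "Garbage_Dump_Site"])
print(payload)
# The text could then be saved for curl to read, for example:
# with open("classifiers.json", "w") as handle:
#     handle.write(payload)
```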

Training a classifier

I now wanted to create my own classifier, and the real power of the service is that this can be done remarkably easily. I decided to create a classifier that should identify whether an image contains the Jet d’Eau, Geneva’s most well known landmark.

To create a classifier you need two sets of images: a positive set of images that should match the classifier, and a negative set of images that shouldn’t.

For the positive set I grabbed 55 images of the Jet d’Eau from the web. Of these 50 were going to be my training set, this being the minimum recommended number to use. The remaining 5 were randomly selected to be my test set which I would then use to validate the classifier for accuracy. In selecting these positive images I tried to get a bit of variation, taking pictures that were on different scales and under different lighting conditions.
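
The random hold-out described above could be done along these lines in Python. The numbered file names are stand-ins for the downloaded images, and the seed is there only so the example gives the same split each time.

```python
import random


def split_train_test(filenames, test_size, seed=None):
    """Randomly hold back test_size files; return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(filenames)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical file names 1.jpg .. 55.jpg standing in for the downloaded images.
images = [f"{n}.jpg" for n in range(1, 56)]
train, test = split_train_test(images, test_size=5, seed=0)
print(len(train), "training images,", len(test), "test images")
```

Picking the test images at random, rather than by eye, avoids unconsciously holding back only the easy ones.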

The next thing I did was to fire the training set at the existing classifiers. I wanted to see what they would currently be classified as, and use this to help determine my negative set. As explained in the documentation, you want your negative set to contain pictures that are similar to the positive set but should not be a match. The classifications I got were largely things you might expect because of the presence of open water: lighthouse, outdoors, sailing, windfarm, war ship, whale. There were a few unexpected classifications though. Coincidentally this image was actually more strongly identified as being an oilrig than the actual image of an oilrig I used earlier, and another couple of night scenes were identified most strongly as being musical instruments…

[Image: 23]

So to form the negative set I searched the web for 50 pictures made up of whales, lighthouses, warships, sailing and windfarms. After rolling these two training sets into zip files the classifier can be created like so.

curl -u "username":"password" -X POST -F "positive_examples=@C:\Users\Nick\Pictures\JetDEau\train\train.zip" -F "negative_examples=@C:\Users\Nick\Pictures\JetDEau\train_negative\train_negative.zip" -F "name=jet_deau" "https://gateway.watsonplatform.net/visual-recognition-beta/api/v2/classifiers?version=2015-12-02"

It took a few minutes to run, but it returned successfully with the id of my new classifier.
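
Rolling a folder of images into a zip file can also be scripted. The sketch below uses Python’s standard zipfile module and demonstrates itself on a throwaway folder with two placeholder files; the assumption that the images are all .jpg files in one flat folder is mine.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_images(image_dir, zip_path):
    """Put every .jpg in image_dir into a flat zip archive; return the names added."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in sorted(Path(image_dir).glob("*.jpg")):
            # arcname drops the folder part so the archive holds bare file names.
            archive.write(image, arcname=image.name)
            names.append(image.name)
    return names


# Demonstration on a temporary folder holding two placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("1.jpg", "2.jpg"):
        (Path(folder) / name).write_bytes(b"not a real image")
    zip_path = Path(folder) / "train.zip"
    added = zip_images(folder, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        stored = archive.namelist()
print(added, stored)
```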

Testing the new classifier

So now it was time to test whether the classifier would work. I took the test set I had set aside earlier, put the images in a zip file, and ran them against just the new classifier by updating the relevant JSON file.

curl -u "username":"password" -X POST -F "images_file=@C:\Users\Nick\Pictures\JetDEau\test\test.zip" -F "classifier_ids=<c:\temp\classifiers.json" "https://gateway.watsonplatform.net/visual-recognition-beta/api/v2/classify?version=2015-12-02"

This gave the results below for the five test images.

{"images":[
        {"image":"14.jpg",
         "scores":[
                 {"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.813962}]},

        {"image":"17.jpg",
         "scores":[
                 {"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.778455}]},

        {"image":"26.jpg",
         "scores":[
                 {"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.787328}]},

        {"image":"4.jpg",
         "scores":[
                 {"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.798647}]},

        {"image":"22.jpg"} ]
}

This is impressive stuff. On four of the five images it is over 0.75 confident that the image contains the Jet d’Eau. The strongest example, with over 0.8, was this image.
[Image: 14.jpg]

But what of the fifth image that failed to get a score over 0.5? Without even looking at it I could guess what it would be like, and sure enough it was a night scene.  Not only that but the jet was pretty small and indistinct. It would probably take a lot more night-time training examples to be able to identify this accurately, and this might have been the wrong approach. Given how different day and night images will be, it would have been better to have two classifiers, one for daylight and the other for night scenes.
[Image: 22.jpg]
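
The test response above can be tallied with a few lines of Python. This is a sketch with a helper name of my own choosing; it treats an image that comes back with no scores at all as a miss, which is how 22.jpg appears in the output.

```python
import json

# The test response shown earlier, with whitespace condensed.
response_text = """
{"images":[
  {"image":"14.jpg","scores":[{"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.813962}]},
  {"image":"17.jpg","scores":[{"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.778455}]},
  {"image":"26.jpg","scores":[{"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.787328}]},
  {"image":"4.jpg","scores":[{"classifier_id":"jet_deau_1131165134","name":"jet_deau","score":0.798647}]},
  {"image":"22.jpg"}
]}
"""


def split_matches(response, classifier_name):
    """Return ({image: score} for matches, [images with no reported score])."""
    matched, missed = {}, []
    for image in response["images"]:
        # An image with no "scores" key did not score above 0.5 on any classifier.
        scores = {s["name"]: s["score"] for s in image.get("scores", [])}
        if classifier_name in scores:
            matched[image["image"]] = scores[classifier_name]
        else:
            missed.append(image["image"])
    return matched, missed


matched, missed = split_matches(json.loads(response_text), "jet_deau")
best = max(matched, key=matched.get)
print(f"{len(matched)} of {len(matched) + len(missed)} matched; best is {best}; missed: {missed}")
```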

I then tested a few images that were not of the Jet d’Eau, so should not have been classified. It did well with various ocean scenes and pictures of boats. However, where it fell down was with pictures of cities on a lake. It registered scores of around 0.6 for these snaps of Lausanne and Zurich, scores that weren’t as strong as for the real jet images, but still falsely classifying them as positive. In retrospect this was to be expected: none of the negative training examples showed a city on a lake with no jet.
[Images: 5, 6]

Conclusions

Although I was able to get up and running pretty quickly, the documentation could be improved. It took me a while to work out how to get access to the service, and the API details appear in three different places, one being an old version. The format of the JSON file for selecting classifiers is given in one place but not the other, something that cost me a bit of time and frustration. There are no code examples, just the curl syntax. Although this was pretty easy to use, if I were to use it further I’d want to wrap the API in code; I might consider doing this in Python at some point.

The successes, and failures, of the classifier I created illustrate that for all the power of a matching algorithm you need high quality training data for best results.

Overall I was very impressed by how accessible this very powerful tool was; it didn’t take me very long at all to produce some interesting results.
