The ultimate image labeling tool

Posted by on 2007 Jan 29, Mon in Technology / programming

It's been a while since I promised myself I would describe the image labeling tool that my colleagues and I developed about a month ago. Several people have asked about it, and it's better to describe it once than to repeat myself every time someone's curiosity comes to the surface. It all started when I came across Google Image Labeler.

I was quickly drawn to the idea, but my background in psychology convinced me that their tool could be extended further, increasing the "entertainment factor" as well as the utility of the whole thing. The general idea is to collect a lot of statistics about the images and the users, and then look for patterns, trends and other things that can later be converted into money :-) (but it's not about money)

What is Image Labeler about? Two people are shown the same image, and each enters keywords that describe it. Different people have different points of view, so their keywords are likely to differ. As soon as there is a match, you get some extra points (and time credit) and move on to the next image. The process lasts until you are out of time.

In order to achieve a better score, you need to be well-synchronized with your partner. Google also gives you a list of high scores. What is the point, you ask? The point is that they have an image search engine, and in order to return relevant results to the searcher, they use us - humans. If I am shown an image of a house, I can tag it as "building", "home", "construction", "settlement", etc. If 5 million people are shown an image of a duck and they label it as "duck", then it is a duck ;-) Thus Google is really unlikely to show you an image of a duck when you search for "dog" (because many humans have confirmed that the image is indeed that of a duck, so it really isn't a dog). What Google gets is better-quality search results. What's in it for me?

I enjoyed their 'game' a lot, because it was really fun. I remember once having a hard time getting synchronized with my partner; our keywords just didn't match. Once the time was up, we were shown all the images and our keywords. I had described an image of a remote control as "RC, remote, remote control, device, gadget", but none of these matched, because my partner wrote "telecommande" ;-) Yep, the beauty of the whole thing is that it scales really well, so you can play with partners from any country in the world, and then study the results to figure out how your partner thinks.

The problem is that Google keeps the statistics to themselves, and there's not much we can do but play with their tool. Hmmm... I, for one, would love to take a look at things other than the tip of the iceberg, and so the idea of an alternative tool was born.

We were given an assignment at the university. There were 8 people on the team, but in the end only two did the actual work. We chose the subject ourselves, so I described Google Image Labeler and proposed to my teammates that we create something similar. I coded the server in Python, and a colleague of mine wrote the client in C#. The current version of the program is very simple and doesn't do even half of what I think it should do. We were under a lot of pressure, so we worked for a quick result rather than a more distant and better one.

If you are not a programmer, or at least a person who is interested in technologies, networks, programs and all the other similar things, you can skip the next few paragraphs.

What we have now is:

a server written in Python;
it is multi-threaded;
it has a built-in HTTP-server (which allows you to watch the status of the current game in a browser; the status shows you who the players are, what the current image is, which keywords were assigned to it by each player, which were the other images and their keywords, the current score, the time left);
everything works via TCP/IP, so the thing is playable via Internet, not only in a local network;
it works with images of any size and format;
a client written in C#;
on top of all that - everything actually works :-)
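For the curious, here is a minimal sketch of how a built-in status page can run in its own thread next to the game logic. It is not the project's actual code: it uses the modern Python 3 standard library, and the names `game_state`, `render_status` and `start_status_server`, as well as the JSON output, are assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical shared state; the real server's data structures are not shown here.
game_state = {
    "players": ["alice", "bob"],
    "current_image": "images/duck.jpg",
    "keywords": {"alice": ["bird"], "bob": []},
    "score": 0,
    "time_left": 90,
}
state_lock = threading.Lock()  # game threads and HTTP threads share game_state


def render_status(state):
    """Serialize the game state to UTF-8 encoded JSON."""
    return json.dumps(state, indent=2, sort_keys=True).encode("utf-8")


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Take a consistent snapshot while holding the lock.
        with state_lock:
            body = render_status(game_state)
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console free for the game's own debugging output


def start_status_server(host="127.0.0.1", port=8080):
    """Serve the status page from a daemon thread and return the server."""
    server = ThreadingHTTPServer((host, port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_status_server()` once at startup would be enough; the game threads then only need to update `game_state` while holding `state_lock`.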

What I'd like to add:

user accounts (make an account and then use it in the future, so that others could see your history);
a profile linked to each account (which contains the name, sex, age, location, and some basic details about the person);
a database backend for all the collected data (there will be a lot of things to store, so everything needs to be organized, to simplify data retrieval and sorting, and to allow us to run different queries);
a nicer web-interface, which allows more people to watch a game in progress; the current web-interface is black-and-white and pretty ugly, we need eye-candy :-)
the web-interface page should update itself whenever there is new data from the players (so that the game watchers don't have to manually refresh the page);
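To make the accounts, profiles and database items above a bit more concrete, here is one possible starting point using SQLite from Python's standard library. Nothing like this exists in the project yet; the table names, the column names and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per player profile, one row per keyword entered.
SCHEMA = """
CREATE TABLE IF NOT EXISTS players (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    sex      TEXT,
    age      INTEGER,
    location TEXT
);
CREATE TABLE IF NOT EXISTS labels (
    id         INTEGER PRIMARY KEY,
    player_id  INTEGER NOT NULL REFERENCES players(id),
    image_path TEXT NOT NULL,
    keyword    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_player(conn, name, sex=None, age=None, location=None):
    """Insert a player profile and return its id."""
    cur = conn.execute(
        "INSERT INTO players (name, sex, age, location) VALUES (?, ?, ?, ?)",
        (name, sex, age, location),
    )
    conn.commit()
    return cur.lastrowid


def add_label(conn, player_id, image_path, keyword):
    """Store one keyword, lower-cased and trimmed so that duplicates group together."""
    conn.execute(
        "INSERT INTO labels (player_id, image_path, keyword) VALUES (?, ?, ?)",
        (player_id, image_path, keyword.strip().lower()),
    )
    conn.commit()


def keywords_for_image(conn, image_path):
    """Return (keyword, count) pairs for an image, most frequent first."""
    rows = conn.execute(
        "SELECT keyword, COUNT(*) AS n FROM labels WHERE image_path = ? "
        "GROUP BY keyword ORDER BY n DESC, keyword",
        (image_path,),
    )
    return rows.fetchall()
```

Keeping every keyword as its own row (instead of one list per game) is a deliberate choice in this sketch: it makes questions like "who calls this image what" a plain `GROUP BY` query.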

If you are a statistician or a psychologist - tell us which data you'd like to collect from players (the data will then be used to find patterns and trends in the society of today).

If you are a programmer - you can take a look at the code, either for academic purposes (you might be interested to find out how multi-threading was implemented, how the built-in web-server works, how our protocol works, etc), or to give us a hand and write some new code for the project.

Here are some screenshots:

In this picture you can see the server (the console window), which displays various debugging information, as well as two clients connected to the server. Each client sends keywords associated with the image; the server receives and processes them, and if there is a match, another image is sent to the clients. Even though this example shows two clients, the server was designed so that it can host a game with any number of players. You have to realize, though, that it won't be that fun, because it takes a while until 10 people come up with the SAME keyword ;-)
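The matching step described above can be sketched in a few lines of Python. This is only an illustration, not our server code: the `Round` class and `normalize` function are hypothetical names, and I am assuming that a "match" means every player in the game has entered the same keyword.

```python
def normalize(keyword):
    """Lower-case a keyword and collapse whitespace so 'Remote  Control' == 'remote control'."""
    return " ".join(keyword.lower().split())


class Round:
    """Keywords entered for one image by any number of players."""

    def __init__(self, players):
        self.keywords = {player: set() for player in players}

    def submit(self, player, keyword):
        """Record a keyword; return it if every player has now entered it, else None."""
        word = normalize(keyword)
        if not word:
            return None  # ignore empty input
        self.keywords[player].add(word)
        if all(word in words for words in self.keywords.values()):
            return word  # match: the server would now send the next image
        return None
```

With two players this behaves like the original game; with ten players, `submit` keeps returning `None` until the last of them types the shared word, which is exactly why large games get slow.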

This is the web-interface. As you can see, 'simplistic' is not the best way to describe it :-) It displays paths to the images instead of the images themselves, and the history is shown in the form of a Python dictionary rather than as a nice table with pretty fonts, columns, separators, etc. But hey, it works, and it can be viewed with any web browser. I connected to the web server with my Palm to watch the status of a game hosted by the server; the players were computers on the Internet, and everything worked perfectly. It felt great to see that nothing fell apart.

So, to sum things up, we need:

someone who can make the whole thing pretty (CSS, HTML, JS, AJAX, and all the other buzzwords)
someone who can create the database schema that stores all the statistics
someone to decide what should be in that database (i.e. what kind of statistics we are gathering) and tell the database guy (or girl :-) what to do
anyone else who is interested

I could do all of this myself, but it wouldn't be fun (it takes time to tinker with the system), and I am afraid that by the time everything is ready I would hate the product because of all the trouble it has given me.

What should we obtain in the end? Well, it will be the ultimate trend-detector ;-) The collected statistics can be very useful. For instance (I gave this example when I described the project in class), imagine that you're a company that wants to create a new soft drink and sell it on the market. The juice is yellow, and you're thinking about what to name it. When you show various yellow images to different people, certain patterns will show up, such as "teenagers call it 'yellow'", "males above 30 tend to call it 'golden'", "females call it 'sunny'". If the product targets young people, it should probably be called 'yellow energy'; if you want to sell it to women, call it 'sunny liquid', etc. Got the idea? Another example - you want to sell a universal remote control in France. Since you are an American who spent your childhood in Germany, you'd probably call it "uber-RC", which is a bad idea, because the statistics reveal that the French call it a "telecommande".
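Finding such a pattern is not hard once the data is there. Below is a rough Python sketch that picks the most frequent label per group of players; the function name `top_label_per_group` and the `(group, label)` record format are assumptions, and the sample records are invented purely to mirror the soft drink example above.

```python
from collections import Counter, defaultdict


def top_label_per_group(records):
    """Return the most frequent label for each demographic group.

    `records` is an iterable of (group, label) pairs; labels are compared
    case-insensitively.
    """
    counts = defaultdict(Counter)
    for group, label in records:
        counts[group][label.strip().lower()] += 1
    return {group: counter.most_common(1)[0][0] for group, counter in counts.items()}


# Invented sample data, for illustration only.
sample = [
    ("teen", "yellow"), ("teen", "Yellow"), ("teen", "golden"),
    ("male 30+", "golden"), ("male 30+", "golden"),
    ("female", "sunny"),
]
top = top_label_per_group(sample)
```

A real analysis would also need enough players per group and some measure of how strong the preference is, but the basic counting would look much like this.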

An intelligent company will definitely invest in such research, because its goal is to sell. To sell more, it needs to know how to promote the product. Statistics are very helpful when choosing a name, a color, a description, a tagline, and so on.

Finally, it's not about money. It's about studying people's behaviour and learning more about the human psyche, about the evolution of our society. Isn't this cool?

1 comment

Comment from: VA Visitor

The server part could be done as a web-hosted service, and the client as a browser add-on, something like the way del.icio.us works with FF. It could even be hosted on Dekart’s former server in Toronto.
