Image Labeling Using Crowdsourcing
Image labeling can be separated into two main types: annotation and tagging. Tagging refers to labeling an entire image with specific terms. For example, images of different animals are tagged according to which animal is shown. The resulting data set can then be used to train machines to recognize these animals in new pictures. In addition, image tagging makes your photo database more easily searchable: an internal search function can return all images whose tags match a given keyword. Web crawlers also use this information when indexing a website, which can help your SEO ranking.
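As a minimal sketch of the search use case (the filenames and tags below are hypothetical), a tag index can make a photo collection keyword-searchable:

```python
# Hypothetical image tags: each file maps to the set of terms it was labeled with.
tags = {
    "img_001.jpg": {"cat", "outdoor"},
    "img_002.jpg": {"dog", "outdoor"},
    "img_003.jpg": {"cat", "indoor"},
}

def search(keyword):
    """Return all images whose tag set contains the keyword."""
    return sorted(name for name, t in tags.items() if keyword in t)

print(search("cat"))  # → ['img_001.jpg', 'img_003.jpg']
```

A real photo database would typically store this as an inverted index (keyword to images) for speed, but the idea is the same.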
Image annotation, on the other hand, labels different parts of an image. Rather than one label applied to the entire photo, labels are attached to individual aspects of it. This can take several forms, such as the following:
- Bounding Boxes:
Bounding boxes are one of the most common types of image labeling. Rectangular boxes are placed around certain objects within the image, for example, cars, cyclists, and pedestrians in traffic images. This allows the AI to recognize these shapes in different contexts and thus learn to apply this information to new images.
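A bounding box is commonly stored as four coordinates, (x_min, y_min, x_max, y_max). A standard way to compare a labeled box with a model's prediction is intersection over union (IoU); a minimal sketch, with made-up coordinates:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

labeled_car = (10, 20, 110, 80)   # hypothetical human-drawn box
predicted = (30, 30, 120, 90)     # hypothetical model output
print(round(iou(labeled_car, predicted), 3))  # → 0.541
```

An IoU of 1.0 means the boxes coincide exactly; detection benchmarks often count a prediction as correct above a threshold such as 0.5.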
- Polygons:
Similar to bounding boxes, polygons are used to annotate specific objects within an image. However, polygons allow for more precision: the lines are drawn close to the edges of the object, so they can also annotate objects that do not fit neatly into rectangular boxes.
- Semantic Segmentation:
With semantic segmentation, annotators label every pixel of an image according to a set of predetermined classes. For example, an aerial photograph of a neighborhood can be annotated to identify which pixels show streets, houses, vehicles, or gardens. This provides very specific labels that are particularly important when the environmental context of an image matters.
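A segmentation annotation is usually stored as a mask: a grid the same size as the image where each cell holds a class id. A toy sketch, with invented class ids and a tiny 4x4 "aerial tile":

```python
from collections import Counter

# Hypothetical class ids for the aerial-photo example
CLASSES = {0: "street", 1: "house", 2: "garden"}

# Per-pixel label mask for a tiny 4x4 image tile
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
]

# Count how many pixels each class covers
counts = Counter(label for row in mask for label in row)
for class_id, name in CLASSES.items():
    print(name, counts[class_id])
```

Real masks have one entry per image pixel (and are typically NumPy arrays or run-length encodings), but the structure is the same: every pixel carries exactly one class.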
- Key Points:
Key-point annotation can be used to detect small objects and shapes. For this, dots are placed on specific parts of the image, for example, the eyes, eyebrows, or mouth in the image of a person's face. Using this information, machines can learn to identify different emotions and recognize facial features from different angles.
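Key points are typically stored as named (x, y) coordinates. A minimal sketch with hypothetical facial landmarks, computing the inter-eye distance that face-analysis pipelines often use to normalize other measurements:

```python
import math

# Hypothetical facial key points as (x, y) pixel coordinates
keypoints = {
    "left_eye": (120, 80),
    "right_eye": (180, 80),
    "mouth_left": (130, 150),
    "mouth_right": (170, 150),
}

def distance(a, b):
    """Euclidean distance between two named key points."""
    return math.dist(keypoints[a], keypoints[b])

print(distance("left_eye", "right_eye"))  # → 60.0
```

Because ratios between such distances stay stable as a face moves, models can use them to recognize the same features from different angles.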