The Images of Groups
Dataset
To study these ideas, we built a collection of images of people from Flickr.
The following three searches were conducted:
“wedding+bride+groom+portrait”
“group shot” or “group photo” or “group portrait”
“family portrait”
A standard set of negative query terms was used to remove undesirable images.
To prevent a single photographer’s images from being over-represented, a maximum
of 100 images was returned for any given image capture day, and this search
was repeated for 270 different days. In each image, we labeled the gender and
the age category of each person. Because we are not studying face detection, we
manually added the faces that were missed, although 86% of the faces were found automatically. We
labeled each face as being in one of seven age categories: 0-2, 3-7, 8-12,
13-19, 20-36, 37-65, and 66+, roughly corresponding to different life stages.
In all, 5,080 images containing 28,231 faces are labeled with age and gender,
making this what we believe is the largest dataset of its kind. Many faces
have low resolution. The median face has only 18.5 pixels between the eye
centers, and 25% of the faces have under 12.5
pixels. As expected with Flickr images, there is
a great deal of variety. In some images, people are sitting, lying, or standing on elevated surfaces. People often have
dark glasses, face occlusions, or unusual facial expressions.
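The seven age categories can be written down as a simple lookup. The following is a minimal sketch, not part of the released files: the function name age_category and the two constants are hypothetical, and the labels in the dataset were assigned by hand from appearance rather than computed from a known age.

```python
from bisect import bisect_right

# Lower bounds (in years) of the seven age categories listed in the text.
AGE_BIN_STARTS = [0, 3, 8, 13, 20, 37, 66]
AGE_BIN_LABELS = ["0-2", "3-7", "8-12", "13-19", "20-36", "37-65", "66+"]


def age_category(age_years: int) -> str:
    """Return the label of the age category that contains age_years.

    Hypothetical helper for illustration only.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age_years,
    # so subtracting one gives the index of the matching category.
    return AGE_BIN_LABELS[bisect_right(AGE_BIN_STARTS, age_years) - 1]
```

For example, age_category(19) falls in the 13-19 category and age_category(20) in the 20-36 category.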
Zip files containing the images and raw text data files:
Fam2a.zip 85 MBytes
Fam4a.zip 44 MBytes
Fam5a.zip 71 MBytes
Fam8a.zip 43 MBytes
Group2a.zip 53 MBytes
Group4a.zip 114 MBytes
Group5a.zip 37 MBytes
Group8a.zip 71 MBytes
Wed2a.zip 42 MBytes
Wed3a.zip 18 MBytes
Wed5a.zip 14 MBytes

MatlabFiles.zip 155 MBytes

RowLabeling.zip 2 KBytes

ageGenderClassification.zip 25 MBytes
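Before unpacking an archive, it can help to see what it contains. The sketch below uses only the Python standard library and makes no assumption about the internal layout of the zip files, which is not described here. The function name summarize_archive is hypothetical, and the archive is assumed to have been downloaded to a local path.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_source) -> Counter:
    """Count the files in a zip archive, grouped by lowercase extension.

    zip_source may be a path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(none)"] += 1
    return counts
```

Calling summarize_archive("Fam2a.zip") on a downloaded copy would return a count of entries per file extension, which gives a quick view of how many image files and text files the archive holds.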