Blink Identity - High throughput, privacy preserving identification service.


Identity Primer: What is Face Recognition?

How Humans See

Humans are obviously intimately familiar with face recognition. Babies can recognize a facial shape (but not individuals) almost immediately after birth, and by around four months a baby can recognize individuals at almost an adult level, even though the rest of their visual processing is not fully developed. Being able to recognize "mom" obviously has a huge survival advantage, so it is not surprising that this skill develops so early. And the magic isn't limited to babies. As adults, we process faces in a different part of our brain than we process any other object. That's part of why we sometimes see faces in abstract objects or attribute meaning to animal expressions that may be totally incorrect.

Computer Vision

For biometrics, a computer can't take advantage of millions of years of evolution in order to recognize a face. All it has is a 2-dimensional array of pixels, just like every other thing it encounters through its "eyes." When you see a picture of people at a birthday party you aren't seeing pixels, you are seeing objects - people, cake, tables, party hats, etc. Your brain has already broken the scene down into its component elements. When a computer looks at a scene to match faces, it first has to find the faces themselves, since it doesn't see the scene as individual objects. The process of finding faces is usually called face detection, and if you have ever used a modern cell phone you have probably seen it in action when a small square box is drawn around a face on your phone's camera screen. Face detection is a mature area and can be done extremely fast. The basic process is that a computer algorithm is trained on examples of faces and non-faces and learns to distinguish between them. There is a great explanation and video here if you would like to know more.
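To make this concrete, here is a minimal face-detection sketch in Python using OpenCV's bundled Haar cascade detector. The image filename is a placeholder, and this is just one common detection approach among many - newer systems tend to use trained neural networks instead:

```python
# Minimal face detection sketch using OpenCV's pre-trained Haar cascade.
# "party.jpg" is a placeholder; any photo with visible faces will do.
import cv2

# Load the frontal-face detector that ships with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("party.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # detection runs on grayscale

# Returns one (x, y, width, height) box per detected face.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw the familiar box around each face, just like a phone camera does.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("party_faces.jpg", image)
print(f"Found {len(faces)} face(s)")
```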

Once faces are found, the computer turns them into templates - mathematical representations of the face itself. A template is typically much smaller than the original image - anywhere from 1KB to 20KB - because it contains only the key information needed to match the face. Interestingly, the region the computer looks at is much smaller than the region humans look at when recognizing faces. The main reason is that this central region changes far less than things like hair and beards. Face matching is also done in grayscale, so the color information is gone by the time matching starts.
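As a rough illustration (not how any particular vendor builds templates - modern systems use learned features rather than raw pixels), the sketch below crops a detected face box, drops the color, resizes it to a fixed size, and flattens it into a small normalized vector a few kilobytes in size:

```python
# Illustrative only: turn a detected face box into a compact grayscale vector.
# Real templates come from trained models, not raw pixel values.
import cv2
import numpy as np

def face_to_template(image_bgr, box, size=(64, 64)):
    """Crop the face region, discard color, and flatten to a small vector."""
    x, y, w, h = box                              # box from a face detector
    face = image_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY) # color information is dropped
    gray = cv2.resize(gray, size)                 # fixed size so faces are comparable
    vec = gray.astype(np.float32).ravel()
    return vec / np.linalg.norm(vec)              # unit length for easy comparison

# A 64x64 grayscale crop is 4,096 values -- a few KB, far smaller than the photo.
```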

The way computers match faces is difficult to understand without a mathematical background (see here if you do), but here is a very simplified version. Imagine all of the people you have ever known. You probably know some people who look similar, some who look really unusual, and some who look "generic." Now imagine you can break those faces down into the core things that make them different. You might be able to make a statement like "Sally looks a lot like Judy, but with a different eye shape. She looks almost nothing like Doug, except in the cheekbones." You might be able to visualize how you could add or subtract components of your friends' faces to make something that looks like someone else - 2.5 Judy - 0.65 Doug + 0.05 Jennie = Sally. That's the basic idea, and the components are called eigenfaces. The computer turns a particular face into a weighted sum of the facial components it knows about and calculates a similarity score. Not the way humans do it, for sure. But it is a whole lot more efficient.
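Here is a toy version of that idea, using PCA from scikit-learn to learn the components (the eigenfaces) and cosine similarity on the resulting weights. The face data below is random noise standing in for real, aligned grayscale templates:

```python
# Toy eigenfaces sketch: learn facial "components" with PCA, express each face
# as a weighted sum of them, and compare faces by comparing those weights.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
templates = rng.random((200, 64 * 64))   # stand-in for 200 aligned face vectors

pca = PCA(n_components=50)               # the 50 components are the "eigenfaces"
weights = pca.fit_transform(templates)   # each face becomes just 50 numbers

def similarity(a, b):
    """Cosine similarity between two faces' eigenface weights."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Roughly: "how close is face 0 to face 1 as a combination of components?"
print(similarity(weights[0], weights[1]))
```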

In practice, face matching is challenging to use for high-security applications. You may be familiar with the face matching used by Facebook or other applications. That works quite well, but the price of getting a wrong answer is just a bit of user annoyance. However, in those applications they are only trying to match your face against a small number (10-1000) of people in your social network.
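A social-network style search like that is a simple 1:N comparison against a small gallery. Here is a minimal sketch, where the gallery layout and the threshold value are made up for illustration:

```python
# Minimal 1:N identification sketch against a small gallery of known people.
import numpy as np

def best_match(probe, gallery, threshold=0.6):
    """gallery: dict of name -> unit-length template vector (same size as probe)."""
    best_name, best_score = None, -1.0
    for name, template in gallery.items():
        score = float(probe @ template)       # cosine similarity for unit vectors
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold, report "no match" rather than the wrong friend's name.
    return (best_name, best_score) if best_score >= threshold else (None, best_score)
```

With only a few hundred people in the gallery, an occasional wrong suggestion is a minor annoyance; against millions of strangers, the same threshold would produce far more false matches.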

Probably the most successful application of face matching is used by various state DMVs to determine whether someone is getting more than one driver's license. That's a great application because the pictures are perfectly composed, what we call "passport style" photos. The lighting is uniform and diffuse, the background is uncluttered, the subject is looking right at the camera, and the pictures are all the same size. That makes the process easy, and this kind of matching can be done on databases of millions with great results. However, once you start straying from this ideal, things get worse quickly.

If you look at the example eigenfaces in the photo above, you can see that they all have the same presentation - looking right at the camera. Once you start changing the pose angle by looking down, looking to the side, etc., the face isn't going to match the original photo well, if at all. This can be somewhat alleviated by enrolling different pose angles, but that is not feasible for many applications. The second major issue is lighting. Since face matching uses the patterns of light and dark on the face, lighting is a major factor. If part of your face is in shadow, that is a huge chunk of information that cannot be used. And if your face is blurry or dim, the situation gets even worse.

For high security applications like access control, face matching is essentially never used. It suffers from a high failure-to-acquire rate because of lighting and pose variations. And when a match is made, it is not as certain as a fingerprint or iris match, primarily because of the "fuzziness" of the information used to match compared with those two modalities. Face matching typically has a higher false non-match rate than other modalities because of difficulties in presentation. And if you lower the threshold to prevent this, you will likely be overwhelmed with false matches on large databases.
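The trade-off in that last sentence is easy to see with a quick simulation. The score distributions below are synthetic, just to show how lowering the threshold cuts false non-matches while the false match rate - multiplied across a large database - grows quickly:

```python
# Synthetic illustration of the threshold trade-off described above.
import numpy as np

rng = np.random.default_rng(1)
genuine = rng.normal(0.80, 0.10, 10_000)       # scores for same-person comparisons
impostor = rng.normal(0.30, 0.10, 1_000_000)   # scores for different-person comparisons

for threshold in (0.7, 0.6, 0.5):
    fnmr = np.mean(genuine < threshold)        # false non-match rate
    fmr = np.mean(impostor >= threshold)       # false match rate
    print(f"threshold={threshold}: FNMR={fnmr:.4f}, FMR={fmr:.6f}")
```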

Face matching is the easiest biometric for humans to identify with because we have done it literally from birth. And for the most part we are really good at it compared to computers - we can recognize faces with very little information, especially if they are familiar. Of course we can't search databases of thousands of faces the way computers can, but when it comes to matching small, poor-quality images, humans are amazing.