We Don’t Know How Computers Recognize Faces - and That’s OK
In my role as CTO, I often hear people describe how they think biometric facial recognition works, an impression I assume comes mostly from movies and TV shows. Most often, people tell me that the system measures the distances and angles from the eyes to the nose, the nose to the mouth, and so on. Interestingly, this is essentially how face recognition worked in its earliest days, before modern computers, when the work was funded by the CIA.
Consider the photos below. Both were taken when their subjects were 23 years old, but they are separated by two generations and more than eight years. The one on the right is my son, and the one on the left is his paternal grandfather. I’ve (crudely) photoshopped them to match in grain, saturation, lighting, and so on. You may or may not find their facial features similar, depending on your innate ability to recognize faces. Either way, the “geometry” of these faces is clearly similar, and any measurement of distances and angles would have a hard time telling them apart.
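To make the “distances and angles” idea concrete, here is a minimal sketch of a purely geometric comparison in Python. It assumes someone has already located a handful of facial landmarks in each photo, uses only distances (angles could be added the same way), and scales everything by the distance between the eyes so that image size doesn’t matter. The landmark names, coordinates, and function names are hypothetical illustrations, not how any particular product works.

```python
import math
from itertools import combinations

# Hypothetical landmark names mapped to (x, y) pixel coordinates. A real
# pipeline would get these from a landmark detector; here they are typed in.
Landmarks = dict[str, tuple[float, float]]


def geometry_signature(face: Landmarks) -> list[float]:
    """Every pairwise landmark distance, divided by the inter-eye distance
    so that the signature does not depend on how large the face appears."""
    eye_span = math.dist(face["left_eye"], face["right_eye"])
    names = sorted(face)  # fixed order so two signatures line up
    return [math.dist(face[a], face[b]) / eye_span
            for a, b in combinations(names, 2)]


def geometry_distance(a: Landmarks, b: Landmarks) -> float:
    """Euclidean distance between two signatures; 0.0 means identical geometry."""
    return math.dist(geometry_signature(a), geometry_signature(b))


# Made-up coordinates, for illustration only.
grandfather = {"left_eye": (30, 40), "right_eye": (70, 40),
               "nose": (50, 60), "mouth": (50, 80)}
# Twice the size in the frame, nose a hair lower: "similar geometry".
grandson = {"left_eye": (60, 80), "right_eye": (140, 80),
            "nose": (100, 121), "mouth": (100, 160)}
# A face with noticeably different proportions.
stranger = {"left_eye": (30, 40), "right_eye": (70, 40),
            "nose": (50, 70), "mouth": (50, 95)}

print(geometry_distance(grandfather, grandson))
print(geometry_distance(grandfather, stranger))
```

With these made-up numbers, the grandson’s signature lands far closer to his grandfather’s than the stranger’s does. A matcher built only on this kind of geometry would have to pick a threshold, and any threshold loose enough to tolerate ordinary variation between photos would likely accept the grandfather as a match for the grandson.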
I can say with a high degree of confidence that no modern facial recognition system uses distances and angles as the sole means of matching faces. However, I can’t say that modern systems don’t use this technique to some degree. This may be surprising, but because of the nature of neural networks, we really can’t say much about how modern systems operate at all.
In essence, we feed a computer a large number of examples of matching and non-matching faces and tell it to learn how to distinguish them in the general case. We don’t tell it how to match faces, and it can’t tell us how it matches them. We can characterize how the computer performs (likelihood of errors, errors across demographic groups, performance at scale, etc.), but at the end of the day it is a black box that performs a useful task.
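Characterizing a black box mostly comes down to counting. Here is a minimal Python sketch, assuming the matcher hands back a similarity score for each pair of photos and that we have test pairs labeled with the truth and with a demographic group. At a chosen threshold it reports, per group, the false match rate (different people wrongly matched) and the false non-match rate (the same person wrongly rejected). The ScoredPair type, the error_rates function, the group labels, and the scores are all hypothetical; nothing in the code depends on knowing how the scores were produced.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredPair:
    score: float       # similarity reported by the black-box matcher
    same_person: bool  # ground truth for this pair of photos
    group: str         # hypothetical demographic label for the breakdown


def error_rates(pairs: list[ScoredPair],
                threshold: float) -> dict[str, tuple[float, float]]:
    """Return {group: (false_match_rate, false_non_match_rate)}.

    A pair counts as a predicted match when score >= threshold. A rate is NaN
    when a group has no pairs of that kind to measure.
    """
    # Per group: [false matches, impostor pairs, false non-matches, genuine pairs]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for pair in pairs:
        c = counts[pair.group]
        predicted_match = pair.score >= threshold
        if pair.same_person:
            c[3] += 1
            c[2] += int(not predicted_match)
        else:
            c[1] += 1
            c[0] += int(predicted_match)
    return {
        group: (fm / imp if imp else math.nan, fnm / gen if gen else math.nan)
        for group, (fm, imp, fnm, gen) in counts.items()
    }


pairs = [
    ScoredPair(0.91, True, "A"), ScoredPair(0.42, True, "A"),
    ScoredPair(0.10, False, "A"), ScoredPair(0.20, False, "A"),
    ScoredPair(0.88, True, "B"), ScoredPair(0.95, True, "B"),
    ScoredPair(0.61, False, "B"), ScoredPair(0.05, False, "B"),
]
print(error_rates(pairs, threshold=0.5))  # prints {'A': (0.0, 0.5), 'B': (0.5, 0.0)}
```

In this toy data, group A suffers missed matches while group B suffers false matches at the same threshold. That is the kind of demographic difference that can be measured without ever explaining how the matcher reached its scores.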
This doesn’t mean we should abandon face recognition, of course. We rely on systems (and people) every day that make good decisions but can’t explain why. A sculptor knows just where to hit a piece of marble, but she probably can’t explain why she hit that spot in any way that would be useful to a non-sculptor. Intuition is typically the sum of a set of unexplainable factors, and it serves us well more often than not. Computers just have their own form of intuition.
In terms of performance, computers are measurably better than humans at most face recognition tasks: they aren’t subject to the cognitive biases of human reviewers, they don’t get tired, they aren’t distracted by hats, sunglasses, or personality, and they can find a match in a database of millions.
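Searching a database of millions is the one part of the pipeline that is easy to explain. As we understand it, many modern systems have a neural network turn each face into a fixed-length list of numbers (an embedding, or template) and then compare those lists instead of the photos. The minimal Python sketch below assumes the embeddings already exist; the three-number vectors, the names, the identify function, and the 0.8 threshold are hypothetical. The embeddings themselves remain as opaque as the rest of the network.

```python
import math
from typing import Optional

# Hypothetical embeddings: in a real system a neural network would map each
# face photo to a vector like these (usually hundreds of numbers long).
Embedding = list[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction).
    Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def identify(probe: Embedding, gallery: dict[str, Embedding],
             threshold: float) -> Optional[tuple[str, float]]:
    """Linear 1:N search: return (identity, score) for the most similar
    gallery entry, or None if even the best score falls below threshold."""
    best_id, best_score = None, -math.inf
    for identity, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


gallery = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.8, 0.6]}
probe = [0.85, 0.15, 0.45]
print(identify(probe, gallery, threshold=0.8))
print(identify([0.0, 0.0, 1.0], gallery, threshold=0.8))
```

A production system would typically replace this linear scan with an approximate nearest-neighbor index so that a search over millions of templates stays fast, but the decision rule (take the closest match, and accept it only above a threshold) can stay the same.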
We have blogged about errors in face recognition before, and of course they do happen. Still, I am sometimes called upon to explain why a computer might fail to make a match that seems obvious to a human, and all I can do is shrug. I can’t look at the source code for an answer, nor can I ask the computer to explain itself. Ultimately, though, this is no different from how humans work: you likely can’t explain how you recognize faces either, but you get it right often enough for the skill to be useful.