When it comes to Artificial Intelligence, the debate can get heated quite quickly, usually between a faction on the defensive, maintaining that machines will not reach human capabilities any time soon, and a faction advocating that the era of AI is almost here, if not already arrived.
This post is not meant to be an introduction to the arguments mentioned (I might write a more in-depth post later on), but to offer some considerations about how misleading a crude comparison between the results of the two can be when the whole context is not taken into account.
Deep Neural Networks (DNNs) are nowadays considered state-of-the-art in many areas of Artificial Intelligence, especially computer vision, so we might as well treat them as a significant benchmark for this debate. So, how do they relate to human vision? Are they on par with our own capabilities? It turns out the answer is not exactly straightforward.
An interesting paper by Christian Szegedy and colleagues showed that DNNs have counter-intuitive properties: they seem to be very good at generalization, even better than humans, yet they can easily be fooled with adversarial examples. The authors hypothesized that a possible explanation is the extremely low probability of these adversarial sets being observed in a test set, while (like the rational numbers) they are dense enough to be found in virtually every test case.
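To make the idea concrete, here is a minimal sketch of how an adversarial perturbation works. This is not the paper's exact method (Szegedy et al. used a box-constrained optimizer on a real DNN); instead, a toy linear "classifier" in NumPy is enough to show the mechanism: a perturbation that is tiny per pixel, but aligned against the model's weights, accumulates over thousands of dimensions and flips the decision.

```python
# Sketch only: a gradient-sign perturbation against a toy linear scorer,
# showing how a per-pixel change far too small to matter visually can
# still flip a high-dimensional model's decision.
import numpy as np

rng = np.random.default_rng(0)

D = 10_000                      # a 100x100 "image", flattened
w = rng.normal(size=D)          # frozen toy classifier weights

def score(x):
    """Positive score means the toy model says 'cat'."""
    return float(w @ x)

# A clean input the model confidently classifies as 'cat'.
x = w / np.linalg.norm(w)       # deliberately aligned with w
print(score(x) > 0)             # True

# Adversarial step: for a linear model d(score)/dx = w, so nudging every
# pixel by -eps * sign(w) subtracts eps * sum(|w_i|) from the score.
# eps is small, but the loss grows with the number of dimensions.
eps = 0.05
x_adv = x - eps * np.sign(w)
print(score(x_adv) > 0)         # False: decision flipped
```

The per-pixel change never exceeds `eps`, yet the score collapses, which is exactly the "dense but improbable" flavor of the phenomenon described above.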
Many years have passed since the first pioneering works on adversarial classification[2,3], and nowadays many adversarial examples are generated with Evolutionary Algorithms (EAs) that evolve a population of images. With this kind of algorithm, it is interesting to note that it is possible to fool state-of-the-art neural networks into "recognizing", with almost 100% certainty, images evolved to be totally unrecognizable to humans as natural objects.
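A toy version of this evolutionary approach can be sketched in a few lines. The classifier below is a hypothetical stand-in (a fixed linear model with a sigmoid "confidence"), not a real DNN, and the search is a simple (1+1) evolution strategy rather than the full population-based EA used in the literature; the point is only to show how mutation plus selection drives the model's confidence toward 100% on an image that was never constrained to look like anything to a human.

```python
# Sketch only: a (1+1) evolution strategy mutates a random "image" and
# keeps any mutation that raises a frozen toy classifier's confidence
# in a target class. The result scores highly while remaining noise to us.
import numpy as np

rng = np.random.default_rng(1)

D = 64                                  # an 8x8 "image", flattened
w = rng.normal(size=D)                  # frozen toy classifier weights

def confidence(img):
    """Sigmoid 'probability' the toy classifier assigns to the target class."""
    return 1.0 / (1.0 + np.exp(-(w @ img)))

# Start from pure noise; evolve by keeping only improving mutations.
img = rng.uniform(0.0, 1.0, size=D)
for _ in range(2000):
    child = np.clip(img + rng.normal(scale=0.1, size=D), 0.0, 1.0)
    if confidence(child) > confidence(img):
        img = child

print(confidence(img))   # high confidence, yet the image is still noise to a human
```

Selection only ever asks "does the classifier like this more?", never "does this look like the class?", which is precisely why the evolved images end up unrecognizable to us.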
Using evolutionary algorithms to produce images that match DNN classes can yield a terrific variety of different images, and, looking at these, the authors interestingly note that:
“For many of the produced images, one can begin to identify why the DNN believes the image is of that class once given the class label. This is because evolution need only to produce features that are unique to, or discriminative for, a class, rather than produce an image that contains all of the typical features of a class.”
These examples demonstrate how AI recognition can be intentionally fooled, making it fail to recognize images that are obvious to us (false negatives), and also making it recognize, with strong confidence, something that to us is obviously not there (false positives). There is plenty of literature on this topic[5–7], which can also be quite important from a cybersecurity perspective.
However, we should underline that human recognition has its own shortcomings too: there are plenty of optical illusions to demonstrate it, not least the famous white-and-gold vs. blue-and-black dress, which sparked a lot of debate.
There are cases where artificial recognition can consistently outperform humans[9,10], like fine-grained intra-class recognition (e.g. dog breeds, snakes, etc.). It also appears that humans can be even more susceptible than AI when there is insufficient training data, that is, when the person has not had enough exposure to that kind of class.
Human perception is a tricky beast. It seems extremely good to us because it can be pretty robust and adaptive, but as we have just seen it depends a lot on prior knowledge: we too need training (sometimes lifelong training) in order to recognize things with some degree of success. Sure enough, there are also some innate categories we are very skilled at recognizing from birth (e.g. human faces of our own ethnicity), but guess what? We are susceptible to being fooled there too, if we only change the illumination[11,12].
Even human faces can be hard for us to recognize, with just a change of illumination.
Also, we rely on aspects of reality that are not objective at all, like colors. Everyone knows that colors depend on the light wavelengths reflected by objects, but we often forget that what really makes colors what they are to us is our brain's interpretation. In short, colors do not exist in nature; they are just a small portion of the light spectrum that our brain encodes as specific sensations. We don't see infrared, ultraviolet, or gamma rays as colors, even though they are definitely there, and we also see colors that do not really “exist” in the spectrum, like brown.
Our perception is strongly tied not only to our neurophysiology but also to our cultural context. There is a by-now famous Namibian tribe, the Himba, which has dozens of terms for shades of green but no word at all for blue, and apparently its members cannot distinguish blue from green at all, while they are still much better than us at spotting very slight differences between greens[13,14]. Furthermore, very recent studies have demonstrated that humans can be fooled by some kinds of adversarial images just as much as machines[9,15,16].
The differing shortcomings of human and artificial image recognition suggest that the two processes are very different. Asking whether human recognition is better or worse than machine recognition is, at the very least, an ill-posed question, since we consistently neglect to take into account the knowledge and training that we ourselves need in order to perform any recognition at all.