New Current Biology paper by Sudhanshu Srivastava finds emergent human-like covert attention in CNNs

It might help explain attention-like behaviors in simple organisms from mice to fruit flies. 

October 3, 2024
Convolutional neural networks applied to landmark tasks generate the behavioral signatures of human-like covert attention without any explicit built-in mechanism that alters processing at the cued or contextually expected location.

Covert visual attention is the ability to focus on part of a scene without moving your eyes. For example, if an observer gets a hint about where a target is likely to appear, they can often find it faster or more accurately without even looking directly at that spot. In humans, these performance boosts from cues have long been explained by the idea of an “attention spotlight” or a limited mental resource that enhances processing at the cued location. However, such concepts are abstract and hard to tie to actual neurons in the brain. This study asked whether a simpler explanation might be possible: can a straightforward artificial neural network – specifically a feedforward convolutional neural network (CNN) – end up showing the same kind of attention-like behavior just by learning to solve visual tasks, even without any built-in attention mechanism?

Graduate student Sudhanshu Srivastava tested this by training a CNN on three classic visual attention tasks. In one task, a brief cue (such as an arrow or a highlighted box) indicated where a target was likely to appear; people typically respond faster and more accurately when such a cue correctly predicts the target’s location. Another task was a visual search: the target had to be found among many distracting items, and reducing the number of distractors (effectively cueing fewer locations) made the search easier. The third was a contextual cueing task, in which certain background layouts were repeated over time so that the context itself became a clue to the target’s location. The CNN was trained to perform all these tasks, but importantly, it wasn’t programmed with any explicit “attention” module or strategy – it simply learned by adjusting its internal connections to maximize its success in detecting the targets.
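To make the first task concrete, here is a minimal sketch of how Posner-style cueing trials might be generated as training data for such a network. The image size, cue validity, and contrast values below are illustrative assumptions for this sketch, not the parameters used in the published model:

```python
import numpy as np

def make_cueing_trial(rng, size=32, cue_validity=0.8, p_target=0.5):
    """Generate one cueing trial as a noisy image plus a present/absent label.

    A bright cue pixel marks the likely target side; the target, when
    present, is a faint square patch embedded in pixel noise.
    (Illustrative parameters only, not those of the paper.)
    """
    img = rng.normal(0.0, 1.0, (size, size))   # background pixel noise
    cued_left = rng.random() < 0.5             # which side the cue points to
    cue_col = 2 if cued_left else size - 3
    img[size // 2, cue_col] += 5.0             # high-contrast cue marker

    present = rng.random() < p_target
    if present:
        valid = rng.random() < cue_validity    # target usually at cued side
        target_left = cued_left if valid else not cued_left
        col = size // 4 if target_left else 3 * size // 4
        # faint 3x3 target patch the network must learn to detect
        img[size // 2 - 1: size // 2 + 2, col - 1: col + 2] += 1.5
    return img, int(present)

rng = np.random.default_rng(0)
X, y = zip(*(make_cueing_trial(rng) for _ in range(100)))
X, y = np.stack(X), np.array(y)
```

A feedforward CNN trained on images like these to output a present/absent decision receives no attention module; any validity benefit it shows must emerge from learning alone.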

Remarkably, the CNN did end up exhibiting human-like attention behaviors across all three tasks. It became more accurate when given a valid location cue, had a harder time when more distractors were present, and improved at finding targets when the surrounding context was familiar – all mirroring the patterns seen in human observers. And it achieved this with no special attention mechanism built in; these helpful “focus” effects emerged naturally from the network learning to optimize its performance. In fact, the magnitude of the cueing and context benefits in the model was comparable to that measured in people, and not far from what a theoretical ideal observer would achieve by using all the available hints optimally. Moreover, these attention-like effects in the CNN proved to be robust: they appeared under different training conditions and even when the network was made much smaller, indicating that the phenomenon did not rely on any particular architecture or a limited resource capacity.
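The cueing benefit described above is conventionally quantified as the accuracy difference between valid-cue and invalid-cue trials. A minimal sketch of that measurement, with hypothetical per-trial outcomes rather than the paper's data:

```python
import numpy as np

def validity_effect(correct, valid):
    """Cueing benefit: accuracy on valid-cue trials minus invalid-cue trials."""
    correct = np.asarray(correct, dtype=bool)
    valid = np.asarray(valid, dtype=bool)
    return correct[valid].mean() - correct[~valid].mean()

# Hypothetical outcomes: 1 = correct response, 1 = cue pointed at target
correct = [1, 1, 1, 0, 1, 0, 0, 1]
valid   = [1, 1, 1, 1, 0, 0, 0, 0]
print(validity_effect(correct, valid))  # 0.75 - 0.5 = 0.25
```

The same subtraction applied to a human observer, the trained CNN, and an ideal observer lets the three be compared on a common scale.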

These findings are striking because they show that human-like attention patterns can emerge in a neural network without any explicitly designed attention mechanism. In other words, simply training the model to perform its visual task well made it learn where to focus on its own, as a natural by-product of trying to maximize accuracy. This suggests that we might not need to invoke a special “spotlight” or assume a fixed mental energy for attention – such effects could arise automatically from any system that is optimizing its use of information. Notably, the study may also explain how very simple creatures with no neocortex (like certain fish or insects) manage to show similar attention-related behaviors. For example, even an archerfish or a fruit fly – which have far fewer neurons than mammals – can still benefit from cues and context in their environment, and the CNN model offers a plausible mechanism for how that’s possible. Overall, this research highlights a fundamental principle: complex “attentional” behaviors can spontaneously emerge from basic learning processes, bridging insights between artificial networks and the attention abilities of living organisms.

Link to the paper: https://www.cell.com/current-biology/fulltext/S0960-9822(23)01758-X

Link to UCSB news article: https://neuroscience.ucsb.edu/news/all/2024/ai-learns-pay-covert-attention