Representational straightening of natural movies in robust feedforward neural networks

Toosi, Tahereh
Issa, Elias B
November 15, 2022

Society for Neuroscience Annual Meeting (SfN 2022), Nanosymposium 513.03, San Diego, CA

Temporal straightening was proposed as a way to make prediction of the next frame possible in natural movie sequences: contiguous movie frames should ideally be represented by a linear trajectory in the underlying neural feature space. Prior work established straightening in neural representations of the primate primary visual cortex (V1) and perceptual straightening in human behavior, both measured relative to the trajectories of the movies in the intensity domain. In contrast to biological vision, artificial feedforward neural networks (ANNs) did not demonstrate this phenomenon, as they are not explicitly optimized to produce smooth representations over time and are typically not trained on natural movies. It therefore remained unclear whether and how such straightened representations could be produced in computational models, and whether doing so would lead to models that better predict brain data. Here, we show that standard feedforward ANNs can produce straightened representations of natural movies under certain forms of training for robustness to input noise, using only static images and no natural movie exposure. Furthermore, these improvements in a model's representational straightening metric correlated with increased predictivity of neural data in primate V1, whereas other previously proposed metrics for brain-like representations, such as adversarial robustness, were not as strongly correlated with V1 neural predictivity.
Thus, this work demonstrates that a proposed hallmark of biological vision, temporal straightening, is particularly diagnostic of the most brain-like models of early visual cortex. Perhaps surprisingly, straightening of movie sequences comparable to that in primate V1 can be realized in ANN models without fundamentally changing their architecture or training them directly on natural movie statistics. Rather, a simple, biologically plausible constraint such as robustness to input noise can lead to learning a manifold geometry for natural stimuli that exhibits brain-like behavior.
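
To make the notion of a "linear trajectory" concrete, the following is a minimal sketch of one way a straightening measure could be computed. It assumes curvature is taken as the mean angle between successive displacement vectors of a trajectory, and straightening as the input-domain (pixel intensity) curvature minus the curvature in the model's feature space. The abstract does not state the exact metric used, so this definition and the function names `mean_curvature` and `straightening` are illustrative assumptions.

```python
import math


def mean_curvature(trajectory):
    """Mean angle, in degrees, between successive displacement vectors.

    `trajectory` is a sequence of equal-length vectors, one per movie frame
    (for example flattened pixel intensities or a layer's activations).
    Assumed definition: 0 degrees means a perfectly straight trajectory.
    """
    # Displacement vector between each pair of consecutive frames.
    diffs = [
        [b - a for a, b in zip(x0, x1)]
        for x0, x1 in zip(trajectory, trajectory[1:])
    ]
    angles = []
    for v0, v1 in zip(diffs, diffs[1:]):
        n0 = math.sqrt(sum(c * c for c in v0))
        n1 = math.sqrt(sum(c * c for c in v1))
        if n0 == 0.0 or n1 == 0.0:
            # Identical consecutive frames define no direction; skip them.
            continue
        cos = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        angles.append(math.degrees(math.acos(cos)))
    if not angles:
        raise ValueError("need at least three frames with nonzero displacements")
    return sum(angles) / len(angles)


def straightening(input_trajectory, feature_trajectory):
    """Curvature reduction from the input domain to the feature space.

    Positive values mean the representation is straighter than the input.
    """
    return mean_curvature(input_trajectory) - mean_curvature(feature_trajectory)
```

Under this assumed definition, a representation that maps a zigzagging input sequence onto a line yields a positive straightening value, and a representation that adds curvature yields a negative one. In practice the frame vectors would come from a movie and from a chosen network layer; those inputs are outside the scope of this sketch.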