Ventral visual stream neural responses are dynamic, even for static image presentations. However, dynamical neural models of visual cortex are lacking, as most progress has been made in modeling static, time-averaged responses. Here, we studied population neural dynamics during face detection across three cortical processing stages. Remarkably, ~30 milliseconds after the initially evoked response, we found that neurons in intermediate-level areas decreased their preference for faces, becoming anti-face-preferring on average, even while neurons in higher-level areas achieved and maintained a face preference. This pattern of hierarchical neural dynamics was inconsistent with extensions of standard feedforward circuits that implemented recurrence within a cortical stage. Rather, recurrent models computing errors between stages captured the observed temporal signatures. Without additional parameter fitting, this model of neural dynamics, which simply augments the standard feedforward model of online vision to encode errors, also explained seemingly disparate dynamical phenomena in the ventral stream.
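The between-stage error coding described above can be illustrated with a toy two-stage rate model (a minimal sketch, not the fitted model from this study; all parameter values are illustrative). The intermediate stage encodes the error between its feedforward drive and the feedback prediction from the higher stage, so its face signal first rises and then transiently dips below baseline as the higher stage's prediction builds, while the higher stage maintains its face preference. With the feedback gain set to zero, the same circuit reduces to a feedforward model and the intermediate signal never becomes anti-face-preferring.

```python
import numpy as np

def simulate(face_drive=1.0, fb_gain=1.5, steps=50, dt=0.1, tau=1.0):
    """Toy two-stage error-coding dynamics (illustrative parameters).

    e1: intermediate-stage unit encoding feedforward drive minus the
        top-down prediction from the higher stage.
    r2: higher-stage unit that integrates e1 and decays slowly,
        maintaining its response.
    """
    e1, r2 = 0.0, 0.0
    e1_trace, r2_trace = [], []
    for _ in range(steps):
        # Euler step: error = drive - feedback prediction
        de1 = (-e1 + face_drive - fb_gain * r2) / tau
        # higher stage accumulates the error signal, leaky decay
        dr2 = (-0.1 * r2 + e1) / tau
        e1, r2 = e1 + dt * de1, r2 + dt * dr2
        e1_trace.append(e1)
        r2_trace.append(r2)
    return np.array(e1_trace), np.array(r2_trace)

e1, r2 = simulate()
# e1 peaks early, then transiently goes negative (anti-face-preferring)
# while r2 stays positive; with fb_gain=0.0, e1 never goes negative.
```

In this linearized sketch, the damped oscillation produced by the error-feedback loop is what makes the intermediate stage overshoot and dip below baseline; a purely feedforward version (`fb_gain=0.0`) converges monotonically and cannot reproduce the sign flip.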