Meta, which owns Instagram, Facebook and WhatsApp, recently released its scientific neural cognition model, TRIBEv2. Unlike a typical Instagram update, this model grows out of a line of neuroscience research that scientists have been developing for decades. Meta is already well known for building addictive algorithms into its social media platforms to predict users' interests and engagement, in turn improving the overall user experience. Its new model, released last month, is a highly accurate artificial model of the brain that shows which parts of the human brain activate in response to different visual, auditory and language stimuli. In simpler terms, scientists have built a computational stand-in for the human brain that can estimate what happens in your head in response to what you hear, see and process.
To understand the significance of TRIBEv2, it is important to first understand what brain activity means and how it can be measured. The human brain contains different regions responsible for distinct cognitive functions, and when those functions are engaged, the corresponding regions undergo measurable changes. These changes are primarily electrical, captured using electroencephalography, and hemodynamic, which can be measured using functional near-infrared spectroscopy. By analyzing electrical activity across frequency bands, observing how those frequencies change and tracking blood oxygenation by monitoring oxygenated and deoxygenated hemoglobin through light absorption, researchers gain insight into brain activation. These measurements can provide information about an individual's emotional state, stress level and concentration. Recent studies have also demonstrated the ability to approximate aspects of inner speech from brain activity.
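For readers who want a concrete sense of the hemodynamic side, the sketch below shows the modified Beer-Lambert law that near-infrared techniques typically rely on: changes in light absorption at two wavelengths are converted into changes in oxygenated and deoxygenated hemoglobin concentration. The extinction coefficients and geometry factors are illustrative placeholder values, not numbers from the TRIBEv2 study.

```python
import numpy as np

# Modified Beer-Lambert law:
#   delta_OD = (eps_HbO * d[HbO] + eps_HbR * d[HbR]) * distance * pathlength_factor
# Measuring optical-density changes at two wavelengths gives a 2x2 system that can be
# solved for the concentration changes of oxygenated (HbO) and deoxygenated (HbR) hemoglobin.

# Illustrative extinction coefficients [1/(mM*cm)] at ~760 nm and ~850 nm (placeholder values)
EPSILON = np.array([[1.5, 3.8],   # 760 nm: [eps_HbO, eps_HbR]
                    [2.5, 1.8]])  # 850 nm: [eps_HbO, eps_HbR]

SOURCE_DETECTOR_DISTANCE_CM = 3.0      # typical optode separation (assumed)
DIFFERENTIAL_PATHLENGTH_FACTOR = 6.0   # accounts for light scattering in tissue (assumed)

def hemoglobin_changes(delta_od_760: float, delta_od_850: float) -> tuple[float, float]:
    """Convert optical-density changes at two wavelengths into
    concentration changes (delta HbO, delta HbR) in mM."""
    path = SOURCE_DETECTOR_DISTANCE_CM * DIFFERENTIAL_PATHLENGTH_FACTOR
    delta_od = np.array([delta_od_760, delta_od_850])
    d_hbo, d_hbr = np.linalg.solve(EPSILON * path, delta_od)
    return float(d_hbo), float(d_hbr)

# Example: a small drop in absorption at 760 nm and a rise at 850 nm
print(hemoglobin_changes(-0.002, 0.004))
```

The point is not the exact numbers but the pipeline: light absorption in, hemoglobin changes out, and from those changes an inference about which brain regions are working harder.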
This does not mean that Meta can predict your inner thoughts while you consume media. Because the human brain is highly individualized, the model is a generalization that can, to some extent, track patterns in emotional response and concentration. Most advanced brain-reading studies, such as those on inner speech decoding, focus on improving resolution and accuracy using small subject groups. These models are typically trained and validated on limited datasets rather than broadly diverse populations.
The TRIBEv2 study used 1,000 hours of functional near-infrared spectroscopy data from 720 subjects, providing a large dataset for training. By using an artificial neural network, researchers built a highly accurate model. While previous models captured brain activation in response to limited stimulus types (such as only visual or auditory input), TRIBEv2 presents a unified “tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions.” So when you watch a reel, the model can estimate how different regions of your brain activate and predict your cognitive response based on those activation patterns.
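The quoted description can be read as a standard encoding-model setup: features extracted from each modality feed a learned mapping onto per-region brain responses. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the feature dimensions, fusion strategy and number of brain regions are assumptions for illustration, not the architecture Meta actually released.

```python
import torch
import torch.nn as nn

class TinyTrimodalEncoder(nn.Module):
    """Toy encoding model: fuse video, audio and text features,
    then predict activity for a set of brain regions.
    All dimensions are illustrative, not taken from TRIBEv2."""

    def __init__(self, d_video=1024, d_audio=768, d_text=512, d_hidden=256, n_regions=400):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(d_video + d_audio + d_text, d_hidden),
            nn.GELU(),
        )
        self.readout = nn.Linear(d_hidden, n_regions)  # one output per brain region

    def forward(self, video_feat, audio_feat, text_feat):
        x = torch.cat([video_feat, audio_feat, text_feat], dim=-1)
        return self.readout(self.fuse(x))  # predicted response per region

# One "time point" of precomputed features from three pretrained encoders
model = TinyTrimodalEncoder()
video = torch.randn(1, 1024)   # e.g. from a video encoder
audio = torch.randn(1, 768)    # e.g. from a speech encoder
text = torch.randn(1, 512)     # e.g. from a language model
predicted_activity = model(video, audio, text)
print(predicted_activity.shape)  # torch.Size([1, 400])
```

A real encoding model would be trained to match measured responses over many hours of stimuli; the sketch only shows the shape of the problem: three streams of features in, one predicted response per brain region out.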
TRIBEv2 functions by combining three existing models, Llama-3.2-3B for text, Wav2Vec-BERT-2.0 for audio and Video-JEPA-2-Giant for video, into a larger system that processes multimodal inputs. This integration creates a model that approximates how the human brain responds to multimedia content. Performance is evaluated by comparing the model's predicted functional magnetic resonance imaging responses with recorded brain activity and benchmarking it against established baselines such as the finite impulse response model, a current standard in the field. The study found that TRIBEv2 demonstrated a “log-linear increase” in performance as more training data was added, with “no plateau,” indicating that the model keeps improving as additional data is incorporated.
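Two of the quantitative ideas in that paragraph are easy to make concrete. Encoding models of this kind are commonly scored by correlating predicted and measured responses region by region, and a “log-linear increase with no plateau” simply means performance grows roughly linearly in the logarithm of the amount of training data. The snippet below illustrates both with made-up numbers; nothing here is taken from the TRIBEv2 results.

```python
import numpy as np

# 1) A typical encoding-model score: Pearson correlation between predicted and
#    measured responses, computed separately for each brain region.
rng = np.random.default_rng(0)
measured = rng.standard_normal((600, 400))                      # 600 time points x 400 regions (synthetic)
predicted = measured * 0.5 + rng.standard_normal((600, 400))    # imperfect predictions (synthetic)

def per_region_correlation(pred, meas):
    pred_c = pred - pred.mean(axis=0)
    meas_c = meas - meas.mean(axis=0)
    return (pred_c * meas_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(meas_c, axis=0))

scores = per_region_correlation(predicted, measured)
print(f"mean correlation across regions: {scores.mean():.2f}")

# 2) "Log-linear" scaling: score ~ a + b * log(hours of training data).
hours = np.array([10, 30, 100, 300, 1000])
performance = np.array([0.12, 0.17, 0.22, 0.27, 0.32])   # made-up scores
b, a = np.polyfit(np.log(hours), performance, 1)
print(f"fit: score ≈ {a:.2f} + {b:.2f} * log(hours)")    # a straight line in log(hours), no plateau
```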
Most importantly, Meta has made this research publicly available. TRIBEv2's weights, code and interactive demos are available online, allowing anyone to explore and test the model. Just as the introduction of chatbots promised benefits for future medical and technological advancements, this model offers potential benefits of its own. While Meta may use TRIBEv2 to make its applications more addictive, the research also has broader implications for understanding how media affects a user's cognitive state.



