How to extract intelligence from images and videos

Image and Video Forensics: Extracting Intelligence from Visual Media
Summary

Images and videos feel immediate and self-explanatory. They present a scene, capture a moment, and appear to offer direct access to reality. For that reason, they are often treated as the most trustworthy form of evidence in open-source investigations.

Yet visual media is rarely as transparent as it seems. Every image is a composition of decisions: where the camera was positioned, what was included or excluded, how the frame was composed, and what happened just outside the shot. Every video is a sequence of moments selected from a much larger unfolding reality. Even when authentic, visual media is always partial.

For investigators, this makes images and videos both powerful and delicate. They can reveal what other sources cannot, but they also demand careful interpretation. The goal of image and video forensics is not to “look at” visual media. It is to read it.

The case: the "He Will Not Divide Us" flag

One of the most famous examples of community-driven image and video forensics occurred in 2017 during actor Shia LaBeouf’s art project, “He Will Not Divide Us.”

After repeated disruptions at previous installations, a livestream was set up showing only a flag flying against the sky. The location was intentionally kept secret. Members of online communities, particularly users on 4chan, collaborated to determine the flag’s location using open-source intelligence techniques.

So how did the community crack the case with the information they had? 

Looking beyond the subject

The first instinct when analysing an image is to focus on what is immediately visible: a person, a building, an object, or an event. This is also where most superficial interpretations begin and end.

Experienced investigators take a different approach. They treat the main subject as only one layer of information among many.

A photograph is not a single statement. It is an environment captured in fragments. The subject may be the focal point, but the surrounding details often contain the most valuable intelligence.

A street sign partially visible in the background. The type of vehicles passing through the scene. The condition of the roads. The style of architecture. Even the quality of light can provide meaningful clues.

Individually, these details may seem irrelevant. Together, they begin to describe a place, a time, and a context.

The environment as evidence

Every image contains an environment that existed independently of the camera. This environment is often more informative than the subject itself. 

Urban layouts, vegetation, infrastructure design, signage conventions, and even commercial branding styles can all point toward geographic regions. In many cases, the environment can narrow possibilities long before any direct identifier is found.

Investigators learn to treat environments as structured information rather than background noise. A road is not just a road. It is a specific construction style, governed by regional standards. A power line is not just a cable. It reflects local engineering practices. A storefront is not just a business. It is part of an economic and cultural landscape.

Once this shift in perception occurs, visual analysis becomes significantly more powerful.

Shadow analysis

One of the most subtle forms of intelligence in visual media is time. Unlike textual data, images do not explicitly state when they were captured. Yet time is embedded in nearly every visual scene.

Shadows stretch in predictable ways depending on the position of the sun. Weather conditions reveal seasonal context. Clothing choices reflect temperature and cultural expectations. Even vegetation can indicate time of year.

These elements rarely provide exact timestamps on their own. Instead, they create a range of possibilities that can be refined through correlation. 
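To make the shadow cue concrete, here is a minimal Python sketch of the underlying geometry. It assumes a vertical object on flat, level ground, and that the analyst can estimate both the object's height and the length of its shadow. The function name and the sample figures are illustrative and not drawn from any real case. The resulting sun elevation can then be compared against a solar position calculator for a candidate place and date to narrow the time window.

```python
import math


def sun_elevation_from_shadow(object_height_m: float, shadow_length_m: float) -> float:
    """Return the sun's elevation above the horizon, in degrees.

    Assumes a vertical object on flat, level ground, with the shadow
    measured along the ground. Both inputs are analyst estimates.
    """
    if object_height_m <= 0 or shadow_length_m <= 0:
        raise ValueError("height and shadow length must be positive")
    # tan(elevation) = object height / shadow length
    return math.degrees(math.atan2(object_height_m, shadow_length_m))


# Hypothetical example: a 2 m signpost casting a 2 m shadow
elevation = sun_elevation_from_shadow(2.0, 2.0)
print(f"Estimated sun elevation: {elevation:.1f} degrees")
```

Elevation alone is ambiguous, since the sun reaches most elevations once in the morning and once in the afternoon. The direction of the shadow is needed to tell the two apart.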

Videos extend this dimension further by introducing motion. Changes in lighting, traffic density, or human activity patterns can help situate a recording within a specific time window. Time in visual forensics is rarely absolute. It is reconstructed through accumulation.
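One way to quantify a change in lighting across a recording is to track average frame brightness over time. The sketch below is an illustration under stated assumptions: frames are represented as plain lists of (r, g, b) tuples rather than decoded from a real video file, the weights are the standard Rec. 601 luma coefficients, and the function names are hypothetical.

```python
def mean_luma(frame):
    """Average Rec. 601 luma of a frame given as an iterable of (r, g, b) pixels."""
    total = 0.0
    count = 0
    for r, g, b in frame:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("frame contains no pixels")
    return total / count


def brightness_trend(frames):
    """Least-squares slope of mean luma against frame index.

    A positive value means the scene is getting brighter over the
    sampled frames; a negative value means it is getting darker.
    """
    lumas = [mean_luma(f) for f in frames]
    n = len(lumas)
    if n < 2:
        raise ValueError("need at least two frames")
    mean_x = (n - 1) / 2
    mean_y = sum(lumas) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(lumas))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical sampled frames: three tiny grey frames that get darker
frames = [[(90, 90, 90)] * 4, [(70, 70, 70)] * 4, [(50, 50, 50)] * 4]
print(brightness_trend(frames) < 0)  # True: the scene is darkening
```

A steadily falling trend is consistent with dusk, but also with passing cloud, and a camera's automatic exposure can mask or mimic such changes. The measurement narrows possibilities rather than settling them.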

Movement as behaviour

Still images freeze a moment. Videos reveal behaviour. This distinction is critical. Behaviour often contains more intelligence than static content.

Movement patterns, interactions between individuals, reactions to events, and the sequence of actions all provide insight into intent and context. Who leads and who follows. Who reacts and who initiates. Who appears comfortable and who appears constrained. 

These behavioural cues are often subtle, but they are consistent. People tend to behave in recognisable ways when placed in familiar or unfamiliar environments, under stress, or within structured groups.

In group settings, positioning alone can reveal relationships that are not explicitly stated. Individuals who consistently appear together, stand near each other, or interact in predictable ways may be part of a structured association.

Visual media allows investigators to observe these dynamics directly.

The importance of perspective

Every image is shaped by perspective. This includes both technical perspective and human intention.

The position of the camera determines what is included in the frame and what is excluded. Slight shifts in angle can dramatically alter interpretation. A zoomed-in image may obscure context. A wide-angle shot may introduce misleading spatial relationships.

Beyond the technical aspect, there is also intentional perspective. People choose what to capture and what to omit. These choices are rarely neutral.

A photograph taken to document an event may differ significantly from one taken to portray it. Investigators must therefore ask not only what is shown, but why it is shown in that way. What is outside the frame can be as important as what is inside it.

Authenticity is only the first question

A common misconception in visual analysis is that the primary challenge is determining whether an image or video is real. While authenticity is important, it is only the beginning of the investigation. Even genuine media can be misleading if it is presented without context or interpreted incorrectly.

An authentic image may be taken in one location but presented as another. A real video may capture a genuine event but omit critical preceding or subsequent moments. Selective framing can distort meaning without altering factual accuracy.

This is why investigators separate authenticity from interpretation. First, they ask whether the media is genuine. Then they ask what it actually represents.

Visual consistency and contradiction

One of the most effective analytical approaches in image and video forensics is the search for consistency. Authentic environments tend to be internally coherent. Lighting behaves consistently across surfaces. Shadows align with light sources. Reflections match physical objects. Environmental elements correspond with one another.

When inconsistencies appear, they warrant attention.

This does not automatically indicate manipulation. It may simply reflect unusual conditions or limited visibility. However, inconsistencies create uncertainty, and uncertainty must be resolved through additional evidence.
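As a rough illustration of a shadow consistency check, the following Python sketch compares the directions of several shadows marked by hand in an image. It assumes that shadows cast by a single distant light source appear roughly parallel in the frame, which only holds when perspective distortion is small, such as in an overhead or distant view. In a strongly perspectival shot, parallel shadows legitimately converge towards a vanishing point, so a large deviation is a prompt for closer inspection and not evidence of editing. The function names and pixel coordinates are hypothetical.

```python
import math


def shadow_bearing(base, tip):
    """Direction of the base-to-tip vector in degrees, in image coordinates."""
    dx, dy = tip[0] - base[0], tip[1] - base[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def max_bearing_deviation(shadows):
    """Largest angular distance between any shadow and the circular mean direction."""
    if not shadows:
        raise ValueError("need at least one shadow")
    bearings = [shadow_bearing(base, tip) for base, tip in shadows]
    # Circular mean: average unit vectors so that 359 and 1 degrees average to 0
    sx = sum(math.cos(math.radians(a)) for a in bearings)
    sy = sum(math.sin(math.radians(a)) for a in bearings)
    mean = math.degrees(math.atan2(sy, sx))
    # Signed difference wrapped into [-180, 180), then take the magnitude
    return max(abs((a - mean + 180.0) % 360.0 - 180.0) for a in bearings)


# Hypothetical (base, tip) pixel coordinates marked by an analyst
shadows = [
    ((100, 400), (160, 430)),
    ((300, 380), (362, 410)),
    ((500, 420), (520, 480)),  # this one points in a noticeably different direction
]
print(f"Max deviation: {max_bearing_deviation(shadows):.1f} degrees")
```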

The same principle applies to narrative consistency. Do all elements of the scene align with the claimed context? Do visual cues support the stated location, time, or event? When they do not, investigators must reassess their assumptions.

Corroboration across sources

Visual media is rarely interpreted in isolation. Images and videos gain meaning when compared with other sources of information.

A photograph may be cross-referenced with maps. A video may be compared with news reports. A visual scene may be matched against satellite imagery or historical records. Even unrelated social media posts can provide contextual alignment.

Corroboration transforms visual analysis from interpretation into validation. A single image suggests possibilities. Multiple aligned sources strengthen conclusions. Investigators therefore treat visual evidence as part of a broader network of information rather than a standalone truth.
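A simple form of corroboration against maps can be sketched in Python: measure how far a location estimated from visual cues lies from the location a caption claims. The example below uses the haversine great-circle formula. The coordinates, the tolerance, and the function names are assumptions chosen purely for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def claim_is_plausible(claimed, estimated, tolerance_km):
    """True if the estimated location lies within tolerance_km of the claimed one."""
    return haversine_km(*claimed, *estimated) <= tolerance_km


# Hypothetical values: where the caption says the photo was taken,
# and where landmark matching against satellite imagery suggests it was taken
claimed = (48.8566, 2.3522)
estimated = (48.8606, 2.3376)
print(claim_is_plausible(claimed, estimated, tolerance_km=5.0))
```

The tolerance should reflect how precise the visual estimate actually is. A match inside the tolerance supports the claim but does not prove it, and a clear mismatch is a reason to revisit both the claim and the estimate.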

The discipline of interpretation

Perhaps the most important skill in image and video forensics is restraint. Visual media is persuasive. It feels immediate and authoritative. This can lead to overconfidence in interpretation. Experienced investigators resist this impulse.

They distinguish between what is observed and what is inferred. They avoid drawing conclusions from isolated details. They remain open to alternative explanations.
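One way to enforce this separation in working notes is to record observations and inferences as different kinds of entry. The Python sketch below is a hypothetical note-taking structure, not an established standard. The class names, the sample entries, and the default threshold of two supporting observations are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Something directly visible or audible in the media."""
    description: str


@dataclass
class Inference:
    """A conclusion drawn by the analyst, linked to the observations behind it."""
    claim: str
    supported_by: list = field(default_factory=list)  # list of Observation objects

    def is_supported(self, minimum: int = 2) -> bool:
        # Guards against conclusions resting on a single isolated detail
        return len(self.supported_by) >= minimum


# Hypothetical notes from a single photograph
sign = Observation("Partial street sign visible in the background")
plates = Observation("Vehicle number plates follow one regional format")

location = Inference("Photo was taken in region X", supported_by=[sign])
print(location.is_supported())  # False: only one supporting detail so far

location.supported_by.append(plates)
print(location.is_supported())  # True
```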

This discipline is what separates observation from analysis. The goal is to construct the most supported interpretation, not the most compelling one.

From pixels to understanding

When approached systematically, visual media becomes more than documentation. It becomes a structured source of intelligence.

An image can reveal location, time, behaviour, relationships, and intent. A video can show sequences of actions, interactions, and environmental context. Together, they form a multidimensional record of events.

But this intelligence does not emerge automatically. It must be extracted through careful observation, structured reasoning, and continuous verification.

The investigator does not simply see what is in the frame. They reconstruct what the frame represents.

The "He Will Not Divide Us" flag investigation

The ultimate objective of image and video forensics is to transform visual observations into intelligence that supports decision-making. The “He Will Not Divide Us” investigation is a direct example of the techniques covered so far.

Investigators had almost no obvious clues. The livestream showed a flag, the sky, aircraft occasionally passing overhead, and sounds from the environment. Despite this limited information, the community identified the location by combining multiple observations from the available cues.

By learning to examine visual media systematically, investigators can uncover hidden details, validate claims, reconstruct events, and generate actionable intelligence from sources that many observers would overlook.

Seeing the invisible

Image and video forensics sits at the intersection of observation and interpretation. It requires attention to detail, awareness of context, and a disciplined approach to uncertainty.

Visual media is powerful because it feels complete. In reality, it is always partial. The investigator’s task is to fill in the gaps responsibly, using environmental cues, temporal indicators, behavioural patterns, and corroborating evidence.

When done carefully, visual analysis transforms from passive viewing into active reconstruction. It allows investigators to move beyond what is shown and begin understanding what is happening, where it is happening, and why it matters.

The next article in this series explores Domain and Website Investigation Techniques, where the focus shifts from visual evidence to the digital infrastructure that supports online identities, organisations, and campaigns.
