Metadata
Metadata is the information that accompanies the content of a digital file or post: identifiers, dates and times, language, the client or app used, relationships with other accounts, location, counters, and so on.
Common fields by entity
In SOCMINT, analysts usually group metadata by the entity it describes, or “fields by entity.” This allows them to systematically organize, analyze, and verify the data.
- Account/profile level: Common metadata includes the unique identifier, username and display name, biography, language and time zone, creation date, number of followers/following, and links declared in the bio. These fields allow analysts to assess account maturity, detect sudden changes in the audience, and locate potential sockpuppets (alternate accounts).
- Post level: Each post retains a `post_id`, creation timestamp (and, if applicable, edit timestamp), language, publishing source or client, hashtags and mentions, embedded links, as well as interaction counters (likes, shares, replies). These data allow the reconstruction of threads (via `conversation_id`/`thread_id`), detection of coordinated spikes, and comparison of narratives over time (timelines).
- Multimedia level: Images and videos provide resolution, duration, and sometimes EXIF metadata (date/time, device, orientation, and, if not stripped by the platform, GPS coordinates). While many platforms remove EXIF upon upload, the content itself provides visual clues (signage, license plates, terrain, shadows, weather) useful for geolocation and temporal verification.
- Relationship level (social graph): Edges such as follow, reply, mention, quote/repost, and like describe the interaction structure. Analyzing the graph (communities, centrality, bridges) helps identify influential nodes, propagation paths, and potential coordinated campaigns.
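The relationship level can be illustrated with a minimal sketch that builds a directed graph from interaction edges and ranks accounts by in-degree, one of the simplest centrality measures. The account names, the edge tuples, and the `build_graph` helper are hypothetical and only serve the illustration; real collections would normally use a dedicated graph library and richer measures.

```python
from collections import Counter, defaultdict

# Hypothetical interaction edges: (source_account, target_account, edge_type).
edges = [
    ("alice", "carol", "reply"),
    ("bob", "carol", "mention"),
    ("dave", "carol", "repost"),
    ("carol", "erin", "reply"),
    ("bob", "erin", "follow"),
]

def build_graph(edges):
    """Return adjacency (source -> set of targets) and in-degree counts.

    Several edges between the same pair of accounts count once, so the
    in-degree is the number of distinct accounts pointing at a node.
    """
    adjacency = defaultdict(set)
    in_degree = Counter()
    for source, target, _edge_type in edges:
        if target not in adjacency[source]:
            adjacency[source].add(target)
            in_degree[target] += 1
    return adjacency, in_degree

adjacency, in_degree = build_graph(edges)
top_account, top_score = in_degree.most_common(1)[0]
print(top_account, top_score)
```

In-degree only shows who receives the most attention in the sample; it says nothing about whether that attention is organic, so it is a starting point for review rather than a finding.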
Taxonomy
| Category | Examples |
|---|---|
| Technical Metadata | Unique identifiers: UUIDs for posts, comments, and users |
| | Timestamps: Creation, modification, deletion, interaction |
| | Device information: Type, operating system, application version |
| | Connection data: IP addresses (even behind VPNs on some platforms), ASNs, providers |
| | Advanced fingerprinting: Digital signatures of browsing behavior |
| | Session telemetry: Usage patterns, uptime, action sequences |
| Geospatial Metadata | Explicit geolocation: GPS coordinates voluntarily shared |
| | Implicit geolocation: Location inferred from network connection |
| | Mobility patterns: Temporal sequences of locations |
| | Proximity data: Bluetooth and NFC information from nearby devices |
| | Geographic references: Mentions of places in textual content |
| | EXIF metadata: Geographical information embedded in images and videos |
| Relational Metadata | Interaction graphs: Communication patterns between users |
| | Engagement metrics: Frequency, intensity, and nature of interactions |
| | Affiliation data: Membership in groups, communities, and lists |
| | Influence patterns: Information cascades and content diffusion |
| | Relational timeline: Temporal evolution of connections |
| | Messaging metadata: Information about private communications (when available) |
| Behavioral Metadata | Activity patterns: Posting schedules, frequency, and rhythms |
| | Attention data: Time spent on specific content |
| | Linguistic footprints: Stylometry and writing patterns |
| | Emotional metadata: Emoji usage patterns, reactions, and sentiment |
| | Navigation sequences: User paths through the platform |
| | Consumption metadata: Content preferences and viewing time |
Use cases
- Temporal and Geographical Verification: Cross-checking timestamps with visible weather conditions and solar lighting, or with known public events; using location tags or visual inferences.
Inaccurate geotags
Few users enable GPS tagging, and many location tags are imprecise places selected by hand.
- Narrative Reconstruction: `conversation_id` and reply/mention relationships make it possible to trace the origin and amplification of a message.
- Automation Detection: Exact posting cadences, improbable time windows, repeated publishing clients, and template-like texts suggest automated behavior.
- Lightweight Attribution: Correlating aliases, recurring domains, and activity schedules with other profiles or open forums.
- Impact Measurement: Reading time-stamped counters (which vary over time) to assess reach and momentum.
Noise and bias
Counters may be artificially inflated; some cameras and editors modify EXIF data, and device clocks may be incorrect.
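For the automation-detection use case, one simple signal is how regular the gaps between consecutive posts are. The sketch below computes the coefficient of variation of those gaps; the function name `cadence_variation` and the sample timestamps are hypothetical, and any cut-off value would need to be calibrated against known accounts on the platform in question.

```python
from datetime import datetime
from statistics import mean, pstdev

def cadence_variation(timestamps):
    """Coefficient of variation of the gaps between consecutive posts.

    Values near 0 mean nearly identical gaps (a machine-like cadence).
    Returns None when there are fewer than three posts, since a single
    gap carries no information about regularity.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    if len(times) < 3:
        return None
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return 0.0
    return pstdev(gaps) / average_gap

# Hypothetical samples: one account posting every 15 minutes, one irregular.
scheduled = [
    "2024-05-01T10:00:00+00:00",
    "2024-05-01T10:15:00+00:00",
    "2024-05-01T10:30:00+00:00",
    "2024-05-01T10:45:00+00:00",
]
organic = [
    "2024-05-01T08:03:00+00:00",
    "2024-05-01T08:10:00+00:00",
    "2024-05-01T12:41:00+00:00",
    "2024-05-01T21:19:00+00:00",
]

print(cadence_variation(scheduled))
print(cadence_variation(organic))
```

A very regular cadence is only a hint: legitimate scheduling tools produce the same pattern, so this signal is best combined with the others listed above (publishing client, time windows, template-like text).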
Recommendations
- Convert all dates to UTC ISO 8601 and retain the platform’s original timestamp value.
- Preserve evidence by saving captures (HTML/PDF/images) and media hashes.
- Version edited posts and clarify whether counters reflect the time of collection.
- Tag language and detect character encodings to avoid search errors.
- Model data as a graph (nodes: accounts/posts/locations; edges: interactions) to facilitate community analysis.
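The first two recommendations can be sketched with the Python standard library. This example assumes the platform delivers timestamps as ISO 8601 strings with an explicit offset (many use other formats, such as epoch seconds); the function names `normalize_timestamp` and `media_hash` and the record layout are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

def normalize_timestamp(raw):
    """Return the original value together with its UTC ISO 8601 form.

    Assumes `raw` is an ISO 8601 string with an explicit UTC offset.
    Naive timestamps are rejected instead of guessed, because the time
    zone has to be established by the analyst first.
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    return {
        "original": raw,  # kept verbatim for the evidence trail
        "utc": parsed.astimezone(timezone.utc).isoformat(),
    }

def media_hash(data: bytes) -> str:
    """SHA-256 hex digest of a captured file's bytes."""
    return hashlib.sha256(data).hexdigest()

record = normalize_timestamp("2024-03-10T18:45:00-05:00")
print(record["utc"])       # 2024-03-10T23:45:00+00:00
print(media_hash(b""))     # digest of an empty capture, for illustration
```

Storing both values lets a reviewer check the conversion later, and hashing at collection time makes it possible to show that a saved capture has not changed since.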