Developer Notes
Repository layout (public):
- src/include/openmeta/: public headers
- src/openmeta/: implementation
- src/tools/: CLI tools
- src/python/: Python bindings and helper scripts
- tests/: unit tests and fuzz targets
OpenMeta structure
See Interpretation Status for the semantic interpretation matrix.
OpenMeta’s public architecture is organized around a small set of user-facing capabilities. Internally some of these split into more stages, but the public model should stay compact:
| Area | Purpose | Readiness |
|---|---|---|
| Decoding | Find metadata carriers and decode EXIF, XMP, IPTC, ICC, Photoshop IRB, JUMBF/C2PA, EXR, and related blocks into | High, about 98-100% for the current target scope. |
| Interpretation | Normalize names and values, group entries by meaning, and classify source-bound data such as RAW crop, exposure adjustment, color/profile/source-color-transform evidence, lens-correction, sensor, BMFF brand/item-property associations, item semantic counts, and primary item properties, JUMBF labels, Photoshop IRB embedded carriers plus fixed-layout, XML/text, path-record, byte-count, and descriptor-header summaries, computational, thermal, stitch/panorama capture state, and vendor-private fields. | Medium-high, about 90%. |
| Query | Find entries by name, fuzzy term, or semantic group, then expose normalized query candidates, structured interpretation records, and bounded cross-family concept resolutions, transfer hints, and conflict flags for crop/border/active-area, exposure/gain, color/WB/profile/source-color-transform, orientation, date/time, GPS, lens-correction, computational/thermal/stitch, and RAW/source-processing fields across standard and vendor metadata. | Medium-high, about 77-83%. |
| Creation | Build fresh metadata entries from host-provided values. | Medium, about 55-65%. |
| Editing | Modify existing logical metadata entries while preserving valid surrounding structure. | Medium, about 60-70%. |
| Transfer | Move metadata between files using explicit compatible-file or rendered-image safety policies. | Medium-high, about 80-85%. |
| Translation | Project metadata between families, mainly bounded EXIF/IPTC/XMP portable mappings. | Medium, about 60-70%. |
| Writing | Serialize metadata and write or rewrite it into target containers. | Medium, about 65-75%. |
| Adapters | Thin integration layers for host APIs or format-specific ecosystems such as EXR, DNG SDK, LibRaw orientation mapping, and flat host exports. | Medium, about 60-70%. |
| Utilities | Small standalone helpers such as capability queries, compatibility dumps, safety audits, tag-name lookup, and orientation conversion. | Medium, about 65-75%. |
Query results should expose both inspection-level matches and interpreted
candidates. A crop query, for example, may match separate
DefaultCropOrigin and DefaultCropSize tags, an ActiveArea rectangle,
vendor margin fields, or a raw integer array. OpenMeta should return the source
entries, confidence, value shape, match provenance, and any normalized
interpretation rather than hiding ambiguity behind a single value.
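As an illustration of returning candidates rather than a single value, here is a minimal sketch of what a crop result could carry. The `SourceEntry` and `CropCandidate` types, the `crop_candidates` helper, and the confidence numbers are all hypothetical and are not OpenMeta's actual API. The tag layouts follow the DNG convention: DefaultCropOrigin and DefaultCropSize are (x, y) and (width, height) pairs, and ActiveArea is (top, left, bottom, right).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SourceEntry:
    """One decoded metadata entry that contributed to a candidate."""
    family: str              # e.g. "exif", "xmp", "makernote"
    name: str                # tag or property name as decoded
    value: Tuple[int, ...]   # integer payload (simplified for the sketch)


@dataclass
class CropCandidate:
    """Hypothetical record shape; not an OpenMeta type."""
    sources: List[SourceEntry]
    value_shape: str         # "origin_size", "rect", ...
    provenance: str          # how the match was found
    confidence: float        # illustrative score in 0.0 .. 1.0
    rect: Optional[Tuple[int, int, int, int]] = None  # x, y, width, height


def crop_candidates(entries: List[SourceEntry]) -> List[CropCandidate]:
    """Return every plausible crop interpretation instead of picking one."""
    by_name = {e.name: e for e in entries}
    out: List[CropCandidate] = []

    origin = by_name.get("DefaultCropOrigin")
    size = by_name.get("DefaultCropSize")
    if origin and size and len(origin.value) == 2 and len(size.value) == 2:
        rect = (origin.value[0], origin.value[1], size.value[0], size.value[1])
        out.append(CropCandidate([origin, size], "origin_size", "exact_tag", 0.9, rect))

    active = by_name.get("ActiveArea")
    if active and len(active.value) == 4:
        top, left, bottom, right = active.value
        rect = (left, top, right - left, bottom - top)
        out.append(CropCandidate([active], "rect", "exact_tag", 0.7, rect))

    return out
```

A caller that sees two candidates with different rectangles can show both along with their source entries, leaving the choice to host policy.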
The first experimental C++ query surface is openmeta/metadata_query.h.
It returns both raw matches and normalized candidates for crop/active-area,
exposure/gain, white balance, color/profile, lens correction, orientation,
descriptive, and RAW-processing queries.
Crop queries include DNG crop tags, ActiveArea, Phase One/Leaf raw
geometry, Fujifilm RAF raw crop/zoom rectangles, Canon aspect/crop metadata,
Nikon Capture crop bounds, Sony panorama crop margins, and fuzzy
crop/border-style XMP property paths. The non-crop queries expose per-entry
value candidates and reuse standard tag names, selected DNG tags, fuzzy XMP
paths, canonical border-margin parsing, and vendor RAW-processing
classification where applicable.
They also append grouped candidates for related DNG color matrix/calibration/
reduction/forward matrix tags, DNG white-balance vector tags, and
lens-correction table groups. Color queries expose a distinct
color_profile semantic for EXIF color-space evidence, ICC header/tag
entries, XMP ICC/profile/color-space fields, and PNG profile text carriers.
Vendor-classified MakerNote/RAW fields can also form per-family grouped
candidates for white balance, color, raw-storage, sensor, computational,
thermal, stitch/panorama, and source-processing records. RAW-processing queries
add conservative groups for black/white levels,
linearization tables, CFA/sensor layout, source geometry, raw-storage
identifiers, and source-private processing buckets.
Exposure/gain concept resolution promotes exposure time, aperture, ISO,
exposure bias, exposure program/mode, gain, and raw exposure-adjustment records
into host-visible roles, with raw exposure adjustments kept unsafe for rendered
targets. Standard EXIF exposure program/mode and gain-control values and
selected Canon/Nikon MakerNote exposure-adjacent print conversions are exposed
as bounded labels when a stable enum mapping is available.
Current source-private aliases include camera-to-XYZ/RGB matrices, creative and
picture styles, film simulation, dynamic-range processing, optical/lens
correction, white-balance gains, and raw-development terms.
Grouped candidates use matrix_set, vector_set, and table value
shapes. Color matrix sets, white-balance vector sets, and lens-correction
tables are promoted only when the numeric payloads meet conservative minimum
shapes; other records stay visible as per-entry matches/candidates. When
OPENMETA_ENABLE_RAPIDFUZZ=ON, the same query helpers also use RapidFuzz to
score near-miss XMP/property paths; default builds keep the deterministic
substring/tag matcher only. Each raw match reports exact_match,
fuzzy_match, and fuzzy_score so UI code can distinguish exact tag/name
matches from near-miss search hits.
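The exact/fuzzy split can be sketched as follows. This is not OpenMeta's matcher: `match_name` and the 70-point threshold are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for RapidFuzz so the example has no third-party dependency. Only the three reported field names come from the text above.

```python
from difflib import SequenceMatcher


def match_name(query: str, candidate: str, fuzzy_enabled: bool = False,
               min_score: float = 70.0) -> dict:
    """Classify one candidate name against a query term."""
    q = query.lower()
    c = candidate.lower()

    # Deterministic path: case-insensitive substring match, always available.
    result = {"exact_match": q in c, "fuzzy_match": False, "fuzzy_score": 0.0}

    # Optional path: similarity score, only when fuzzy matching is enabled
    # and the deterministic matcher found nothing.
    if fuzzy_enabled and not result["exact_match"]:
        score = SequenceMatcher(None, q, c).ratio() * 100.0
        result["fuzzy_score"] = score
        result["fuzzy_match"] = score >= min_score
    return result
```

With `fuzzy_enabled=False` the function behaves like a plain substring matcher, which mirrors the default-build behavior described above; UI code can then rank exact hits ahead of near-miss hits by checking `exact_match` first.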
Python Document and TransferSourceSnapshot mirror this as thin wrappers
returning the same match/candidate dictionary shape.
For code that wants an iterable semantic record stream instead of raw query
matches, use openmeta/metadata_interpretation.h. It projects query
candidates into records with query class, semantic kind, normalized shape,
confidence, source entries, and normalized geometry/value arrays where
available.
For cross-family duplicated concepts, use openmeta/metadata_concepts.h.
It currently resolves orientation, date/time, exposure/gain, color/profile,
GPS, geometry, lens-correction, and RAW-processing into candidate lists with
candidate source entries, source families, preferred entries, normalized
compare keys, parsed date/time fields, date/time precision, timezone kind, GPS
altitude-reference state, canonical geometry origin/size/rect/margins,
normalized exposure values, full normalized value vectors for grouped
matrix/vector/table records, transfer hints, compatible and rendered safety
booleans, and same-role conflict flags. This is deliberately an
inspection/policy surface; host code still decides whether a conflict should
be shown, ignored, or corrected during editing/transfer.
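The idea of same-role conflict flags can be sketched with a small resolver. Everything here is an assumption made for illustration: the dictionary shape, the `compare_key` and `resolve_concepts` helpers, and the normalization rules are not taken from openmeta/metadata_concepts.h.

```python
from collections import defaultdict


def compare_key(role: str, value):
    """Normalize a value so the same fact from different families compares equal."""
    if role == "orientation":
        return int(value)          # EXIF stores an integer, XMP stores text
    return str(value).strip()


def resolve_concepts(entries):
    """Group entries by role and flag roles whose sources disagree."""
    by_role = defaultdict(list)
    for entry in entries:
        by_role[entry["role"]].append(entry)

    resolved = {}
    for role, group in by_role.items():
        keys = {compare_key(role, e["value"]) for e in group}
        resolved[role] = {
            "candidates": group,
            "source_families": sorted({e["family"] for e in group}),
            "compare_keys": sorted(keys),
            "conflict": len(keys) > 1,   # reported only; the host decides what to do
        }
    return resolved
```

The resolver reports disagreement but never picks a winner, which matches the inspection/policy role described above.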
Read-path coverage snapshot
Tracked HEIC/HEIF, CR3, and mixed RAW EXIF compare gates are passing.
EXR header metadata compare gate is passing for the documented name/type/value-class contract.
MakerNote support is broad and baseline-gated; unknown tags remain lossless.
EXIF + MakerNotes (code organization)
- Core EXIF/TIFF decoding: src/openmeta/exif_tiff_decode.cc
- CRW/CIFF decode + derived EXIF bridge: src/openmeta/crw_ciff_decode.cc
- Vendor MakerNote decoders: src/openmeta/exif_makernote_*.cc (Canon, Nikon, Sony, Olympus, Pentax, Casio, Panasonic, Kodak, Ricoh, Samsung, FLIR, etc.)
- Shared internal-only helpers: src/openmeta/exif_tiff_decode_internal.h (not installed)
- Unit tests for MakerNote paths: tests/makernote_decode_test.cc
Internal helper conventions (used by vendor decoders):
- read_classic_ifd_entry(...) + ClassicIfdEntry: parse a single 12-byte classic TIFF IFD entry.
- resolve_classic_ifd_value_ref(...) + ClassicIfdValueRef: compute the value location/size for a classic IFD entry (inline vs out-of-line), using MakerNoteLayout + OffsetPolicy.
- MakerNoteLayout + OffsetPolicy: makes “value offsets are relative to X” explicit for vendor formats.
- OffsetPolicy supports both the common unsigned base (default) and a signed base for vendors that require it (e.g. Canon).
- ExifContext: a small, decode-time cache for frequently accessed EXIF values.
- MakerNote tag-name tables are generated from registry/exif/makernotes/*.jsonl and looked up via binary search (exif_makernote_tag_names.cc).
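To make the inline versus out-of-line distinction concrete, here is a minimal sketch of parsing one 12-byte classic TIFF IFD entry and resolving where its value lives. The names `IfdEntry`, `read_ifd_entry`, and `resolve_value_ref` are hypothetical stand-ins, not the internal C++ helpers, and the `base` parameter only approximates the idea of an offset policy: it may be negative to model a signed base.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

# Byte size per element for classic TIFF field types 1..12 (BYTE .. DOUBLE).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


@dataclass
class IfdEntry:
    tag: int
    type: int
    count: int
    raw_value: bytes   # the 4-byte value-or-offset field, undecoded


def read_ifd_entry(buf: bytes, pos: int, little_endian: bool) -> Optional[IfdEntry]:
    """Parse tag (2 bytes), type (2), count (4), value/offset (4)."""
    if pos < 0 or pos + 12 > len(buf):
        return None
    order = "<" if little_endian else ">"
    tag, typ, count = struct.unpack_from(order + "HHI", buf, pos)
    return IfdEntry(tag, typ, count, bytes(buf[pos + 8:pos + 12]))


def resolve_value_ref(entry: IfdEntry, entry_pos: int, base: int,
                      buf_len: int, little_endian: bool) -> Optional[Tuple[int, int]]:
    """Return (offset, size) of the value bytes, or None if invalid."""
    unit = TYPE_SIZES.get(entry.type)
    if unit is None:
        return None
    size = unit * entry.count
    if size <= 4:
        # Inline: the value sits in the entry's own last four bytes.
        return entry_pos + 8, size
    order = "<" if little_endian else ">"
    (relative,) = struct.unpack(order + "I", entry.raw_value)
    start = base + relative    # base is whatever the offsets are relative to
    if start < 0 or start + size > buf_len:
        return None            # reject out-of-bounds references
    return start, size
```

Passing the base explicitly keeps the "offsets are relative to X" decision in one place, so a vendor decoder changes one argument instead of re-deriving offsets at each call site.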
Interop adapters
- export-only naming/traversal surface: src/include/openmeta/interop_export.h
- export-only adapter: src/include/openmeta/ocio_adapter.h
- host-apply adapter: src/include/openmeta/exr_adapter.h
- direct bridge: src/include/openmeta/dng_sdk_adapter.h
- narrow translator: src/include/openmeta/libraw_adapter.h
Notes:
- ExportNamePolicy::ExifToolAlias and ExportNamePolicy::Spec are both covered by interop tests and used for split-parity workflows.
- Flat host-style interop naming keeps numeric unknown names (Exif_0x....) for parity workflows.
Python binding entry points:
- Document.export_names(...)
- Document.ocio_metadata_tree(...)
- Document.unsafe_ocio_metadata_tree(...)
- Document.dump_xmp_sidecar(...) (lossless or portable via format switch)
- Document.phaseone_raw_geometry() and Document.phaseone_raw_processing() for normalized Phase One/Leaf RAW source metadata queries.
- Document.vendor_raw_processing(family) for Sony/Canon/Nikon/Fujifilm/Pentax/Panasonic/Olympus/Kodak/Minolta/Sigma/Samsung/Ricoh/Apple/DJI/Google/FLIR/Casio/Sanyo/KyoceraRaw/Reconyx/HP/JVC/GE/Motorola/Nintendo/Microsoft grouped RAW/source-processing field summaries.
C++ adapter entry points:
- visit_metadata(...) in openmeta/interop_export.h is the intended base for host-owned metadata mappings
- build_exr_attribute_batch(...) in openmeta/exr_adapter.h exports one owned EXR-native attribute batch (part_index, name, type_name, value, is_opaque) from MetaStore
- build_exr_attribute_part_spans(...) groups that batch into contiguous per-part spans
- build_exr_attribute_part_views(...) exposes zero-copy grouped per-part views over the same batch
- replay_exr_attribute_batch(...) replays the grouped batch through explicit host callbacks
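The batch, span, and replay relationship can be sketched in a few lines. This is a conceptual model only: `build_part_spans` and `replay` are hypothetical Python helpers, the callback signatures are assumptions, and the real C++ entry points may differ in every detail. The attribute field names come from the batch description above, and the sketch assumes the batch is already ordered by `part_index`.

```python
def build_part_spans(batch):
    """Group a part-ordered attribute batch into contiguous per-part spans.

    Each span records where a part's attributes start in the batch and how
    many there are, so no attribute data is copied.
    """
    spans = []
    for index, attr in enumerate(batch):
        if spans and spans[-1]["part_index"] == attr["part_index"]:
            spans[-1]["count"] += 1
        else:
            spans.append({"part_index": attr["part_index"],
                          "start": index, "count": 1})
    return spans


def replay(batch, on_begin_part, on_attribute):
    """Walk the grouped batch and hand each item to host-supplied callbacks."""
    for span in build_part_spans(batch):
        on_begin_part(span["part_index"], span["count"])
        for attr in batch[span["start"]:span["start"] + span["count"]]:
            on_attribute(attr)
```

Keeping spans as (start, count) pairs over one owned batch is what makes zero-copy per-part views possible: a view is the batch plus a span.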
Python typed behavior:
- Document.export_names(style=ExportNameStyle.FlatHost, ...) exposes the stable v1 flat-host naming contract used by host-side metadata mappings. See FlatHost Mapping Contract.
- Document.ocio_metadata_tree(...) is safe-by-default and raises on unsafe raw byte payloads; use Document.unsafe_ocio_metadata_tree(...) for legacy/raw fallback output.
- safe API: build_ocio_metadata_tree_safe(..., InteropSafetyError*)
- unsafe API: build_ocio_metadata_tree(...)
- build_ocio_metadata_tree(..., const OcioAdapterRequest&) in openmeta/ocio_adapter.h (stable flat request API)
- build_ocio_metadata_tree(..., const OcioAdapterOptions&) (advanced/legacy shape)
C++ XMP sidecar entry points:
- dump_xmp_sidecar(..., const XmpSidecarRequest&) in openmeta/xmp_dump.h (stable flat request API)
- dump_xmp_sidecar(..., const XmpSidecarOptions&) (advanced/legacy shape)
Optional dependencies
OpenMeta’s core scanning and EXIF/TIFF decoding do not require third-party libraries. Some metadata payloads are compressed or structured; these optional dependencies let OpenMeta decode more content:
- Expat (OPENMETA_WITH_EXPAT): parses XMP RDF/XML packets (embedded blocks and .xmp sidecars) using a streaming parser with strict limits.
- RapidFuzz (OPENMETA_ENABLE_RAPIDFUZZ): opt-in semantic-query name matching for inspection/search UI. It is disabled by default; when enabled, CMake requires either a rapidfuzz::rapidfuzz package target or OPENMETA_RAPIDFUZZ_INCLUDE_DIR pointing at headers containing rapidfuzz/fuzz.hpp.
- zlib (OPENMETA_WITH_ZLIB): inflates Deflate-compressed payloads such as PNG iCCP (ICC profiles) and compressed text/XMP chunks (iTXt, zTXt).
- Brotli (OPENMETA_WITH_BROTLI): decompresses JPEG XL brob “compressed metadata” boxes so wrapped metadata payloads can be decoded.
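For the zlib case, the sketch below inflates a PNG zTXt chunk payload with a hard output cap, using Python's standard `zlib` module. The helpers `inflate_bounded` and `decode_ztxt` and the 1 MiB default are assumptions chosen for illustration; they do not describe OpenMeta's decoder or its limits. The zTXt layout itself (keyword, NUL separator, compression-method byte 0, then a zlib stream) follows the PNG specification.

```python
import zlib
from typing import Optional, Tuple


def inflate_bounded(data: bytes, max_output: int) -> Optional[bytes]:
    """Inflate a zlib stream, refusing to produce more than max_output bytes."""
    decoder = zlib.decompressobj()
    try:
        # Ask for one byte past the cap so an oversized stream is detectable.
        out = decoder.decompress(data, max_output + 1)
    except zlib.error:
        return None                      # corrupt stream
    if len(out) > max_output:
        return None                      # would exceed the limit
    if not decoder.eof:
        return None                      # truncated stream
    return out


def decode_ztxt(chunk_data: bytes,
                max_output: int = 1 << 20) -> Optional[Tuple[str, bytes]]:
    """Split a zTXt payload into (keyword, inflated text bytes)."""
    keyword, sep, rest = chunk_data.partition(b"\x00")
    if not sep or not 1 <= len(keyword) <= 79:
        return None                      # missing separator or bad keyword length
    if len(rest) < 1 or rest[0] != 0:
        return None                      # only compression method 0 is defined
    text = inflate_bounded(rest[1:], max_output)
    if text is None:
        return None
    return keyword.decode("latin-1"), text
```

Capping output during inflation, instead of checking size afterward, prevents a small compressed chunk from expanding into an unbounded allocation.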
CLI tool
metaread prints a human-readable dump of blocks and decoded entries
(EXIF/TIFF-IFD tags, XMP properties, IPTC-IIM datasets, ICC profile fields/tags,
and Photoshop IRB resource blocks). Output is ASCII-only and truncated by
default to reduce terminal injection risk.
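The ASCII-only, truncated output policy can be illustrated with a short escaping helper. This is a sketch of the general technique, not metaread's implementation: the function name `safe_text`, the `\xNN` escape style, the 64-byte default, and the `...` truncation marker are all assumptions.

```python
def safe_text(value: bytes, max_len: int = 64) -> str:
    """Render untrusted bytes as printable ASCII with a length cap.

    Control bytes (including ESC, which starts terminal escape sequences),
    non-ASCII bytes, and the backslash itself are written as \\xNN so the
    output is unambiguous and cannot drive the terminal.
    """
    parts = []
    for byte in value[:max_len]:
        if 0x20 <= byte <= 0x7E and byte != 0x5C:
            parts.append(chr(byte))
        else:
            parts.append("\\x%02x" % byte)
    if len(value) > max_len:
        parts.append("...")   # signal that the value was cut
    return "".join(parts)
```

For example, a tag value containing an ANSI color sequence is printed as visible text such as `\x1b[31m` rather than being interpreted by the terminal.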
metavalidate reports decode/validation issues in text or JSON and emits
machine-readable issue codes (for example xmp/output_truncated and
xmp/invalid_or_malformed_xml_text) suitable for CI gating. For draft C2PA
verification, use --c2pa-verify-require-trusted-chain when an untrusted or
missing certificate chain must fail validation instead of being reported as a
separate chain-detail signal.
Python
Python bindings use nanobind. The wheel also ships helper scripts as
openmeta.python.* modules.
python3 -m openmeta.python.metaread file.jpg
python3 -m openmeta.python.metadump --format portable file.jpg
python3 -m openmeta.python.metadump file.jpg output.xmp
python3 -m openmeta.python.metadump --format portable --c2pa-verify --c2pa-verify-backend auto file.jpg
python3 -m openmeta.python.metadump --format portable --c2pa-verify --c2pa-verify-require-trusted-chain file.jpg
python3 -m openmeta.python.metadump --format portable --portable-include-existing-xmp --xmp-sidecar file.jpg
openmeta.python.metatransfer remains a thin command-line wrapper. Its
--xmp-writeback, --xmp-destination-embedded,
--xmp-destination-sidecar, --output, and --force flags map directly
onto the C++ file-helper options and persistence flags. It reports sidecar and
cleanup paths returned by the C++ result instead of deriving a separate
Python-side contract. Its --target-width, --target-height,
--target-orientation, --target-samples-per-pixel,
--target-bits-per-sample, --target-sample-format,
--target-photometric, --target-planar-configuration,
--target-compression, and --target-exif-color-space flags populate the
same target image spec used by the C++ transfer request.
Resource policy defaults
For C++ callers, initialize from recommended_resource_policy() and only
override fields you need:
#include "openmeta/resource_policy.h"
openmeta::OpenMetaResourcePolicy policy
= openmeta::recommended_resource_policy();
policy.jumbf_limits.max_box_depth = 24; // optional override
For JUMBF/C2PA preflight traversal checks, call
measure_jumbf_structure(bytes, policy.jumbf_limits) before full decode.
Other preflight estimate APIs use the same bounded-options model:
- measure_scan_auto(file_bytes)
- measure_scan_jpeg(bytes), measure_scan_png(bytes), measure_scan_webp(bytes), measure_scan_gif(bytes), measure_scan_tiff(bytes), measure_scan_jp2(bytes), measure_scan_jxl(bytes), measure_scan_bmff(bytes)
- measure_exif_tiff(exif_bytes, exif_options)
- measure_xmp_packet(xmp_bytes, xmp_options)
- measure_icc_profile(icc_bytes, icc_options)
- measure_iptc_iim(iptc_bytes, iptc_options)
- measure_photoshop_irb(irb_bytes, irb_options)
- measure_exr_header(exr_bytes, exr_options)
- measure_jumbf_payload(jumbf_bytes, jumbf_options)
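To show what a bounded preflight pass might look like in principle, the sketch below walks ISO BMFF-style boxes (which JUMBF uses) and reports box count and nesting depth without decoding any payload. It is not `measure_jumbf_structure`: the function `measure_boxes`, its return shape, and both default limits are hypothetical, and it assumes only `jumb` superboxes contain child boxes.

```python
import struct
from typing import Optional


def measure_boxes(data: bytes, max_depth: int = 24,
                  max_boxes: int = 4096) -> Optional[dict]:
    """Count boxes and nesting depth; None if malformed or over a limit."""
    stats = {"box_count": 0, "max_depth": 0}
    pending = [(0, len(data), 1)]           # (start, end, depth) ranges to scan
    while pending:                          # explicit stack, no recursion
        pos, end, depth = pending.pop()
        while pos < end:
            if end - pos < 8:
                return None                 # not enough bytes for a header
            size, box_type = struct.unpack_from(">I4s", data, pos)
            header = 8
            if size == 1:                   # 64-bit size follows the type
                if end - pos < 16:
                    return None
                (size,) = struct.unpack_from(">Q", data, pos + 8)
                header = 16
            elif size == 0:                 # box extends to the end of its parent
                size = end - pos
            if size < header or pos + size > end:
                return None                 # box overruns its container
            stats["box_count"] += 1
            stats["max_depth"] = max(stats["max_depth"], depth)
            if stats["box_count"] > max_boxes or depth > max_depth:
                return None                 # limit exceeded, stop early
            if box_type == b"jumb":         # assumed to be the only container
                pending.append((pos + header, pos + size, depth + 1))
            pos += size
    return stats
```

Because the pass reads only box headers, its cost scales with the number of boxes and not with payload size, which is what makes it usable as a cheap check before a full decode.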
Documentation build
Sphinx docs require:
- doxygen
- Python packages listed in docs/requirements.txt
uv pip install -r docs/requirements.txt
cmake -S . -B build -DOPENMETA_BUILD_SPHINX_DOCS=ON
cmake --build build --target openmeta_docs_sphinx