pdq_hash_enricher.pdq_hash_enricher
PDQ Hash Enricher for generating perceptual hashes of media files.
The PdqHashEnricher processes media files (e.g., images) in Metadata objects and calculates perceptual hashes using the PDQ hashing algorithm. These hashes are designed specifically for images and can be used for detecting duplicate or near-duplicate visual content.
This enricher is typically used after thumbnail or screenshot (antibot) enrichers to ensure images are available for hashing.
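As a rough illustration of how such hashes support near-duplicate detection, the sketch below compares two hex-encoded 256-bit hashes by Hamming distance (the number of differing bits). The function names and the default threshold of 31 are assumptions made for this example and are not part of this module; the threshold is a commonly cited starting point for PDQ and should be tuned for your own data.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two hex-encoded hashes."""
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_duplicate(hash_a: str, hash_b: str, threshold: int = 31) -> bool:
    """Treat two images as near-duplicates when few bits differ.

    The default threshold is an assumption for illustration only.
    """
    return hamming_distance(hash_a, hash_b) <= threshold


if __name__ == "__main__":
    original = "f" * 64            # hypothetical 256-bit hash, hex-encoded
    slightly_off = "f" * 63 + "e"  # same hash with one bit flipped
    unrelated = "0" * 64           # every bit differs

    print(is_near_duplicate(original, slightly_off))
    print(is_near_duplicate(original, unrelated))
```

A small distance suggests the two images are visually similar; identical images produce a distance of 0.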
Module Contents
- class pdq_hash_enricher.pdq_hash_enricher.PdqHashEnricher
Bases: auto_archiver.core.Enricher
Calculates perceptual hashes for Media instances using PDQ, allowing for (near-)duplicate detection. Ideally, this enricher is orchestrated to run after the thumbnail_enricher.
- enrich(to_enrich: auto_archiver.core.Metadata) -> None
Enriches a Metadata object with additional information or context.
Takes the metadata object to enrich as an argument and modifies it in place, returning None.
- calculate_pdq_hash(filename)
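The return value of calculate_pdq_hash is not documented here. As a sketch only, and assuming the method serializes a 256-bit PDQ bit vector to a hexadecimal string, the conversion step might look like the following. The helper name bits_to_hex is hypothetical, and the bit vector is taken as given rather than computed from an image.

```python
from typing import Sequence


def bits_to_hex(bits: Sequence[int]) -> str:
    """Serialize a sequence of 0/1 values to a zero-padded hex string.

    Hypothetical helper: it assumes the bits come from a PDQ
    implementation and are ordered most significant bit first.
    """
    bit_string = "".join(str(int(b)) for b in bits)
    # Four bits per hex digit; round up so no bits are lost.
    width = (len(bits) + 3) // 4
    return f"{int(bit_string, 2):0{width}x}"


if __name__ == "__main__":
    # A made-up 256-bit vector: all zeros except the last four bits.
    example_bits = [0] * 252 + [1, 0, 1, 0]
    print(bits_to_hex(example_bits))
```

Zero-padding keeps every hash the same length (64 hex characters for 256 bits), which makes stored hashes easier to compare and index.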