utils.deletion_detection

Deletion Detection Utilities

Provides best-effort detection of deleted, missing, or unavailable content across various social media platforms, based on the presence of expected keywords.

This module helps identify removed content in order to:
  • Document content that existed but was deleted

  • Track patterns of content removal

  • Preserve metadata about missing content

Module Contents

class utils.deletion_detection.DeletionIndicators

Platform-specific indicators that content has been deleted or is unavailable, alongside generic indicators.

TWITTER = ["Hmm...this page doesn't exist", 'Try searching for something else', 'This Tweet is...
FACEBOOK = ["This content isn't available", "Sorry, this content isn't available", 'This content is no...
INSTAGRAM = ["Sorry, this page isn't available", 'The link you followed may be broken', 'Media not found or...
TIKTOK = ["Couldn't find this account", 'This video is no longer available', 'This video is currently...
YOUTUBE = ["This video isn't available anymore", 'Video unavailable', 'This video has been removed', 'This...
REDDIT = ['this post has been removed', 'this comment has been removed', '[removed]', '[deleted]', 'page...
VK = ['Post deleted', 'Page not found', 'Content unavailable', 'Access denied']
TELEGRAM = ['Message not found', 'Deleted message', 'Channel is private']
GENERIC = ['has been removed', 'no longer available', 'content removed', 'access denied', 'page not found']

classmethod all_indicators() → List[str]

Returns all deletion indicators from all platforms.

classmethod for_url(url: str) → List[str]

Returns platform-specific indicators based on URL domain.
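The domain-based lookup described above might look roughly like the following. This is a minimal sketch, not the module's actual implementation: the function name `indicators_for_url`, the `DOMAINS` mapping, the shortened `INDICATORS` table, and the choice to append the generic indicators to each platform list are all assumptions made for illustration.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Shortened, illustrative indicator table (the documented lists are longer).
INDICATORS: Dict[str, List[str]] = {
    "youtube": ["Video unavailable", "This video has been removed"],
    "telegram": ["Message not found", "Deleted message", "Channel is private"],
}
GENERIC: List[str] = [
    "has been removed",
    "no longer available",
    "content removed",
    "access denied",
    "page not found",
]

# Assumed mapping from domain to platform key; not taken from the module.
DOMAINS: Dict[str, str] = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "t.me": "telegram",
}


def indicators_for_url(url: str) -> List[str]:
    """Return platform indicators for the URL's domain, plus generic ones."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in DOMAINS.items():
        # Match the bare domain and any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return INDICATORS[platform] + GENERIC
    # Unknown or unparseable domain: fall back to generic indicators only.
    return list(GENERIC)
```

Matching on the parsed hostname rather than a substring of the whole URL avoids false positives when a platform name appears in a path or query string.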

utils.deletion_detection.detect_deletion(html_content: str = None, page_title: str = None, error_message: str = None, url: str = None, video_data: dict = None) → Dict[str, any] | None

Best-effort deletion detection across multiple signals.

Checks HTML content, page titles, error messages, and video metadata for indicators that content has been deleted or is unavailable.

Parameters:
  • html_content – Raw HTML source of the page

  • page_title – Browser page title

  • error_message – Any error message from the extractor

  • url – The URL being archived (for platform-specific detection)

  • video_data – Video metadata from yt-dlp or other extractors

Returns:

Dictionary with deletion details if detected, None otherwise. Format:

{
    "is_deleted": True,
    "indicator": "specific text that was found",
    "source": "html|title|error|metadata",
    "platform": "twitter|facebook|etc"
}
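To make the return format concrete, here is a hedged sketch of keyword matching across several text signals. It is not the module's implementation: the function name `sketch_detect`, the order in which signals are checked, the case-insensitive comparison, and the `"generic"` platform value are assumptions, and the `url` and `video_data` inputs are left out for brevity.

```python
from typing import Dict, List, Optional

# A few generic indicators taken from the documented GENERIC list.
GENERIC: List[str] = ["has been removed", "no longer available", "page not found"]


def sketch_detect(
    html_content: Optional[str] = None,
    page_title: Optional[str] = None,
    error_message: Optional[str] = None,
) -> Optional[Dict[str, object]]:
    """Return a deletion-info dict for the first matching indicator, else None."""
    # Assumed check order: short, high-signal texts before the full HTML.
    signals = [
        ("title", page_title),
        ("error", error_message),
        ("html", html_content),
    ]
    for source, text in signals:
        if not text:
            continue
        lowered = text.lower()
        for indicator in GENERIC:
            if indicator.lower() in lowered:
                return {
                    "is_deleted": True,
                    "indicator": indicator,
                    "source": source,
                    "platform": "generic",  # assumed label for generic matches
                }
    return None
```

Because the match is a plain substring test, a page that merely quotes one of these phrases would also be flagged, which is why the result is best treated as tentative.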

utils.deletion_detection.flag_as_deleted(metadata, deletion_info: Dict[str, any]) → None

Flags a metadata object as deleted or unavailable by adding tentative deletion information to it.

Parameters:
  • metadata – Metadata object to update

  • deletion_info – Dictionary from detect_deletion()
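The flagging step could be sketched as below. The metadata type and the keys it receives are not documented here, so `FakeMetadata`, its `set` method, `sketch_flag_as_deleted`, and the `possibly_deleted` / `deletion_*` key names are all hypothetical stand-ins rather than the module's real interface.

```python
from typing import Any, Dict


class FakeMetadata:
    """Hypothetical stand-in for the real metadata object."""

    def __init__(self) -> None:
        self.fields: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self.fields[key] = value


def sketch_flag_as_deleted(metadata: FakeMetadata, deletion_info: Dict[str, Any]) -> None:
    """Copy tentative deletion details onto the metadata object in place."""
    # The flag name is hedged on purpose: detection is keyword-based.
    metadata.set("possibly_deleted", True)
    for key in ("indicator", "source", "platform"):
        if key in deletion_info:
            metadata.set(f"deletion_{key}", deletion_info[key])


meta = FakeMetadata()
sketch_flag_as_deleted(
    meta,
    {
        "is_deleted": True,
        "indicator": "Video unavailable",
        "source": "title",
        "platform": "youtube",
    },
)
```

Like the documented function, the sketch returns None and mutates the object it is given, so callers keep using the same metadata instance afterwards.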