utils.deletion_detection
========================

.. py:module:: utils.deletion_detection

.. autoapi-nested-parse::

   Deletion Detection Utilities

   Provides best-effort detection of deleted, missing, or unavailable content
   across various social media platforms, based on the presence of expected keywords.

   By identifying removed content, this module helps to:

   - Document content that existed but was deleted
   - Track patterns of content removal
   - Preserve metadata about missing content







Module Contents
---------------

.. py:class:: DeletionIndicators

   Platform-specific indicators that content has been deleted or is unavailable, alongside generic indicators.


   .. py:attribute:: TWITTER
      :value: ["Hmm...this page doesn't exist", 'Try searching for something else', 'This Tweet is...



   .. py:attribute:: FACEBOOK
      :value: ["This content isn't available", "Sorry, this content isn't available", 'This content is no...



   .. py:attribute:: INSTAGRAM
      :value: ["Sorry, this page isn't available", 'The link you followed may be broken', 'Media not found or...



   .. py:attribute:: TIKTOK
      :value: ["Couldn't find this account", 'This video is no longer available', 'This video is currently...



   .. py:attribute:: YOUTUBE
      :value: ["This video isn't available anymore", 'Video unavailable', 'This video has been removed', 'This...



   .. py:attribute:: REDDIT
      :value: ['this post has been removed', 'this comment has been removed', '[removed]', '[deleted]', 'page...



   .. py:attribute:: VK
      :value: ['Post deleted', 'Page not found', 'Content unavailable', 'Access denied']



   .. py:attribute:: TELEGRAM
      :value: ['Message not found', 'Deleted message', 'Channel is private']



   .. py:attribute:: GENERIC
      :value: ['has been removed', 'no longer available', 'content removed', 'access denied', 'page not found']



   .. py:method:: all_indicators() -> List[str]
      :classmethod:


      Returns all deletion indicators from all platforms.



   .. py:method:: for_url(url: str) -> List[str]
      :classmethod:


      Returns platform-specific indicators based on URL domain.
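
The two class methods above can be pictured with a minimal sketch. ``IndicatorsSketch`` is a hypothetical stand-in, not the real ``DeletionIndicators`` class: the domain mapping (``vk.com``, ``t.me``) and the choice to append the generic entries to the platform list are assumptions made for illustration. Only the indicator strings are copied from the attributes documented above.

```python
from typing import Dict, List
from urllib.parse import urlparse


class IndicatorsSketch:
    """Hypothetical stand-in; not the real DeletionIndicators class."""

    # Values copied from the attributes documented above.
    VK = ["Post deleted", "Page not found", "Content unavailable", "Access denied"]
    TELEGRAM = ["Message not found", "Deleted message", "Channel is private"]
    GENERIC = ["has been removed", "no longer available", "content removed",
               "access denied", "page not found"]

    # Assumed domain-to-attribute mapping; the real lookup may differ.
    _DOMAINS: Dict[str, str] = {"vk.com": "VK", "t.me": "TELEGRAM"}

    @classmethod
    def all_indicators(cls) -> List[str]:
        """Concatenate every platform list plus the generic list."""
        return cls.VK + cls.TELEGRAM + cls.GENERIC

    @classmethod
    def for_url(cls, url: str) -> List[str]:
        """Return the list for the URL's domain, followed by generic entries."""
        host = (urlparse(url).hostname or "").lower()
        for domain, attr in cls._DOMAINS.items():
            # Match the bare domain and any subdomain such as m.vk.com.
            if host == domain or host.endswith("." + domain):
                return getattr(cls, attr) + cls.GENERIC
        # Unknown domain: fall back to the generic indicators only (assumed).
        return list(cls.GENERIC)
```

Matching on the parsed hostname rather than on the raw URL string avoids false hits when a platform name appears in a path or query parameter.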



.. py:function:: detect_deletion(html_content: str = None, page_title: str = None, error_message: str = None, url: str = None, video_data: dict = None) -> Optional[Dict[str, any]]

   Best-effort deletion detection across multiple signals.

   Checks HTML content, page titles, error messages, and video metadata for
   indicators that content has been deleted or is unavailable.

   :param html_content: Raw HTML source of the page
   :param page_title: Browser page title
   :param error_message: Any error message from the extractor
   :param url: The URL being archived (for platform-specific detection)
   :param video_data: Video metadata from yt-dlp or other extractors

   :returns: Dictionary with deletion details if detected, ``None`` otherwise.
             Format::

                 {
                     "is_deleted": True,
                     "indicator": "specific text that was found",
                     "source": "html|title|error|metadata",
                     "platform": "twitter|facebook|etc"
                 }
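
To make the return format concrete, here is a minimal sketch of keyword-based detection over several signals. ``detect_deletion_sketch`` is a hypothetical stand-in and not the module's implementation: the case-insensitive substring match, the order in which signals are checked, the reduced indicator list, and the ``"generic"`` platform label are all assumptions.

```python
from typing import Any, Dict, List, Optional

# Assumed subset of indicators, taken from the GENERIC list documented above.
GENERIC_INDICATORS: List[str] = ["has been removed", "no longer available", "page not found"]


def detect_deletion_sketch(
    html_content: Optional[str] = None,
    page_title: Optional[str] = None,
    error_message: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Return a result dict for the first indicator found, else None."""
    # Assumed check order: error message, then title, then HTML.
    signals = [("error", error_message), ("title", page_title), ("html", html_content)]
    for source, text in signals:
        if not text:
            continue  # skip signals the caller did not supply
        lowered = text.lower()
        for indicator in GENERIC_INDICATORS:
            if indicator in lowered:
                return {
                    "is_deleted": True,
                    "indicator": indicator,
                    "source": source,
                    "platform": "generic",  # assumed label for non-platform matches
                }
    return None


result = detect_deletion_sketch(page_title="Page Not Found")
missing = detect_deletion_sketch(html_content="<p>Hello</p>")
```

Because the function returns ``None`` when nothing matches, callers can branch on the result directly, for example ``if result is not None: ...``.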


.. py:function:: flag_as_deleted(metadata, deletion_info: Dict[str, any]) -> None

   Flags a metadata object as deleted or unavailable by adding tentative
   deletion information to it. The object is updated in place.

   :param metadata: Metadata object to update
   :param deletion_info: Dictionary from detect_deletion()
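
The flagging step might look roughly like the sketch below. Everything here is illustrative: ``MetadataSketch``, its ``set`` method, and the field names (``possibly_deleted``, ``deletion_indicator``, and so on) are hypothetical and are not taken from the real metadata class or from ``flag_as_deleted`` itself.

```python
from typing import Any, Dict


class MetadataSketch:
    """Hypothetical metadata container; the real metadata class may differ."""

    def __init__(self) -> None:
        self.fields: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self.fields[key] = value


def flag_as_deleted_sketch(metadata: MetadataSketch, deletion_info: Dict[str, Any]) -> None:
    """Copy deletion details onto the metadata object in place."""
    # Field names below are assumptions, not the library's actual keys.
    metadata.set("possibly_deleted", bool(deletion_info.get("is_deleted", False)))
    for key in ("indicator", "source", "platform"):
        if key in deletion_info:
            metadata.set(f"deletion_{key}", deletion_info[key])


meta = MetadataSketch()
flag_as_deleted_sketch(
    meta,
    {"is_deleted": True, "indicator": "[deleted]", "source": "html", "platform": "reddit"},
)
```

Naming the flag ``possibly_deleted`` in this sketch reflects the tentative, best-effort nature of the detection: a keyword match suggests removal but does not prove it.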


