antibot_extractor_enricher.dropin
=================================

.. py:module:: antibot_extractor_enricher.dropin




Module Contents
---------------

.. py:class:: Dropin(sb: seleniumbase.SB, extractor: auto_archiver.core.Extractor)

   Base class for drop-ins of the antibot extractor enricher module.
   Subclass it to add handling for a specific website.


   .. py:method:: documentation() -> Mapping[str, str]
      :staticmethod:


      Each Dropin should document itself through this method.
      The returned dictionary can include:

      - ``name``: A string with the name of the dropin.
      - ``description``: A string describing what the dropin does.
      - ``site``: A string naming the site this dropin is for.
      - ``authentication``: A dictionary with an authentication example for the site.




   .. py:attribute:: sb
      :type:  seleniumbase.SB


   .. py:attribute:: extractor
      :type:  auto_archiver.core.Extractor


   .. py:method:: suitable(url: str) -> bool
      :staticmethod:
      :abstractmethod:


      Check if the URL is suitable for processing with this dropin.

      :param url: The URL to check.
      :return: True if the URL is suitable for processing, False otherwise.



   .. py:method:: sanitize_url(url: str) -> str
      :staticmethod:


      Used to clean URLs before processing them.



   .. py:method:: images_selectors() -> str
      :staticmethod:


      CSS selector to find images in the HTML page.



   .. py:method:: video_selectors() -> str
      :staticmethod:


      CSS selector to find videos in the HTML page.



   .. py:method:: js_for_image_css_selectors() -> str

      A configurable JS script that receives a CSS selector from the dropin itself and returns an array of Image elements matching that selector.

      You can override this instead of ``images_selectors`` for more control over scraped images.



   .. py:method:: js_for_video_css_selectors() -> str

      A configurable JS script that receives a CSS selector from the dropin itself and returns an array of Video elements matching that selector.

      You can override this instead of ``video_selectors`` for more control over scraped videos.



   .. py:method:: open_page(url) -> bool
      :abstractmethod:


      Make sure the page is opened, even if it requires authentication, captcha solving, etc.

      :param url: The URL to open.
      :return: True on success, False otherwise.



   .. py:method:: add_extra_media(to_enrich: auto_archiver.core.Metadata) -> tuple[int, int]

      Extract image and/or video data from the currently open post with SeleniumBase. Media is added to the ``to_enrich`` Metadata object.

      :param to_enrich: The Metadata object that receives the extracted media.
      :return: A tuple (number of images added, number of videos added).



   .. py:method:: hit_auth_wall() -> bool

      Custom check to see whether the current page is behind an authentication wall. If True is returned, the default global auth wall detector is used instead. If False is returned, no auth wall is detected and the page is considered open.
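
The methods documented above can be tied together in a small subclass. What follows is a minimal sketch, not the module's own implementation. It makes several assumptions: the ``Dropin`` class in the block is a local stand-in so that the example runs without ``seleniumbase`` or ``auto_archiver`` installed, ``ExampleDropin`` and the site ``example.com`` are hypothetical, the CSS selectors and the layout of the ``authentication`` entry are illustrative placeholders, and ``open_page`` only calls the ``open`` method of the SeleniumBase object without handling logins or captchas.

```python
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit


class Dropin:
    """Local stand-in for antibot_extractor_enricher.dropin.Dropin.

    Assumption: the real base class stores the same two constructor
    arguments as the ``sb`` and ``extractor`` attributes.
    """

    def __init__(self, sb, extractor):
        self.sb = sb
        self.extractor = extractor


class ExampleDropin(Dropin):
    """Hypothetical drop-in for the placeholder site example.com."""

    @staticmethod
    def documentation() -> Mapping[str, str]:
        # Keys follow the list given for documentation(); values are made up.
        return {
            "name": "Example",
            "description": "Illustrative drop-in for example.com posts.",
            "site": "example.com",
            # Hypothetical layout for the authentication example.
            "authentication": {
                "example.com": {"username": "...", "password": "..."}
            },
        }

    @staticmethod
    def suitable(url: str) -> bool:
        # Accept example.com and its subdomains, nothing else.
        host = urlsplit(url).hostname or ""
        return host == "example.com" or host.endswith(".example.com")

    @staticmethod
    def sanitize_url(url: str) -> str:
        # Drop the query string and fragment before processing.
        parts = urlsplit(url)
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    @staticmethod
    def images_selectors() -> str:
        # Placeholder selector; a real site needs its own.
        return "article img"

    @staticmethod
    def video_selectors() -> str:
        # Placeholder selector; a real site needs its own.
        return "article video"

    def open_page(self, url) -> bool:
        # Simplest case: the page needs no login or captcha handling.
        self.sb.open(url)
        return True

    def hit_auth_wall(self) -> bool:
        # Returning True defers to the default global auth wall detector.
        return True
```

Keeping ``suitable`` and ``sanitize_url`` as static methods means they can be called on the class itself, before a browser session or a ``Dropin`` instance exists.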



