save_data
data
: Rows of data (as dictionaries) to save
source_url
: Optional URL to associate with the data, defaults to current page URL. Only use this if the source of the data is different than the current page when the data is saved
SchemaValidationError
: If any of the saved data does not match the provided schema
enqueue
urls
: urls to enqueue
context
: additional context to pass to the next run of the next stage/url. Typically just data that is only available on the current page but required in the schema. Only use this when some data is available on this page, but not on the page that is enqueued.
options
: job level options to pass to the next stage/url
paginate
Call sdk.paginate at the end of your scrape function. The element will automatically be used to paginate the site and run the scraping code against all pages.
Pagination will conclude once all pages are reached or no next page element is found.
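As a rough mental model of that stopping rule, the sketch below follows next-page links until none is found. It is not the SDK's implementation: `run_with_pagination`, `scrape_page`, `next_page`, and the in-memory `PAGES` table are all hypothetical names invented for illustration, and a plain URL string stands in for the next page element.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a site: each page has items and an optional next link.
PAGES: Dict[str, dict] = {
    "/items?page=1": {"items": ["a", "b"], "next": "/items?page=2"},
    "/items?page=2": {"items": ["c"], "next": "/items?page=3"},
    "/items?page=3": {"items": ["d"], "next": None},
}


def run_with_pagination(
    start_url: str,
    scrape: Callable[[str], List[str]],
    get_next_page: Callable[[str], Optional[str]],
) -> List[str]:
    """Run `scrape` on every page, following next links until none is found."""
    collected: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen:
        seen.add(url)  # guard against a next link that loops back
        collected.extend(scrape(url))
        url = get_next_page(url)  # None means no next page element was found
    return collected


def scrape_page(url: str) -> List[str]:
    # Assumed scraping step: return the rows found on this page.
    return PAGES[url]["items"]


def next_page(url: str) -> Optional[str]:
    # Assumed lookup of the next page; returns None on the last page.
    return PAGES[url]["next"]


rows = run_with_pagination("/items?page=1", scrape_page, next_page)
print(rows)  # ['a', 'b', 'c', 'd']
```

In a real scraper this loop is what the SDK takes over, so the scrape function only needs to say how to find the next page.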
This method should ALWAYS be used for pagination instead of manual for loops and if statements.
get_next_page_element
: the url or ElementHandle of the next page
timeout
: milliseconds to sleep for before continuing. Only use if there is no other wait option
capture_url
clickable
: the element to click
resource_type
: the type of resource to capture
timeout
: the time to wait for the new page to open (in ms)
ValueError
: if more than one page is created by the click event
capture_download
url and filename
capture_html
selector
: CSS selector of element to capture. Defaults to “html” for the document element.
exclude_selectors
: List of CSS selectors for elements to exclude from capture.
soup_transform
: A function to transform the BeautifulSoup html prior to saving. Use this to remove aspects of the returned content
html_converter_type
: Type of HTML converter to use for the inner text. Defaults to “markdown”.
Returns:
html of the element, the formatted text of the element, along with the url and filename of the document
Raises:
ValueError
: If the specified selector doesn’t match any element.
capture_pdf
url and filename
Example:
log
Logs to print and console.log if a browser is running.
Concatenates all arguments with spaces.
Args:
*args: Values to log (will be concatenated)
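The concatenation rule can be illustrated with a small stand-alone helper. This is only a sketch of the "join with spaces" behaviour described above: `format_log_message` is a hypothetical name, not part of the SDK, and it assumes each value is converted with `str()` before joining. It does not forward anything to a browser console.

```python
def format_log_message(*args: object) -> str:
    """Join all arguments with single spaces, converting each to str first."""
    return " ".join(str(arg) for arg in args)


# Mixed value types are stringified and joined in order.
message = format_log_message("Saved", 3, "rows from", "https://example.com/list")
print(message)  # Saved 3 rows from https://example.com/list
```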
Signature: