WebRSS
This feeder creates a stream by scraping a website and extracting articles using CSS selectors.
It is useful for websites that do not expose an RSS/Atom feed. It is based on Colly, so refer to the Colly documentation for details on how selectors work.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | STRING | empty | URL of the page to scrape |
| item_selector | STRING | empty | CSS selector matching each article/item block on the page |
| title_selector | STRING | empty | CSS selector for the article title (relative to the item block) |
| link_selector | STRING | empty | CSS selector for the article link (relative to the item block) |
| desc_selector | STRING | empty | CSS selector for the article description (relative to the item block) |
| date_selector | STRING | empty | CSS selector for the article date (relative to the item block) |
| link_attr | STRING | "href" | HTML attribute to read the URL from on the matched link element |
| freq | DURATION | 60m | How often the page is scraped |
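The freq values shown in this page (60m, 1h) look like Go duration strings. The sketch below shows how such values could be parsed and validated, assuming freq follows the syntax of Go's time.ParseDuration, which this page does not confirm. parseFreq is a hypothetical helper and not part of the feeder.

```go
package main

import (
	"fmt"
	"time"
)

// parseFreq is a hypothetical helper, not part of the feeder's API.
// Assumption: freq uses Go's duration syntax ("60m", "1h", "90s").
func parseFreq(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid freq %q: %w", s, err)
	}
	// A zero or negative polling interval makes no sense for a scraper.
	if d <= 0 {
		return 0, fmt.Errorf("freq must be positive, got %q", s)
	}
	return d, nil
}

func main() {
	for _, s := range []string{"60m", "1h", "90s", "soon"} {
		d, err := parseFreq(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %v\n", s, d)
	}
}
```

Under that assumption, "60m" and "1h" describe the same interval, so the example configuration polls at the default rate.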
... | <webrss: url="https://website.io/blog", item_selector="a[href^='/blog/']", title_selector="h4, h3", link_selector="self", desc_selector="p", freq="1h"> | ...
Output
Text
The main field of the Message will contain the title of the article extracted via title_selector.
Extra
| Name | Description |
|---|---|
| title | title of the article extracted via title_selector |
| link | absolute URL of the article extracted via link_selector |
| description | description or excerpt of the article extracted via desc_selector |
| published_at | publication date of the article extracted via date_selector |
description and published_at will be empty if the respective selectors (desc_selector, date_selector) are not configured or the element is not found on the page.
Notes
- Relative URLs are automatically resolved to absolute URLs based on the scraped page URL.
- Already seen links are tracked in memory and will not be propagated again within the same run. This prevents duplicate messages across polling cycles.
- The selectors for title_selector, link_selector, desc_selector and date_selector are evaluated relative to the element matched by item_selector.
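The first two notes, URL resolution and in-memory deduplication, can be illustrated with a minimal sketch that uses only the Go standard library. It is not the feeder's actual implementation: resolveLink, seenLinks and firstTime are hypothetical names, and the post paths are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink turns a possibly relative href into an absolute URL,
// using the scraped page URL as the base. Hypothetical helper.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page URL: %w", err)
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", fmt.Errorf("parse href: %w", err)
	}
	return base.ResolveReference(ref).String(), nil
}

// seenLinks is an in-memory set of links already emitted in this run.
// Its contents are lost when the process exits.
type seenLinks map[string]struct{}

// firstTime records link and reports whether it had not been seen before.
func (s seenLinks) firstTime(link string) bool {
	if _, ok := s[link]; ok {
		return false
	}
	s[link] = struct{}{}
	return true
}

func main() {
	seen := seenLinks{}
	page := "https://website.io/blog"
	// Two polling cycles whose results overlap on post-2 (made-up paths).
	cycles := [][]string{
		{"/blog/post-1", "/blog/post-2"},
		{"/blog/post-2", "/blog/post-3"},
	}
	for i, hrefs := range cycles {
		for _, href := range hrefs {
			link, err := resolveLink(page, href)
			if err != nil {
				fmt.Println("skip:", err)
				continue
			}
			// Only links not seen in an earlier cycle are emitted.
			if seen.firstTime(link) {
				fmt.Printf("cycle %d: emit %s\n", i+1, link)
			}
		}
	}
}
```

In this sketch, deduplication is keyed on the resolved absolute URL, so the same article reached through a relative and an absolute href counts as one link. Whether the feeder keys on the resolved or the raw value is an assumption here.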
Examples
Soon…