Web Tool Routing

Web tool routing, detection, and fallback chains for any skill or agent that accesses websites. Provides credit-aware tool selection: it chooses the cheapest working tool first and escalates only on failure. Use this skill when building or running any research workflow that reads web pages or searches the web. Skills invoke it directly; agents, which can't load skills at runtime, reference it as the canonical source and carry quality-optimized overrides inline. Other skills (for example, competitor-research) invoke it for shared routing logic and add their own domain-specific overrides.

Shared routing logic for web access. It detects available tools, selects the cheapest one that works, escalates on failure, and tracks credit spend.

Why This Exists

Different MCP tools have different free tiers, credit costs, and capabilities. Some sites block some tools. Getting the order wrong wastes paid credits on things free tools could handle, or wastes time on tools that will fail. This skill encodes the optimal routing so every research skill benefits from the same logic.

Tool Inventory

Tool	Category	Free Tier	Renewable?	Operations
WebSearch	Built-in	Unlimited	—	Web search
WebFetch	Built-in	Unlimited	—	Page read (AI-summarized)
Jina	Free MCP	10M tokens	One-time	`read_url` (page read), `search_web` (search)
Tavily	Free MCP	1,000 credits/mo	Monthly	`tavily_search`, `tavily_extract`, `tavily_crawl`, `tavily_map`
Firecrawl	Paid MCP	1,000 credits	One-time	`firecrawl_scrape`, `firecrawl_search`, `firecrawl_crawl`
ScrapingBee	Paid MCP	1,000 credits	One-time	`get_page_text`, `fast_search`, `get_google_search_results`
Playwright	Browser	Unlimited	—	`browser_navigate`, `browser_snapshot`
Chrome MCP	Browser	Unlimited	—	`navigate`, `read_page`, `computer`

Cost per Operation

Tool	Operation	Credits	Notes
WebSearch	search	FREE	Always available
WebFetch	read page	FREE	Returns AI summary, not raw content
Jina `search_web`	search	~tokens	Free tier (10M tokens shared across all Jina ops)
Jina `read_url`	read page	~2-10K tokens	Depends on page size
Tavily `tavily_search`	search (basic)	1 credit	`search_depth: "basic"`
Tavily `tavily_search`	search (advanced)	2 credits	`search_depth: "advanced"` — slower, more thorough
Tavily `tavily_extract`	read page	1 credit/URL	Clean markdown extraction
Tavily `tavily_crawl`	crawl site	1 credit/page	Multi-page crawl
Firecrawl `firecrawl_scrape`	read page	1 credit	Supports `waitFor` for JS rendering
Firecrawl `firecrawl_search`	search + scrape	1 credit	Returns scraped content, not just snippets
ScrapingBee `get_page_text`	read page (basic)	1 credit	Plain text extraction
ScrapingBee `get_page_text`	read page + JS	5 credits	`render_js=true`
ScrapingBee `get_page_text`	Cloudflare bypass	10 credits	`premium_proxy=true`
ScrapingBee + AI query	read + extract	+5 credits	Added on top of base cost
ScrapingBee `fast_search`	search	1 credit	Good for Reddit discovery
ScrapingBee `get_google_search_results`	SERP features	1 credit	PAA, Related Searches
Playwright	browser session	FREE	Slow, context-heavy
Chrome MCP	browser session	FREE	Slowest, requires extension
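
As a rough illustration of how the ScrapingBee rows in the table combine, here is a minimal sketch. The function name `scrapingbee_page_cost` is hypothetical, and the sketch assumes that the 10-credit `premium_proxy` price already includes JS rendering and that the AI query surcharge is a flat +5 on top of whichever base cost applies. Check current ScrapingBee pricing before relying on it.

```python
def scrapingbee_page_cost(render_js: bool = False,
                          premium_proxy: bool = False,
                          ai_query: bool = False) -> int:
    """Estimate credits for one get_page_text call, per the cost table.

    Assumption: premium_proxy (10) supersedes render_js (5) rather than
    adding to it; the table does not say how the two combine.
    """
    if premium_proxy:
        base = 10
    elif render_js:
        base = 5
    else:
        base = 1
    # AI query is listed as "+5 credits" on top of the base cost.
    return base + (5 if ai_query else 0)


print(scrapingbee_page_cost(render_js=True, ai_query=True))
```

At these rates, a 1,000-credit budget covers 1,000 plain reads but only 100 Cloudflare-bypass reads, which is why ScrapingBee sits late in the chain.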

Routing Principle

Unlimited built-ins → free-tier MCPs (Jina, then Tavily) → one-time paid credits (Firecrawl, then ScrapingBee) → Browser

Built-in tools first (WebSearch, WebFetch) — truly unlimited, no MCP needed
Jina next — generous free tier (10M tokens), but one-time and shared across operations
Tavily next — smaller free tier (1,000 credits) but renews monthly
Firecrawl — 1,000 one-time credits, best for JS-rendered pages
ScrapingBee — 1,000 one-time credits, best for anti-bot bypass (but expensive: 5-10 credits for JS/proxy)
Browser tools last — free but slow, context-heavy, use only when all API tools fail
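
The ordering in the list above can be captured as a fixed priority list filtered by what is actually installed. This is a minimal sketch; `ROUTING_ORDER` and `route` are illustrative names, not part of any tool's API.

```python
# Default priority, copied from the routing list above (earlier = try first).
ROUTING_ORDER = [
    "WebSearch", "WebFetch",      # built-in, unlimited
    "Jina",                       # free MCP, one-time token budget
    "Tavily",                     # free MCP, renews monthly
    "Firecrawl", "ScrapingBee",   # paid MCP, one-time credits
    "Playwright", "Chrome MCP",   # browser, free but slow
]


def route(available: set[str]) -> list[str]:
    """Return the available tools in the order they should be tried."""
    return [tool for tool in ROUTING_ORDER if tool in available]


print(route({"Firecrawl", "WebFetch", "Playwright"}))
```

Keeping the order in one list means a domain skill can override it by passing a reordered copy instead of re-implementing the selection.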

Tool Detection

Before starting any web research, probe which MCP tools are available. Run these ToolSearch queries:

ToolSearch("+jina read")
ToolSearch("+firecrawl scrape")
ToolSearch("+scrapingbee")
ToolSearch("+tavily search")
ToolSearch("+playwright navigate")
ToolSearch("+chrome navigate")

Classify capability level:

Level	Tools Available	What You Can Do
4 (FULL)	Built-in + Jina + Firecrawl + ScrapingBee + Browser	Everything — JS rendering, anti-bot bypass, full browser fallback
3 (STRONG)	Built-in + Jina + (Firecrawl OR ScrapingBee)	Most sites — JS rendering OR anti-bot, not both
2 (BASIC)	Built-in + (Jina OR Tavily)	Good coverage — clean pages, no anti-bot bypass
1 (BASELINE)	WebSearch + WebFetch only	Always available. Search + AI-summarized reads
0 (NONE)	No web tools at all	Recommend MCP installs (see below)
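
The classification in the table can be sketched as a cascade of checks from the highest level down. The function name `capability_level` and the tool-name strings are assumptions for illustration. The table does not define setups where MCP tools exist without the built-ins, so this sketch reports those as level 0.

```python
def capability_level(available: set[str]) -> tuple[int, str]:
    """Map a set of detected tool names to the capability levels in the table."""
    builtin = {"WebSearch", "WebFetch"} <= available
    jina = "Jina" in available
    tavily = "Tavily" in available
    firecrawl = "Firecrawl" in available
    scrapingbee = "ScrapingBee" in available
    browser = bool({"Playwright", "Chrome MCP"} & available)

    if not builtin:
        # Assumption: without the built-ins, no row of the table applies.
        return 0, "NONE"
    if jina and firecrawl and scrapingbee and browser:
        return 4, "FULL"
    if jina and (firecrawl or scrapingbee):
        return 3, "STRONG"
    if jina or tavily:
        return 2, "BASIC"
    return 1, "BASELINE"


print(capability_level({"WebSearch", "WebFetch", "Jina", "Firecrawl"}))
```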

At Level 0, output MCP installation commands:

claude mcp add jina -s user -- npx -y @anthropic/jina-mcp-server
claude mcp add firecrawl-mcp -s user -- npx -y firecrawl-mcp
claude mcp add scrapingbee -s user -- npx -y scrapingbee-mcp

Page Reading Fallback Chain

When you need to read a web page (product page, article, forum thread, etc.), follow this chain. Stop at the first tier that returns usable content.

TIER 1: WebFetch(url, prompt="extract [what you need]")
  Cost: FREE | Speed: Fast | Always available
  Returns AI-summarized content. Good enough for simple pages.
  ↓ fails? (JS content invisible, complex pages, empty response)

TIER 2: Jina read_url(url)
  Cost: FREE (10M token budget) | Speed: Fast
  Better extraction than WebFetch. Clean markdown output.
  Best for: EU stores, documentation, simple product pages.
  ↓ fails? (cookie wall, Cloudflare, JS-only content, network block)

TIER 3: Tavily tavily_extract(urls=[url])
  Cost: 1 credit/URL (renewable monthly) | Speed: Fast
  Good for pages where WebFetch and Jina return empty or blocked.
  ↓ fails? (blocked, timeout, anti-bot)

TIER 4: Firecrawl firecrawl_scrape(url, formats=["markdown"], waitFor=2000)
  Cost: 1 credit/page (one-time budget) | Speed: Medium
  Best for: JS-heavy pages (size selectors, dynamic pricing).
  Use waitFor=2000+ for pages that render content with JavaScript.
  ↓ fails? (credits exhausted, still blocked)

TIER 5: ScrapingBee get_page_text(url, render_js=true)
  Cost: 5 credits (one-time budget) | Speed: Medium
  For Cloudflare/heavy anti-bot: add premium_proxy=true (10 credits).
  Best for: Anti-bot protected sites, Cloudflare-fronted pages.
  ↓ fails? (credits exhausted)

TIER 6: Playwright browser_navigate(url) → browser_snapshot
  Cost: FREE | Speed: SLOW
  Full browser rendering. Context-heavy — use sparingly.
  ↓ fails? (page blocks headless browsers)

TIER 7: Chrome MCP navigate(url) → read_page
  Cost: FREE | Speed: SLOWEST
  Real browser fingerprint. Requires Chrome extension.
  ↓ ALL failed?

MARK AS: "⚠️ UNVERIFIED — all tools failed, manual check required" + include URL
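
The chain above is a "first usable result wins" loop. The sketch below shows that control flow only. Each reader is a hypothetical callable wrapping one tier (for example, a function that calls WebFetch or Jina `read_url`), and "usable" is simplified to "non-empty text", which is an assumption. Real checks would also look for anti-bot pages and missing key data.

```python
from typing import Callable, Optional

# A reader takes a URL and returns page text, or None/"" when it got nothing.
Reader = Callable[[str], Optional[str]]


def read_with_fallback(
    url: str, tiers: list[tuple[str, Reader]]
) -> tuple[Optional[str], Optional[str]]:
    """Try each tier in order; return (tier_name, content) for the first success.

    Returns (None, None) when every tier fails, so the caller can apply the
    UNVERIFIED mark and include the URL.
    """
    for name, reader in tiers:
        try:
            content = reader(url)
        except Exception:
            # A tool error (timeout, block, exhausted credits) counts as a failed tier.
            continue
        if content and content.strip():
            return name, content
    return None, None
```

Passing the tiers as data keeps the loop independent of which tools were detected: build the list from the available tools, in chain order, and hand it to the function.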

Web Search Fallback Chain

When you need to search the web for pages, follow this chain.

TIER 1 (run in parallel when both available):
  WebSearch(query)                                    — FREE, built-in, broad coverage
  ∥ Jina search_web(query, gl="{code}", hl="{lang}") — FREE, adds geo-targeting
  Combine and deduplicate results from both.
  ↓ insufficient results?

TIER 2: Tavily tavily_search(query, include_domains=[...])
  Cost: 1 credit (renewable monthly) | Best for: domain-filtered search
  ↓ still insufficient?

TIER 3: Firecrawl firecrawl_search(query, limit=5)
  Cost: 1 credit (one-time budget) | Returns scraped content, not just snippets
  ↓ still insufficient?

TIER 4: ScrapingBee fast_search(query)
  Cost: 1 credit (one-time budget) | Best Reddit discovery (finds 3x more threads)
  ↓ ALL failed?

MARK AS: "SEARCH FAILED — manual check required"
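
Tier 1 asks for results from two sources to be combined and deduplicated. One way to do that is to key each result on a normalized URL, as in this minimal sketch. The result shape (a dict with a `"url"` key) and the normalization rules (ignore scheme, a leading `www.`, and a trailing slash) are assumptions, not something either tool guarantees.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: host (minus www.) + path + query."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}?{parts.query}" if parts.query else f"{host}{path}"


def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Concatenate result lists, keeping the first occurrence of each URL."""
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_lists:
        for item in results:
            key = normalize(item["url"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Listing the WebSearch results first means its entry is the one kept whenever both sources return the same page.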

Geo-Targeting Quick Reference

Always pass geographic context when available. Different tools use different parameter names:

Tool	Country Param	Language Param	Example (US English)
Jina `search_web`	`gl: "us"`	`hl: "en"`	`gl: "us", hl: "en"`
ScrapingBee `fast_search`	`country_code: "us"`	—	`country_code: "us"`
Firecrawl	`location: "United States"`	—	`location: "United States"`
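Tavily `tavily_search`	`country: "United States"`	—	`country: "United States"` (full name, not "us")

Tavily gotcha: The country parameter requires the full country name ("United States"), not an ISO code ("us"). Invalid values are silently ignored rather than rejected, so a wrong value produces untargeted results without raising an error.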
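
Because each tool names its geo parameters differently, a small adapter can keep call sites consistent. The sketch below only restates the table. The tool keys (`"jina"`, `"tavily"`, and so on) and the function name `geo_params` are hypothetical, and the caller is assumed to supply both the ISO code and the full country name.

```python
def geo_params(tool: str, iso_code: str, country_name: str, lang: str) -> dict[str, str]:
    """Build the geo-targeting arguments for one tool, per the reference table."""
    if tool == "jina":          # Jina search_web
        return {"gl": iso_code.lower(), "hl": lang.lower()}
    if tool == "scrapingbee":   # ScrapingBee fast_search
        return {"country_code": iso_code.lower()}
    if tool == "firecrawl":
        return {"location": country_name}
    if tool == "tavily":        # full country name only; ISO codes are ignored
        return {"country": country_name}
    return {}                   # tools without geo support get no extra arguments


print(geo_params("tavily", "us", "United States", "en"))
```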

Credit Budget Management

Tracking Template

Every research session should track credit usage:

## Credit Usage
| Tool | Credits Used | Free Tier | Remaining | Status |
|------|-------------|-----------|-----------|--------|
| WebSearch | — | unlimited | — | ✅ |
| WebFetch | — | unlimited | — | ✅ |
| Jina | ~X tokens | 10M tokens | ~Y tokens | ✅ |
| Tavily | X | 1,000/mo | Y | ✅ |
| Firecrawl | X | 1,000 | Y | ✅ |
| ScrapingBee | X | 1,000 | Y | ✅ |

Budget Alerts

At 80% of a paid tool's budget: Switch to free tools for remaining items where possible
At 100% of a paid tool's budget: Stop using that tool entirely, rely on remaining tiers
Jina tokens running low: Fall back to WebFetch (free, unlimited) + Tavily
All paid tools exhausted: Built-in tools only + UNVERIFIED marks for failures
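
The 80% and 100% thresholds above can be checked with a small helper. This is a sketch: the name `budget_action` and the returned labels are illustrative, and integer arithmetic is used so the 80% boundary is exact.

```python
def budget_action(used: int, total: int) -> str:
    """Classify a paid tool's credit usage against the alert thresholds."""
    if total <= 0:
        raise ValueError("total must be positive")
    if used >= total:
        return "stop"          # 100%: stop using this tool entirely
    if used * 100 >= total * 80:
        return "prefer_free"   # 80%: switch to free tools where possible
    return "ok"


print(budget_action(800, 1000))
```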

Team Budget Formula (for parallel agent workflows)

per_agent_budget = total_credits / (num_agents + 1)

The extra share in the denominator is the leader's reserve. Example: 1,000 Firecrawl credits, 4 agents → 1,000 / (4 + 1) = 200 credits each, with 200 held in reserve for the leader.
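
The formula can be written as a short function. Rounding is not specified in the text, so this sketch assumes integer division, with any remainder going to the leader's reserve. The function name follows the formula.

```python
def per_agent_budget(total_credits: int, num_agents: int) -> tuple[int, int]:
    """Split credits into equal agent shares plus one reserve share.

    Returns (credits_per_agent, leader_reserve). Remainders from integer
    division are added to the reserve (an assumption).
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    share = total_credits // (num_agents + 1)
    reserve = total_credits - share * num_agents
    return share, reserve


print(per_agent_budget(1000, 4))
```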

When to Escalate vs. Mark UNVERIFIED

Escalate to next tier when:

Current tool returns empty/blocked content
Content is clearly incomplete (page loaded but key data missing)
Anti-bot page detected (Cloudflare challenge, CAPTCHA text in response)

Mark UNVERIFIED when:

All tiers exhausted for that URL
Item found via search snippets but product page inaccessible
Only partial information available

Stop and skip when:

Page returns 404 (content removed)
Site is geo-blocked for user's country
Content is behind login/paywall with no free access
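
The three lists above amount to a small decision procedure. The sketch below is one possible encoding. The `Action` enum, the `decide` signature, and the anti-bot marker strings are all illustrative assumptions, and detecting "key data missing", geo-blocks, or login walls is left to the caller.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"
    ESCALATE = "escalate"
    MARK_UNVERIFIED = "mark_unverified"
    SKIP = "skip"


# Illustrative substrings only; real challenge pages vary.
ANTI_BOT_MARKERS = ("captcha", "just a moment", "checking your browser")


def decide(status_code: int, content: Optional[str], tiers_left: int,
           key_data_missing: bool = False, geo_blocked: bool = False,
           login_required: bool = False) -> Action:
    """Choose what to do with one tool response."""
    # Stop and skip: retrying with another tool will not help.
    if status_code == 404 or geo_blocked or login_required:
        return Action.SKIP
    text = (content or "").strip().lower()
    blocked = not text or any(marker in text for marker in ANTI_BOT_MARKERS)
    if blocked or key_data_missing:
        # Escalate while tiers remain; otherwise record the item as unverified.
        return Action.ESCALATE if tiers_left > 0 else Action.MARK_UNVERIFIED
    return Action.ACCEPT
```

Checking the skip conditions first avoids spending credits on later tiers for pages that no tool can retrieve.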

Domain-Specific Overrides

This skill provides the default routing. Domain skills may override the order for specific use cases:

Reddit content: Most tools fail on Reddit. ScrapingBee get_page_text is typically the only working reader. See the invoking skill for Reddit-specific routing.
Comment-heavy pages: When capturing user comments matters (forums, review sites), tools that return full page content (Tavily, Firecrawl) may be preferred over those that return article body only (Jina).
E-commerce anti-bot: Marketplace sites (Amazon, Allegro, eBay) have heavy anti-bot. Search-based approaches work better than direct page scraping. See the invoking skill for site-specific patterns.
JS-rendered content: Size selectors, dynamic pricing, configuration dropdowns need JS rendering. Jump to Firecrawl (waitFor=2000) or ScrapingBee (render_js=true) — free tools won't see this content.

The invoking skill's overrides take precedence over the default chain for their specific scenarios.
