firecrawl
Built by Metorial, the integration platform for agentic AI.
Server Summary
Scrape web pages
Crawl entire websites
Extract structured data
Automate browser interactions
The Firecrawl MCP Server provides powerful web scraping and crawling capabilities through the Model Context Protocol. It enables you to extract content from web pages in multiple formats, perform advanced browser automation tasks, and crawl entire websites with sophisticated filtering and control options. Whether you need to scrape a single page or systematically harvest data from an entire domain, this server offers the tools to get structured, clean data from the web.
Firecrawl is a comprehensive web scraping solution that goes beyond simple HTML retrieval. It handles JavaScript-heavy sites, performs browser automation, extracts structured data using AI, and provides multiple output formats including markdown, HTML, screenshots, and custom JSON schemas. The server supports both single-page scraping and large-scale website crawling with features like proxy rotation, ad-blocking, mobile emulation, and intelligent content extraction.
scrape_url
Scrape a single URL and return its content in various formats. This is your primary tool for extracting data from individual web pages.
Parameters:
Output formats:
markdown: Clean markdown representation of the page
html: Cleaned HTML content
rawHtml: Original unprocessed HTML
links: All links found on the page
screenshot: Visual capture of the page
summary: AI-generated summary of the content
json: Structured data extraction using a custom schema with optional prompt
Browser actions (performed before scraping):
wait: Pause for a specified duration or until a selector appears
click: Click on elements matching a CSS selector
write: Type text into input fields
press: Press keyboard keys
scroll: Scroll the page up or down
screenshot: Capture a screenshot at this point
executeJavascript: Run custom JavaScript code
scrape: Extract content at this point
pdf: Generate a PDF of the page
Proxy type: basic, stealth, or auto
start_crawl
Start a crawl job to systematically spider an entire website or domain. This tool initiates a background job that discovers and scrapes multiple pages according to your specifications.
Parameters:
Sitemap handling: include to use sitemap.xml, or skip to discover pages from links
Webhook settings:
url: Webhook endpoint URL
events: Events to subscribe to (started, page, completed, failed)
headers: Custom headers for webhook requests
metadata: Additional metadata to include in webhook payloads
Scrape options: the options accepted by scrape_url, applied to each crawled page
Check the status and retrieve results from an ongoing or completed crawl job.
Parameters:
The crawl job ID returned by start_crawl
Returns: Current status, progress metrics, credits used, and scraped data from all pages.
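Because start_crawl runs as a background job, a client that is not using webhooks can poll the status tool until the job finishes. The sketch below assumes a hypothetical `get_status` wrapper that calls the status tool and returns its result as a dict with a `status` field; the status strings (`scraping`, `completed`, `failed`, `cancelled`) are assumptions borrowed from Firecrawl's crawl API, not confirmed for this server.

```python
import time
from typing import Callable

# Assumed terminal status values; confirm them against real status responses
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_crawl(get_status: Callable[[str], dict], crawl_id: str,
                   interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll a crawl job until it reaches a terminal status or the timeout expires.

    get_status is a hypothetical wrapper that calls the status tool for
    crawl_id and returns its result parsed into a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_status(crawl_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawl {crawl_id} still {result.get('status')!r} after {timeout}s")
        time.sleep(interval)


# Usage with canned responses standing in for real tool calls
responses = iter([
    {"status": "scraping", "completed": 3, "total": 10},
    {"status": "completed", "completed": 10, "total": 10},
])
final = wait_for_crawl(lambda _id: next(responses), "abc123", interval=0.01)
print(final["status"])  # completed
```

A fixed interval keeps the sketch simple; for long crawls, subscribing to webhook events avoids polling altogether.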
Stop a crawl job that is currently in progress.
Parameters:
The crawl job ID returned by start_crawl
The Firecrawl MCP Server provides resource templates for accessing scraped content and crawl job data through a URI-based interface.
Access the scraped content of a specific URL.
URI Template: firecrawl://scraped/{url}
Use this resource to retrieve previously scraped content for a given URL. The {url} segment should be URL-encoded (percent-encoded).
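A minimal sketch of building a scraped-content resource URI, assuming that "encoded" means percent-encoding the whole page URL into a single path segment, so that its own `/`, `:` and `?` characters are escaped. The helper name is hypothetical; `urllib.parse.quote` is from Python's standard library.

```python
from urllib.parse import quote


def scraped_resource_uri(page_url: str) -> str:
    """Build a firecrawl://scraped/{url} URI with the page URL percent-encoded."""
    # safe="" also encodes "/" and ":", so the whole URL becomes one segment
    return "firecrawl://scraped/" + quote(page_url, safe="")


print(scraped_resource_uri("https://example.com/docs?page=2"))
# firecrawl://scraped/https%3A%2F%2Fexample.com%2Fdocs%3Fpage%3D2
```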
Access information about a specific crawl job including its status and metadata.
URI Template: firecrawl://crawl/{crawlId}
Retrieve comprehensive information about a crawl job, including its current state, configuration, and summary statistics.
Access all pages discovered and scraped during a crawl job.
URI Template: firecrawl://crawl/{crawlId}/pages
Get the complete collection of pages from a crawl job, including their content in the requested formats.
Access a specific page from a crawl job by its index position.
URI Template: firecrawl://crawl/{crawlId}/page/{pageIndex}
Retrieve an individual page from a crawl job's results using its zero-based index.
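The three crawl templates differ only in their suffix. A small sketch, with hypothetical helper names and a placeholder crawl ID, that also guards against the off-by-one error a zero-based page index invites; it assumes the caller already knows the page count (for example from the job's own resource).

```python
def crawl_uri(crawl_id: str) -> str:
    """firecrawl://crawl/{crawlId}: job status, configuration, and summary statistics."""
    return f"firecrawl://crawl/{crawl_id}"


def crawl_pages_uri(crawl_id: str) -> str:
    """firecrawl://crawl/{crawlId}/pages: every page collected by the job."""
    return f"{crawl_uri(crawl_id)}/pages"


def crawl_page_uri(crawl_id: str, page_index: int, page_count: int) -> str:
    """firecrawl://crawl/{crawlId}/page/{pageIndex} with a zero-based index.

    Valid indices run from 0 to page_count - 1.
    """
    if not 0 <= page_index < page_count:
        raise IndexError(f"page index {page_index} out of range for {page_count} pages")
    return f"{crawl_uri(crawl_id)}/page/{page_index}"


# With 10 crawled pages, the last one is index 9 (index 10 raises IndexError)
print(crawl_page_uri("abc123", 9, page_count=10))
# firecrawl://crawl/abc123/page/9
```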
Extract web content in the format that best suits your needs. Convert web pages to clean markdown for LLM consumption, preserve HTML structure for parsing, capture visual screenshots, or extract structured data using custom JSON schemas with AI-powered extraction.
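MCP clients invoke tools with a JSON-RPC `tools/call` request that carries the tool name and its arguments. The sketch below builds such a request for scrape_url. The format names are the ones listed for scrape_url, but the argument keys `url` and `formats` are assumptions modeled on Firecrawl's scrape API, not the server's confirmed input schema.

```python
import json

# Output format names listed for scrape_url in this document
KNOWN_FORMATS = {"markdown", "html", "rawHtml", "links", "screenshot", "summary", "json"}


def scrape_request(request_id: int, url: str, formats: list[str]) -> str:
    """Build an MCP tools/call request for scrape_url.

    The argument keys "url" and "formats" are assumptions, not the
    server's confirmed input schema.
    """
    unknown = set(formats) - KNOWN_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_url",
            "arguments": {"url": url, "formats": formats},
        },
    }
    return json.dumps(message)


# Markdown for an LLM plus the page's links for follow-up crawling
print(scrape_request(1, "https://example.com", ["markdown", "links"]))
```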
Perform complex interactions with web pages before scraping. Click buttons, fill forms, scroll to load dynamic content, wait for elements to appear, and execute custom JavaScript. These actions enable scraping of JavaScript-heavy applications and content behind interactions.
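As an illustration, the action list below fills in a search box, submits it, and scrolls before capturing content. The action type names (`wait`, `click`, `write`, `press`, `scroll`, `scrape`) appear in the scrape_url parameter list; the fields on each action (`selector`, `text`, `key`, `direction`, `milliseconds`) and the `actions` argument key are assumptions modeled on Firecrawl's scrape API, and the CSS selector and URL are hypothetical.

```python
import json


def search_actions(query: str) -> list[dict]:
    """Actions that fill a search box, submit it, and load more results before scraping."""
    box = "input[name=q]"  # hypothetical CSS selector for the search field
    return [
        {"type": "wait", "selector": box},        # wait until the search box exists
        {"type": "click", "selector": box},       # focus the field
        {"type": "write", "text": query},         # type the query
        {"type": "press", "key": "ENTER"},        # submit the form
        {"type": "wait", "milliseconds": 2000},   # give results time to render
        {"type": "scroll", "direction": "down"},  # trigger lazy-loaded items
        {"type": "scrape"},                       # capture content at this point
    ]


arguments = {
    "url": "https://example.com/search",
    "formats": ["markdown"],
    "actions": search_actions("web scraping"),
}
print(json.dumps(arguments, indent=2))
```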
Use AI-powered extraction to get only the content you need. The onlyMainContent option removes navigation, footers, and sidebars automatically. Custom JSON schemas with prompts allow you to extract specific structured data points using natural-language instructions.
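A sketch of scrape_url arguments that combine onlyMainContent with schema-guided JSON extraction for a hypothetical product page. The schema itself is ordinary JSON Schema; the shape of the `json` format entry (`type`, `schema`, `prompt`) and the placement of `onlyMainContent` as a top-level argument are assumptions modeled on Firecrawl's scrape API.

```python
import json

# JSON Schema for the fields to extract from a hypothetical product page
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}


def extraction_arguments(url: str) -> dict:
    """scrape_url arguments requesting schema-guided JSON extraction (assumed shape)."""
    return {
        "url": url,
        "onlyMainContent": True,  # drop navigation, footers, and sidebars
        "formats": [
            {
                "type": "json",
                "schema": PRODUCT_SCHEMA,
                # Natural-language guidance for the extraction model
                "prompt": "Extract the product name, current price in USD, and stock status.",
            }
        ],
    }


print(json.dumps(extraction_arguments("https://shop.example.com/item/42"), indent=2))
```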
Systematically crawl entire websites with sophisticated control over what gets scraped. Use path filtering with regex patterns to include or exclude specific sections. Control crawl depth, handle subdomains, and manage concurrency for efficient data collection. Real-time webhook notifications keep you informed of progress.
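To preview which URLs a set of include and exclude patterns would keep, a local approximation can help. The semantics here are assumptions, not the server's implementation: patterns are matched against the URL path with `re.search`, a path must match at least one include pattern (when any are given), and any exclude match rejects it. The example URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def should_crawl(url: str, include: list[str], exclude: list[str]) -> bool:
    """Approximate regex path filtering under the assumed semantics above."""
    path = urlparse(url).path or "/"
    if include and not any(re.search(p, path) for p in include):
        return False
    return not any(re.search(p, path) for p in exclude)


candidates = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/tag/python",
    "https://example.com/pricing",
]
# Keep blog posts, but skip tag listing pages
print([u for u in candidates if should_crawl(u, [r"^/blog/"], [r"^/blog/tag/"])])
# ['https://example.com/blog/post-1']
```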
Choose a proxy type to suit the target site: basic for speed, stealth for reliability on sites with anti-bot defenses, or auto to fall back to stealth automatically when basic fails. Enable zero data retention for sensitive operations. Use caching to avoid redundant requests. Block ads and cookie popups for cleaner extraction and faster processing.
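The auto proxy option is an automatic fallback. Assuming it behaves as Firecrawl documents it (try the faster basic proxy first, then retry with stealth if that fails), the logic looks roughly like this. `fetch` and `ScrapeError` are hypothetical stand-ins; with auto selected, this retry would happen server-side rather than in your client.

```python
from typing import Callable


class ScrapeError(Exception):
    """Raised by the hypothetical fetch function when a request fails or is blocked."""


def scrape_with_fallback(fetch: Callable[[str, str], str], url: str) -> tuple[str, str]:
    """Try the basic proxy first, then retry once with stealth.

    Returns (proxy_used, content). A stealth failure propagates to the caller.
    """
    try:
        return "basic", fetch(url, "basic")
    except ScrapeError:
        return "stealth", fetch(url, "stealth")


def fake_fetch(url: str, proxy: str) -> str:
    # Simulates a site whose anti-bot check rejects basic proxies
    if proxy == "basic":
        raise ScrapeError("blocked by anti-bot check")
    return f"<content of {url}>"


print(scrape_with_fallback(fake_fetch, "https://example.com"))
# ('stealth', '<content of https://example.com>')
```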
Scrape as if you're browsing from different countries with location settings. Emulate mobile devices to see mobile-optimized content. Set custom headers to match specific browser configurations.
This MCP server excels at research and data collection tasks. Use it to monitor competitor websites, aggregate news and articles, extract product information from e-commerce sites, collect real estate listings, gather job postings, archive web content, validate web page changes, or build datasets for machine learning. The combination of single-page scraping and site-wide crawling makes it suitable for both targeted extraction and comprehensive data harvesting operations.
The structured data extraction with custom schemas is particularly useful for transforming unstructured web content into clean, typed data that applications or analysis pipelines can consume directly. The browser automation capabilities enable scraping of modern single-page applications, which scrapers that only fetch the initial HTML cannot handle.