OnPagePilot Bots
OnPagePilot operates three automated agents that access external websites on behalf of our users. Each agent serves a distinct purpose: site crawling, technical security auditing, and technology detection. All three operate exclusively on domains registered by their owners on our platform.
Our Bots
OnPagePilot Crawler
- User-agent string
- OnPagePilot Crawler/2.0 (+https://onpagepilot.com/bot)
- Robots.txt token
- OnPagePilot
- Obeys robots.txt
- Yes, fully RFC 9309 compliant
- Obeys Crawl-delay
- Yes (default: 500 ms between requests)
- Request method
- HTTP GET; headless Chromium when JavaScript rendering is required
- Purpose
- Technical SEO audits, site crawling, indexing analysis, and link discovery for owner-registered domains.
The OnPagePilot Crawler is our primary site-crawling agent. It visits websites that have been registered by their owners on our platform and collects publicly available information such as page structure, meta tags, link relationships, and technical SEO signals to power our analysis tools.
OnPagePilot Technical Auditor
- Request method
- HTTP HEAD and GET probes
- Obeys robots.txt
- N/A. Targets specific URLs registered in IndexMonitor; does not spider
- Purpose
- HTTPS/SSL certificate validation, HSTS header checks, HTTP-to-HTTPS redirect detection, and Lighthouse performance and accessibility audits.
The Technical Auditor performs targeted security and performance probes on URLs registered by site owners in our IndexMonitor system. It does not crawl or discover new pages. It checks specific endpoints for correct HTTPS configuration and security headers, and runs Lighthouse audits to measure page performance and accessibility scores.
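Site owners who want to run a similar HSTS check on their own responses could start from the minimal Python sketch below. It is not the Technical Auditor's implementation: the function names `parse_hsts` and `hsts_is_enabled` are hypothetical, and the only rule assumed is the standard one that a `Strict-Transport-Security` header with a positive `max-age` enables HSTS.

```python
from typing import Dict, Mapping


def parse_hsts(value: str) -> Dict[str, str]:
    """Split a Strict-Transport-Security value into lower-cased directives."""
    directives: Dict[str, str] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Valueless directives such as includeSubDomains map to "".
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def hsts_is_enabled(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers enable HSTS (max-age > 0)."""
    value = next(
        (v for k, v in headers.items()
         if k.lower() == "strict-transport-security"),
        None,
    )
    if value is None:
        return False  # header missing entirely
    max_age = parse_hsts(value).get("max-age", "")
    # max-age=0 tells browsers to drop the HSTS policy, so it does not count.
    return max_age.isdigit() and int(max_age) > 0
```

Passing the response headers of an HTTPS request as a plain mapping is enough; the header lookup is case-insensitive.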
OnPagePilot Technology Scanner
- Request method
- Headless Chromium (Playwright)
- Obeys robots.txt
- N/A. Single-page visits initiated by the site owner; does not spider
- Purpose
- Technology stack detection and screenshot capture when a site owner adds or updates a URL monitor.
The Technology Scanner is triggered when a user adds or updates a URL monitor on our platform. It performs a single-page visit using a headless browser to detect the technology stack in use (CMS, frameworks, analytics tools, etc.) and captures a visual screenshot. It does not follow links or crawl additional pages beyond the monitored URL.
Verification
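You can identify requests from the OnPagePilot Crawler by checking the User-Agent header. Because any client can set this header to an arbitrary value, a match indicates that a request claims to be ours but does not prove it. All legitimate crawler requests will contain: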
OnPagePilot Crawler/2.0 (+https://onpagepilot.com/bot)
The Technical Auditor and Technology Scanner make targeted single-URL requests to domains registered by their owners and do not carry a custom user-agent token.
If you have concerns about traffic claiming to be from OnPagePilot, please contact us.
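As a rough illustration, the header check described above could be scripted against server logs or inside request middleware. The Python sketch below is hypothetical: the function name `claims_onpagepilot_crawler` is ours, and accepting any version number (not only 2.0) is an assumption made so the check keeps working if the version changes.

```python
import re
from typing import Optional

# Built from the User-Agent string published on this page.
# Accepting any dotted version number is an assumption, not a documented rule.
CRAWLER_UA = re.compile(
    r"OnPagePilot Crawler/\d+(?:\.\d+)* \(\+https://onpagepilot\.com/bot\)"
)


def claims_onpagepilot_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the header contains the published crawler string."""
    return bool(user_agent) and CRAWLER_UA.search(user_agent) is not None
```

A positive result only means the request presents the crawler's User-Agent string; it says nothing about the Technical Auditor or Technology Scanner, which do not carry a custom token.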
Benefits for Site Owners
Our bots help site owners maintain and improve their online presence by providing data for:
- Technical SEO Audits: the Crawler identifies broken links, missing meta tags, redirect chains, duplicate content, and other on-page issues across your entire site.
- Site Structure Analysis: the Crawler maps internal linking patterns and site architecture to reveal navigation bottlenecks and orphan pages.
- Indexing Insights: crawl data is compared against search engine behaviour to highlight pages that may be under-indexed or excluded.
- Security & HTTPS Validation: the Technical Auditor verifies SSL certificates, checks for missing HSTS headers, and detects faulty HTTP-to-HTTPS redirects before they affect visitors.
- Performance & Accessibility Scoring: Lighthouse audits run by the Technical Auditor measure page speed, accessibility compliance, and Core Web Vitals.
- Technology Detection: the Technology Scanner identifies your CMS, JavaScript frameworks, analytics tools, and other technologies so you can benchmark your stack.
Policies and Commitments
- Respect for robots.txt: our Crawler fully complies with the Robots Exclusion Protocol (RFC 9309), including Disallow and Allow rules, and also honours the Crawl-delay directive, which is a common extension outside the RFC.
- Minimal server impact: the Crawler enforces a default delay of 500 ms between requests and automatically backs off when a server responds slowly or returns errors. The Technical Auditor and Technology Scanner make infrequent, single-URL requests.
- Owner-initiated only: all three bots operate exclusively on domains that have been registered on our platform by their owners or authorised representatives. We never crawl, probe, or scan sites without owner consent.
- No personal data collection: we process only publicly accessible page content. None of our bots attempt to access password-protected areas, submit forms, or collect personal data.
- Transparent identification: the Crawler identifies itself with a unique User-Agent string that includes a link back to this page. The Technical Auditor and Technology Scanner target only owner-registered URLs.
- GDPR compliant: our data processing complies with the EU General Data Protection Regulation. See our Legal & Privacy Policy for details.
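To make the "minimal server impact" commitment more concrete, here is one way a delay-with-backoff policy of that kind could look in Python. This is an illustrative sketch, not the Crawler's actual algorithm: only the 500 ms default comes from this page, while the doubling strategy, the `SLOW_RESPONSE_S` and `MAX_DELAY_S` values, the choice of status codes, and the function name `next_delay` are all assumptions.

```python
from typing import Optional

DEFAULT_DELAY_S = 0.5   # 500 ms default stated in the policy above
SLOW_RESPONSE_S = 2.0   # hypothetical threshold for a "slow" response
MAX_DELAY_S = 60.0      # hypothetical upper bound on the backoff


def next_delay(current_delay_s: float, status_code: int,
               response_time_s: float,
               crawl_delay_s: Optional[float] = None) -> float:
    """Return the pause before the next request to the same host."""
    # Assumption: a Crawl-delay from robots.txt replaces the default.
    base = DEFAULT_DELAY_S if crawl_delay_s is None else crawl_delay_s
    struggling = (status_code == 429 or status_code >= 500
                  or response_time_s > SLOW_RESPONSE_S)
    if struggling:
        # Double the delay, but never drop below the base or exceed the cap.
        cap = max(MAX_DELAY_S, base)
        return min(max(current_delay_s, base) * 2, cap)
    # Healthy response: fall back to the base delay.
    return base
```

For example, under these assumptions a 503 response at the default pace raises the delay from 0.5 s to 1.0 s, and the next healthy response resets it to the base.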
Controlling Bot Behaviour
The OnPagePilot Crawler respects standard robots.txt directives. You can control its behaviour using the OnPagePilot token:
Block the OnPagePilot Crawler entirely
User-agent: OnPagePilot
Disallow: /
Block specific directories
User-agent: OnPagePilot
Disallow: /private/
Disallow: /staging/
Set a custom crawl delay
User-agent: OnPagePilot
Crawl-delay: 2
Changes to your robots.txt are typically picked up within 24 hours.
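Before publishing new rules, it can help to check how a parser reads them. The sketch below uses Python's standard-library `urllib.robotparser` with the `OnPagePilot` token; the `example.com` URLs and the helper name `parse_rules` are placeholders. The standard-library parser is only an approximation (it does not implement every detail of RFC 9309), so the Crawler's own parser may differ on edge cases such as overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Example rules combining the snippets shown above.
ROBOTS_TXT = """\
User-agent: OnPagePilot
Disallow: /private/
Disallow: /staging/
Crawl-delay: 2
"""


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_rules(ROBOTS_TXT)

# A blocked directory, an unrestricted path, and the configured delay.
print(rules.can_fetch("OnPagePilot", "https://example.com/private/report.html"))  # False
print(rules.can_fetch("OnPagePilot", "https://example.com/blog/"))                # True
print(rules.crawl_delay("OnPagePilot"))                                           # 2
```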
Technical Auditor and Technology Scanner
These two agents do not spider your site and are not governed by robots.txt. They only visit specific URLs that the site owner has registered on our platform. To stop these requests, remove the corresponding monitors from your OnPagePilot account or contact us.
For urgent requests
Our bot uses a mix of regular expressions and machine learning, and it can occasionally make mistakes.
Contact Us