
hOpauto
hOpauto is a 100% online, multi-brand automotive solution built to simplify the car-buying experience across France. Backed by over 30 years of automotive expertise from Espace 3 and its network of Nissan dealerships, hOpauto was created to modernize and streamline vehicle purchasing. The brand was born from a simple insight: buying a car can be overwhelming—from choosing among countless makes and models to navigating the financial and administrative hurdles. hOpauto reimagines this process, offering a digital-first approach that delivers convenience, transparency, and trust to today's drivers.
🛠️ Case Study: Screaming Frog Crawl for a 5M+ Page Automotive Website
Client: François Lebreton – Hopauto.com
Date: April–May 2024
Service: Technical SEO Crawl Audit (Screaming Frog)
🧩 Challenge:
François reached out via a service request asking for a full Screaming Frog crawl of his automotive website, Hopauto.com. As stated, the site contains a high number of dynamically generated pages, and he needed a complete list of indexable URLs — those not excluded by the site's robots.txt.
Initial expectations were that the site would have ~10,000 URLs.
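To illustrate that requirement, here is a minimal sketch of how a URL list can be checked against robots.txt using Python's standard-library robotparser. The sample URLs and user agent string are placeholders, not Hopauto's actual data or configuration.

```python
# Minimal sketch: keep only URLs that robots.txt allows a crawler to fetch.
# The robots.txt location, sample URLs, and user agent are illustrative.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://www.hopauto.com/robots.txt")
robots.read()

candidate_urls = [
    "https://www.hopauto.com/catalogue/",          # hypothetical listing page
    "https://www.hopauto.com/f/prix/10000-15000",  # hypothetical facet page
]

allowed = [u for u in candidate_urls
           if robots.can_fetch("Screaming Frog SEO Spider", u)]
print(f"{len(allowed)} of {len(candidate_urls)} URLs are crawlable per robots.txt")
```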
🔍 Discovery & Findings:
Upon launching the crawl:
- Screaming Frog quickly uncovered a massive site structure, identifying over 4 million internal URLs.
- Further analysis revealed high crawl volume from filtered and faceted URLs, even with robots.txt exclusions in place.
- François confirmed use of several URL parameters and facet filters, and requested that these be excluded (e.g. /km/, /prix/, /sieges/, and /p/ parameter pages).
🛠️ Solution & Technical Response:
To handle the unexpected scale and ensure accuracy, I implemented:
✅ Updated Configuration:
- Respected robots.txt rules using Screaming Frog's setting to exclude disallowed paths.
- Created custom regex exclusions for parameterized URLs (e.g., .*\/p\/.*) and unnecessary facets (a sketch of this filtering follows below).
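For illustration, here is a minimal Python sketch of the kind of regex-based exclusion applied in the crawl configuration. The pattern list mirrors the facets named in the brief; the exact rules used on the live project may have been more extensive.

```python
import re

# Illustrative exclusion patterns modelled on the facets named in the brief;
# the production Screaming Frog "Exclude" rules may have differed.
EXCLUDE_PATTERNS = [
    r".*/p/.*",       # parameter / pagination pages
    r".*/km/.*",      # mileage facet
    r".*/prix/.*",    # price facet
    r".*/sieges/.*",  # seat-count facet
]
EXCLUDE_RE = [re.compile(p) for p in EXCLUDE_PATTERNS]

def is_excluded(url: str) -> bool:
    """Return True when a URL matches any exclusion pattern."""
    return any(rx.search(url) for rx in EXCLUDE_RE)

print(is_excluded("https://www.hopauto.com/f/prix/10000-15000"))     # True
print(is_excluded("https://www.hopauto.com/catalogue/nissan-juke"))  # False
```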
✅ Segmentation into Multiple Crawls:
- Proposed dividing the site into crawl segments such as /catalogue/, /achat/, and /f/, allowing for focused diagnostics.
- Flagged the impracticality of using Excel for datasets exceeding 1 million rows and suggested using BigQuery, Looker Studio, or database tools for analysis (a chunked-analysis sketch follows this list).
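As a sketch of that alternative to Excel, the snippet below streams a Screaming Frog CSV export in chunks with pandas and counts URLs per segment. The file name and column name are assumptions (the "Internal: All" export is commonly internal_all.csv with an "Address" column) and should be verified against your own export before running.

```python
import pandas as pd

# Assumed export file and column name; verify against your own Screaming Frog export.
SEGMENTS = ("/catalogue/", "/achat/", "/f/")
counts = {segment: 0 for segment in SEGMENTS}

# Stream the export in 100k-row chunks so a multi-million-row file never has
# to fit into memory (or into Excel's ~1M-row limit).
for chunk in pd.read_csv("internal_all.csv", usecols=["Address"], chunksize=100_000):
    for segment in SEGMENTS:
        counts[segment] += int(
            chunk["Address"].str.contains(segment, regex=False, na=False).sum()
        )

print(counts)
```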
✅ Ongoing Client Communication:
- Provided visual progress reports, crawl stats, and crawl status updates.
- Highlighted the cost, time, and computing resource implications of crawling 5M+ pages.
- Asked for clarification on business goals to tailor the crawl output to actionable segments.
📊 Results:
- Over 700,000 pages crawled within the first 48 hours.
- Identified multiple areas for crawl budget optimization and potential indexing bloat.
- Despite limited final communication, the project surfaced valuable insights into site structure, crawl efficiency, and dynamic URL handling at scale.
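For a sense of scale, here is a rough back-of-envelope projection from the first-48-hours figure above, assuming the early crawl rate held constant (which real crawls rarely do):

```python
# Rough projection only: assumes the first-48-hour crawl rate stays constant,
# ignoring server throttling, memory limits, and crawl-delay settings.
crawled_urls = 700_000       # pages crawled in the first 48 hours
elapsed_hours = 48
projected_total = 5_000_000  # approximate full site size

rate_per_hour = crawled_urls / elapsed_hours             # ~14,600 URLs/hour
remaining_hours = (projected_total - crawled_urls) / rate_per_hour
print(f"~{rate_per_hour:,.0f} URLs/hour -> ~{remaining_hours / 24:.0f} more days to finish")
```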
💡 Key Takeaways:
- Robust Technical SEO Requires Flexibility: When assumptions about site size are wrong, scalable strategies and tools are essential.
- Crawling ≠ Crawling Everything: Targeted segmentation based on business goals leads to clearer, more useful insights.
- Transparency Builds Trust: Frequent updates, technical transparency, and clear scope boundaries are vital when projects evolve beyond their initial scope.
🚀 Services Delivered:
- Strategic Segmentation Recommendations
- Full-site Screaming Frog Configuration
- robots.txt + Parameter Exclusion
- Regex Filtering for Custom Crawl Rules
- Large-Scale Crawl Handling Strategy
- Crawl Progress Reporting & Visual Logs
“I’m just amazed by the volume of the number of pages 😉”
— François Lebreton, Hopauto.com