hOpauto

Client
hOpauto
Services
SEO, Technical SEO, Crawl Audit
Team
François Lebreton

hOpauto is a 100% online, multi-brand automotive solution built to simplify the car-buying experience across France. Backed by over 30 years of automotive expertise from Espace 3 and its network of Nissan dealerships, hOpauto was created to modernize and streamline vehicle purchasing. The brand was born from a simple insight: buying a car can be overwhelming, from choosing among countless makes and models to navigating the financial and administrative hurdles. hOpauto reimagines this process, offering a digital-first approach that delivers convenience, transparency, and trust to today's drivers.

🛠️ Case Study: Screaming Frog Crawl for a 5M+ Page Automotive Website

Client: François Lebreton – Hopauto.com
Date: April–May 2024
Service: Technical SEO Crawl Audit (Screaming Frog)


🧩 Challenge:

François reached out via a service request asking for a full Screaming Frog crawl of his automotive website, Hopauto.com. He explained that the site contains a large number of dynamically generated pages, and that he needed a complete list of indexable URLs — those not excluded by the site's robots.txt.

Initial expectations were that the site would have ~10,000 URLs.


🔍 Discovery & Findings:

Upon launching the crawl:

  • Screaming Frog quickly uncovered a massive site structure, identifying over 4 million internal URLs.
  • Further analysis revealed high crawl volume from filtered and faceted URLs, even with robots.txt exclusions in place.
  • François confirmed use of several URL parameters and facet filters, and requested that these be excluded (e.g., /km/, /prix/, /sieges/, and paginated /p/ URLs).

🛠️ Solution & Technical Response:

To handle the unexpected scale and ensure accuracy, I implemented:

Updated Configuration:

  • Respected robots.txt rules using Screaming Frog’s setting to exclude disallowed paths.
  • Created custom regex exclusions for parameterized URLs (e.g., .*\/p\/.*) and unnecessary facets.
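The same filtering logic can be sketched in Python with the standard library. This is an illustrative reconstruction, not the actual project setup (the real exclusions were configured inside Screaming Frog): the robots.txt content, the sample URLs, and all regex patterns beyond the ones named above are assumptions.

```python
import re
from urllib.robotparser import RobotFileParser

# Regex exclusions mirroring the facets the client asked to drop
# (patterns are illustrative, based on /km/, /prix/, /sieges/ and /p/ pages).
EXCLUDE_PATTERNS = [
    re.compile(r".*/p/.*"),
    re.compile(r".*/km/.*"),
    re.compile(r".*/prix/.*"),
    re.compile(r".*/sieges/.*"),
]

# Hypothetical robots.txt content; in practice this is fetched from the live site.
robots_txt = """User-agent: *
Disallow: /recherche/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

def is_crawlable(url: str) -> bool:
    """Keep a URL only if robots.txt allows it and no exclusion regex matches."""
    if not rp.can_fetch("*", url):
        return False
    return not any(p.match(url) for p in EXCLUDE_PATTERNS)

# Sample URLs (invented for illustration).
urls = [
    "https://www.hopauto.com/catalogue/nissan-qashqai",
    "https://www.hopauto.com/achat/f/km/50000",
    "https://www.hopauto.com/recherche/?q=suv",
    "https://www.hopauto.com/catalogue/p/2",
]
kept = [u for u in urls if is_crawlable(u)]
```

Only the first URL survives: the second and fourth hit a regex exclusion, and the third is disallowed by robots.txt.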

Crawl Segmentation:

  • Proposed dividing the site into crawl segments like /catalogue/, /achat/, /f/, allowing for focused diagnostics.
  • Flagged that Excel cannot handle datasets beyond its 1,048,576-row limit and suggested BigQuery, Looker Studio, or database tools for analysis instead.
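When an export outgrows Excel's row limit, one lightweight alternative to a full database is to stream the CSV and aggregate by top-level path segment, which also doubles as a quick check of how URLs distribute across sections like /catalogue/ or /achat/. A minimal sketch, assuming a Screaming Frog internal export with an "Address" column (verify the column name against your export) and invented sample rows:

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse

def segment_counts(csv_file, url_column="Address"):
    """Stream a crawl export and count URLs per top-level path segment,
    without holding the full file in memory (a 4M+ row export far
    exceeds Excel's 1,048,576-row ceiling)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        path = urlparse(row[url_column]).path
        parts = path.split("/")
        # parts[0] is always "" for absolute paths; parts[1] is the first segment.
        segment = parts[1] if len(parts) > 1 and parts[1] else "(root)"
        counts[segment] += 1
    return counts

# Tiny stand-in for a multi-million-row export (rows invented for illustration).
sample = io.StringIO(
    "Address,Status Code\n"
    "https://www.hopauto.com/catalogue/nissan-qashqai,200\n"
    "https://www.hopauto.com/catalogue/nissan-juke,200\n"
    "https://www.hopauto.com/achat/occasion,200\n"
    "https://www.hopauto.com/f/km/50000,200\n"
)
counts = segment_counts(sample)
```

For a real multi-million-row export, pass an open file handle instead of the `io.StringIO` stand-in; the streaming approach keeps memory flat regardless of file size.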

Ongoing Client Communication:

  • Provided visual progress reports, crawl statistics, and status updates.
  • Highlighted the cost, time, and computing resource implications of crawling 5M+ pages.
  • Asked for clarification on business goals to tailor the crawl output to actionable segments.

📊 Results:

  • Over 700,000 pages crawled within the first 48 hours.
  • Identified multiple areas for crawl budget optimization and potential indexing bloat.
  • Despite limited final communication, the project surfaced valuable insights into site structure, crawl efficiency, and dynamic URL handling at scale.

💡 Key Takeaways:

  • Robust Technical SEO Requires Flexibility: When assumptions about site size are wrong, scalable strategies and tools are essential.
  • Crawling ≠ Crawling Everything: Targeted segmentation based on business goals leads to clearer, more useful insights.
  • Transparency Builds Trust: Frequent updates, technical transparency, and clear scope boundaries are vital when projects evolve beyond their initial scope.

🚀 Services Delivered:

Strategic Segmentation Recommendations

Full-site Screaming Frog Configuration

robots.txt + Parameter Exclusion

Regex Filtering for Custom Crawl Rules

Large-Scale Crawl Handling Strategy

Crawl Progress Reporting & Visual Logs

“I’m just amazed by the volume of the number of pages 😉”
François Lebreton, Hopauto.com
