Depth Limiting and Path Filtering in Lighthouse Parade

April 19, 2024

In case you missed it, last month we released Lighthouse Parade, a CLI tool to automatically run and aggregate Lighthouse performance reports across an entire site. One of the most requested features has been the ability to limit which pages are crawled. We’re excited to release Lighthouse Parade 1.1, which introduces three new flags to accommodate these use cases.

We can run lighthouse-parade with npx (no install required), and we will use cloudfour.com as our example site:

npx lighthouse-parade https://cloudfour.com

At a glance this doesn’t look like a large site, but when you consider all the blog posts and indexes, there are a lot of pages to run Lighthouse on, so it will take a while. We can reduce the number of pages that are crawled by limiting the crawl depth using the new --max-crawl-depth flag. Depth limiting controls how far the crawler traverses, measured in “clicks” from the start page. We’ll set it to two so that it crawls the home page and only pages that are linked directly from it:

npx lighthouse-parade https://cloudfour.com --max-crawl-depth 2

This speeds up the crawl considerably (only twelve pages get crawled). But maybe we want to crawl more pages than that. Let’s bump the crawl depth up to three and filter out blog posts (which have URLs like https://cloudfour.com/thinks/*). The new --exclude-path-glob flag lets us do that. Keep in mind that the glob must be quoted; otherwise your shell may try to expand it before lighthouse-parade ever sees it.

npx lighthouse-parade https://cloudfour.com --max-crawl-depth 3 --exclude-path-glob "/thinks/*"
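To build intuition for which paths a glob like "/thinks/*" covers, here’s a minimal sketch that stands in for the matching using the shell’s own case patterns. This is only an illustration; lighthouse-parade’s actual glob library may differ in edge cases.

```shell
# Illustrative stand-in for glob matching: a path under /thinks/ is excluded,
# anything else is still crawled. The function name is hypothetical.
is_excluded() {
  case "$1" in
    /thinks/*) echo "excluded" ;;
    *)         echo "crawled"  ;;
  esac
}

is_excluded "/thinks/my-first-post/"  # excluded
is_excluded "/contact/"               # crawled
```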

This works pretty well. It provides a broader picture of the site’s performance than simply limiting the depth to two (specifically, it covers more kinds of pages) without being slowed down by running Lighthouse on every single blog post.

This option is especially useful on e-commerce sites where you wouldn’t want Lighthouse to run on every single product page.
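For example, on a hypothetical store whose product pages live under a /products/ prefix (both the domain and the prefix here are assumptions, not real examples), you might exclude the individual product pages while still crawling category and landing pages:

```shell
# Hypothetical e-commerce example: skip individual product detail pages.
# shop.example.com and the /products/ prefix are placeholders.
npx lighthouse-parade https://shop.example.com --exclude-path-glob "/products/*"
```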

Going back to the cloudfour.com example, maybe we don’t want to limit the depth at all, but we still want to exclude blog posts. If we tried that, we’d see the crawler start to pick up sitemap pages like https://cloudfour.com/sitemap-pt-post-2020-12.html, as well as paginated links, so we’ll exclude those too by passing the --exclude-path-glob flag two more times:

npx lighthouse-parade https://cloudfour.com --exclude-path-glob "/thinks/*" --exclude-path-glob "/sitemap-*" --exclude-path-glob "**/page/*"

We’ll look at another example to show the last new flag, --include-path-glob. Maybe we want to run Lighthouse only on blog posts, so that we can see which posts have unoptimized images or other resources slowing them down. The --include-path-glob flag tells the crawler to ignore any URL that doesn’t match the specified glob:

npx lighthouse-parade https://cloudfour.com --include-path-glob "/thinks/*"

Another example is internationalized sites that use URL prefixes like /en/. The --include-path-glob flag can restrict Lighthouse to a single language version of the site, rather than auditing the same page once per translation.
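As a sketch, assuming a site whose translated pages are prefixed /en/, /fr/, /de/, and so on (the domain and prefixes are hypothetical), you could audit only the English pages:

```shell
# Hypothetical i18n example: only run Lighthouse on English-prefixed pages.
# example.com and the /en/ prefix are placeholders for your site's structure.
npx lighthouse-parade https://example.com --include-path-glob "/en/*"
```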

Combining these new features gives you fine-grained control over which pages are crawled. We hope that these new features are helpful, and feel free to leave feedback on GitHub!
