Sitemap URL Inspector

Inspect and validate a sitemap.xml (or sitemap index), including .xml.gz sitemaps. Follow redirects, parse up to a configurable number of URLs, highlight common SEO/crawler issues, and export JSON/PDF reports.

About Sitemap URL Inspector

A clean sitemap helps search engines discover, crawl, and understand your URLs efficiently. This tool fetches a sitemap URL, supports redirects and gzipped sitemaps, parses entries (including sitemap indexes), and surfaces common problems such as invalid structure, missing <loc>, suspicious <lastmod>, and other crawler pitfalls. Export the results as JSON/PDF to track fixes over time.

Features

  • Parse standard sitemaps and sitemap indexes (sitemap-of-sitemaps).
  • Supports gzipped sitemaps (.xml.gz) for real-world large sites.
  • Optional redirect following to audit the final fetched sitemap URL.
  • Configurable parsing limit (max URLs to parse) to keep audits fast and predictable.
  • Validates core sitemap fields and highlights missing/invalid tags (especially <loc>).
  • Extracts and reviews <lastmod> usage for consistency and crawler friendliness.
  • Helps spot sitemap patterns relevant to multi-locale SEO (e.g., URL grouping and hints for hreflang strategies).
  • Copyable findings and summaries for SEO tickets and debugging.
  • Export reports as JSON or PDF for documentation, sharing, and regression tracking.

🧭 How to use Sitemap URL Inspector

1. Paste your sitemap URL

Enter the full sitemap URL. This can be a regular XML sitemap or a gzipped sitemap ending with .xml.gz.

2. Enable “Follow Redirects” if needed

If your sitemap URL redirects (http→https, non-www→www, CDN rewrites), enabling redirects ensures the tool fetches the final sitemap location.

3. Set “Max URLs to parse”

Choose how many URL entries to parse. Use smaller limits for quick checks, larger limits for deeper audits (up to the tool's cap).

4. Review validation results and URL stats

Look for structural issues (missing <loc>, invalid dates, unexpected formats) and any warnings that could affect crawling and indexing.

5. Export the report (JSON/PDF)

Download a JSON or PDF report to attach to SEO tasks, share with teammates, or compare before/after changes.

Technical specs

Supported inputs

The tool is designed to fetch and parse sitemaps served over HTTP(S), including compressed variants.

Input type         Example                                   Notes
XML sitemap        https://example.com/sitemap.xml           Parses <urlset> entries.
Sitemap index      https://example.com/sitemap_index.xml     Parses <sitemapindex> and nested sitemap URLs.
Gzipped sitemap    https://example.com/sitemap.xml.gz        Fetches and parses compressed sitemaps.
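
If you are unsure whether a URL serves a plain sitemap or a sitemap index, a quick command-line check of the root element is a rough approximation of what the parser looks at first (example.com is a placeholder URL):

curl -s https://example.com/sitemap.xml | grep -o -m 1 -E '<(urlset|sitemapindex)'

The first match tells you whether the file opens with <urlset> (URL entries) or <sitemapindex> (nested sitemaps).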

Fetch behavior and limits

Request behavior is tuned for predictable performance and crawler-like constraints.

Setting              Behavior                                                       Default
Follow Redirects     Follows redirects when fetching the sitemap URL                Enabled
Max Redirects        Maximum redirects followed when enabled                        10
Timeout              Request timeout budget                                         20000 ms
Max URLs to parse    Limits how many entries are parsed from the sitemap content    500 (range 10–5000)
User-Agent           Request identification header                                  Encode64Bot/1.0 (+https://encode64.com)
Private networks     Blocks private-network targets                                 Not allowed
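
As a rough command-line equivalent of these defaults (the tool's internal HTTP client may behave differently), you can reproduce the redirect cap, timeout, and User-Agent with curl:

curl -sL --max-redirs 10 --max-time 20 -A 'Encode64Bot/1.0 (+https://encode64.com)' -o /dev/null -w '%{http_code} %{url_effective}\n' https://example.com/sitemap.xml

This prints the final status code and URL after redirects, using the same redirect limit, 20-second timeout, and User-Agent listed above.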

What validation focuses on

The inspector prioritizes issues that commonly break sitemap ingestion or reduce crawl efficiency: missing/invalid <loc>, malformed XML structures, suspicious or inconsistent <lastmod>, and patterns that can confuse crawlers when sitemaps are generated incorrectly.
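
For quick manual spot checks of the same issue areas (placeholder URL; assumes curl, xmllint, and standard Unix tools are available):

curl -s https://example.com/sitemap.xml | xmllint --noout -

xmllint prints nothing and exits 0 if the XML is well-formed; otherwise it reports the parse errors.

curl -s https://example.com/sitemap.xml | grep -o '<lastmod>[^<]*</lastmod>' | sort | uniq -c | sort -rn | head

This tallies the distinct <lastmod> values, which makes identical or obviously bogus dates stand out.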

A sitemap can be valid XML but still low-quality for SEO. Use findings to improve clarity, consistency, and maintainability.

Command line

Use curl (or PowerShell) to debug sitemap fetching and redirects the same way crawlers do.

macOS / Linux

Fetch sitemap headers (no redirect)

curl -I https://example.com/sitemap.xml

Check status code, content-type, and caching headers.

Follow redirects and fetch headers

curl -IL https://example.com/sitemap.xml

Useful when a sitemap URL is redirected by CDN or HTTPS canonicalization.

Download sitemap content (preview)

curl -s https://example.com/sitemap.xml | head -n 40

Quickly inspect the XML prolog and root tags.
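
Count URL entries (rough)

curl -s https://example.com/sitemap.xml | grep -o '<loc>' | wc -l

A quick, approximate count of <loc> entries; exact figures come from the inspector's parsed stats.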

Inspect a gzipped sitemap (preview)

curl -s https://example.com/sitemap.xml.gz | gzip -dc | head -n 40

Decompress and preview the beginning of a .xml.gz sitemap.
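
Check response headers for a gzipped sitemap

curl -sI https://example.com/sitemap.xml.gz | grep -i '^content-'

Helps spot CDN/proxy issues where Content-Type or Content-Encoding is set incorrectly for .xml.gz files.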

Windows (PowerShell)

Download sitemap content

Invoke-WebRequest -Uri https://example.com/sitemap.xml | Select-Object -ExpandProperty Content

Fetches the XML body for quick inspection.

If your sitemap is huge, validate a representative subset first, then run larger parses to spot systemic generation issues.

Use cases

Validate a newly generated sitemap

Quickly verify that sitemap.xml is fetchable, well-formed, and contains correct URL entries.

  • Confirm your generator outputs valid XML structure
  • Catch missing <loc> values early

Audit gzipped sitemaps for crawler compatibility

Ensure compressed sitemaps are served correctly and parse cleanly.

  • Check .xml.gz content is readable and consistent
  • Spot CDN/proxy content-type issues

Debug redirect and canonicalization problems

Find unexpected redirects or non-200 responses that can block sitemap consumption.

  • http→https redirect chains
  • www vs non-www canonicalization

Track sitemap quality over time

Export reports and compare after releases, CMS migrations, or multi-locale expansions.

  • Before/after deploy regression checks
  • Monitor <lastmod> consistency after content updates
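
For example, two exported JSON reports (before.json and after.json are hypothetical file names here) can be normalized and compared with standard tools such as jq and diff:

diff <(jq -S . before.json) <(jq -S . after.json)

Sorting keys with jq -S keeps the diff focused on real changes between the two audits.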

❓ Frequently Asked Questions

What's the difference between a sitemap and a sitemap index?

A sitemap lists URLs directly (usually under <urlset>). A sitemap index lists multiple sitemap files (under <sitemapindex>), which is common for large sites.
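
For reference, minimal (illustrative) examples of each look like this:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page-1</loc></url>
</urlset>

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
</sitemapindex>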

Should my sitemap include <lastmod>?

It's optional, but useful if it's accurate and consistently formatted (W3C Datetime, e.g. 2024-05-01 or 2024-05-01T12:30:00+00:00). Incorrect or constantly changing values can reduce trust and may not help crawling.

Why would a sitemap be ignored by crawlers?

Common reasons include fetch errors (non-200), blocked access, invalid XML structure, missing required tags such as <loc>, incorrect content type, or redirect loops.

Is it OK if my sitemap redirects?

Usually yes, but it's better to submit and publish the final canonical sitemap URL to reduce crawler overhead and avoid accidental breakage.

Can this tool check every URL in the sitemap for status codes?

This inspector focuses on parsing and validating the sitemap and extracting stats. Use a dedicated URL status checker or crawler if you want to fetch and validate every listed URL.

Does this tool support multi-locale / hreflang sitemaps?

It's designed to help spot patterns relevant to multi-locale SEO. If you publish alternate-language URLs, ensure your sitemap structure and URL grouping are consistent with your hreflang strategy.

Pro Tips

Best Practice

Submit the final canonical sitemap URL in Search Console (avoid relying on redirects).

Best Practice

For very large sites, split sitemaps and use a sitemap index. Keep each sitemap within protocol limits (50,000 URLs and 50 MB uncompressed per file) and at an operationally manageable size.

Best Practice

Use <lastmod> only if it's accurate. Don't update it for every deploy if the page content didn't change.

Best Practice

If you have multi-locale URLs (like /fr/, /en/), ensure your sitemap generation is consistent across locales so crawlers don't see partial coverage.

Best Practice

Export JSON/PDF after major releases so you have evidence for debugging Search Console indexing swings.
