McGarrah Technical Blog

Jekyll SEO: Fixing Canonical URLs and Google Search Console Issues

8 min read

After running this Jekyll blog for a couple of years, Google Search Console revealed several SEO issues that were impacting indexing and search visibility. This post documents the systematic approach to resolving canonical URL conflicts, redirect warnings, and 404 errors that commonly affect Jekyll sites on GitHub Pages.

The Problem: Google Search Console Warnings

Google Search Console flagged several critical issues affecting my site’s SEO performance:

  1. “Page with redirect” warnings for multiple URLs
  2. “Duplicate without user-selected canonical” errors
  3. “Not found (404)” errors for non-existent pages
  4. Canonical URL mismatches between Jekyll config and actual hosting

These issues were preventing proper indexing and potentially hurting search rankings.

Root Cause Analysis

Issue 1: Canonical URL Mismatch

The primary issue was a conflict between Jekyll configuration and GitHub Pages hosting behavior:

# _config.yml - BEFORE (problematic)
url: "https://www.mcgarrah.org"
canonical_url: "https://mcgarrah.org"  # Mismatch!

Problem: GitHub Pages was redirecting www.mcgarrah.orgmcgarrah.org at the hosting level, but Jekyll was generating canonical URLs pointing to www.mcgarrah.org. This created conflicting signals for search engines.

Issue 2: Testing Artifacts in Generated Site

The _site folder contained remnants from testing that included malformed URLs like /something/to(do from kramdown test files, causing 404 errors when crawled by Google.

Issue 3: Missing Canonical URL Implementation

While using the jekyll-seo-tag plugin, the canonical URL configuration wasn’t properly aligned with the actual serving domain.

The Solution: Systematic SEO Fixes

Step 1: Align Canonical URLs with Hosting Reality

First, I determined the actual canonical domain by testing browser behavior:

# Test actual redirect behavior
curl -I https://www.mcgarrah.org
# Result: 301 redirect to https://mcgarrah.org

Then updated _config.yml to match:

# _config.yml - AFTER (fixed)
url: "https://mcgarrah.org"
canonical_url: "https://mcgarrah.org"

Step 2: Clean Site Regeneration

Removed testing artifacts by completely rebuilding the site:

# Remove all generated files
rm -rf _site

# Clean regeneration
bundle exec jekyll build

This eliminated the problematic kramdown test URLs that were causing 404 errors.

Step 3: Implement Proper Canonical URL Tags

Enhanced the default layout with explicit canonical URL support:

<!-- _layouts/default.html -->
<head>
  <!-- Jekyll SEO Tag handles most meta tags -->
  {% seo %}
  
  <!-- Explicit canonical URL fallback -->
  {% unless page.canonical_url %}
    <link rel="canonical" href="{{ site.url }}{{ page.url }}" />
  {% endunless %}
</head>

Step 4: Update Robots.txt

Cleaned up robots.txt to remove blocks for URLs that are no longer generated:

# robots.txt - AFTER cleanup
User-agent: *
Allow: /

# Sitemap
Sitemap: https://mcgarrah.org/sitemap.xml

# Disallow crawling of admin/development files
Disallow: /_site/
Disallow: /vendor/
Disallow: /.jekyll-cache/
Disallow: /bin/
Disallow: /start-jekyll.sh
Disallow: /TODO.md
Disallow: /JMM-TODO.md

# Allow crawling of important content
Allow: /assets/
Allow: /categories/
Allow: /tags/
Allow: /archive/
Allow: /search/
Allow: /about/
Allow: /privacy/
Allow: /resume/

Verification and Testing

1. Canonical URL Consistency Check

Verified that all pages now have consistent canonical URLs:

# Check generated canonical URLs
grep -n "canonical" _site/index.html
# Result: <link rel="canonical" href="https://mcgarrah.org/" />

2. Sitemap Validation

Confirmed sitemap.xml points to the correct canonical domain:

<!-- Generated sitemap.xml -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://mcgarrah.org/</loc>
    <lastmod>2025-01-11T00:00:00+00:00</lastmod>
  </url>
  <!-- All URLs now use consistent domain -->
</urlset>

3. No More 404 Artifacts

Verified no problematic URLs remain in the generated site:

# Search for problematic patterns
find _site -name "*.html" -exec grep -l "something/to(do" {} \;
# Result: No matches (clean)

SEO Configuration Best Practices

Essential Jekyll SEO Plugin Setup

# _config.yml - Complete SEO configuration
plugins:
  - jekyll-seo-tag
  - jekyll-sitemap
  - jekyll-feed

# SEO Configuration
seo:
  type: Person
  name: "Michael McGarrah"
  links:
    - "https://www.linkedin.com/in/michaelmcgarrah/"
    - "https://github.com/mcgarrah"
    - "https://gitlab.com/mcgarrah"

# Social Media
twitter:
  username: mcgarrah
  card: summary_large_image

# Logo for structured data
logo: "/assets/images/logo.png"

# Consistent URL configuration
url: "https://mcgarrah.org"
baseurl: ""
canonical_url: "https://mcgarrah.org"

Post-Level SEO Optimization

# Front matter example for posts
---
title: "Article Title"
layout: post
categories: [web-development, technical]
tags: [jekyll, seo, github-pages]
excerpt: "Brief description for search results and social sharing."
description: "Longer meta description for search engines (150-160 characters)."
image: /assets/images/article-image.png
author: Michael McGarrah
date: 2025-12-31
last_modified_at: 2025-12-31
published: true
seo:
  type: BlogPosting
  date_published: 2025-12-31
  date_modified: 2025-12-31
---

Common Jekyll SEO Pitfalls to Avoid

1. Inconsistent Domain Configuration

Problem: Using different domains in url and canonical_url

# DON'T DO THIS
url: "https://www.example.com"
canonical_url: "https://example.com"  # Mismatch!

Solution: Test your actual hosting behavior and align configuration

2. Ignoring GitHub Pages Redirects

Problem: Not accounting for GitHub Pages’ automatic redirects

Solution: Use browser developer tools or curl to test actual redirect behavior

3. Leaving Testing Artifacts

Problem: Development/testing files getting into production builds

Solution: Regular clean rebuilds and proper .gitignore configuration

4. Missing Sitemap Submission

Problem: Not informing Google about sitemap location

Solution: Submit sitemap to Google Search Console and reference in robots.txt

Results and Impact

Google Search Console Improvements

After implementing these fixes:

SEO Performance Metrics

Monitoring and Maintenance

Regular SEO Health Checks

  1. Weekly: Monitor Google Search Console for new issues
  2. Monthly: Verify canonical URL consistency across sample pages
  3. Quarterly: Full site audit for SEO best practices
  4. After major changes: Complete rebuild and verification

Automated Monitoring

I’ve implemented a comprehensive SEO health check workflow that runs automatically. The complete workflow can be found at .github/workflows/seo-health-check.yml in the repository.

Key features of the automated monitoring:

# Excerpt from .github/workflows/seo-health-check.yml
name: SEO Health Check
on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly on Monday
  workflow_dispatch:
  push:
    branches: [main]
    paths:
      - '_config.yml'
      - '_layouts/**'
      - '_posts/**'
      - 'robots.txt'

jobs:
  seo-check:
    runs-on: ubuntu-latest
    steps:
      - name: Check canonical URLs consistency
      - name: Validate sitemap.xml
      - name: Check robots.txt
      - name: Check meta tags on sample pages
      - name: Check for 404 page
      - name: Validate feed.xml
      - name: Check for broken internal links
      - name: Generate SEO report

The workflow automatically:

Conclusion

Fixing canonical URL issues and SEO problems in Jekyll requires a systematic approach:

  1. Identify the root cause through testing actual hosting behavior
  2. Align configuration with reality, not assumptions
  3. Clean up artifacts from development and testing
  4. Implement proper meta tags using Jekyll SEO plugins
  5. Monitor and maintain SEO health over time

The key insight is that Jekyll configuration must match your actual hosting environment. GitHub Pages’ redirect behavior at the hosting level takes precedence over Jekyll configuration, so your canonical URLs must align with the final serving domain.

These fixes resulted in immediate improvements in Google Search Console metrics and organic search performance. Regular monitoring ensures SEO health is maintained as the site evolves.

References


This article documents the SEO fixes implemented on my Jekyll blog in January 2025. The complete source code and configuration can be found in my GitHub repository.

Categories: web-development, technical