The Technical GEO Setup Guide: Schema, Robots.txt, and AI Crawler Config

Zach Chmael

Head of Marketing

5 minutes

Copy-paste JSON-LD schema templates, robots.txt configs for every AI crawler, and the full technical checklist. Implementation guide, not theory.

TL;DR

🤖 Layer 1 — Robots.txt: Allow all AI crawlers (GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, Google-Extended). Copy-paste config included. 15 minutes.

📋 Layer 2 — Schema: Organization + Article + FAQPage JSON-LD. Sites with complete Tier 1 schema see ~40% more AI Overview appearances. Copy-paste templates for all three included. 2–4 hours initial setup.

Layer 3 — Performance: FCP under 0.4s = 3x citation probability. Compress images, remove unused JS, enable CDN. 1–4 hours.

📄 llms.txt: Emerging standard for AI communication. Template included. 15 minutes. Low-risk, potential upside.

Full checklist: 30+ items across robots.txt, schema, performance, and additional technical. Run it once, maintain monthly.

🔧 Averi handles per-page technical GEO (Article schema, FAQ schema, content structure) automatically during publishing.

Most GEO guides tell you to "implement schema markup" and "allow AI crawlers" without showing you exactly what to implement.

They describe the what. This guide provides the how — with copy-paste code you can deploy today.

Sites with complete Tier 1 schema see approximately 40% more AI Overview appearances.

Pages with schema markup are 2.8x more likely to be cited by ChatGPT.

Pages with FCP under 0.4 seconds are 3x more likely to be cited.

These aren't content improvements. They're infrastructure improvements that take a few hours to implement and benefit every page on your site permanently.

This guide covers three layers: AI crawler access (robots.txt), structured data (JSON-LD schema), and site performance (speed and technical health). Each section includes the exact code, where to place it, and how to verify it's working.

This is part of the Definitive Guide to Generative Engine Optimization (GEO). The pillar covers the full GEO framework.

This piece is the technical implementation layer.

Layer 1: AI Crawler Access (Robots.txt Configuration)

If AI crawlers can't access your content, they can't cite it. This is the most common technical GEO failure — and the easiest to fix.

AI Crawlers in 2026

Each AI platform operates one or more dedicated web crawlers. These crawlers function independently from Googlebot and Bingbot.

Allowing search engine crawlers does not automatically allow AI crawlers. They must be permitted separately.

| Crawler | User-Agent String | Platform | Purpose |
|---|---|---|---|
| GPTBot | GPTBot | OpenAI | ChatGPT training + search |
| OAI-SearchBot | OAI-SearchBot | OpenAI | ChatGPT Search (live retrieval) |
| ChatGPT-User | ChatGPT-User | OpenAI | ChatGPT browse mode |
| Google-Extended | Google-Extended | Google | Gemini / AI training |
| PerplexityBot | PerplexityBot | Perplexity | Perplexity search |
| ClaudeBot | ClaudeBot | Anthropic | Claude |
| Bytespider | Bytespider | ByteDance | TikTok AI |
| CCBot | CCBot | Common Crawl | Used by many AI systems |
| Amazonbot | Amazonbot | Amazon | Alexa / Amazon AI |
| FacebookBot | FacebookBot | Meta | Meta AI |
| Applebot-Extended | Applebot-Extended | Apple | Apple Intelligence |

The Recommended Robots.txt Configuration

For startups pursuing GEO, allow all AI crawlers. The citation benefit outweighs the content access concern.




Replace yourdomain.com with your actual domain.
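A minimal sketch of an allow-all configuration, generated in Python from the crawler names listed above. The helper name build_robots_txt and the sitemap location /sitemap.xml are assumptions for illustration, so adjust both to your site.

```python
# Sketch: build a robots.txt that explicitly allows the AI crawlers named in this guide.
# Assumption: the sitemap lives at /sitemap.xml on the given domain.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "Google-Extended",
    "PerplexityBot", "ClaudeBot", "Bytespider", "CCBot",
    "Amazonbot", "FacebookBot", "Applebot-Extended",
]


def build_robots_txt(domain: str, crawlers=AI_CRAWLERS) -> str:
    """Return robots.txt text with one Allow group per crawler plus a Sitemap line."""
    # Default group for every other crawler.
    lines = ["User-agent: *", "Allow: /", ""]
    # One explicit group per AI crawler, so a later edit to the default
    # group does not silently block them.
    for agent in crawlers:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines.append(f"Sitemap: https://{domain}/sitemap.xml")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("yourdomain.com"))
```

Running the script prints the file contents, which can be pasted into the robots.txt editor of whichever CMS you use.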

How to Implement

WordPress: Edit the robots.txt file through your SEO plugin (Yoast → Tools → File Editor, or Rank Math → General Settings → Edit robots.txt). Or edit the file directly at your site root via FTP/SFTP.

Webflow: Go to your project settings → SEO tab → Custom robots.txt. Paste the full configuration. Publish.

Framer: Add a robots.txt file through your site settings. Framer supports custom robots.txt content.

How to Verify

After updating, test with these steps:

  1. Visit yourdomain.com/robots.txt in your browser. Confirm the file displays correctly.

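  2. In Google Search Console, go to Settings → Crawl stats → Open report and check for crawl errors.

  3. Confirm that specific user-agents are allowed. Google has retired the standalone robots.txt Tester from the old Search Console interface. The robots.txt report under Settings shows whether Google fetched and parsed the file, and any standards-compliant robots.txt parser can test individual user-agents.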
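One way to check individual user-agents is Python's standard-library robots.txt parser. The sketch below is illustrative only: the helper name blocked_agents and the sample file are assumptions, and the agent list covers just the crawlers this guide treats as critical.

```python
from urllib.robotparser import RobotFileParser

# Crawlers this guide treats as critical for AI citations.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "PerplexityBot", "ClaudeBot", "Google-Extended",
]


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical file in which a plugin has added a GPTBot block.
sample = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

print(blocked_agents(sample, "https://yourdomain.com/blog/post"))  # prints ['GPTBot']
```

To test a live site instead of a pasted string, RobotFileParser can load the file itself through its set_url() and read() methods.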

The "Should I Block AI Crawlers?" Decision

Some publishers block AI crawlers to prevent training data scraping. This makes sense for large media companies protecting subscription content. For startups building visibility, blocking AI crawlers means:

  • ChatGPT can't cite your content (GPTBot/OAI-SearchBot blocked)

  • Perplexity can't cite your content (PerplexityBot blocked)

  • Gemini can't draw from your content (Google-Extended blocked)

ChatGPT drives 87.4% of all AI referral traffic. Blocking OpenAI's crawlers (GPTBot and OAI-SearchBot) eliminates your visibility in the dominant AI discovery channel.

For startups, the trade-off is clear: allow everything.

Layer 2: Structured Data (JSON-LD Schema)

Schema markup tells AI systems what your content is, who wrote it, and what entity it represents. Without schema, AI crawlers must infer this information. With schema, you declare it explicitly.

Tier 1: Essential Schema (Implement First)

These three schema types create the minimum viable structured data layer for GEO.

Organization Schema

Place this in the <head> of every page on your site (typically in your site-wide header template).

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://yourdomain.com/#organization",
  "name": "Your Company Name",
  "url": "https://yourdomain.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://yourdomain.com/logo.png",
    "width": 512,
    "height": 512
  },
  "description": "One sentence describing what your company does and who it serves.",
  "foundingDate": "2024",
  "founder": [
    {
      "@type": "Person",
      "name": "Founder Name"
    }
  ],
  "sameAs": [
    "https://www.linkedin.com/company/yourcompany",
    "https://twitter.com/yourcompany",
    "https://www.crunchbase.com/organization/yourcompany",
    "https://github.com/yourcompany",
    "https://www.youtube.com/@yourcompany"
  ],
  "knowsAbout": [
    "Your Primary Topic",
    "Your Second Topic",
    "Your Third Topic",
    "Your Fourth Topic",
    "Your Fifth Topic"
  ]
}
</script>

Customization guide:

  • @id: Use your domain + /#organization. This creates a persistent entity identifier.

  • sameAs: List every official profile URL. Each one strengthens entity recognition. Include LinkedIn, Twitter/X, Crunchbase, GitHub, YouTube, and any industry directories.

  • knowsAbout: List 5–8 topics your company has expertise in. These directly inform AI systems about your authority domain. Be specific: "Content Marketing for B2B SaaS Startups" is better than "Marketing."

  • foundingDate: Establishes entity age. Older entities have stronger recognition signals.

Article Schema

Place this in the <head> of every blog post or article page.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Title Here",
  "description": "Your meta description here.",
  "image": "https://yourdomain.com/images/article-featured-image.jpg",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "url": "https://yourdomain.com/about",
    "jobTitle": "Founder & CEO",
    "worksFor": {
      "@id": "https://yourdomain.com/#organization"
    }
  },
  "publisher": {
    "@id": "https://yourdomain.com/#organization"
  },
  "datePublished": "2026-04-09",
  "dateModified": "2026-04-09",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yourdomain.com/blog/your-article-slug"
  }
}
</script>

Critical fields:

  • datePublished and dateModified: Must reflect actual dates. Content freshness is a primary GEO signal. When you update an article, update dateModified. Don't update dateModified without making real content changes — Google's John Mueller has warned against this.

  • author with url: Links the article to a real person page with credentials. AI systems evaluate author authority as part of citation decisions. The author page should exist and include the person's bio, expertise, and other published work.

  • worksFor connecting to your Organization @id: This tells AI that the article author is part of the entity, strengthening the connection between the article, the author, and the organization.
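If your CMS does not generate Article schema for you, the template above can be filled in from post metadata. This is a sketch under stated assumptions: the function name article_jsonld, its parameters, and the /blog/ URL pattern are hypothetical, and ORG_ID must match the @id used in your Organization schema.

```python
import json
from datetime import date

# Assumption: the same identifier used in the site-wide Organization schema.
ORG_ID = "https://yourdomain.com/#organization"


def article_jsonld(*, headline: str, description: str, image: str, slug: str,
                   author_name: str, author_url: str, job_title: str,
                   published: date, modified: date) -> str:
    """Return a <script> tag holding Article JSON-LD for one post."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "image": image,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,
            "jobTitle": job_title,
            "worksFor": {"@id": ORG_ID},
        },
        "publisher": {"@id": ORG_ID},
        # date.isoformat() yields YYYY-MM-DD.
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "mainEntityOfPage": {
            "@type": "WebPage",
            "@id": f"https://yourdomain.com/blog/{slug}",
        },
    }
    # Escape "</" so a string value can never close the script tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Passing real date objects rather than strings keeps the output in ISO 8601 format, and the guard on modified catches a common copy-paste mistake.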

FAQPage Schema

Place this in the <head> of any page with an FAQ section. This is the highest-impact GEO schema.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is generative engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimization (GEO) is the practice of optimizing digital content to increase its visibility in responses generated by AI systems like ChatGPT, Perplexity, and Google AI Overviews. Unlike traditional SEO, which optimizes for ranking in a list of links, GEO optimizes for citation within an AI-generated answer."
      }
    },
    {
      "@type": "Question",
      "name": "Your second FAQ question here?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your 40-60 word answer here, self-contained and independently citable."
      }
    },
    {
      "@type": "Question",
      "name": "Your third FAQ question here?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your 40-60 word answer here."
      }
    }
  ]
}
</script>

Implementation notes:

  • Include every FAQ question from your on-page FAQ section. The schema should mirror the visible content exactly.

  • Each text field should contain the same self-contained answer that appears on the page. Don't put different content in the schema versus the visible page. Google's structured data guidelines require markup to reflect what users can see, and mismatches risk a manual action.

  • Add as many Question objects as you have FAQ items. 5–7 is the standard for long-form content.
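One way to keep the schema and the visible FAQ in sync is to generate both from the same list of question-and-answer pairs. The sketch below assumes hypothetical helper names (faq_jsonld, answers_outside_range) and uses the 40–60 word answer length from the template as a soft check.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build FAQPage JSON-LD from (question, answer) pairs shown on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question.strip(),
                "acceptedAnswer": {"@type": "Answer", "text": answer.strip()},
            }
            for question, answer in pairs
        ],
    }


def answers_outside_range(pairs, low: int = 40, high: int = 60) -> list[str]:
    """Return the questions whose answers fall outside the suggested word count."""
    return [q for q, a in pairs if not low <= len(a.split()) <= high]


if __name__ == "__main__":
    faqs = [("Your second FAQ question here?", "Your answer text here.")]
    print(json.dumps(faq_jsonld(faqs), indent=2))
    print(answers_outside_range(faqs))
```

Rendering the on-page FAQ from the same pairs means the two can never drift apart.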

Tier 2: Enhanced Schema (Implement When Ready)

These additional schema types strengthen GEO signals but aren't essential for getting started.

Person Schema (Author Page)

Create a dedicated author page (e.g., yourdomain.com/about/author-name) with Person schema:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Author Name",
  "url": "https://yourdomain.com/about/author-name",
  "jobTitle": "Founder & CEO",
  "worksFor": {
    "@id": "https://yourdomain.com/#organization"
  },
  "sameAs": [
    "https://www.linkedin.com/in/authorname",
    "https://twitter.com/authorname"
  ],
  "description": "Brief bio describing expertise and credentials.",
  "knowsAbout": ["Topic 1", "Topic 2", "Topic 3"]
}
</script>

This schema connects the author entity to external profiles and expertise areas. AI systems use this to evaluate whether the article author is a credible source on the topic.

HowTo Schema

For step-by-step guides and tutorials:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Set Up Technical GEO for Your Website",
  "description": "Step-by-step guide to implementing schema markup, robots.txt configuration, and AI crawler access for generative engine optimization.",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Configure robots.txt for AI crawlers",
      "text": "Allow GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended in your robots.txt file."
    },
    {
      "@type": "HowToStep",
      "name": "Implement Organization schema",
      "text": "Add JSON-LD Organization schema with @id, sameAs links, and knowsAbout topics to your site-wide header."
    }
  ]
}
</script>

How to Implement Schema on Each Platform

WordPress:

  • Install a schema plugin (Schema Pro, Rank Math, or Yoast SEO Premium)

  • Most plugins auto-generate Article schema from post metadata

  • Add Organization schema as a custom code snippet in your theme's <head> (via Appearance → Theme Editor → header.php, or use a plugin like WPCode)

  • For FAQPage schema: use Rank Math's FAQ block or add manually with WPCode

Webflow:

  • Add JSON-LD as custom code in Project Settings → Custom Code → Head Code (for site-wide Organization schema)

  • For page-specific Article and FAQ schema: add custom code in each page's settings → Custom Code → Inside <head> tag

Framer:

  • Add JSON-LD scripts in your site settings under Custom Code → Head

  • For page-specific schema: use the page-level custom code injection

How to Verify Schema

  1. Google Rich Results Test: Go to search.google.com/test/rich-results. Enter your URL. It shows which schema types are detected and flags any errors.

  2. Schema.org Validator: Go to validator.schema.org. Paste your JSON-LD code directly. It validates syntax and structure.

  3. Google Search Console → Enhancements: After implementation, GSC shows FAQ, Article, and other rich result eligibility across your site. Check for errors weekly for the first month after implementation.

Common schema errors to avoid:

  • Missing required fields (headline, author, datePublished for Article)

  • Mismatched content between schema text and visible page content

  • Invalid date formats (use ISO 8601: YYYY-MM-DD)

  • Broken URLs in sameAs or image fields

  • Nested schema referencing an @id that doesn't exist on the page
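Several of the errors above can be caught before publishing with a small script. This sketch assumes the page's JSON-LD blocks have already been parsed into Python dictionaries. The function names are hypothetical, and it covers only missing Article fields, date formats, and dangling @id references.

```python
from datetime import date

REQUIRED_ARTICLE_FIELDS = ("headline", "author", "datePublished")


def is_iso_date(value: str) -> bool:
    """Accept YYYY-MM-DD, optionally followed by a 'T...' time part."""
    try:
        date.fromisoformat(value[:10])
    except ValueError:
        return False
    return len(value) == 10 or value[10] == "T"


def collect_ids(node, defined: set, referenced: set) -> None:
    """Walk JSON-LD: a dict holding only '@id' is a reference, otherwise a definition."""
    if isinstance(node, dict):
        if "@id" in node:
            (referenced if set(node) == {"@id"} else defined).add(node["@id"])
        for value in node.values():
            collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)


def check_page(blocks: list[dict]) -> list[str]:
    """Return human-readable problems found in one page's JSON-LD blocks."""
    problems = []
    defined, referenced = set(), set()
    for block in blocks:
        collect_ids(block, defined, referenced)
        if block.get("@type") == "Article":
            for field in REQUIRED_ARTICLE_FIELDS:
                if field not in block:
                    problems.append(f"Article missing {field}")
            for field in ("datePublished", "dateModified"):
                value = block.get(field)
                if value is not None and not is_iso_date(str(value)):
                    problems.append(f"{field} is not ISO 8601: {value}")
    for dangling in sorted(referenced - defined):
        problems.append(f"@id referenced but not defined on page: {dangling}")
    return problems
```

It does not replace the Rich Results Test, which also checks eligibility rules that this sketch ignores.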

Layer 3: Site Performance for AI Citation

AI crawlers evaluate page speed when selecting sources. Slow pages get skipped even when content quality is high.

The Speed Benchmarks That Matter for GEO

Pages with First Contentful Paint under 0.4 seconds are 3x more likely to be cited by ChatGPT than pages above 1.13 seconds. Pages with INP scores of 0.4–0.5 seconds have 1.6x higher citation chances than those above 1 second.

Target benchmarks:

| Metric | Good | Needs Work | Poor |
|---|---|---|---|
| First Contentful Paint (FCP) | Under 0.4s | 0.4–1.0s | Over 1.0s |
| Largest Contentful Paint (LCP) | Under 2.5s | 2.5–4.0s | Over 4.0s |
| Interaction to Next Paint (INP) | Under 200ms | 200–500ms | Over 500ms |
| Cumulative Layout Shift (CLS) | Under 0.1 | 0.1–0.25 | Over 0.25 |
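To apply these benchmarks to a batch of measurements, the thresholds can be encoded once and reused. A minimal sketch, assuming FCP and LCP are given in seconds, INP in milliseconds, and CLS as a unitless score. The function name rate is hypothetical.

```python
# (upper bound of "Good", upper bound of "Needs Work") for each metric.
# Units: FCP and LCP in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {
    "FCP": (0.4, 1.0),
    "LCP": (2.5, 4.0),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, value: float) -> str:
    """Classify a measurement against this guide's benchmark table."""
    good, needs_work = THRESHOLDS[metric]
    if value < good:
        return "Good"
    if value <= needs_work:
        return "Needs Work"
    return "Poor"


if __name__ == "__main__":
    for metric, value in [("FCP", 0.35), ("LCP", 3.1), ("INP", 650), ("CLS", 0.05)]:
        print(metric, value, rate(metric, value))
```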

Quick Wins for Speed Improvement

These fixes address 80% of speed issues for most startup websites:

Image optimization. Compress all images. Use WebP format. Lazy-load images below the fold. A single uncompressed hero image can add 2+ seconds to LCP.

Remove unused JavaScript. Audit your third-party scripts. Every analytics tag, chat widget, and tracking pixel adds load time. Remove anything you're not actively using. Defer non-critical scripts.

Enable CDN. If your hosting doesn't include a CDN, add Cloudflare (free tier works). CDN caching reduces server response time for users (and crawlers) worldwide.

Minimize render-blocking CSS. Inline critical CSS. Defer non-critical stylesheets. This directly improves FCP.

How to Measure

  • Google PageSpeed Insights: pagespeed.web.dev. Enter your URL for FCP, LCP, CLS, and INP scores with specific recommendations.

  • Google Search Console → Core Web Vitals: Shows site-wide performance with pages grouped by status (Good, Needs Improvement, Poor).

  • WebPageTest.org: Advanced waterfall analysis showing exactly which resources delay loading.

The llms.txt File (Emerging Standard)

A newer convention for communicating directly with AI systems. Place a Markdown file at yourdomain.com/llms.txt that describes your site and its most important content.

Template

# Your Company Name

> One-sentence description of what your company does.

## About

Two to three sentences expanding on your company, 
your expertise, and who you serve.

## Key Resources

- [Resource Title 1](https://yourdomain.com/page-1): 
  Brief description of what this page covers.
- [Resource Title 2](https://yourdomain.com/page-2): 
  Brief description.
- [Resource Title 3](https://yourdomain.com/page-3): 
  Brief description.
- [Blog](https://yourdomain.com/blog)

Current Status

The llms.txt standard is not universally adopted. Perplexity and some Common Crawl-based systems have shown early support. It's not confirmed that ChatGPT or Google AI read it. Implementation takes 15 minutes and has no downside, so it's worth adding even if the impact is uncertain. Think of it as a free option on future AI crawler behavior.
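If your key pages change often, the template above can be generated from a list of resources rather than edited by hand. A minimal sketch; the function name build_llms_txt and its arguments are assumptions, and the output follows only the layout shown in the template.

```python
def build_llms_txt(name: str, summary: str, about: str, resources) -> str:
    """Render llms.txt text.

    resources is a list of (title, url, description) tuples; description
    may be None for a bare link.
    """
    lines = [
        f"# {name}", "",
        f"> {summary}", "",
        "## About", "",
        about, "",
        "## Key Resources", "",
    ]
    for title, url, description in resources:
        entry = f"- [{title}]({url})"
        if description:
            entry += f": {description}"
        lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Your Company Name",
        "One-sentence description of what your company does.",
        "Two to three sentences expanding on your company.",
        [
            ("Resource Title 1", "https://yourdomain.com/page-1",
             "Brief description of what this page covers."),
            ("Blog", "https://yourdomain.com/blog", None),
        ],
    ))
```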

The Complete Technical GEO Checklist

Run this checklist on your site. Each item takes minutes to hours, not days.

Robots.txt (15 minutes)

☐ robots.txt exists at site root

☐ GPTBot allowed

☐ OAI-SearchBot allowed

☐ ChatGPT-User allowed

☐ PerplexityBot allowed

☐ ClaudeBot allowed

☐ Google-Extended allowed

☐ Sitemap URL included in robots.txt

☐ No blanket Disallow: / rules blocking content directories

Schema Markup (2–4 hours initial setup)

☐ Organization JSON-LD on every page (site-wide header)

  ☐ @id set for Organization

  ☐ sameAs includes all brand profile URLs (5+ platforms)

  ☐ knowsAbout includes 5–8 expertise topics

☐ Article JSON-LD on every blog post

  ☐ author linked to real person with URL

  ☐ datePublished and dateModified populated with real dates

  ☐ publisher references Organization @id

☐ FAQPage JSON-LD on every page with FAQ section

☐ FAQ schema text matches visible page content exactly

☐ Schema validated with Google Rich Results Test (zero errors)

Site Performance (1–4 hours depending on current state)

☐ FCP under 1 second (ideally under 0.4s)

☐ LCP under 2.5 seconds

☐ INP under 200ms

☐ CLS under 0.1

☐ Images compressed and in WebP format

☐ Unused JavaScript removed or deferred

☐ CDN active

☐ HTTPS active (no mixed content warnings)

☐ Mobile responsive

Additional Technical (30 minutes)

☐ Bing Webmaster Tools connected (ChatGPT sources from Bing)

☐ XML sitemap submitted to both Google and Bing

☐ Author page exists with Person schema

☐ llms.txt file placed at site root

☐ No login walls or paywalls on content you want cited

Maintenance Schedule

Technical GEO isn't a one-time setup. It requires periodic maintenance.

Weekly (5 minutes):

  • Check Google Search Console for new crawl errors or schema validation issues

Monthly (15 minutes):

  • Verify robots.txt hasn't been overwritten by CMS updates or plugin changes

  • Check that new blog posts have Article and FAQ schema (some CMS themes drop schema on new templates)

  • Review Core Web Vitals in GSC for any performance regressions

Quarterly (30 minutes):

  • Audit sameAs links in Organization schema — add any new brand profiles created during the quarter

  • Update knowsAbout if your expertise areas have expanded

  • Check for new AI crawlers that should be allowed (new crawlers appear regularly)

  • Re-validate all schema with the Rich Results Test

How Averi Handles Technical GEO

For startups that want the technical GEO layer handled automatically, Averi's content engine builds these elements into the publishing workflow:

  • Schema generation: Organization schema guidance provided during onboarding.

  • FAQ structure: Every piece includes a 5–7 question FAQ section with self-contained answers formatted for both human reading and schema extraction.

  • Content scoring: The 55% SEO / 45% GEO scoring system evaluates structural elements (answer capsules, extractable blocks, factual density) before publishing.

  • CMS publishing: Direct publishing to WordPress, Webflow, and Framer preserves schema and formatting without manual code insertion.

The robots.txt configuration, site performance optimization, and Organization schema are still site-level tasks that need to be done once on your end.

Averi handles the per-page technical GEO: the Article schema, FAQ schema, and content structure that make each piece citation-ready.

Start a free 14-day trial. No credit card. The technical GEO content layer applies to every piece you publish through the engine.

FAQs

What schema markup do I need for GEO?

Three essential types. Organization JSON-LD on every page (establishes your entity with @id, sameAs profile links, and knowsAbout expertise topics). Article JSON-LD on every blog post (with author, dates, and publisher reference). FAQPage JSON-LD on every page with an FAQ section (question-answer pairs matching visible content). Sites with complete Tier 1 schema see approximately 40% more AI Overview appearances. Validate with Google's Rich Results Test after implementation.

Which AI crawlers should I allow in robots.txt?

All of them, if you want AI citations. The critical ones: GPTBot and OAI-SearchBot (ChatGPT), PerplexityBot (Perplexity), ClaudeBot (Claude), and Google-Extended (Google AI/Gemini). ChatGPT drives 87.4% of all AI referral traffic. Blocking GPTBot and OAI-SearchBot eliminates your content from the dominant AI discovery channel. The full robots.txt configuration with all AI crawlers is included in this guide. Copy and paste it directly.

Does page speed actually affect AI citations?

Yes. Pages with FCP under 0.4 seconds are 3x more likely to be cited by ChatGPT than pages above 1.13 seconds. AI retrieval systems operate under time constraints. When evaluating multiple candidate pages for citation, slow-loading pages risk being skipped regardless of content quality. Target FCP under 1 second (ideally under 0.4s), LCP under 2.5 seconds, and INP under 200ms. Quick wins: compress images to WebP, remove unused JavaScript, and enable a CDN.

How do I implement schema on WordPress?

Install a schema plugin (Rank Math, Yoast SEO Premium, or Schema Pro). Most auto-generate Article schema from your post metadata. Add Organization JSON-LD as a custom code snippet in your site-wide header using a plugin like WPCode or through Appearance → Theme Editor → header.php. For FAQPage schema, use Rank Math's built-in FAQ block or add JSON-LD manually via WPCode. Validate each page with Google's Rich Results Test after implementation.

What is llms.txt and should I implement it?

llms.txt is an emerging standard for communicating directly with AI systems, similar to how robots.txt communicates with web crawlers. It's a Markdown file placed at your site root that describes your company, expertise, and most important content pages. Perplexity and some Common Crawl-based systems show early support. Implementation takes 15 minutes and has no downside. It's not confirmed that ChatGPT or Google AI read it yet, so treat it as a low-cost option on future AI behavior rather than a required element.

How often do I need to maintain technical GEO setup?

Weekly: 5-minute check of Google Search Console for crawl errors and schema issues. Monthly: 15-minute verification that robots.txt hasn't been overwritten, new posts have proper schema, and Core Web Vitals haven't regressed. Quarterly: 30-minute audit updating sameAs links for new brand profiles, expanding knowsAbout topics, checking for new AI crawlers, and re-validating schema. The initial setup takes 2–4 hours total. Maintenance is minimal after that.

Do I need Bing Webmaster Tools for GEO?

Yes. 73% of ChatGPT's results align with Bing's search results. ChatGPT's retrieval system pulls from Bing's index. If your content isn't indexed on Bing, ChatGPT can't retrieve or cite it. Connect Bing Webmaster Tools (free), submit your sitemap, and verify your content appears in Bing's index. Many site owners focus only on Google and are invisible to the largest AI citation platform because they neglected Bing.

The Technical GEO Setup Guide: Schema, Robots.txt, and AI Crawler Config

Most GEO guides tell you to "implement schema markup" and "allow AI crawlers" without showing you exactly what to implement.

They describe the what. This guide provides the how — with copy-paste code you can deploy today.

Sites with complete Tier 1 schema see approximately 40% more AI Overview appearances.

Pages with schema markup are 2.8x more likely to be cited by ChatGPT.

Pages with FCP under 0.4 seconds are 3x more likely to be cited.

These aren't content improvements. They're infrastructure improvements that take a few hours to implement and benefit every page on your site permanently.

This guide covers three layers: AI crawler access (robots.txt), structured data (JSON-LD schema), and site performance (speed and technical health). Each section includes the exact code, where to place it, and how to verify it's working.

This is part of the Definitive Guide to Generative Engine Optimization (GEO). The pillar covers the full GEO framework.

This piece is the technical implementation layer.

Layer 1: AI Crawler Access (Robots.txt Configuration)

If AI crawlers can't access your content, they can't cite it. This is the most common technical GEO failure — and the easiest to fix.

AI Crawlers in 2026

Each AI platform operates one or more dedicated web crawlers. These crawlers function independently from Googlebot and Bingbot.

Allowing search engine crawlers does not automatically allow AI crawlers. They must be permitted separately.

| Crawler | User-Agent String | Platform | Purpose |
| --- | --- | --- | --- |
| GPTBot | GPTBot | OpenAI | Model training data |
| OAI-SearchBot | OAI-SearchBot | OpenAI | ChatGPT Search (live retrieval) |
| ChatGPT-User | ChatGPT-User | OpenAI | ChatGPT browse mode (user-initiated fetches) |
| Google-Extended | Google-Extended | Google | Gemini / AI training |
| PerplexityBot | PerplexityBot | Perplexity | Perplexity search |
| ClaudeBot | ClaudeBot | Anthropic | Claude |
| Bytespider | Bytespider | ByteDance | TikTok AI |
| CCBot | CCBot | Common Crawl | Used by many AI systems |
| Amazonbot | Amazonbot | Amazon | Alexa / Amazon AI |
| FacebookBot | FacebookBot | Meta | Meta AI |
| Applebot-Extended | Applebot-Extended | Apple | Apple Intelligence |

The Recommended Robots.txt Configuration

For startups pursuing GEO, allow all AI crawlers. The citation benefit outweighs the content access concern.




Replace yourdomain.com with your actual domain.
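An allow-all configuration like the one described here can be sketched as a small generator. This is an illustration, not the guide's original file: `build_robots_txt` is a hypothetical helper, the crawler tokens are taken from the table above, and the `/sitemap.xml` path is an assumption about where your sitemap lives.

```python
# Hypothetical helper that writes an allow-all robots.txt for AI crawlers.
# Tokens come from the crawler table; the sitemap path is an assumption.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "Google-Extended",
    "PerplexityBot", "ClaudeBot", "Bytespider", "CCBot",
    "Amazonbot", "Applebot-Extended",
]


def build_robots_txt(domain: str, crawlers=AI_CRAWLERS) -> str:
    """Return robots.txt text with one explicit Allow group per crawler."""
    # The wildcard group covers every crawler that has no group of its own.
    lines = ["User-agent: *", "Allow: /", ""]
    for token in crawlers:
        lines += [f"User-agent: {token}", "Allow: /", ""]
    lines.append(f"Sitemap: https://{domain}/sitemap.xml")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("yourdomain.com"))
```

The explicit per-crawler groups are redundant with the wildcard group, but they make the intent visible to anyone who later edits the file.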

How to Implement

WordPress: Edit the robots.txt file through your SEO plugin (Yoast → Tools → File Editor, or Rank Math → General Settings → Edit robots.txt). Or edit the file directly at your site root via FTP/SFTP.

Webflow: Go to your project settings → SEO tab → Custom robots.txt. Paste the full configuration. Publish.

Framer: Add a robots.txt file through your site settings. Framer supports custom robots.txt content.

How to Verify

After updating, test with these steps:

  1. Visit yourdomain.com/robots.txt in your browser. Confirm the file displays correctly.

  2. In Google Search Console → Settings → Crawl stats → Open report. Check for crawl errors.

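  3. In Google Search Console → Settings → robots.txt, open the robots.txt report and confirm the file was fetched without errors. Google retired the standalone robots.txt Tester in 2023, so check specific AI user-agents with a separate robots.txt parser.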
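Per-crawler access can also be checked locally with Python's standard-library robots.txt parser. The sketch below assumes you paste in (or download) the robots.txt text yourself. `allowed_agents` and the `SAMPLE` file are illustrative names, and the sample deliberately blocks GPTBot to show what a failure looks like.

```python
from urllib.robotparser import RobotFileParser


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per user-agent token, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Illustrative file: GPTBot is blocked, everything else is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    report = allowed_agents(
        SAMPLE,
        "https://yourdomain.com/blog/post",
        ["GPTBot", "PerplexityBot", "ClaudeBot"],
    )
    for agent, ok in report.items():
        print(f"{agent}: {'allowed' if ok else 'BLOCKED'}")
```

To test the live file instead of a pasted string, `RobotFileParser` also offers `set_url()` followed by `read()`.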

The "Should I Block AI Crawlers?" Decision

Some publishers block AI crawlers to prevent training data scraping. This makes sense for large media companies protecting subscription content. For startups building visibility, blocking AI crawlers means:

  • ChatGPT can't cite your content (GPTBot/OAI-SearchBot blocked)

  • Perplexity can't cite your content (PerplexityBot blocked)

  • Gemini can't use your content for training or grounding (Google-Extended blocked)

ChatGPT drives 87.4% of all AI referral traffic. Blocking OpenAI's crawlers, OAI-SearchBot in particular, removes you from the dominant AI discovery channel.

For startups, the trade-off is clear: allow everything.

Layer 2: Structured Data (JSON-LD Schema)

Schema markup tells AI systems what your content is, who wrote it, and what entity it represents. Without schema, AI crawlers must infer this information. With schema, you declare it explicitly.

Tier 1: Essential Schema (Implement First)

These three schema types create the minimum viable structured data layer for GEO.

Organization Schema

Place this in the <head> of every page on your site (typically in your site-wide header template).

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://yourdomain.com/#organization",
  "name": "Your Company Name",
  "url": "https://yourdomain.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://yourdomain.com/logo.png",
    "width": 512,
    "height": 512
  },
  "description": "One sentence describing what your company does and who it serves.",
  "foundingDate": "2024",
  "founder": [
    {
      "@type": "Person",
      "name": "Founder Name"
    }
  ],
  "sameAs": [
    "https://www.linkedin.com/company/yourcompany",
    "https://twitter.com/yourcompany",
    "https://www.crunchbase.com/organization/yourcompany",
    "https://github.com/yourcompany",
    "https://www.youtube.com/@yourcompany"
  ],
  "knowsAbout": [
    "Your Primary Topic",
    "Your Second Topic",
    "Your Third Topic",
    "Your Fourth Topic",
    "Your Fifth Topic"
  ]
}
</script>

Customization guide:

  • @id: Use your domain + /#organization. This creates a persistent entity identifier.

  • sameAs: List every official profile URL. Each one strengthens entity recognition. Include LinkedIn, Twitter/X, Crunchbase, GitHub, YouTube, and any industry directories.

  • knowsAbout: List 5–8 topics your company has expertise in. These directly inform AI systems about your authority domain. Be specific: "Content Marketing for B2B SaaS Startups" is better than "Marketing."

  • foundingDate: Establishes entity age. Older entities have stronger recognition signals.
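If you would rather generate the Organization block than hand-edit JSON, a minimal sketch follows. `organization_jsonld` is a hypothetical helper, not part of any plugin or CMS. It only covers the fields discussed in the customization guide and assumes the site is served over HTTPS at the bare domain.

```python
import json


def organization_jsonld(domain, name, description, same_as, knows_about):
    """Return a <script> tag containing Organization JSON-LD for the site header."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": f"https://{domain}/#organization",  # persistent entity identifier
        "name": name,
        "url": f"https://{domain}",
        "description": description,
        "sameAs": list(same_as),
        "knowsAbout": list(knows_about),
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" inside a field cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(organization_jsonld(
        "yourdomain.com",
        "Your Company Name",
        "One sentence describing what your company does and who it serves.",
        ["https://www.linkedin.com/company/yourcompany"],
        ["Content Marketing for B2B SaaS Startups"],
    ))
```

Serializing with `json.dumps` avoids the trailing-comma and unescaped-quote mistakes that tend to creep into hand-edited JSON.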

Article Schema

Place this in the <head> of every blog post or article page.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Title Here",
  "description": "Your meta description here.",
  "image": "https://yourdomain.com/images/article-featured-image.jpg",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "url": "https://yourdomain.com/about",
    "jobTitle": "Founder & CEO",
    "worksFor": {
      "@id": "https://yourdomain.com/#organization"
    }
  },
  "publisher": {
    "@id": "https://yourdomain.com/#organization"
  },
  "datePublished": "2026-04-09",
  "dateModified": "2026-04-09",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yourdomain.com/blog/your-article-slug"
  }
}
</script>

Critical fields:

  • datePublished and dateModified: Must reflect actual dates. Content freshness is a primary GEO signal. When you update an article, update dateModified. Don't update dateModified without making real content changes — Google's John Mueller has warned against this.

  • author with url: Links the article to a real person page with credentials. AI systems evaluate author authority as part of citation decisions. The author page should exist and include the person's bio, expertise, and other published work.

  • worksFor connecting to your Organization @id: This tells AI that the article author is part of the entity, strengthening the connection between the article, the author, and the organization.
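The date and `@id` rules above are easy to get wrong by hand, so here is a minimal sketch of building the Article object in code. `article_jsonld` is a hypothetical helper, and the `/blog/<slug>` URL pattern is an assumption carried over from the template.

```python
import json
from datetime import date


def article_jsonld(domain, slug, headline, description,
                   author_name, author_url, published: date, modified: date):
    """Build Article JSON-LD whose author and publisher point at the Organization @id."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    org_ref = {"@id": f"https://{domain}/#organization"}
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,
            "worksFor": org_ref,
        },
        "publisher": org_ref,
        # date.isoformat() always yields ISO 8601 (YYYY-MM-DD).
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "mainEntityOfPage": {
            "@type": "WebPage",
            "@id": f"https://{domain}/blog/{slug}",
        },
    }


if __name__ == "__main__":
    article = article_jsonld(
        "yourdomain.com", "your-article-slug", "Your Article Title Here",
        "Your meta description here.", "Author Name",
        "https://yourdomain.com/about", date(2026, 4, 9), date(2026, 4, 9),
    )
    print(json.dumps(article, indent=2))
```

Passing real `date` objects rather than strings means an invalid or reversed date fails loudly at build time instead of shipping in the markup.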

FAQPage Schema

Place this in the <head> of any page with an FAQ section. This is the highest-impact GEO schema.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is generative engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimization (GEO) is the practice of optimizing digital content to increase its visibility in responses generated by AI systems like ChatGPT, Perplexity, and Google AI Overviews. Unlike traditional SEO, which optimizes for ranking in a list of links, GEO optimizes for citation within an AI-generated answer."
      }
    },
    {
      "@type": "Question",
      "name": "Your second FAQ question here?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your 40-60 word answer here, self-contained and independently citable."
      }
    },
    {
      "@type": "Question",
      "name": "Your third FAQ question here?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your 40-60 word answer here."
      }
    }
  ]
}
</script>

Implementation notes:

  • Include every FAQ question from your on-page FAQ section. The schema should mirror the visible content exactly.

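  • Each text field should contain the same self-contained answer that appears on the page. Don't put different content in the schema versus the visible page. Google's structured data guidelines require markup to match what users see, and mismatches can make the page ineligible for rich results or trigger a manual action.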

  • Add as many Question objects as you have FAQ items. 5–7 is the standard for long-form content.
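One way to keep schema and visible content in sync is to generate both from the same list of question and answer pairs. The sketch below assumes you can export that list and the page's visible text. `faqpage_jsonld` and `missing_from_page` are hypothetical helper names.

```python
import json


def faqpage_jsonld(faqs):
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question.strip(),
                "acceptedAnswer": {"@type": "Answer", "text": answer.strip()},
            }
            for question, answer in faqs
        ],
    }


def missing_from_page(faqs, visible_text):
    """Return questions whose answer does not appear verbatim in the page text."""
    # Collapse whitespace on both sides so line wrapping does not cause false alarms.
    page = " ".join(visible_text.split())
    return [q for q, a in faqs if " ".join(a.split()) not in page]


if __name__ == "__main__":
    faqs = [("Your second FAQ question here?", "Your 40-60 word answer here.")]
    print(json.dumps(faqpage_jsonld(faqs), indent=2))
    print(missing_from_page(faqs, "Intro text. Your 40-60 word   answer here."))
```

The second helper is a plain substring check, so it flags any answer that was edited on the page but not in the schema source, or the reverse.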

Tier 2: Enhanced Schema (Implement When Ready)

These additional schema types strengthen GEO signals but aren't essential for getting started.

Person Schema (Author Page)

Create a dedicated author page (e.g., yourdomain.com/about/author-name) with Person schema:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Author Name",
  "url": "https://yourdomain.com/about/author-name",
  "jobTitle": "Founder & CEO",
  "worksFor": {
    "@id": "https://yourdomain.com/#organization"
  },
  "sameAs": [
    "https://www.linkedin.com/in/authorname",
    "https://twitter.com/authorname"
  ],
  "description": "Brief bio describing expertise and credentials.",
  "knowsAbout": ["Topic 1", "Topic 2", "Topic 3"]
}
</script>

This schema connects the author entity to external profiles and expertise areas. AI systems use this to evaluate whether the article author is a credible source on the topic.

HowTo Schema

For step-by-step guides and tutorials:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Set Up Technical GEO for Your Website",
  "description": "Step-by-step guide to implementing schema markup, robots.txt configuration, and AI crawler access for generative engine optimization.",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Configure robots.txt for AI crawlers",
      "text": "Allow GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended in your robots.txt file."
    },
    {
      "@type": "HowToStep",
      "name": "Implement Organization schema",
      "text": "Add JSON-LD Organization schema with @id, sameAs links, and knowsAbout topics to your site-wide header."
    }
  ]
}
</script>

How to Implement Schema on Each Platform

WordPress:

  • Install a schema plugin (Schema Pro, Rank Math, or Yoast SEO Premium)

  • Most plugins auto-generate Article schema from post metadata

  • Add Organization schema as a custom code snippet in your theme's <head> (via Appearance → Theme File Editor → header.php in a child theme, or use a plugin like WPCode)

  • For FAQPage schema: use Rank Math's FAQ block or add manually with WPCode

Webflow:

  • Add JSON-LD as custom code in Project Settings → Custom Code → Head Code (for site-wide Organization schema)

  • For page-specific Article and FAQ schema: add custom code in each page's settings → Custom Code → Inside <head> tag

Framer:

  • Add JSON-LD scripts in your site settings under Custom Code → Head

  • For page-specific schema: use the page-level custom code injection

How to Verify Schema

  1. Google Rich Results Test: Go to search.google.com/test/rich-results. Enter your URL. It shows which schema types are detected and flags any errors.

  2. Schema.org Validator: Go to validator.schema.org. Paste your JSON-LD code directly. It validates syntax and structure.

  3. Google Search Console → Enhancements: After implementation, GSC shows FAQ, Article, and other rich result eligibility across your site. Check for errors weekly for the first month after implementation.

Common schema errors to avoid:

  • Missing required fields (headline, author, datePublished for Article)

  • Mismatched content between schema text and visible page content

  • Invalid date formats (use ISO 8601: YYYY-MM-DD)

  • Broken URLs in sameAs or image fields

  • Nested schema referencing an @id that doesn't exist on the page
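Several of these errors can be caught before a page ships. The sketch below uses only the Python standard library to pull JSON-LD out of a page's HTML and flag Article blocks with missing fields or non-ISO dates. `JsonLdExtractor` and `article_problems` are hypothetical names, the field list mirrors the bullets above rather than any official validator, and the date check is deliberately loose (it only inspects the first ten characters).

```python
import json
from datetime import date
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # json.loads raises JSONDecodeError on malformed markup.
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


REQUIRED_ARTICLE_FIELDS = ("headline", "author", "datePublished")


def article_problems(html):
    """Return human-readable problems found in Article JSON-LD blocks."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    problems = []
    for block in extractor.blocks:
        if not isinstance(block, dict) or block.get("@type") != "Article":
            continue
        for field in REQUIRED_ARTICLE_FIELDS:
            if not block.get(field):
                problems.append(f"missing {field}")
        for field in ("datePublished", "dateModified"):
            value = block.get(field)
            if value:
                try:
                    date.fromisoformat(str(value)[:10])
                except ValueError:
                    problems.append(f"{field} is not ISO 8601: {value!r}")
    return problems
```

A local check like this complements the Rich Results Test rather than replacing it, since it knows nothing about Google's eligibility rules.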

Layer 3: Site Performance for AI Citation

AI crawlers evaluate page speed when selecting sources. Slow pages get skipped even when content quality is high.

The Speed Benchmarks That Matter for GEO

Pages with First Contentful Paint under 0.4 seconds are 3x more likely to be cited by ChatGPT than pages above 1.13 seconds. Pages with INP scores of 0.4–0.5 seconds have 1.6x higher citation chances than those above 1 second.

Target benchmarks:

| Metric | Good | Needs Work | Poor |
| --- | --- | --- | --- |
| First Contentful Paint (FCP) | Under 0.4s | 0.4–1.0s | Over 1.0s |
| Largest Contentful Paint (LCP) | Under 2.5s | 2.5–4.0s | Over 4.0s |
| Interaction to Next Paint (INP) | Under 200ms | 200–500ms | Over 500ms |
| Cumulative Layout Shift (CLS) | Under 0.1 | 0.1–0.25 | Over 0.25 |
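The benchmark bands translate directly into a lookup, which is handy when rating many pages at once. In this sketch the thresholds are copied from the table above. `rate` is a hypothetical helper that expects FCP and LCP in seconds, INP in milliseconds, and CLS as a unitless score.

```python
# (upper bound of "Good", upper bound of "Needs Work") per metric,
# copied from the target benchmarks table.
THRESHOLDS = {
    "FCP": (0.4, 1.0),    # seconds
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless
}


def rate(metric: str, value: float) -> str:
    """Classify a measurement as Good, Needs Work, or Poor."""
    good_below, poor_above = THRESHOLDS[metric]
    if value < good_below:
        return "Good"
    if value <= poor_above:
        return "Needs Work"
    return "Poor"


if __name__ == "__main__":
    for metric, value in [("FCP", 0.35), ("LCP", 3.0), ("INP", 650), ("CLS", 0.05)]:
        print(f"{metric} = {value}: {rate(metric, value)}")
```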

Quick Wins for Speed Improvement

These fixes address 80% of speed issues for most startup websites:

Image optimization. Compress all images. Use WebP format. Lazy-load images below the fold. A single uncompressed hero image can add 2+ seconds to LCP.

Remove unused JavaScript. Audit your third-party scripts. Every analytics tag, chat widget, and tracking pixel adds load time. Remove anything you're not actively using. Defer non-critical scripts.

Enable CDN. If your hosting doesn't include a CDN, add Cloudflare (free tier works). CDN caching reduces server response time for users (and crawlers) worldwide.

Minimize render-blocking CSS. Inline critical CSS. Defer non-critical stylesheets. This directly improves FCP.

How to Measure

  • Google PageSpeed Insights: pagespeed.web.dev. Enter your URL for FCP, LCP, CLS, and INP scores with specific recommendations.

  • Google Search Console → Core Web Vitals: Shows site-wide performance with pages grouped by status (Good, Needs Improvement, Poor).

  • WebPageTest.org: Advanced waterfall analysis showing exactly which resources delay loading.
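PageSpeed Insights also has a public HTTP API, which makes it possible to script these measurements. The sketch below is written under assumptions. It uses the v5 `runPagespeed` endpoint and reads Lighthouse lab audits from the response, so confirm the field names against a real response before relying on them. The helper names are hypothetical. INP is a field metric and is not part of the lab audits read here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def psi_url(page_url: str, strategy: str = "mobile") -> str:
    """Build the request URL for one page (strategy: 'mobile' or 'desktop')."""
    return f"{PSI_ENDPOINT}?{urlencode({'url': page_url, 'strategy': strategy})}"


def lab_metrics(report: dict) -> dict:
    """Extract lab FCP/LCP (in seconds) and CLS from a parsed API response."""
    audits = report["lighthouseResult"]["audits"]
    return {
        # Timing audits report numericValue in milliseconds.
        "FCP_s": audits["first-contentful-paint"]["numericValue"] / 1000,
        "LCP_s": audits["largest-contentful-paint"]["numericValue"] / 1000,
        "CLS": audits["cumulative-layout-shift"]["numericValue"],
    }


def fetch_lab_metrics(page_url: str) -> dict:
    """Call the API over the network and return the extracted metrics."""
    with urlopen(psi_url(page_url), timeout=60) as response:
        return lab_metrics(json.load(response))

# Usage (requires network access; heavier use needs an API key):
#   print(fetch_lab_metrics("https://yourdomain.com/blog/your-article-slug"))
```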

The llms.txt File (Emerging Standard)

A newer convention for communicating directly with AI systems. Place a Markdown file at yourdomain.com/llms.txt that describes your site and its most important content.

Template

# Your Company Name

> One-sentence description of what your company does.

## About

Two to three sentences expanding on your company, 
your expertise, and who you serve.

## Key Resources

- [Resource Title 1](https://yourdomain.com/page-1): 
  Brief description of what this page covers.
- [Resource Title 2](https://yourdomain.com/page-2): 
  Brief description.
- [Resource Title 3](https://yourdomain.com/page-3): 
  Brief description.
- [Blog](https://yourdomain.com/blog)

Current Status

The llms.txt standard is not universally adopted. Perplexity and some Common Crawl-based systems have shown early support. It's not confirmed that ChatGPT or Google AI read it. Implementation takes 15 minutes and has no downside, so it's worth adding even if the impact is uncertain. Think of it as a free option on future AI crawler behavior.
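Because the file is plain Markdown, it is easy to generate from a list of pages you already maintain. A minimal sketch follows, with the layout taken from the template above. `build_llms_txt` is a hypothetical helper and the sample values are placeholders.

```python
def build_llms_txt(name, summary, about, resources):
    """Render llms.txt Markdown; resources is a list of (title, url, description)."""
    lines = [
        f"# {name}", "",
        f"> {summary}", "",
        "## About", "",
        about, "",
        "## Key Resources", "",
    ]
    for title, url, description in resources:
        entry = f"- [{title}]({url})"
        if description:
            entry += f": {description}"
        lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Your Company Name",
        "One-sentence description of what your company does.",
        "Two to three sentences expanding on your company and who you serve.",
        [
            ("Resource Title 1", "https://yourdomain.com/page-1",
             "Brief description of what this page covers."),
            ("Blog", "https://yourdomain.com/blog", ""),
        ],
    ))
```

Write the returned text to `llms.txt` at your site root, the same way you would deploy robots.txt.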

The Complete Technical GEO Checklist

Run this checklist on your site. Each item takes minutes to hours, not days.

Robots.txt (15 minutes)

☐ robots.txt exists at site root

☐ GPTBot allowed

☐ OAI-SearchBot allowed

☐ ChatGPT-User allowed

☐ PerplexityBot allowed

☐ ClaudeBot allowed

☐ Google-Extended allowed

☐ Sitemap URL included in robots.txt

☐ No blanket Disallow: / rules blocking content directories

Schema Markup (2–4 hours initial setup)

☐ Organization JSON-LD on every page (site-wide header)

  • @id set for Organization

  • sameAs includes all brand profile URLs (5+ platforms)

  • knowsAbout includes 5–8 expertise topics

☐ Article JSON-LD on every blog post

  • author linked to real person with URL

  • datePublished and dateModified populated with real dates

  • publisher references Organization @id

☐ FAQPage JSON-LD on every page with FAQ section

☐ FAQ schema text matches visible page content exactly

☐ Schema validated with Google Rich Results Test (zero errors)

Site Performance (1–4 hours depending on current state)

☐ FCP under 1 second (ideally under 0.4s)

☐ LCP under 2.5 seconds

☐ INP under 200ms

☐ CLS under 0.1

☐ Images compressed and in WebP format

☐ Unused JavaScript removed or deferred

☐ CDN active

☐ HTTPS active (no mixed content warnings)

☐ Mobile responsive

Additional Technical (30 minutes)

☐ Bing Webmaster Tools connected (ChatGPT sources from Bing)

☐ XML sitemap submitted to both Google and Bing

☐ Author page exists with Person schema

☐ llms.txt file placed at site root

☐ No login walls or paywalls on content you want cited

Maintenance Schedule

Technical GEO isn't a one-time setup. It requires periodic maintenance.

Weekly (5 minutes):

  • Check Google Search Console for new crawl errors or schema validation issues

Monthly (15 minutes):

  • Verify robots.txt hasn't been overwritten by CMS updates or plugin changes

  • Check that new blog posts have Article and FAQ schema (some CMS themes drop schema on new templates)

  • Review Core Web Vitals in GSC for any performance regressions

Quarterly (30 minutes):

  • Audit sameAs links in Organization schema — add any new brand profiles created during the quarter

  • Update knowsAbout if your expertise areas have expanded

  • Check for new AI crawlers that should be allowed (new crawlers appear regularly)

  • Re-validate all schema with the Rich Results Test
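The monthly robots.txt check can be reduced to a diff against a known-good copy. The sketch below assumes you keep a baseline of the file somewhere (in version control, for example) and can fetch the live text. `robots_drift` is a hypothetical helper built on the standard-library `difflib`.

```python
import difflib


def robots_drift(baseline: str, live: str) -> list[str]:
    """Return unified-diff lines between the saved and live robots.txt (empty if identical)."""
    return list(difflib.unified_diff(
        baseline.splitlines(),
        live.splitlines(),
        fromfile="baseline",
        tofile="live",
        lineterm="",
    ))


if __name__ == "__main__":
    saved = "User-agent: GPTBot\nAllow: /\n"
    current = "User-agent: GPTBot\nDisallow: /\n"  # e.g. after a plugin update
    for line in robots_drift(saved, current):
        print(line)
```

An empty result means nothing changed. Any output shows exactly which directives a CMS update or plugin rewrote.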

How Averi Handles Technical GEO

For startups that want the technical GEO layer handled automatically, Averi's content engine builds these elements into the publishing workflow:

  • Schema generation: Organization schema guidance provided during onboarding.

  • FAQ structure: Every piece includes a 5–7 question FAQ section with self-contained answers formatted for both human reading and schema extraction.

  • Content scoring: The 55% SEO / 45% GEO scoring system evaluates structural elements (answer capsules, extractable blocks, factual density) before publishing.

  • CMS publishing: Direct publishing to WordPress, Webflow, and Framer preserves schema and formatting without manual code insertion.

The robots.txt configuration, site performance optimization, and Organization schema are still site-level tasks that need to be done once on your end.

Averi handles the per-page technical GEO: the Article schema, FAQ schema, and content structure that make each piece citation-ready.

Start a free 14-day trial. No credit card. The technical GEO content layer applies to every piece you publish through the engine.

Related Resources

Continue Reading

The latest handpicked blog articles

Join 30,000+ Founders, Marketers & Builders

Don't Feed the Algorithm

“Top 3 tech + AI newsletters in the country. Always sharp, always actionable.”

"Genuinely my favorite newsletter in tech. No fluff, no cheesy ads, just great content."

“Clear, practical, and on-point. Helps me keep up without drowning in noise.”

User-Generated Content & Authenticity in the Age of AI

Zach Chmael

Head of Marketing

5 minutes

In This Article

Copy-paste JSON-LD schema templates, robots.txt configs for every AI crawler, and the full technical checklist. Implementation guide, not theory.

Don’t Feed the Algorithm

The algorithm never sleeps, but you don’t have to feed it — Join our weekly newsletter for real insights on AI, human creativity & marketing execution.

Trusted by 1,000+ teams

★★★★★ 4.9/5

Startups use Averi to build
content engines that rank.

The Technical GEO Setup Guide: Schema, Robots.txt, and AI Crawler Config

Most GEO guides tell you to "implement schema markup" and "allow AI crawlers" without showing you exactly what to implement.

They describe the what. This guide provides the how — with copy-paste code you can deploy today.

Sites with complete Tier 1 schema see approximately 40% more AI Overview appearances.

Pages with schema markup are 2.8x more likely to be cited by ChatGPT.

Pages with FCP under 0.4 seconds are 3x more likely to be cited.

These aren't content improvements. They're infrastructure improvements that take a few hours to implement and benefit every page on your site permanently.

This guide covers three layers: AI crawler access (robots.txt), structured data (JSON-LD schema), and site performance (speed and technical health). Each section includes the exact code, where to place it, and how to verify it's working.

This is part of the Definitive Guide to Generative Engine Optimization (GEO). The pillar covers the full GEO framework.

This piece is the technical implementation layer.

Layer 1: AI Crawler Access (Robots.txt Configuration)

If AI crawlers can't access your content, they can't cite it. This is the most common technical GEO failure — and the easiest to fix.

AI Crawlers in 2026

Each AI platform operates one or more dedicated web crawlers. These crawlers function independently from Googlebot and Bingbot.

Allowing search engine crawlers does not automatically allow AI crawlers. They must be permitted separately.

| Crawler | User-Agent String | Platform | Purpose |
| --- | --- | --- | --- |
| GPTBot | GPTBot | OpenAI | Model training for ChatGPT |
| OAI-SearchBot | OAI-SearchBot | OpenAI | ChatGPT Search (live retrieval) |
| ChatGPT-User | ChatGPT-User | OpenAI | ChatGPT browse mode |
| Google-Extended | Google-Extended | Google | Gemini / AI training |
| PerplexityBot | PerplexityBot | Perplexity | Perplexity search |
| ClaudeBot | ClaudeBot | Anthropic | Claude |
| Bytespider | Bytespider | ByteDance | TikTok AI |
| CCBot | CCBot | Common Crawl | Used by many AI systems |
| Amazonbot | Amazonbot | Amazon | Alexa / Amazon AI |
| FacebookBot | FacebookBot | Meta | Meta AI |
| Applebot-Extended | Applebot-Extended | Apple | Apple Intelligence |

The Recommended Robots.txt Configuration

For startups pursuing GEO, allow all AI crawlers. The citation benefit outweighs the content access concern.




Replace yourdomain.com with your actual domain.
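One way to produce such a file is to generate it, so the crawler list lives in one place. This is a minimal sketch, not a definitive configuration: it assumes one explicit allow group for each of the six user-agents this guide's checklist names, plus a sitemap at /sitemap.xml. The helper name build_robots_txt and the sitemap path are illustrative assumptions.

```python
# User-agents named in this guide's robots.txt checklist.
AI_CRAWLERS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "Google-Extended",
)


def build_robots_txt(domain: str, crawlers: tuple[str, ...] = AI_CRAWLERS) -> str:
    """Return robots.txt text with one explicit allow group per crawler.

    Hypothetical helper: the sitemap location is assumed to be /sitemap.xml.
    """
    # Default group for every other crawler.
    lines = ["User-agent: *", "Allow: /", ""]
    for agent in crawlers:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines.append(f"Sitemap: https://{domain}/sitemap.xml")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("yourdomain.com"))
```

Add further user-agents from the crawler table to AI_CRAWLERS as needed, then paste the printed output into your robots.txt.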

How to Implement

WordPress: Edit the robots.txt file through your SEO plugin (Yoast → Tools → File Editor, or Rank Math → General Settings → Edit robots.txt). Or edit the file directly at your site root via FTP/SFTP.

Webflow: Go to your project settings → SEO tab → Custom robots.txt. Paste the full configuration. Publish.

Framer: Add a robots.txt file through your site settings. Framer supports custom robots.txt content.

How to Verify

After updating, test with these steps:

  1. Visit yourdomain.com/robots.txt in your browser. Confirm the file displays correctly.

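  2. In Google Search Console, go to Settings → Crawl stats → Open report and check for crawl errors.

  3. In Google Search Console, open Settings → robots.txt to confirm Google fetched the file without parse errors. Google retired the standalone robots.txt Tester in 2023, so check specific user-agents with a third-party robots.txt validator.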
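You can also check specific user-agents locally. The sketch below uses Python's standard-library urllib.robotparser against a pasted copy of your file. The function name check_access and the sample URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def check_access(robots_txt: str, agents, url: str = "https://yourdomain.com/blog/") -> dict:
    """Map each user-agent to True/False: may it fetch `url` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Example file that blocks GPTBot while allowing everyone else.
    sample = "User-agent: *\nAllow: /\n\nUser-agent: GPTBot\nDisallow: /\n"
    for agent, allowed in check_access(sample, ["GPTBot", "ClaudeBot"]).items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

To test the live file instead, call parser.set_url("https://yourdomain.com/robots.txt") followed by parser.read() in place of parse(). Python's parser applies the first matching rule in a group, which can differ from how individual crawlers resolve conflicting Allow and Disallow lines, so treat the result as a sanity check rather than a guarantee.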

The "Should I Block AI Crawlers?" Decision

Some publishers block AI crawlers to prevent training data scraping. This makes sense for large media companies protecting subscription content. For startups building visibility, blocking AI crawlers means:

  • ChatGPT can't cite your content (GPTBot/OAI-SearchBot blocked)

  • Perplexity can't cite your content (PerplexityBot blocked)

  • Gemini can't draw from your content (Google-Extended blocked). Google Search features, including AI Overviews, follow your Googlebot rules instead.

ChatGPT drives 87.4% of all AI referral traffic. Blocking OpenAI's crawlers eliminates your visibility in the dominant AI discovery channel.

For startups, the trade-off is clear: allow everything.

Layer 2: Structured Data (JSON-LD Schema)

Schema markup tells AI systems what your content is, who wrote it, and what entity it represents. Without schema, AI crawlers must infer this information. With schema, you declare it explicitly.

Tier 1: Essential Schema (Implement First)

These three schema types create the minimum viable structured data layer for GEO.

Organization Schema

Place this in the <head> of every page on your site (typically in your site-wide header template).

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://yourdomain.com/#organization",
  "name": "Your Company Name",
  "url": "https://yourdomain.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://yourdomain.com/logo.png",
    "width": 512,
    "height": 512
  },
  "description": "One sentence describing what your company does and who it serves.",
  "foundingDate": "2024",
  "founder": [
    {
      "@type": "Person",
      "name": "Founder Name"
    }
  ],
  "sameAs": [
    "https://www.linkedin.com/company/yourcompany",
    "https://twitter.com/yourcompany",
    "https://www.crunchbase.com/organization/yourcompany",
    "https://github.com/yourcompany",
    "https://www.youtube.com/@yourcompany"
  ],
  "knowsAbout": [
    "Your Primary Topic",
    "Your Second Topic",
    "Your Third Topic",
    "Your Fourth Topic",
    "Your Fifth Topic"
  ]
}
</script>

Customization guide:

  • @id: Use your domain + /#organization. This creates a persistent entity identifier.

  • sameAs: List every official profile URL. Each one strengthens entity recognition. Include LinkedIn, Twitter/X, Crunchbase, GitHub, YouTube, and any industry directories.

  • knowsAbout: List 5–8 topics your company has expertise in. These directly inform AI systems about your authority domain. Be specific: "Content Marketing for B2B SaaS Startups" is better than "Marketing."

  • foundingDate: Establishes entity age. Older entities have stronger recognition signals.
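Before deploying, you can sanity-check the customized Organization object against the rules above. This is a minimal sketch under stated assumptions: check_organization is a hypothetical helper, and the checks encode only this guide's recommendations (the /#organization suffix, absolute sameAs URLs, 5–8 knowsAbout topics), not schema.org requirements.

```python
from urllib.parse import urlparse


def check_organization(schema: dict) -> list[str]:
    """Return a list of problems found in an Organization JSON-LD object."""
    problems = []
    if not str(schema.get("@id", "")).endswith("/#organization"):
        problems.append("@id should be your domain + /#organization")
    for url in schema.get("sameAs", []):
        parts = urlparse(url)
        # Profile links must be absolute URLs to be resolvable.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"sameAs entry is not an absolute URL: {url!r}")
    topics = schema.get("knowsAbout", [])
    if not 5 <= len(topics) <= 8:
        problems.append(f"knowsAbout has {len(topics)} topics; the guide suggests 5-8")
    if not schema.get("foundingDate"):
        problems.append("foundingDate is missing")
    return problems
```

Load your JSON-LD with json.loads and pass the resulting dict in. An empty list means none of these checks fired.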

Article Schema

Place this in the <head> of every blog post or article page.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Title Here",
  "description": "Your meta description here.",
  "image": "https://yourdomain.com/images/article-featured-image.jpg",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "url": "https://yourdomain.com/about",
    "jobTitle": "Founder & CEO",
    "worksFor": {
      "@id": "https://yourdomain.com/#organization"
    }
  },
  "publisher": {
    "@id": "https://yourdomain.com/#organization"
  },
  "datePublished": "2026-04-09",
  "dateModified": "2026-04-09",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yourdomain.com/blog/your-article-slug"
  }
}
</script>

Critical fields:

  • datePublished and dateModified: Must reflect actual dates. Content freshness is a primary GEO signal. When you update an article, update dateModified. Don't update dateModified without making real content changes — Google's John Mueller has warned against this.

  • author with url: Links the article to a real person page with credentials. AI systems evaluate author authority as part of citation decisions. The author page should exist and include the person's bio, expertise, and other published work.

  • worksFor connecting to your Organization @id: This tells AI that the article author is part of the entity, strengthening the connection between the article, the author, and the organization.
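The critical fields above are easy to check mechanically. The sketch below is one possible approach, assuming dates are stored as ISO 8601 strings. check_article is a hypothetical helper, and its required-field list mirrors the fields this guide treats as required (headline, author, datePublished).

```python
from datetime import date

REQUIRED_FIELDS = ("headline", "author", "datePublished")


def check_article(schema: dict) -> list[str]:
    """Return a list of problems found in an Article JSON-LD object."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not schema.get(field)]
    dates = {}
    for key in ("datePublished", "dateModified"):
        value = schema.get(key)
        if value is None:
            continue
        try:
            # Compare only the YYYY-MM-DD part so full timestamps also pass.
            dates[key] = date.fromisoformat(value[:10])
        except (TypeError, ValueError):
            problems.append(f"{key} is not ISO 8601: {value!r}")
    if len(dates) == 2 and dates["dateModified"] < dates["datePublished"]:
        problems.append("dateModified is earlier than datePublished")
    author = schema.get("author")
    if isinstance(author, dict) and not author.get("url"):
        problems.append("author has no url")
    return problems
```

This only catches structural mistakes. It cannot tell whether dateModified reflects a real content change, which remains an editorial call.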

FAQPage Schema

Place this in the <head> of any page with an FAQ section. This is the highest-impact GEO schema.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is generative engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimization (GEO) is the practice of optimizing digital content to increase its visibility in responses generated by AI systems like ChatGPT, Perplexity, and Google AI Overviews. Unlike traditional SEO, which optimizes for ranking in a list of links, GEO optimizes for citation within an AI-generated answer."
      }
    },
    {
      "@type": "Question",
      "name": "Your second FAQ question here?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your 40-60 word answer here, self-contained and independently citable."
      }
    },
    {
      "@type": "Question",
      "name": "Your third FAQ question here?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Your 40-60 word answer here."
      }
    }
  ]
}
</script>

Implementation notes:

  • Include every FAQ question from your on-page FAQ section. The schema should mirror the visible content exactly.

  • Each text field should contain the same self-contained answer that appears on the page. Don't put different content in the schema versus the visible page. Google's structured data guidelines require markup to match what users see, and mismatches can trigger a manual action.

  • Add as many Question objects as you have FAQ items. 5–7 is the standard for long-form content.
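The simplest way to keep schema and visible content identical is to generate both from the same question-answer pairs. A minimal sketch, assuming your FAQ content is available as (question, answer) tuples. faq_jsonld is a hypothetical helper, not part of any CMS or plugin.

```python
import json


def faq_jsonld(pairs) -> str:
    """Build a FAQPage JSON-LD <script> tag from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(faq_jsonld([("Your second FAQ question here?", "Your 40-60 word answer here.")]))
```

Feed the same list of pairs to whatever renders the on-page FAQ section, so the two can't drift apart.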

Tier 2: Enhanced Schema (Implement When Ready)

These additional schema types strengthen GEO signals but aren't essential for getting started.

Person Schema (Author Page)

Create a dedicated author page (e.g., yourdomain.com/about/author-name) with Person schema:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Author Name",
  "url": "https://yourdomain.com/about/author-name",
  "jobTitle": "Founder & CEO",
  "worksFor": {
    "@id": "https://yourdomain.com/#organization"
  },
  "sameAs": [
    "https://www.linkedin.com/in/authorname",
    "https://twitter.com/authorname"
  ],
  "description": "Brief bio describing expertise and credentials.",
  "knowsAbout": ["Topic 1", "Topic 2", "Topic 3"]
}
</script>

This schema connects the author entity to external profiles and expertise areas. AI systems use this to evaluate whether the article author is a credible source on the topic.

HowTo Schema

For step-by-step guides and tutorials:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Set Up Technical GEO for Your Website",
  "description": "Step-by-step guide to implementing schema markup, robots.txt configuration, and AI crawler access for generative engine optimization.",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Configure robots.txt for AI crawlers",
      "text": "Allow GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended in your robots.txt file."
    },
    {
      "@type": "HowToStep",
      "name": "Implement Organization schema",
      "text": "Add JSON-LD Organization schema with @id, sameAs links, and knowsAbout topics to your site-wide header."
    }
  ]
}
</script>

How to Implement Schema on Each Platform

WordPress:

  • Install a schema plugin (Schema Pro, Rank Math, or Yoast SEO Premium)

  • Most plugins auto-generate Article schema from post metadata

  • Add Organization schema as a custom code snippet in your theme's <head> (via Appearance → Theme Editor → header.php, or use a plugin like WPCode)

  • For FAQPage schema: use Rank Math's FAQ block or add manually with WPCode

Webflow:

  • Add JSON-LD as custom code in Project Settings → Custom Code → Head Code (for site-wide Organization schema)

  • For page-specific Article and FAQ schema: add custom code in each page's settings → Custom Code → Inside <head> tag

Framer:

  • Add JSON-LD scripts in your site settings under Custom Code → Head

  • For page-specific schema: use the page-level custom code injection

How to Verify Schema

  1. Google Rich Results Test: Go to search.google.com/test/rich-results. Enter your URL. It shows which schema types are detected and flags any errors.

  2. Schema.org Validator: Go to validator.schema.org. Paste your JSON-LD code directly. It validates syntax and structure.

  3. Google Search Console → Enhancements: After implementation, GSC shows FAQ, Article, and other rich result eligibility across your site. Check for errors weekly for the first month after implementation.

Common schema errors to avoid:

  • Missing required fields (headline, author, datePublished for Article)

  • Mismatched content between schema text and visible page content

  • Invalid date formats (use ISO 8601: YYYY-MM-DD)

  • Broken URLs in sameAs or image fields

  • Nested schema referencing an @id that doesn't exist on the page
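The last error in that list is hard to spot by eye. As a rough sketch, the script below pulls every JSON-LD block out of a page's HTML with the standard-library parser and reports any @id that is referenced but never defined. It assumes a bare {"@id": ...} object is a reference and anything with more properties is a definition. JsonLdExtractor, walk, and dangling_ids are illustrative names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            # json.loads raises here if the block has a syntax error.
            self.blocks.append(json.loads("".join(self._buffer)))
            self._buffer = None


def walk(node):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def dangling_ids(html: str) -> set:
    """Return @id values that are referenced but never defined on the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    defined, referenced = set(), set()
    for block in parser.blocks:
        for node in walk(block):
            if "@id" not in node:
                continue
            # A dict holding only "@id" is a pointer; anything richer defines the entity.
            (referenced if set(node) == {"@id"} else defined).add(node["@id"])
    return referenced - defined
```

Run it on the rendered HTML of a blog post. If the result contains your /#organization identifier, the site-wide Organization block is missing from that template.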

Layer 3: Site Performance for AI Citation

AI crawlers evaluate page speed when selecting sources. Slow pages get skipped even when content quality is high.

The Speed Benchmarks That Matter for GEO

Pages with First Contentful Paint under 0.4 seconds are 3x more likely to be cited by ChatGPT than pages above 1.13 seconds. Pages with INP scores of 0.4–0.5 seconds have 1.6x higher citation chances than those above 1 second.

Target benchmarks:

| Metric | Good | Needs Work | Poor |
| --- | --- | --- | --- |
| First Contentful Paint (FCP) | Under 0.4s | 0.4–1.0s | Over 1.0s |
| Largest Contentful Paint (LCP) | Under 2.5s | 2.5–4.0s | Over 4.0s |
| Interaction to Next Paint (INP) | Under 200ms | 200–500ms | Over 500ms |
| Cumulative Layout Shift (CLS) | Under 0.1 | 0.1–0.25 | Over 0.25 |
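If you track these numbers in a script or spreadsheet export, the bands are simple to encode. A minimal sketch using the thresholds from the benchmark table. The rate function and the THRESHOLDS layout are illustrative, and values sitting exactly on a boundary are treated as "Needs Work" by assumption.

```python
# (good_below, poor_above) thresholds copied from the benchmark table.
THRESHOLDS = {
    "FCP": (0.4, 1.0),   # seconds
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless score
}


def rate(metric: str, value: float) -> str:
    """Classify a measurement as Good, Needs Work, or Poor."""
    good_below, poor_above = THRESHOLDS[metric]
    if value < good_below:
        return "Good"
    if value <= poor_above:
        return "Needs Work"
    return "Poor"


if __name__ == "__main__":
    # Example measurements (made up for illustration).
    for metric, value in {"FCP": 0.35, "LCP": 3.1, "INP": 620, "CLS": 0.05}.items():
        print(f"{metric}: {value} -> {rate(metric, value)}")
```

Note the unit difference: FCP and LCP are in seconds, INP in milliseconds.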

Quick Wins for Speed Improvement

These fixes address 80% of speed issues for most startup websites:

Image optimization. Compress all images. Use WebP format. Lazy-load images below the fold. A single uncompressed hero image can add 2+ seconds to LCP.

Remove unused JavaScript. Audit your third-party scripts. Every analytics tag, chat widget, and tracking pixel adds load time. Remove anything you're not actively using. Defer non-critical scripts.

Enable CDN. If your hosting doesn't include a CDN, add Cloudflare (free tier works). CDN caching reduces server response time for users (and crawlers) worldwide.

Minimize render-blocking CSS. Inline critical CSS. Defer non-critical stylesheets. This directly improves FCP.

How to Measure

  • Google PageSpeed Insights: pagespeed.web.dev. Enter your URL for FCP, LCP, CLS, and INP scores with specific recommendations.

  • Google Search Console → Core Web Vitals: Shows site-wide performance with pages grouped by status (Good, Needs Improvement, Poor).

  • WebPageTest.org: Advanced waterfall analysis showing exactly which resources delay loading.

The llms.txt File (Emerging Standard)

A newer convention for communicating directly with AI systems. Place a Markdown file at yourdomain.com/llms.txt that describes your site and its most important content.

Template

# Your Company Name

> One-sentence description of what your company does.

## About

Two to three sentences expanding on your company, 
your expertise, and who you serve.

## Key Resources

- [Resource Title 1](https://yourdomain.com/page-1): 
  Brief description of what this page covers.
- [Resource Title 2](https://yourdomain.com/page-2): 
  Brief description.
- [Resource Title 3](https://yourdomain.com/page-3): 
  Brief description.
- [Blog](https://yourdomain.com/blog)

Current Status

The llms.txt standard is not universally adopted. Perplexity and some Common Crawl-based systems have shown early support. It's not confirmed that ChatGPT or Google AI read it. Implementation takes 15 minutes and has no downside, so it's worth adding even if the impact is uncertain. Think of it as a free option on future AI crawler behavior.

The Complete Technical GEO Checklist

Run this checklist on your site. Each item takes minutes to hours, not days.

Robots.txt (15 minutes)

☐ robots.txt exists at site root

☐ GPTBot allowed

☐ OAI-SearchBot allowed

☐ ChatGPT-User allowed

☐ PerplexityBot allowed

☐ ClaudeBot allowed

☐ Google-Extended allowed

☐ Sitemap URL included in robots.txt

☐ No blanket Disallow: / rules blocking content directories

Schema Markup (2–4 hours initial setup)

☐ Organization JSON-LD on every page (site-wide header)

  ☐ @id set for Organization

  ☐ sameAs includes all brand profile URLs (5+ platforms)

  ☐ knowsAbout includes 5–8 expertise topics

☐ Article JSON-LD on every blog post

  ☐ author linked to real person with URL

  ☐ datePublished and dateModified populated with real dates

  ☐ publisher references Organization @id

☐ FAQPage JSON-LD on every page with FAQ section

☐ FAQ schema text matches visible page content exactly

☐ Schema validated with Google Rich Results Test (zero errors)

Site Performance (1–4 hours depending on current state)

☐ FCP under 1 second (ideally under 0.4s)

☐ LCP under 2.5 seconds

☐ INP under 200ms

☐ CLS under 0.1

☐ Images compressed and in WebP format

☐ Unused JavaScript removed or deferred

☐ CDN active

☐ HTTPS active (no mixed content warnings)

☐ Mobile responsive

Additional Technical (30 minutes)

☐ Bing Webmaster Tools connected (ChatGPT sources from Bing)

☐ XML sitemap submitted to both Google and Bing

☐ Author page exists with Person schema

☐ llms.txt file placed at site root

☐ No login walls or paywalls on content you want cited

Maintenance Schedule

Technical GEO isn't a one-time setup. It requires periodic maintenance.

Weekly (5 minutes):

  • Check Google Search Console for new crawl errors or schema validation issues

Monthly (15 minutes):

  • Verify robots.txt hasn't been overwritten by CMS updates or plugin changes

  • Check that new blog posts have Article and FAQ schema (some CMS themes drop schema on new templates)

  • Review Core Web Vitals in GSC for any performance regressions

Quarterly (30 minutes):

  • Audit sameAs links in Organization schema — add any new brand profiles created during the quarter

  • Update knowsAbout if your expertise areas have expanded

  • Check for new AI crawlers that should be allowed (new crawlers appear regularly)

  • Re-validate all schema with the Rich Results Test

How Averi Handles Technical GEO

For startups that want the technical GEO layer handled automatically, Averi's content engine builds these elements into the publishing workflow:

  • Schema generation: Organization schema guidance provided during onboarding.

  • FAQ structure: Every piece includes a 5–7 question FAQ section with self-contained answers formatted for both human reading and schema extraction.

  • Content scoring: The 55% SEO / 45% GEO scoring system evaluates structural elements (answer capsules, extractable blocks, factual density) before publishing.

  • CMS publishing: Direct publishing to WordPress, Webflow, and Framer preserves schema and formatting without manual code insertion.

The robots.txt configuration, site performance optimization, and Organization schema are still site-level tasks that need to be done once on your end.

Averi handles the per-page technical GEO: the Article schema, FAQ schema, and content structure that make each piece citation-ready.

Start a free 14-day trial. No credit card. The technical GEO content layer applies to every piece you publish through the engine.

FAQs

Do I need Bing Webmaster Tools for GEO?

Yes. 73% of ChatGPT's results align with Bing's search results. ChatGPT's retrieval system pulls from Bing's index. If your content isn't indexed on Bing, ChatGPT can't retrieve or cite it. Connect Bing Webmaster Tools (free), submit your sitemap, and verify your content appears in Bing's index. Many site owners focus only on Google and are invisible to the largest AI citation platform because they neglected Bing.

How often do I need to maintain technical GEO setup?

Weekly: 5-minute check of Google Search Console for crawl errors and schema issues. Monthly: 15-minute verification that robots.txt hasn't been overwritten, new posts have proper schema, and Core Web Vitals haven't regressed. Quarterly: 30-minute audit updating sameAs links for new brand profiles, expanding knowsAbout topics, checking for new AI crawlers, and re-validating schema. The initial setup takes 2–4 hours total. Maintenance is minimal after that.

What is llms.txt and should I implement it?

llms.txt is an emerging standard for communicating directly with AI systems, similar to how robots.txt communicates with web crawlers. It's a Markdown file placed at your site root that describes your company, expertise, and most important content pages. Perplexity and some Common Crawl-based systems show early support. Implementation takes 15 minutes and has no downside. It's not confirmed that ChatGPT or Google AI read it yet, so treat it as a low-cost option on future AI behavior rather than a required element.

How do I implement schema on WordPress?

Install a schema plugin (Rank Math, Yoast SEO Premium, or Schema Pro). Most auto-generate Article schema from your post metadata. Add Organization JSON-LD as a custom code snippet in your site-wide header using a plugin like WPCode or through Appearance → Theme Editor → header.php. For FAQPage schema, use Rank Math's built-in FAQ block or add JSON-LD manually via WPCode. Validate each page with Google's Rich Results Test after implementation.

Does page speed actually affect AI citations?

Yes. Pages with FCP under 0.4 seconds are 3x more likely to be cited by ChatGPT than pages above 1.13 seconds. AI retrieval systems operate under time constraints. When evaluating multiple candidate pages for citation, slow-loading pages risk being skipped regardless of content quality. Target FCP under 1 second (ideally under 0.4s), LCP under 2.5 seconds, and INP under 200ms. Quick wins: compress images to WebP, remove unused JavaScript, and enable a CDN.

Which AI crawlers should I allow in robots.txt?

All of them, if you want AI citations. The critical ones: GPTBot and OAI-SearchBot (ChatGPT), PerplexityBot (Perplexity), ClaudeBot (Claude), and Google-Extended (Google AI/Gemini). ChatGPT drives 87.4% of all AI referral traffic. Blocking GPTBot eliminates your content from the dominant AI discovery channel. The full robots.txt configuration with all AI crawlers is included in this guide. Copy and paste it directly.

What schema markup do I need for GEO?

Three essential types. Organization JSON-LD on every page (establishes your entity with @id, sameAs profile links, and knowsAbout expertise topics). Article JSON-LD on every blog post (with author, dates, and publisher reference). FAQPage JSON-LD on every page with an FAQ section (question-answer pairs matching visible content). Sites with complete Tier 1 schema see approximately 40% more AI Overview appearances. Validate with Google's Rich Results Test after implementation.
