Whitepaper

Anatomy of a GEO Audit

Technical whitepaper breaking down the 7 key indicators we analyze in every GEO (Generative Engine Optimization) audit, based on proprietary data from over 500 sites.

500+ sites analyzed | 7 key indicators

Executive Summary

After analyzing over 500 websites, we have identified the 7 key indicators that determine whether a site will be cited by AI models like ChatGPT, Claude, Gemini, and Perplexity. This whitepaper breaks down each indicator with real data from our audits.

Key finding: Sites with a semantic ratio above 0.85 are 3.2x more likely to be cited than those below 0.60.

The 7 Indicators of a GEO Audit

Our methodology evaluates 7 technical dimensions that directly impact visibility to AI systems:

1. Entity Structure

Analyzes the use of semantic HTML, Schema.org, and heading hierarchy.

Key metrics:

  • Semantic ratio: Share of page text that sits inside semantic HTML elements, relative to all text on the page
  • Heading count: Number of structural headings (h1 to h6)
  • Hierarchy jumps: Number of skipped heading levels (for example, h1→h3 without an h2)
  • JSON-LD blocks: Number of structured data blocks

Optimal reference values:

Metric | Optimal Value | Average in 500 sites
Semantic ratio | > 0.85 | 0.67
Hierarchy jumps | 0 | 2.3
JSON-LD blocks | ≥ 2 | 0.8

Optimized site example:

A well-structured site shows a semantic ratio of 0.87, 14 headings with no hierarchy jumps, and 2 structured data blocks (FAQPage and Organization).
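As a rough illustration of how three of these metrics could be measured, the sketch below counts headings, hierarchy jumps, and JSON-LD blocks using only Python's standard library. It is not our audit implementation: the names `StructureParser`, `hierarchy_jumps`, and `structure_metrics` are hypothetical, and the semantic ratio is left out because its exact definition depends on which elements are treated as semantic.

```python
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects heading levels and counts JSON-LD script blocks."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []
        self.json_ld_blocks = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "h1".."h6" is enough.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.json_ld_blocks += 1


def hierarchy_jumps(levels):
    """Count headings that sit more than one level below the previous heading."""
    return sum(1 for prev, cur in zip(levels, levels[1:]) if cur - prev > 1)


def structure_metrics(html):
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return {
        "heading_count": len(parser.heading_levels),
        "hierarchy_jumps": hierarchy_jumps(parser.heading_levels),
        "json_ld_blocks": parser.json_ld_blocks,
    }
```

Moving back up the hierarchy (h3 followed by h2) is not counted as a jump in this sketch; only skipped levels on the way down are.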

2. Content Extractability

Evaluates how easy it is for AI systems to extract meaningful content.

Key metrics:

  • Text-to-code ratio: Ratio of visible text to total HTML markup
  • Extractable words: Number of words that can be extracted from the main content
  • Content gating: Indicators that content is blocked from extraction
  • Semantic container: Presence of <main> or <article> element

Common problem detected: 34% of analyzed sites have "content gating" indicators that limit content extraction by AI systems.

Optimal values:

Metric | Optimal Value | Impact
Text-to-code ratio | > 0.05 | High
Extractable words | > 300 | Medium
Content gating | No | Critical
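The following sketch shows one possible way to approximate the text-to-code ratio, the extractable word count, and the semantic container check. It rests on assumptions: the ratio is taken as visible text characters divided by total HTML characters, words are counted across the whole page rather than only the main content, and `TextExtractor` and `extractability_metrics` are hypothetical names. Content gating is not detected here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers visible text and notes whether <main> or <article> is present."""

    SKIP = {"script", "style"}  # contents of these tags are not visible text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.has_container = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in ("main", "article"):
            self.has_container = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extractability_metrics(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not inflate the ratio.
    text = " ".join(" ".join(parser.chunks).split())
    return {
        "text_to_code_ratio": len(text) / len(html) if html else 0.0,
        "extractable_words": len(text.split()),
        "semantic_container": parser.has_container,
    }
```

The results can then be compared against the reference values above (ratio > 0.05, more than 300 words).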

3. Knowledge Graph

Evaluates signals that establish brand identity in knowledge graphs.

Key metrics:

  • Organization blocks: Number of Schema.org Organization blocks
  • sameAs links: Links to verifiable profiles (LinkedIn, Twitter)
  • Identifiable logos: Presence of a logo that can be associated with the brand
  • Site name: Open Graph site name configured

Finding: Sites with 3+ social profile links are 2.1x more likely to be recognized as authoritative sources.
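To make these signals concrete, here is a minimal sketch that builds a Schema.org Organization block with `sameAs` links. The helper name `organization_json_ld` and every value passed to it (company name, URLs, profile links) are placeholders for illustration, not data from our audits.

```python
import json


def organization_json_ld(name, url, logo, same_as):
    """Build a Schema.org Organization object ready to serialize as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": list(same_as),
    }


# Placeholder values; replace them with the organization's real profiles.
block = organization_json_ld(
    name="Example Co",
    url="https://example.com",
    logo="https://example.com/logo.png",
    same_as=[
        "https://www.linkedin.com/company/example",
        "https://twitter.com/example",
        "https://github.com/example",
    ],
)
print(json.dumps(block, indent=2))
```

The printed JSON would be embedded in the page inside a `<script type="application/ld+json">` element.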

4. RAG Optimization

Evaluates how well content is segmented for retrieval-augmented generation (RAG) systems.

Key metrics:

  • Words per section: Average number of words under each heading
  • Entity density: Proportion of tokens that are named entities
  • Paragraph count: Number of paragraphs
  • Capitalized tokens: Capitalized words (potential entities)

Optimal RAG values:

Metric | Optimal Value | Reasoning
Words per section | 20-40 | Manageable chunks for embedding
Entity density | 0.10-0.20 | Enough entities without saturation
Paragraphs | 10-25 | Good segmentation

Optimized example:

Well-segmented content has approximately 27 words per section, entity density of 0.15, and 17 well-structured paragraphs.
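A simplified sketch of these measurements follows. It assumes the page has already been split into `(heading, paragraphs)` pairs, and it uses capitalized tokens as a crude stand-in for named entities, so sentence-initial words are overcounted. The function name `rag_metrics` is hypothetical.

```python
import re


def rag_metrics(sections):
    """sections: list of (heading, [paragraph, ...]) tuples."""
    words = []
    paragraph_count = 0
    for _heading, paragraphs in sections:
        paragraph_count += len(paragraphs)
        for paragraph in paragraphs:
            # A word starts with a letter and may contain apostrophes or hyphens.
            words.extend(re.findall(r"[A-Za-z][A-Za-z'-]*", paragraph))
    capitalized = [w for w in words if w[0].isupper()]
    return {
        "words_per_section": len(words) / len(sections) if sections else 0.0,
        "entity_density": len(capitalized) / len(words) if words else 0.0,
        "paragraph_count": paragraph_count,
        "capitalized_tokens": len(capitalized),
    }
```

A production audit would more likely use a named entity recognizer than a capitalization heuristic; the heuristic is used here only to keep the example dependency-free.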

5. Robots Governance

Analyzes access permissions for AI bots in robots.txt.

Critical bots evaluated:

Bot | Owner | Impact
GPTBot | OpenAI | Critical
ChatGPT-User | OpenAI | Critical
ClaudeBot | Anthropic | Critical
Google-Extended | Google | Critical
PerplexityBot | Perplexity | High
Applebot-Extended | Apple | High
CCBot | Common Crawl | Medium
Bingbot | Microsoft | Medium

Alarming finding: 23% of analyzed sites block at least one critical AI bot without their owners being aware of it.

Optimal robots.txt configuration:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
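To check an existing robots.txt against this list, one option is Python's standard `urllib.robotparser` module. The sketch below is an assumption about how such a check could look, not our audit tooling; `AI_BOTS` and `blocked_bots` are hypothetical names, and the test URL defaults to the site root.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the table of bots evaluated above.
AI_BOTS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Google-Extended",
    "PerplexityBot",
    "Applebot-Extended",
    "CCBot",
    "Bingbot",
]


def blocked_bots(robots_txt, url="https://yourdomain.com/"):
    """Return the AI bots that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


example = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_bots(example))
```

Python's parser follows its own matching rules, which may differ in edge cases from how each vendor's crawler interprets the file, so the result is best treated as a first-pass signal.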

6. Snippet Optimization

Evaluates structured data for featured snippets in AI interfaces.

Key metrics:

  • FAQ Schema: FAQPage implemented
  • FAQ blocks: Number of FAQ sections
  • Structured lists: Organized list elements
  • Q&A markers: Question/answer indicators

FAQ Schema impact: Sites with FAQPage schema are 4.7x more likely to appear in Perplexity responses.
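For reference, a FAQPage block follows the Schema.org pattern of a `mainEntity` list of `Question` items, each with an `acceptedAnswer`. The sketch below generates that structure from question and answer pairs; the helper name `faq_page_json_ld` and the sample question are illustrative assumptions.

```python
import json


def faq_page_json_ld(qa_pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


# Placeholder content for illustration only.
faq = faq_page_json_ld([
    ("What is a GEO audit?", "An evaluation of how visible a site is to AI systems."),
])
print(json.dumps(faq, indent=2))
```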

7. Trust Signals

Evaluates E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness).

Key metrics:

  • Author attribution: Identification of content author
  • Author profile: Links to professional profiles
  • Publication dates: Indicators of when published/updated
  • Secure connection: HTTPS usage
  • Legal links: Links to policies and terms

Critical finding: 67% of sites have no date signals, which significantly weakens how fresh their content appears to AI systems.
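A few of these signals can be detected mechanically. The sketch below checks for a date signal, an author meta tag, and HTTPS. It covers only a subset of the metrics, and the specific markers it looks for (`<time datetime>`, the Open Graph `article:published_time` and `article:modified_time` properties, and `<meta name="author">`) are our assumptions about what counts as a signal. `TrustParser` and `trust_signals` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

DATE_META = {"article:published_time", "article:modified_time"}


class TrustParser(HTMLParser):
    """Flags date and author markers found in the page markup."""

    def __init__(self):
        super().__init__()
        self.has_date = False
        self.has_author = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "time" and attributes.get("datetime"):
            self.has_date = True
        elif tag == "meta":
            if attributes.get("property") in DATE_META and attributes.get("content"):
                self.has_date = True
            if attributes.get("name") == "author" and attributes.get("content"):
                self.has_author = True


def trust_signals(html, url):
    parser = TrustParser()
    parser.feed(html)
    parser.close()
    return {
        "publication_dates": parser.has_date,
        "author_attribution": parser.has_author,
        "secure_connection": urlparse(url).scheme == "https",
    }
```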

How the Global Score is Calculated

Each dimension receives a score of 0-100. The global score is a weighted average:

Dimension | Weight
Entity Structure | 20%
Content Extractability | 15%
Knowledge Graph | 15%
RAG Optimization | 15%
Robots Governance | 15%
Snippet Optimization | 10%
Trust Signals | 10%

Score interpretation:

  • 90-100: Excellent - Optimized for AI visibility
  • 70-89: Good - Minor improvements needed
  • 50-69: Fair - Significant improvement opportunities
  • <50: Critical - Requires immediate attention
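The weighted average and the interpretation bands can be expressed in a few lines. The sketch below uses the weights and thresholds stated above; the function names `global_score` and `interpret`, and the sample dimension scores, are illustrative assumptions.

```python
# Weights in percent, as listed in the dimension table (they sum to 100).
WEIGHTS = {
    "Entity Structure": 20,
    "Content Extractability": 15,
    "Knowledge Graph": 15,
    "RAG Optimization": 15,
    "Robots Governance": 15,
    "Snippet Optimization": 10,
    "Trust Signals": 10,
}


def global_score(scores):
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def interpret(score):
    """Map a global score to the interpretation bands."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Critical"


# Hypothetical dimension scores for a single site.
sample_scores = {
    "Entity Structure": 90,
    "Content Extractability": 80,
    "Knowledge Graph": 60,
    "RAG Optimization": 70,
    "Robots Governance": 100,
    "Snippet Optimization": 50,
    "Trust Signals": 40,
}
total = global_score(sample_scores)
print(total, interpret(total))  # 73.5 Good
```

In this example the strong Robots Governance and Entity Structure scores offset weak Trust Signals, which shows why a "Good" global score can still hide a dimension that needs attention.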

Conclusions

Data from 500+ audits reveals that AI optimization is not optional: it's the new standard for digital visibility. Sites that correctly implement these 7 dimensions have a significant competitive advantage.

Recommended next steps:

1. Audit your site against our 7 dimensions

2. Prioritize fixes in dimensions with score < 70

3. Monitor changes in robots.txt permissions

4. Implement Schema.org (minimum: Organization + FAQPage)

5. Ensure all content is extractable without gating