Executive Summary
After analyzing over 500 websites, we have identified the 7 key indicators that determine whether a site will be cited by AI models like ChatGPT, Claude, Gemini, and Perplexity. This whitepaper breaks down each indicator with real data from our audits.
Key finding: Sites with a semantic ratio above 0.85 are 3.2x more likely to be cited than those below 0.60.
The 7 Indicators of a GEO (Generative Engine Optimization) Audit
Our methodology evaluates 7 technical dimensions that directly impact visibility to AI systems:
1. Entity Structure
Analyzes the use of semantic HTML, Schema.org markup, and heading hierarchy.
Key metrics:
- Semantic ratio: Proportion of page text inside semantic elements, relative to all page text
- Heading count: Number of structural headings
- Hierarchy jumps: Skipped heading levels (e.g., an h1 followed directly by an h3 with no h2)
- JSON-LD blocks: Number of structured data blocks
Optimal reference values:
| Metric | Optimal Value | Average across 500 sites |
|---|---|---|
| Semantic ratio | > 0.85 | 0.67 |
| Hierarchy jumps | 0 | 2.3 |
| JSON-LD blocks | ≥ 2 | 0.8 |
Optimized site example:
A well-structured site shows a semantic ratio of 0.87, 14 headings with no hierarchy jumps, and 2 structured data blocks (FAQPage and Organization).
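As a rough illustration of how the four Entity Structure metrics could be measured, the sketch below uses Python's standard html.parser module. It is not the audit tool itself: the set of "semantic" elements (SEMANTIC_TAGS), counting text by characters, treating any heading more than one level deeper than its predecessor as a jump, and the helper names are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumption: the whitepaper does not define which tags count as "semantic".
SEMANTIC_TAGS = {"main", "article", "section", "header", "footer", "nav", "aside"}
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class EntityStructureParser(HTMLParser):
    """Collects the raw counts behind the Entity Structure metrics."""

    def __init__(self):
        super().__init__()
        self.semantic_depth = 0  # number of currently open semantic ancestors
        self.skip_depth = 0      # > 0 while inside <script> or <style>
        self.total_chars = 0
        self.semantic_chars = 0
        self.heading_levels = []
        self.jsonld_blocks = 0

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            self.semantic_depth += 1
        elif tag in HEADING_LEVELS:
            self.heading_levels.append(HEADING_LEVELS[tag])
        elif tag in ("script", "style"):
            self.skip_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self.jsonld_blocks += 1

    def handle_endtag(self, tag):
        if tag in SEMANTIC_TAGS and self.semantic_depth:
            self.semantic_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # JSON-LD payloads and CSS are not visible text
        n = len(data.strip())
        self.total_chars += n
        if self.semantic_depth:
            self.semantic_chars += n


def entity_structure_metrics(html):
    parser = EntityStructureParser()
    parser.feed(html)
    parser.close()
    levels = parser.heading_levels
    # A "jump" here is any heading more than one level deeper than the previous one.
    jumps = sum(1 for prev, cur in zip(levels, levels[1:]) if cur - prev > 1)
    ratio = parser.semantic_chars / parser.total_chars if parser.total_chars else 0.0
    return {
        "semantic_ratio": round(ratio, 2),
        "heading_count": len(levels),
        "hierarchy_jumps": jumps,
        "jsonld_blocks": parser.jsonld_blocks,
    }


page = (
    "<div>Menu</div>"
    "<main><h1>Title</h1><h3>Sub</h3><p>Body text here</p></main>"
    '<script type="application/ld+json">{}</script>'
)
print(entity_structure_metrics(page))
# {'semantic_ratio': 0.85, 'heading_count': 2, 'hierarchy_jumps': 1, 'jsonld_blocks': 1}
```

In this toy page, the navigation text outside <main> pulls the semantic ratio below 1, and the h1 to h3 transition registers as one hierarchy jump.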
2. Content Extractability
Evaluates how easy it is for AI systems to extract meaningful content.
Key metrics:
- Text-to-code ratio: Ratio of visible text to total HTML code
- Extractable words: Words that can be extracted from the main content
- Content gating: Indicators that content is hidden behind a login, paywall, or similar barrier
- Semantic container: Presence of a `<main>` or `<article>` element
Common problem detected: 34% of analyzed sites have "content gating" indicators that limit content extraction by AI systems.
Optimal values:
| Metric | Optimal Value | Impact |
|---|---|---|
| Text-to-code ratio | > 0.05 | High |
| Extractable words | > 300 | Medium |
| Content gating | No | Critical |
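The text-to-code ratio, extractable word count, and semantic-container check can be approximated the same way. In this sketch, which is an assumption rather than the documented method, the ratio is whitespace-normalized visible text length divided by raw HTML length, and "extractable words" are the words inside `<main>` or `<article>`. Content gating is left out because the whitepaper does not specify its indicators; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ExtractabilityParser(HTMLParser):
    """Separates all visible text from text inside <main>/<article> containers."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0       # > 0 inside <script>/<style>
        self.container_depth = 0  # > 0 inside <main>/<article>
        self.has_container = False
        self.visible = []
        self.in_container = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ("main", "article"):
            self.container_depth += 1
            self.has_container = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag in ("main", "article") and self.container_depth:
            self.container_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        self.visible.append(data)
        if self.container_depth:
            self.in_container.append(data)


def extractability_metrics(html):
    parser = ExtractabilityParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so markup indentation is not counted as text.
    text = " ".join(" ".join(parser.visible).split())
    return {
        "text_to_code_ratio": len(text) / len(html) if html else 0.0,
        "extractable_words": len(" ".join(parser.in_container).split()),
        "has_semantic_container": parser.has_container,
    }


page = "<html><body><main><p>one two three</p></main></body></html>"
print(extractability_metrics(page))
```

Real pages carry far more markup per word than this toy example, which is why the whitepaper's threshold for the ratio is as low as 0.05.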
3. Knowledge Graph
Evaluates signals that establish brand identity in knowledge graphs.
Key metrics:
- Organization blocks: Number of Schema Organization blocks
- sameAs links: Schema.org sameAs links to verifiable profiles (LinkedIn, Twitter)
- Identifiable logos: Presence of an identifiable brand logo
- Site name: Open Graph site name (og:site_name) configured
Finding: Sites with 3+ social profile links are 2.1x more likely to be recognized as authoritative sources.
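A minimal Organization block carrying sameAs links might look like the following Python sketch, which builds the JSON-LD and counts the Knowledge Graph signals it contains. The company name, all URLs, and the knowledge_graph_signals helper are hypothetical placeholders; only the @type, logo, and sameAs properties come from the standard Schema.org vocabulary.

```python
import json

# Hypothetical Organization entity; the name and every URL are placeholders.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://twitter.com/exampleco",
        "https://www.youtube.com/@exampleco",
    ],
}


def as_list(value):
    """Schema.org allows a single value or an array; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def knowledge_graph_signals(jsonld_blocks):
    """Count Organization entities, their sameAs links, and whether a logo is declared."""
    orgs, same_as, has_logo = 0, 0, False
    for block in jsonld_blocks:
        # Entities may sit at the top level or inside an @graph array.
        for item in as_list(block.get("@graph", block)):
            if "Organization" in as_list(item.get("@type")):
                orgs += 1
                same_as += len(as_list(item.get("sameAs")))
                has_logo = has_logo or "logo" in item
    return {"organization_blocks": orgs, "sameas_links": same_as, "has_logo": has_logo}


# Markup to embed in the page:
print('<script type="application/ld+json">')
print(json.dumps(organization, indent=2))
print("</script>")
print(knowledge_graph_signals([organization]))
# {'organization_blocks': 1, 'sameas_links': 3, 'has_logo': True}
```

This counter only matches the exact type "Organization"; subtypes such as Corporation would need an explicit list if the audit is meant to include them.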
4. RAG Optimization
Evaluates how well content is segmented for retrieval-augmented generation (RAG) systems.
Key metrics:
- Words per section: Average number of words under each heading
- Entity density: Proportion of words that are named entities
- Paragraph count: Number of paragraphs
- Capitalized tokens: Capitalized words (potential entities)
Optimal RAG values:
| Metric | Optimal Value | Reasoning |
|---|---|---|
| Words per section | 20-40 | Manageable chunks for embedding |
| Entity density | 0.10-0.20 | Enough entities without saturation |
| Paragraphs | 10-25 | Good segmentation |
Optimized example:
Well-segmented content has approximately 27 words per section, entity density of 0.15, and 17 well-structured paragraphs.
5. Robots Governance
Analyzes access permissions for AI bots in robots.txt.
Critical bots evaluated:
| Bot | Owner | Impact |
|---|---|---|
| GPTBot | OpenAI | Critical |
| ChatGPT-User | OpenAI | Critical |
| ClaudeBot | Anthropic | Critical |
| Google-Extended | Google | Critical |
| PerplexityBot | Perplexity | High |
| Applebot-Extended | Apple | High |
| CCBot | Common Crawl | Medium |
| Bingbot | Microsoft | Medium |
Alarming finding: 23% of analyzed sites block at least one critical AI bot without realizing it.
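One way to catch this kind of accidental blocking is Python's standard urllib.robotparser module, which evaluates a robots.txt body for a given user agent and URL. The bot list mirrors the table above; blocked_bots and the sample file are illustrative. The standard-library parser applies the first matching rule, so in edge cases its verdict may differ from how an individual vendor's crawler interprets the same file.

```python
from urllib.robotparser import RobotFileParser

# User agents from the table above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
    "PerplexityBot", "Applebot-Extended", "CCBot", "Bingbot",
]


def blocked_bots(robots_txt, url="https://yourdomain.com/"):
    """Return the bots from AI_BOTS that robots_txt disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# A file that looks permissive but shuts out one critical bot.
sample = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""
print(blocked_bots(sample))  # ['GPTBot']
```

Running a check like this whenever robots.txt changes makes an unintended Disallow visible before it affects AI visibility.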
Optimal robots.txt configuration:
```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```
6. Snippet Optimization
Evaluates structured data for featured snippets in AI interfaces.
Key metrics:
- FAQ Schema: FAQPage implemented
- FAQ blocks: Number of FAQ sections
- Structured lists: Organized list elements
- Q&A markers: Question/answer indicators
FAQ Schema impact: Sites with FAQPage schema are 4.7x more likely to appear in Perplexity responses.
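FAQPage markup follows the Schema.org pattern of a mainEntity list of Question items, each with an acceptedAnswer of type Answer. The sketch below generates that structure from question/answer pairs; the faq_page_jsonld helper is hypothetical, and the sample pairs simply restate points from this whitepaper.

```python
import json


def faq_page_jsonld(pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_page_jsonld([
    ("What does Robots Governance check?",
     "Whether robots.txt allows AI bots such as GPTBot and ClaudeBot."),
    ("Which Schema.org types are the minimum?", "Organization and FAQPage."),
])
print('<script type="application/ld+json">')
print(json.dumps(faq, indent=2))
print("</script>")
```

The questions in the markup should match questions that are actually visible on the page.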
7. Trust Signals
Evaluates E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness).
Key metrics:
- Author attribution: Identification of content author
- Author profile: Links to professional profiles
- Publication dates: Indicators of when content was published or last updated
- Secure connection: HTTPS usage
- Legal links: Links to policies and terms
Critical finding: 67% of sites have no date signals, which significantly weakens how fresh their content appears to AI systems.
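A rough detector for four of these trust signals (author attribution, publication dates, HTTPS, and legal links) is sketched below. The specific markers it looks for, namely a meta name="author" tag, rel="author" links, article:published_time or article:modified_time meta tags, `<time datetime>` elements, and link targets containing "privacy", "terms", or "legal", are assumptions for illustration, not the audit's documented checks.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed markers; the whitepaper does not say which ones the audit checks.
DATE_META = {"article:published_time", "article:modified_time"}
LEGAL_WORDS = ("privacy", "terms", "legal")


class TrustSignalParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_author = False
        self.has_date = False
        self.legal_links = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("name") or a.get("property") or ""
            self.has_author = self.has_author or key == "author"
            self.has_date = self.has_date or key in DATE_META
        elif tag == "time" and a.get("datetime"):
            self.has_date = True
        elif tag == "a":
            if "author" in (a.get("rel") or "").split():
                self.has_author = True
            href = (a.get("href") or "").lower()
            if any(word in href for word in LEGAL_WORDS):
                self.legal_links += 1


def trust_signals(html, page_url):
    parser = TrustSignalParser()
    parser.feed(html)
    parser.close()
    return {
        "author_attribution": parser.has_author,
        "date_signals": parser.has_date,
        "https": urlparse(page_url).scheme == "https",
        "legal_links": parser.legal_links,
    }


page = (
    '<meta name="author" content="Jane Doe">'
    '<time datetime="2024-05-01">May 1, 2024</time>'
    '<a href="/privacy-policy">Privacy</a> <a href="/terms">Terms</a>'
)
print(trust_signals(page, "https://example.com/post"))
# {'author_attribution': True, 'date_signals': True, 'https': True, 'legal_links': 2}
```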
How the Global Score is Calculated
Each dimension receives a score of 0-100. The global score is a weighted average:
| Dimension | Weight |
|---|---|
| Entity Structure | 20% |
| Content Extractability | 15% |
| Knowledge Graph | 15% |
| RAG Optimization | 15% |
| Robots Governance | 15% |
| Snippet Optimization | 10% |
| Trust Signals | 10% |
Score interpretation:
- 90-100: Excellent - Optimized for AI visibility
- 70-89: Good - Minor improvements needed
- 50-69: Fair - Significant improvement opportunities
- <50: Critical - Requires immediate attention
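The weighted average and the score bands can be written out directly. This sketch uses the weights from the table as integer percentages so the arithmetic stays exact. The dimension keys and example scores are hypothetical. Because the bands above cover only whole numbers, placing a fractional score such as 89.5 in the lower band is an assumption.

```python
# Weights from the table, as integer percentages so the arithmetic is exact.
WEIGHTS = {
    "entity_structure": 20,
    "content_extractability": 15,
    "knowledge_graph": 15,
    "rag_optimization": 15,
    "robots_governance": 15,
    "snippet_optimization": 10,
    "trust_signals": 10,
}
assert sum(WEIGHTS.values()) == 100


def global_score(scores):
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS) / 100


def interpret(score):
    # Assumption: fractional scores fall into the lower band (e.g. 89.5 -> Good).
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Critical"


example = {
    "entity_structure": 80,
    "content_extractability": 70,
    "knowledge_graph": 60,
    "rag_optimization": 90,
    "robots_governance": 100,
    "snippet_optimization": 50,
    "trust_signals": 40,
}
score = global_score(example)
print(score, interpret(score))  # 73.0 Good
```

In this example, a perfect Robots Governance score adds 15 points, while weak Snippet Optimization and Trust Signals scores cost relatively little because those dimensions carry only 10% each.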
Conclusions
Data from 500+ audits reveals that AI optimization is not optional: it's the new standard for digital visibility. Sites that correctly implement these 7 dimensions have a significant competitive advantage.
Recommended next steps:
1. Audit your site against the 7 dimensions
2. Prioritize fixes in dimensions scoring below 70
3. Monitor changes in robots.txt permissions
4. Implement Schema.org (minimum: Organization + FAQPage)
5. Ensure all content is extractable without gating