Schema.org and JSON-LD

What is Schema.org, how structured data works in JSON-LD format, and why it is fundamental for visibility in AI assistants and search engines.

Schema.org is the structured data vocabulary founded by Google, Microsoft, Yahoo, and Yandex that allows websites to communicate the semantic meaning of their content to machines, bots, and AI systems. JSON-LD (JavaScript Object Notation for Linked Data) is the format recommended by Google for implementing Schema.org on web pages. In the context of GEO (Generative Engine Optimization), Schema.org is the most direct and effective way to tell retrieval-augmented generation (RAG) systems what type of information your site contains, who created it, and why it is trustworthy.

What is Schema.org?

Schema.org is an open structured data standard that defines a shared vocabulary for describing entities and their relationships on the web. It was launched in 2011 by Google, Microsoft (Bing), and Yahoo, with Yandex joining later that year, with the goal of creating a universal language that machines could understand. Today, Schema.org defines over 800 entity types and more than 1,400 properties covering everything from articles and people to events, products, recipes, and organizations.

The importance of Schema.org has grown sharply with the arrival of AI assistants. The RAG systems behind assistants such as ChatGPT, Claude, and Perplexity can use structured data as a high-trust signal for understanding a page's content. In our audits, correct Schema.org implementation correlated strongly with how often AI assistants cited a site.

Why JSON-LD?

There are three formats for implementing Schema.org: Microdata, RDFa, and JSON-LD. Google officially recommends JSON-LD for several reasons:

Separation of presentation and data: JSON-LD is implemented as an independent script block in the HTML, without needing to modify the visual markup of the page. This makes maintenance easier and reduces errors.

Ease of implementation: JSON-LD uses standard JSON syntax that any developer can read and write. It does not require special attributes in the existing HTML.

Better for RAG systems: RAG systems can extract JSON-LD blocks directly from the HTML without needing to parse the full DOM. This makes structured data extraction more efficient and reliable.

Key Schema.org Types for GEO

Schema Type | Use Case | GEO Impact
Article / BlogPosting | Editorial content and articles | Identifies author, dates, and organization for RAG
FAQPage | Frequently asked questions | Significantly higher likelihood of AI citation
Organization | Brand identity | Foundation for RAG to identify who created the content
LocalBusiness | Businesses with physical presence | Responses to local queries in AI assistants
HowTo | Tutorials and step-by-step guides | Direct extraction as structured guides
BreadcrumbList | Navigation and site hierarchy | Relevance context for RAG evaluation
Person | Authors and experts | Verifiable credentials for E-E-A-T

Article Schema

The Article type (and its variants BlogPosting, NewsArticle, TechArticle) is fundamental for any site that publishes editorial content. It provides RAG systems with information about the author, publication date, update date, publishing organization, and article content.

Essential properties: headline, author (with Person type and credentials), datePublished, dateModified, publisher (with Organization type), description, image, mainEntityOfPage.

FAQPage Schema

FAQPage is arguably the most impactful Schema.org type for GEO. Our semantic benchmark revealed that sites with correctly implemented FAQPage schema are significantly more likely to be cited by RAG systems. This is because FAQPage provides structured question-answer pairs that AI systems can extract directly.

Structure: FAQPage contains a mainEntity array with Question-type elements, each with an acceptedAnswer property of type Answer. RAG systems can directly map these questions to user queries.
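As an illustration of the structure just described, here is a minimal FAQPage sketch with two question-answer pairs. The questions and answers are placeholder text written for this example, not content from a real site; replace them with the questions that actually appear on your page.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is JSON-LD?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "JSON-LD is a JSON-based format for embedding linked data, such as Schema.org markup, in a web page."
      }
    },
    {
      "@type": "Question",
      "name": "Where does the JSON-LD block go?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Inside a script tag of type application/ld+json, in the head or body of the page."
      }
    }
  ]
}
```

Each Question carries its text in the name property, and the answer text should match what a visitor can read on the page itself.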

LocalBusiness Schema

For businesses with a physical presence, LocalBusiness (and its subtypes like Restaurant, MedicalBusiness, LegalService) communicates geographic information, hours, services, and contact data. This is particularly relevant for local queries in AI assistants such as "what is the best Italian restaurant near me?"

Key properties: name, address (with PostalAddress type), telephone, openingHoursSpecification, geo (with GeoCoordinates type), priceRange, servesCuisine (for restaurants), areaServed.
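A sketch of how these properties might fit together for a restaurant follows. Every value here (business name, address, phone number, coordinates, hours) is a made-up placeholder chosen for illustration; Restaurant is used as the subtype so that servesCuisine applies.

```json
{
  "@context": "https://schema.org",
  "@type": "Restaurant",
  "name": "Example Trattoria",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example Street",
    "addressLocality": "Example City",
    "postalCode": "00000",
    "addressCountry": "US"
  },
  "telephone": "+1-555-0100",
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 40.0,
    "longitude": -74.0
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "12:00",
      "closes": "22:00"
    }
  ],
  "priceRange": "$$",
  "servesCuisine": "Italian",
  "areaServed": "Example City"
}
```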

Organization Schema

Organization schema establishes your brand identity for AI systems. It includes name, logo, contact data, social media, and description. It is the foundation for RAG systems to identify who is behind the content.

Essential properties: name, url, logo, contactPoint, sameAs (links to social media and verifiable profiles), description, foundingDate, numberOfEmployees.
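The properties listed above could be combined roughly as follows. This is a sketch with placeholder values only: the organization name, URLs, founding date, employee count, and profile links are assumptions for illustration and should be replaced with your real, verifiable data.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Organization",
  "url": "https://your-site.com",
  "logo": "https://your-site.com/logo.png",
  "description": "One-sentence description of what the organization does.",
  "foundingDate": "2015-01-01",
  "numberOfEmployees": {
    "@type": "QuantitativeValue",
    "value": 25
  },
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@your-site.com"
  },
  "sameAs": [
    "https://www.linkedin.com/company/your-organization",
    "https://github.com/your-organization"
  ]
}
```

The sameAs URLs should point to profiles that really belong to the organization, since their purpose is to let a machine confirm that the profiles and the site describe the same entity.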

HowTo Schema

HowTo is valuable for instructional content. It provides structured steps that RAG systems can extract and present as step-by-step guides in their responses. It is especially useful for tutorials, implementation guides, and technical processes.
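As a minimal sketch, a three-step guide might be marked up like this. The guide title and step wording are placeholders invented for this example; the structure (a step array of HowToStep items, each with a position, name, and text) is the part to reuse.

```json
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to add JSON-LD to a page",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Identify relevant types",
      "text": "Determine which Schema.org types apply to the page."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Implement JSON-LD",
      "text": "Add the markup in a script tag of type application/ld+json."
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Validate",
      "text": "Check the markup with a validator and fix any errors."
    }
  ]
}
```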

BreadcrumbList Schema

BreadcrumbList helps AI systems understand the hierarchy and navigation of your site. It provides context about where a page sits within the overall site structure, which RAG systems use to evaluate contextual relevance.
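For example, a page three levels deep could describe its position as shown below. The URLs and section names are hypothetical placeholders; each ListItem gives the position in the trail, the visible label, and the URL of that level.

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://your-site.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Guides",
      "item": "https://your-site.com/guides/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Schema.org and JSON-LD",
      "item": "https://your-site.com/guides/schema-org-json-ld/"
    }
  ]
}
```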

Practical Implementation

Step 1: Identify Relevant Types

Analyze your content and determine which Schema.org types apply to each page. A typical site needs at minimum: Organization (global), Article or BlogPosting (for each article), FAQPage (for Q&A content), and BreadcrumbList (for navigation).

Step 2: Implement JSON-LD

Each JSON-LD block is placed inside a <script type="application/ld+json"> tag in the head or body of your page. You can include multiple JSON-LD blocks on a single page to cover different schema types.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Article Title",
  "description": "One-sentence summary of the article",
  "image": "https://your-site.com/article-image.png",
  "mainEntityOfPage": "https://your-site.com/article-url",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "jobTitle": "Professional Title"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Your Organization",
    "logo": { "@type": "ImageObject", "url": "https://your-site.com/logo.png" }
  },
  "datePublished": "2026-01-15",
  "dateModified": "2026-03-01"
}

Step 3: Validate Your Implementation

Use Google's Rich Results Test and the Schema Markup Validator hosted by Schema.org. Verify that there are no errors or warnings. Errors in your markup can cause RAG systems to ignore your structured data entirely.

Step 4: Monitor the Impact

After implementing Schema.org, monitor how AI assistants cite your content. Tools like Presencia IA can help you measure your visibility in AI assistants and correlate changes with your structured data implementation.

Common Mistakes

1. Incomplete Schema.org: Implementing only the type without the required properties. An Article without author or datePublished loses most of its value for RAG systems.

2. Inconsistent data: The information in the markup does not match the visible content of the page. RAG systems verify consistency between structured data and actual content.

3. Not updating dateModified: Changing the content of a page without updating the dateModified property in the schema. RAG systems use this date to evaluate content freshness.

4. Ignoring FAQPage: Not implementing FAQPage when your content includes frequently asked questions. Given the significant impact on citation, this represents a major missed opportunity.

Schema.org and the AI Ecosystem

Many of the websites analyzed in our benchmark block AI bots in their robots.txt. Of those that do allow access, only a fraction implement Schema.org correctly. This creates a significant opportunity: sites that combine open access to AI bots with complete Schema.org implementation position themselves as preferred sources for RAG systems.

Schema.org is not just a technical SEO improvement. It is the language your website uses to communicate directly with the AI systems that determine your visibility in the new era of generative search.
