GEO Scoring Rubric

This rubric describes how the GEO Score (0–100) is computed. It is based on the Princeton KDD 2024 research on Generative Engine Optimization, extended with signals from AutoGEO (ICLR 2026) and geo-checklist.dev.

Score Bands

Band	Range	Meaning
Excellent	86–100	Fully optimized for AI citation engines
Good	68–85	Well-optimized, minor gaps remain
Foundation	36–67	Partially visible — key signals missing
Critical	0–35	AI engines cannot reliably discover or cite you
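
The band lookup can be sketched as a simple threshold table. This is a minimal illustration, not the tool's actual code: the names `BANDS` and `score_band` are hypothetical, and only the thresholds come from the table above.

```python
# Lower bound of each band, copied from the Score Bands table.
# BANDS and score_band are hypothetical names used for illustration only.
BANDS = [
    (86, "Excellent"),
    (68, "Good"),
    (36, "Foundation"),
    (0, "Critical"),
]


def score_band(score: int) -> str:
    """Return the band name for a GEO score in the range 0-100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Bands are ordered from highest to lowest, so the first match wins.
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    raise AssertionError("unreachable: 0 is the lowest bound")
```

For example, `score_band(85)` falls in the Good band, while `score_band(86)` crosses into Excellent.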

Categories and Weights (v3.18.3)

The total score is the sum of all points earned across 8 categories, capped at 100.

1. robots.txt — max 18 pts

Signal	Points	Condition
`robots_found`	5	File exists and is reachable
`robots_citation_ok`	13	All 4 citation bots allowed (OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot)
`robots_some_allowed`	10	At least some AI bots allowed (partial credit; not cumulative with `robots_citation_ok`)

Citation bots are the most critical: OAI-SearchBot, ClaudeBot, Claude-SearchBot, and PerplexityBot drive real-time citations in ChatGPT, Claude, and Perplexity. `robots_some_allowed` is awarded only when the four citation bots are not all allowed; it acts as partial credit for sites that allow some AI bots via wildcard rules.
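
The non-cumulative rule can be sketched as follows. This is an illustrative reading of the table, not the real implementation: `robots_score` and its parameters are hypothetical, and treating a missing file as 0 pts is an assumption the rubric does not state.

```python
from __future__ import annotations

# The four citation bots named in the rubric.
CITATION_BOTS = ("OAI-SearchBot", "ClaudeBot", "Claude-SearchBot", "PerplexityBot")


def robots_score(found: bool, allowed_bots: set[str]) -> int:
    """Score the robots.txt category (max 18 pts).

    `allowed_bots` is the set of AI bot names the file allows.
    Assumption: a missing robots.txt earns nothing in this sketch.
    """
    if not found:
        return 0
    points = 5  # robots_found
    if all(bot in allowed_bots for bot in CITATION_BOTS):
        points += 13  # robots_citation_ok
    elif allowed_bots:
        points += 10  # robots_some_allowed (partial credit, never added on top of 13)
    return points
```

Under these assumptions a site that allows all four citation bots scores 18, while a site that allows only ClaudeBot scores 15.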

2. llms.txt — max 18 pts

Signal	Points	Condition
`llms_found`	5	`/llms.txt` exists at site root
`llms_h1`	2	File has a top-level H1 heading
`llms_blockquote`	1	File contains a blockquote (site description)
`llms_sections`	2	File has H2 content sections
`llms_links`	2	File contains at least one URL link
`llms_depth`	2	Word count ≥ 1,000 (substantial index)
`llms_depth_high`	2	Word count ≥ 5,000 (comprehensive index)
`llms_full`	2	`llms-full.txt` also exists at site root

Quality is graduated: a minimal `llms.txt` scores 5 pts, but a deep, well-structured file with a blockquote description and a companion `llms-full.txt` can earn all 18.
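
The graduated scoring can be approximated with a few line-based checks. The sketch below is only an assumption about how such checks might look: `llms_txt_score` is a hypothetical name, the Markdown detection is deliberately naive, and words are counted by splitting on whitespace.

```python
import re
from typing import Optional


def llms_txt_score(text: Optional[str], has_llms_full: bool = False) -> int:
    """Score the llms.txt category (max 18 pts) from the file's text.

    Pass text=None when /llms.txt does not exist. Assumption: in that
    case no other llms.txt signal can be earned.
    """
    if text is None:
        return 0
    lines = [line.strip() for line in text.splitlines()]
    word_count = len(text.split())
    checks = [
        (5, True),                                             # llms_found
        (2, any(line.startswith("# ") for line in lines)),     # llms_h1
        (1, any(line.startswith(">") for line in lines)),      # llms_blockquote
        (2, any(line.startswith("## ") for line in lines)),    # llms_sections
        (2, re.search(r"https?://\S+", text) is not None),     # llms_links
        (2, word_count >= 1000),                               # llms_depth
        (2, word_count >= 5000),                               # llms_depth_high
        (2, has_llms_full),                                    # llms_full
    ]
    return sum(points for points, passed in checks if passed)
```

Note that the two depth signals stack: a file with 5,000 or more words earns both `llms_depth` and `llms_depth_high`.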

3. Schema JSON-LD — max 16 pts effective (declared 22)

Signal	Points	Condition
`schema_any_valid`	2	Any valid JSON-LD schema found in the page
`schema_richness`	3	Schema contains 5+ relevant attributes (Growth Marshal 2026)
`schema_faq`	3	`FAQPage` schema present
`schema_article`	3	`Article` or `BlogPosting` schema present
`schema_organization`	3	`Organization` schema present
`schema_website`	2	`WebSite` schema present
`schema_sameas`	0	(migrated to `brand_kg_readiness` in v3.18.2 — retained for backwards compatibility, always 0)

Note: The `sameAs` knowledge graph signal has been moved to the Brand & Entity Signals category as `brand_kg_readiness` (3 pts). The `schema_sameas` key is kept for compatibility but contributes 0 points, so the effective maximum for this category is 16 pts, not 22.
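
A rough sketch of how the schema signals might be evaluated over already-parsed JSON-LD objects is shown below. The function name `schema_points` is hypothetical, and counting every key that does not start with `@` as a "relevant attribute" is an assumption; the rubric does not define which attributes count.

```python
from typing import Any, Dict, List

# Weights copied from the Schema JSON-LD table.
SCHEMA_WEIGHTS = {
    "schema_any_valid": 2,
    "schema_richness": 3,
    "schema_faq": 3,
    "schema_article": 3,
    "schema_organization": 3,
    "schema_website": 2,
    "schema_sameas": 0,  # migrated to brand_kg_readiness, always 0
}


def schema_points(schemas: List[Dict[str, Any]]) -> Dict[str, int]:
    """Return points per signal for a list of parsed JSON-LD objects."""
    types = set()
    richest = 0
    for schema in schemas:
        declared = schema.get("@type")
        if isinstance(declared, str):
            declared = [declared]
        elif not isinstance(declared, list):
            declared = []
        types.update(t for t in declared if isinstance(t, str))
        # Assumption: any non-"@" key counts as a relevant attribute.
        richest = max(richest, sum(1 for key in schema if not key.startswith("@")))
    passed = {
        "schema_any_valid": bool(schemas),
        "schema_richness": richest >= 5,
        "schema_faq": "FAQPage" in types,
        "schema_article": bool(types & {"Article", "BlogPosting"}),
        "schema_organization": "Organization" in types,
        "schema_website": "WebSite" in types,
        "schema_sameas": any("sameAs" in schema for schema in schemas),
    }
    return {name: SCHEMA_WEIGHTS[name] if ok else 0 for name, ok in passed.items()}
```

Because `schema_sameas` has a weight of 0, it appears in the result but never changes the category total, which tops out at 16.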

4. Meta Tags — max 14 pts

Signal	Points	Condition
`meta_title`	5	`<title>` tag present and non-empty
`meta_description`	2	`<meta name="description">` present
`meta_canonical`	3	`<link rel="canonical">` present
`meta_og`	4	Open Graph tags present (`og:title`, `og:description`)
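
The four meta checks can be illustrated with Python's standard `html.parser`. This is a sketch under stated assumptions rather than the tool's parser: `MetaTagScanner` and `meta_score` are hypothetical names, and `meta_og` is assumed to require both `og:title` and `og:description`.

```python
from html.parser import HTMLParser


class MetaTagScanner(HTMLParser):
    """Collect the head tags that the Meta Tags category looks at."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.has_description = False
        self.has_canonical = False
        self.og_properties = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            if (attr.get("name") or "").lower() == "description":
                self.has_description = True
            prop = (attr.get("property") or "").lower()
            if prop.startswith("og:"):
                self.og_properties.add(prop)
        elif tag == "link":
            # rel may hold several space-separated tokens.
            if "canonical" in (attr.get("rel") or "").lower().split():
                self.has_canonical = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def meta_score(html: str) -> int:
    """Score the Meta Tags category (max 14 pts) for one HTML document."""
    scanner = MetaTagScanner()
    scanner.feed(html)
    points = 0
    if scanner.title.strip():
        points += 5  # meta_title
    if scanner.has_description:
        points += 2  # meta_description
    if scanner.has_canonical:
        points += 3  # meta_canonical
    if {"og:title", "og:description"} <= scanner.og_properties:
        points += 4  # meta_og (assumption: both tags are required)
    return points
```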

5. Content Quality — max 12 pts

Signal	Points	Condition
`content_h1`	2	Page has at least one `<h1>` heading
`content_numbers`	1	Page contains statistics (numbers, percentages)
`content_links`	1	Page contains external citation links
`content_word_count`	2	Page has ≥ 300 words of substantive content
`content_heading_hierarchy`	2	Has H2 + H3 headings in correct hierarchy
`content_lists_or_tables`	2	Contains `<ul>`, `<ol>`, or `<table>` elements
`content_front_loading`	2	Key information appears in the first 30% of the content

Note: The declared category maximum was 14 pts in v3.14, but the weights have always summed to 12 pts. The rubric now reflects the actual values from `config.py`.

6. Signals — max 6 pts

Signal	Points	Condition
`signals_lang`	3	`<html lang="...">` attribute is set
`signals_rss`	2	RSS or Atom feed is discoverable
`signals_freshness`	1	`dateModified` in schema or `Last-Modified` HTTP header present

Reduced from the 8 pts of v3.14 to 6 pts in v3.18.2: `signals_rss` went from 3 to 2 and `signals_freshness` from 2 to 1, reflecting their relatively lower impact on AI citability.

7. AI Discovery — max 6 pts

Based on the geo-checklist.dev emerging standard.

Signal	Points	Condition
`ai_discovery_well_known`	2	`/.well-known/ai.txt` is present
`ai_discovery_summary`	2	`/ai/summary.json` is present and valid
`ai_discovery_faq`	1	`/ai/faq.json` is present
`ai_discovery_service`	1	`/ai/service.json` is present
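
As a sketch, the category can be scored from the bodies of whichever endpoints responded successfully. The name `ai_discovery_score` is hypothetical, and reading "valid" for `/ai/summary.json` as "parses as JSON" is an assumption; the rubric does not define validity further.

```python
import json
from typing import Dict

# Endpoint paths and points copied from the AI Discovery table.
AI_DISCOVERY_POINTS = {
    "/.well-known/ai.txt": 2,
    "/ai/summary.json": 2,
    "/ai/faq.json": 1,
    "/ai/service.json": 1,
}


def _is_json(body: str) -> bool:
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def ai_discovery_score(bodies: Dict[str, str]) -> int:
    """Score AI Discovery (max 6 pts).

    `bodies` maps each endpoint path that was found to its response body.
    """
    score = 0
    for path, points in AI_DISCOVERY_POINTS.items():
        if path not in bodies:
            continue
        # Only summary.json is described as "present and valid" in the table.
        if path == "/ai/summary.json" and not _is_json(bodies[path]):
            continue
        score += points
    return score
```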

8. Brand & Entity Signals — max 10 pts (new in v3.18.2)

Rewards sites that establish a clear, machine-readable brand identity — a key factor in knowledge graph inclusion and AI attribution accuracy.

Signal	Points	Condition
`brand_entity_coherence`	3	Brand name is consistent across title, schema, and OG tags
`brand_kg_readiness`	3	`sameAs` links to authoritative KG domains (Wikipedia, Wikidata, LinkedIn, etc.)
`brand_about_contact`	2	`/about` and `/contact` (or equivalents) are discoverable
`brand_geo_identity`	1	Geographic identity signal present (LocalBusiness schema or address)
`brand_topic_authority`	1	Consistent topical focus across headings, schema, and meta tags

Authoritative sameAs domains include: wikipedia.org, wikidata.org, linkedin.com, crunchbase.com, github.com, twitter.com / x.com, facebook.com.
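
Matching `sameAs` URLs against this list could look like the sketch below. The helper names are hypothetical, and awarding the 3 pts for a single matching link is an assumption, since the rubric does not say how many links are required.

```python
from typing import Iterable, List
from urllib.parse import urlparse

# Domains copied from the list above.
AUTHORITATIVE_DOMAINS = (
    "wikipedia.org",
    "wikidata.org",
    "linkedin.com",
    "crunchbase.com",
    "github.com",
    "twitter.com",
    "x.com",
    "facebook.com",
)


def authoritative_same_as(urls: Iterable[str]) -> List[str]:
    """Return the sameAs URLs whose host is a listed domain or a subdomain of one."""
    matches = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Compare on a dot boundary so "notx.com" does not match "x.com".
        if any(host == domain or host.endswith("." + domain) for domain in AUTHORITATIVE_DOMAINS):
            matches.append(url)
    return matches


def brand_kg_readiness_points(urls: Iterable[str]) -> int:
    """Assumption: one authoritative sameAs link is enough for the 3 pts."""
    return 3 if authoritative_same_as(urls) else 0
```

Matching on the parsed hostname rather than on a substring avoids false positives such as `https://example.com/?ref=wikipedia.org`.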

Total Points Reference

Category	Max Points	Notes
robots.txt	18
llms.txt	18
Schema JSON-LD	16	22 declared; `schema_sameas` migrated (0 pts)
Meta Tags	14
Content Quality	12	14 declared in v3.14; actual sum was always 12
Signals	6	Reduced in v3.18.2 from 8 (v3.14)
AI Discovery	6
Brand & Entity Signals	10	New in v3.18.2
Total	100
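
The arithmetic behind this table can be checked directly. In the sketch below, `CATEGORY_MAX` mirrors the effective maxima above, while `total_score` is a hypothetical helper; clamping each category to its own maximum before summing is an assumption added for safety, not something the rubric states.

```python
from typing import Dict

# Effective category maxima copied from the Total Points Reference table.
CATEGORY_MAX = {
    "robots.txt": 18,
    "llms.txt": 18,
    "Schema JSON-LD": 16,
    "Meta Tags": 14,
    "Content Quality": 12,
    "Signals": 6,
    "AI Discovery": 6,
    "Brand & Entity Signals": 10,
}


def total_score(earned: Dict[str, int]) -> int:
    """Sum the points earned per category and cap the result at 100."""
    total = 0
    for category, max_points in CATEGORY_MAX.items():
        # Assumption: a category can never contribute more than its maximum.
        total += min(earned.get(category, 0), max_points)
    return min(total, 100)
```

Since the eight maxima sum to exactly 100 (18 + 18 + 16 + 14 + 12 + 6 + 6 + 10), the cap only matters if a category over-reports.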

WebMCP Readiness (v3.18.3, #233)

WebMCP Readiness measures how well a site exposes machine-readable context for MCP-compatible AI agents. This signal does not contribute to the GEO score but is included in the audit report and JSON output as a standalone indicator.

Level	Summary	Meaning
`none`	No MCP signals detected	Site has no machine-readable AI context endpoints
`basic`	Minimal signals present	`/.well-known/ai.txt` or `/ai/summary.json` found, but incomplete
`ready`	MCP-compatible	Core AI Discovery endpoints present and valid (`ai.txt` + `summary.json` + `faq.json`)
`advanced`	Full MCP + structured data	All AI Discovery endpoints present plus rich schema and llms.txt with depth

WebMCP Readiness is surfaced in the CLI output, HTML report, and JSON API. It helps site owners understand their exposure to next-generation AI agents that consume structured context (not just crawled content) before generating responses.
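
One possible reading of the four levels is sketched below. The function `webmcp_level` and its boolean inputs are hypothetical, and several details are assumptions: "rich schema" is mapped to the `schema_richness` signal, "llms.txt with depth" to `llms_depth`, and a site with only `faq.json` or `service.json` is treated as `none`.

```python
def webmcp_level(
    *,
    ai_txt: bool,
    summary_json: bool,
    faq_json: bool,
    service_json: bool,
    rich_schema: bool,
    llms_depth: bool,
) -> str:
    """Classify WebMCP Readiness as none, basic, ready, or advanced."""
    # The three endpoints listed for the "ready" level.
    core = ai_txt and summary_json and faq_json
    if core and service_json and rich_schema and llms_depth:
        return "advanced"
    if core:
        return "ready"
    if ai_txt or summary_json:
        return "basic"
    return "none"
```

The level is computed independently of the GEO score, so a change in readiness never changes the 0–100 total.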

Changelog

Version	Change
v3.18.3	WebMCP Readiness Check (#233): 4-level indicator (none/basic/ready/advanced), exposed in report but excluded from GEO score
v3.18.2	Brand & Entity Signals category added (10 pts, 5 checks); `schema_sameas` migrated to `brand_kg_readiness` (Schema effective max 22→16); Content max corrected 14→12; Signals max reduced 8→6
v3.18.0	Rich formatter v2 (ASCII art, stacked dashboard), centralized URL validation across 4 endpoints
v3.17.x	Mass bugfix series: citability score accuracy, formatter max scores, security hardening (SSRF, XSS, rate limiting), `@graph` JSON-LD parser (Yoast/RankMath), CI fixes
v3.14	7 categories (added Signals + AI Discovery), `schema_richness` + `schema_sameas`, graduated llms.txt scoring, content structure checks, bands adjusted
v3.0.0	5 categories, `schema_website` 10 pts, `meta_description` 8 pts
v1.5.0	Original weights