What this is

Automated tools catch part of WCAG and people handle the rest — that part is well known. This is an honest routing map of which part is which, for the three criteria a from-scratch automation effort tends to get stuck on (2.1.1, 2.1.2, 1.3.1): what a deterministic tool can own, what an AI model can usefully propose for a human to confirm, and what stays a human call. Every placement is tagged by how well today's evidence backs it. No grand claim — a map you can act on, with the receipts and the gaps marked.

The three criteria

The routing map

This map is the brief's synthesis from the cited sources (not a quotation); the tags rate how well current evidence backs each placement.
| Criterion / layer | A deterministic tool can own | AI proposes → human confirms | Human-led |
| --- | --- | --- | --- |
| 2.1.2 No Keyboard Trap | Deterministically automatable, but only with a focus-driving / AT-driver harness (tab through, confirm you can't get stuck), not the static axe-core / Equal Access engines, which catch only DOM-inferable traps. | | Unusual plug-in / embedded boundaries. |
| 2.1.1 Keyboard | Confirm an element is reachable and, by driving it at runtime, activatable; static engines catch only DOM-inferable problems. | Enumerate a custom widget's expected keys (per its role) for a harness or human to verify. | Whether all functionality has a keyboard path and honours the role-appropriate key contract; a generic script can't know the intended role. |
| Companions: 2.4.7 Focus Visible, 2.4.3 Focus Order, 4.1.2 Name/Role/Value | A focus style is declared (2.4.7); a name/role/value is present (4.1.2). | | Whether the focus indicator is actually visible (2.4.7); whether focus order is meaningful (2.4.3); whether the accessible name is the right name. |
| 1.3.1: markup present | A heading is marked up as a heading, a label is tied to its input, a cell has a header. | | |
| 1.3.1: relationship correct | Check that a <th> / headers/id association exists and is referentially valid, but not whether it scopes the right cells. | Propose the likely-correct association from context. | Confirm it. |
| 1.3.1: intent matches the visual meaning | Out of reach for a rule engine. | Propose structure from the rendered layout; unvalidated (see Research directions). | Decide. |

Tag legend: proven (tool docs / standards) · partial / case-by-case · promising but unvalidated · not feasible for a rule engine · not applicable
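As an illustration of the "1.3.1: relationship correct" row, the deterministic half of that check (does every headers reference resolve to a real header cell?) can be sketched with Python's standard-library HTML parser. This is a minimal sketch, not how axe-core or Equal Access implement it: the names `HeaderRefChecker` and `dangling_header_refs` are hypothetical, and it assumes the HTML rule that each `headers` token should match the `id` of a `<th>` in the same table.

```python
from html.parser import HTMLParser


class HeaderRefChecker(HTMLParser):
    """Collects <th> ids and headers references, grouped per table."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one dict per <table>: {"ids": set, "refs": list}
        self._open = []   # currently open tables, innermost last

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            table = {"ids": set(), "refs": []}
            self.tables.append(table)
            self._open.append(table)
        elif tag in ("td", "th") and self._open:
            table = self._open[-1]
            if tag == "th" and attrs.get("id"):
                table["ids"].add(attrs["id"])
            # headers is a space-separated list of ids; it may be absent
            table["refs"].extend((attrs.get("headers") or "").split())

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()


def dangling_header_refs(html):
    """Return headers tokens that match no <th id> in the same table."""
    checker = HeaderRefChecker()
    checker.feed(html)
    return [r for t in checker.tables for r in t["refs"] if r not in t["ids"]]


SAMPLE = """
<table>
  <tr><th id="q1">Q1</th><th id="q2">Q2</th></tr>
  <tr><td headers="q1">10</td><td headers="q3">12</td></tr>
</table>
"""
print(dangling_header_refs(SAMPLE))  # ['q3']
```

Note what the sketch cannot see: had the second cell pointed at `q1` instead of `q2`, every reference would resolve and the check would pass, even though the cell is scoped to the wrong header. That is the part the map routes to an AI proposal and a human confirmation.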

Reading the map

Three honest distinctions hold it together:

Where AI actually helps — and where it only looks like it does

An accepted assist — with the output still unvalidated

Once a deterministic tool localises an issue, an AI model can draft a candidate fix: alt text, a corrected label, a clearer error message. This is an accepted human-in-the-loop workflow, but a sound workflow does not guarantee a correct output: a human must confirm that each draft is both accurate and contextually appropriate before it ships. Alt text is the weakest case, because AI descriptions depend on context and are prone to being plausible but wrong, the same silent-failure mode flagged below. AI is the assistant on a localised finding, never the detector of record and never the final word.
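One way to make that gate concrete is to record the two human confirmations on each draft and release nothing that lacks either. The sketch below is illustrative only: `DraftFix`, its field names, and `shippable` are hypothetical, and the finding ids are invented sample data.

```python
from dataclasses import dataclass


@dataclass
class DraftFix:
    finding_id: str             # issue already localised by a deterministic tool
    proposed_text: str          # AI-drafted candidate (alt text, label, message)
    accurate: bool = False      # set only by a human reviewer
    fits_context: bool = False  # set only by a human reviewer


def shippable(drafts):
    """Keep only the drafts a human has confirmed on both counts."""
    return [d for d in drafts if d.accurate and d.fits_context]


drafts = [
    # Unreviewed draft: stays out, however plausible it reads.
    DraftFix("img-12", "Bar chart of quarterly revenue"),
    # Reviewed and confirmed on both counts: may ship.
    DraftFix("input-3", "Email address", accurate=True, fits_context=True),
    # Accurate but wrong for its context: stays out.
    DraftFix("img-40", "A person at a desk", accurate=True),
]
print([d.finding_id for d in shippable(drafts)])  # ['input-3']
```

Defaulting both flags to `False` means an unreviewed draft fails closed, which matches the rule that the model is never the final word.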

Promising but unproven: AI judging structure

The tempting move is a vision-language model proposing 1.3.1 structure from the rendered page. The capability to read UI structure exists (ScreenAI is “a vision-language model that specializes in UI and infographics understanding” [12]), but that is extraction, not adjudication, and no source here validates a model's accuracy at judging whether markup matches intended meaning. AI's semantic output also fails silently: a confident, wrong alt text slips past a glance. So treat this as a research direction to pilot with measurement, not a tool to trust: take a labelled set of pages where structure does and doesn't match the visual meaning, have the model propose, and measure agreement against expert raters. Until that exists, it stays tagged ⚠.
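The measurement step in that pilot could be scored with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa between the model's verdicts and one expert's labels. That choice of statistic is an assumption made here, not something the sources prescribe, and the four labels are made-up sample data.

```python
from collections import Counter


def cohen_kappa(model_labels, expert_labels):
    """Chance-corrected agreement between two raters over the same items."""
    if len(model_labels) != len(expert_labels) or not model_labels:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(model_labels)
    # Share of items on which the two raters gave the same label.
    observed = sum(m == e for m, e in zip(model_labels, expert_labels)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    model_freq = Counter(model_labels)
    expert_freq = Counter(expert_labels)
    expected = sum(model_freq[k] * expert_freq[k] for k in model_freq) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts on four pages: does the markup match the visual meaning?
model = ["match", "match", "mismatch", "mismatch"]
expert = ["match", "mismatch", "mismatch", "mismatch"]
print(cohen_kappa(model, expert))  # 0.5
```

Raw agreement here is 3 of 4, but half of that is expected by chance given how often each rater uses each label, so kappa is 0.5. With several expert raters, a multi-rater statistic or a comparison against the experts' majority label would be the analogous step.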

What the data does, and doesn't, say. The systematic review centres on text and structure (“most studies apply LLMs to text-centric and structurally explicit accessibility tasks, with WCAG serving as the primary reference framework and limited consideration of cognitive accessibility guidelines (COGA)” [6]), and its issue table records studies touching alt text most, then contrast and name/role/value, and, notably, keyboard (2.1.1) and heading structure (1.3.1) too [7]. Read that as research attention spanning these criteria (so this isn't a text-only frontier), not as evidence that AI succeeds at them: the studies' actual results weren't verified here, and efficacy is precisely the open gap. Treat the counts as an approximate rank order (the review is even internally inconsistent on its own total), and note that the same table logs hallucinated image descriptions in several studies: the silent-failure risk, in the data.

The tools, honestly bounded

You build on a mature stack, not a blank page — but each layer has edges worth stating:

Research directions & open gaps

Limitations & scope

Evidence register

Every quotation was re-checked, verbatim, against its captured source at build time. Each source is labelled by type; “preprint” means not yet peer-reviewed.

  1. “All functionality of the content is operable through a keyboard interface without requiring specific timings for individual keystrokes” WCAG 2.1, SC 2.1.1 Keyboard (Level A) [Normative (WCAG)]
  2. “then focus can be moved away from that component using only a keyboard interface” WCAG 2.1, SC 2.1.2 No Keyboard Trap (Level A) [Normative (WCAG)]
  3. “Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text” WCAG 2.1, SC 1.3.1 Info and Relationships (Level A) [Normative (WCAG)]
  4. “many accessibility problems can only be discovered through manual testing” Playwright, Accessibility testing (docs) [Tool documentation]
  5. “Absence of detected errors does not indicate that a page is accessible or conformant.” WebAIM Million 2026 [Empirical report]
  6. “most studies apply LLMs to text-centric and structurally explicit accessibility tasks, with WCAG serving as the primary reference framework and limited consideration of cognitive accessibility guidelines (COGA)” LLMs for Web Accessibility: a Systematic Literature Review (2026) [Peer-reviewed]
  7. “Keyboard navigation / tabindex issues” LLMs for Web Accessibility: Systematic Review, issue-frequency table [Peer-reviewed]
  8. “When using roving tabindex to manage focus in a composite UI component” ARIA Authoring Practices Guide, Developing a Keyboard Interface [W3C standards / methodology]
  9. “tools to automate accessibility checking from a browser or in a continuous development/build environment” IBM Equal Access Accessibility Checker [Tool documentation]
  10. “AT Driver defines a protocol for introspection and remote control of assistive technology software” W3C AT Driver (draft protocol) [W3C standards / methodology]
  11. “Screen reader driver for test automation” Guidepup, screen reader driver for test automation [Tool documentation]
  12. “a vision-language model that specializes in UI and infographics understanding” ScreenAI: A Vision-Language Model for UI and Infographics Understanding [Preprint (not peer-reviewed)]
  13. “most accessibility checks are not fully automatable, evaluation tools can significantly assist evaluators” W3C WCAG-EM 1.0 (Evaluation Methodology, Working-Group Note) [W3C standards / methodology]