What this is
Automated tools catch part of WCAG and people handle the rest — that part is well known. This is an honest routing map of which part is which, for the three criteria a from-scratch automation effort tends to get stuck on (2.1.1, 2.1.2, 1.3.1): what a deterministic tool can own, what an AI model can usefully propose for a human to confirm, and what stays a human call. Every placement is tagged by how well today's evidence backs it. No grand claim — a map you can act on, with the receipts and the gaps marked.
The three criteria
- 2.1.1 Keyboard — “All functionality of the content is operable through a keyboard interface without requiring specific timings for individual keystrokes”1.
- 2.1.2 No Keyboard Trap — once focus enters a component, “…then focus can be moved away from that component using only a keyboard interface…”2.
- 1.3.1 Info & Relationships — “Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text”3.
The routing map
| Criterion / layer | A deterministic tool can own | AI proposes → human confirms | Human-led |
|---|---|---|---|
| 2.1.2 No Keyboard Trap | Deterministically automatable — but only with a focus-driving / AT-driver harness (tab through, confirm you can't get stuck), not the static axe-core / Equal Access engines, which catch only DOM-inferable traps. | Unusual plug-in / embedded boundaries. | |
| 2.1.1 Keyboard | Confirm an element is reachable, and — by driving it at runtime — activatable; static engines catch only DOM-inferable problems. | Enumerate a custom widget's expected keys (per its role) for a harness or human to verify. | Whether all functionality has a keyboard path and honours the role-appropriate key contract — a generic script can't know the intended role. |
| Companions — 2.4.7 Focus Visible, 2.4.3 Focus Order, 4.1.2 Name/Role/Value | A focus style is declared (2.4.7); a name/role/value is present (4.1.2). | Whether the focus indicator is actually visible (2.4.7); whether focus order is meaningful (2.4.3); whether the accessible name is the right name. | |
| 1.3.1: markup present | A heading is marked up as a heading, a label is tied to its input, a cell has a header. | | |
| 1.3.1: relationship correct | Check that a `<th>` / `headers`/`id` association exists and is referentially valid, but not whether it scopes the right cells. | Propose the likely-correct association from context. | Confirm it. |
| 1.3.1: intent matches the visual meaning | Out of reach for a rule engine. | Propose structure from the rendered layout; unvalidated (see Research directions). | Decide. |
Legend: ✓ proven (tool docs / standards); ◐ partial / case-by-case; ⚠ promising but unvalidated; ✗ not feasible for a rule engine; – not applicable
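The deterministic half of the "relationship correct" row can be sketched with the standard library alone. This is a minimal illustration, not any engine's actual rule: the names `HeaderRefChecker` and `dangling_header_refs` are hypothetical, and it assumes a single table per document, so it does not check that a referenced `<th>` sits in the same table as the cell.

```python
from html.parser import HTMLParser


class HeaderRefChecker(HTMLParser):
    """Collects <th> ids and the ids referenced by each cell's `headers` attribute."""

    def __init__(self):
        super().__init__()
        self.th_ids = set()    # ids declared on <th> elements
        self.references = []   # ids referenced from a `headers` attribute

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "th" and attr.get("id"):
            self.th_ids.add(attr["id"])
        if tag in ("td", "th") and attr.get("headers"):
            # `headers` holds a space-separated list of ids.
            self.references.extend(attr["headers"].split())


def dangling_header_refs(html):
    """Return the referenced ids that do not resolve to any <th> (assumes one table)."""
    checker = HeaderRefChecker()
    checker.feed(html)
    return sorted(set(checker.references) - checker.th_ids)


SAMPLE = """
<table>
  <tr><th id="q1">Q1</th><th id="q2">Q2</th></tr>
  <tr><td headers="q1">10</td><td headers="q3">12</td></tr>
</table>
"""
print(dangling_header_refs(SAMPLE))  # ['q3']
```

A cell whose `headers` points at an existing but wrong `<th>` passes this check untouched, which is exactly the boundary the row draws: referential validity is machine-checkable, correct scoping is not.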
Reading the map
Three honest distinctions hold it together:
- 2.1.2 is not 2.1.1. Detecting a keyboard trap is close to deterministic — given a harness that actually drives focus; a static scan alone can't. But “all functionality works by keyboard” is harder: a script can confirm an element is reachable, yet it can't know a custom widget's intended interaction model — composite components manage focus with a roving tabindex (“When using roving tabindex to manage focus in a composite UI component…”8), and which arrow / Escape keys should do what differs by role (menu vs grid vs tree). Those patterns come from the ARIA Authoring Practices Guide, which is non-normative: diverging from them isn't automatically a 2.1.1 failure — 2.1.1 fails only when missing keys leave functionality keyboard-inoperable. Keyboard operability is also necessary, not sufficient: pair it with focus order (2.4.3), visible focus (2.4.7) and name/role/value (4.1.2).
- 1.3.1 is three layers, not two. A tool can verify a structural element is present; whether the relationship is correct (does this header scope the right cells?) is partly machine-checkable at best, and whether the markup matches the intended visual meaning is a judgment. Layers two and three are most of 1.3.1's real failures — so the genuinely-deterministic share of 1.3.1 is modest.
- “Own” doesn't mean “complete.” Even where a deterministic tool owns a check, it finds issues only where a rule fires — “Absence of detected errors does not indicate that a page is accessible or conformant.”5, and “many accessibility problems can only be discovered through manual testing”4.
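To make the first distinction concrete, here is a hedged sketch of the decision a focus-driving harness could apply once it has recorded which element holds focus after each Tab press. The function name, the trace format and the element ids are assumptions made for illustration; none of them comes from a cited tool.

```python
def suspected_trap(trace, tabbable):
    """Return the ids focus cycles through if that cycle skips part of the page, else None.

    trace    -- ids of the focused element after each Tab press, in order
    tabbable -- ids of every element expected in the tab sequence
    """
    first_seen = {}
    for index, element in enumerate(trace):
        if element in first_seen:
            # Focus has come back to an element: inspect the loop it just closed.
            cycle = trace[first_seen[element]:index]
            # A full lap over every tabbable element is normal wrap-around.
            if set(cycle) >= set(tabbable):
                return None
            return cycle
        first_seen[element] = index
    return None  # no repeat observed: the trace is too short to decide


page_order = ["search", "nav", "dialog-ok", "dialog-cancel"]
print(suspected_trap(
    ["search", "nav", "dialog-ok", "dialog-cancel", "dialog-ok"], page_order
))  # ['dialog-ok', 'dialog-cancel']
```

The result is a suspicion, not a verdict. A dialog that holds Tab focus but releases it on Escape would be flagged here and may still satisfy 2.1.2, so a flagged cycle is a finding for a human, or a second scripted key, to confirm.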
Where AI actually helps — and where it only looks like it does
An accepted assist — with the output still unvalidated
Once a deterministic tool localises an issue, an AI model can draft a candidate fix: alt-text, a corrected label, a clearer error message. This is an accepted human-in-the-loop workflow, but a sound workflow is not the same as a correct output: a human must confirm each draft is both accurate and contextually appropriate before it ships. Alt-text is the weakest case, because AI descriptions depend on context the model may not have and are often plausible but wrong, the same silent-failure mode flagged below. AI is the assistant on a localised finding, never the detector of record and never the final word.
Promising but unproven: AI judging structure
The tempting move is a vision-language model proposing 1.3.1 structure from the rendered page. The capability to read UI structure exists — ScreenAI is “a vision-language model that specializes in UI and infographics understanding”12 — but that is extraction, not adjudication, and no source here validates a model's accuracy at judging whether markup matches intended meaning. AI's semantic output also fails silently (a confident, wrong alt text slips past a glance). So treat this as a research direction to pilot with measurement, not a tool to trust: take a labelled set of pages where structure does and doesn't match the visual meaning, have the model propose, and measure agreement against expert raters. Until that exists, it stays tagged ⚠.
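The pilot described above needs an agreement statistic. The source does not name one; Cohen's kappa is assumed here as one common choice for two raters, and the label values are hypothetical. A minimal sketch:

```python
import math


def cohens_kappa(model_labels, expert_labels):
    """Chance-corrected agreement between two raters over the same items."""
    if len(model_labels) != len(expert_labels) or not model_labels:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(model_labels)
    # Observed agreement: share of pages where both raters gave the same label.
    observed = sum(m == e for m, e in zip(model_labels, expert_labels)) / n
    # Expected agreement if each rater labelled at random with their own label frequencies.
    categories = set(model_labels) | set(expert_labels)
    expected = sum(
        (model_labels.count(c) / n) * (expert_labels.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return math.nan  # both raters used one identical label throughout: kappa is undefined
    return (observed - expected) / (1 - expected)


model = ["match", "mismatch", "match", "mismatch"]
expert = ["match", "match", "mismatch", "mismatch"]
print(cohens_kappa(model, expert))  # 0.0
```

In the example the two raters agree on half the pages, which is what chance alone would produce with these label frequencies, so kappa is zero. Raw percent agreement would have reported 50% and hidden that.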
What the data does — and doesn't — say. The systematic review centres on text and structure (“most studies apply LLMs to text-centric and structurally explicit accessibility tasks, with WCAG serving as the primary reference framework and limited consideration of cognitive accessibility guidelines (COGA)”6), and its issue table records studies touching alt-text most, then contrast and name/role/value, and — notably — keyboard (2.1.1) and heading structure (1.3.1) too.7 Read that as research attention spanning these criteria (so this isn't a text-only frontier) — not as evidence AI succeeds at them: the studies' actual results weren't verified here, and efficacy is precisely the open gap. Treat the counts as approximate rank-order (the review is even internally inconsistent on its own total), and note the same table logs hallucinated image descriptions in several studies — the silent-failure risk, in the data.
The tools, honestly bounded
You build on a mature stack, not a blank page — but each layer has edges worth stating:
- Deterministic engines: axe-core and its test-runner integrations, IBM Equal Access (“tools to automate accessibility checking from a browser or in a continuous development/build environment”9), and Playwright to drive the keyboard. Excellent where a rule fires; not a completeness guarantee (see “Own” doesn't mean “complete” above).
- Driving the screen reader: Guidepup (“Screen reader driver for test automation”11) and the W3C AT Driver specification (“AT Driver defines a protocol for introspection and remote control of assistive technology software”10). This layer captures what a screen reader announces, which is genuinely on-point for the behavioural criteria, but judging whether the announcement is adequate is itself a human call, and the layer is emerging, not turnkey (AT Driver is a draft; Guidepup is platform-bound).
- The human-led floor: W3C's evaluation methodology (WCAG-EM, a Working-Group Note) observes that while “most accessibility checks are not fully automatable, evaluation tools can significantly assist evaluators”13. In short, evaluation is human-led and tool-assisted.
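As an illustration of the first layer, the sketch below records which element holds focus after each Tab press. It assumes a page object shaped like Playwright's synchronous Python API (`page.keyboard.press` and `page.evaluate`); `record_focus_trace` and `FakePage` are hypothetical names, and the fake page exists only so the example runs without a browser.

```python
def record_focus_trace(page, presses):
    """Press Tab `presses` times and record which element holds focus after each press."""
    # JavaScript run in the page: identify the focused element by id, else by tag name.
    describe_focus = (
        "() => { const el = document.activeElement;"
        " return el ? (el.id || el.tagName.toLowerCase()) : null; }"
    )
    trace = []
    for _ in range(presses):
        page.keyboard.press("Tab")
        trace.append(page.evaluate(describe_focus))
    return trace


class _FakeKeyboard:
    def __init__(self, page):
        self._page = page

    def press(self, key):
        if key == "Tab":
            # Advance focus one step, wrapping at the end of the tab order.
            self._page.position = (self._page.position + 1) % len(self._page.order)


class FakePage:
    """Stand-in for a browser page: focus simply walks a fixed tab order."""

    def __init__(self, order):
        self.order = order
        self.position = -1
        self.keyboard = _FakeKeyboard(self)

    def evaluate(self, expression):
        return self.order[self.position]


print(record_focus_trace(FakePage(["search", "nav", "main"]), 4))
# ['search', 'nav', 'main', 'search']
```

Against a real browser the same function would take a live page in place of `FakePage`. The trace says where focus went, not whether each stop was operable or sensibly ordered; those remain separate checks.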
Research directions & open gaps
- The 1.3.1-intent experiment above is the missing measurement: no published accuracy or inter-rater validation exists for an AI judging whether structure matches meaning.
- Screen-reader-driver automation is emerging; coverage across assistive-technology / browser combinations is uneven.
- Cognitive accessibility (COGA) is barely covered in the LLM literature.6
- No controlled study measures how much manual time any of this actually saves.
Limitations & scope
- Scoped to the WCAG 2.1 success criteria cited here; if your target is WCAG 2.2, re-check the specifics against that version.
- The routing tags are this brief's assessment from the cited sources, not a measurement.
- Several sources are preprints; tool capabilities are from the vendors' and standards' own docs (none benchmarked here); the study counts are approximate. State of the field as of June 2026.
Evidence register
Every quotation was re-checked, verbatim, against its captured source at build time. Each source is labelled by type; “preprint” means not yet peer-reviewed.
- 1. “All functionality of the content is operable through a keyboard interface without requiring specific timings for individual keystrokes”
- 2. “then focus can be moved away from that component using only a keyboard interface”
- 3. “Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text”
- 4. “many accessibility problems can only be discovered through manual testing”
- 5. “Absence of detected errors does not indicate that a page is accessible or conformant.”
- 6. “most studies apply LLMs to text-centric and structurally explicit accessibility tasks, with WCAG serving as the primary reference framework and limited consideration of cognitive accessibility guidelines (COGA)”
- 7. “Keyboard navigation / tabindex issues”
- 8. “When using roving tabindex to manage focus in a composite UI component”
- 9. “tools to automate accessibility checking from a browser or in a continuous development/build environment”
- 10. “AT Driver defines a protocol for introspection and remote control of assistive technology software”
- 11. “Screen reader driver for test automation”
- 12. “a vision-language model that specializes in UI and infographics understanding”
- 13. “most accessibility checks are not fully automatable, evaluation tools can significantly assist evaluators”