Why it exists
Antivirus tools could only collaborate once they had a shared rule format, YARA. Detecting AI attacks has the same problem: everyone re-invents their own regexes for the same prompt injections and the same exposed servers. AIIS is the shared format, and its goal is stated directly:
“The goal is to establish an interoperable detection standard the same way YARA did for malware.”
AIIS signatures cover two surfaces:
- injection
- Matches prompt-injection artefacts embedded in public content, hidden text, HTML comments, script literals, meta tags, attributes. Detects the attack payload.
- exposure
- Matches evidence that a host publicly exposes an AI component, MCP servers, LLM gateways, vector DBs, self-hosted LLM UIs, known-vulnerable versions. Detects the attack surface.
Anatomy of a signature
A signature is a small YAML file: an id, a severity, the OpenA2A attack class and threat-matrix technique it maps to, where to look (surface_types), and the match pattern.
id: AIIS-HIDDEN-ROLE-INJECT-01
name: Role injection via fake system marker in hidden text
severity: high
attack_class: SOUL-INJECT
technique_ids: [T-2001] # ← maps to the Agent Threat Matrix
surface_types: [hidden_text]
match:
type: regex
pattern: '(?i)\[?\s*(SYSTEM|ADMIN|INST)\s*\]?.{0,200}(ignore|override|disregard|reveal)'
excluded_domains: # ← v0.2.1 false-positive control
- '(.+\.)?owasp\.org'
- '(.+\.)?mitre\.org'
- 'docs\.anthropic\.com'
cwe_ids: [CWE-74]
hma_check_ids: [AI-WILD-001] # ← maps to a HackMyAgent check
status: activeFour ways to match
- regex
- A regular expression with optional flags and proximity windows (e.g. a role marker within 200 chars of an override verb).
- substring
- Fast literal containment, ideal for fingerprinting a service's response shape.
- unicode_range
- Match characters from specific Unicode blocks, catches steganography hidden in invisible/tag characters.
- composite
- Combine conditions with all_of (AND) or any_of (OR) for precise, low-false-positive matches.
Fingerprinting exposed infrastructure
Exposure signatures recognize a service by the shape of its response. Here a composite match identifies an unauthenticated Ollama server:
id: AIIS-EXPOSURE-OLLAMA-TAGS-01
category: exposure
name: Exposed Ollama model listing
severity: medium
attack_class: EXPOSURE-SELFHOSTED-LLM
surface_types: [http_body]
match:
type: composite
all_of:
- { type: substring, contains: ["\"models\":"] }
- { type: substring, contains: ["\"digest\":\"sha256:"] }
- { type: substring, contains: ["\"parameter_size\":"] }
cwe_ids: [CWE-200]Cutting false positives
Security researchers and docs sites quote injection samples as examples, and naive rules flag them. AIIS v0.2.1 added excluded_domains: a host allowlist so a signature is skipped on authoritative sources (NIST, OWASP, MITRE, vendor docs).
Specific, not blanket
The allowlist names specific subdomains likeowasp.github.io, it deliberately does not exempt generic hosts like github.io or medium.com, which an attacker could register on. The refinement removed ~72% of one signature's false positives.How a scanner uses them
- 1scannerLoad the signature pack
Any tool that implements the open schema can consume the shared rules.
- 2scannerSelect by surface
For each artefact (a web page, an HTTP response), run only the signatures whose surface_types apply.
- 3scannerCheck excluded_domains
Skip a signature if the source host matches its allowlist, the first line of FP defense.
- 4scannerEvaluate the match
Run the regex / substring / unicode / composite condition.
- 5scannerEmit a finding
On a hit, report severity, attack_class, the mapped T-id and CWE, interoperable across tools.
Who runs them
Because the format is open (Apache 2.0), one signature pack feeds many tools: OpenA2A's HoneyMap crawler uses them as its first classifier tier; HackMyAgent maps them to its AI-WILD-* checks; and any third-party scanner can adopt the schema. A signature written once is detection everywhere.