99% on Human-Written Attacks

Your LLM has no immune system.

Little Canary detects prompt injection attacks before they reach your production model. Open source. Self-hosted. 250ms.

Star on GitHub

Apache 2.0 | pip install little-canary

terminal

How it works

Three steps. No frameworks, no cloud accounts, no API keys.

Install

One command. No containers, no cloud keys.

pip install little-canary

Detect

Behavioral probing, not pattern matching. The canary gets attacked so your LLM doesn't.

Input → Filter (1ms) → Canary (250ms) → Your LLM

Results

Block, flag, or pass. Your rules, your thresholds.

verdict.safe verdict.blocked_by verdict.advisory

app.py

1from little_canary import SecurityPipeline
2
3pipeline = SecurityPipeline(canary_model="qwen2.5:1.5b", mode="full")
4verdict = pipeline.check(user_input)
5
6if not verdict.safe:
7    return "Sorry, I couldn't process that request."
8
9# Prepend advisory to your existing system prompt
10system = verdict.advisory.to_system_prefix() + "\n" + your_system_prompt
11response = your_llm(system=system, messages=[{"role": "user", "content": user_input}])

Results, not promises

Validated against human-written attacks from UC Berkeley's TensorTrust dataset. Same 400 prompts. Numbers you can verify yourself.

TensorTrust Detection

on 400 human-written attacks

Token Savings

of attacks stopped before your production LLM — zero tokens spent

False Positives

on 200 diverse inputs incl. edge cases

Latency

~0ms

per check, local Ollama inference

TensorTrust — 400 Human-Written Attacks

Click to compare. Same dataset — see what the canary adds.

What happens to 400 attacksonly 4 bypass the full pipeline

60.2% blocked by canary pipeline (241/400)38.8% refused by Opus (155/400)1.0% bypassed (4/400)

241

attacks killed in 1ms — before any LLM call

99%

combined detection with Opus 4.6

60%

fewer tokens spent — attacks blocked before LLM

Cross-Model Comparison

Same 400 attacks. Same canary. Different production models — the weaker the model, the bigger the lift.

Opus 4.6API

+5.8pp

93.2%

alone

99%

+ canary

Qwen34B

+5.0pp

91.8%

alone

96.8%

+ canary

Llama 3.23B

+8.0pp

86.8%

alone

94.8%

+ canary

Llama 3.18B

+8.3pp

86.2%

alone

94.5%

+ canary

Dolphin38B

+8.7pp

85.5%

alone

94.2%

+ canary

Mistral7B

+9.9pp

83.6%

alone

93.5%

+ canary

Gemma312B

+11.7pp

80.5%

alone

92.2%

+ canary

A 4B model + Canary matches Opus 4.6 alone

Qwen3 (4B) with the canary pipeline reaches 96.8% — within 0.7pp of what Opus achieves without it (97.5%), at a fraction of the cost. The structural filter blocks 241 attacks consistently, regardless of which model sits behind it.

400 human-written attacks from UC Berkeley TensorTrust. See methodology

Security tools shouldn't be black boxes.

Every layer of Little Canary is open source under Apache 2.0. You can read the detection logic, audit the behavioral analysis, and run the benchmark suite yourself.

We believe the security layer protecting your LLM should be as transparent as the models it guards. No obfuscated classifiers. No API-only access. No trust-us-it-works.

Apache 2.0

Deployment modes

Run it your way. No vendor lock-in.

Self-Hosted

Run entirely on your infrastructure with Ollama. No data leaves your network. Pull a 1.5B model and go.

Design Partner Program

Pro

Open source now. If your team wants to co-build deployment patterns, become a design partner or request integration support.

Become a Design Partner

Little Canary is fully open source. If your team wants guided rollout, hardening, or policy/use-case tuning, request integration support.

Tell us your use case. We'll reply with design partner or integration options. No spam.