99% on Human-Written Attacks

Your LLM has no immune system.

Little Canary detects prompt injection attacks before they reach your production model. Open source. Self-hosted. 250ms.

Star on GitHub

Apache 2.0 | pip install little-canary

How it works

Three steps. No frameworks, no cloud accounts, no API keys.

01

Install

One command. No containers, no cloud keys.

pip install little-canary
02

Detect

Behavioral probing, not pattern matching. The canary gets attacked so your LLM doesn't.

Input → Filter (1ms) → Canary (250ms) → Your LLM
03

Results

Block, flag, or pass. Your rules, your thresholds.

verdict.safe verdict.blocked_by verdict.advisory
app.py
1from little_canary import SecurityPipeline
2
3pipeline = SecurityPipeline(canary_model="qwen2.5:1.5b", mode="full")
4verdict = pipeline.check(user_input)
5
6if not verdict.safe:
7 return "Sorry, I couldn't process that request."
8
9# Prepend advisory to your existing system prompt
10system = verdict.advisory.to_system_prefix() + "\n" + your_system_prompt
11response = your_llm(system=system, messages=[{"role": "user", "content": user_input}])

Results, not promises

Validated against human-written attacks from UC Berkeley's TensorTrust dataset. Same 400 prompts. Numbers you can verify yourself.

TensorTrust Detection

0%

on 400 human-written attacks

Token Savings

0%

of attacks stopped before your production LLM — zero tokens spent

False Positives

0%

on 200 diverse inputs incl. edge cases

Latency

~0ms

per check, local Ollama inference

TensorTrust — 400 Human-Written Attacks

Click to compare. Same dataset — see what the canary adds.

What happens to 400 attacksonly 4 bypass the full pipeline
60.2% blocked by canary pipeline (241/400)38.8% refused by Opus (155/400)1.0% bypassed (4/400)

241

attacks killed in 1ms — before any LLM call

99%

combined detection with Opus 4.6

60%

fewer tokens spent — attacks blocked before LLM

Cross-Model Comparison

Same 400 attacks. Same canary. Different production models — the weaker the model, the bigger the lift.

Opus 4.6API
+5.8pp
93.2%
alone
99%
+ canary
Qwen34B
+5.0pp
91.8%
alone
96.8%
+ canary
Llama 3.23B
+8.0pp
86.8%
alone
94.8%
+ canary
Llama 3.18B
+8.3pp
86.2%
alone
94.5%
+ canary
Dolphin38B
+8.7pp
85.5%
alone
94.2%
+ canary
Mistral7B
+9.9pp
83.6%
alone
93.5%
+ canary
Gemma312B
+11.7pp
80.5%
alone
92.2%
+ canary

A 4B model + Canary matches Opus 4.6 alone

Qwen3 (4B) with the canary pipeline reaches 96.8% — within 0.7pp of what Opus achieves without it (97.5%), at a fraction of the cost. The structural filter blocks 241 attacks consistently, regardless of which model sits behind it.

400 human-written attacks from UC Berkeley TensorTrust. See methodology

Security tools shouldn't be black boxes.

Every layer of Little Canary is open source under Apache 2.0. You can read the detection logic, audit the behavioral analysis, and run the benchmark suite yourself.

We believe the security layer protecting your LLM should be as transparent as the models it guards. No obfuscated classifiers. No API-only access. No trust-us-it-works.

Apache 2.0

Deployment modes

Run it your way. No vendor lock-in.

Self-Hosted

Run entirely on your infrastructure with Ollama. No data leaves your network. Pull a 1.5B model and go.

Design Partner Program

Pro

Open source now. If your team wants to co-build deployment patterns, become a design partner or request integration support.

Become a Design Partner

Little Canary is fully open source. If your team wants guided rollout, hardening, or policy/use-case tuning, request integration support.

Tell us your use case. We'll reply with design partner or integration options. No spam.