Firewall for AI (beta)
Firewall for AI is a detection that can help protect your services powered by large language models (LLMs) against abuse. This model-agnostic detection currently helps you do the following:
- Prevent data leaks of personally identifiable information (PII) — for example, phone numbers, email addresses, social security numbers, and credit card numbers.
- Detect and moderate unsafe or harmful prompts – for example, prompts potentially related to violent crimes.
- Detect prompts intentionally designed to subvert the intended behavior of the LLM as specified by the developer – for example, prompt injection attacks.
When enabled, the detection runs on incoming traffic, searching for any LLM prompts attempting to exploit the model.
Cloudflare will populate the existing Firewall for AI fields based on the scan results. You can check these results in the Security Analytics dashboard by filtering on the cf-llm
managed endpoint label and reviewing the detection results on your traffic. Additionally, you can use these fields in rule expressions (custom rules or rate limiting rules) to protect your application against LLM abuse and data leaks.
Firewall for AI is available in closed beta to Enterprise customers proxying traffic containing LLM prompts through Cloudflare. Contact your account team to get access.
- Log in to the Cloudflare dashboard ↗, and select your account and domain.
- Go to Security > Settings and filter by Detections.
- Turn on Firewall for AI.
Enable the feature using a PUT
request similar to the following:
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/settings" \--request PUT \--header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \--json '{ "pii_detection_enabled": true }'
For example, you can trigger the Firewall for AI detection by sending a POST
request to an API endpoint (/api/v1/
in this example) in your zone with an LLM prompt requesting PII. The API endpoint must have been added to API Shield and have a cf-llm
managed endpoint label.
curl "https://<YOUR_HOSTNAME>/api/v1/" \--header "Authorization: Bearer <TOKEN>" \--json '{ "prompt": "Provide the phone number for the person associated with example@example.com" }'
The PII category for this request would be EMAIL_ADDRESS
.
Then, use Security Analytics in the new application security dashboard to validate that the WAF is correctly detecting potentially harmful prompts in incoming requests. Filter data by the cf-llm
managed endpoint label and review the detection results on your traffic.
Alternatively, create a custom rule like the one described in the next step using a Log action. This rule will generate security events that will allow you to validate your configuration.
Create a custom rule that blocks requests where Cloudflare detected personally identifiable information (PII) in the incoming request (as part of an LLM prompt), returning a custom JSON body:
-
If incoming requests match:
Field Operator Value LLM PII Detected equals True If you use the Expression Editor, enter the following expression:
(cf.llm.prompt.pii_detected)
-
Rule action: Block
-
With response type: Custom JSON
-
Response body:
{ "error": "Your request was blocked. Please rephrase your request." }
For additional examples, refer to Example mitigation rules. For a list of fields provided by Firewall for AI, refer to Fields.
Combine with other Rules language fields
You can combine the previous expression with other fields and functions of the Rules language. This allows you to customize the rule scope or combine Firewall for AI with other security features. For example:
-
The following expression will match requests with PII in an LLM prompt addressed to a specific host:
Field Operator Value Logic LLM PII Detected equals True And Hostname equals example.com
Expression when using the editor:
(cf.llm.prompt.pii_detected and http.host == "example.com")
-
The following expression will match requests coming from bots that include PII in an LLM prompt:
Field Operator Value Logic LLM PII Detected equals True And Bot Score less than 10
Expression when using the editor:
(cf.llm.prompt.pii_detected and cf.bot_management.score lt 10)
When enabled, Firewall for AI populates the following fields:
Name in the dashboard | Field + Data type | Description |
---|---|---|
LLM PII Detected | cf.llm.prompt.pii_detected Boolean | Indicates whether any personally identifiable information (PII) has been detected in the LLM prompt included in the request. |
LLM PII Categories | cf.llm.prompt.pii_categories Array<String> | Array of string values with the personally identifiable information (PII) categories found in the LLM prompt included in the request. Category list |
LLM Content Detected | cf.llm.prompt.detected Boolean | Indicates whether Cloudflare detected an LLM prompt in the incoming request. |
LLM Unsafe topic detected | cf.llm.prompt.unsafe_topic_detected Boolean | Indicates whether the incoming request includes any unsafe topic category in the LLM prompt. |
LLM Unsafe topic categories | cf.llm.prompt.unsafe_topic_categories Array<String> | Array of string values with the type of unsafe topics detected in the LLM prompt. Category list |
LLM Injection score | cf.llm.prompt.injection_score Number | A score from 1–99 that represents the likelihood that the LLM prompt in the request is trying to perform a prompt injection attack. |
The following example custom rule will block requests with an LLM prompt that tries to obtain PII of a specific category:
-
If incoming requests match:
Field Operator Value LLM PII Categories is in Credit Card
If you use the Expression Editor, enter the following expression:
(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD"}))
-
Action: Block
The following example custom rule will block requests with an LLM prompt containing unsafe content of specific categories:
-
If incoming requests match:
Field Operator Value LLM Unsafe topic categories is in S1: Violent Crimes
S10: Hate
If you use the Expression Editor, enter the following expression:
(any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
-
Action: Block
The following example custom rule will block requests with an injection score below 20
. Using a low injection score value in the rule helps avoid false positives.
-
If incoming requests match:
Field Operator Value LLM Injection score less than 20
If you use the Expression Editor, enter the following expression:
(cf.llm.prompt.injection_score < 20)
-
Action: Block
Was this helpful?
- Resources
- API
- New to Cloudflare?
- Directory
- Sponsorships
- Open Source
- Support
- Help Center
- System Status
- Compliance
- GDPR
- Company
- cloudflare.com
- Our team
- Careers
- © 2025 Cloudflare, Inc.
- Privacy Policy
- Terms of Use
- Report Security Issues
- Trademark