Tutorial: Detecting Real-time Phishing with Local AI (Ollama + Llama 3) in 2026

Phishing remains the number one attack vector in 2026. However, attacks have become far more sophisticated, using generative AI to create hyper-personalized emails and landing pages that deceive even the most advanced filters. To combat this, we need a defense that is as fast and intelligent as the attack: Local AI.

In this tutorial, we will build a real-time phishing detection system using Ollama and the Llama 3 model (or a newer release) running locally. The text being analyzed is never sent to a third-party API, which keeps your users' sensitive data inside your network and removes the network latency of external API calls.

Why Use Local AI for Security in 2026?

Total Privacy: URLs and email content are never sent to the cloud.
No Usage Fees: No per-token fees or third-party API subscriptions; the only cost is your own hardware.
Speed: Low-latency processing on local hardware, with no round trip to an external API (a modern GPU helps considerably).
Resilience: Works even if the internet is unstable or cloud services go down.

Prerequisites

To follow this tutorial, you will need:

Python 3.10+ installed.
Ollama (download at ollama.com).
At least 8GB of VRAM (recommended for Llama 3 8B) or 16GB of RAM.
Llama 3 model downloaded (ollama pull llama3).

Step 1: Setting Up the Environment and Dependencies

First, let's create a virtual environment and install the libraries we need to talk to Ollama from Python and to fetch and parse web pages.

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
pip install ollama requests beautifulsoup4

ollama is the official Python client for Ollama, requests fetches the page behind a suspicious link, and beautifulsoup4 extracts the text of that page for analysis.
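
Before moving on, it can help to confirm that the Ollama server is reachable and that the model is actually present. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and uses its /api/tags endpoint, which lists locally available models. The helper names model_is_listed and check_ollama are hypothetical and not part of any library.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default address


def model_is_listed(tags_payload, model_name):
    """Return True if model_name (with or without a tag) appears in the payload."""
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        # "llama3" should match "llama3:latest"; "mistral:7b" must match exactly.
        if name == model_name or name.split(":")[0] == model_name:
            return True
    return False


def check_ollama(model_name="llama3", url=OLLAMA_TAGS_URL):
    """Query the local server; return False if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection errors are OSError subclasses; bad JSON raises ValueError.
        return False
    return model_is_listed(payload, model_name)
```

Calling check_ollama() at startup lets the detector fail early with a clear message instead of failing on the first scan.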

Step 2: Creating the Analysis Engine (Prompt Engineering)

The effectiveness of our detector depends on a well-structured prompt. In 2026, language models are excellent at detecting patterns of psychological manipulation, which is the heart of phishing.

Create a file named detector.py and add the following code:

import ollama

def analyze_content(text):
    """Ask the local model to score a piece of text for phishing signals."""
    prompt = f"""
    You are a senior cybersecurity expert.
    Analyze the following text and determine the probability of it being a Phishing attack.
    Look for:
    1. Artificial urgency or threats.
    2. Subtle grammatical errors or strange domains.
    3. Requests for sensitive information (passwords, tokens).
    4. Links that do not match the context.

    Text for analysis: "{text}"

    Respond ONLY in JSON format:
    {{
        "phishing_score": (0 to 100),
        "reasons": ["reason 1", "reason 2"],
        "risk": "Low/Medium/High/Critical"
    }}
    """
    
    # format='json' asks Ollama to constrain the reply to valid JSON
    response = ollama.generate(model='llama3', prompt=prompt, format='json')
    return response['response']  # JSON text produced by the model

# Simple test (runs only when the file is executed directly, not when imported)
if __name__ == "__main__":
    test = "URGENT: Your bank account will be blocked in 2 hours. Click here to validate your data: http://secure-bank-login.net"
    print(analyze_content(test))
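
A model can still return malformed or out-of-range output even when asked for JSON only, so it is worth validating the reply before acting on it. The following is a minimal sketch, assuming the three fields requested in the prompt above. parse_verdict and ALLOWED_RISKS are hypothetical names introduced for illustration.

```python
import json

ALLOWED_RISKS = {"Low", "Medium", "High", "Critical"}


def parse_verdict(raw):
    """Parse the model's reply into a validated dict, or return None."""
    if isinstance(raw, dict):
        data = raw
    elif isinstance(raw, str):
        # Models sometimes wrap JSON in extra prose; keep the outermost braces.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            data = json.loads(raw[start:end + 1])
        except json.JSONDecodeError:
            return None
        if not isinstance(data, dict):
            return None
    else:
        return None

    score = data.get("phishing_score")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    if not 0 <= score <= 100:
        return None

    risk = str(data.get("risk", "")).capitalize()
    if risk not in ALLOWED_RISKS:
        return None

    reasons = data.get("reasons")
    if not isinstance(reasons, list):
        reasons = []
    return {"phishing_score": score, "risk": risk,
            "reasons": [str(r) for r in reasons]}
```

A caller would pass the return value of analyze_content to parse_verdict and treat None as "no usable verdict" rather than as a safe result.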

Step 3: Security Web Scraping (Optional, but Recommended)

Often, phishing is not in the email text, but in the content of the destination page. Let's add a function to "peek" at the URL content before the user clicks.

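import requests
from bs4 import BeautifulSoup

def extract_url_text(url):
    """Fetch a page and return up to 2000 characters of its visible text, or None on failure."""
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()  # Treat HTTP 4xx/5xx as a failed fetch
    except requests.RequestException as e:
        print(f"[!] Error accessing URL: {e}")
        return None  # Do not hand the error message to the model as page content
    soup = BeautifulSoup(response.text, 'html.parser')
    # Collapse whitespace so the 2000-character budget is spent on actual text
    return soup.get_text(separator=' ', strip=True)[:2000]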
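
Fetching attacker-supplied URLs from inside your network carries its own risk: a link could point at an internal service or a cloud metadata address. One possible guard, sketched here under the assumption that only public http(s) targets should ever be fetched, is to resolve the host first and refuse non-public addresses. resolve_host and is_safe_to_fetch are hypothetical helpers, not part of requests.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_host(host):
    """Return every IP address the host resolves to (uses system DNS)."""
    infos = socket.getaddrinfo(host, None)
    return {info[4][0] for info in infos}


def is_safe_to_fetch(url, resolver=resolve_host):
    """Allow only http(s) URLs whose host resolves to public addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = resolver(host)
    except OSError:
        return False  # DNS failure: nothing to fetch
    if not addresses:
        return False
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False
        # is_global is False for loopback, private and link-local ranges.
        if not ip.is_global:
            return False
    return True
```

This check and the later request perform separate DNS lookups, and the target can still redirect elsewhere, so treat it as one layer of defense. Running the fetcher on an isolated host remains the safer design.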

Step 4: Implementing Decision Logic

Now, we integrate everything into a single function that can be called from an email server webhook or a browser extension.

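def full_scan(url):
    """Fetch the page behind a URL, analyze it, and return the model's verdict."""
    print(f"[*] Analyzing link: {url}")
    content = extract_url_text(url)
    if not content:
        print("[!] Could not retrieve page content; skipping analysis")
        return None
    result = analyze_content(content)
    
    # Here you can send an alert to Slack or block the traffic
    print(f"[!] AI Result: {result}")
    return result

# Example usage (runs only when the file is executed directly, not when imported)
if __name__ == "__main__":
    full_scan("https://phishing-example.com")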
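
The comment about alerting or blocking can be made concrete with a small policy function. The sketch below maps a phishing score to one of three actions. The thresholds (50 and 80) and the names decide_action and decide_from_result are illustrative assumptions, not values from the tutorial, and should be tuned against your own traffic.

```python
def decide_action(score, warn_at=50, block_at=80):
    """Map a 0-100 phishing score to 'allow', 'warn' or 'block'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= block_at:
        return "block"
    if score >= warn_at:
        return "warn"
    return "allow"


def decide_from_result(result, **thresholds):
    """Treat a missing or malformed result as 'warn' rather than 'allow'."""
    if not isinstance(result, dict):
        return "warn"
    score = result.get("phishing_score")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return "warn"
    try:
        return decide_action(score, **thresholds)
    except ValueError:
        return "warn"
```

Defaulting to "warn" when the verdict is unusable is a deliberate choice in this sketch: a failed analysis should not silently look like a clean one.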

Production Implementation Checklist

Before running this system in your company, check:

[ ] Quantization: Use quantized GGUF (K-Quant) models to save memory at the cost of a small loss in accuracy; verify detection quality after quantizing.
[ ] Whitelist: Always maintain a list of trusted domains to reduce false positives.
[ ] Fine-tuning: If possible, fine-tune the model with real examples of attacks your company receives.
[ ] Logs: Keep logs of all detections for further threat analysis (Threat Intel).
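
The whitelist item is easy to get subtly wrong: a plain substring or prefix match would trust example.com.evil.net. A minimal sketch of a stricter check follows, assuming trusted entries are bare domain names. TRUSTED_DOMAINS holds placeholder entries and is_trusted is a hypothetical helper.

```python
from urllib.parse import urlsplit

TRUSTED_DOMAINS = {"example.com", "intranet.example.org"}  # placeholder entries


def is_trusted(url, trusted=TRUSTED_DOMAINS):
    """True if the URL's host equals a trusted domain or is a subdomain of one."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    host = host.lower().rstrip(".")
    # Require an exact match or a match on a full label boundary.
    return any(host == d or host.endswith("." + d) for d in trusted)
```

Checking is_trusted(url) before calling full_scan would let trusted links skip the model entirely, which also saves inference time.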

Advanced Tip: Integrating with Browser Extensions

To make this detector truly "real-time" for end-users, you can expose the Python logic through a small FastAPI service and call it from a Chrome or Firefox extension.

When a user hovers over a link, the extension sends the URL to your FastAPI server, which runs the same fetch-and-analyze steps as full_scan and returns the result. This provides a "safety rating" before the user even clicks.

Sample FastAPI Wrapper:

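from fastapi import FastAPI

# Assumes the functions from the previous steps live in detector.py
from detector import analyze_content, extract_url_text

app = FastAPI()

@app.get("/check-link")
def check(url: str):
    content = extract_url_text(url)
    if not content:
        return {"status": "error", "data": None}
    result = analyze_content(content)
    return {"status": "analyzed", "data": result}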

This architecture allows you to scale the protection across multiple machines in your network while keeping the "brain" (the LLM) centralized and secure.
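
Hover events can fire many times for the same link, and each scan costs a page fetch plus a model call. One way to keep a centralized server responsive is a small time-limited cache in front of the scan. This is a sketch under stated assumptions: VerdictCache is a hypothetical class, the 300-second lifetime is arbitrary, and the cache is not thread-safe as written, so a lock would be needed if requests are handled concurrently.

```python
import time


class VerdictCache:
    """Tiny in-memory cache with a time-to-live per entry."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # url -> (expiry_time, verdict)

    def get_or_compute(self, url, compute):
        """Return a cached verdict for url, or call compute(url) and store it."""
        now = self.clock()
        hit = self._entries.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]
        verdict = compute(url)
        self._entries[url] = (now + self.ttl, verdict)
        return verdict
```

Inside the endpoint, a call such as cache.get_or_compute(url, full_scan) would reuse a recent verdict instead of scanning again.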

Conclusion: The Future of Defense is Local

In 2026, relying solely on cloud solutions for security is a risk. The ability to process threat intelligence at the "edge" or locally is what differentiates resilient companies from vulnerable ones. This tutorial is the first step in creating an agentic and autonomous security infrastructure.

Want to take your company's security to the next level? At Landingfymax, we develop solutions that use the state-of-the-art in AI to protect your business and convert more leads with total security.


This article was written for the Fymax Sentinel blog.