About This Project

What is a Honeypot?

A honeypot is a security mechanism that presents a deliberately vulnerable-looking system to attract and monitor malicious activity. Think of it as a digital decoy: it appears to be a real web server with exploitable vulnerabilities, but it only collects data on attack patterns.

How This Works

This honeypot consists of:

Nginx Web Server: Serves fake WordPress, admin panels, and configuration files
Real-time Logging: Every HTTP request is logged with detailed metadata
Multi-Factor Classification: A Python parser analyzes both user-agent strings and requested paths to calculate threat scores
Threat Scoring: Each request receives a score based on multiple factors (negative = benign, zero or higher = suspicious or malicious)
SQLite Database: All data stored with threat level, score, and category
Live Dashboard: This dashboard queries the database every 30 seconds for near-real-time updates
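
As a rough illustration of the logging-to-database step, the sketch below parses one line in Nginx's default "combined" access-log format and stores it in SQLite. The regular expression, the table name requests, the column names, and the helper names parse_line and store are assumptions made for illustration; the project's actual parser and schema may differ.

```python
import re
import sqlite3

# Nginx "combined" format:
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A well-formed request looks like 'GET /path HTTP/1.1'; scanners often send junk,
    # so fall back to keeping the raw request string as the path.
    parts = fields["request"].split(" ")
    fields["method"] = parts[0] if len(parts) >= 2 else ""
    fields["path"] = parts[1] if len(parts) >= 2 else fields["request"]
    fields["status"] = int(fields["status"])
    return fields

def store(conn, fields):
    """Insert one parsed request into a hypothetical 'requests' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "ip TEXT, time TEXT, method TEXT, path TEXT, status INTEGER, user_agent TEXT)"
    )
    conn.execute(
        "INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)",
        (fields["ip"], fields["time"], fields["method"],
         fields["path"], fields["status"], fields["user_agent"]),
    )
    conn.commit()
```

Parameterized inserts matter here more than usual: every field comes straight from attacker-controlled input, so values should never be interpolated into the SQL string.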

Traffic Classification

The system categorizes all traffic into three threat levels using multi-factor analysis:

Benign Traffic (~25-30%)

Search Engines: Google, Bing, DuckDuckGo indexing the site
Security Research: Censys, Shodan, academic scanning projects
SEO Tools: Ahrefs, Semrush, legitimate analytics
Social Media Bots: Twitter, Reddit generating link previews

Reconnaissance (~60-65%)

Vulnerability Scanning: Looking for PHPUnit RCE, WordPress exploits
Credential Harvesting: Searching for .env files, .git directories
Directory Enumeration: Mapping site structure and common paths
Technology Fingerprinting: Identifying software versions

Malicious (~10-15%)

Remote Code Execution: Shell injection, command execution attempts
Botnet Recruitment: Mirai, Mozi malware installation attempts
Exploit Payloads: Known CVE exploitation (Log4Shell, etc.)
Web Shell Access: Looking for already-compromised sites
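
The category labels above could be assigned with a small ordered rule table. The sketch below is one guess at how such a table might look: the regular expressions, the "Uncategorized" fallback, and the categorize helper are illustrative assumptions and are not taken from the project's parser.

```python
import re

# Ordered (pattern, category) rules; the first match wins, so the most
# severe categories are listed first. Patterns are illustrative guesses.
PATH_RULES = [
    (re.compile(r"wget|curl|/bin/sh", re.I), "Remote Code Execution"),
    (re.compile(r"\$\{jndi:", re.I), "Exploit Payloads"),
    (re.compile(r"phpunit|wp-login|xmlrpc\.php", re.I), "Vulnerability Scanning"),
    (re.compile(r"/\.env|/\.git", re.I), "Credential Harvesting"),
]

def categorize(path):
    """Return the category of the first rule whose pattern appears in the path."""
    for pattern, category in PATH_RULES:
        if pattern.search(path):
            return category
    return "Uncategorized"  # assumed fallback label for paths no rule recognizes
```

Ordering the rules by severity means a request that both probes a known path and carries a shell command is labeled by its most dangerous feature.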

Technology Stack

Infrastructure

Linode VPS
Docker & Docker Compose
Ubuntu Linux

Backend

Nginx (Web Server)
Python 3.11
Flask (Dashboard)
SQLite (Database)

Frontend

HTML5 / CSS3
JavaScript
Chart.js

Classification Methodology

The system uses multi-factor threat assessment:

User-Agent Analysis: Identifies known bots (Google, security scanners, generic HTTP clients)
Path Pattern Matching: Regex patterns detect vulnerability scans, exploit attempts, credential harvesting
Threat Scoring: Combines both factors into a numerical score (range: -10 to 50+)
Threshold Classification: Score < 0 = Benign, 0-19 = Reconnaissance, 20+ = Malicious

Example: Googlebot requesting /robots.txt = -10 (Benign). Unknown client requesting /shell?wget+malware = 50+ (Malicious).
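
A minimal sketch of the scoring and threshold steps follows. Only the thresholds (below 0, 0 to 19, 20 and above) and the two example outcomes come from the description above; the individual weights, the patterns, and the function names threat_score and threat_level are assumptions chosen so that those examples work out.

```python
import re

# (pattern, weight) pairs. All weights are illustrative assumptions.
USER_AGENT_WEIGHTS = [
    (re.compile(r"googlebot|bingbot|duckduckbot", re.I), -10),  # search engines
    (re.compile(r"censys|shodan", re.I), -5),                   # research scanners
    (re.compile(r"curl|wget|python-requests|go-http-client", re.I), 5),  # generic clients
]
PATH_WEIGHTS = [
    (re.compile(r"wget|curl|/bin/sh", re.I), 50),               # command injection
    (re.compile(r"phpunit|/\.env|/\.git|wp-login", re.I), 10),  # probing
]

def threat_score(user_agent, path):
    """Add the weight of the first matching rule from each table."""
    score = 0
    for rules, text in ((USER_AGENT_WEIGHTS, user_agent), (PATH_WEIGHTS, path)):
        for pattern, weight in rules:
            if pattern.search(text):
                score += weight
                break
    return score

def threat_level(score):
    """Apply the documented thresholds: < 0, 0-19, 20+."""
    if score < 0:
        return "Benign"
    if score < 20:
        return "Reconnaissance"
    return "Malicious"
```

With these assumed weights, a Googlebot request for /robots.txt scores -10 (Benign), an unidentified client requesting /shell?wget+malware scores 50 (Malicious), and curl requesting /.env scores 15 (Reconnaissance).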

Key Findings

Several months of data collection have revealed some interesting patterns:

Not all traffic is malicious: ~25-30% is legitimate bots doing their job
Reconnaissance dominates: ~60-65% of traffic scans for vulnerabilities without attempting to exploit them
Active exploitation is rare: Only ~10-15% of traffic attempts actual exploitation
Automated scanners rule: The same vulnerabilities (PHPUnit, WordPress) are tested thousands of times by bots
Persistent attackers: Some IPs return repeatedly over days/weeks with identical patterns
Security researchers are noisy: Legitimate companies (Censys, Shodan, Palo Alto Networks) generate significant traffic
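
Findings like these can be derived with a couple of aggregate queries. The sketch below assumes a hypothetical requests table with ip, day (an ISO date string), and threat_level columns; those names and the helper functions are illustrative, not the project's actual schema.

```python
import sqlite3

def level_shares(conn):
    """Percentage of all requests that fall into each threat level."""
    total = conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0]
    if total == 0:
        return {}
    rows = conn.execute(
        "SELECT threat_level, COUNT(*) FROM requests GROUP BY threat_level"
    )
    return {level: round(100.0 * count / total, 1) for level, count in rows}

def returning_ips(conn, min_days=2):
    """IPs seen on at least `min_days` distinct days, most persistent first."""
    rows = conn.execute(
        "SELECT ip, COUNT(DISTINCT day) AS days FROM requests "
        "GROUP BY ip HAVING COUNT(DISTINCT day) >= ? "
        "ORDER BY days DESC, ip",
        (min_days,),
    )
    return rows.fetchall()
```

A dashboard endpoint could return the output of level_shares as JSON on each 30-second refresh, while returning_ips is better suited to an occasional report.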

Security & Ethics

This honeypot is completely passive: it only logs requests and returns fake responses. No actual vulnerabilities exist, and it cannot be used to attack other systems. All data is used for educational purposes and security research.

The honeypot serves no sensitive data and poses no risk to visitors. If you're seeing your IP in the logs, it most likely means your system (or a bot on your network) has been crawling or scanning this server.

Source Code

This project is open source and available on GitHub. Check out the code to learn how to build your own honeypot or improve this one.

View on GitHub

Contact

Questions about this project? Want to discuss security research?

Connect with me on LinkedIn