About This Project
What is a Honeypot?
A honeypot is a security mechanism: a deliberately vulnerable-looking system set up to attract and monitor
malicious activity. Think of it as a digital decoy: it appears to be a real web server with
exploitable vulnerabilities, but it actually just collects data on attack patterns.
How This Works
This honeypot consists of:
- Nginx Web Server: Serves fake WordPress, admin panels, and configuration files
- Real-time Logging: Every HTTP request is logged with detailed metadata
- Multi-Factor Classification: A Python parser analyzes both user-agent strings and requested paths to calculate threat scores
- Threat Scoring: Each request receives a score based on multiple factors (negative = benign, zero or positive = suspicious/malicious)
- SQLite Database: All data stored with threat level, score, and category
- Live Dashboard: This dashboard polls the database every 30 seconds for near-real-time updates
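The logging and storage steps in the list above can be sketched in Python. This is a minimal sketch, assuming the honeypot writes Nginx's default "combined" log format; the parse_line and store helpers, the requests table schema, and the classification values passed to store are all hypothetical and only illustrate the idea, not the project's actual code.

```python
import re
import sqlite3
from datetime import datetime

# Nginx's default "combined" format (assumed; the real log_format may differ):
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of request fields, or None if the line doesn't match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # "$request" is normally "METHOD PATH PROTOCOL"; malformed probes may not be.
    parts = fields["request"].split()
    fields["method"] = parts[0] if len(parts) >= 1 else ""
    fields["path"] = parts[1] if len(parts) >= 2 else ""
    fields["timestamp"] = datetime.strptime(
        fields["time"], "%d/%b/%Y:%H:%M:%S %z"
    ).isoformat()
    fields["status"] = int(fields["status"])
    return fields


def store(conn, fields, threat_level, score, category):
    """Insert one classified request into the hypothetical requests table."""
    conn.execute(
        "INSERT INTO requests (ts, ip, method, path, status, user_agent,"
        " threat_level, score, category) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (fields["timestamp"], fields["ip"], fields["method"], fields["path"],
         fields["status"], fields["user_agent"], threat_level, score, category),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (id INTEGER PRIMARY KEY, ts TEXT, ip TEXT,"
    " method TEXT, path TEXT, status INTEGER, user_agent TEXT,"
    " threat_level TEXT, score INTEGER, category TEXT)"
)

line = ('203.0.113.7 - - [10/Mar/2024:14:02:11 +0000] '
        '"GET /.env HTTP/1.1" 404 153 "-" "python-requests/2.31.0"')
fields = parse_line(line)
# Illustrative classification values; see the scoring sketch for how they might be derived.
store(conn, fields, "Reconnaissance", 15, "Credential Harvesting")
print(conn.execute("SELECT ip, path, threat_level FROM requests").fetchall())
# → [('203.0.113.7', '/.env', 'Reconnaissance')]
```

Parsing into a dict first keeps the classifier independent of the log format, so only COMBINED_RE would need to change if the Nginx log_format were extended.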
Traffic Classification
The system categorizes all traffic into three threat levels using multi-factor analysis:
Benign Traffic (~25-30%)
- Search Engines: Google, Bing, DuckDuckGo indexing the site
- Security Research: Censys, Shodan, academic scanning projects
- SEO Tools: Ahrefs, Semrush, legitimate analytics
- Social Media Bots: Twitter, Reddit generating link previews
Reconnaissance (~60-65%)
- Vulnerability Scanning: Looking for PHPUnit RCE, WordPress exploits
- Credential Harvesting: Searching for .env files, .git directories
- Directory Enumeration: Mapping site structure and common paths
- Technology Fingerprinting: Identifying software versions
Malicious (~10-15%)
- Remote Code Execution: Shell injection, command execution attempts
- Botnet Recruitment: Mirai, Mozi malware installation attempts
- Exploit Payloads: Known CVE exploitation (Log4Shell, etc.)
- Web Shell Access: Looking for already-compromised sites
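Assigning the sub-categories listed above can be sketched as a first-match-wins rule table. This is a sketch under stated assumptions: the categorize function and every regex below are hypothetical examples chosen to match the categories named here, not the honeypot's real rule set, and payloads are only checked in the request path (real Log4Shell probes often arrive in headers instead).

```python
import re

# Hypothetical path rules, checked in order; the first match wins.
# Botnet patterns come before generic command execution because
# Mirai/Mozi droppers also contain wget/curl.
PATH_RULES = [
    ("Botnet Recruitment", re.compile(r"mirai|mozi", re.I)),
    ("Remote Code Execution", re.compile(r"wget|curl|/bin/sh|;\s*(?:cd|chmod)", re.I)),
    ("Exploit Payloads", re.compile(r"\$\{jndi:", re.I)),
    ("Vulnerability Scanning", re.compile(r"phpunit|wp-login\.php|xmlrpc\.php", re.I)),
    ("Credential Harvesting", re.compile(r"/\.env\b|/\.git(?:/|$)", re.I)),
    ("Web Shell Access", re.compile(r"/(?:shell|cmd|c99|r57)\.php", re.I)),
]

# Hypothetical user-agent rules for the benign categories.
UA_RULES = [
    ("Search Engines", re.compile(r"Googlebot|bingbot|DuckDuckBot", re.I)),
    ("Security Research", re.compile(r"CensysInspect|Shodan", re.I)),
    ("SEO Tools", re.compile(r"AhrefsBot|SemrushBot", re.I)),
    ("Social Media Bots", re.compile(r"Twitterbot|redditbot", re.I)),
]


def categorize(path, user_agent):
    """Return the first matching sub-category for a request."""
    # Path rules run first so a spoofed "Googlebot" user agent
    # cannot hide an exploit attempt.
    for category, pattern in PATH_RULES:
        if pattern.search(path):
            return category
    for category, pattern in UA_RULES:
        if pattern.search(user_agent):
            return category
    # Anything else that probes the site is treated as mapping/enumeration.
    return "Directory Enumeration"


print(categorize("/robots.txt", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(categorize("/.env", "python-requests/2.31.0"))
```

Rule order is the main design choice: putting specific, high-severity patterns first means a request that matches several rules is labeled by its most serious intent.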
Technology Stack
Infrastructure
- Linode VPS
- Docker & Docker Compose
- Ubuntu Linux
Backend
- Nginx (Web Server)
- Python 3.11
- Flask (Dashboard)
- SQLite (Database)
Frontend
- HTML5 / CSS3
- JavaScript
- Chart.js
Classification Methodology
The system uses multi-factor threat assessment:
- User-Agent Analysis: Identifies known bots (Google, security scanners, generic HTTP clients)
- Path Pattern Matching: Regex patterns detect vulnerability scans, exploit attempts, credential harvesting
- Threat Scoring: Combines both factors into a numerical score (range: -10 to 50+)
- Threshold Classification: Score < 0 = Benign, 0-19 = Reconnaissance, 20+ = Malicious
For example, Googlebot requesting /robots.txt scores -10 (Benign), while an unknown client requesting /shell?wget+malware scores 50+ (Malicious).
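The scoring and threshold rules above can be expressed as a small function. This is a minimal sketch: threat_score, threat_level, and all of the weights and patterns are hypothetical values picked so that the two worked examples come out as described (-10 and 50); only the thresholds (below 0, 0-19, 20+) come from this page.

```python
import re

# Hypothetical user-agent weights: the first matching pattern applies.
UA_SCORES = [
    (re.compile(r"Googlebot|bingbot|DuckDuckBot", re.I), -10),  # known search engine
    (re.compile(r"CensysInspect|Shodan", re.I), -5),            # known research scanner
    (re.compile(r"python-requests|curl/|Go-http-client|zgrab", re.I), 5),  # generic client
]
UNKNOWN_UA_SCORE = 0  # no known-good or generic-client match

# Hypothetical path weights: every matching pattern adds its points.
PATH_SCORES = [
    (re.compile(r"wget|curl|chmod|/bin/(?:ba)?sh", re.I), 50),     # command execution
    (re.compile(r"\$\{jndi:", re.I), 40),                          # Log4Shell-style payload
    (re.compile(r"phpunit|wp-login\.php|xmlrpc\.php", re.I), 10),  # vulnerability scan
    (re.compile(r"/\.env\b|/\.git(?:/|$)", re.I), 10),             # credential harvesting
]


def threat_score(path, user_agent):
    """Combine a user-agent score and a path score into one number."""
    ua_score = UNKNOWN_UA_SCORE
    for pattern, points in UA_SCORES:
        if pattern.search(user_agent):
            ua_score = points
            break
    path_score = sum(points for pattern, points in PATH_SCORES if pattern.search(path))
    return ua_score + path_score


def threat_level(score):
    """Map a score onto the three levels: <0, 0-19, 20+."""
    if score < 0:
        return "Benign"
    if score < 20:
        return "Reconnaissance"
    return "Malicious"


for path, ua in [
    ("/robots.txt", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/.env", "python-requests/2.31.0"),
    ("/shell?wget+malware", ""),
]:
    score = threat_score(path, ua)
    print(path, score, threat_level(score))
# /robots.txt -10 Benign
# /.env 15 Reconnaissance
# /shell?wget+malware 50 Malicious
```

Because path points are added to the user-agent points, a spoofed Googlebot string (-10) still lands in Malicious when it carries a wget payload (+50).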
Key Findings
Several months of collected data have revealed some interesting patterns:
- Not all traffic is malicious: ~25-30% is legitimate bots doing their job
- Reconnaissance dominates: ~60-65% of traffic is scanning for vulnerabilities but not actively exploiting them
- Active exploitation is rare: Only ~10-15% of traffic attempts actual exploitation
- Automated scanners rule: Bots test the same vulnerabilities (PHPUnit, WordPress) thousands of times
- Persistent attackers: Some IPs return repeatedly over days/weeks with identical patterns
- Security researchers are noisy: Legitimate companies (Censys, Shodan, Palo Alto) generate significant traffic
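Findings like these come from simple aggregate queries over the stored requests. The sketch below assumes a hypothetical requests table with an ISO-8601 ts column, ip, and threat_level; the rows inserted are made-up placeholders, not data from this honeypot.

```python
import sqlite3

# Hypothetical schema; the sample rows are placeholders for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (ts TEXT, ip TEXT, threat_level TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("2024-03-01T10:00:00", "198.51.100.4", "Reconnaissance"),
        ("2024-03-02T11:30:00", "198.51.100.4", "Reconnaissance"),
        ("2024-03-09T09:15:00", "198.51.100.4", "Reconnaissance"),
        ("2024-03-01T12:00:00", "203.0.113.9", "Malicious"),
        ("2024-03-01T12:05:00", "192.0.2.1", "Benign"),
    ],
)

# Share of all requests per threat level, as a percentage.
LEVEL_SHARE = """
    SELECT threat_level,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM requests), 1) AS pct
    FROM requests
    GROUP BY threat_level
    ORDER BY pct DESC, threat_level
"""

# "Persistent attackers": IPs seen on at least N distinct calendar days.
# The first 10 characters of an ISO-8601 timestamp are the date.
REPEAT_IPS = """
    SELECT ip, COUNT(DISTINCT substr(ts, 1, 10)) AS days, COUNT(*) AS hits
    FROM requests
    GROUP BY ip
    HAVING COUNT(DISTINCT substr(ts, 1, 10)) >= ?
    ORDER BY days DESC
"""

print(conn.execute(LEVEL_SHARE).fetchall())
# → [('Reconnaissance', 60.0), ('Benign', 20.0), ('Malicious', 20.0)]
print(conn.execute(REPEAT_IPS, (2,)).fetchall())
# → [('198.51.100.4', 3, 3)]
```

Counting distinct days rather than raw hits separates an IP that returns over a week from one that sends a single burst of requests.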
Security & Ethics
This honeypot is completely passive: it only logs requests and returns fake responses.
No actual vulnerabilities exist, and it cannot be used to attack other systems.
All data is used for educational purposes and security research.
The honeypot serves no sensitive data and poses no risk to visitors. If you're seeing your
IP in the logs, it means your system (or a bot on your network) is scanning for vulnerabilities.
Source Code
This project is open source and available on GitHub. Check out the code to learn how
to build your own honeypot or improve this one.
Contact
Questions about this project? Want to discuss security research?
Connect with me on LinkedIn