About This Project

What is a Honeypot?

A honeypot is a security mechanism that presents a vulnerable-looking system to attract and monitor malicious activity. Think of it as a digital decoy: it appears to be a real web server with exploitable vulnerabilities, but it only collects data on attack patterns.

How This Works

This honeypot consists of:

  • Nginx Web Server: Serves fake WordPress, admin panels, and configuration files
  • Real-time Logging: Every HTTP request is logged with detailed metadata
  • Multi-Factor Classification: A Python parser analyzes both the user-agent string and the requested path to calculate a threat score
  • Threat Scoring: Each request receives a score based on multiple factors (negative = benign, zero or higher = reconnaissance or malicious)
  • SQLite Database: All data is stored with threat level, score, and category
  • Live Dashboard: This dashboard queries the database every 30 seconds for near-real-time updates
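The logging and storage steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: it assumes Nginx writes its default "combined" log format, and the `requests` table, its columns, and the function names are hypothetical.

```python
import re
import sqlite3

# Regex for Nginx's default "combined" log format. This is an assumption;
# the honeypot may use a custom log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of request fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def init_db(conn):
    """Create the hypothetical `requests` table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "id INTEGER PRIMARY KEY, ip TEXT, time TEXT, method TEXT, path TEXT, "
        "user_agent TEXT, threat_level TEXT, score INTEGER, category TEXT)"
    )


def store_request(conn, fields, threat_level, score, category):
    """Insert one classified request using a parameterized query."""
    conn.execute(
        "INSERT INTO requests (ip, time, method, path, user_agent, "
        "threat_level, score, category) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (fields["ip"], fields["time"], fields["method"], fields["path"],
         fields["user_agent"], threat_level, score, category),
    )
    conn.commit()
```

Parameterized queries matter here because every logged field is attacker-controlled input. Lines that do not match the pattern (for example, malformed requests) return None and would need separate handling.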

Traffic Classification

The system categorizes all traffic into three threat levels using multi-factor analysis:

Benign Traffic (~25-30%)

  • Search Engines: Google, Bing, DuckDuckGo indexing the site
  • Security Research: Censys, Shodan, academic scanning projects
  • SEO Tools: Ahrefs, Semrush, legitimate analytics
  • Social Media Bots: Twitter, Reddit generating link previews

Reconnaissance (~60-65%)

  • Vulnerability Scanning: Looking for PHPUnit RCE, WordPress exploits
  • Credential Harvesting: Searching for .env files, .git directories
  • Directory Enumeration: Mapping site structure and common paths
  • Technology Fingerprinting: Identifying software versions

Malicious (~10-15%)

  • Remote Code Execution: Shell injection, command execution attempts
  • Botnet Recruitment: Mirai, Mozi malware installation attempts
  • Exploit Payloads: Known CVE exploitation (Log4Shell, etc.)
  • Web Shell Access: Looking for already-compromised sites

Technology Stack

Infrastructure

  • Linode VPS
  • Docker & Docker Compose
  • Ubuntu Linux

Backend

  • Nginx (Web Server)
  • Python 3.11
  • Flask (Dashboard)
  • SQLite (Database)

Frontend

  • HTML5 / CSS3
  • JavaScript
  • Chart.js

Classification Methodology

The system uses multi-factor threat assessment:

  • User-Agent Analysis: Identifies known bots (Google, security scanners, generic HTTP clients)
  • Path Pattern Matching: Regex patterns detect vulnerability scans, exploit attempts, credential harvesting
  • Threat Scoring: Combines both factors into a numerical score (range: -10 to 50+)
  • Threshold Classification: Score < 0 = Benign, 0-19 = Reconnaissance, 20+ = Malicious

Example: Googlebot requesting /robots.txt = -10 (Benign). Unknown client requesting /shell?wget+malware = 50+ (Malicious).
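A scoring scheme of this kind might look like the sketch below. Only the thresholds (below 0, 0-19, 20+) and the two example scores come from the description above; the rule lists, individual weights, category names, and function names are assumptions made for illustration.

```python
import re

# Hypothetical user-agent rules: (pattern, score adjustment).
AGENT_RULES = [
    (re.compile(r"googlebot|bingbot|duckduckbot", re.I), -10),  # known crawlers
    (re.compile(r"curl|python-requests|go-http-client", re.I), 5),  # generic clients
]

# Hypothetical path rules: (pattern, score adjustment, category name).
PATH_RULES = [
    (re.compile(r"/shell\b|\b(wget|curl)\b", re.I), 50, "remote_code_execution"),
    (re.compile(r"phpunit|wp-login|xmlrpc\.php", re.I), 10, "vulnerability_scan"),
    (re.compile(r"\.env|\.git", re.I), 10, "credential_harvesting"),
]


def score_request(user_agent, path):
    """Combine user-agent and path factors into (score, category)."""
    score = 0
    # Apply the first matching user-agent rule, if any.
    for pattern, weight in AGENT_RULES:
        if pattern.search(user_agent):
            score += weight
            break
    # Apply the highest-weighted matching path rule, if any.
    best_weight, category = 0, None
    for pattern, weight, name in PATH_RULES:
        if pattern.search(path) and weight > best_weight:
            best_weight, category = weight, name
    return score + best_weight, category


def classify(score):
    """Map a score to a threat level using the stated thresholds."""
    if score < 0:
        return "Benign"
    if score < 20:
        return "Reconnaissance"
    return "Malicious"
```

With these assumed weights, a Googlebot user agent requesting /robots.txt scores -10 (Benign), and a client with an empty user agent requesting /shell?wget+malware scores 50 (Malicious), matching the example above.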

Key Findings

After several months of data collection, some clear patterns have emerged:

  • Not all traffic is malicious: ~25-30% is legitimate bots doing their job
  • Reconnaissance dominates: ~60-65% of traffic is scanning for vulnerabilities but not actively exploiting them
  • Active exploitation is rare: Only ~10-15% of traffic attempts actual exploitation
  • Automated scanners rule: The same vulnerabilities (PHPUnit, WordPress) are tested thousands of times by bots
  • Persistent attackers: Some IPs return repeatedly over days/weeks with identical patterns
  • Security researchers are noisy: Legitimate organizations (Censys, Shodan, Palo Alto Networks) generate significant traffic
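Figures like the traffic shares and repeat visitors above could be derived with simple aggregate queries. The sketch below assumes a hypothetical `requests` table with `ip` and `threat_level` columns; the real schema and the dashboard's actual queries may differ. It also counts repeat visitors by total requests only, ignoring how those requests are spread over days or weeks.

```python
import sqlite3


def threat_level_shares(conn):
    """Return {threat_level: percentage of all logged requests}."""
    rows = conn.execute(
        "SELECT threat_level, COUNT(*) FROM requests GROUP BY threat_level"
    ).fetchall()
    total = sum(count for _, count in rows)
    if total == 0:
        return {}
    return {level: 100.0 * count / total for level, count in rows}


def repeat_visitors(conn, min_requests=100):
    """Return (ip, request_count) pairs for IPs with at least min_requests hits."""
    return conn.execute(
        "SELECT ip, COUNT(*) FROM requests "
        "GROUP BY ip HAVING COUNT(*) >= ? ORDER BY COUNT(*) DESC",
        (min_requests,),
    ).fetchall()
```

Both functions take an open `sqlite3` connection, so a dashboard endpoint could call them on each refresh and pass the results to the charts.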

Security & Ethics

This honeypot is completely passive: it only logs requests and returns fake responses. No actual vulnerabilities exist, and it cannot be used to attack other systems. All data is used for educational purposes and security research.

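The honeypot serves no sensitive data and poses no risk to visitors. If you're seeing your IP in the logs, it means your system (or a bot on your network) sent a request to this server. If that request is classified as Reconnaissance or Malicious, something on your network is likely scanning for vulnerabilities.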

Source Code

This project is open source and available on GitHub. Check out the code to learn how to build your own honeypot or improve this one.

View on GitHub

Contact

Questions about this project? Want to discuss security research?

Connect with me on LinkedIn