A simple Python-based CLI tool that parses and analyzes different types of log files. The tool automatically detects the log format and extracts useful information such as log levels and error counts.
This project is designed as a lightweight log analysis pipeline with modular architecture.
- Auto-detection of log format
- Supports real-world log types
- Parses log entries into a normalized structure
- Counts log levels (INFO, WARNING, ERROR)
- CLI-based output for quick diagnostics
- Modular design for easy extension
Currently the parser can detect ad process:
Example:
2026-03-10 10:04:22 ERROR Database timeout
2026-03-10 10:04:23 WARNING Slow database queryExample:
Mar 10 12:11:03 ubuntu kernel: CPU error detectedExample:
127.0.0.1 - - [10/Mar/2026:12:11:05 +0000] "POST /login HTTP/1.1" 500 532HTTP status codes are interpreted as log levels:
- 2xx --> INFO
- 4xx --> WARNING
- 5xx --> ERROR
logparser/
│
├── main.py - Entry point to the program flow
├── parser.py - Detects log format, parses log lines, normalizing
├── patterns.py - RegEx patterns used to identify different log formats
├── analyzer.py - Processes parsed log entries and generates statistics
└── README.md - This readme you are reading right nowThe pipelie follows a simple flow:
Log File
↓
Format Detection
↓
Line Parsing
↓
Normalization
↓
Analysis
↓
CLI OutputThe parser attempts to match each line against known log patterns and extracts structured fields such as:
{
"timestamp": "...",
"level": "...",
"message": "...",
"source": "..."
}Run the parser from the command line and input the name of the logfile:
py main.pyFor Windows PowerShell:
py .\main.pyPotential features that could be added:
- Top error messages
- Error grouping / fingerprinting
- Time-based analysis
- Support for additional log formats
- Streaming large log files
- CLI arguments for filtering
This proejct was created as a learning exercise in:
- Log parsing
- Regex-based format detection
- Modular Python design
- Building CLI developer tools