diff --git a/CHANGELOG.md b/CHANGELOG.md
index d3bb431..9fa8c86 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,64 +1 @@
-# Changelog
-
-All notable user-visible changes should be recorded here.
-
-## Unreleased
-
-### Added
-
-- Added sanitized golden `report.md` / `report.json` regression fixtures to lock report contracts.
-- Expanded parser coverage for `Accepted publickey` and selected `pam_faillock` / `pam_sss` variants.
-- Added compact host-level summaries for multi-host reports.
-
-### Changed
-
-- None yet.
-
-### Fixed
-
-- None yet.
-
-### Docs
-
-- None yet.
-
-## v0.2.0
-
-### Added
-
-- Added dedicated sanitized parser fixture matrices for both `syslog_legacy` and `journalctl_short_full`, expanding `sshd` and `pam_unix` coverage.
-- Added deterministic unknown-line telemetry coverage for unsupported parser inputs and unknown-pattern buckets.
-
-### Changed
-
-- Moved sudo handling onto the signal layer so detectors consume one unified normalized input model.
-- Kept detector thresholds and the existing report schema stable while simplifying internal detector semantics.
-
-### Fixed
-
-- None.
-
-### Docs
-
-- Improved release-facing documentation in `README.md`, added `docs/release-process.md`, and formalized changelog discipline for future releases.
-
-## v0.1.0
-
-### Added
-
-- Parser support for `syslog_legacy` and `journalctl_short_full` authentication log input.
-- Rule-based detections for SSH brute force, multi-user probing, and sudo burst activity.
-- Parser coverage telemetry including parsed/unparsed counts and unknown-pattern buckets.
-- Repository automation and hardening with CI, CodeQL, pinned GitHub Actions, security policy, and Dependabot for workflow updates.
-
-### Changed
-
-- Established deterministic Markdown and JSON reporting for the MVP release.
-
-### Fixed
-
-- None.
-
-### Docs
-
-- Added CI, CodeQL, repository hardening guidance, and release-facing project documentation for the first public release.
+# Changelog
+
+All notable user-visible changes should be recorded here.
+
+## Unreleased
+
+### Added
+
+- Added sanitized golden `report.md` / `report.json` regression fixtures to lock report contracts.
+- Expanded parser coverage for `Accepted publickey` and selected `pam_faillock` / `pam_sss` variants.
+- Added compact host-level summaries for multi-host reports.
+- Added optional CSV export for findings and warnings when explicitly requested.
+
+### Changed
+
+- None yet.
+
+### Fixed
+
+- None yet.
+
+### Docs
+
+- None yet.
+
+## v0.2.0
+
+### Added
+
+- Added dedicated sanitized parser fixture matrices for both `syslog_legacy` and `journalctl_short_full`, expanding `sshd` and `pam_unix` coverage.
+- Added deterministic unknown-line telemetry coverage for unsupported parser inputs and unknown-pattern buckets.
+
+### Changed
+
+- Moved sudo handling onto the signal layer so detectors consume one unified normalized input model.
+- Kept detector thresholds and the existing report schema stable while simplifying internal detector semantics.
+
+### Fixed
+
+- None.
+
+### Docs
+
+- Improved release-facing documentation in `README.md`, added `docs/release-process.md`, and formalized changelog discipline for future releases.
+
+## v0.1.0
+
+### Added
+
+- Parser support for `syslog_legacy` and `journalctl_short_full` authentication log input.
+- Rule-based detections for SSH brute force, multi-user probing, and sudo burst activity.
+- Parser coverage telemetry including parsed/unparsed counts and unknown-pattern buckets.
+- Repository automation and hardening with CI, CodeQL, pinned GitHub Actions, security policy, and Dependabot for workflow updates.
+
+### Changed
+
+- Established deterministic Markdown and JSON reporting for the MVP release.
+
+### Fixed
+
+- None.
+
+### Docs
+
+- Added CI, CodeQL, repository hardening guidance, and release-facing project documentation for the first public release.
\ No newline at end of file
diff --git a/README.md b/README.md
index b631c14..cc7b4f0 100644
--- a/README.md
+++ b/README.md
@@ -1,213 +1 @@
-# LogLens
-
-[![CI](https://github.com/stacknil/LogLens/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/ci.yml)
-[![CodeQL](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml)
-
-C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL.
-
-It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports.
-
-## Project Status
-
-LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.
-
-## Why This Project Exists
-
-Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.
-
-LogLens is built around three ideas:
-
-- detection engineering over offensive functionality
-- parser observability over silent failure
-- repository discipline over throwaway scripts
-
-The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene.
-
-## Scope
-
-LogLens is a defensive, public-safe repository.
-It is intended for log parsing, detection experiments, and engineering practice.
-It does not provide exploitation, persistence, credential attack automation, or live offensive capability.
-
-## Repository Checks
-
-LogLens includes two minimal GitHub Actions workflows:
-
-- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest`
-- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule
-
-Both workflows are intended to stay stable enough to require on pull requests to `main`. Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md).
-
-## Threat Model
-
-LogLens is designed for offline review of `auth.log` and `secure` style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use.
-
-The current tool helps answer:
-
-- Is one source IP generating repeated SSH failures in a short window?
-- Is one source IP trying several usernames in a short window?
-- Is one account running sudo unusually often in a short window?
-
-It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own.
-
-## Detections
-
-LogLens currently detects:
-
-- Repeated SSH failed password attempts from the same IP within 10 minutes
-- One IP trying multiple usernames within 15 minutes
-- Bursty sudo activity from the same user within 5 minutes
-
-LogLens currently parses and reports these additional auth patterns beyond the core detector inputs:
-
-- `Accepted publickey` SSH successes
-- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default
-- `pam_unix(...:auth): authentication failure`
-- `pam_unix(...:session): session opened`
-- selected `pam_faillock(...:auth)` failure variants
-- selected `pam_sss(...:auth)` failure variants
-
-LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including:
-
-- `total_lines`
-- `parsed_lines`
-- `unparsed_lines`
-- `parse_success_rate`
-- `top_unknown_patterns`
-
-LogLens does not currently detect:
-
-- Lateral movement
-- MFA abuse
-- SSH key misuse
-- Many PAM-specific failures beyond the parsed `pam_unix`, `pam_faillock`, and `pam_sss` sample patterns
-- Cross-file or cross-host correlation
-
-## Build
-
-```bash
-cmake -S . -B build
-cmake --build build
-ctest --test-dir build --output-on-failure
-```
-
-For fresh-machine setup and repeatable local presets, see [`docs/dev-setup.md`](./docs/dev-setup.md).
-
-## Run
-
-```bash
-./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
-./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
-./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
-```
-
-The CLI writes:
-
-- `report.md`
-- `report.json`
-
-into the output directory you provide. If you omit the output directory, the files are written into the current working directory.
-
-When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic.
-
-## Sample Output
-
-For sanitized sample input, see [`assets/sample_auth.log`](./assets/sample_auth.log) and [`assets/sample_journalctl_short_full.log`](./assets/sample_journalctl_short_full.log).
-
-`report.md` summary excerpt:
-
-```markdown
-## Summary
-- Input mode: syslog_legacy
-- Parsed events: 14
-- Findings: 3
-- Parser warnings: 2
-```
-
-`report.json` summary excerpt:
-
-```json
-{
-  "input_mode": "syslog_legacy",
-  "parsed_event_count": 14,
-  "finding_count": 3,
-  "warning_count": 2
-}
-```
-
-The config file schema is intentionally small and strict:
-
-```json
-{
-  "input_mode": "syslog_legacy",
-  "timestamp": {
-    "assume_year": 2026
-  },
-  "brute_force": { "threshold": 5, "window_minutes": 10 },
-  "multi_user_probing": { "threshold": 3, "window_minutes": 15 },
-  "sudo_burst": { "threshold": 3, "window_minutes": 5 },
-  "auth_signal_mappings": {
-    "ssh_failed_password": {
-      "counts_as_attempt_evidence": true,
-      "counts_as_terminal_auth_failure": true
-    },
-    "ssh_invalid_user": {
-      "counts_as_attempt_evidence": true,
-      "counts_as_terminal_auth_failure": true
-    },
-    "ssh_failed_publickey": {
-      "counts_as_attempt_evidence": true,
-      "counts_as_terminal_auth_failure": true
-    },
-    "pam_auth_failure": {
-      "counts_as_attempt_evidence": true,
-      "counts_as_terminal_auth_failure": false
-    }
-  }
-}
-```
-
-This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it.
-
-Timestamp handling is now explicit:
-
-- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
-- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`
-
-## Example Input
-
-```text
-Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
-Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2
-Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh
-Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2
-Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice
-Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0)
-Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth]
-Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291
-```
-
-`journalctl --output short-full` style example:
-
-```text
-Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
-Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh
-Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2
-Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth]
-```
-
-## Known Limitations
-
-- `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly.
-- `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations.
-- Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
-- Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
-- `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
-- Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats.
-- Findings are rule-based triage aids, not incident verdicts or attribution.
-
-## Future Roadmap
-
-- Additional auth patterns and PAM coverage
-- Optional CSV export
-- Larger sanitized test corpus
+# LogLens
+
+[![CI](https://github.com/stacknil/LogLens/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/ci.yml)
+[![CodeQL](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml)
+
+C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL.
+
+It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports, with optional CSV exports for findings and warnings.
+
+## Project Status
+
+LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.
+
+## Why This Project Exists
+
+Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.
+
+LogLens is built around three ideas:
+
+- detection engineering over offensive functionality
+- parser observability over silent failure
+- repository discipline over throwaway scripts
+
+The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene.
+
+## Scope
+
+LogLens is a defensive, public-safe repository.
+It is intended for log parsing, detection experiments, and engineering practice.
+It does not provide exploitation, persistence, credential attack automation, or live offensive capability.
+
+## Repository Checks
+
+LogLens includes two minimal GitHub Actions workflows:
+
+- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest`
+- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule
+
+Both workflows are intended to stay stable enough to be required checks on pull requests to `main`. Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md).
+
+## Threat Model
+
+LogLens is designed for offline review of `auth.log`- and `secure`-style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use.
+
+The current tool helps answer:
+
+- Is one source IP generating repeated SSH failures in a short window?
+- Is one source IP trying several usernames in a short window?
+- Is one account running sudo unusually often in a short window?
+
+It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own.
+
+## Detections
+
+LogLens currently detects:
+
+- Repeated SSH failed password attempts from the same IP within 10 minutes
+- One IP trying multiple usernames within 15 minutes
+- Bursty sudo activity from the same user within 5 minutes
+
+LogLens currently parses and reports these additional auth patterns beyond the core detector inputs:
+
+- `Accepted publickey` SSH successes
+- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default
+- `pam_unix(...:auth): authentication failure`
+- `pam_unix(...:session): session opened`
+- selected `pam_faillock(...:auth)` failure variants
+- selected `pam_sss(...:auth)` failure variants
+
+LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including:
+
+- `total_lines`
+- `parsed_lines`
+- `unparsed_lines`
+- `parse_success_rate`
+- `top_unknown_patterns`
+
+LogLens does not currently detect:
+
+- Lateral movement
+- MFA abuse
+- SSH key misuse
+- Many PAM-specific failures beyond the parsed `pam_unix`, `pam_faillock`, and `pam_sss` sample patterns
+- Cross-file or cross-host correlation
+
+## Build
+
+```bash
+cmake -S . -B build
+cmake --build build
+ctest --test-dir build --output-on-failure
+```
+
+For fresh-machine setup and repeatable local presets, see [`docs/dev-setup.md`](./docs/dev-setup.md).
+
+## Run
+
+```bash
+./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
+./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
+./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
+./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
+```
+
+The CLI writes:
+
+- `report.md`
+- `report.json`
+
+into the output directory you provide. If you omit the output directory, the files are written into the current working directory.
+
+When you add `--csv`, LogLens also writes:
+
+- `findings.csv`
+- `warnings.csv`
+
+The CSV schema is intentionally small and stable:
+
+- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
+- `warnings.csv`: `kind`, `message`
+
+When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic.
+
+## Sample Output
+
+For sanitized sample input, see [`assets/sample_auth.log`](./assets/sample_auth.log) and [`assets/sample_journalctl_short_full.log`](./assets/sample_journalctl_short_full.log).
+
+`report.md` summary excerpt:
+
+```markdown
+## Summary
+- Input mode: syslog_legacy
+- Parsed events: 14
+- Findings: 3
+- Parser warnings: 2
+```
+
+`report.json` summary excerpt:
+
+```json
+{
+  "input_mode": "syslog_legacy",
+  "parsed_event_count": 14,
+  "finding_count": 3,
+  "warning_count": 2
+}
+```
+
+The config file schema is intentionally small and strict:
+
+```json
+{
+  "input_mode": "syslog_legacy",
+  "timestamp": {
+    "assume_year": 2026
+  },
+  "brute_force": { "threshold": 5, "window_minutes": 10 },
+  "multi_user_probing": { "threshold": 3, "window_minutes": 15 },
+  "sudo_burst": { "threshold": 3, "window_minutes": 5 },
+  "auth_signal_mappings": {
+    "ssh_failed_password": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
+    "ssh_invalid_user": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
+    "ssh_failed_publickey": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
+    "pam_auth_failure": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": false
+    }
+  }
+}
+```
+
+This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it.
+
+Timestamp handling is now explicit:
+
+- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
+- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`
+
+## Example Input
+
+```text
+Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
+Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2
+Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh
+Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2
+Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice
+Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0)
+Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth]
+Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291
+```
+
+`journalctl --output short-full` style example:
+
+```text
+Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
+Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh
+Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2
+Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth]
+```
+
+## Known Limitations
+
+- `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly.
+- `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations.
+- Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
+- Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
+- `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
+- Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats.
+- Findings are rule-based triage aids, not incident verdicts or attribution.
+
+## Future Roadmap
+
+- Additional auth patterns and PAM coverage
+- Larger sanitized test corpus
\ No newline at end of file
diff --git a/src/main.cpp b/src/main.cpp
index d396ecd..c1f3171 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -1,172 +1 @@
-#include "config.hpp"
-#include "detector.hpp"
-#include "parser.hpp"
-#include "report.hpp"
-
-#include <charconv>
-#include <filesystem>
-#include <iostream>
-#include <optional>
-#include <stdexcept>
-#include <string_view>
-
-namespace {
-
-struct CliOptions {
-    std::optional<std::filesystem::path> config_path;
-    std::optional<loglens::InputMode> input_mode;
-    std::optional<int> assumed_year;
-    std::filesystem::path input_path;
-    std::filesystem::path output_directory;
-};
-
-void print_usage() {
-    std::cerr << "Usage: loglens [--config <path>] [--mode <mode>] [--year <year>] <input> [output_dir]\n";
-}
-
-int parse_year_argument(std::string_view value) {
-    int parsed_year = 0;
-    const auto* begin = value.data();
-    const auto* end = value.data() + value.size();
-    const auto result = std::from_chars(begin, end, parsed_year);
-    if (result.ec != std::errc{} || result.ptr != end || parsed_year <= 0) {
-        throw std::runtime_error("invalid year value: " + std::string(value));
-    }
-
-    return parsed_year;
-}
-
-CliOptions parse_cli_options(int argc, char* argv[]) {
-    if (argc < 2) {
-        throw std::runtime_error("missing required arguments");
-    }
-
-    int index = 1;
-    CliOptions options;
-
-    while (index < argc) {
-        const std::string_view argument = argv[index];
-        if (argument == "--config") {
-            if (index + 1 >= argc) {
-                throw std::runtime_error("missing path after --config");
-            }
-
-            options.config_path = std::filesystem::path{argv[index + 1]};
-            index += 2;
-            continue;
-        }
-
-        if (argument == "--mode") {
-            if (index + 1 >= argc) {
-                throw std::runtime_error("missing value after --mode");
-            }
-
-            const auto parsed_mode = loglens::parse_input_mode(argv[index + 1]);
-            if (!parsed_mode.has_value()) {
-                throw std::runtime_error("unsupported mode: " + std::string{argv[index + 1]});
-            }
-
-            options.input_mode = *parsed_mode;
-            index += 2;
-            continue;
-        }
-
-        if (argument == "--year") {
-            if (index + 1 >= argc) {
-                throw std::runtime_error("missing value after --year");
-            }
-
-            options.assumed_year = parse_year_argument(argv[index + 1]);
-            index += 2;
-            continue;
-        }
-
-        if (argument.starts_with('-')) {
-            throw std::runtime_error("unknown option: " + std::string{argv[index]});
-        }
-
-        break;
-    }
-
-    const int remaining = argc - index;
-    if (remaining < 1 || remaining > 2) {
-        throw std::runtime_error("invalid argument count");
-    }
-
-    options.input_path = std::filesystem::path{argv[index]};
-    options.output_directory = remaining == 2
-        ? std::filesystem::path{argv[index + 1]}
-        : std::filesystem::current_path();
-    return options;
-}
-
-loglens::ParserConfig resolve_parser_config(const CliOptions& options, const loglens::AppConfig& config) {
-    const auto resolved_mode = options.input_mode.has_value()
-        ? options.input_mode
-        : config.input_mode;
-    if (!resolved_mode.has_value()) {
-        throw std::runtime_error("input mode is required; use --mode or input_mode in config.json");
-    }
-
-    loglens::ParserConfig parser_config;
-    parser_config.input_mode = *resolved_mode;
-
-    if (parser_config.input_mode == loglens::InputMode::SyslogLegacy) {
-        parser_config.assumed_year = options.assumed_year.has_value()
-            ? options.assumed_year
-            : config.timestamp.assume_year;
-        if (!parser_config.assumed_year.has_value()) {
-            throw std::runtime_error("syslog mode requires --year or timestamp.assume_year in config.json");
-        }
-    }
-
-    return parser_config;
-}
-
-} // namespace
-
-int main(int argc, char* argv[]) {
-    CliOptions options;
-    try {
-        options = parse_cli_options(argc, argv);
-    } catch (const std::exception& error) {
-        print_usage();
-        std::cerr << "LogLens failed: " << error.what() << '\n';
-        return 1;
-    }
-
-    try {
-        const auto app_config = options.config_path.has_value()
-            ? loglens::load_app_config(*options.config_path)
-            : loglens::AppConfig{};
-        const auto parser_config = resolve_parser_config(options, app_config);
-
-        const loglens::AuthLogParser parser(parser_config);
-        const auto parsed = parser.parse_file(options.input_path);
-
-        const loglens::Detector detector(app_config.detector);
-        const auto findings = detector.analyze(parsed.events);
-
-        const loglens::ReportData report_data{
-            options.input_path,
-            parsed.metadata,
-            parsed.quality,
-            parsed.events,
-            findings,
-            parsed.warnings,
-            app_config.detector.auth_signal_mappings};
-
-        loglens::write_reports(report_data, options.output_directory);
-
-        std::cout << "Parsed events: " << parsed.events.size() << '\n';
-        std::cout << "Findings: " << findings.size() << '\n';
-        std::cout << "Warnings: " << parsed.warnings.size() << '\n';
-        std::cout << "Markdown report: " << (options.output_directory / "report.md").string() << '\n';
-        std::cout << "JSON report: " << (options.output_directory / "report.json").string() << '\n';
-    } catch (const std::exception& error) {
-        std::cerr << "LogLens failed: " << error.what() << '\n';
-        return 1;
-    }
-
-    return 0;
-}
+#include "config.hpp"
+#include "detector.hpp"
+#include "parser.hpp"
+#include "report.hpp"
+
+#include <charconv>
+#include <filesystem>
+#include <iostream>
+#include <optional>
+#include <stdexcept>
+#include <string_view>
+
+namespace {
+
+struct CliOptions {
+    std::optional<std::filesystem::path> config_path;
+    std::optional<loglens::InputMode> input_mode;
+    std::optional<int> assumed_year;
+    bool emit_csv = false;
+    std::filesystem::path input_path;
+    std::filesystem::path output_directory;
+};
+
+void print_usage() {
+    std::cerr << "Usage: loglens [--config <path>] [--mode <mode>] [--year <year>] [--csv] <input> [output_dir]\n";
+}
+
+int parse_year_argument(std::string_view value) {
+    int parsed_year = 0;
+    const auto* begin = value.data();
+    const auto* end = value.data() + value.size();
+    const auto result = std::from_chars(begin, end, parsed_year);
+    if (result.ec != std::errc{} || result.ptr != end || parsed_year <= 0) {
+        throw std::runtime_error("invalid year value: " + std::string(value));
+    }
+
+    return parsed_year;
+}
+
+CliOptions parse_cli_options(int argc, char* argv[]) {
+    if (argc < 2) {
+        throw std::runtime_error("missing required arguments");
+    }
+
+    int index = 1;
+    CliOptions options;
+
+    while (index < argc) {
+        const std::string_view argument = argv[index];
+        if (argument == "--config") {
+            if (index + 1 >= argc) {
+                throw std::runtime_error("missing path after --config");
+            }
+
+            options.config_path = std::filesystem::path{argv[index + 1]};
+            index += 2;
+            continue;
+        }
+
+        if (argument == "--mode") {
+            if (index + 1 >= argc) {
+                throw std::runtime_error("missing value after --mode");
+            }
+
+            const auto parsed_mode = loglens::parse_input_mode(argv[index + 1]);
+            if (!parsed_mode.has_value()) {
+                throw std::runtime_error("unsupported mode: " + std::string{argv[index + 1]});
+            }
+
+            options.input_mode = *parsed_mode;
+            index += 2;
+            continue;
+        }
+
+        if (argument == "--year") {
+            if (index + 1 >= argc) {
+                throw std::runtime_error("missing value after --year");
+            }
+
+            options.assumed_year = parse_year_argument(argv[index + 1]);
+            index += 2;
+            continue;
+        }
+
+        if (argument == "--csv") {
+            options.emit_csv = true;
+            ++index;
+            continue;
+        }
+
+        if (argument.starts_with('-')) {
+            throw std::runtime_error("unknown option: " + std::string{argv[index]});
+        }
+
+        break;
+    }
+
+    const int remaining = argc - index;
+    if (remaining < 1 || remaining > 2) {
+        throw std::runtime_error("invalid argument count");
+    }
+
+    options.input_path = std::filesystem::path{argv[index]};
+    options.output_directory = remaining == 2
+        ? std::filesystem::path{argv[index + 1]}
+        : std::filesystem::current_path();
+    return options;
+}
+
+loglens::ParserConfig resolve_parser_config(const CliOptions& options, const loglens::AppConfig& config) {
+    const auto resolved_mode = options.input_mode.has_value()
+        ? options.input_mode
+        : config.input_mode;
+    if (!resolved_mode.has_value()) {
+        throw std::runtime_error("input mode is required; use --mode or input_mode in config.json");
+    }
+
+    loglens::ParserConfig parser_config;
+    parser_config.input_mode = *resolved_mode;
+
+    if (parser_config.input_mode == loglens::InputMode::SyslogLegacy) {
+        parser_config.assumed_year = options.assumed_year.has_value()
+            ? options.assumed_year
+            : config.timestamp.assume_year;
+        if (!parser_config.assumed_year.has_value()) {
+            throw std::runtime_error("syslog mode requires --year or timestamp.assume_year in config.json");
+        }
+    }
+
+    return parser_config;
+}
+
+} // namespace
+
+int main(int argc, char* argv[]) {
+    CliOptions options;
+    try {
+        options = parse_cli_options(argc, argv);
+    } catch (const std::exception& error) {
+        print_usage();
+        std::cerr << "LogLens failed: " << error.what() << '\n';
+        return 1;
+    }
+
+    try {
+        const auto app_config = options.config_path.has_value()
+            ? loglens::load_app_config(*options.config_path)
+            : loglens::AppConfig{};
+        const auto parser_config = resolve_parser_config(options, app_config);
+
+        const loglens::AuthLogParser parser(parser_config);
+        const auto parsed = parser.parse_file(options.input_path);
+
+        const loglens::Detector detector(app_config.detector);
+        const auto findings = detector.analyze(parsed.events);
+
+        const loglens::ReportData report_data{
+            options.input_path,
+            parsed.metadata,
+            parsed.quality,
+            parsed.events,
+            findings,
+            parsed.warnings,
+            app_config.detector.auth_signal_mappings};
+
+        loglens::write_reports(report_data, options.output_directory, options.emit_csv);
+
+        std::cout << "Parsed events: " << parsed.events.size() << '\n';
+        std::cout << "Findings: " << findings.size() << '\n';
+        std::cout << "Warnings: " << parsed.warnings.size() << '\n';
+        std::cout << "Markdown report: " << (options.output_directory / "report.md").string() << '\n';
+        std::cout << "JSON report: " << (options.output_directory / "report.json").string() << '\n';
+        if (options.emit_csv) {
+            std::cout << "Findings CSV: " << (options.output_directory / "findings.csv").string() << '\n';
+            std::cout << "Warnings CSV: " << (options.output_directory / "warnings.csv").string() << '\n';
+        }
+    } catch (const std::exception& error) {
+        std::cerr << "LogLens failed: " << error.what() << '\n';
+        return 1;
+    }
+
+    return 0;
+}
\ No newline at end of file
diff --git a/src/report.cpp b/src/report.cpp
index be8126d..5ad503e 100644
--- a/src/report.cpp
+++ b/src/report.cpp
@@ -1,519 +1 @@
-#include "report.hpp"
-
-#include <algorithm>
-#include <cstddef>
-#include <fstream>
-#include <iomanip>
-#include <optional>
-#include <sstream>
-#include <string>
-#include <string_view>
-#include <unordered_map>
-#include <utility>
-#include <vector>
-
-namespace loglens {
-namespace {
-
-struct HostSummary {
-    std::string hostname;
-    std::size_t parsed_event_count = 0;
-    std::size_t finding_count = 0;
-    std::size_t warning_count = 0;
-    std::vector<std::pair<EventType, std::size_t>> event_counts;
-};
-
-std::string escape_json(std::string_view value) {
-    std::string escaped;
-    escaped.reserve(value.size());
-
-    for (const char character : value) {
-        switch (character) {
-        case '\\':
-            escaped += "\\\\";
-            break;
-        case '"':
-            escaped += "\\\"";
-            break;
-        case '\n':
-            escaped += "\\n";
-            break;
-        case '\r':
-            escaped += "\\r";
-            break;
-        case '\t':
-            escaped += "\\t";
-            break;
-        default:
-            escaped += character;
-            break;
-        }
-    }
-
-    return escaped;
-}
-
-std::vector<Finding> sorted_findings(const std::vector<Finding>& findings) {
-    auto ordered = findings;
-    std::sort(ordered.begin(), ordered.end(), [](const Finding& left, const Finding& right) {
-        if (left.type != right.type) {
-            return to_string(left.type) < to_string(right.type);
-        }
-        if (left.subject != right.subject) {
-            return left.subject < right.subject;
-        }
-        return left.first_seen < right.first_seen;
-    });
-    return ordered;
-}
-
-std::vector<ParseWarning> sorted_warnings(const std::vector<ParseWarning>& warnings) {
-    auto ordered = warnings;
-    std::sort(ordered.begin(), ordered.end(), [](const ParseWarning& left, const ParseWarning& right) {
-        if (left.line_number != right.line_number) {
-            return left.line_number < right.line_number;
-        }
-        return left.reason < right.reason;
-    });
-    return ordered;
-}
-
-std::vector<std::pair<EventType, std::size_t>> build_event_counts(const std::vector<AuthEvent>& events) {
-    std::vector<std::pair<EventType, std::size_t>> counts = {
-        {EventType::SshFailedPassword, 0},
-        {EventType::SshAcceptedPassword, 0},
-        {EventType::SshAcceptedPublicKey, 0},
-        {EventType::SshInvalidUser, 0},
-        {EventType::SshFailedPublicKey, 0},
-        {EventType::PamAuthFailure, 0},
-        {EventType::SessionOpened, 0},
-        {EventType::SudoCommand, 0},
-        {EventType::Unknown, 0}};
-
-    for (const auto& event : events) {
-        for (auto& [type, count] : counts) {
-            if (type == event.event_type) {
-                ++count;
-                break;
-            }
-        }
-    }
-
-    counts.erase(
-        std::remove_if(counts.begin(), counts.end(), [](const auto& entry) {
-            return entry.second == 0;
-        }),
-        counts.end());
-
-    return counts;
-}
-
-std::string usernames_note(const Finding& finding) {
-    if (finding.usernames.empty()) {
-        return finding.summary;
-    }
-
-    std::ostringstream note;
-    note << finding.summary << " Usernames: ";
-    for (std::size_t index = 0; index < finding.usernames.size(); ++index) {
-        if (index != 0) {
-            note << ", ";
-        }
-        note << finding.usernames[index];
-    }
-    return note.str();
-}
-
-std::string format_parse_success_rate(double rate) {
-    std::ostringstream output;
-    output << std::fixed << std::setprecision(4) << rate;
-    return output.str();
-}
-
-std::string format_parse_success_percent(double rate) {
-    std::ostringstream output;
-    output << std::fixed << std::setprecision(2) << (rate * 100.0) << '%';
-    return output.str();
-}
-
-std::string_view trim_left(std::string_view value) {
-    while (!value.empty() && (value.front() == ' ' || value.front() == '\t')) {
-        value.remove_prefix(1);
-    }
-    return value;
-}
-
-std::string_view consume_token(std::string_view& input) {
-    input = trim_left(input);
-    if (input.empty()) {
-        return {};
-    }
-
-    const auto separator = input.find(' ');
-    if (separator == std::string_view::npos) {
-        const auto token = input;
-        input = {};
-        return token;
-    }
-
-    const auto token = input.substr(0, separator);
-    input.remove_prefix(separator + 1);
-    return token;
-}
-
-std::optional<std::string> extract_hostname_from_input_line(std::string_view line, InputMode input_mode) {
-    auto remaining = line;
-    switch (input_mode) {
-    case InputMode::SyslogLegacy:
-        if (consume_token(remaining).empty()
-            || consume_token(remaining).empty()
-            || consume_token(remaining).empty()) {
-            return std::nullopt;
-        }
-        break;
-    case InputMode::JournalctlShortFull:
-        if (consume_token(remaining).empty()
-            || consume_token(remaining).empty()
-            || consume_token(remaining).empty()
-            || consume_token(remaining).empty()) {
-            return std::nullopt;
-        }
-        break;
-    default:
-        return std::nullopt;
-    }
-
-    const auto hostname = consume_token(remaining);
-    if (hostname.empty()) {
-        return std::nullopt;
-    }
-
-    return std::string(hostname);
-}
-
-std::unordered_map<std::size_t, std::string> load_hostnames_by_line(const ReportData& data) {
-    std::unordered_map<std::size_t, std::string> hostnames_by_line;
-    if (data.warnings.empty()) {
-        return
hostnames_by_line; - } - - std::ifstream input(data.input_path); - if (!input) { - return hostnames_by_line; - } - - std::string line; - std::size_t line_number = 0; - while (std::getline(input, line)) { - ++line_number; - const auto hostname = extract_hostname_from_input_line(line, data.parse_metadata.input_mode); - if (hostname.has_value()) { - hostnames_by_line.emplace(line_number, *hostname); - } - } - - return hostnames_by_line; -} - -bool is_matching_finding_signal(const Finding& finding, const AuthSignal& signal) { - if (signal.timestamp < finding.first_seen || signal.timestamp > finding.last_seen) { - return false; - } - - switch (finding.type) { - case FindingType::BruteForce: - return signal.counts_as_terminal_auth_failure - && signal.source_ip == finding.subject; - case FindingType::MultiUserProbing: - if (!signal.counts_as_attempt_evidence || signal.source_ip != finding.subject) { - return false; - } - if (finding.usernames.empty()) { - return true; - } - return std::find( - finding.usernames.begin(), - finding.usernames.end(), - signal.username) - != finding.usernames.end(); - case FindingType::SudoBurst: - return signal.counts_as_sudo_burst_evidence - && signal.username == finding.subject; - default: - return false; - } -} - -std::vector build_host_summaries(const ReportData& data) { - std::unordered_map summaries_by_host; - - for (const auto& event : data.events) { - if (event.hostname.empty()) { - continue; - } - - auto& summary = summaries_by_host[event.hostname]; - summary.hostname = event.hostname; - ++summary.parsed_event_count; - } - - const auto hostnames_by_line = load_hostnames_by_line(data); - for (const auto& warning : data.warnings) { - const auto hostname_it = hostnames_by_line.find(warning.line_number); - if (hostname_it == hostnames_by_line.end() || hostname_it->second.empty()) { - continue; - } - - auto& summary = summaries_by_host[hostname_it->second]; - summary.hostname = hostname_it->second; - ++summary.warning_count; - } - - if 
(summaries_by_host.size() <= 1) { - return {}; - } - - std::unordered_map hostname_by_event_line; - hostname_by_event_line.reserve(data.events.size()); - std::unordered_map> events_by_host; - events_by_host.reserve(summaries_by_host.size()); - - for (const auto& event : data.events) { - hostname_by_event_line.emplace(event.line_number, event.hostname); - events_by_host[event.hostname].push_back(event); - } - - const auto signals = build_auth_signals(data.events, data.auth_signal_mappings); - for (const auto& finding : data.findings) { - std::unordered_set matching_hosts; - for (const auto& signal : signals) { - if (!is_matching_finding_signal(finding, signal)) { - continue; - } - - const auto hostname_it = hostname_by_event_line.find(signal.line_number); - if (hostname_it == hostname_by_event_line.end() || hostname_it->second.empty()) { - continue; - } - matching_hosts.insert(hostname_it->second); - } - - for (const auto& hostname : matching_hosts) { - ++summaries_by_host[hostname].finding_count; - } - } - - std::vector summaries; - summaries.reserve(summaries_by_host.size()); - for (auto& [hostname, summary] : summaries_by_host) { - const auto events_it = events_by_host.find(hostname); - if (events_it != events_by_host.end()) { - summary.event_counts = build_event_counts(events_it->second); - } - summaries.push_back(std::move(summary)); - } - - std::sort(summaries.begin(), summaries.end(), [](const HostSummary& left, const HostSummary& right) { - return left.hostname < right.hostname; - }); - - return summaries; -} - -} // namespace - -std::string render_markdown_report(const ReportData& data) { - std::ostringstream output; - const auto findings = sorted_findings(data.findings); - const auto warnings = sorted_warnings(data.warnings); - const auto event_counts = build_event_counts(data.events); - const auto host_summaries = build_host_summaries(data); - - output << "# LogLens Report\n\n"; - output << "## Summary\n\n"; - output << "- Input: `" << 
data.input_path.generic_string() << "`\n"; - output << "- Input mode: " << to_string(data.parse_metadata.input_mode) << '\n'; - if (data.parse_metadata.assume_year.has_value()) { - output << "- Assume year: " << *data.parse_metadata.assume_year << '\n'; - } - output << "- Timezone present: " << (data.parse_metadata.timezone_present ? "true" : "false") << '\n'; - output << "- Total lines: " << data.parser_quality.total_lines << '\n'; - output << "- Parsed lines: " << data.parser_quality.parsed_lines << '\n'; - output << "- Unparsed lines: " << data.parser_quality.unparsed_lines << '\n'; - output << "- Parse success rate: " << format_parse_success_percent(data.parser_quality.parse_success_rate) << '\n'; - output << "- Parsed events: " << data.events.size() << '\n'; - output << "- Findings: " << findings.size() << '\n'; - output << "- Parser warnings: " << warnings.size() << "\n\n"; - - if (!host_summaries.empty()) { - output << "## Host Summary\n\n"; - output << "| Host | Parsed Events | Findings | Warnings |\n"; - output << "| --- | ---: | ---: | ---: |\n"; - for (const auto& summary : host_summaries) { - output << "| " << summary.hostname - << " | " << summary.parsed_event_count - << " | " << summary.finding_count - << " | " << summary.warning_count << " |\n"; - } - output << '\n'; - } - - output << "## Findings\n\n"; - if (findings.empty()) { - output << "No configured detections matched the analyzed events.\n\n"; - } else { - output << "| Rule | Subject | Count | Window | Notes |\n"; - output << "| --- | --- | ---: | --- | --- |\n"; - for (const auto& finding : findings) { - output << "| " << to_string(finding.type) - << " | " << finding.subject - << " | " << finding.event_count - << " | " << format_timestamp(finding.first_seen) - << " -> " << format_timestamp(finding.last_seen) - << " | " << usernames_note(finding) << " |\n"; - } - output << '\n'; - } - - output << "## Event Counts\n\n"; - output << "| Event Type | Count |\n"; - output << "| --- | ---: |\n"; - 
for (const auto& [type, count] : event_counts) { - output << "| " << to_string(type) << " | " << count << " |\n"; - } - output << '\n'; - - output << "## Parser Quality\n\n"; - if (data.parser_quality.top_unknown_patterns.empty()) { - output << "All analyzed lines matched a supported pattern.\n\n"; - } else { - output << "| Unknown Pattern | Count |\n"; - output << "| --- | ---: |\n"; - for (const auto& entry : data.parser_quality.top_unknown_patterns) { - output << "| " << entry.pattern << " | " << entry.count << " |\n"; - } - output << '\n'; - } - - output << "## Parser Warnings\n\n"; - if (warnings.empty()) { - output << "No malformed lines were skipped.\n"; - } else { - output << "| Line | Reason |\n"; - output << "| ---: | --- |\n"; - for (const auto& warning : warnings) { - output << "| " << warning.line_number << " | " << warning.reason << " |\n"; - } - } - - return output.str(); -} - -std::string render_json_report(const ReportData& data) { - std::ostringstream output; - const auto findings = sorted_findings(data.findings); - const auto warnings = sorted_warnings(data.warnings); - const auto event_counts = build_event_counts(data.events); - const auto host_summaries = build_host_summaries(data); - - output << "{\n"; - output << " \"tool\": \"LogLens\",\n"; - output << " \"input\": \"" << escape_json(data.input_path.generic_string()) << "\",\n"; - output << " \"input_mode\": \"" << to_string(data.parse_metadata.input_mode) << "\",\n"; - if (data.parse_metadata.assume_year.has_value()) { - output << " \"assume_year\": " << *data.parse_metadata.assume_year << ",\n"; - } - output << " \"timezone_present\": " << (data.parse_metadata.timezone_present ? 
"true" : "false") << ",\n"; - output << " \"parser_quality\": {\n"; - output << " \"total_lines\": " << data.parser_quality.total_lines << ",\n"; - output << " \"parsed_lines\": " << data.parser_quality.parsed_lines << ",\n"; - output << " \"unparsed_lines\": " << data.parser_quality.unparsed_lines << ",\n"; - output << " \"parse_success_rate\": " << format_parse_success_rate(data.parser_quality.parse_success_rate) << ",\n"; - output << " \"top_unknown_patterns\": [\n"; - for (std::size_t index = 0; index < data.parser_quality.top_unknown_patterns.size(); ++index) { - const auto& entry = data.parser_quality.top_unknown_patterns[index]; - output << " {\"pattern\": \"" << escape_json(entry.pattern) << "\", \"count\": " << entry.count << "}"; - output << (index + 1 == data.parser_quality.top_unknown_patterns.size() ? "\n" : ",\n"); - } - output << " ]\n"; - output << " },\n"; - output << " \"parsed_event_count\": " << data.events.size() << ",\n"; - output << " \"warning_count\": " << warnings.size() << ",\n"; - output << " \"finding_count\": " << findings.size() << ",\n"; - output << " \"event_counts\": [\n"; - for (std::size_t index = 0; index < event_counts.size(); ++index) { - const auto& [type, count] = event_counts[index]; - output << " {\"event_type\": \"" << to_string(type) << "\", \"count\": " << count << "}"; - output << (index + 1 == event_counts.size() ? 
"\n" : ",\n"); - } - output << " ]"; - if (!host_summaries.empty()) { - output << ",\n"; - output << " \"host_summaries\": [\n"; - for (std::size_t host_index = 0; host_index < host_summaries.size(); ++host_index) { - const auto& summary = host_summaries[host_index]; - output << " {\n"; - output << " \"hostname\": \"" << escape_json(summary.hostname) << "\",\n"; - output << " \"parsed_event_count\": " << summary.parsed_event_count << ",\n"; - output << " \"finding_count\": " << summary.finding_count << ",\n"; - output << " \"warning_count\": " << summary.warning_count << ",\n"; - output << " \"event_counts\": [\n"; - for (std::size_t event_index = 0; event_index < summary.event_counts.size(); ++event_index) { - const auto& [type, count] = summary.event_counts[event_index]; - output << " {\"event_type\": \"" << to_string(type) << "\", \"count\": " << count << "}"; - output << (event_index + 1 == summary.event_counts.size() ? "\n" : ",\n"); - } - output << " ]\n"; - output << " }"; - output << (host_index + 1 == host_summaries.size() ? 
"\n" : ",\n"); - } - output << " ],\n"; - } else { - output << ",\n"; - } - output << " \"findings\": [\n"; - for (std::size_t index = 0; index < findings.size(); ++index) { - const auto& finding = findings[index]; - output << " {\n"; - output << " \"rule\": \"" << to_string(finding.type) << "\",\n"; - output << " \"subject_kind\": \"" << escape_json(finding.subject_kind) << "\",\n"; - output << " \"subject\": \"" << escape_json(finding.subject) << "\",\n"; - output << " \"event_count\": " << finding.event_count << ",\n"; - output << " \"window_start\": \"" << format_timestamp(finding.first_seen) << "\",\n"; - output << " \"window_end\": \"" << format_timestamp(finding.last_seen) << "\",\n"; - output << " \"usernames\": ["; - for (std::size_t name_index = 0; name_index < finding.usernames.size(); ++name_index) { - output << '"' << escape_json(finding.usernames[name_index]) << '"'; - if (name_index + 1 != finding.usernames.size()) { - output << ", "; - } - } - output << "],\n"; - output << " \"summary\": \"" << escape_json(finding.summary) << "\"\n"; - output << " }"; - output << (index + 1 == findings.size() ? "\n" : ",\n"); - } - output << " ],\n"; - output << " \"warnings\": [\n"; - for (std::size_t index = 0; index < warnings.size(); ++index) { - const auto& warning = warnings[index]; - output << " {\"line_number\": " << warning.line_number - << ", \"reason\": \"" << escape_json(warning.reason) << "\"}"; - output << (index + 1 == warnings.size() ? 
"\n" : ",\n"); - } - output << " ]\n"; - output << "}\n"; - return output.str(); -} - -void write_reports(const ReportData& data, const std::filesystem::path& output_directory) { - std::filesystem::create_directories(output_directory); - - std::ofstream markdown_output(output_directory / "report.md"); - markdown_output << render_markdown_report(data); - - std::ofstream json_output(output_directory / "report.json"); - json_output << render_json_report(data); -} - -} // namespace loglens +#include "report.hpp" #include #include #include #include #include #include #include #include #include #include #include namespace loglens { namespace { struct HostSummary { std::string hostname; std::size_t parsed_event_count = 0; std::size_t finding_count = 0; std::size_t warning_count = 0; std::vector> event_counts; }; std::string escape_json(std::string_view value) { std::string escaped; escaped.reserve(value.size()); for (const char character : value) { switch (character) { case '\\': escaped += "\\\\"; break; case '"': escaped += "\\\""; break; case '\n': escaped += "\\n"; break; case '\r': escaped += "\\r"; break; case '\t': escaped += "\\t"; break; default: escaped += character; break; } } return escaped; } std::string escape_csv(std::string_view value) { bool needs_quotes = value.find_first_of(",\"\n\r") != std::string_view::npos; std::string escaped; escaped.reserve(value.size() + 2); if (needs_quotes) { escaped.push_back('"'); } for (const char character : value) { if (character == '"') { escaped += "\"\""; } else { escaped.push_back(character); } } if (needs_quotes) { escaped.push_back('"'); } return escaped; } std::vector sorted_findings(const std::vector& findings) { auto ordered = findings; std::sort(ordered.begin(), ordered.end(), [](const Finding& left, const Finding& right) { if (left.type != right.type) { return to_string(left.type) < to_string(right.type); } if (left.subject != right.subject) { return left.subject < right.subject; } return left.first_seen < 
right.first_seen; }); return ordered; } std::vector sorted_warnings(const std::vector& warnings) { auto ordered = warnings; std::sort(ordered.begin(), ordered.end(), [](const ParseWarning& left, const ParseWarning& right) { if (left.line_number != right.line_number) { return left.line_number < right.line_number; } return left.reason < right.reason; }); return ordered; } std::vector> build_event_counts(const std::vector& events) { std::vector> counts = { {EventType::SshFailedPassword, 0}, {EventType::SshAcceptedPassword, 0}, {EventType::SshAcceptedPublicKey, 0}, {EventType::SshInvalidUser, 0}, {EventType::SshFailedPublicKey, 0}, {EventType::PamAuthFailure, 0}, {EventType::SessionOpened, 0}, {EventType::SudoCommand, 0}, {EventType::Unknown, 0}}; for (const auto& event : events) { for (auto& [type, count] : counts) { if (type == event.event_type) { ++count; break; } } } counts.erase( std::remove_if(counts.begin(), counts.end(), [](const auto& entry) { return entry.second == 0; }), counts.end()); return counts; } std::string usernames_note(const Finding& finding) { if (finding.usernames.empty()) { return finding.summary; } std::ostringstream note; note << finding.summary << " Usernames: "; for (std::size_t index = 0; index < finding.usernames.size(); ++index) { if (index != 0) { note << ", "; } note << finding.usernames[index]; } return note.str(); } std::string usernames_csv_field(const Finding& finding) { std::ostringstream usernames; for (std::size_t index = 0; index < finding.usernames.size(); ++index) { if (index != 0) { usernames << ';'; } usernames << finding.usernames[index]; } return usernames.str(); } std::string format_parse_success_rate(double rate) { std::ostringstream output; output << std::fixed << std::setprecision(4) << rate; return output.str(); } std::string format_parse_success_percent(double rate) { std::ostringstream output; output << std::fixed << std::setprecision(2) << (rate * 100.0) << '%'; return output.str(); } std::string_view 
trim_left(std::string_view value) { while (!value.empty() && (value.front() == ' ' || value.front() == '\t')) { value.remove_prefix(1); } return value; } std::string_view consume_token(std::string_view& input) { input = trim_left(input); if (input.empty()) { return {}; } const auto separator = input.find(' '); if (separator == std::string_view::npos) { const auto token = input; input = {}; return token; } const auto token = input.substr(0, separator); input.remove_prefix(separator + 1); return token; } std::optional extract_hostname_from_input_line(std::string_view line, InputMode input_mode) { auto remaining = line; switch (input_mode) { case InputMode::SyslogLegacy: if (consume_token(remaining).empty() || consume_token(remaining).empty() || consume_token(remaining).empty()) { return std::nullopt; } break; case InputMode::JournalctlShortFull: if (consume_token(remaining).empty() || consume_token(remaining).empty() || consume_token(remaining).empty() || consume_token(remaining).empty()) { return std::nullopt; } break; default: return std::nullopt; } const auto hostname = consume_token(remaining); if (hostname.empty()) { return std::nullopt; } return std::string(hostname); } std::unordered_map load_hostnames_by_line(const ReportData& data) { std::unordered_map hostnames_by_line; if (data.warnings.empty()) { return hostnames_by_line; } std::ifstream input(data.input_path); if (!input) { return hostnames_by_line; } std::string line; std::size_t line_number = 0; while (std::getline(input, line)) { ++line_number; const auto hostname = extract_hostname_from_input_line(line, data.parse_metadata.input_mode); if (hostname.has_value()) { hostnames_by_line.emplace(line_number, *hostname); } } return hostnames_by_line; } bool is_matching_finding_signal(const Finding& finding, const AuthSignal& signal) { if (signal.timestamp < finding.first_seen || signal.timestamp > finding.last_seen) { return false; } switch (finding.type) { case FindingType::BruteForce: return 
signal.counts_as_terminal_auth_failure && signal.source_ip == finding.subject; case FindingType::MultiUserProbing: if (!signal.counts_as_attempt_evidence || signal.source_ip != finding.subject) { return false; } if (finding.usernames.empty()) { return true; } return std::find( finding.usernames.begin(), finding.usernames.end(), signal.username) != finding.usernames.end(); case FindingType::SudoBurst: return signal.counts_as_sudo_burst_evidence && signal.username == finding.subject; default: return false; } } std::vector build_host_summaries(const ReportData& data) { std::unordered_map summaries_by_host; for (const auto& event : data.events) { if (event.hostname.empty()) { continue; } auto& summary = summaries_by_host[event.hostname]; summary.hostname = event.hostname; ++summary.parsed_event_count; } const auto hostnames_by_line = load_hostnames_by_line(data); for (const auto& warning : data.warnings) { const auto hostname_it = hostnames_by_line.find(warning.line_number); if (hostname_it == hostnames_by_line.end() || hostname_it->second.empty()) { continue; } auto& summary = summaries_by_host[hostname_it->second]; summary.hostname = hostname_it->second; ++summary.warning_count; } if (summaries_by_host.size() <= 1) { return {}; } std::unordered_map hostname_by_event_line; hostname_by_event_line.reserve(data.events.size()); std::unordered_map> events_by_host; events_by_host.reserve(summaries_by_host.size()); for (const auto& event : data.events) { hostname_by_event_line.emplace(event.line_number, event.hostname); events_by_host[event.hostname].push_back(event); } const auto signals = build_auth_signals(data.events, data.auth_signal_mappings); for (const auto& finding : data.findings) { std::unordered_set matching_hosts; for (const auto& signal : signals) { if (!is_matching_finding_signal(finding, signal)) { continue; } const auto hostname_it = hostname_by_event_line.find(signal.line_number); if (hostname_it == hostname_by_event_line.end() || 
hostname_it->second.empty()) { continue; } matching_hosts.insert(hostname_it->second); } for (const auto& hostname : matching_hosts) { ++summaries_by_host[hostname].finding_count; } } std::vector summaries; summaries.reserve(summaries_by_host.size()); for (auto& [hostname, summary] : summaries_by_host) { const auto events_it = events_by_host.find(hostname); if (events_it != events_by_host.end()) { summary.event_counts = build_event_counts(events_it->second); } summaries.push_back(std::move(summary)); } std::sort(summaries.begin(), summaries.end(), [](const HostSummary& left, const HostSummary& right) { return left.hostname < right.hostname; }); return summaries; } } // namespace std::string render_markdown_report(const ReportData& data) { std::ostringstream output; const auto findings = sorted_findings(data.findings); const auto warnings = sorted_warnings(data.warnings); const auto event_counts = build_event_counts(data.events); const auto host_summaries = build_host_summaries(data); output << "# LogLens Report\n\n"; output << "## Summary\n\n"; output << "- Input: `" << data.input_path.generic_string() << "`\n"; output << "- Input mode: " << to_string(data.parse_metadata.input_mode) << '\n'; if (data.parse_metadata.assume_year.has_value()) { output << "- Assume year: " << *data.parse_metadata.assume_year << '\n'; } output << "- Timezone present: " << (data.parse_metadata.timezone_present ? 
"true" : "false") << '\n'; output << "- Total lines: " << data.parser_quality.total_lines << '\n'; output << "- Parsed lines: " << data.parser_quality.parsed_lines << '\n'; output << "- Unparsed lines: " << data.parser_quality.unparsed_lines << '\n'; output << "- Parse success rate: " << format_parse_success_percent(data.parser_quality.parse_success_rate) << '\n'; output << "- Parsed events: " << data.events.size() << '\n'; output << "- Findings: " << findings.size() << '\n'; output << "- Parser warnings: " << warnings.size() << "\n\n"; if (!host_summaries.empty()) { output << "## Host Summary\n\n"; output << "| Host | Parsed Events | Findings | Warnings |\n"; output << "| --- | ---: | ---: | ---: |\n"; for (const auto& summary : host_summaries) { output << "| " << summary.hostname << " | " << summary.parsed_event_count << " | " << summary.finding_count << " | " << summary.warning_count << " |\n"; } output << '\n'; } output << "## Findings\n\n"; if (findings.empty()) { output << "No configured detections matched the analyzed events.\n\n"; } else { output << "| Rule | Subject | Count | Window | Notes |\n"; output << "| --- | --- | ---: | --- | --- |\n"; for (const auto& finding : findings) { output << "| " << to_string(finding.type) << " | " << finding.subject << " | " << finding.event_count << " | " << format_timestamp(finding.first_seen) << " -> " << format_timestamp(finding.last_seen) << " | " << usernames_note(finding) << " |\n"; } output << '\n'; } output << "## Event Counts\n\n"; output << "| Event Type | Count |\n"; output << "| --- | ---: |\n"; for (const auto& [type, count] : event_counts) { output << "| " << to_string(type) << " | " << count << " |\n"; } output << '\n'; output << "## Parser Quality\n\n"; if (data.parser_quality.top_unknown_patterns.empty()) { output << "All analyzed lines matched a supported pattern.\n\n"; } else { output << "| Unknown Pattern | Count |\n"; output << "| --- | ---: |\n"; for (const auto& entry : 
data.parser_quality.top_unknown_patterns) { output << "| " << entry.pattern << " | " << entry.count << " |\n"; } output << '\n'; } output << "## Parser Warnings\n\n"; if (warnings.empty()) { output << "No malformed lines were skipped.\n"; } else { output << "| Line | Reason |\n"; output << "| ---: | --- |\n"; for (const auto& warning : warnings) { output << "| " << warning.line_number << " | " << warning.reason << " |\n"; } } return output.str(); } std::string render_json_report(const ReportData& data) { std::ostringstream output; const auto findings = sorted_findings(data.findings); const auto warnings = sorted_warnings(data.warnings); const auto event_counts = build_event_counts(data.events); const auto host_summaries = build_host_summaries(data); output << "{\n"; output << " \"tool\": \"LogLens\",\n"; output << " \"input\": \"" << escape_json(data.input_path.generic_string()) << "\",\n"; output << " \"input_mode\": \"" << to_string(data.parse_metadata.input_mode) << "\",\n"; if (data.parse_metadata.assume_year.has_value()) { output << " \"assume_year\": " << *data.parse_metadata.assume_year << ",\n"; } output << " \"timezone_present\": " << (data.parse_metadata.timezone_present ? 
"true" : "false") << ",\n"; output << " \"parser_quality\": {\n"; output << " \"total_lines\": " << data.parser_quality.total_lines << ",\n"; output << " \"parsed_lines\": " << data.parser_quality.parsed_lines << ",\n"; output << " \"unparsed_lines\": " << data.parser_quality.unparsed_lines << ",\n"; output << " \"parse_success_rate\": " << format_parse_success_rate(data.parser_quality.parse_success_rate) << ",\n"; output << " \"top_unknown_patterns\": [\n"; for (std::size_t index = 0; index < data.parser_quality.top_unknown_patterns.size(); ++index) { const auto& entry = data.parser_quality.top_unknown_patterns[index]; output << " {\"pattern\": \"" << escape_json(entry.pattern) << "\", \"count\": " << entry.count << "}"; output << (index + 1 == data.parser_quality.top_unknown_patterns.size() ? "\n" : ",\n"); } output << " ]\n"; output << " },\n"; output << " \"parsed_event_count\": " << data.events.size() << ",\n"; output << " \"warning_count\": " << warnings.size() << ",\n"; output << " \"finding_count\": " << findings.size() << ",\n"; output << " \"event_counts\": [\n"; for (std::size_t index = 0; index < event_counts.size(); ++index) { const auto& [type, count] = event_counts[index]; output << " {\"event_type\": \"" << to_string(type) << "\", \"count\": " << count << "}"; output << (index + 1 == event_counts.size() ? 
"\n" : ",\n"); } output << " ]"; if (!host_summaries.empty()) { output << ",\n"; output << " \"host_summaries\": [\n"; for (std::size_t host_index = 0; host_index < host_summaries.size(); ++host_index) { const auto& summary = host_summaries[host_index]; output << " {\n"; output << " \"hostname\": \"" << escape_json(summary.hostname) << "\",\n"; output << " \"parsed_event_count\": " << summary.parsed_event_count << ",\n"; output << " \"finding_count\": " << summary.finding_count << ",\n"; output << " \"warning_count\": " << summary.warning_count << ",\n"; output << " \"event_counts\": [\n"; for (std::size_t event_index = 0; event_index < summary.event_counts.size(); ++event_index) { const auto& [type, count] = summary.event_counts[event_index]; output << " {\"event_type\": \"" << to_string(type) << "\", \"count\": " << count << "}"; output << (event_index + 1 == summary.event_counts.size() ? "\n" : ",\n"); } output << " ]\n"; output << " }"; output << (host_index + 1 == host_summaries.size() ? 
"\n" : ",\n"); } output << " ],\n"; } else { output << ",\n"; } output << " \"findings\": [\n"; for (std::size_t index = 0; index < findings.size(); ++index) { const auto& finding = findings[index]; output << " {\n"; output << " \"rule\": \"" << to_string(finding.type) << "\",\n"; output << " \"subject_kind\": \"" << escape_json(finding.subject_kind) << "\",\n"; output << " \"subject\": \"" << escape_json(finding.subject) << "\",\n"; output << " \"event_count\": " << finding.event_count << ",\n"; output << " \"window_start\": \"" << format_timestamp(finding.first_seen) << "\",\n"; output << " \"window_end\": \"" << format_timestamp(finding.last_seen) << "\",\n"; output << " \"usernames\": ["; for (std::size_t name_index = 0; name_index < finding.usernames.size(); ++name_index) { output << '"' << escape_json(finding.usernames[name_index]) << '"'; if (name_index + 1 != finding.usernames.size()) { output << ", "; } } output << "],\n"; output << " \"summary\": \"" << escape_json(finding.summary) << "\"\n"; output << " }"; output << (index + 1 == findings.size() ? "\n" : ",\n"); } output << " ],\n"; output << " \"warnings\": [\n"; for (std::size_t index = 0; index < warnings.size(); ++index) { const auto& warning = warnings[index]; output << " {\"line_number\": " << warning.line_number << ", \"reason\": \"" << escape_json(warning.reason) << "\"}"; output << (index + 1 == warnings.size() ? 
"\n" : ",\n"); } output << " ]\n"; output << "}\n"; return output.str(); } std::string render_findings_csv(const ReportData& data) { std::ostringstream output; const auto findings = sorted_findings(data.findings); output << "rule,subject_kind,subject,event_count,window_start,window_end,usernames,summary\n"; for (const auto& finding : findings) { output << escape_csv(to_string(finding.type)) << ',' << escape_csv(finding.subject_kind) << ',' << escape_csv(finding.subject) << ',' << finding.event_count << ',' << escape_csv(format_timestamp(finding.first_seen)) << ',' << escape_csv(format_timestamp(finding.last_seen)) << ',' << escape_csv(usernames_csv_field(finding)) << ',' << escape_csv(finding.summary) << '\n'; } return output.str(); } std::string render_warnings_csv(const ReportData& data) { std::ostringstream output; const auto warnings = sorted_warnings(data.warnings); output << "kind,message\n"; for (const auto& warning : warnings) { output << "parse_warning," << escape_csv(warning.reason) << '\n'; } return output.str(); } void write_reports(const ReportData& data, const std::filesystem::path& output_directory, bool emit_csv) { std::filesystem::create_directories(output_directory); std::ofstream markdown_output(output_directory / "report.md"); markdown_output << render_markdown_report(data); std::ofstream json_output(output_directory / "report.json"); json_output << render_json_report(data); const auto findings_csv_path = output_directory / "findings.csv"; const auto warnings_csv_path = output_directory / "warnings.csv"; if (!emit_csv) { std::filesystem::remove(findings_csv_path); std::filesystem::remove(warnings_csv_path); return; } std::ofstream findings_csv_output(findings_csv_path); findings_csv_output << render_findings_csv(data); std::ofstream warnings_csv_output(warnings_csv_path); warnings_csv_output << render_warnings_csv(data); } } // namespace loglens \ No newline at end of file diff --git a/src/report.hpp b/src/report.hpp index 47d8368..73fa9fc 
--- a/src/report.hpp
+++ b/src/report.hpp
@@ -1,27 +1 @@
-#pragma once
-
-#include "signal.hpp"
-#include "detector.hpp"
-#include "parser.hpp"
-
-#include <filesystem>
-#include <string>
-#include <vector>
-
-namespace loglens {
-
-struct ReportData {
-    std::filesystem::path input_path;
-    ParseMetadata parse_metadata;
-    ParserQualityMetrics parser_quality;
-    std::vector events;
-    std::vector findings;
-    std::vector warnings;
-    AuthSignalConfig auth_signal_mappings;
-};
-
-std::string render_markdown_report(const ReportData& data);
-std::string render_json_report(const ReportData& data);
-void write_reports(const ReportData& data, const std::filesystem::path& output_directory);
-
-} // namespace loglens
+#pragma once
+
+#include "signal.hpp"
+#include "detector.hpp"
+#include "parser.hpp"
+
+#include <filesystem>
+#include <string>
+#include <vector>
+
+namespace loglens {
+
+struct ReportData {
+    std::filesystem::path input_path;
+    ParseMetadata parse_metadata;
+    ParserQualityMetrics parser_quality;
+    std::vector events;
+    std::vector findings;
+    std::vector warnings;
+    AuthSignalConfig auth_signal_mappings;
+};
+
+std::string render_markdown_report(const ReportData& data);
+std::string render_json_report(const ReportData& data);
+std::string render_findings_csv(const ReportData& data);
+std::string render_warnings_csv(const ReportData& data);
+void write_reports(const ReportData& data, const std::filesystem::path& output_directory, bool emit_csv = false);
+
+} // namespace loglens
\ No newline at end of file
diff --git a/tests/fixtures/report_contracts/multi_host_syslog_legacy/findings.csv b/tests/fixtures/report_contracts/multi_host_syslog_legacy/findings.csv
new file mode 100644
index 0000000..1a5f6c8
--- /dev/null
+++ b/tests/fixtures/report_contracts/multi_host_syslog_legacy/findings.csv
@@ -0,0 +1 @@
+rule,subject_kind,subject,event_count,window_start,window_end,usernames,summary
+brute_force,source_ip,203.0.113.10,5,2026-03-11 09:00:00,2026-03-11 09:04:05,,5 failed SSH attempts from 203.0.113.10 within 10 minutes.
+multi_user_probing,source_ip,203.0.113.10,5,2026-03-11 09:00:00,2026-03-11 09:04:05,admin;deploy;guest;root;test,203.0.113.10 targeted 5 usernames within 15 minutes.
+sudo_burst,username,alice,3,2026-03-11 09:11:00,2026-03-11 09:14:15,,alice ran 3 sudo commands within 5 minutes.
\ No newline at end of file
diff --git a/tests/fixtures/report_contracts/multi_host_syslog_legacy/warnings.csv b/tests/fixtures/report_contracts/multi_host_syslog_legacy/warnings.csv
new file mode 100644
index 0000000..93ee2a3
--- /dev/null
+++ b/tests/fixtures/report_contracts/multi_host_syslog_legacy/warnings.csv
@@ -0,0 +1 @@
+kind,message
+parse_warning,unrecognized auth pattern: pam_sss_unknown_user
+parse_warning,unrecognized auth pattern: sshd_connection_closed_preauth
+parse_warning,unrecognized auth pattern: sshd_timeout_or_disconnection
\ No newline at end of file
diff --git a/tests/fixtures/report_contracts/syslog_legacy/findings.csv b/tests/fixtures/report_contracts/syslog_legacy/findings.csv
new file mode 100644
index 0000000..a0b0760
--- /dev/null
+++ b/tests/fixtures/report_contracts/syslog_legacy/findings.csv
@@ -0,0 +1 @@
+rule,subject_kind,subject,event_count,window_start,window_end,usernames,summary
+brute_force,source_ip,203.0.113.10,5,2026-03-10 08:11:22,2026-03-10 08:18:05,,5 failed SSH attempts from 203.0.113.10 within 10 minutes.
+multi_user_probing,source_ip,203.0.113.10,5,2026-03-10 08:11:22,2026-03-10 08:18:05,admin;deploy;guest;root;test,203.0.113.10 targeted 5 usernames within 15 minutes.
+sudo_burst,username,alice,3,2026-03-10 08:21:00,2026-03-10 08:24:15,,alice ran 3 sudo commands within 5 minutes.
\ No newline at end of file
diff --git a/tests/fixtures/report_contracts/syslog_legacy/warnings.csv b/tests/fixtures/report_contracts/syslog_legacy/warnings.csv
new file mode 100644
index 0000000..8774405
--- /dev/null
+++ b/tests/fixtures/report_contracts/syslog_legacy/warnings.csv
@@ -0,0 +1 @@
+kind,message
+parse_warning,unrecognized auth pattern: sshd_connection_closed_preauth
+parse_warning,unrecognized auth pattern: sshd_timeout_or_disconnection
\ No newline at end of file
diff --git a/tests/test_cli.cpp b/tests/test_cli.cpp
index 0060f42..1bc5c67 100644
--- a/tests/test_cli.cpp
+++ b/tests/test_cli.cpp
@@ -1,198 +1 @@
-#include <cstdlib>
-#include <filesystem>
-#include <fstream>
-#include <stdexcept>
-#include <string>
-
-namespace {
-
-void expect(bool condition, const std::string& message) {
-    if (!condition) {
-        throw std::runtime_error(message);
-    }
-}
-
-std::string read_file(const std::filesystem::path& path) {
-    std::ifstream input(path);
-    if (!input) {
-        throw std::runtime_error("unable to read file: " + path.string());
-    }
-
-    return std::string((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
-}
-
-std::string quote_argument(const std::filesystem::path& path) {
-    return "\"" + path.string() + "\"";
-}
-
-std::string build_command(std::string invocation,
-                          const std::filesystem::path* stdout_path = nullptr,
-                          const std::filesystem::path* stderr_path = nullptr) {
-    if (stdout_path != nullptr) {
-        invocation += " 1>" + quote_argument(*stdout_path);
-    }
-    if (stderr_path != nullptr) {
-        invocation += " 2>" + quote_argument(*stderr_path);
-    }
-
-#ifdef _WIN32
-    return "cmd /c \"" + invocation + "\"";
-#else
-    return invocation;
-#endif
-}
-
-void expect_report_core_fields(const std::string& markdown,
-                               const std::string& json,
-                               const std::string& input_mode,
-                               bool expect_assume_year,
-                               bool timezone_present) {
-    expect(markdown.find("Input mode: " + input_mode) != std::string::npos, "expected markdown input mode");
-    expect(markdown.find(std::string("Timezone present: ") + (timezone_present ? "true" : "false")) != std::string::npos,
-           "expected markdown timezone metadata");
-    if (expect_assume_year) {
-        expect(markdown.find("Assume year: 2026") != std::string::npos, "expected markdown assume year");
-        expect(json.find("\"assume_year\": 2026") != std::string::npos, "expected json assume_year");
-    } else {
-        expect(markdown.find("Assume year:") == std::string::npos, "did not expect markdown assume year");
-        expect(json.find("\"assume_year\":") == std::string::npos, "did not expect json assume_year");
-    }
-
-    expect(markdown.find("Total lines: 16") != std::string::npos, "expected markdown total line count");
-    expect(markdown.find("Parsed lines: 14") != std::string::npos, "expected markdown parsed line count");
-    expect(markdown.find("Unparsed lines: 2") != std::string::npos, "expected markdown unparsed line count");
-    expect(markdown.find("Parse success rate: 87.50%") != std::string::npos, "expected markdown parse success rate");
-    expect(markdown.find("Parsed events: 14") != std::string::npos, "expected markdown parsed event count");
-    expect(markdown.find("Findings: 3") != std::string::npos, "expected markdown finding count");
-    expect(markdown.find("Parser warnings: 2") != std::string::npos, "expected markdown warning count");
-    expect(markdown.find("| sshd_connection_closed_preauth | 1 |") != std::string::npos,
-           "expected markdown unknown connection-close pattern");
-    expect(markdown.find("| sshd_timeout_or_disconnection | 1 |") != std::string::npos,
-           "expected markdown unknown timeout pattern");
-    expect(json.find("\"total_lines\": 16") != std::string::npos, "expected json total line count");
-    expect(json.find("\"parsed_lines\": 14") != std::string::npos, "expected json parsed line count");
-    expect(json.find("\"unparsed_lines\": 2") != std::string::npos, "expected json unparsed line count");
-    expect(json.find("\"parse_success_rate\": 0.8750") != std::string::npos, "expected json parse success rate");
-    expect(json.find("\"parsed_event_count\": 14") != std::string::npos, "expected json parsed event count");
-    expect(json.find("\"finding_count\": 3") != std::string::npos, "expected json finding count");
-    expect(json.find("\"warning_count\": 2") != std::string::npos, "expected json warning count");
-    expect(json.find("\"input_mode\": \"" + input_mode + "\"") != std::string::npos, "expected json input mode");
-    expect(json.find(std::string("\"timezone_present\": ") + (timezone_present ? "true" : "false")) != std::string::npos,
-           "expected json timezone metadata");
-    expect(json.find("\"pattern\": \"sshd_connection_closed_preauth\"") != std::string::npos,
-           "expected json unknown connection-close pattern");
-    expect(json.find("\"pattern\": \"sshd_timeout_or_disconnection\"") != std::string::npos,
-           "expected json unknown timeout pattern");
-}
-
-} // namespace
-
-int main(int argc, char* argv[]) {
-    if (argc != 5) {
-        throw std::runtime_error("expected arguments: ");
-    }
-
-    const std::filesystem::path loglens_exe = std::filesystem::absolute(argv[1]);
-    const std::filesystem::path sample_log = std::filesystem::absolute(argv[2]);
-    const std::filesystem::path sample_config = std::filesystem::absolute(argv[3]);
-    const std::filesystem::path output_dir = std::filesystem::absolute(argv[4]);
-    const std::filesystem::path asset_dir = sample_log.parent_path();
-    const std::filesystem::path journalctl_log = asset_dir / "sample_journalctl_short_full.log";
-
-    std::filesystem::remove_all(output_dir);
-    std::filesystem::create_directories(output_dir);
-
-    const auto syslog_cli_out = output_dir / "syslog_cli";
-    std::filesystem::create_directories(syslog_cli_out);
-    const int syslog_cli_exit = std::system(build_command(
-        quote_argument(loglens_exe)
-        + " --mode syslog --year 2026 "
-        + quote_argument(sample_log)
-        + " " + quote_argument(syslog_cli_out))
-        .c_str());
-    expect(syslog_cli_exit == 0, "expected syslog CLI run with --year to succeed");
-
-    const auto syslog_markdown = read_file(syslog_cli_out / "report.md");
-    const auto syslog_json = read_file(syslog_cli_out / "report.json");
-    expect_report_core_fields(syslog_markdown, syslog_json, "syslog_legacy", true, false);
-
-    const auto config_run_out = output_dir / "config_run";
-    std::filesystem::create_directories(config_run_out);
-    const int config_run_exit = std::system(build_command(
-        quote_argument(loglens_exe)
-        + " --config " + quote_argument(sample_config)
-        + " " + quote_argument(sample_log)
-        + " " + quote_argument(config_run_out))
-        .c_str());
-    expect(config_run_exit == 0, "expected sample config run to succeed");
-
-    const auto journalctl_out = output_dir / "journalctl_cli";
-    std::filesystem::create_directories(journalctl_out);
-    const int journalctl_exit = std::system(build_command(
-        quote_argument(loglens_exe)
-        + " --mode journalctl-short-full "
-        + quote_argument(journalctl_log)
-        + " " + quote_argument(journalctl_out))
-        .c_str());
-    expect(journalctl_exit == 0, "expected journalctl short-full CLI run to succeed");
-
-    const auto journalctl_markdown = read_file(journalctl_out / "report.md");
-    const auto journalctl_json = read_file(journalctl_out / "report.json");
-    expect_report_core_fields(journalctl_markdown, journalctl_json, "journalctl_short_full", false, true);
-
-    const auto missing_year_out = output_dir / "missing_year";
-    std::filesystem::create_directories(missing_year_out);
-    const auto missing_year_stdout = output_dir / "missing_year_stdout.txt";
-    const auto missing_year_stderr = output_dir / "missing_year_stderr.txt";
-    const int missing_year_exit = std::system(build_command(
-        quote_argument(loglens_exe)
-        + " --mode syslog "
-        + quote_argument(sample_log)
-        + " " + quote_argument(missing_year_out),
-        &missing_year_stdout,
-        &missing_year_stderr)
-        .c_str());
-    expect(missing_year_exit != 0, "expected syslog mode without year to fail");
-    const auto missing_year_error = read_file(missing_year_stderr);
-    expect(missing_year_error.find("--year") != std::string::npos
-           || missing_year_error.find("assume_year") != std::string::npos,
-           "expected missing-year error to mention year requirements");
-
-    const auto invalid_config = output_dir / "invalid_config.json";
-    {
-        std::ofstream output(invalid_config);
-        output << "{\n"
-               << "  \"input_mode\": \"syslog_legacy\",\n"
-               << "  \"timestamp\": { \"assume_year\": \"bad\" },\n"
-               << "  \"brute_force\": { \"threshold\": 5, \"window_minutes\": 10 },\n"
-               << "  \"multi_user_probing\": { \"threshold\": 3, \"window_minutes\": 15 },\n"
-               << "  \"sudo_burst\": { \"threshold\": 3, \"window_minutes\": 5 },\n"
-               << "  \"auth_signal_mappings\": {\n"
-               << "    \"ssh_failed_password\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": true },\n"
-               << "    \"ssh_invalid_user\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": true },\n"
-               << "    \"ssh_failed_publickey\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": true },\n"
-               << "    \"pam_auth_failure\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": false }\n"
-               << "  }\n"
-               << "}\n";
-    }
-
-    const auto invalid_out = output_dir / "invalid_config_run";
-    std::filesystem::create_directories(invalid_out);
-    const auto invalid_stdout = output_dir / "invalid_stdout.txt";
-    const auto invalid_stderr = output_dir / "invalid_stderr.txt";
-    const int invalid_exit = std::system(build_command(
-        quote_argument(loglens_exe)
-        + " --config " + quote_argument(invalid_config)
-        + " " + quote_argument(sample_log)
-        + " " + quote_argument(invalid_out),
-        &invalid_stdout,
-        &invalid_stderr)
-        .c_str());
-    expect(invalid_exit != 0, "expected invalid config CLI run to fail");
-
-    const auto invalid_error = read_file(invalid_stderr);
-    expect(invalid_error.find("assume_year") != std::string::npos,
-           "expected CLI error output to mention the failing config field");
-
-    return 0;
-}
+#include <cstdlib>
+#include <filesystem>
+#include <fstream>
+#include <stdexcept>
+#include <string>
+
+namespace {
+
+void expect(bool condition, const std::string& message) {
+    if (!condition) {
+        throw std::runtime_error(message);
+    }
+}
+
+std::string read_file(const std::filesystem::path& path) {
+    std::ifstream input(path);
+    if (!input) {
+        throw std::runtime_error("unable to read file: " + path.string());
+    }
+
+    return std::string((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
+}
+
+std::string quote_argument(const std::filesystem::path& path) {
+    return "\"" + path.string() + "\"";
+}
+
+std::string build_command(std::string invocation,
+                          const std::filesystem::path* stdout_path = nullptr,
+                          const std::filesystem::path* stderr_path = nullptr) {
+    if (stdout_path != nullptr) {
+        invocation += " 1>" + quote_argument(*stdout_path);
+    }
+    if (stderr_path != nullptr) {
+        invocation += " 2>" + quote_argument(*stderr_path);
+    }
+
+#ifdef _WIN32
+    return "cmd /c \"" + invocation + "\"";
+#else
+    return invocation;
+#endif
+}
+
+void expect_report_core_fields(const std::string& markdown,
+                               const std::string& json,
+                               const std::string& input_mode,
+                               bool expect_assume_year,
+                               bool timezone_present) {
+    expect(markdown.find("Input mode: " + input_mode) != std::string::npos, "expected markdown input mode");
+    expect(markdown.find(std::string("Timezone present: ") + (timezone_present ? "true" : "false")) != std::string::npos,
+           "expected markdown timezone metadata");
+    if (expect_assume_year) {
+        expect(markdown.find("Assume year: 2026") != std::string::npos, "expected markdown assume year");
+        expect(json.find("\"assume_year\": 2026") != std::string::npos, "expected json assume_year");
+    } else {
+        expect(markdown.find("Assume year:") == std::string::npos, "did not expect markdown assume year");
+        expect(json.find("\"assume_year\":") == std::string::npos, "did not expect json assume_year");
+    }
+
+    expect(markdown.find("Total lines: 16") != std::string::npos, "expected markdown total line count");
+    expect(markdown.find("Parsed lines: 14") != std::string::npos, "expected markdown parsed line count");
+    expect(markdown.find("Unparsed lines: 2") != std::string::npos, "expected markdown unparsed line count");
+    expect(markdown.find("Parse success rate: 87.50%") != std::string::npos, "expected markdown parse success rate");
+    expect(markdown.find("Parsed events: 14") != std::string::npos, "expected markdown parsed event count");
+    expect(markdown.find("Findings: 3") != std::string::npos, "expected markdown finding count");
+    expect(markdown.find("Parser warnings: 2") != std::string::npos, "expected markdown warning count");
+    expect(markdown.find("| sshd_connection_closed_preauth | 1 |") != std::string::npos,
+           "expected markdown unknown connection-close pattern");
+    expect(markdown.find("| sshd_timeout_or_disconnection | 1 |") != std::string::npos,
+           "expected markdown unknown timeout pattern");
+    expect(json.find("\"total_lines\": 16") != std::string::npos, "expected json total line count");
+    expect(json.find("\"parsed_lines\": 14") != std::string::npos, "expected json parsed line count");
+    expect(json.find("\"unparsed_lines\": 2") != std::string::npos, "expected json unparsed line count");
+    expect(json.find("\"parse_success_rate\": 0.8750") != std::string::npos, "expected json parse success rate");
+    expect(json.find("\"parsed_event_count\": 14") != std::string::npos, "expected json parsed event count");
+    expect(json.find("\"finding_count\": 3") != std::string::npos, "expected json finding count");
+    expect(json.find("\"warning_count\": 2") != std::string::npos, "expected json warning count");
+    expect(json.find("\"input_mode\": \"" + input_mode + "\"") != std::string::npos, "expected json input mode");
+    expect(json.find(std::string("\"timezone_present\": ") + (timezone_present ? "true" : "false")) != std::string::npos,
+           "expected json timezone metadata");
+    expect(json.find("\"pattern\": \"sshd_connection_closed_preauth\"") != std::string::npos,
+           "expected json unknown connection-close pattern");
+    expect(json.find("\"pattern\": \"sshd_timeout_or_disconnection\"") != std::string::npos,
+           "expected json unknown timeout pattern");
+}
+
+} // namespace
+
+int main(int argc, char* argv[]) {
+    if (argc != 5) {
+        throw std::runtime_error("expected arguments: ");
+    }
+
+    const std::filesystem::path loglens_exe = std::filesystem::absolute(argv[1]);
+    const std::filesystem::path sample_log = std::filesystem::absolute(argv[2]);
+    const std::filesystem::path sample_config = std::filesystem::absolute(argv[3]);
+    const std::filesystem::path output_dir = std::filesystem::absolute(argv[4]);
+    const std::filesystem::path asset_dir = sample_log.parent_path();
+    const std::filesystem::path journalctl_log = asset_dir / "sample_journalctl_short_full.log";
+
+    std::filesystem::remove_all(output_dir);
+    std::filesystem::create_directories(output_dir);
+
+    const auto syslog_cli_out = output_dir / "syslog_cli";
+    std::filesystem::create_directories(syslog_cli_out);
+    const int syslog_cli_exit = std::system(build_command(
+        quote_argument(loglens_exe)
+        + " --mode syslog --year 2026 "
+        + quote_argument(sample_log)
+        + " " + quote_argument(syslog_cli_out))
+        .c_str());
+    expect(syslog_cli_exit == 0, "expected syslog CLI run with --year to succeed");
+
+    const auto syslog_markdown = read_file(syslog_cli_out / "report.md");
+    const auto syslog_json = read_file(syslog_cli_out / "report.json");
+    expect_report_core_fields(syslog_markdown, syslog_json, "syslog_legacy", true, false);
+    expect(!std::filesystem::exists(syslog_cli_out / "findings.csv"),
+           "did not expect findings.csv without explicit csv flag");
+    expect(!std::filesystem::exists(syslog_cli_out / "warnings.csv"),
+           "did not expect warnings.csv without explicit csv flag");
+
+    const auto csv_out = output_dir / "csv_run";
+    std::filesystem::create_directories(csv_out);
+    const int csv_exit = std::system(build_command(
+        quote_argument(loglens_exe)
+        + " --mode syslog --year 2026 --csv "
+        + quote_argument(sample_log)
+        + " " + quote_argument(csv_out))
+        .c_str());
+    expect(csv_exit == 0, "expected syslog CSV CLI run to succeed");
+
+    const auto findings_csv = read_file(csv_out / "findings.csv");
+    const auto warnings_csv = read_file(csv_out / "warnings.csv");
+    expect(findings_csv.find("rule,subject_kind,subject,event_count,window_start,window_end,usernames,summary") == 0,
+           "expected findings csv header");
+    expect(findings_csv.find("brute_force,source_ip,203.0.113.10,5,2026-03-10 08:11:22,2026-03-10 08:18:05,,5 failed SSH attempts from 203.0.113.10 within 10 minutes.") != std::string::npos,
+           "expected brute-force findings csv row");
+    expect(warnings_csv.find("kind,message") == 0, "expected warnings csv header");
+    expect(warnings_csv.find("parse_warning,unrecognized auth pattern: sshd_connection_closed_preauth") != std::string::npos,
+           "expected warning csv row");
+
+    const auto config_run_out = output_dir / "config_run";
+    std::filesystem::create_directories(config_run_out);
+    const int config_run_exit = std::system(build_command(
+        quote_argument(loglens_exe)
+        + " --config " + quote_argument(sample_config)
+        + " " + quote_argument(sample_log)
+        + " " + quote_argument(config_run_out))
+        .c_str());
+    expect(config_run_exit == 0, "expected sample config run to succeed");
+
+    const auto journalctl_out = output_dir / "journalctl_cli";
+    std::filesystem::create_directories(journalctl_out);
+    const int journalctl_exit = std::system(build_command(
+        quote_argument(loglens_exe)
+        + " --mode journalctl-short-full "
+        + quote_argument(journalctl_log)
+        + " " + quote_argument(journalctl_out))
+        .c_str());
+    expect(journalctl_exit == 0, "expected journalctl short-full CLI run to succeed");
+
+    const auto journalctl_markdown = read_file(journalctl_out / "report.md");
+    const auto journalctl_json = read_file(journalctl_out / "report.json");
+    expect_report_core_fields(journalctl_markdown, journalctl_json, "journalctl_short_full", false, true);
+    expect(!std::filesystem::exists(journalctl_out / "findings.csv"),
+           "did not expect journalctl findings.csv without explicit csv flag");
+    expect(!std::filesystem::exists(journalctl_out / "warnings.csv"),
+           "did not expect journalctl warnings.csv without explicit csv flag");
+
+    const auto missing_year_out = output_dir / "missing_year";
+    std::filesystem::create_directories(missing_year_out);
+    const int missing_year_exit = std::system(build_command(
+        quote_argument(loglens_exe)
+        + " --mode syslog "
+        + quote_argument(sample_log)
+        + " " + quote_argument(missing_year_out))
+        .c_str());
+    expect(missing_year_exit != 0, "expected syslog mode without year to fail");
+
+    const auto invalid_config = output_dir / "invalid_config.json";
+    {
+        std::ofstream output(invalid_config);
+        output << "{\n"
+               << "  \"input_mode\": \"syslog_legacy\",\n"
+               << "  \"timestamp\": { \"assume_year\": \"bad\" },\n"
+               << "  \"brute_force\": { \"threshold\": 5, \"window_minutes\": 10 },\n"
+               << "  \"multi_user_probing\": { \"threshold\": 3, \"window_minutes\": 15 },\n"
+               << "  \"sudo_burst\": { \"threshold\": 3, \"window_minutes\": 5 },\n"
+               << "  \"auth_signal_mappings\": {\n"
+               << "    \"ssh_failed_password\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": true },\n"
+               << "    \"ssh_invalid_user\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": true },\n"
+               << "    \"ssh_failed_publickey\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": true },\n"
+               << "    \"pam_auth_failure\": { \"counts_as_attempt_evidence\": true, \"counts_as_terminal_auth_failure\": false }\n"
+               << "  }\n"
+               << "}\n";
+    }
+
+    const auto invalid_out = output_dir / "invalid_config_run";
+    std::filesystem::create_directories(invalid_out);
+    const int invalid_exit = std::system(build_command(
+        quote_argument(loglens_exe)
+        + " --config " + quote_argument(invalid_config)
+        + " " + quote_argument(sample_log)
+        + " " + quote_argument(invalid_out))
+        .c_str());
+    expect(invalid_exit != 0, "expected invalid config CLI run to fail");
+
+    return 0;
+}
\ No newline at end of file
diff --git a/tests/test_report_contracts.cpp b/tests/test_report_contracts.cpp
index 8d8f9ce..79976a1 100644
--- a/tests/test_report_contracts.cpp
+++ b/tests/test_report_contracts.cpp
@@ -1,284 +1 @@
-#include <algorithm>
-#include <cstdlib>
-#include <filesystem>
-#include <fstream>
-#include <iterator>
-#include <stdexcept>
-#include <string>
-#include <string_view>
-#include <vector>
-
-namespace {
-
-void expect(bool condition, const std::string& message) {
-    if (!condition) {
-        throw std::runtime_error(message);
-    }
-}
-
-std::filesystem::path repo_root() {
-    const std::filesystem::path source_path{__FILE__};
-    std::vector<std::filesystem::path> candidates;
-
-    if (source_path.is_absolute()) {
-        candidates.push_back(source_path);
-    } else {
-        const auto cwd = std::filesystem::current_path();
-        candidates.push_back(cwd / source_path);
-        candidates.push_back(cwd.parent_path() / source_path);
-    }
-
-    for (const auto& candidate : candidates) {
-        if (std::filesystem::exists(candidate)) {
-            return candidate.parent_path().parent_path();
-        }
-    }
-
-    throw std::runtime_error("unable to resolve repository root from test source path");
-}
-
-std::string read_file(const std::filesystem::path& path) {
-    std::ifstream input(path);
-    if (!input) {
-        throw std::runtime_error("unable to read file: " + path.string());
-    }
-
-    return std::string((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
-}
-
-std::string normalize_line_endings(std::string value) {
-    value.erase(std::remove(value.begin(), value.end(), '\r'), value.end());
-    return value;
-}
-
-std::vector<std::string> split_lines(const std::string& content) {
-    std::vector<std::string> lines;
-    std::string current;
-
-    for (const char ch : normalize_line_endings(content)) {
-        if (ch == '\n') {
-            lines.push_back(current);
-            current.clear();
-        } else {
-            current += ch;
-        }
-    }
-
-    if (!current.empty()) {
-        lines.push_back(current);
-    }
-
-    return lines;
-}
-
-std::string trim(std::string_view value) {
-    std::size_t start = 0;
-    while (start < value.size() && (value[start] == ' ' || value[start] == '\t')) {
-        ++start;
-    }
-
-    std::size_t end = value.size();
-    while (end > start && (value[end - 1] == ' ' || value[end - 1] == '\t')) {
-        --end;
-    }
-
-    return std::string(value.substr(start, end - start));
-}
-
-bool starts_with(std::string_view value, std::string_view prefix) {
-    return value.size() >= prefix.size() && value.substr(0, prefix.size()) == prefix;
-}
-
-bool is_markdown_separator_row(std::string_view line) {
-    return starts_with(line, "| ---");
-}
-
-std::vector<std::string> extract_markdown_contract_lines(const std::string& markdown) {
-    std::vector<std::string> contract_lines;
-
-    for (const auto& raw_line : split_lines(markdown)) {
-        const auto line = trim(raw_line);
-        if (line.empty() || is_markdown_separator_row(line)) {
-            continue;
-        }
-
-        if (line == "# LogLens Report"
-            || starts_with(line, "## ")
-            || starts_with(line, "- Input: ")
-            || starts_with(line, "- Input mode: ")
-            || starts_with(line, "- Assume year: ")
-            || starts_with(line, "- Timezone present: ")
-            || starts_with(line, "- Total lines: ")
-            || starts_with(line, "- Parsed lines: ")
-            || starts_with(line, "- Unparsed lines: ")
-            || starts_with(line, "- Parse success rate: ")
-            || starts_with(line, "- Parsed events: ")
-            || starts_with(line, "- Findings: ")
-            || starts_with(line, "- Parser warnings: ")
-            || starts_with(line, "| ")
-            || starts_with(line, "No configured detections matched")
-            || starts_with(line, "All analyzed lines matched")
-            || starts_with(line, "No malformed lines were skipped")) {
-            contract_lines.push_back(line);
-        }
-    }
-
-    return contract_lines;
-}
-
-std::vector<std::string> extract_json_contract_lines(const std::string& json) {
-    std::vector<std::string> contract_lines;
-
-    for (const auto& raw_line : split_lines(json)) {
-        const auto line = trim(raw_line);
-        if (line.empty()) {
-            continue;
-        }
-
-        if (starts_with(line, "\"tool\": ")
-            || starts_with(line, "\"input\": ")
-            || starts_with(line, "\"input_mode\": ")
-            || starts_with(line, "\"assume_year\": ")
-            || starts_with(line, "\"timezone_present\": ")
-            || starts_with(line, "\"total_lines\": ")
-            || starts_with(line, "\"parsed_lines\": ")
-            || starts_with(line, "\"unparsed_lines\": ")
-            || starts_with(line, "\"parse_success_rate\": ")
-            || starts_with(line, "\"parsed_event_count\": ")
-            || starts_with(line, "\"warning_count\": ")
-            || starts_with(line, "\"finding_count\": ")
-            || starts_with(line, "\"host_summaries\": ")
-            || starts_with(line, "\"hostname\": ")
-            || starts_with(line, "{\"pattern\": ")
-            || starts_with(line, "{\"event_type\": ")
-            || starts_with(line, "\"rule\": ")
-            || starts_with(line, "\"subject_kind\": ")
-            || starts_with(line, "\"subject\": ")
-            || starts_with(line, "\"event_count\": ")
-            || starts_with(line, "\"window_start\": ")
-            || starts_with(line, "\"window_end\": ")
-            || starts_with(line, "\"usernames\": ")
-            || starts_with(line, "\"summary\": ")
-            || starts_with(line, "{\"line_number\": ")) {
-            contract_lines.push_back(line);
-        }
-    }
-
-    return contract_lines;
-}
-
-std::string quote_argument(std::string_view value) {
-    return "\"" + std::string(value) + "\"";
-}
-
-std::string build_command(const std::string& invocation) {
-#ifdef _WIN32
-    return "cmd /c \"" + invocation + "\"";
-#else
-    return invocation;
-#endif
-}
-
-void expect_equal_lines(const std::vector<std::string>& actual,
-                        const std::vector<std::string>& expected,
-                        const std::string& message) {
-    if (actual == expected) {
-        return;
-    }
-
-    std::string details = message + "\nexpected:\n";
-    for (const auto& line : expected) {
-        details += "  " + line + '\n';
-    }
-    details += "actual:\n";
-    for (const auto& line : actual) {
-        details += "  " + line + '\n';
-    }
-
-    throw std::runtime_error(details);
-}
-
-void run_report_contract_case(const std::filesystem::path& loglens_exe,
-                              const std::filesystem::path& fixture_directory,
-                              const std::filesystem::path& output_root,
-                              const std::string& mode_argument,
-                              const std::string& extra_arguments = {}) {
-    const auto repo = repo_root();
-    const auto relative_input = std::filesystem::relative(fixture_directory / "input.log", repo).generic_string();
-    const auto case_output = output_root / fixture_directory.filename();
-
-    std::filesystem::remove_all(case_output);
-    std::filesystem::create_directories(case_output);
-
-    std::string invocation = quote_argument(loglens_exe.generic_string())
-        + " --mode " + mode_argument;
-    if (!extra_arguments.empty()) {
-        invocation += " " + extra_arguments;
-    }
-    invocation += " " + quote_argument(relative_input)
-        + " " + quote_argument(case_output.generic_string());
-
-    const int exit_code = std::system(build_command(invocation).c_str());
-    expect(exit_code == 0, "expected report contract CLI run to succeed for " + fixture_directory.filename().string());
-
-    const auto actual_markdown = read_file(case_output / "report.md");
-    const auto actual_json = read_file(case_output / "report.json");
-    const auto golden_markdown = read_file(fixture_directory / "report.md");
-    const auto golden_json = read_file(fixture_directory / "report.json");
-
-    expect_equal_lines(
-        extract_markdown_contract_lines(actual_markdown),
-        extract_markdown_contract_lines(golden_markdown),
-        "markdown contract mismatch for " + fixture_directory.filename().string());
-    expect_equal_lines(
-        extract_json_contract_lines(actual_json),
-        extract_json_contract_lines(golden_json),
-        "json contract mismatch for " + fixture_directory.filename().string());
-}
-
-} // namespace
-
-int main(int argc, char* argv[]) {
-    if (argc != 3) {
-        throw std::runtime_error("expected arguments: ");
-    }
-
-    const auto original_cwd = std::filesystem::current_path();
-    const auto repo = repo_root();
-    std::filesystem::current_path(repo);
-
-    try {
-        const std::filesystem::path loglens_exe = std::filesystem::absolute(argv[1]);
-        const std::filesystem::path output_root = std::filesystem::absolute(argv[2]);
-        const auto fixture_root = repo / "tests" / "fixtures" / "report_contracts";
-
-        run_report_contract_case(
-            loglens_exe,
-            fixture_root / "syslog_legacy",
-            output_root,
-            "syslog",
-            "--year 2026");
-        run_report_contract_case(
-            loglens_exe,
-            fixture_root / "journalctl_short_full",
-            output_root,
-            "journalctl-short-full");
-        run_report_contract_case(
-            loglens_exe,
-            fixture_root / "multi_host_syslog_legacy",
-            output_root,
-            "syslog",
-            "--year 2026");
-        run_report_contract_case(
-            loglens_exe,
-            fixture_root / "multi_host_journalctl_short_full",
-            output_root,
-            "journalctl-short-full");
-    } catch (...) {
-        std::filesystem::current_path(original_cwd);
-        throw;
-    }
-
-    std::filesystem::current_path(original_cwd);
-    return 0;
-}
+#include <algorithm>
+#include <cstdlib>
+#include <filesystem>
+#include <fstream>
+#include <iterator>
+#include <stdexcept>
+#include <string>
+#include <string_view>
+#include <vector>
+
+namespace {
+
+void expect(bool condition, const std::string& message) {
+    if (!condition) {
+        throw std::runtime_error(message);
+    }
+}
+
+std::filesystem::path repo_root() {
+    const std::filesystem::path source_path{__FILE__};
+    std::vector<std::filesystem::path> candidates;
+
+    if (source_path.is_absolute()) {
+        candidates.push_back(source_path);
+    } else {
+        const auto cwd = std::filesystem::current_path();
+        candidates.push_back(cwd / source_path);
+        candidates.push_back(cwd.parent_path() / source_path);
+    }
+
+    for (const auto& candidate : candidates) {
+        if (std::filesystem::exists(candidate)) {
+            return candidate.parent_path().parent_path();
+        }
+    }
+
+    throw std::runtime_error("unable to resolve repository root from test source path");
+}
+
+std::string read_file(const std::filesystem::path& path) {
+    std::ifstream input(path);
+    if (!input) {
+        throw std::runtime_error("unable to read file: " + path.string());
+    }
+
+    return std::string((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
+}
+
+std::string normalize_line_endings(std::string value) {
+    value.erase(std::remove(value.begin(), value.end(), '\r'), value.end());
+    return value;
+}
+
+std::vector<std::string> split_lines(const std::string& content) {
+    std::vector<std::string> lines;
+    std::string current;
+
+    for (const char ch : normalize_line_endings(content)) {
+        if (ch == '\n') {
+            lines.push_back(current);
+            current.clear();
+        } else {
+            current += ch;
+        }
+    }
+
+    if (!current.empty()) {
+        lines.push_back(current);
+    }
+
+    return lines;
+}
+
+std::string trim(std::string_view value) {
+    std::size_t start = 0;
+    while (start < value.size() && (value[start] == ' ' || value[start] == '\t')) {
+        ++start;
+    }
+
+    std::size_t end = value.size();
+    while (end > start && (value[end - 1] == ' ' || value[end - 1] == '\t')) {
+        --end;
+    }
+
+    return std::string(value.substr(start, end - start));
+}
+
+bool starts_with(std::string_view value, std::string_view prefix) {
+    return value.size() >= prefix.size() && value.substr(0, prefix.size()) == prefix;
+}
+
+bool is_markdown_separator_row(std::string_view line) {
+    return starts_with(line, "| ---");
+}
+
+std::vector<std::string> extract_markdown_contract_lines(const std::string& markdown) {
+    std::vector<std::string> contract_lines;
+
+    for (const auto& raw_line : split_lines(markdown)) {
+        const auto line = trim(raw_line);
+        if (line.empty() || is_markdown_separator_row(line)) {
+            continue;
+        }
+
+        if (line == "# LogLens Report"
+            || starts_with(line, "## ")
+            || starts_with(line, "- Input: ")
+            || starts_with(line, "- Input mode: ")
+            || starts_with(line, "- Assume year: ")
+            || starts_with(line, "- Timezone present: ")
+            || starts_with(line, "- Total lines: ")
+            || starts_with(line, "- Parsed lines: ")
+            || starts_with(line, "- Unparsed lines: ")
+            || starts_with(line, "- Parse success rate: ")
+            || starts_with(line, "- Parsed events: ")
+            || starts_with(line, "- Findings: ")
+            || starts_with(line, "- Parser warnings: ")
+            || starts_with(line, "| ")
+            || starts_with(line, "No configured detections matched")
+            || starts_with(line, "All analyzed lines matched")
+            || starts_with(line, "No malformed lines were skipped")) {
+            contract_lines.push_back(line);
+        }
+    }
+
+    return contract_lines;
+}
+
+std::vector<std::string> extract_json_contract_lines(const std::string& json) {
+    std::vector<std::string> contract_lines;
+
+    for (const auto& raw_line : split_lines(json)) {
+        const auto line = trim(raw_line);
+        if (line.empty()) {
+            continue;
+        }
+
+        if (starts_with(line, "\"tool\": ")
+            || starts_with(line, "\"input\": ")
+            || starts_with(line, "\"input_mode\": ")
+            || starts_with(line, "\"assume_year\": ")
+            || starts_with(line, "\"timezone_present\": ")
+            || starts_with(line, "\"total_lines\": ")
+            || starts_with(line, "\"parsed_lines\": ")
+            || starts_with(line, "\"unparsed_lines\": ")
+            || starts_with(line, "\"parse_success_rate\": ")
+            || starts_with(line, "\"parsed_event_count\": ")
+            || starts_with(line, "\"warning_count\": ")
+            || starts_with(line, "\"finding_count\": ")
+            || starts_with(line, "\"host_summaries\": ")
+            || starts_with(line, "\"hostname\": ")
+            || starts_with(line, "{\"pattern\": ")
+            || starts_with(line, "{\"event_type\": ")
+            || starts_with(line, "\"rule\": ")
+            || starts_with(line, "\"subject_kind\": ")
+            || starts_with(line, "\"subject\": ")
+            || starts_with(line, "\"event_count\": ")
+            || starts_with(line, "\"window_start\": ")
+            || starts_with(line, "\"window_end\": ")
+            || starts_with(line, "\"usernames\": ")
+            || starts_with(line, "\"summary\": ")
+            || starts_with(line, "{\"line_number\": ")) {
+            contract_lines.push_back(line);
+        }
+    }
+
+    return contract_lines;
+}
+
+std::vector<std::string> extract_csv_contract_lines(const std::string& csv) {
+    std::vector<std::string> lines;
+
+    for (const auto& raw_line : split_lines(csv)) {
+        if (!raw_line.empty()) {
+            lines.push_back(raw_line);
+        }
+    }
+
+    return lines;
+}
+
+std::string quote_argument(std::string_view value) {
+    return "\"" + std::string(value) + "\"";
+}
+
+std::string build_command(const std::string& invocation) {
+#ifdef _WIN32
+    return "cmd /c \"" + invocation + "\"";
+#else
+    return invocation;
+#endif
+}
+
+void expect_equal_lines(const std::vector<std::string>& actual,
+                        const std::vector<std::string>& expected,
+                        const std::string& message) {
+    if (actual == expected) {
+        return;
+    }
std::string details = message + "\nexpected:\n"; for (const auto& line : expected) { details += " " + line + '\n'; } details += "actual:\n"; for (const auto& line : actual) { details += " " + line + '\n'; } throw std::runtime_error(details); } void run_report_contract_case(const std::filesystem::path& loglens_exe, const std::filesystem::path& fixture_directory, const std::filesystem::path& output_root, const std::string& mode_argument, const std::string& extra_arguments = {}, bool expect_csv = false) { const auto repo = repo_root(); const auto relative_input = std::filesystem::relative(fixture_directory / "input.log", repo).generic_string(); const auto case_output = output_root / fixture_directory.filename(); std::filesystem::remove_all(case_output); std::filesystem::create_directories(case_output); std::string invocation = quote_argument(loglens_exe.generic_string()) + " --mode " + mode_argument; if (!extra_arguments.empty()) { invocation += " " + extra_arguments; } invocation += " " + quote_argument(relative_input) + " " + quote_argument(case_output.generic_string()); const int exit_code = std::system(build_command(invocation).c_str()); expect(exit_code == 0, "expected report contract CLI run to succeed for " + fixture_directory.filename().string()); const auto actual_markdown = read_file(case_output / "report.md"); const auto actual_json = read_file(case_output / "report.json"); const auto golden_markdown = read_file(fixture_directory / "report.md"); const auto golden_json = read_file(fixture_directory / "report.json"); expect_equal_lines( extract_markdown_contract_lines(actual_markdown), extract_markdown_contract_lines(golden_markdown), "markdown contract mismatch for " + fixture_directory.filename().string()); expect_equal_lines( extract_json_contract_lines(actual_json), extract_json_contract_lines(golden_json), "json contract mismatch for " + fixture_directory.filename().string()); const auto golden_findings_csv = fixture_directory / "findings.csv"; const 
auto golden_warnings_csv = fixture_directory / "warnings.csv"; if (expect_csv) { expect(std::filesystem::exists(golden_findings_csv), "expected golden findings.csv for " + fixture_directory.filename().string()); expect(std::filesystem::exists(case_output / "findings.csv"), "expected findings.csv for " + fixture_directory.filename().string()); expect_equal_lines( extract_csv_contract_lines(read_file(case_output / "findings.csv")), extract_csv_contract_lines(read_file(golden_findings_csv)), "findings csv contract mismatch for " + fixture_directory.filename().string()); } else { expect(!std::filesystem::exists(case_output / "findings.csv"), "did not expect findings.csv for " + fixture_directory.filename().string()); } if (expect_csv) { expect(std::filesystem::exists(golden_warnings_csv), "expected golden warnings.csv for " + fixture_directory.filename().string()); expect(std::filesystem::exists(case_output / "warnings.csv"), "expected warnings.csv for " + fixture_directory.filename().string()); expect_equal_lines( extract_csv_contract_lines(read_file(case_output / "warnings.csv")), extract_csv_contract_lines(read_file(golden_warnings_csv)), "warnings csv contract mismatch for " + fixture_directory.filename().string()); } else { expect(!std::filesystem::exists(case_output / "warnings.csv"), "did not expect warnings.csv for " + fixture_directory.filename().string()); } } } // namespace int main(int argc, char* argv[]) { if (argc != 3) { throw std::runtime_error("expected arguments: "); } const auto original_cwd = std::filesystem::current_path(); const auto repo = repo_root(); std::filesystem::current_path(repo); try { const std::filesystem::path loglens_exe = std::filesystem::absolute(argv[1]); const std::filesystem::path output_root = std::filesystem::absolute(argv[2]); const auto fixture_root = repo / "tests" / "fixtures" / "report_contracts"; run_report_contract_case( loglens_exe, fixture_root / "syslog_legacy", output_root, "syslog", "--year 2026"); 
run_report_contract_case( loglens_exe, fixture_root / "journalctl_short_full", output_root, "journalctl-short-full"); run_report_contract_case( loglens_exe, fixture_root / "multi_host_syslog_legacy", output_root, "syslog", "--year 2026"); run_report_contract_case( loglens_exe, fixture_root / "multi_host_journalctl_short_full", output_root, "journalctl-short-full"); run_report_contract_case( loglens_exe, fixture_root / "syslog_legacy", output_root, "syslog", "--year 2026 --csv", true); run_report_contract_case( loglens_exe, fixture_root / "multi_host_syslog_legacy", output_root, "syslog", "--year 2026 --csv", true); } catch (...) { std::filesystem::current_path(original_cwd); throw; } std::filesystem::current_path(original_cwd); return 0; } \ No newline at end of file