Enhance language detection: more manifest signals, recursive scan, Makefile heuristics, and tests #57
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e47b587829
cmd/context.go
Outdated
| if strings.Contains(content, "g++") || strings.Contains(content, "clang++") || strings.Contains(content, ".cpp") || strings.Contains(content, ".cc") { | ||
| addLang("cpp") | ||
| } | ||
| if strings.Contains(content, "gcc") || strings.Contains(content, "clang") || strings.Contains(content, ".c") { |
Avoid inferring C from C++-only Makefiles
The C heuristic is overly broad: strings.Contains(content, "clang") and especially strings.Contains(content, ".c") both match common C++-only tokens like clang++ and .cpp, so repositories with only C++ build rules will be mislabeled as both cpp and c. This affects language-driven behavior (e.g., context output and skill matching) whenever fallback detection relies on Makefile parsing.
Pull request overview
This PR improves the “cheap fallback” language detection used when the daemon/dep-graph isn’t available by adding more manifest-based signals, supporting monorepo layouts, scanning for source files recursively, and adding Makefile-based heuristics for C/C++.
Changes:
- Expanded manifest→language signals (Gradle Kotlin DSL, tsconfig, Podfile, Makefile) and added repo-root glob detection for C# projects.
- Switched fallback detection from top-level-only extension checks to a recursive scan using scanner.ScanFiles + NewGitIgnoreCache.
- Added applyMakefileHeuristics and new unit tests covering manifest signals and nested source detection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cmd/context.go | Reworks fallback language detection to include more manifest signals, Makefile heuristics, and recursive scanning. |
| cmd/context_test.go | Adds unit tests validating new manifest signals and subdirectory source detection. |
cmd/context.go
Outdated
| // Makefile heuristics for C/C++ projects. | ||
| if _, hasMakefile := langs["make"]; hasMakefile { | ||
| applyMakefileHeuristics(filepath.Join(root, "Makefile"), addLang) | ||
| } | ||
| for _, entry := range entries { | ||
| if entry.IsDir() { | ||
| continue | ||
| } | ||
| if lang := scanner.DetectLanguage(entry.Name()); lang != "" { | ||
| langs[lang] = true | ||
| delete(langs, "make") | ||
Makefile is added as a manifest signal for "make", but the function then unconditionally deletes langs["make"]. As a result, "make" can never be reported as a detected language, and langs["make"] is effectively just a sentinel. Either keep "make" in the final output (don’t delete it), or avoid adding it to manifests and instead check for Makefile existence directly before running heuristics (and update comments/PR description accordingly).
cmd/context.go
Outdated
| if strings.Contains(content, "g++") || strings.Contains(content, "clang++") || strings.Contains(content, ".cpp") || strings.Contains(content, ".cc") { | ||
| addLang("cpp") | ||
| } | ||
| if strings.Contains(content, "gcc") || strings.Contains(content, "clang") || strings.Contains(content, ".c") { |
applyMakefileHeuristics will often add c for C++-only Makefiles: strings.Contains(content, "clang") matches clang++, and strings.Contains(content, ".c") also matches .cpp. This can cause systematic false positives (reporting both c and cpp). Tighten these checks (e.g., token/word-boundary matching, or explicitly excluding clang++/.cpp when detecting C).
Suggested change:
| - if strings.Contains(content, "gcc") || strings.Contains(content, "clang") || strings.Contains(content, ".c") { | |
| + if strings.Contains(content, "gcc") || | |
| +     (strings.Contains(content, "clang") && !strings.Contains(content, "clang++")) || | |
| +     (strings.Contains(content, ".c") && !strings.Contains(content, ".cpp")) { | |
cmd/context.go
Outdated
| // Include subdirectory-only source files. | ||
| gitCache := scanner.NewGitIgnoreCache(root) | ||
| if files, err := scanner.ScanFiles(root, gitCache, nil, nil); err == nil { | ||
| for _, f := range files { | ||
| addLang(scanner.DetectLanguage(f.Path)) | ||
| } | ||
| } |
This adds a full recursive scanner.ScanFiles walk in the fallback path, but buildProjectContext may still call countSourceFiles(root) afterwards, which performs another full ScanFiles walk. That makes the daemon-missing path potentially do two complete directory traversals. Consider reusing the scanned file list/count from this walk (or returning both languages + fileCount from the fallback) to avoid duplicate I/O on large repos.
- Remove 'make' sentinel from manifests, check Makefile directly
- Tighten C heuristic: exclude clang++ false positive
- Cache ScanFiles result to avoid double directory walk
- Fix build error in countSourceFiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Motivation
- […] Makefile contents to avoid missing languages in non-top-level layouts.

Description
- […] detectLanguagesFromFiles to use a map[string][]string of manifest → languages and a small addLang helper to accumulate results.
- […] build.gradle.kts → kotlin, tsconfig.json → typescript, Podfile → swift, Makefile → make) and C# detection via repo-root glob for *.csproj / *.sln.
- […] packages/*/package.json, switch from top-level-only scanning to recursive scanning with scanner.ScanFiles and scanner.NewGitIgnoreCache, and translate detected file extensions into languages via scanner.DetectLanguage.
- […] applyMakefileHeuristics which reads a Makefile (up to 128KB) and heuristically adds c, cpp when it finds C/C++-related tokens.
- […] make placeholder after applying heuristics) and added missing imports (io, strings).
- […] cmd/context_test.go covering manifest signals and subdirectory source detection.

Testing
- […] go test ./cmd -run TestDetectLanguagesFromFiles and both tests passed.
- […] go test ./... and the test run completed successfully (including the new tests).