/describe
/review
Review Summary by Qodo
Add AI-based automatic song tagging with OpenAI integration
Walkthroughs
Description
• Add AI-based automatic song tagging using OpenAI GPT-4o-mini
• Create tagging cron script with Supabase integration
• Reorganize crawling scripts from crawling/ to cron/ directory
• Add new GitHub Actions workflow for daily song tagging
• Update documentation and improve workflow step names
Diagram
flowchart LR
A["Song Database"] -->|getSongsAllDB| B["Tagging Script"]
C["Tagged Songs Cache"] -->|getSongTagSongIdsDB| B
B -->|autoTagSong| D["OpenAI GPT-4o-mini"]
D -->|tag IDs| B
B -->|postSongTagsDB| E["song_tags Table"]
F["GitHub Actions"] -->|Daily Schedule| B
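The flow in the diagram can be read as a simple loop: load all songs, load the IDs that already have tags, and call the model only for the rest. The sketch below is a minimal illustration under assumptions, not the PR's actual `taggingSongs.ts`. The four dependency names are taken from the diagram's edge labels, but their signatures, the `Song` shape, and numeric tag IDs are all hypothetical.

```typescript
// Hypothetical song shape; the real type in the PR may carry more fields.
interface Song {
  id: string;
}

// Signatures below are assumptions; only the names come from the diagram.
interface TaggingDeps {
  getSongsAllDB: () => Promise<Song[]>;
  getSongTagSongIdsDB: () => Promise<Set<string>>;
  autoTagSong: (song: Song) => Promise<number[]>;
  postSongTagsDB: (songId: string, tagIds: number[]) => Promise<void>;
}

// Tags every song that is not yet in the tagged-ID set and returns how many were tagged.
async function runTagging(deps: TaggingDeps): Promise<number> {
  const [allSongs, taggedSongIds] = await Promise.all([
    deps.getSongsAllDB(),
    deps.getSongTagSongIdsDB(),
  ]);

  let taggedCount = 0;
  for (const song of allSongs) {
    // Skip songs that already have tags (or were tagged earlier in this run).
    if (taggedSongIds.has(song.id)) continue;

    const tagIds = await deps.autoTagSong(song);
    if (tagIds.length === 0) continue; // nothing to store for this song

    await deps.postSongTagsDB(song.id, tagIds);
    // Record the insert in memory so a duplicate entry is not processed twice.
    taggedSongIds.add(song.id);
    taggedCount += 1;
  }
  return taggedCount;
}
```

Passing the dependencies in as an object keeps the loop testable without a database or an OpenAI key; the real script presumably imports these functions directly.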
File Changes
1. packages/crawling/src/utils/getSongTag.ts
/improve
Code Review by Qodo
New Review Started
This review has been superseded by a new analysis
Code Review by Qodo
1. taggingSongs.ts stops at 5000
PR Description updated to latest commit (3023c13)
// 2. Sequential iteration (test: run only 5 times)
let processedCount = 0;
for (const song of allSongs) {
  if (processedCount >= 5000) break;
  if (taggedSongIds.has(song.id)) {
1. taggingSongs.ts stops at 5000 📎 Requirement gap ≡ Correctness
The new tagging cron stops iterating after processing 5000 songs, which can leave the remaining songs unprocessed when the song list exceeds that size. This violates the requirement to reliably traverse the entire song list for per-song tag extraction.
Agent Prompt
## Issue description
`packages/crawling/src/cron/taggingSongs.ts` breaks out of the main loop when `processedCount >= 5000`, which prevents processing the full song list.
## Issue Context
Compliance requires iterating through the entire song list predictably; a hard-coded cap causes unintentional partial processing as the dataset grows.
## Fix Focus Areas
- packages/crawling/src/cron/taggingSongs.ts[17-21]
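One possible way to address the hard-coded cap is to separate "which songs still need tags" from "how many to process in this run", and to make the budget optional rather than fixed. The following is a sketch under assumptions: `selectSongsToTag`, `maxPerRun`, and the `Song` shape are hypothetical names for illustration and do not appear in the PR.

```typescript
// Hypothetical song shape; only `id` is needed for the skip check.
interface Song {
  id: string;
}

/**
 * Returns the songs that still need tagging, in their original order.
 * `maxPerRun` is an assumed, optional budget: when it is omitted,
 * every untagged song is returned and nothing is silently dropped.
 */
function selectSongsToTag(
  allSongs: Song[],
  taggedSongIds: Set<string>,
  maxPerRun?: number,
): Song[] {
  const pending = allSongs.filter((song) => !taggedSongIds.has(song.id));
  return maxPerRun === undefined ? pending : pending.slice(0, maxPerRun);
}
```

If a per-run budget is still wanted (for example, to bound API cost), it could be passed explicitly from configuration; songs left over would then be picked up on the next scheduled run, since they remain absent from the tagged set.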
export async function getSongTagSongIdsDB(): Promise<Set<string>> {
  const supabase = getClient();

  const { data, error } = await supabase.from('song_tags').select('song_id').limit(50000);

  if (error) throw error;

  return new Set(data.map(row => row.song_id));
}
2. Incomplete tagged-song preload 🐞 Bug ≡ Correctness
getSongTagSongIdsDB() loads only 50,000 rows from song_tags (a many-to-many mapping), so taggingSongs.ts can miss many already-tagged songs and re-run OpenAI + attempt duplicate inserts for them.
Agent Prompt
## Issue description
`getSongTagSongIdsDB()` currently fetches only `limit(50000)` rows from `song_tags`. Because `song_tags` has multiple rows per song (1 to 4 tags per song), this limit does **not** represent “50k songs”; it represents “50k mappings”. As the table grows, many tagged songs will not be loaded into the Set and the tagging cron will re-tag them and attempt duplicate inserts.
## Issue Context
`taggingSongs.ts` uses this Set as the sole mechanism to skip already-tagged songs, so the correctness and completeness of this Set is critical.
## Fix Focus Areas
- packages/crawling/src/supabase/getDB.ts[102-110]
- packages/crawling/src/cron/taggingSongs.ts[11-24]
## Suggested fix
- Implement pagination over `song_tags` using `.range(from, to)` (or another paging strategy) until no rows are returned, adding `song_id` to a `Set`.
- Optionally order by `song_id` to make paging deterministic.
- Consider updating `taggedSongIds` in-memory after a successful insert (`taggedSongIds.add(song.id)`) to avoid any accidental reprocessing within the same run.
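The pagination idea in the suggested fix could look roughly like the sketch below. It is written against an injected page fetcher so it stays self-contained; `collectTaggedSongIds`, `FetchPage`, and the default `pageSize` are hypothetical names and values, and the commented Supabase wiring is an assumption that has not been checked against this repository.

```typescript
// Row shape assumed from the `select('song_id')` call in the snippet under review.
interface SongTagRow {
  song_id: string;
}

// Hypothetical page fetcher: returns rows in the inclusive index range [from, to].
type FetchPage = (from: number, to: number) => Promise<SongTagRow[]>;

// Reads every page until an empty one comes back, collecting distinct song IDs.
async function collectTaggedSongIds(
  fetchPage: FetchPage,
  pageSize = 1000,
): Promise<Set<string>> {
  if (pageSize < 1) throw new Error('pageSize must be at least 1');

  const ids = new Set<string>();
  let from = 0;
  while (true) {
    const rows = await fetchPage(from, from + pageSize - 1);
    if (rows.length === 0) break; // no more rows: the table has been read to the end
    for (const row of rows) ids.add(row.song_id);
    // Advance by the rows actually received, in case the server returns fewer than requested.
    from += rows.length;
  }
  return ids;
}

// Possible wiring with supabase-js (assumption, not taken from the PR):
// const fetchPage: FetchPage = async (from, to) => {
//   const { data, error } = await supabase
//     .from('song_tags').select('song_id').order('song_id').range(from, to);
//   if (error) throw error;
//   return data ?? [];
// };
```

Stopping on an empty page costs one extra request compared with stopping on a short page, but it avoids ending early when the server enforces a row cap smaller than the requested page size.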
User description
📌 PR Title
[Feat] : Write crawling code for per-song tag extraction
📌 Changes
- packages/crawling/src/utils/getSongTag.ts
- packages/crawling/src/cron/taggingSongs.ts
- packages/crawling/src/supabase/
- Move crawling/ → cron/ directory
💬 Additional notes
PR Type
Enhancement
Description
Add AI-based automatic song tagging using OpenAI GPT-4o-mini
Implement song tagging cron script with Supabase integration
Reorganize crawling scripts from crawling/ to cron/ directory
Add new GitHub Actions workflow for automated daily tagging
Update documentation and improve workflow step names
Diagram Walkthrough
File Walkthrough
4 files
- AI-powered song tag extraction utility
- Cron script for batch song tagging
- Add song and tag query functions
- Add song tags insertion function
3 files
- Update import path after directory move
- Update import path and increase limit
- Update sitemap timestamp
5 files
- New workflow for automated song tagging
- Improve workflow step name clarity
- Improve workflow step name clarity
- Improve workflow step name clarity
- Update script paths and add tag-songs command
2 files
- Document tagging pipeline and workflows
- Update crawling package description
2 files
- Reformat TypeScript type definition
- Consolidate array operations on single line
1 files