[Feat]: Write crawling code to extract tags per song (#173) #176
Merged
3 commits
@@ -0,0 +1,43 @@

```yaml
name: Tagging Songs

on:
  schedule:
    - cron: "0 14 * * *" # Runs at 23:00 KST (UTC+9 → 14:00 UTC)
  workflow_dispatch:

permissions:
  contents: write # Required for push permission

jobs:
  run-npm-task:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout branch
        uses: actions/checkout@v4

      - name: Use Node.js 20
        uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 9
          run_install: false

      - name: Install dependencies
        working-directory: packages/crawling
        run: pnpm install

      - name: Create .env file
        working-directory: packages/crawling
        run: |
          echo "SUPABASE_URL=${{ secrets.SUPABASE_URL }}" >> .env
          echo "SUPABASE_KEY=${{ secrets.SUPABASE_KEY }}" >> .env
          echo "OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> .env

      - name: run tagging script - taggingSongs.ts
        working-directory: packages/crawling
        run: pnpm run tag-songs
```
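As the workflow comment notes, the `schedule` cron expression is written in UTC, so a Korea Standard Time (UTC+9) target has to be shifted back nine hours, wrapping to the previous day when the result goes negative. A small sketch of that arithmetic, assuming whole-hour schedules; `kstHourToUtc` and `dailyCronForKst` are hypothetical helpers, not part of this PR:

```typescript
// Hypothetical helpers: convert a KST (UTC+9) wall-clock hour into the UTC
// hour used by a cron expression.
const KST_OFFSET_HOURS = 9;

interface UtcCronTime {
  utcHour: number;
  previousDay: boolean; // true when the UTC run falls on the day before the KST date
}

function kstHourToUtc(kstHour: number): UtcCronTime {
  if (!Number.isInteger(kstHour) || kstHour < 0 || kstHour > 23) {
    throw new RangeError(`invalid hour: ${kstHour}`);
  }
  const shifted = kstHour - KST_OFFSET_HOURS;
  return {
    utcHour: (shifted + 24) % 24, // wrap negative hours into 0..23
    previousDay: shifted < 0,
  };
}

// Build a daily cron expression ("minute hour * * *") for a KST time.
function dailyCronForKst(kstHour: number, minute = 0): string {
  const { utcHour } = kstHourToUtc(kstHour);
  return `${minute} ${utcHour} * * *`;
}

console.log(dailyCronForKst(23)); // "0 14 * * *"
```

For a 05:00 KST run the same arithmetic gives `0 20 * * *`, which fires at 20:00 UTC on the previous calendar day.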
@@ -1,4 +1,4 @@

```diff
 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
-<url><loc>https://www.singcode.kr</loc><lastmod>2026-03-25T14:32:28.966Z</lastmod><changefreq>weekly</changefreq><priority>0.7</priority></url>
+<url><loc>https://www.singcode.kr</loc><lastmod>2026-03-27T14:29:45.638Z</lastmod><changefreq>weekly</changefreq><priority>0.7</priority></url>
 </urlset>
```
File renamed without changes.
@@ -0,0 +1,59 @@

```ts
import { getSongTagSongIdsDB, getSongsAllDB } from '@/supabase/getDB';
import { postSongTagsDB } from '@/supabase/postDB';
import { autoTagSong } from '@/utils/getSongTag';

const resultsLog = {
  success: 0,
  failed: 0,
  skipped: 0,
};

// 1. Fetch all songs and load the IDs of songs that are already tagged
const [allSongs, taggedSongIds] = await Promise.all([getSongsAllDB(), getSongTagSongIdsDB()]);

console.log('Total songs:', allSongs.length);
console.log('Already tagged:', taggedSongIds.size);

// 2. Iterate sequentially, tagging at most 5000 songs per run
let processedCount = 0;
for (const song of allSongs) {
  if (processedCount >= 5000) break;
  if (taggedSongIds.has(song.id)) {
    resultsLog.skipped++;
    continue;
  }

  try {
    const tagIds = await autoTagSong(song.title, song.artist);

    if (tagIds.length === 0) {
      // No tags returned: count as a failure, but still fall through to the
      // counter and delay below (a `continue` here would skip both)
      resultsLog.failed++;
      console.log(`[FAIL] ${song.title} - ${song.artist}: no tags`);
    } else if (await postSongTagsDB(song.id, tagIds)) {
      resultsLog.success++;
      console.log(`[OK] ${song.title} - ${song.artist}: [${tagIds.join(', ')}]`);
    } else {
      resultsLog.failed++;
    }
  } catch (error) {
    resultsLog.failed++;
    console.error(`[ERROR] ${song.title} - ${song.artist}:`, error);
  }

  processedCount++;

  // Delay between requests to guard against OpenAI rate limits
  await new Promise(resolve => setTimeout(resolve, 200));
}

// 3. Print a summary
console.log(`
Out of ${allSongs.length} songs:
- Skipped (already tagged): ${resultsLog.skipped}
- Succeeded: ${resultsLog.success}
- Failed: ${resultsLog.failed}
`);
```
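The script skips already-tagged songs, caps the number of songs it processes at 5000, and pauses 200 ms between OpenAI calls. A minimal sketch of the same control flow with the database and OpenAI calls injected as functions, so the counting rules can be tested without Supabase or OpenAI. All names here (`runTaggingLoop`, `TaggingDeps`, `sleep`) are hypothetical, and the sketch counts every attempted song, including one that returns no tags, before pausing:

```typescript
// Hypothetical, dependency-injected version of the tagging loop.
interface Song {
  id: string;
  title: string;
  artist: string;
}

interface TaggingDeps {
  tagSong: (song: Song) => Promise<number[]>; // stands in for autoTagSong
  saveTags: (songId: string, tagIds: number[]) => Promise<boolean>; // stands in for postSongTagsDB
  sleep: (ms: number) => Promise<void>;
}

interface TaggingResult {
  success: number;
  failed: number;
  skipped: number;
  processed: number;
}

async function runTaggingLoop(
  songs: Song[],
  taggedIds: Set<string>,
  deps: TaggingDeps,
  maxProcessed = 5000,
  delayMs = 200,
): Promise<TaggingResult> {
  const result: TaggingResult = { success: 0, failed: 0, skipped: 0, processed: 0 };

  for (const song of songs) {
    if (result.processed >= maxProcessed) break;
    if (taggedIds.has(song.id)) {
      result.skipped++;
      continue; // already tagged: no API call and no delay
    }

    try {
      const tagIds = await deps.tagSong(song);
      if (tagIds.length > 0 && (await deps.saveTags(song.id, tagIds))) {
        result.success++;
      } else {
        result.failed++; // no tags returned, or the insert failed
      }
    } catch {
      result.failed++;
    }

    // Count every attempt and always pause, whichever branch ran above.
    result.processed++;
    await deps.sleep(delayMs);
  }

  return result;
}
```

Passing `ms => new Promise(resolve => setTimeout(resolve, ms))` as `sleep`, plus thin wrappers around `autoTagSong` and `postSongTagsDB`, would connect it to the real dependencies.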
@@ -84,3 +84,27 @@ export async function getVerifyKySongsDB(): Promise<Set<string>> {

```ts
  return new Set(data.map(row => row.id));
}

export async function getSongsAllDB(max: number = 50000) {
  const supabase = getClient();

  const { data, error } = await supabase
    .from('songs')
    .select('id, title, artist')
    .order('created_at', { ascending: false })
    .limit(max);

  if (error) throw error;

  return data;
}

export async function getSongTagSongIdsDB(): Promise<Set<string>> {
  const supabase = getClient();

  const { data, error } = await supabase.from('song_tags').select('song_id').limit(50000);

  if (error) throw error;

  return new Set(data.map(row => row.song_id));
}
```
Comment on lines +102 to +110

2. Incomplete tagged-song preload: `getSongTagSongIdsDB()` loads only 50,000 rows from `song_tags` (a many-to-many mapping), so `taggingSongs.ts` can miss many already-tagged songs, re-run OpenAI for them, and attempt duplicate inserts.
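One way to address this review point is to page through `song_tags` instead of relying on a single `.limit(50000)` request; Supabase's API settings also impose a per-request max-rows cap (1,000 by default), which a single large `.limit()` does not lift. A minimal sketch, assuming each page is fetched with the supabase-js `.range(from, to)` modifier (inclusive bounds). `fetchAllPages` and `getTaggedSongIds` are hypothetical names, and the page fetcher is injected so the helper can be tested without a database:

```typescript
// Hypothetical pagination helper: requests fixed-size pages until one comes
// back shorter than requested, then returns all rows.
type PageResult<T> = { data: T[] | null; error: unknown };
type PageFetcher<T> = (from: number, to: number) => PromiseLike<PageResult<T>>;

async function fetchAllPages<T>(fetchPage: PageFetcher<T>, pageSize = 1000): Promise<T[]> {
  const rows: T[] = [];
  for (let from = 0; ; from += pageSize) {
    // Bounds are inclusive, matching supabase-js `.range(from, to)`.
    const { data, error } = await fetchPage(from, from + pageSize - 1);
    if (error) throw error;
    const page = data ?? [];
    rows.push(...page);
    if (page.length < pageSize) return rows; // last (possibly empty) page
  }
}

// Collect the distinct song IDs that already have at least one tag.
// song_tags holds one row per (song, tag) pair, so the Set removes repeats.
async function getTaggedSongIds(
  fetchPage: PageFetcher<{ song_id: string }>,
  pageSize = 1000,
): Promise<Set<string>> {
  const rows = await fetchAllPages(fetchPage, pageSize);
  return new Set(rows.map(row => row.song_id));
}

// Possible wiring (sketch; assumes a `tag_id` column so that
// (song_id, tag_id) gives a stable, unique sort order across pages):
// const ids = await getTaggedSongIds((from, to) =>
//   supabase.from('song_tags').select('song_id')
//     .order('song_id').order('tag_id').range(from, to));
```

The same helper could back `getSongsAllDB()`, which also relies on a single `.limit()` call.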
@@ -0,0 +1,92 @@

```ts
import OpenAI from 'openai';
import dotenv from 'dotenv';

import { getClient } from '@/supabase/getClient';

dotenv.config();

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Shape of a row in the tags table
interface Tag {
  id: number;
  name: string;
  category: string;
}

let cachedTagsPrompt: string | null = null;

/**
 * Reads the full tag list from the DB and converts it into text for the AI prompt.
 */
const getTagsForPrompt = async (): Promise<string> => {
  if (cachedTagsPrompt) return cachedTagsPrompt;

  const supabase = getClient();
  const { data: tags, error } = await supabase
    .from('tags')
    .select('id, name, category')
    .order('id');

  if (error) {
    console.error('Error fetching tags:', error);
    return '';
  }

  // Format each tag as "ID: name (category)" so the model can read it easily
  cachedTagsPrompt = tags.map((tag: Tag) => `${tag.id}: ${tag.name} (${tag.category})`).join('\n');
  return cachedTagsPrompt;
};

/**
 * Uses AI to pick suitable tag IDs for a song.
 */
export const autoTagSong = async (title: string, artist: string): Promise<number[]> => {
  try {
    // Step 1: prepare the tag list for the prompt
    const tagsPrompt = await getTagsForPrompt();
    if (!tagsPrompt) return [];

    // Step 2: call the OpenAI API
    const response = await client.chat.completions.create({
      model: 'gpt-4o-mini', // cost-effective model
      messages: [
        {
          role: 'system',
          content: `
You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.

Guidelines:
1. Select at least one tag, but no more than 4.
2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
3. If it's Japanese music, ALWAYS include 101 (J-POP).
4. Be precise. If it's from an Anime, use 302 (애니메이션).
5. Return only JSON: {"tag_ids": [number, number, ...]}

Allowed Tags List:
${tagsPrompt}
`,
        },
        {
          role: 'user',
          content: `Title: "${title}", Artist: "${artist}"`,
        },
      ],
      response_format: { type: 'json_object' },
      temperature: 0, // 0 for consistent results
      max_tokens: 50, // the reply is short, so cap the token count
    });

    const content = response.choices[0].message.content;
    if (!content) return [];

    // Step 3: parse the reply; guard against JSON that lacks a tag_ids array
    const result = JSON.parse(content) as { tag_ids?: unknown };
    return Array.isArray(result.tag_ids)
      ? result.tag_ids.filter((id): id is number => Number.isInteger(id))
      : [];
  } catch (error) {
    console.error('Error auto-tagging song:', error);
    return [];
  }
};
```
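`autoTagSong` returns whatever IDs the model puts in `tag_ids`, without checking them against the tag table or the prompt's four-tag limit. A minimal sketch of stricter post-processing, assuming the caller has the set of valid tag IDs (for example, collected from the same `tags` rows used to build the prompt); `parseTagIds` and `MAX_TAGS` are hypothetical names:

```typescript
// Hypothetical validator for the model's reply, which the prompt asks to be
// of the form {"tag_ids": [number, ...]} with at most 4 entries.
const MAX_TAGS = 4;

function parseTagIds(content: string | null, allowedIds: ReadonlySet<number>): number[] {
  if (!content) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return []; // not valid JSON (e.g. a reply cut off by max_tokens)
  }

  if (typeof parsed !== 'object' || parsed === null) return [];
  const raw = (parsed as { tag_ids?: unknown }).tag_ids;
  if (!Array.isArray(raw)) return [];

  const result: number[] = [];
  for (const value of raw) {
    // Keep integers that exist in the tag table, without duplicates.
    if (Number.isInteger(value) && allowedIds.has(value) && !result.includes(value)) {
      result.push(value);
    }
    if (result.length === MAX_TAGS) break; // enforce the prompt's limit
  }
  return result;
}
```

Inside `getTagsForPrompt`, the allowed set could be built alongside the prompt text, e.g. `new Set(tags.map(tag => tag.id))`.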
1. `taggingSongs.ts` stops at 5000
📎 Requirement gap · ≡ Correctness