A simple Scrapy-based crawler that extracts information about users who have starred a GitHub repository.
- Make sure you have Python installed (Python 3.6+ recommended)
- Create a virtual environment:

```shell
# Navigate to the project directory
cd /path/to/leads-crawler/github_stargazers

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
```
- Install dependencies:

```shell
# With the virtual environment activated
pip install -r requirements.txt
```
```shell
# Make sure your virtual environment is activated
python run.py --repo username/repository
```

Options:

- `--repo` or `-r`: GitHub repository in the format "username/repository" (default: "langfuse/langfuse")
- `--output` or `-o`: Custom output filename without extension (optional)
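The internals of `run.py` are not shown here, but the two options above map naturally onto `argparse`. The sketch below is an illustration of that interface, not the project's actual code; the function name `parse_args` is hypothetical:

```python
import argparse


def parse_args(argv=None):
    """Parse the CLI options described above (sketch; run.py's real code may differ)."""
    parser = argparse.ArgumentParser(
        description="Crawl the stargazers of a GitHub repository"
    )
    parser.add_argument(
        "--repo", "-r",
        default="langfuse/langfuse",
        help='GitHub repository in the format "username/repository"',
    )
    parser.add_argument(
        "--output", "-o",
        default=None,
        help="Custom output filename without extension (optional)",
    )
    return parser.parse_args(argv)


# With no arguments, the default repository is used:
print(parse_args([]).repo)  # langfuse/langfuse
```

Both long and short forms behave identically, so `python run.py -r openai/openai-python -o openai_stars` is equivalent to the long-form invocation shown in the examples below.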
Examples:

```shell
# Default repository (langfuse/langfuse)
python run.py

# Custom repository
python run.py --repo openai/openai-python

# Custom repository with custom output filename
python run.py --repo openai/openai-python --output openai_stars
```

You can also run the spider directly with Scrapy:

```shell
# Make sure your virtual environment is activated
scrapy crawl stargazers -a repo_url="https://github.com/username/repository"
```

Example:

```shell
scrapy crawl stargazers -a repo_url="https://github.com/openai/openai-python"
```

Results will be saved to CSV files in the `results` directory with the following format:
- Custom filename: `custom_filename.csv`
- Default filename: `stargazers_owner_repo_YYYYMMDD_HHMMSS.csv`
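The default naming convention above can be reproduced with a small helper. This is a sketch of the filename format only, not the crawler's actual code; the function name `default_csv_name` is hypothetical:

```python
from datetime import datetime


def default_csv_name(repo, now=None):
    """Build a stargazers_owner_repo_YYYYMMDD_HHMMSS.csv name from "owner/repo"."""
    owner, repo_name = repo.split("/", 1)
    now = now or datetime.now()
    return "stargazers_{}_{}_{}.csv".format(
        owner, repo_name, now.strftime("%Y%m%d_%H%M%S")
    )


print(default_csv_name("openai/openai-python", datetime(2024, 1, 2, 3, 4, 5)))
# stargazers_openai_openai-python_20240102_030405.csv
```

Passing `--output` bypasses this scheme entirely and uses the custom name with a `.csv` extension.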