Because sometimes you need to programmatically access manga metadata. For research purposes, obviously.
A blazingly fast™ FastAPI-based REST API that scrapes and serves manga metadata from some hentai website. Built with modern async Python, questionable regex patterns, and just enough web scraping magic to make it work without getting us sued.
This API provides a clean, RESTful interface to retrieve comprehensive manga information including metadata, tags, recommendations, and image URLs. It leverages CloudFlare bypass techniques (via curl_cffi) and BeautifulSoup for parsing, because apparently some hentai website doesn't believe in official APIs. I spent 3 days fighting CloudFlare so you don't have to.
UPDATE: The old API got deprecated (RIP). We've rebuilt it from scratch to work with some hentai website's new SvelteKit architecture. Because nothing says "fun weekend" like reverse-engineering someone else's frontend framework.
- ⚡ Async Everything: Built on FastAPI with async/await for maximum performance (and to sound impressive in interviews)
- 🔓 CloudFlare Bypass: Uses `curl_cffi` to impersonate Chrome and bypass protection (we're basically method actors)
- 📊 Rich Metadata: Extracts titles, tags, artists, characters, parodies, and more (everything except your dignity)
- 🖼️ Image URLs: Generates direct links to all pages and cover images (no more 404s at 2 AM)
- 🎲 Recommendations: Scrapes related content suggestions (the algorithm knows you better than you know yourself)
- 🧹 Clean JSON: Returns well-structured, easy-to-consume data (unlike the spaghetti code that generates it)
- Python 3.8+ (if you're still on 2.7, we need to have a talk)
- A questionable sense of humor
- No judgment
- Coffee (lots of it)
- The ability to explain this project to your parents without making eye contact
- Clone the repository (or just copy-paste like we all do when no one's watching)

  ```bash
  git clone https://github.com/indiser/doujin-api.git
  cd doujin-api
  ```

- Install dependencies (pray your Python environment isn't already broken)

  ```bash
  pip install -r requirements.txt
  ```

  If this fails, try `pip3`. If that fails, reinstall Python. If that fails, switch to JavaScript. If that fails, become a farmer.

- Run the server (the moment of truth)

  ```bash
  uvicorn main:app --reload
  ```

  Or with Gunicorn for production (look at you, being all professional):

  ```bash
  gunicorn main:app -k uvicorn.workers.UvicornWorker
  ```
The API will be available at http://localhost:8000 (assuming nothing caught fire)
Home endpoint - Returns a friendly reminder that you're supposed to use the actual endpoints.
Response:
```json
{
  "Message": "Go To The EndPoint Moron /manga_id=id_number or /docs for swagger fastapi documentation"
}
```

Note: We actually fixed the "Messege" typo in v2. Character development is real. Also added helpful directions because apparently people need hand-holding.
Retrieve manga metadata by ID.
Parameters:
- `manga_id` (int): The some hentai website gallery ID
Example Request:
```bash
curl http://localhost:8000/manga_id=177013
```

Pro tip: Don't Google that ID. Trust us. We're trying to protect you.
Example Response:
```json
{
  "id": 177013,
  "title": "Example Title",
  "date": "2023-01-15",
  "media_id": "987654",
  "parodies": ["Original Work"],
  "characters": ["Character Name"],
  "groups": ["Group Name"],
  "categories": ["Manga"],
  "language": ["English"],
  "favorites": 12345,
  "tags": ["tag1", "tag2"],
  "artists": ["Artist Name"],
  "num_pages": 225,
  "page_urls": [
    "https://i.some hentai website.net/galleries/987654/1.jpg",
    "https://i.some hentai website.net/galleries/987654/2.jpg"
  ],
  "cover_image": "https://t.some hentai website.net/galleries/987654/cover.jpg",
  "recommendations": [
    {
      "id": 123456,
      "title": "Related Title 1",
      "thumbnail_image": "https://t.some hentai website.net/galleries/123456/thumb.jpg"
    },
    {
      "id": 789012,
      "title": "Related Title 2",
      "thumbnail_image": "https://t.some hentai website.net/galleries/789012/thumb.jpg"
    }
  ]
}
```

Error Response:
```json
{
  "Error": "Error message here"
}
```

- FastAPI: Modern, fast web framework for building APIs (finally, a Python framework that doesn't make us want to cry)
- curl_cffi: CloudFlare bypass via browser impersonation (we're not hackers, we just play them in production)
- BeautifulSoup4: HTML parsing for scraping recommendations and cover images (because regex-ing HTML is a war crime)
- Uvicorn/Gunicorn: ASGI server for production deployment (fancy words for "makes the code go brrr")
- Request Handling: FastAPI receives the manga ID via path parameter (the easy part)
- Session Management: Async session with Chrome impersonation bypasses CloudFlare (the "please don't ban us" part)
- CDN Configuration: Fetches dynamic image server URLs from some hentai website's config API (because hardcoding URLs is so 2023)
- Data Extraction:
  - Regex extracts JSON from SvelteKit's `data-sveltekit-fetched` script tags (yes, we're still parsing JavaScript with regex. Still not proud)
  - Double JSON parsing because SvelteKit wraps everything like a Russian nesting doll
  - BeautifulSoup parses HTML for recommendations and cover images (the civilized approach)
- URL Generation: Uses actual CDN paths from the API with round-robin server selection (we're fancy now)
- Defensive Programming: Strict validation and error handling so the API doesn't explode when some hentai website changes their HTML again (narrator: it will)
- Response: Returns clean, structured JSON with all metadata (the part that makes us look competent)
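Put together, the extraction step looks roughly like this. This is a sketch, not the actual source: the sample HTML and the `{"status": ..., "body": ...}` payload shape are our assumptions about how SvelteKit embeds fetched data, but the double JSON parse is exactly the nesting-doll situation described above.

```python
import json
import re

# Hypothetical snippet of what a SvelteKit page embeds: the fetched API
# response is serialized as a JSON string *inside* another JSON object,
# hence the double parse.
html = """
<script type="application/json" data-sveltekit-fetched data-url="/api/gallery/177013">
{"status": 200, "body": "{\\"id\\": 177013, \\"num_pages\\": 225}"}
</script>
"""

def extract_payload(page: str) -> dict:
    # First pass: pull the JSON blob out of the script tag with regex.
    match = re.search(
        r'data-sveltekit-fetched[^>]*>\s*(\{.*?\})\s*</script>',
        page, re.DOTALL,
    )
    if match is None:
        raise ValueError("SvelteKit payload not found")
    outer = json.loads(match.group(1))  # parse the wrapper...
    return json.loads(outer["body"])    # ...then the stringified body

meta = extract_payload(html)
print(meta["id"], meta["num_pages"])  # → 177013 225
```

Yes, that regex will break the moment they change their markup. See "Known Issues."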
The API properly manages the async session lifecycle:
- Session created on startup
- Gracefully closed on shutdown
- No resource leaks (we're professionals here, unlike that one project we don't talk about)
The API lets curl_cffi handle all the fingerprinting automatically:
```python
session = AsyncSession(impersonate="chrome")
```

No custom headers needed. We learned our lesson. Let the library do the heavy lifting while we take credit.
Dynamically fetches image servers from some hentai website's config API and uses round-robin distribution:
```python
image_pool = itertools.cycle(image_servers)
```

Because load balancing is our passion. Also because we read that one blog post about itertools.cycle and wanted to use it.
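For illustration, here's how the round-robin pool plays out with a made-up server list (the real hostnames come from the site's config API):

```python
import itertools

# Made-up hostnames; the real list is fetched from the config API at startup.
image_servers = ["i1.example.net", "i2.example.net", "i3.example.net"]
image_pool = itertools.cycle(image_servers)

# Successive pages rotate through the hosts and wrap around.
urls = [f"https://{next(image_pool)}/galleries/987654/{n}.jpg" for n in range(1, 5)]
print(urls[0])  # → https://i1.example.net/galleries/987654/1.jpg
print(urls[3])  # → https://i1.example.net/galleries/987654/4.jpg (wrapped)
```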
Now uses full paths directly from the API response instead of manually constructing URLs:
- No more guessing file extensions
- No more broken image links at 2 AM
- Actual professional-looking code (we're as shocked as you are)
Because every good project needs a roadmap of features that may or may not ever get implemented (spoiler: they won't):
- Rate Limiting: Add proper rate limiting to avoid getting IP banned (again)
- Caching: Implement Redis caching for frequently requested manga (because hitting the same endpoint 1000 times is apparently a thing)
- Error Handling: More granular error responses (404s, 503s, etc.) instead of just "Error: something broke lol"
- Pagination: Support for browsing multiple manga (for the power users)
- Search Endpoint: Query by tags, artists, or titles (the feature everyone actually wants)
- Swagger Docs: Auto-generated API documentation (FastAPI makes this trivial, so we have no excuse)
- Database Integration: Store metadata locally for faster access (and to stop hammering their servers)
- Proxy Rotation: Distribute requests across multiple IPs (for when one IP ban isn't enough)
- Webhook Support: Notify when new content from favorite artists drops (we're building a notification system for manga. Let that sink in.)
- Batch Requests: Retrieve multiple manga in a single API call (efficiency is our middle name. Our first name is "Procrastination")
- Image Proxy: Serve images through the API to avoid CORS issues (because CORS is the final boss of web development)
- Authentication: API keys for access control (if you're feeling fancy and want to pretend this is enterprise software)
- GraphQL Support: Because REST is so 2020 (and we want to sound cool at meetups)
- WebSocket Streaming: Real-time updates for new releases (because polling is for peasants)
- Machine Learning: Auto-tagging and content recommendations (throw AI at it until it works)
- Mobile SDK: Native libraries for iOS/Android (so you can use this API on the go, you absolute degenerate)
- Blockchain Integration: Just kidding. We're not that desperate for funding. Yet.
- World Domination: Standard startup goal (currently at 0.00001% completion)
AKA: Things we know are broken but haven't fixed yet
- No retry logic for failed requests (if at first you don't succeed, give up immediately)
- Session isn't shared across workers in multi-process deployments (each worker is a lone wolf)
- If some hentai website changes their SvelteKit structure, everything breaks (it's not a bug, it's job security)
- Probably some race conditions we haven't discovered yet (they're like Easter eggs, but worse)
- The API returns "SvelteKit payload not found" when CloudFlare is feeling extra protective (aka Tuesday)
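For the "no retry logic" item, a fix could be as small as this sketch: an exponential-backoff wrapper that isn't in the codebase yet. `fetch` stands in for whatever coroutine actually performs the request.

```python
import asyncio

# Retry a fetch coroutine with exponential backoff: wait base_delay,
# then 2x, then 4x... and re-raise after the final failed attempt.
async def fetch_with_retry(fetch, url, retries=3, base_delay=0.5):
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            await asyncio.sleep(base_delay * 2 ** attempt)
```

PRs wiring something like this into the session layer are, of course, welcome.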
Contributions are welcome! Whether it's fixing typos (seriously, please do), adding features, or improving documentation, feel free to submit a PR. We promise to review it within 3-5 business days (or months, who's counting?).
```bash
# Install dev dependencies
pip install -r requirements.txt

# Run with auto-reload (watch your code break in real-time!)
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Contribution Guidelines:
- Write tests (we don't, but you should)
- Follow PEP 8 (we try, we really do)
- Comment your code (unlike us)
- Don't judge our regex patterns too harshly
This project is for educational purposes only. Web scraping may violate the terms of service of the target website. Use responsibly and at your own risk. The authors are not responsible for any misuse of this software.
Translation: Don't blame us if you get banned. We're just humble developers who built a thing. What you do with it is between you and your ISP.
This project is provided as-is with no license specified. Use it, modify it, sell it to venture capitalists for millions (if you do, remember us). We don't care. We're too busy debugging production issues at 3 AM.
- FastAPI: For making Python web development not painful (finally)
- curl_cffi: For solving the CloudFlare problem we didn't want to deal with (you're the real MVP)
- BeautifulSoup: Still the GOAT of HTML parsing after all these years (we're not worthy)
- some hentai website: For migrating to SvelteKit and breaking our API, forcing us to rebuild everything (thanks for the character development)
- SvelteKit: For the nested JSON structure that made us question our life choices
- Stack Overflow: For the regex pattern we definitely didn't copy-paste at 2 AM (this time it was 3 AM)
- Coffee: The real dependency that should be in requirements.txt
- Our Therapist: For listening to us complain about CloudFlare for 3 hours straight
Built with 💻, questionable life choices, and an unhealthy amount of caffeine
If you found this useful, consider starring the repo. Or don't. We're not your mom.
Remember: This project exists because someone, somewhere, thought "I could automate this" instead of just using the website like a normal person.
β Star this repo if you've ever explained a side project to someone and watched their face slowly transition from interest to concern β