Description
Currently, the service stores user-submitted URLs as provided. If the input URL is itself a shortened link (e.g., Bitly, TinyURL), the system may create redirect chains such as:
our-short → bit.ly → destination
This causes unnecessary redirects, slower response times, and potential link breakage if the intermediate short URL expires.
The system should resolve nested short URLs to their final destination URL before storing them.
Goals
- Detect and resolve nested short URLs.
- Store only the final destination URL.
- Prevent redirect loops and excessive redirect chains.
- Improve performance and reliability of generated short links.
Proposed Implementation
- Redirect Resolution
Follow HTTP redirects until the final URL is reached.
Key requirements:
- Maximum redirect limit: 10
- Track visited URLs to prevent redirect loops.
- Prefer HEAD requests for efficiency; fall back to GET when the server does not support HEAD.
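The HEAD-then-GET preference could look roughly like the sketch below. The helper name `fetch_headers`, the `HEAD_UNSUPPORTED` set, and the choice of 405/501 as the fallback trigger are assumptions for illustration. `session` is assumed to be a `requests.Session` or any object with compatible `head()` and `get()` methods.

```python
from typing import Any

# Status codes taken here to mean "this server does not support HEAD".
# The exact set is an assumption; some servers signal it differently.
HEAD_UNSUPPORTED = {405, 501}


def fetch_headers(session: Any, url: str, timeout: float = 5.0) -> Any:
    """Return a response for `url` without following redirects.

    `session` is assumed to be a requests.Session, or any object exposing
    compatible head() and get() methods.
    """
    response = session.head(url, allow_redirects=False, timeout=timeout)
    if response.status_code in HEAD_UNSUPPORTED:
        # stream=True defers the body download; only the headers are needed.
        response = session.get(
            url, allow_redirects=False, timeout=timeout, stream=True
        )
        response.close()
    return response
```

Passing the session in, rather than calling `requests` directly, also makes the fallback easy to unit test with a stub object.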
Example logic:
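    from urllib.parse import urljoin

    import requests

    REDIRECT_CODES = (301, 302, 303, 307, 308)

    def resolve_final_url(url: str, max_redirects: int = 10) -> str:
        """Follow redirects hop by hop and return the final URL."""
        visited = set()
        # One request per hop, plus one for the final destination.
        for _ in range(max_redirects + 1):
            if url in visited:
                raise ValueError("Redirect loop detected")
            visited.add(url)
            # allow_redirects=False so each hop can be inspected.
            response = requests.head(url, allow_redirects=False, timeout=5)
            location = response.headers.get("Location")
            if response.status_code not in REDIRECT_CODES or not location:
                return url
            # Location may be relative; resolve it against the current URL.
            url = urljoin(url, location)
        raise ValueError("Too many redirects")

- URL Normalization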
Before storing URLs:
- Trim whitespace
- Ensure scheme exists ("http" or "https")
- Convert relative redirect locations to absolute URLs
Example:
example.com → https://example.com
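A minimal sketch of these normalization steps, using only the standard library. The function names `normalize_url` and `absolute_location` and the `https` default scheme are assumptions for illustration, not part of the existing codebase.

```python
import re
from urllib.parse import urljoin

# Matches an explicit "scheme://" prefix such as "https://".
_SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_url(raw: str, default_scheme: str = "https") -> str:
    """Trim whitespace and prepend a scheme when the input has none."""
    url = raw.strip()
    if not url:
        raise ValueError("Empty URL")
    if _SCHEME_PREFIX.match(url):
        return url
    # Inspect only the part before the first "/", "?" or "#".
    authority = re.split(r"[/?#]", url, maxsplit=1)[0]
    host, colon, port = authority.partition(":")
    if not colon or port.isdigit():
        # Bare "host" or "host:port": assume the default scheme.
        return f"{default_scheme}://{url}"
    # Something like "javascript:alert(1)" already names a scheme; leave it
    # unchanged so a later scheme check can reject it.
    return url


def absolute_location(current_url: str, location: str) -> str:
    """Resolve a possibly relative Location header against the current URL."""
    return urljoin(current_url, location.strip())
```

For example, `normalize_url("  example.com ")` yields `https://example.com`, and `absolute_location("https://bit.ly/abc123", "/landing")` yields `https://bit.ly/landing`.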
- Domain Validation
Allow only safe protocols:
Allowed:
http
https
Rejected:
javascript:
data:
file:
ftp:
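Restricting input to http and https blocks script and local-file payloads (javascript:, data:, file:) and keeps non-web targets such as ftp: out of the redirect path.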
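The scheme allow-list could be checked along these lines. `ALLOWED_SCHEMES` and `has_allowed_scheme` are hypothetical names. Applying the same check to every redirect target, and not only to the submitted URL, is a suggestion rather than something the issue specifies.

```python
from urllib.parse import urlsplit

# Allow-list from the issue text; everything else is rejected.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def has_allowed_scheme(url: str) -> bool:
    """Return True only for URLs whose scheme is http or https."""
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme in ALLOWED_SCHEMES
```

An allow-list is used rather than a deny-list so that schemes not named in the issue are rejected by default.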
- Redirect Loop Protection
Maintain a set of visited URLs while resolving redirects.
Example loop:
A → B
B → C
C → A
If a URL repeats during resolution, abort the process.
- Redirect Resolution Cache (Optional Enhancement)
Cache resolved URLs to reduce external HTTP requests.
Example cache entry:
bit.ly/abc123 → https://example.com/page
Suggested TTL: 24 hours
Possible storage:
- Redis
- in-memory LRU cache
- database table
Benefits:
- Faster URL submission
- Reduced external requests
- Lower latency under load
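As one possible shape for the in-memory option, the sketch below combines LRU eviction with the suggested 24-hour TTL. The class name `ResolutionCache`, the `max_entries` default, and the injectable `clock` parameter are assumptions for illustration. A Redis or database-backed cache would expose the same `get`/`put` idea with different storage.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class ResolutionCache:
    """In-memory LRU cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 60 * 60,
        max_entries: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        # Maps input URL -> (expiry timestamp, resolved URL).
        self._entries = OrderedDict()

    def get(self, short_url: str) -> Optional[str]:
        entry = self._entries.get(short_url)
        if entry is None:
            return None
        expires_at, final_url = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller resolves it again.
            del self._entries[short_url]
            return None
        self._entries.move_to_end(short_url)  # mark as recently used
        return final_url

    def put(self, short_url: str, final_url: str) -> None:
        self._entries[short_url] = (self._clock() + self._ttl, final_url)
        self._entries.move_to_end(short_url)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
```

A caller would check `cache.get(url)` before resolving and call `cache.put(url, final_url)` after a successful resolution.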
Expected Behavior
Current behavior:
short.ly/xyz → bit.ly/abc → example.com
Expected behavior after implementation:
short.ly/xyz → example.com
Benefits
- Faster redirects (single hop)
- More reliable links
- Improved analytics accuracy
- Reduced dependency on third-party shorteners
- Better spam and abuse detection
Acceptance Criteria
- Nested short URLs resolve to their final destination before storage.
- Maximum redirect limit enforced.
- Redirect loop detection implemented.
- Only "http" and "https" schemes allowed.
- Unit tests added for redirect resolution logic.
- Error handling for unreachable URLs implemented.
Test Cases
Case 1 – Nested Short URL
Input:
Stored:
Case 2 – Redirect Loop
Input chain:
A → B → C → A
Expected result:
Error: Redirect loop detected
Case 3 – Too Many Redirects
Input chain exceeds redirect limit.
Expected result:
Error: Too many redirects
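These cases can be exercised without network access by simulating redirect chains with a dictionary. The sketch below is a stand-in, not the production resolver: `resolve_chain` and the test class are hypothetical, the URLs are placeholders, and the limit is interpreted as "at most `max_redirects` redirects followed", which is an assumption.

```python
import unittest
from typing import Dict


def resolve_chain(start: str, hops: Dict[str, str], max_redirects: int = 10) -> str:
    """Resolve `start` through `hops`, a mapping of URL -> redirect target.

    URLs absent from `hops` are treated as final destinations.
    """
    visited = set()
    url = start
    # One iteration per redirect, plus one for the final destination.
    for _ in range(max_redirects + 1):
        if url in visited:
            raise ValueError("Redirect loop detected")
        visited.add(url)
        if url not in hops:
            return url
        url = hops[url]
    raise ValueError("Too many redirects")


class ResolveChainTests(unittest.TestCase):
    def test_nested_short_url(self):
        # Placeholder URLs for a single nested hop.
        hops = {"https://bit.ly/abc123": "https://example.com/page"}
        self.assertEqual(
            resolve_chain("https://bit.ly/abc123", hops),
            "https://example.com/page",
        )

    def test_redirect_loop(self):
        hops = {"A": "B", "B": "C", "C": "A"}
        with self.assertRaisesRegex(ValueError, "Redirect loop detected"):
            resolve_chain("A", hops)

    def test_too_many_redirects(self):
        # Eleven redirects: u0 -> u1 -> ... -> u11.
        hops = {f"u{i}": f"u{i + 1}" for i in range(11)}
        with self.assertRaisesRegex(ValueError, "Too many redirects"):
            resolve_chain("u0", hops)
```

Saved as a module, the tests can be run with `python -m unittest`.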
Future Enhancements
- Background job to resolve URLs asynchronously.
- Automatic detection of malicious domains.
- Deduplication (same destination → reuse existing short code).
- Domain classification and blacklist checks.