❇️ [RTY-260037]: Improve Short URL Handling by Resolving Nested Redirects #37

@recursivezero

Currently, the service stores user-submitted URLs as provided. If the input URL is itself a shortened link (e.g., Bitly, TinyURL), the system may create redirect chains such as:

our-short → bit.ly → destination

This causes unnecessary redirects, slower response times, and potential link breakage if the intermediate short URL expires.

The system should resolve nested short URLs to their final destination URL before storing them.


Goals

  • Detect and resolve nested short URLs.
  • Store only the final destination URL.
  • Prevent redirect loops and excessive redirect chains.
  • Improve performance and reliability of generated short links.

Proposed Implementation

  1. Redirect Resolution

Follow HTTP redirects until the final URL is reached.

Key requirements:

  • Maximum redirect limit: 10
  • Track visited URLs to prevent redirect loops.
  • Prefer HEAD requests for efficiency; fall back to GET when the server does not support HEAD.

Example logic:

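import requests
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)

def resolve_final_url(url: str, max_redirects: int = 10) -> str:
    """Follow redirects one hop at a time and return the final URL."""
    visited = set()

    # max_redirects + 1 requests allow a chain of exactly max_redirects hops.
    for _ in range(max_redirects + 1):
        if url in visited:
            raise ValueError("Redirect loop detected")

        visited.add(url)

        # Prefer HEAD; fall back to GET when the server rejects HEAD.
        response = requests.head(url, allow_redirects=False, timeout=5)
        if response.status_code in (405, 501):
            response = requests.get(url, allow_redirects=False, timeout=5, stream=True)
            response.close()  # headers are enough; skip the body

        location = response.headers.get("Location")
        if response.status_code in REDIRECT_CODES and location:
            # Location may be relative; resolve it against the current URL.
            url = urljoin(url, location)
        else:
            return url

    raise ValueError("Too many redirects")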

  2. URL Normalization

Before storing URLs:

  • Trim whitespace
  • Ensure scheme exists ("http" or "https")
  • Convert relative redirect locations to absolute URLs

Example:

example.com → https://example.com
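
The normalization steps above could be sketched as follows, using only the standard library. The names `normalize_url` and `absolutize` are hypothetical, and defaulting to `https` is an assumption taken from the example above. Inputs that already carry a scheme (including unsafe ones) are left unchanged so that a separate scheme check can reject them.

```python
import re
from urllib.parse import urljoin

# A leading "name:" looks like a scheme unless it is really "host:port".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")
_HOST_PORT = re.compile(r"^[^/:]+:\d+(?:[/?#]|$)")


def normalize_url(raw: str, default_scheme: str = "https") -> str:
    """Trim whitespace and add a scheme when the input has none."""
    url = raw.strip()
    if not url:
        raise ValueError("Empty URL")
    has_scheme = bool(_SCHEME.match(url)) and not _HOST_PORT.match(url)
    if not has_scheme:
        url = f"{default_scheme}://{url}"
    return url


def absolutize(current_url: str, location: str) -> str:
    """Resolve a relative Location header against the URL that returned it."""
    return urljoin(current_url, location.strip())
```

For instance, `absolutize("https://bit.ly/abc", "/landing")` yields `https://bit.ly/landing`, which is the URL the next request should target.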


  3. Scheme Validation

Allow only safe protocols:

Allowed:

http
https

Rejected (all other schemes), for example:


javascript:
data:
file:
ftp:

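Restricting schemes keeps script-bearing URLs (javascript:, data:) and local-file references (file:) out of stored links, and it rejects redirect targets the service cannot follow over HTTP.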
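
A minimal sketch of the allow-list check, assuming it runs on the submitted URL and again on every redirect target. The function name `is_allowed_url` is hypothetical, and requiring a host in addition to the scheme is an added assumption.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """Return True only for http(s) URLs that also name a host."""
    parts = urlsplit(url.strip())
    # urlsplit lowercases the scheme, so "HTTP://..." is accepted too.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

An allow-list is used rather than a block-list so that schemes not named in this issue are rejected by default.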


  4. Redirect Loop Protection

Maintain a set of visited URLs while resolving redirects.

Example loop:

A → B
B → C
C → A

If a URL repeats during resolution, abort the process.


  5. Redirect Resolution Cache (Optional Enhancement)

Cache resolved URLs to reduce external HTTP requests.

Example cache entry:

bit.ly/abc123 → https://example.com/page

Suggested TTL: 24 hours

Possible storage:

  • Redis
  • in-memory LRU cache
  • database table

Benefits:

  • Faster URL submission
  • Reduced external requests
  • Lower latency under load
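
As an illustration of the in-memory option, here is a small TTL cache sketch. The class name `ResolutionCache` and the injectable `clock` parameter are hypothetical. The sketch has no size bound, so it is not a true LRU cache; a Redis-backed version would more likely rely on key expiry instead.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60  # suggested TTL: 24 hours


class ResolutionCache:
    """Maps a submitted URL to its resolved destination for a limited time."""

    def __init__(self, ttl: float = DAY_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        final_url, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller resolves the URL again.
            del self._entries[url]
            return None
        return final_url

    def put(self, url: str, final_url: str) -> None:
        self._entries[url] = (final_url, self._clock() + self._ttl)
```

A caller would check `get()` first and only issue HTTP requests on a miss, then store the result with `put()`.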

Expected Behavior

Current behavior:

short.ly/xyz → bit.ly/abc → example.com

Expected behavior after implementation:

short.ly/xyz → example.com


Benefits

  • Faster redirects (single hop)
  • More reliable links
  • Improved analytics accuracy
  • Reduced dependency on third-party shorteners
  • Better spam and abuse detection

Acceptance Criteria

  • Nested short URLs resolve to their final destination before storage.
  • Maximum redirect limit enforced.
  • Redirect loop detection implemented.
  • Only "http" and "https" schemes allowed.
  • Unit tests added for redirect resolution logic.
  • Error handling for unreachable URLs implemented.
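
The last criterion, error handling for unreachable URLs, could be met by wrapping the resolver. This is a sketch under assumptions: `try_resolve` and `ResolveResult` are hypothetical names, and the resolver is passed in as a callable so the wrapper can be exercised without network access. It relies on `requests.exceptions.RequestException` being a subclass of `IOError`, so catching `OSError` covers timeouts and connection failures raised by `requests`.

```python
from typing import Callable, NamedTuple, Optional


class ResolveResult(NamedTuple):
    final_url: Optional[str]
    error: Optional[str]


def try_resolve(url: str, resolver: Callable[[str], str]) -> ResolveResult:
    """Run `resolver` and convert failures into an error message."""
    try:
        return ResolveResult(resolver(url), None)
    except ValueError as exc:
        # Raised by the resolver for loops and overly long chains.
        return ResolveResult(None, str(exc))
    except OSError as exc:
        # Network-level failures: timeouts, DNS errors, refused connections.
        return ResolveResult(None, f"URL unreachable: {exc}")
```

Returning a result object instead of raising lets the submission endpoint decide whether to reject the URL or store it unresolved.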

Test Cases

Case 1 – Nested Short URL

Input:

https://bit.ly/example

Stored:

https://final-destination.com


Case 2 – Redirect Loop

Input chain:

A → B → C → A

Expected result:

Error: Redirect loop detected


Case 3 – Too Many Redirects

Input chain exceeds redirect limit.

Expected result:

Error: Too many redirects
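
The three cases could be turned into unit tests along these lines. This is a sketch, not the project's test suite: `resolve_with` is a hypothetical variant of the resolver that takes the HTTP call as a parameter, and `fake_fetch` replaces the network with a redirect table. It assumes the limit means that a chain of exactly 10 redirects is still accepted.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin

Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_with(fetch: Fetch, url: str, max_redirects: int = 10) -> str:
    """Resolution loop with the HTTP call injected, so tests need no network."""
    visited = set()
    for _ in range(max_redirects + 1):
        if url in visited:
            raise ValueError("Redirect loop detected")
        visited.add(url)
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)
        else:
            return url
    raise ValueError("Too many redirects")


def fake_fetch(table: Dict[str, str]) -> Fetch:
    """Build a fetcher that redirects according to `table`, else returns 200."""
    def fetch(url: str) -> Tuple[int, Optional[str]]:
        if url in table:
            return 301, table[url]
        return 200, None
    return fetch


def expect_error(fetch: Fetch, url: str, message: str) -> None:
    try:
        resolve_with(fetch, url)
    except ValueError as exc:
        assert str(exc) == message
    else:
        raise AssertionError("expected ValueError")


def test_nested_short_url() -> None:
    fetch = fake_fetch({"https://bit.ly/example": "https://final-destination.com"})
    assert resolve_with(fetch, "https://bit.ly/example") == "https://final-destination.com"


def test_redirect_loop() -> None:
    a, b, c = "https://a.test/", "https://b.test/", "https://c.test/"
    expect_error(fake_fetch({a: b, b: c, c: a}), a, "Redirect loop detected")


def test_too_many_redirects() -> None:
    # 11 distinct hops exceed the limit of 10 without ever repeating a URL.
    chain = {f"https://h{i}.test/": f"https://h{i + 1}.test/" for i in range(11)}
    expect_error(fake_fetch(chain), "https://h0.test/", "Too many redirects")
```

The functions follow the `test_*` naming that pytest collects by default, and they can also be called directly.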


Future Enhancements

  • Background job to resolve URLs asynchronously.
  • Automatic detection of malicious domains.
  • Deduplication (same destination → reuse existing short code).
  • Domain classification and blacklist checks.

Labels: enhancement
