Skip to content

bgpkit/oneio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

296 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OneIO - all-in-one IO library for Rust

Rust Crates.io Docs.rs License

OneIO is a Rust library providing unified IO operations for reading and writing compressed files from local and remote sources with both synchronous and asynchronous support.

Quick Start

oneio = "0.20"  # Default: gz, bz, https

Feature Selection Guide

Common Use Cases

Local files only:

oneio = { version = "0.20", default-features = false, features = ["gz", "bz"] }

HTTPS with default rustls:

oneio = { version = "0.20", default-features = false, features = ["https", "gz"] }

HTTPS with custom TLS backend:

# With rustls
oneio = { version = "0.20", default-features = false, features = ["http", "rustls", "gz"] }

# With native-tls (recommended for corporate proxies/VPNs)
oneio = { version = "0.20", default-features = false, features = ["http", "native-tls", "gz"] }

S3-compatible storage:

oneio = { version = "0.20", default-features = false, features = ["s3", "https", "gz"] }

Async operations:

oneio = { version = "0.20", features = ["async"] }

Available Features

Compression (choose only what you need):

  • gz - Gzip via flate2
  • bz - Bzip2
  • lz - LZ4
  • xz - XZ
  • zstd - Zstandard (balanced)

Protocols:

  • http - HTTP-only support (no TLS)
  • https - HTTP/HTTPS with rustls TLS backend (equivalent to http + rustls)
  • ftp - FTP support (requires http + TLS backend)
  • s3 - S3-compatible storage

TLS Backends (for HTTPS - mutually exclusive):

  • rustls - Pure Rust TLS (use with http). Uses both system certificates and bundled Mozilla certificates for maximum compatibility.
  • native-tls - Platform native TLS (use with http). Recommended for corporate proxies and VPNs (Cloudflare WARP, etc.) as it uses the OS trust store.

Additional:

  • async - Async support (limited to gz, bz, zstd for compression)
  • json - JSON parsing
  • digest - SHA256 digest calculation
  • cli - Command-line tool

Working with Corporate Proxies (Cloudflare WARP, etc.)

If you're behind a corporate proxy or VPN like Cloudflare WARP that uses custom TLS certificates:

[dependencies]
oneio = { version = "0.20", default-features = false, features = ["http", "native-tls", "gz"] }

The native-tls feature uses your operating system's TLS stack with its trust store, which includes custom corporate certificates. This works for both HTTP/HTTPS and S3 operations.

Alternatively, you can add custom CA certificates:

use oneio::OneIo;

let oneio = OneIo::builder()
    .add_root_certificate_pem(&std::fs::read("company-ca.pem")?)?
    .build()?;

Or set the ONEIO_CA_BUNDLE environment variable:

export ONEIO_CA_BUNDLE=/path/to/company-ca.pem

Environment Variables:

  • ONEIO_ACCEPT_INVALID_CERTS=true - Accept invalid TLS certificates (insecure, for development only)
  • ONEIO_CA_BUNDLE=/path/to/ca.pem - Add custom CA certificate to trust store

Library Usage

Basic Reading and Writing

Read all content into a string (works with compression and remote files automatically):

use oneio;

let content = oneio::read_to_string("https://spaces.bgpkit.org/oneio/test_data.txt.gz")?;
println!("{}", content);

Read line by line:

use oneio;

let lines = oneio::read_lines("https://spaces.bgpkit.org/oneio/test_data.txt.gz")?
    .map(|line| line.unwrap())
    .collect::<Vec<String>>();

for line in lines {
    println!("{}", line);
}

Get a reader for streaming:

use oneio;
use std::io::Read;

let mut reader = oneio::get_reader("tests/test_data.txt.gz")?;
let mut buffer = Vec::new();
reader.read_to_end(&mut buffer)?;

Write with automatic compression:

use oneio;
use std::io::Write;

let mut writer = oneio::get_writer("output.txt.gz")?;
writer.write_all(b"Hello, compressed world!")?;
drop(writer); // Important: close the writer

// Read it back
let content = oneio::read_to_string("output.txt.gz")?;

Reusable OneIo Clients

The OneIo client allows you to configure headers, TLS certificates, timeouts, and other options once, then reuse the configuration across multiple operations:

use oneio::OneIo;
use reqwest::header::{HeaderName, HeaderValue};

// Build a reusable client with custom headers and certificates
let oneio = OneIo::builder()
    .header_str("Authorization", "Bearer TOKEN")
    .add_root_certificate_pem(&std::fs::read("company-ca.pem")?)?
    .timeout(std::time::Duration::from_secs(30))
    .connect_timeout(std::time::Duration::from_secs(10))
    .build()?;

// Reuse the same configuration for multiple requests
let content1 = oneio.read_to_string("https://api.example.com/data1.json")?;
let content2 = oneio.read_to_string("https://api.example.com/data2.json")?;

Builder Methods:

  • .header(name, value) - Add a typed header (infallible, uses HeaderName and HeaderValue)
  • .header_str(name, value) - Add a string header (panics on invalid input)
  • .user_agent(value) - Set User-Agent header
  • .add_root_certificate_pem(pem) - Add custom CA certificate (PEM format)
  • .add_root_certificate_der(der) - Add custom CA certificate (DER format)
  • .danger_accept_invalid_certs(true) - Accept invalid certificates
  • .timeout(duration) - Set request timeout
  • .connect_timeout(duration) - Set connection timeout
  • .proxy(proxy) - Set HTTP proxy
  • .no_proxy() - Disable system proxy
  • .redirect(policy) - Set redirect policy
  • .configure_http(f) - Escape hatch for direct reqwest configuration

Compression Override

For URLs with query parameters or non-standard extensions, use explicit compression type:

use oneio::OneIo;

let oneio = OneIo::new()?;

// URL has query params, so we specify compression explicitly
let reader = oneio.get_reader_with_type(
    "https://api.example.com/data?format=gzip",
    "gz"
)?;

Progress Tracking

Track download/read progress with callbacks:

use oneio::OneIo;

let oneio = OneIo::new()?;

// Callback receives (bytes_read, total_bytes).
// total_bytes is 0 when the server does not provide a Content-Length.
// The returned Option<u64> is Some(total) when the size was known upfront.
let (mut reader, total_size) = oneio.get_reader_with_progress(
    "https://example.com/largefile.gz",
    |bytes_read, total_bytes| {
        if total_bytes > 0 {
            let percent = (bytes_read as f64 / total_bytes as f64) * 100.0;
            println!("Progress: {:.1}%", percent);
        } else {
            println!("Downloaded: {} bytes", bytes_read);
        }
    }
)?;

Async Support (Feature: async)

use oneio;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let content = oneio::read_to_string_async("https://example.com/data.json.gz").await?;

    oneio::download_async(
        "https://example.com/data.csv.gz",
        "local_data.csv.gz"
    ).await?;

    Ok(())
}

Note: Async compression is limited to gz, bz, zstd. LZ4/XZ return NotSupported.

S3 Operations (Feature: s3)

use oneio::s3::*;

// Direct S3 operations
s3_upload("my-bucket", "path/to/file.txt", "local/file.txt")?;
s3_download("my-bucket", "path/to/file.txt", "downloaded.txt")?;

// Read S3 directly using OneIO
let oneio = oneio::OneIo::new()?;
let content = oneio.read_to_string("s3://my-bucket/path/to/file.txt")?;

// Check existence and get metadata
if s3_exists("my-bucket", "path/to/file.txt")? {
    let stats = s3_stats("my-bucket", "path/to/file.txt")?;
    println!("Size: {} bytes", stats.content_length.unwrap_or(0));
}

// List objects
let objects = s3_list("my-bucket", "path/", Some("/".to_string()), false)?;

Required environment variables for S3:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION (use "auto" for Cloudflare R2)
  • AWS_ENDPOINT

Error Handling

OneIO uses a simplified error enum with #[non_exhaustive] for forward compatibility:

use oneio::OneIoError;

match oneio::get_reader("file.txt") {
    Ok(reader) => { /* use reader */ },
    Err(OneIoError::Io(e)) => { /* filesystem error */ },
    Err(OneIoError::Network(e)) => { /* network error */ },
    Err(OneIoError::NetworkWithContext { source, url }) => {
        // Network error with URL context for debugging
        eprintln!("Failed to fetch {}: {}", url, source);
    }
    Err(OneIoError::Status { service, code }) => { /* remote status error */ },
    Err(OneIoError::InvalidCertificate(msg)) => { /* TLS cert error */ },
    Err(OneIoError::NotSupported(msg)) => { /* feature not compiled */ },
    _ => { /* handle future error variants */ }
}

Crypto Provider Initialization (Rustls)

When using HTTPS, S3, or FTP features with rustls, oneio automatically initializes a crypto provider (AWS-LC or ring) on first use. For more control, initialize it explicitly:

use oneio;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize crypto provider explicitly at startup
    oneio::crypto::ensure_default_provider()?;

    // Now all HTTPS/S3/FTP operations will work
    let content = oneio::read_to_string("https://example.com/data.txt")?;

    Ok(())
}

Command Line Tool

Install the CLI tool:

cargo install oneio --features cli

Basic Usage

Read and print a remote compressed file:

$ oneio https://spaces.bgpkit.org/oneio/test_data.txt.gz
OneIO test file.
This is a test.

Read local compressed file:

$ oneio tests/test_data.txt.gz
OneIO test file.
This is a test.

Get file statistics:

$ oneio tests/test_data.txt --stats
lines: 	 2
chars: 	 31

Download with Progress Bar

Download a file with automatic progress bar (shown when stderr is a terminal):

$ oneio -d https://example.com/largefile.bz2
downloaded to largefile.bz2

When stderr is piped or redirected the progress bar is suppressed.

Custom HTTP Headers

Add custom headers for API authentication:

$ oneio -H "Authorization: Bearer TOKEN" -H "X-Custom-Header: value" https://api.example.com/data.json

Compression Override

For URLs with query parameters where extension detection fails:

$ oneio --compression gz "https://api.example.com/data?format=gzip"

Caching

Cache remote files locally for repeated reads:

$ oneio --cache-dir /tmp/cache https://example.com/largefile.gz
# Second read uses cache
$ oneio --cache-dir /tmp/cache https://example.com/largefile.gz

Force re-download even if cache exists:

$ oneio --cache-dir /tmp/cache --cache-force https://example.com/largefile.gz

S3 Operations

Upload file to S3:

$ oneio s3 upload local-file.txt my-bucket path/in/s3.txt
uploaded to s3://my-bucket/path/in/s3.txt

Download file from S3:

$ oneio s3 download my-bucket path/in/s3.txt -o local-file.txt
downloaded s3://my-bucket/path/in/s3.txt to local-file.txt

List S3 bucket:

$ oneio s3 list my-bucket path/ --delimiter "/"

List directories only:

$ oneio s3 list my-bucket path/ --dirs

Generate SHA256 Digest

$ oneio digest tests/test_data.txt
a3f5c8e9d2b1... (64 hex characters)

CLI Help Output

$ oneio --help
oneio reads files from local or remote locations with any compression

Usage: oneio [OPTIONS] [FILE] [COMMAND]

Commands:
  s3      S3-related subcommands
  digest  Generate SHA256 digest
  help    Print this message or the given subcommand(s)

Arguments:
  [FILE]  file to open, remote or local

Options:
  -d, --download                   download the file to the current directory
  -o, --outfile <OUTFILE>          output file path
      --cache-dir <CACHE_DIR>      cache reading to a specified directory
      --cache-force                force re-caching if a local cache already exists
      --cache-file <CACHE_FILE>    specify cache file name
  -s, --stats                      read through the file and only print out stats
  -H, --header <HEADERS>           Add HTTP header (format: "Name: Value"), can be repeated
      --compression <COMPRESSION>  Override compression type (gz, bz2, lz4, xz, zst)
  -h, --help                       Print help
  -V, --version                    Print version

Supported Formats

Compression Detection

OneIO detects compression algorithm by the file extensions:

  • Gzip: .gz, .gzip, .tgz
  • Bzip2: .bz, .bz2
  • LZ4: .lz4, .lz
  • XZ: .xz, .xz2, .lzma
  • Zstandard: .zst, .zstd

For URLs with query parameters, use --compression flag or get_reader_with_type().

Protocol Support

  • Local files: /path/to/file.txt
  • HTTP/HTTPS: https://example.com/file.txt.gz
  • FTP: ftp://ftp.example.com/file.txt (requires ftp feature)
  • S3: s3://bucket/path/file.txt (requires s3 feature)

License

MIT

About

Read any file from anywhere - local, HTTP, S3, FTP, compressed or not.

Resources

License

Stars

Watchers

Forks

Contributors