Added function export API #7

Open
subspecs wants to merge 3 commits into rodrigomatta:main from subspecs:main

@subspecs
Contributor

@subspecs subspecs commented Mar 21, 2026

@rodrigomatta
As we discussed, I created a minimal but very versatile set of export functions for use in other libraries.

I concentrated on changing the original code as little as possible, only changing one or two parameters here:

We also talked about proper logging. I didn't have time to implement it, but I created a 'config' header that allows suppressing non-essential cout/print output, so that the s2 project can be used as a library.
It does NOT suppress errors, exceptions, or the like; those will still be triggered. (You can find all affected references by searching for the 'SuppressNonEssentialVerbosity' variable.)

With that out of the way, now we can talk about the exported function/library design:

I created a single .cpp and .h for the exported functions only.
I ONLY copied the minimal amount of code needed to run this; it is mostly a 'wrapper' around the classes, as you wanted.
I went fairly extensive with the functions, so we gain a lot of control over generation, even from other languages.

What features we gained:

  • Ability to transform text to voice from either a text input alone or a text input plus a reference voice, through the exported functions, allowing use of S2 from other languages.
  • Ability to save the resulting audio to a file and/or retrieve it as an array of float samples in code. This allows custom post-processing on the user side in other languages, which is a very useful feature.
  • The functions are modular: you can load the model and tokenizer separately and re-use them across instances without reloading them. This is quite big, since until now there was no way to do this.
  • You can also pre-process reference audio 'codes' and re-use them across generations, so you don't have to process the reference audio each time you generate. This is a massive speed boost for repeated generation with the same voice.
  • I also found a fairly big logical issue in the main source while working on this. Fixing it might allow a 2x reduction in memory (RAM) use, but I will test that in a separate pull request.

Sample: Working S2 code in C#

namespace FishS2Sharp
{
    internal class Program
    {
        public static class NativeMethods
        {
            const string DllPath = @"C:\Users\SubSpecs\Desktop\s2.cpp-main\build\bin\RelWithDebInfo\s2.dll";

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2Pipeline();
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2Pipeline(void* Pipeline);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void SyncS2TokenizerConfigFromS2Model(void* Model, void* Tokenizer);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int InitializeS2Pipeline(void* Pipeline, void* Tokenizer, void* Model, void* AudioCodec);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2GenerateParams();
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2GenerateParams(void* GenerateParams);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int InitializeS2GenerateParams(void* GenerateParams, int max_new_tokens = -1, float temperature = -1, float top_p = -1, int top_k = -1, int min_tokens_before_end = -1, int n_threads = -1, int verbose = -1);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2Model();
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2Model(void* Model);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int InitializeS2Model(void* Model, string gguf_path, int gpu_device, int backend_type);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2Tokenizer();
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2Tokenizer(void* Tokenizer);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int InitializeS2Tokenizer(void* Tokenizer, string path);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2AudioCodec();
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2AudioCodec(void* AudioCodec);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int InitializeS2AudioCodec(void* AudioCodec, string gguf_path, int gpu_device, int backend_type);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2AudioPromptCodes();
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2AudioPromptCodes(void* AudioPromptCodes);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int InitializeAudioPromptCodes(void* Pipeline, int ThreadCount, string ReferenceAudioPath, void* AudioPromptCodes, int* TPrompt);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void* AllocS2AudioBuffer(int InitialSize = -1);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern void ReleaseS2AudioBuffer(void* AudioBuffer);
            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern float* GetS2AudioBufferDataPointer(void* AudioBuffer);

            [System.Runtime.InteropServices.DllImport(DllPath)]
            public static unsafe extern int S2Synthesize(void* Pipeline, void* GenerateParams, void* AudioBuffer, void* ReferenceAudioPromptCodes, int* ReferenceAudioTPrompt, string ReferenceAudioPath = null, string ReferenceAudioTranscript = "", string TextToInfer = "", string OutputAudioPath = null, int* AudioBufferOutputLength = null);
        }

        static unsafe void Main(string[] args)
        {
            string ReferenceAudioPath = "2.mp3";
            string ReferenceAudioTranscript = "Raiden! Shang Tsung! Kitana! Choose your destiny! Johnny Cage! Sonya Blade! Kano! Jax! Round One... FIGHT! Finish Him! FATALITY! Flawless Victory!";
            string TextToInfer = "My name is Jeff, and England is my city!";
            string OutputAudioPath = "test.wav"; //Optional.

            var Pipeline = NativeMethods.AllocS2Pipeline();
            var Tokenizer = NativeMethods.AllocS2Tokenizer();
            var Model = NativeMethods.AllocS2Model();
            var AudioCodec = NativeMethods.AllocS2AudioCodec();
            var AudioBuffer = NativeMethods.AllocS2AudioBuffer();
            var AudioPromptCodes = NativeMethods.AllocS2AudioPromptCodes(); //Can be used to store Audio References.
            var GenerateParams = NativeMethods.AllocS2GenerateParams();

            if (NativeMethods.InitializeS2Tokenizer(Tokenizer, @"C:\Users\SubSpecs\Desktop\s2.cpp-main\build\bin\RelWithDebInfo\tokenizer.json") != 1)
            {
                throw new System.Exception("Failed to initialize tokenizer.");
            }
            if (NativeMethods.InitializeS2Model(Model, @"C:\Users\SubSpecs\Desktop\s2.cpp-main\build\bin\RelWithDebInfo\s2-pro-q8_0.gguf", 0, 1) != 1)
            {
                throw new System.Exception("Failed to initialize model with CUDA support.");
            }
            if (NativeMethods.InitializeS2AudioCodec(AudioCodec, @"C:\Users\SubSpecs\Desktop\s2.cpp-main\build\bin\RelWithDebInfo\s2-pro-q8_0.gguf", -1, -1) != 1)
            {
                throw new System.Exception("Failed to initialize audio codec with CPU support.");
            }
            if (NativeMethods.InitializeS2GenerateParams(GenerateParams) != 1)
            {
                throw new System.Exception("Failed to initialize GenerateParams.");
            }

            NativeMethods.SyncS2TokenizerConfigFromS2Model(Model, Tokenizer);
            if(NativeMethods.InitializeS2Pipeline(Pipeline, Tokenizer, Model, AudioCodec) != 1)
            {
                throw new System.Exception("Failed to initialize pipeline.");
            }

            int AudioSampleCount = 0, AudioTPrompt = 0;

            System.Diagnostics.Stopwatch STimer = new System.Diagnostics.Stopwatch(); STimer.Start();
            int ErrorCode = NativeMethods.S2Synthesize(Pipeline, GenerateParams, AudioBuffer, AudioPromptCodes, &AudioTPrompt, ReferenceAudioPath, ReferenceAudioTranscript, TextToInfer, OutputAudioPath, &AudioSampleCount);
            STimer.Stop(); System.Console.WriteLine("Generation Time: " + STimer.Elapsed.TotalSeconds.ToString("0.000") + "s");

            switch (ErrorCode)
            {
                // Fatal cases throw directly; a 'break' after 'throw' is unreachable.
                case 0: throw new System.Exception("Failed to synthesize because the pipeline is not initialized.");
                case -1: System.Console.WriteLine("[Pipeline Warning]: encode failed, running without reference audio."); break;
                case -2: System.Console.WriteLine("[Pipeline Warning]: load_audio failed, running without reference audio."); break;
                case -3: throw new System.Exception("[Pipeline Error]: init_kv_cache failed.");
                case -4: throw new System.Exception("[Pipeline Error]: generation produced no frames.");
                case -5: throw new System.Exception("[Pipeline Error]: decode failed.");
                case -6: throw new System.Exception("[Pipeline Error]: save_audio failed.");
            }

            if(ErrorCode > -3)
            {
                //Do something with audio samples / output audio file.

                float* RawAudioBufferAccess = NativeMethods.GetS2AudioBufferDataPointer(AudioBuffer);

                //...
            }

            //Cleanup:
            NativeMethods.ReleaseS2AudioPromptCodes(AudioPromptCodes);
            NativeMethods.ReleaseS2Tokenizer(Tokenizer);
            NativeMethods.ReleaseS2Model(Model);
            NativeMethods.ReleaseS2AudioCodec(AudioCodec);
            NativeMethods.ReleaseS2GenerateParams(GenerateParams);
            NativeMethods.ReleaseS2Pipeline(Pipeline);
            NativeMethods.ReleaseS2AudioBuffer(AudioBuffer);
        }
    }
}

Also, I couldn't resist creating an official, PROPER modular C# wrapper for this thing. (https://github.com/subspecs/FishS2Sharp)

This library targets netstandard 2.1, so it will work in places like the Unity game engine, which means S2 can be used for game development as well.

It's tested and works. (Binary in releases)
You should add it to the README here so people can find it since this is what most people are waiting for lol.

internal class Program
{
    static void Main(string[] args)
    {
        const string SomeLocation = @"C:\Users\SubSpecs\Desktop\s2.cpp-main\build\bin\RelWithDebInfo\";

        //Create a FishS2Client instance. (You can have as many as you want; models/tokenizers can be shared across instances.)
        FishS2Sharp.FishS2Client Instance = new FishS2Sharp.FishS2Client(SomeLocation + "s2-pro-q8_0.gguf", SomeLocation + "tokenizer.json", FishS2Sharp.GPUBackendTypes.Cuda);

        //(Optional) Register a reference voice with a transcript of what is said in the sample. Only WAV/MP3 files are currently supported. (10-20s samples recommended.)
        Instance.RegisterVoiceReference("Mortal Combat", SomeLocation + "2.mp3", "Raiden! Shang Tsung! Kitana! Choose your destiny! Johnny Cage! Sonya Blade! Kano! Jax! " +
                "Round One... FIGHT! Finish Him! FATALITY! Flawless Victory!", out _);

        //Create some pipeline settings.
        FishS2Sharp.FishAudioParameters PipelineParameters = new FishS2Sharp.FishAudioParameters(/*int max_new_tokens = -1, float temperature = -1, float top_p = -1...*/);

        //Synthesize text to voice:
        System.Diagnostics.Stopwatch Timer = new System.Diagnostics.Stopwatch(); Timer.Start();
        Instance.Synthesize("My name is Jeff and england is my city!", "D:\\Jeff.wav", PipelineParameters, Instance.GetVoiceReference("Mortal Combat"));
        Timer.Stop(); System.Console.WriteLine("Generation Time: " + Timer.Elapsed.TotalSeconds.ToString("0.000") + "s");

        //Cleanup this sample code.
        Instance.Dispose();
    }
}

@rodrigomatta
Owner

Hi @subspecs, thanks for putting this together. I do like where this is going.

The overall idea makes sense to me: having a native API for other languages, being able to reuse loaded model/tokenizer/codec state across runs, supporting precomputed reference codes, and returning audio buffers directly instead of only writing files. Those are all useful additions.

The C# wrapper direction also looks genuinely useful. I can see the value there, especially for people who want to use this from Unity or other .NET environments.

Before merging, though, there are a few implementation details I’d like to revisit so we don’t lock in ABI / ownership issues too early.

  • Export visibility needs to be portable. Right now include/s2_export_api.h defines S2_Export as __declspec(dllexport) unconditionally. I tested the current branch on Linux and it fails there because GCC rejects that. I think this should be switched to a normal cross-platform export macro.

  • CMake should build the shared library directly. At the moment the tree still only defines the s2 executable, and the install rule is still runtime-only, so the DLL/SO used by the sample is not actually being produced by the project build yet. I think we should add a real shared-library target and install rule before merging this.

  • The pipeline ownership model needs another look. InitializeS2Pipeline() currently assigns Tokenizer, SlowARModel, and AudioCodec into Pipeline. The main thing I’d like to confirm here is ownership and lifetime, since SlowARModel and AudioCodec own GGML/backend resources and clean them up in their destructors. I think we should confirm whether the pipeline should instead keep non-owning references / handles, or otherwise make the ownership explicit.

  • S2Synthesize() should be hardened for the text-only case. Right now it assumes some optional inputs are non-null: it dereferences ReferenceAudioPromptCodes / ReferenceAudioTPrompt directly, and it also constructs std::string from optional C-string inputs. I think that path should be made fully safe so a plain text-only call works cleanly with null pointers.

  • The temporary audio buffer allocation should be cleaned up. When AudioBuffer == NULL, S2Synthesize() allocates a temporary std::vector with new, and then there are early return paths after that. I think this should be rewritten so the temporary buffer is owned locally and can’t leak on decode/save failure.

  • Logging control probably shouldn’t be wired through a global compile-time define. I agree with the goal of suppressing non-essential output when embedding this as a library. I just think include/s2_config.h defining S2_LIBRARY unconditionally changes the normal CLI behavior too, which is probably not what we want. I’d rather handle that as runtime verbosity control, or at least scope it only to the actual library build target.

  • The export layer should stay as thin as possible over the existing pipeline path. S2Synthesize() currently duplicates most of the logic that already exists in the pipeline synthesis flow. I’d feel better if the exported API stayed closer to a thin wrapper over the existing path, so fixes and behavior changes only need to happen in one place.
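
For the export-visibility point, a cross-platform macro could look roughly like the sketch below. `S2_Export` is the macro name from include/s2_export_api.h; `S2_BUILD_SHARED` and `S2ExampleApiVersion` are hypothetical names used only for illustration, and the real header may need a different layout.

```cpp
// Hypothetical: normally the build system would pass this define only when
// compiling the shared library itself. It is set here so the sketch compiles alone.
#define S2_BUILD_SHARED 1

#if defined(_WIN32)
  #if defined(S2_BUILD_SHARED)
    #define S2_Export __declspec(dllexport)   // building the DLL
  #else
    #define S2_Export __declspec(dllimport)   // consuming the DLL
  #endif
#elif defined(__GNUC__) || defined(__clang__)
  // GCC/Clang on Linux/macOS: mark the symbol as visible in the shared object.
  #define S2_Export __attribute__((visibility("default")))
#else
  #define S2_Export
#endif

// Hypothetical exported function, present only to show the macro in use.
extern "C" S2_Export int S2ExampleApiVersion() { return 1; }
```

CMake's GenerateExportHeader module can produce an equivalent macro automatically, which may be simpler once a real shared-library target exists.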

None of that changes the fact that I think the feature direction is good. I’d just like to rework those pieces a bit before merging so the native API lands on a more stable base.

If you’re up for another pass in that shape, I think this could be in good shape for merge pretty quickly.
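
To illustrate the two S2Synthesize() hardening points (null-safe optional inputs and a locally owned temporary buffer), here is a rough sketch of the shape. It is not the PR's code: `OptionalString`, `FakeDecode`, and `SynthesizeSketch` are hypothetical names, the decode step is a stand-in, and the return value 1 for success is an assumption (-5 follows the "decode failed" code in the C# sample).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Convert an optional C string to std::string; nullptr becomes "".
// Constructing std::string directly from a null pointer is undefined behavior.
inline std::string OptionalString(const char* text) {
    return text ? std::string(text) : std::string();
}

// Hypothetical stand-in for the real decode step: one sample per input
// character, ignoring the reference path. Fails when there is no text.
inline bool FakeDecode(std::vector<float>& out, const std::string& text,
                       const std::string& /*reference_path*/) {
    out.assign(text.size(), 0.0f);
    return !text.empty();
}

// Sketch of the buffer and null handling only.
inline int SynthesizeSketch(std::vector<float>* caller_buffer,
                            const char* reference_audio_path,
                            const char* text_to_infer,
                            int* out_length) {
    const std::string reference_path = OptionalString(reference_audio_path);
    const std::string text = OptionalString(text_to_infer);

    // The temporary buffer is a local object, so every return path frees it.
    std::vector<float> local_buffer;
    std::vector<float>& buffer = caller_buffer ? *caller_buffer : local_buffer;

    if (!FakeDecode(buffer, text, reference_path)) {
        return -5;  // early return: nothing to delete, nothing can leak
    }
    if (out_length) {  // optional output pointer is checked before use
        *out_length = static_cast<int>(buffer.size());
    }
    return 1;
}
```

A text-only call such as `SynthesizeSketch(nullptr, nullptr, "hello", &n)` then works with null pointers for every optional argument.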

On the C# wrapper, I’m positive on the idea, but I’d prefer to hold off on adding it to the README or calling it official until the native API is stable upstream. Once the native layer is settled, I’d be happy to revisit that right after.
