Deploying Custom AI Models Across Android, iOS & Cross-Platform Apps with Melange

If you read my previous article on on-device AI in Android, you already know why running models locally matters: faster inference, better privacy, and zero dependency on an internet connection.

Anand Gaur
Mobile Tech Lead · 30 May 2026

But that article skipped one thing, and it's the hardest part: deployment.

Training a model is one challenge. Getting it to run efficiently on a real phone across Snapdragon, Exynos, Google Tensor, and Apple Silicon is a completely different one. You have to compress the model, optimize it for each chip, benchmark it across devices, and then wire it into your app. Done manually, that’s days or weeks of work.

That’s exactly the problem Melange solves.

The deployment problem nobody warns you about

Here’s what usually happens. Your model runs beautifully on a GPU server. You move it to a phone and… it crawls. So you head down the rabbit hole:

  • Quantize the model so it fits in mobile memory.
  • Learn a different toolchain for each chip vendor: Qualcomm, MediaTek, Samsung, and Apple each require different SDKs and optimization strategies.
  • Discover that Android’s NPU landscape is fragmented across hundreds of device models.
  • Benchmark everything, then write device-specific runtime code.

Getting a model to run efficiently on a single NPU can take weeks, and supporting all your target devices multiplies that effort.
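To make the "quantize the model" step above concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It is only an illustration of the general idea (one shared scale per tensor), not how Melange or any vendor toolchain actually does it, and the function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Illustrative weights only (assumption, not taken from a real model).
weights = [0.5, -1.27, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each weight now needs 1 byte instead of 4 (float32); the price is a
# rounding error of at most half a quantization step per weight.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, max_error)
```

Real toolchains add per-channel scales, calibration data, and operator-specific handling on top of this, which is a large part of why the manual route takes so long.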

Most teams give up and either run on the CPU (slow, battery-hungry) or push inference to a cloud server (latency, cost, privacy concerns). Neither is great.

What is Melange?

Melange is a developer platform built by ZETIC, a company founded by former Qualcomm engineers. It does one thing extremely well: takes your AI model and gets it running on real devices fast, automatically optimized for CPU, GPU, or NPU depending on the hardware.

The mental model is simple: you bring the model, Melange handles the hardware. No C++, no OpenCL, no Metal shaders.

The workflow is three steps:

  1. Upload your custom or fine-tuned model (or pick one from the public library).
  2. Benchmark it across real devices: Apple, Samsung, Google, Xiaomi, and more.
  3. Deploy with a generated SDK integration and a few lines of code.

No manual quantization. No hardware-specific optimization code. No weeks of trial and error. Behind the scenes, Melange automatically analyzes, quantizes, and compiles the model graph for each NPU target, then ships a device-specific binary at runtime.

Real example: SPECTRA at LA Hacks 2026

At LA Hacks 2026, a team of four developers built SPECTRANet, a custom neural network that fuses iPhone LiDAR with the RGB camera to produce dense, full-resolution depth maps entirely on-device.

Their model was custom-trained on the ARKitScenes dataset. Before LA Hacks, they had no way to run it efficiently on a phone. On a GPU server it was fine. On a phone, too slow to be practical.

They uploaded their PyTorch model to Melange. Within 16 minutes, it was benchmarked across Apple, Samsung, Google, and Xiaomi devices and ready to deploy. Running on the NPU via Melange, they hit a 26x performance improvement over standard CPU deployment, with sub-millisecond latency and 99.1% pixel accuracy.

That’s the difference between a project that only works on a server and one that works in the real world. (For reference, ZETIC’s own published benchmark shows a similar story: YOLOv11n on an iPhone 16 went from 102ms on CPU to 1.9ms on the NPU — about 54x faster.)
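If you want to sanity-check speedup figures like these yourself, the arithmetic is simple. The sketch below uses the YOLOv11n numbers quoted above (102 ms on CPU, 1.9 ms on NPU); the helper names are just for illustration, and the frames-per-second values are upper bounds that ignore pre- and post-processing.

```python
def speedup(baseline_ms, accelerated_ms):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


def max_fps(latency_ms):
    """Upper bound on frames per second if inference were the only cost."""
    return 1000.0 / latency_ms


cpu_ms, npu_ms = 102.0, 1.9  # latencies quoted in the article

print(f"speedup: {speedup(cpu_ms, npu_ms):.1f}x")
print(f"CPU: {max_fps(cpu_ms):.1f} fps, NPU: {max_fps(npu_ms):.0f} fps")
```

At 102 ms per frame the CPU cannot keep up with a 30 fps camera feed, while 1.9 ms leaves most of each frame's time budget free for everything else the app does.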

Want to See It Running First? Clone a Sample App

Before writing any code, the fastest way to feel how Melange works is to run one of the official open-source apps. ZETIC maintains a repo of production-grade apps (YOLO object detection, Whisper speech recognition, face landmark tracking, on-device LLM chat, and more), each with full Android and iOS projects.

# 1. Clone the repository
git clone https://github.com/zetic-ai/ZETIC_Melange_apps.git
cd ZETIC_Melange_apps

# 2. Get your free Melange Personal Access Token from the dashboard
# (mlange.zetic.ai → Settings → Personal Access Token)

# 3. Configure your key automatically
./adapt_mlange_key.sh

# 4. Open an app and run it
# Android: apps/<ModelName>/Android → Android Studio
# iOS: apps/<ModelName>/iOS → Xcode

Each app folder contains an Android/ project (Kotlin), an iOS/ project (Swift), and a prepare/ folder with the model-export scripts. It's the best reference code you'll find. Now let's build from scratch.

The Implementation Guide

This section takes you from a raw model file to running inference on a real device.

Step 0: Prepare your model

Before uploading, your model needs to be in a format Melange accepts. It supports two formats fully:

Format                      Extension
ONNX                        .onnx
PyTorch Exported Program    .pt2

If you’re coming from PyTorch, export to a .pt2 exported program:

import torch

# your trained model in eval mode
model.eval()

# a sample input with the SAME shape your app will feed at runtime
example_input = torch.randn(1, 3, 224, 224)

# export to PyTorch Exported Program format
exported = torch.export.export(model, (example_input,))
torch.export.save(exported, "my_model.pt2")

Already have an ONNX file? Upload that directly. Tip: keep your sample input shape handy — you’ll need it for both upload and for preparing inputs in your app.
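Since the input shape comes up again when you prepare inputs in the app, it can help to work out how many values and bytes that shape implies. The sketch below does this for the (1, 3, 224, 224) example input above; the helper name and the dtype table are assumptions for illustration, and float32 is assumed because that is what torch.randn produces by default.

```python
from math import prod

# Bytes per element for a few common tensor dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "int8": 1}


def input_size_bytes(shape, dtype="float32"):
    """Size in bytes of one dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


shape = (1, 3, 224, 224)        # batch, channels, height, width
elements = prod(shape)          # 150528 values
size = input_size_bytes(shape)  # 602112 bytes as float32

print(f"{elements} elements, {size} bytes")
```

If the buffer you build in your app has a different element count, that is an early sign the shape or layout does not match what the model was exported with.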

Step 1: Sign up and open the dashboard

Go to mlange.zetic.ai and create a free account. The free (Lite) plan gives you access to the public model library, benchmarking, and SDK generation. Paid tiers add custom-model support and more device optimization targets.

Step 2: Upload your model and get your keys

Upload your .onnx or .pt2 file on the dashboard. Melange automatically analyzes, quantizes, and compiles the graph for NPU targets in the background.

Once it finishes, you get two credentials — keep these safe:

  • Personal Access Token (Personal Key) — your secure credential for on-device authentication. Find it under Settings → Personal Access Token.
  • Model Key — the unique identifier for your hardware-accelerated model (it looks like your-name/YourModelName).

A nice touch: the dashboard hands you ready-to-paste source code with your keys already filled in.

Step 3: Benchmark across devices

Melange runs your model across a range of real devices and shows you the performance profile for each: latency, throughput, and memory across CPU, GPU, and NPU. Pick the configuration that fits your use case, then move on to integration.
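One way to think about "fits your use case" is as a latency budget. The sketch below is a hypothetical helper, not part of Melange: it turns a target frame rate into a per-frame budget and picks the fastest compute target that stays within it. The latencies are the YOLOv11n figures quoted earlier; in practice you would read your own numbers off the dashboard.

```python
def frame_budget_ms(target_fps):
    """Time available per frame at the given frame rate."""
    return 1000.0 / target_fps


def fastest_within_budget(latencies_ms, budget_ms):
    """Return the lowest-latency target that fits the budget, or None."""
    candidates = {t: ms for t, ms in latencies_ms.items() if ms <= budget_ms}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Latencies quoted in the article for YOLOv11n on an iPhone 16.
latencies = {"CPU": 102.0, "NPU": 1.9}

budget = frame_budget_ms(30)  # about 33.3 ms per frame
choice = fastest_within_budget(latencies, budget)
print(f"budget {budget:.1f} ms -> {choice}")
```

A fuller version would also weigh memory use and battery impact, which the benchmark view reports alongside latency.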

Android: Step-by-Step Setup + Code

What you need first: Android Studio (Arctic Fox or later), a physical Android device (emulators have no NPU), minimum SDK 24 (Android 7.0), and your Personal Key.

1. Add the Melange dependency

In your app-level build.gradle.kts:

android {
    // ...
    packaging {
        jniLibs {
            useLegacyPackaging = true
        }
    }
}

dependencies {
    implementation("com.zeticai.mlange:mlange:+")
}

⚠️ Don’t skip useLegacyPackaging. It ensures the native C++ NPU drivers (JNI) are bundled without compression. Without it you'll get a java.lang.UnsatisfiedLinkError at runtime, which is a confusing crash to debug. (Using Groovy build.gradle? Use packagingOptions { jniLibs { useLegacyPackaging true } } and implementation 'com.zeticai.mlange:mlange:+'.)

2. Sync and build

Hit Sync Now in the Gradle bar and build the project to confirm the dependency resolves.

3. Run inference

import com.zeticai.mlange.core.model.ZeticMLangeModel
import com.zeticai.mlange.core.tensor.Tensor

// (1) Load the model.
// On first run this downloads a binary compiled specifically
// for THIS device's NPU chipset. Run it off the main thread.
val model = ZeticMLangeModel(
    context = this,
    tokenKey = "YOUR_PERSONAL_KEY",
    modelName = "Team_ZETIC/YOLO26" // or your own "your-name/YourModel"
)

// (2) Prepare inputs: shapes MUST match what the model expects.
val inputs: Array<Tensor> = prepareInputs()

// (3) Run on the NPU. No delegate config, no memory syncing.
val outputs = model.run(inputs)

Important: the constructor makes a network call on first use to download the model binary. Call it from a background thread or a coroutine so you don’t block the UI.

4. (Optional) Speed up per-frame loops

For real-time use like camera inference, skip the per-call byte copy by writing into the model’s own input buffers:

val inputBuffers = model.getInputBuffers()

for (i in inputBuffers.indices) {
    inputBuffers[i].from(sourceTensors[i])
}
val outputs = model.run()

Consume the output tensors before the next run() call — those buffers get reused.
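Why does buffer reuse matter? The toy Python sketch below shows the general hazard with any runtime that writes results into the same memory on every call. ReusingRunner is a made-up stand-in for illustration only; it is not the Melange API and says nothing about how Melange manages its buffers internally.

```python
class ReusingRunner:
    """Toy stand-in for a runtime that writes every result into one buffer."""

    def __init__(self, size):
        self._output = [0.0] * size  # allocated once, reused on every run

    def run(self, value):
        # Overwrite the shared buffer in place with the new "result".
        for i in range(len(self._output)):
            self._output[i] = value
        return self._output  # the same list object every time


runner = ReusingRunner(size=3)

first = runner.run(1.0)
kept = list(first)  # copy out what you need before the next run
runner.run(2.0)

# `first` still points at the shared buffer, so it now shows the second
# result; `kept` preserved the first one.
print(first, kept)
```

The same rule applies in a camera loop: finish post-processing (or copy the values you need) before kicking off inference on the next frame.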

iOS: Step-by-Step Setup + Code

What you need first: Xcode 14+, a physical iOS device (iPhone 8 or later — simulators have no Neural Engine), iOS 15.0+, and your Personal Access Token.

1. Add the Melange package (Swift Package Manager)

  1. In Xcode, go to File → Add Package Dependencies.
  2. Enter the package URL: https://github.com/zetic-ai/ZeticMLangeiOS.git
  3. Set the dependency rule to Exact Version 1.6.0 (or Up to Next Major from 1.6.0).
  4. Click Add Package and link it to your app target.

2. Link Accelerate.framework manually

The SDK depends on Apple’s Accelerate framework, and SPM won’t link it automatically:

  1. Select your app target → General → Frameworks, Libraries, and Embedded Content.
  2. Click +, search for Accelerate.framework, and add it.

⚠️ Skip this and you’ll hit linker errors like Undefined symbol: _vDSP_vmul at build time.

3. Run inference

import UIKit
import ZeticMLange

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()

        // Note: loading here keeps the sample short. In a real app,
        // move the model load off the main thread.
        do {
            // (1) Load the model: downloads a Neural Engine-optimized
            // binary on first run, then caches it locally.
            let model = try ZeticMLangeModel(
                tokenKey: "YOUR_PERSONAL_KEY",
                name: "Team_ZETIC/YOLO26",
                version: 1
            )

            // (2) Prepare inputs: shapes must match the model spec.
            let inputs: [Tensor] = prepareInputs()

            // (3) Run on the Apple Neural Engine.
            let outputs = try model.run(inputs: inputs)

            for output in outputs {
                // post-process each output tensor
            }
        } catch {
            print("Melange error: \(error)")
        }
    }
}

Same three-step pattern as Android — load → prepare inputs → run — just in Swift. That unified API is one of Melange's best features: learn the mental model once, reuse it everywhere.

Flutter & React Native: Cross-Platform Support

Good news for cross-platform developers — Melange isn’t Android/iOS only. ZETIC ships starter templates for both Flutter and React Native, currently focused on on-device LLM use cases (chat apps with real-time token streaming). General vision/detection model support is expanding.

Flutter

Clone the official template to get a working chat app:

git clone https://github.com/zetic-ai/zetic-llm-flutter-template.git
cd zetic-llm-flutter-template
flutter pub get
cd ios && pod install && cd ..

Add your credentials in lib/core/constants.dart:

class Constants {
  // TODO: Replace with your actual credentials
  static final personalAccessToken = "YOUR_PERSONAL_ACCESS_TOKEN";
  static final modelKey = "YOUR_MODEL_KEY";
}

Then initialize the model in lib/main.dart. Notice you can pick the quantization level — lower bits mean a smaller, faster model at some accuracy cost:

final result = await LLMService.instance.initializeModel(
  personalAccessKey: Constants.personalAccessToken,
  modelKey: Constants.modelKey,
  target: LLMTarget.llamaCpp,
  quantType: LLMQuantType.quantized4BitKM, // F16, Q8_0, Q6_K, Q4_K_M, Q3_K_M, Q2_K...
);

Run it with flutter run — on a real device only (simulators have no NPU).
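To get a rough feel for the size side of the quantization trade-off mentioned above, you can estimate weight storage from the parameter count and the bits per weight. The sketch below is a back-of-the-envelope calculation only: it uses the nominal bit width in each quantization name, whereas real quantized files are somewhat larger because they also store per-block scales and metadata. The 1-billion-parameter model is a hypothetical example.

```python
# Nominal bits per weight implied by each quantization name (assumption:
# real files carry extra per-block scale data, so treat these as lower bounds).
NOMINAL_BITS = {
    "F16": 16,
    "Q8_0": 8,
    "Q6_K": 6,
    "Q4_K_M": 4,
    "Q3_K_M": 3,
    "Q2_K": 2,
}


def approx_size_gb(num_params, quant):
    """Rough weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * NOMINAL_BITS[quant] / 8 / 1e9


params = 1_000_000_000  # hypothetical 1B-parameter model

for quant in NOMINAL_BITS:
    print(f"{quant:7s} ~{approx_size_gb(params, quant):.2f} GB")
```

Going from F16 to a 4-bit variant cuts the weights to roughly a quarter of the size, which is often the difference between fitting comfortably in a phone's memory and not fitting at all.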

React Native

The React Native template uses the react-native-zetic-mlange package:

git clone https://github.com/zetic-ai/zetic-llm-react-native-template.git
cd zetic-llm-react-native-template
npm install
cd ios && pod install && cd ..

Add your credentials in src/constants.ts:

// TODO: Replace with your actual credentials
export const PERSONAL_ACCESS_KEY = 'YOUR_PERSONAL_ACCESS_TOKEN';
export const MODEL_KEY = 'YOUR_MODEL_KEY';

Initialize the LLM in App.tsx:

await ZeticLLM.init(
  personalAccessKey,
  modelKey,
  ZeticLLMTarget.LLAMA_CPP,
  ZeticQuantType.Q6_K // pick your quantization
);

Then npx react-native run-android or run-ios — again, on a real device for accurate performance.

Practical advice: if you’re on Flutter or React Native and need a general (non-LLM) model like object detection right now, the cleanest path is to wrap the native Android/iOS SDK in a small platform channel / native module — reusing exactly the code from the sections above. For chat/LLM features, the templates above work out of the box.

When to use Melange vs native tools

You already know ML Kit and LiteRT from my previous article. Here’s how Melange fits in:

  • ML Kit — Google’s high-level AI API. Best for plug-and-play pre-trained models for vision, text, and speech. It supports some custom TFLite models, but its strength is making common perception tasks simple with minimal setup.
  • LiteRT — Google’s on-device ML and GenAI framework, the evolution of TensorFlow Lite. It handles model conversion, runtime optimization, and execution across CPU/GPU/NPU. It's powerful and flexible, but you still manage a lot of model prep, device-specific tuning, and runtime config yourself.
  • Melange — A deployment layer that sits on top of your model. You bring your custom or fine-tuned model; Melange handles hardware targeting, quantization, benchmarking, and device-specific integration, then hands you a ready SDK.

Melange is not a replacement for LiteRT. It’s the layer above it that absorbs the optimization and deployment complexity, so you can focus on building the actual app experience.

Common pitfalls

A few things that trip people up:

  • Testing on an emulator/simulator. Emulators and simulators have no NPU. Always benchmark on a physical device, or your numbers are meaningless.
  • Input shape mismatches. If your tensor shapes don’t exactly match the model spec, you’ll get a runtime exception. Check the input spec on the dashboard.
  • Blocking the UI thread. The first model load triggers a network download. Always run it on a background thread or coroutine.
  • Forgetting useLegacyPackaging (Android) or Accelerate.framework (iOS). Both cause cryptic build/runtime errors covered above.
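The shape-mismatch pitfall in particular is cheap to catch before it reaches the runtime. Here is a minimal sketch of a pre-flight check, written in Python for brevity; check_shapes is a hypothetical helper (not part of any SDK), and the same comparison translates directly to Kotlin or Swift.

```python
def check_shapes(expected, actual):
    """Raise ValueError describing the first input whose shape differs."""
    if len(expected) != len(actual):
        raise ValueError(f"expected {len(expected)} inputs, got {len(actual)}")
    for index, (want, got) in enumerate(zip(expected, actual)):
        if tuple(want) != tuple(got):
            raise ValueError(
                f"input {index}: expected shape {tuple(want)}, got {tuple(got)}"
            )


# Shape used when exporting the model earlier in this guide.
expected_shapes = [(1, 3, 224, 224)]

check_shapes(expected_shapes, [(1, 3, 224, 224)])  # passes silently

try:
    # A common mistake: channels-last (NHWC) data for a channels-first model.
    check_shapes(expected_shapes, [(1, 224, 224, 3)])
except ValueError as error:
    message = str(error)
    print(message)
```

A readable error like this, raised in your own code, is much easier to act on than an exception from deep inside a native runtime.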

Why this matters for mobile developers

The NPU in modern devices (Snapdragon 8 Elite, Exynos 2400, Apple’s Neural Engine) is purpose-built for AI inference. But most apps never touch it, because the tooling to target it is genuinely hard.

Melange makes the NPU accessible to any developer, not just ML engineers with hardware-optimization experience. You bring the model. Melange handles the hardware.

If you’re building anything with vision, speech, text, or any other AI feature, and you want it fast, private, and server-free, it’s worth an afternoon to try.

Summary

The problem: Training an AI model is easy. Getting it to run fast on real phones — across Snapdragon, Exynos, Tensor, and Apple chips — takes weeks of manual quantization and hardware-specific tuning.

The solution: Melange (by ZETIC, ex-Qualcomm engineers) is a deployment platform. You upload your model, it auto-optimizes for every NPU, and hands you a ready-to-use SDK.

The workflow — 3 steps:

  1. Upload your model (.onnx or .pt2) to the Melange dashboard
  2. Benchmark it automatically across 200+ real devices
  3. Deploy with ~3 lines of SDK code

Real proof: At LA Hacks 2026, the SPECTRA team took a custom PyTorch depth-estimation model from “too slow for phones” to 26x faster on NPU — in just 16 minutes.

Platform support:

  • Android (Kotlin/Java) & iOS (Swift) — production-ready, full code in this guide
  • Flutter & React Native — official LLM starter templates available

The integration pattern is the same everywhere: load model → prepare inputs → run.

Watch out for: test on real devices only (emulators have no NPU), match input tensor shapes exactly, load models off the UI thread, and don’t forget useLegacyPackaging (Android) / Accelerate.framework (iOS).

Bottom line: Melange isn’t a replacement for LiteRT — it’s the layer on top that removes the deployment pain, so you focus on building the app, not fighting the hardware.

