Upwork

Local LLM Expert: Fine-Tuning & Training with Gemini Distillation

Upwork·Remote

AI Stack

Tools this role uses

Gemini: Generate synthetic VTO training data (occasional)
Stable Diffusion: Fine-tune open-source VTO model (occasional)
SAM 2: Image segmentation in VTO pipeline (occasional)

Workflow Map

AI-assisted vs. human-led

AI-assisted

Generate synthetic training dataset (Gemini)
Fine-tune diffusion model on VTO task (Stable Diffusion)
Image segmentation for garment masking (SAM 2)

Human-led

TensorRT and quantization optimization
Dockerized API deployment setup
Benchmark latency and model evaluation
CUDA and Flash Attention 3 tuning
Model architecture design and selection

About the role


We are looking for an expert in Generative AI and Computer Vision to build a high-performance, local Virtual Try-On (VTO) pipeline. Our goal is to achieve visual quality comparable to Gemini 3 Pro virtual try on 001, with local inference in under 5 seconds on an NVIDIA GPU Droplet (A100 or H100).

The Challenge

Proprietary models like Gemini Pro are too slow for our production needs. You will be responsible for teaching a faster, open-weight student model such as IDM-VTON, CatVTON, or OOTDiffusion to mimic Gemini’s high-fidelity fabric draping, shadow alignment, and texture preservation.

Key Responsibilities

Synthetic Data Pipeline

Build a script to generate a training dataset using Google Gemini or Imagen VTO as the teacher model.
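
As a rough illustration of what such a script might look like, the sketch below pairs every person image with every garment image, asks a teacher for the try-on result, and records the triplet in a JSONL manifest. The `Teacher` callable, the `build_dataset` name, and the manifest layout are assumptions made for this example; the actual Gemini or Imagen VTO call would live inside the teacher wrapper and is not shown.

```python
import itertools
import json
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical teacher signature: (person bytes, garment bytes) -> try-on image bytes.
# A real wrapper around Gemini or Imagen VTO would go here; it is not part of this sketch.
Teacher = Callable[[bytes, bytes], bytes]


def build_dataset(people: Iterable[Path], garments: Iterable[Path],
                  teacher: Teacher, out_dir: Path) -> Path:
    """Run the teacher on every person/garment pair and write a JSONL manifest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = out_dir / "manifest.jsonl"
    with manifest.open("w", encoding="utf-8") as fh:
        for person, garment in itertools.product(sorted(people), sorted(garments)):
            target = out_dir / f"{person.stem}__{garment.stem}.png"
            if not target.exists():  # skip pairs already generated in an earlier run
                target.write_bytes(teacher(person.read_bytes(), garment.read_bytes()))
            record = {"person": str(person), "garment": str(garment), "target": str(target)}
            fh.write(json.dumps(record) + "\n")
    return manifest
```

Skipping targets that already exist keeps reruns cheap, which matters when each teacher call is slow or billed per request.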

Model Fine Tuning

Fine-tune an open-source diffusion model such as IDM-VTON or CatVTON using the Gemini-generated dataset to capture pro-tier garment realism.

Inference Optimization

Implement TensorRT and FP8 or INT8 quantization to ensure the model runs in under 5 seconds on our GPU instance.
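
For readers less familiar with quantization, the sketch below shows the arithmetic behind symmetric per-tensor INT8 quantization in plain Python. It does not use TensorRT, whose own calibration and kernel selection are considerably more involved; the function names and sample weights are made up for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-absmax, absmax] onto [-127, 127]."""
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / 127.0 if absmax > 0 else 1.0  # avoid dividing by zero for all-zero input
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Illustrative weights only; real tensors come from the fine-tuned model.
weights = [0.02, -1.27, 0.635, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Rounding error per element is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-element error bound of half a step is why outlier weights hurt: a single large value inflates the scale and coarsens every other weight in the tensor.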

Deployment

Containerize the solution using Docker with a vLLM- or SGLang-style backend for low-latency API access.

Required Skills

Deep Learning

Expert-level PyTorch and experience with diffusion models such as Stable Diffusion and ControlNet.

Computer Vision

Proven experience with Virtual Try-On pipelines, image segmentation such as SAM 2, and human pose estimation.

Optimization

Deep knowledge of NVIDIA TensorRT, Flash Attention 3, and CUDA optimization.

API Integration

Experience with Google Vertex AI or Gemini API for synthetic data generation.

Deliverables

A fine-tuned local VTO model including weights and configuration.

Benchmark report showing end-to-end latency under 5 seconds.

Dockerized API for easy deployment on our GPU Droplet.
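
One way the benchmark deliverable could be measured is sketched below: time repeated end-to-end calls after a few warm-up runs and compare the 95th percentile against the 5 second budget. The `benchmark` and `meets_budget` names, the warm-up and run counts, and the choice of p95 as the pass criterion are assumptions for this example, not requirements stated in the posting.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time end-to-end calls to `run_once` and summarize latency in seconds."""
    for _ in range(warmup):  # warm-up runs absorb one-time costs such as model loading
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # For GPU work, run_once should block until the result is ready
        # (for example by synchronizing the device), or the timing will be too low.
        run_once()
        samples.append(time.perf_counter() - start)
    samples.sort()
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)  # nearest-rank percentile
    return {
        "mean_s": statistics.fmean(samples),
        "p50_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }


def meets_budget(report: Dict[str, float], budget_s: float = 5.0) -> bool:
    """Assumed pass criterion: 95th percentile latency stays under the budget."""
    return report["p95_s"] < budget_s
```

Here `run_once` would wrap the full path from input images to the final try-on image, so that segmentation, pose estimation, and diffusion are all counted.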

Listed on UpgradedJobs · Source: scraped
