# Introduction
For a long time, running Transformer models meant maintaining a Python server, paying for GPU time, and routing each inference request through an API. The user typed something, it left their machine, touched your infrastructure, and came back as a prediction. That architecture made sense when models were too large to run elsewhere. This is no longer the only option.
Transformers.js changes the equation. It runs state-of-the-art NLP models directly in the browser, on the user’s device, with no servers involved. Models are downloaded once, cached locally, and run offline from that point forward. The Python-to-JavaScript translation is almost one-to-one:
// JavaScript -- nearly identical
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline('sentiment-analysis');
const result = await classifier('I love transformers!');
This tutorial covers three NLP tasks (text classification, zero-shot classification, and question answering) using the Transformers.js pipeline() API. For each task, you’ll see how to initialize the pipeline, what the output structure looks like and how to interpret it, and a working HTML example that you can open directly in the browser. The tutorial concludes with a full support ticket routing application that combines all three pipelines into one practical tool.
Every code example in this article uses the CDN import path, so no build step is necessary. Open a text editor, paste the code and run it.
# What exactly is Transformers.js?
The library is designed to be functionally equivalent to Hugging Face’s Python transformers library, which means the same pre-trained models, the same function names, and the same pipeline API, in JavaScript. Under the hood, the bridge that makes this possible is ONNX Runtime.
Models trained in PyTorch, TensorFlow, or JAX are converted to ONNX format using Hugging Face Optimum. ONNX Runtime then executes these models in the browser. By default, it runs on the CPU via WebAssembly (WASM), which works in every modern browser. If you want GPU acceleration, setting device: 'webgpu' routes computation through the browser’s WebGPU API where available, which can speed up inference meaningfully, although support is still experimental in some environments.
- Model caching. The first time a pipeline is run, the model weights are downloaded from the Hugging Face Hub and cached locally: in the browser’s Cache API storage in the browser context, on the file system in Node.js. Developer testing shows that the sentiment analysis pipeline downloads approximately 111 MB on first load. Subsequent runs skip the download entirely and load from the cache. This means that the first user session incurs a bandwidth cost; every subsequent session is fast and offline-capable.
- Quantization. The dtype option controls model precision. q8 (8-bit quantization) is the default for WASM; it gives a good balance of size and accuracy. q4 cuts the file size by about half with a 1-3% accuracy loss on most tasks, which is a reasonable trade-off for mobile devices or slow connections. For server-side use in Node.js, fp32 gives full precision without any size constraints.
// Default WASM execution -- works everywhere
const wasmPipe = await pipeline('sentiment-analysis');
// WebGPU for faster inference on compatible hardware
const gpuPipe = await pipeline('sentiment-analysis', null, { device: 'webgpu' });
// 4-bit quantization for smaller model downloads
const q4Pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  { dtype: 'q4' }
);
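The guidance above on device and precision can be condensed into a small decision helper. This is only a sketch: `choosePipelineOptions` and its three flags are hypothetical names, not part of Transformers.js, and leaving `dtype` unset is assumed to fall back to the library default for the chosen device.

```javascript
// Hypothetical helper (not part of Transformers.js): picks pipeline options
// from a few facts about the environment, following the guidance above.
function choosePipelineOptions({ isNode = false, hasWebGPU = false, slowConnection = false } = {}) {
  if (isNode) {
    // Server side: download size is not a constraint, so use full precision.
    return { dtype: 'fp32' };
  }
  const options = {};
  if (slowConnection) {
    // q4 roughly halves the download at a small accuracy cost.
    options.dtype = 'q4';
  }
  if (hasWebGPU) {
    // Only request WebGPU when the browser actually exposes it.
    options.device = 'webgpu';
  }
  return options;
}

// In a browser, WebGPU support can be detected with: 'gpu' in navigator
console.log(choosePipelineOptions({ hasWebGPU: true }));
console.log(choosePipelineOptions({ slowConnection: true }));
console.log(choosePipelineOptions({ isNode: true }));
```

The returned object would be passed as the third argument to pipeline(). Keeping the decision in a pure function makes it easy to test without downloading a model.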
# The pipeline() API
The pipeline() function is the entire public interface for most use cases. It bundles three things (a pre-trained model, a tokenizer, and postprocessing logic) into a single callable object. You don’t touch the tokenizer or model weights directly. You call the pipeline with text and get structured output back.
The signature has three parts:
const pipe = await pipeline(task, model?, options?);
const result = await pipe(input, inferenceOptions?);
task is a string identifier that tells the library what type of model to load and how to handle input and output. model is optional; if you omit it, the library loads the default model for that task. If you specify a model ID (like 'Xenova/distilbert-base-uncased-finetuned-sst-2-english'), that model is loaded from the Hub. options is where you set device, dtype, and progress_callback.
Both steps are async. pipeline() downloads the model and loads it into memory. This is the slow part the first time you run it. The pipe call is usually much faster once the model is loaded. Both return promises, which means your UI needs to handle the loading state.
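Because loading is the slow step, one common pattern is to start it once and share the resulting promise with every caller. The sketch below illustrates that idea under stated assumptions: `createLazyLoader` and `getClassifier` are hypothetical names, and a fake loader stands in for `() => pipeline('sentiment-analysis')` so the example runs without downloading anything.

```javascript
// Hypothetical helper: wraps any async loader so it runs at most once.
function createLazyLoader(load) {
  let promise = null;
  return function get() {
    if (promise === null) {
      promise = load().catch((err) => {
        promise = null; // allow a retry after a failed load
        throw err;
      });
    }
    return promise;
  };
}

// Stand-in for `() => pipeline('sentiment-analysis')`.
let loadCount = 0;
const getClassifier = createLazyLoader(async () => {
  loadCount += 1;
  // Fake pipeline that mimics the output shape of text classification.
  return async (text) => [{ label: 'POSITIVE', score: 0.99 }];
});

const first = getClassifier();
const second = getClassifier();
console.log(first === second); // true: both callers share one load
console.log(loadCount);        // 1
```

Any click handler can then `await getClassifier()` without worrying about whether the model has already been requested.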
A progress_callback lets you track the download and show progress to the user:
// progress_callback fires during model download with status updates
// This is important UX -- users need to know something is happening
const pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  {
    dtype: 'q8',
    progress_callback: (progress) => {
      // progress.status can be: 'initiate', 'download', 'progress', 'done', 'ready'
      if (progress.status === 'progress') {
        const pct = Math.round(progress.progress);
        document.getElementById('progress').textContent =
          `Loading model: ${pct}%`;
      }
      if (progress.status === 'ready') {
        document.getElementById('progress').textContent = 'Model ready';
      }
    }
  }
);
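The status handling above can also be pulled out of the callback into a pure function, which keeps DOM code separate and makes the mapping testable. This is a sketch only: `describeProgress` is a hypothetical name, the message strings are invented, and it assumes nothing about the event beyond the `status` and `progress` fields used above.

```javascript
// Hypothetical helper: maps a progress_callback event to status text.
function describeProgress(event) {
  switch (event.status) {
    case 'initiate':
    case 'download':
      return 'Starting download...';
    case 'progress': {
      // Clamp to 0-100 and tolerate a missing or non-numeric value.
      const raw = Number(event.progress);
      const pct = Number.isFinite(raw)
        ? Math.min(100, Math.max(0, Math.round(raw)))
        : 0;
      return `Loading model: ${pct}%`;
    }
    case 'done':
      return 'Download complete, preparing model...';
    case 'ready':
      return 'Model ready';
    default:
      return ''; // unknown status: show nothing rather than guess
  }
}

console.log(describeProgress({ status: 'progress', progress: 41.6 })); // Loading model: 42%
console.log(describeProgress({ status: 'ready' }));                    // Model ready
```

Inside the callback, the result would simply be assigned to the status element's textContent.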
An important note from the official documentation: Transformers.js is an inference-only library. You can’t fine-tune or train models with it. If your task requires a custom model, training happens elsewhere (Python, cloud), and the resulting ONNX export runs in the browser.
# Task 1: Text Classification
Text classification provides a label and a confidence score for the input text. The most common form is sentiment analysis, positive versus negative, but the same pipeline architecture handles any fixed set of categories on which the model was trained.
What the output looks like:
const result = await classifier('This product completely exceeded my expectations.');
// [{ label: 'POSITIVE', score: 0.9997 }]
The output is an array of objects. Each one contains label (the predicted class as a string) and score (a float between 0 and 1 representing the model’s confidence). A score of 0.9997 means that the model is highly confident. A score of 0.52 means it is barely above the decision threshold; treat it as indeterminate and handle it according to your application logic.
The output is always an array, even for a single input, because the same pipeline call handles batches:
const results = await classifier([
  'This is great!',
  'Completely broken, waste of money.'
]);
// [
//   { label: 'POSITIVE', score: 0.9998 },
//   { label: 'NEGATIVE', score: 0.9991 }
// ]
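One way to handle barely-above-threshold scores is to map them to an explicit "uncertain" outcome before the rest of the application sees them. The sketch below assumes a hypothetical `interpret` helper and an arbitrary cutoff of 0.75; the right cutoff depends on the model and the cost of a wrong label. The sample data mimics the output shape shown above instead of calling a real model.

```javascript
// Hypothetical helper: treats low-confidence predictions as indeterminate.
// The 0.75 default is an arbitrary illustration, not a library value.
function interpret(prediction, minScore = 0.75) {
  const { label, score } = prediction;
  return score >= minScore
    ? { label, score, confident: true }
    : { label: 'UNCERTAIN', score, confident: false };
}

// Sample outputs shaped like the pipeline results shown above.
const sampleResults = [
  { label: 'POSITIVE', score: 0.9998 },
  { label: 'NEGATIVE', score: 0.52 },
];

// Use an arrow function so map's index argument is not taken as minScore.
const decisions = sampleResults.map((r) => interpret(r));
console.log(decisions.map((d) => d.label)); // [ 'POSITIVE', 'UNCERTAIN' ]
```

Because the pipeline always returns an array, the same map call works for a single input and for a batch.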
## Full working example
The example below is a complete, self-contained HTML file. Open it in any modern browser. The model downloads on the first run and is then cached, so subsequent loads are nearly immediate.
Text Classification with Transformers.js
Runs entirely in your browser -- no server, no API calls.
Downloading model on first run (this may take a moment)...
The loadModel function calls pipeline() with the task name, model ID, and options. progress_callback fires frequently during the download and updates the status text so that the user does not see a frozen screen. Once the model is loaded, the button becomes enabled. When the user clicks Classify, classifier(text) runs inference on the already-loaded model, typically in less than 200 ms on a modern laptop. The handler destructures label and score from the first array element, formats the confidence as a percentage, and applies a CSS class for color coding.
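The display step described above can be sketched as a small pure function. This is an illustration and not the original page's code: `formatResult` is a hypothetical name, and the CSS class names (the lowercased label) are an assumption.

```javascript
// Hypothetical formatter for the click handler's display step.
function formatResult(results) {
  // The pipeline returns an array; a single input yields one element.
  const [{ label, score }] = results;
  const percent = (score * 100).toFixed(2);
  return {
    text: `${label} (${percent}% confidence)`,
    cssClass: label.toLowerCase(), // e.g. 'positive' or 'negative'
  };
}

console.log(formatResult([{ label: 'NEGATIVE', score: 0.5 }]));
```

In the page, `text` would go into the result element's textContent and `cssClass` into its className.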
# Task 2: Zero-shot classification
Zero-shot classification does something that regular text classification can’t: it classifies text into categories that you define at runtime, without needing any training data. You pass the text and a list of labels in plain English. The model decides which label is most appropriate based on its understanding of the semantics of the language.
This is useful any time you can’t or don’t want to train a model on labeled examples, which is most of the time in real projects.
## How it works under the hood
The model recasts each candidate label as a natural language inference (NLI) hypothesis. For the label “billing issue”, it generates the hypothesis “This text is about a billing issue” and calculates the probability that the hypothesis is entailed by the input text. The label with the highest entailment score wins. This NLI-based approach is why you can use any descriptive English phrase as a label and get a meaningful result. The model understands the meaning of your labels, not just their surface appearance.
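The mechanism above can be sketched in a few lines. This illustration makes two assumptions: the hypothesis template wording is invented for the example, and single-label scores are assumed to come from a softmax over the per-label entailment logits. The logits are made up; in practice they come from the NLI model. `buildHypotheses` and `rankLabels` are hypothetical names.

```javascript
// Build one NLI hypothesis per candidate label.
// The template wording is an assumption for illustration.
function buildHypotheses(labels, template = 'This text is about {}.') {
  return labels.map((label) => template.replace('{}', label));
}

// Softmax turns raw logits into scores that sum to 1.
function softmax(logits) {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Rank labels by entailment score, highest first.
function rankLabels(labels, entailmentLogits) {
  const scores = softmax(entailmentLogits);
  return labels
    .map((label, i) => ({ label, score: scores[i] }))
    .sort((a, b) => b.score - a.score);
}

const candidateLabels = ['billing', 'technical support', 'shipping'];
console.log(buildHypotheses(candidateLabels));
// Made-up entailment logits standing in for real model output.
console.log(rankLabels(candidateLabels, [3.1, 0.4, -1.2]));
```

Since the labels only ever appear inside a sentence, any phrase that reads naturally in the template can serve as a label.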
What the output looks like:
const classifier = await pipeline('zero-shot-classification',
  'Xenova/bart-large-mnli');
const result = await classifier(
  'My invoice is wrong and I was charged twice.',
  ['billing', 'technical support', 'shipping', 'returns', 'account access']
);
// {
//   sequence: 'My invoice is wrong and I was charged twice.',
//   labels: ['billing', 'returns', 'account access', 'technical support', 'shipping'],
//   scores: [0.871, 0.063, 0.031, 0.022, 0.013]
// }
The output is an object with three fields. sequence is the original input text. labels is an array of your candidate labels, ordered from highest to lowest score. scores is an array of confidence scores in the same order. The first element of both arrays is always the winning prediction. The scores across all labels sum to approximately 1 when multi_label is false (the default).
Setting multi_label: true changes the behavior: each label is scored independently rather than competing with the others, so several labels can all receive high scores at once. Use this when the text potentially belongs to multiple categories at once.
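Reading the result differs slightly between the two modes: with the default, the first entry is the answer, while with multi_label: true you typically keep every label above some cutoff. The sketch below uses hypothetical helpers (`topLabel`, `labelsAbove`) and an arbitrary 0.5 cutoff. The first sample repeats the output shown above; the second uses made-up scores to show that independent scores need not sum to 1.

```javascript
// Hypothetical helper: the winning prediction is always at index 0.
function topLabel(result) {
  return { label: result.labels[0], score: result.scores[0] };
}

// Hypothetical helper for multi_label: true.
// The 0.5 default cutoff is an arbitrary illustration.
function labelsAbove(result, threshold = 0.5) {
  return result.labels.filter((_, i) => result.scores[i] >= threshold);
}

// Default mode: the sample output shown above.
const single = {
  sequence: 'My invoice is wrong and I was charged twice.',
  labels: ['billing', 'returns', 'account access', 'technical support', 'shipping'],
  scores: [0.871, 0.063, 0.031, 0.022, 0.013],
};
console.log(topLabel(single)); // { label: 'billing', score: 0.871 }

// multi_label-style result with made-up scores.
const multi = {
  sequence: 'I was charged twice and want to return the item.',
  labels: ['billing', 'returns', 'shipping'],
  scores: [0.93, 0.81, 0.04],
};
console.log(labelsAbove(multi)); // [ 'billing', 'returns' ]
```

For ticket routing, the single-label form picks one department, and the multi-label form would let a ticket be copied to every department it plausibly concerns.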
## Full working example
Zero-Shot Classifier -- Support Ticket Router
Paste a support ticket. The model routes it to the right department
with no training data needed.
Downloading model on first run...