Performer Identification

Stash Sense identifies performers in your scenes and images using face recognition against a database of 108,000+ performers sourced from multiple Stash-Box endpoints.

Prerequisites

Sprite sheets generated for scenes you want to identify (Stash: Settings > Tasks > Generate > Sprites)
Face recognition database downloaded (plugin Settings tab > Database > Update)
ONNX models downloaded (plugin Settings tab > Models > Download All)

Without sprite sheets, the sidecar has no frames to analyze. Without the database and models, there's nothing to match against.

How It Works

Click Identify Performers on any scene page in Stash
The sidecar extracts frames from the scene's sprite sheet (no video decoding required)
Faces are detected using RetinaFace and aligned via a 5-point similarity transform
Each face is embedded using two models (FaceNet512 + ArcFace) with flip-averaging for stability
Embeddings are searched against Voyager vector indices containing 366,000+ face references
Results are clustered by person — the same performer appearing across multiple frames is grouped together
Matched performers are shown with confidence scores and one-click tagging
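
Two of the steps above, frame extraction from a sprite sheet and flip-averaged embedding, can be sketched in a few lines. This is an illustrative sketch, not the sidecar's actual code: the function names, the grid dimensions, and `toy_embed` (a stand-in for a real face embedding model) are all assumptions made for the example.

```python
from typing import Callable

import numpy as np


def split_sprite_sheet(sheet: np.ndarray, rows: int, cols: int) -> list[np.ndarray]:
    """Cut an H x W x C sprite sheet into rows * cols equally sized tiles.

    The grid size is passed in by the caller; it is an assumption here.
    """
    tile_h = sheet.shape[0] // rows
    tile_w = sheet.shape[1] // cols
    return [
        sheet[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def flip_averaged_embedding(
    face: np.ndarray, embed: Callable[[np.ndarray], np.ndarray]
) -> np.ndarray:
    """Average the embeddings of a face crop and its horizontal mirror."""
    original = l2_normalize(embed(face))
    mirrored = l2_normalize(embed(face[:, ::-1]))  # flip along the width axis
    return l2_normalize(original + mirrored)


def toy_embed(face: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a real model: mean intensity per pixel column."""
    return face.mean(axis=(0, 2))
```

Because the frames come from an image that already exists on disk, slicing is plain array indexing, which is why no video decoding is needed. Averaging the original and mirrored embeddings makes the result less sensitive to which way a face happens to be turned.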

Matching Modes

Clustered frequency matching (default) — Groups detected faces by person using cosine distance, then frequency-matches within each cluster. Multi-frame appearances boost confidence — a performer appearing in 30 out of 60 frames is a much stronger signal than a single-frame detection.
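
As a rough illustration of the idea, the sketch below groups face embeddings with a simple greedy rule and then counts which performer is the most frequent top match within each group. The greedy clustering rule, the `threshold` value of 0.4, and the function names are assumptions chosen for the example; the plugin's actual clustering method may differ.

```python
from collections import Counter

import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance: 0.0 for identical directions, 1.0 for orthogonal ones."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cluster_faces(embeddings: list[np.ndarray], threshold: float = 0.4) -> list[list[int]]:
    """Greedy clustering: a face joins the first cluster whose seed face is
    within `threshold`, otherwise it starts a new cluster.
    Returns lists of indices into `embeddings`."""
    clusters: list[list[int]] = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine_distance(emb, embeddings[cluster[0]]) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def frequency_match(clusters: list[list[int]], top_match_per_face: list[str]) -> list[dict]:
    """Within each cluster, pick the performer that is the top match most often."""
    results = []
    for cluster in clusters:
        votes = Counter(top_match_per_face[i] for i in cluster)
        performer, count = votes.most_common(1)[0]
        results.append({"performer": performer, "votes": count, "faces": len(cluster)})
    return results
```

A cluster in which the same performer wins most of the votes is stronger evidence than any single face match, which is the intuition behind weighting multi-frame appearances more heavily.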

Tagged-performer boost — Performers already tagged on the scene receive a small distance bonus (+0.03), reducing false negatives for known cast members.
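
One plausible reading of the boost, since lower distances are better, is that 0.03 is subtracted from the distance of any candidate already tagged on the scene. The sketch below follows that reading; the subtraction, the clamp at zero, and the function name are assumptions, and only the 0.03 value comes from the text above.

```python
TAGGED_BONUS = 0.03  # value from the text; applying it as a subtraction is an assumption


def apply_tagged_boost(
    candidates: list[tuple[str, float]],
    tagged_ids: set[str],
    bonus: float = TAGGED_BONUS,
) -> list[tuple[str, float]]:
    """Lower the distance of already-tagged performers, then re-rank.

    candidates: (performer_id, cosine_distance) pairs, lower is better.
    """
    boosted = []
    for performer_id, distance in candidates:
        if performer_id in tagged_ids:
            distance = max(0.0, distance - bonus)  # never go below a perfect match
        boosted.append((performer_id, distance))
    return sorted(boosted, key=lambda pair: pair[1])
```

With candidates `[("a", 0.40), ("b", 0.42)]` and `"b"` already tagged, `"b"` drops to roughly 0.39 and moves ahead of `"a"`.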

Gallery and Image Identification

Face recognition extends to gallery images, which are typically higher quality and better-framed than video frames.

Single image or full gallery identification
Results grouped by performer across images with best/average distance
Fingerprint caching avoids re-processing on subsequent requests
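
Grouping gallery results by performer with a best and an average distance can be pictured as a small aggregation step. The sketch below assumes each match arrives as an `(image_id, performer_id, distance)` tuple; that input shape and the output field names are hypothetical.

```python
from collections import defaultdict


def group_by_performer(matches: list[tuple[str, str, float]]) -> list[dict]:
    """Aggregate per-image matches into one row per performer.

    matches: (image_id, performer_id, distance) tuples, lower distance is better.
    """
    distances: dict[str, list[float]] = defaultdict(list)
    images: dict[str, set[str]] = defaultdict(set)
    for image_id, performer_id, distance in matches:
        distances[performer_id].append(distance)
        images[performer_id].add(image_id)

    summary = [
        {
            "performer": performer_id,
            "best_distance": min(dists),
            "avg_distance": sum(dists) / len(dists),
            "image_count": len(images[performer_id]),
        }
        for performer_id, dists in distances.items()
    ]
    # Strongest match (lowest best distance) first.
    return sorted(summary, key=lambda row: row["best_distance"])
```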

Additional Signals

Face recognition is the primary signal. Two optional signals can improve results when faces are unclear or absent:

Body proportions — MediaPipe pose estimation extracts shoulder-hip ratio, leg-torso ratio, and arm-span-height ratio. Mismatches apply a penalty multiplier. Enabled via Settings.
Tattoo presence — YOLO-based detection. If the query shows tattoos but a candidate has none, a penalty is applied. Matching tattoo locations give a small boost. Requires the optional tattoo detection models.

Fusion: final_score = face_score × body_multiplier × tattoo_multiplier. Missing signals are neutral (1.0 multiplier).
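
The fusion rule translates directly into code. In this minimal sketch, a missing signal is represented as `None` and treated as a neutral 1.0 multiplier; the function name and the use of `None` are assumptions for illustration.

```python
from typing import Optional


def fuse_scores(
    face_score: float,
    body_multiplier: Optional[float] = None,
    tattoo_multiplier: Optional[float] = None,
) -> float:
    """final_score = face_score * body_multiplier * tattoo_multiplier.

    A signal that was not computed (None) contributes a neutral 1.0.
    """
    body = 1.0 if body_multiplier is None else body_multiplier
    tattoo = 1.0 if tattoo_multiplier is None else tattoo_multiplier
    return face_score * body * tattoo
```

Because the signals are multiplied rather than added, a disabled or unavailable signal leaves the face score untouched instead of pulling it toward some default.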

Performance

On a GTX 1080 (8GB VRAM), a typical scene (60 frames) processes in ~5 seconds. The 3-phase batch pipeline (extract → detect → embed+match) processes all faces in bulk rather than one at a time.

CPU mode works but is significantly slower (~2-3 seconds per frame for face detection vs ~200ms with GPU).

Understanding Results

Each detected person shows:

Face thumbnails — Cropped faces from the scene frames
Best match — Top performer match from the database
Distance score — Lower is better (cosine distance, 0.0 = perfect match)
Appearances — How many frames this person appeared in

Results are split into two groups:

Multi-frame detections — Performers appearing in multiple frames (shown prominently)
Single-frame detections — One-off detections (in a collapsible section, less reliable)
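
The split between the two groups depends only on the appearance count. A minimal sketch, assuming each result is a dict with hypothetical `appearances` and `distance` keys:

```python
def split_by_appearances(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate multi-frame detections from single-frame ones.

    Multi-frame results are ordered by appearance count (most first),
    with lower distance breaking ties.
    """
    multi = [r for r in results if r["appearances"] > 1]
    single = [r for r in results if r["appearances"] == 1]
    multi.sort(key=lambda r: (-r["appearances"], r["distance"]))
    return multi, single
```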

Result Actions

Button	Action
Add to Scene	Links the performer to the scene
Already tagged on scene	Shown instead of "Add to Scene" for performers already tagged
View on Stash-Box	Opens the performer's page on the source Stash-Box endpoint