Video Labeling

Video labeling is the process of annotating video clips stored in an index so they can be used as training data. Where image labeling annotates a single still frame, video labeling annotates a short clip frame by frame, letting the model learn from motion and context over time.

This is an index workflow. Video fragments live alongside image observations in your indexes and are included automatically when you run a training session.

How it fits into training

When you train a model that supports video, the training runs in two stages. The first stage uses image observations. The second stage uses video fragments together with a mix of images. You do not need to configure this — Vidsy handles it automatically as long as you have labeled video observations in your index.

Labels are frame-precise. An annotation is only applied to the frames where it actually exists. Frames with no annotation produce negative training examples, teaching the model what the absence of a detection looks like.
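To make the frame-precise rule concrete, here is a minimal sketch of how frames could be split into positive and negative examples. It is not Vidsy's implementation: the `Annotation` class, the `split_frames` function, and the millisecond timestamps are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record: one label attached to one exact frame timestamp.
    label: str
    timestamp_ms: int


def split_frames(frame_timestamps_ms, annotations):
    """Partition frames into positives (annotated) and negatives (not annotated)."""
    by_frame = {}
    for ann in annotations:
        by_frame.setdefault(ann.timestamp_ms, []).append(ann)
    # An annotation applies only to the frame whose timestamp it carries.
    positives = {t: by_frame[t] for t in frame_timestamps_ms if t in by_frame}
    # Every other frame becomes a negative example.
    negatives = [t for t in frame_timestamps_ms if t not in by_frame]
    return positives, negatives


frames = [0, 150, 300, 450]
annotations = [Annotation("defect", 150), Annotation("defect", 300)]
positives, negatives = split_frames(frames, annotations)
```

In this example the frames at 150 ms and 300 ms are positives, and the frames at 0 ms and 450 ms are treated as negatives without any explicit "empty" marker.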

Capturing a video fragment

A video fragment is a short MP4 clip saved from a recording into an index. It becomes a video observation that you can annotate.

To capture one, play or scrub a recording to the moment you want, then take a snapshot. In the snapshot dialog, check As video and set the number of analysis frames to determine how long the clip will be. Vidsy extracts the clip centred on your current position and saves it into the active index. Any detections already present at that point in the recording are copied into the fragment with adjusted timestamps.
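As a rough illustration of how the clip length and the adjusted timestamps relate, the sketch below computes a clip window centred on the current position and rebases detection timestamps to the start of the clip. The function names, the 150 ms default interval, and the clamping at the start of the recording are assumptions for this example, not documented Vidsy behaviour.

```python
def clip_window(position_ms, analysis_frames, interval_ms=150):
    """Return (start_ms, end_ms) of a clip centred on position_ms.

    Assumes the clip length is analysis_frames * interval_ms and that the
    window is shifted forward if it would start before the recording does.
    """
    duration_ms = analysis_frames * interval_ms
    start_ms = max(0, position_ms - duration_ms // 2)
    return start_ms, start_ms + duration_ms


def rebase_detections(detection_times_ms, start_ms, end_ms):
    """Keep detections inside the window and shift them to clip-relative time."""
    return [t - start_ms for t in detection_times_ms if start_ms <= t < end_ms]


# 20 analysis frames at 150 ms each gives a 3-second clip around the 10 s mark.
start, end = clip_window(10_000, 20)
relative = rebase_detections([8_000, 9_000, 11_000, 12_000], start, end)
```

Under these assumptions the window runs from 8.5 s to 11.5 s, and the detections at 9 s and 11 s land at 0.5 s and 2.5 s inside the fragment.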

Once a video fragment appears in your search results, open it and use the video controls to play, pause, change playback speed, return to the fragment start, or open the snapshot dialog.

For precise labeling, use Left Arrow and Right Arrow to jump one analysis frame backward or forward. Arrow key navigation uses the analysis frame interval, typically about 150 ms, which matches the resolution the model uses during analysis. This makes it easy to step through a clip without landing on frames that the model will never see.
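The stepping behaviour can be pictured as snapping the playhead to a grid of analysis frames. The following sketch assumes a fixed 150 ms interval and a grid that starts at 0 ms; `step_frame` is a hypothetical helper, and the real interval may differ per model.

```python
def step_frame(current_ms, direction, clip_ms, interval_ms=150):
    """Move to the nearest analysis frame before (-1) or after (+1) current_ms."""
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    if direction > 0:
        # Smallest grid position strictly after the current time.
        target = (current_ms // interval_ms + 1) * interval_ms
    else:
        # Largest grid position strictly before the current time.
        target = ((current_ms - 1) // interval_ms) * interval_ms
    # Stay inside the clip: first frame at 0, last frame on the grid.
    last_frame_ms = (clip_ms // interval_ms) * interval_ms
    return max(0, min(target, last_frame_ms))


next_ms = step_frame(300, +1, clip_ms=3000)   # Right Arrow from 300 ms
prev_ms = step_frame(310, -1, clip_ms=3000)   # Left Arrow from an off-grid time
```

Because every step lands on a multiple of the interval, a paused position reached this way always corresponds to a frame the model analyzes.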

For the full shortcut list, including play/pause, snapshot, and frame copy/paste, see Video Controls.

Annotating frames

With the video paused at a frame, draw ROI boxes or place keypoints the same way you would on an image. Annotations are stored with the exact timestamp of the frame, so they are linked to that precise moment in the clip.

You only need to annotate the frames that matter. Frames without annotations are still included in training as negative examples — you do not have to explicitly mark them as empty.

Copying annotations across frames

When the same object appears across several consecutive frames with little movement, copying saves significant time.

  1. Pause the video at a well-annotated frame.
  2. Press Shift+Ctrl+C to copy all boxes and keypoints at that frame.
  3. Use Left Arrow or Right Arrow to move to the next frame.
  4. Press Shift+Ctrl+V to paste the annotations.

Paste adapts to the target frame. If an existing observation does not yet cover the target frame's timestamp, its time bounds are extended automatically to include it. If no observation exists at that position, a new one is created.

The clipboard is scoped to the current video — you cannot paste into a different clip. All paste operations can be undone.
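The paste rules can be sketched as follows. This is a simplified model rather than Vidsy's data structures: the `Observation` and `Clipboard` classes, matching observations by label, and the `(x, y, w, h)` box tuples are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    # Hypothetical model: one labeled object with time bounds and per-frame boxes.
    label: str
    start_ms: int
    end_ms: int
    boxes: dict = field(default_factory=dict)  # timestamp_ms -> (x, y, w, h)


@dataclass
class Clipboard:
    video_id: str  # the clip the frame data was copied from
    boxes: dict    # label -> (x, y, w, h)


def paste(clipboard, video_id, observations, target_ms):
    """Paste copied boxes at target_ms, extending or creating observations."""
    if clipboard.video_id != video_id:
        raise ValueError("clipboard is scoped to the video it was copied from")
    by_label = {obs.label: obs for obs in observations}
    for label, box in clipboard.boxes.items():
        obs = by_label.get(label)
        if obs is None:
            # No observation at this position: create a new one.
            obs = Observation(label, target_ms, target_ms)
            observations.append(obs)
            by_label[label] = obs
        else:
            # Extend the time bounds so they include the target frame.
            obs.start_ms = min(obs.start_ms, target_ms)
            obs.end_ms = max(obs.end_ms, target_ms)
        obs.boxes[target_ms] = box
    return observations


observations = [Observation("defect", 0, 150, {150: (10, 10, 40, 40)})]
clipboard = Clipboard("clip-1", {"defect": (10, 10, 40, 40), "scratch": (5, 5, 20, 20)})
paste(clipboard, "clip-1", observations, target_ms=300)
```

After the paste in this example, the existing "defect" observation spans 0 ms to 300 ms and a new "scratch" observation starts at 300 ms. Pasting the same clipboard into a different clip raises an error.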

You can also access copy and paste from the right-click context menu on the video card, under Copy special → Frame data and Paste frame.

Reviewing and managing video observations

Video observations appear in search results alongside image observations. Right-clicking one gives you access to the same context menu actions as images, plus the frame copy and paste options.

You can mark a video observation as training or validation from the context menu to control which split it ends up in. When you do not mark observations manually, Vidsy assigns them automatically when training starts, reserving at least one video per category for validation.
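One way to picture the automatic assignment is the sketch below. It honours manual marks and, for each category that has no validation video yet, reserves the first unmarked video for validation. The dictionary layout and the "first unmarked video" choice are assumptions; the actual selection strategy is not described here.

```python
from collections import defaultdict


def assign_splits(videos):
    """Return {video id: split}, keeping manual marks and filling in the rest.

    Each video is a dict with 'id', 'category', and an optional 'split'
    ('training' or 'validation') set manually from the context menu.
    """
    by_category = defaultdict(list)
    for video in videos:
        by_category[video["category"]].append(video)

    result = {}
    for items in by_category.values():
        has_validation = any(v.get("split") == "validation" for v in items)
        for video in items:
            split = video.get("split")
            if split is None:
                if not has_validation:
                    # Reserve at least one video per category for validation.
                    split = "validation"
                    has_validation = True
                else:
                    split = "training"
            result[video["id"]] = split
    return result


videos = [
    {"id": "a", "category": "defect"},
    {"id": "b", "category": "defect"},
    {"id": "c", "category": "ok", "split": "training"},
    {"id": "d", "category": "ok"},
]
splits = assign_splits(videos)
```

Here video "c" keeps its manual training mark, while "a" and "d" are reserved for validation so that each category has at least one validation video.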

Tips

Capture clips around interesting events. The most useful training data covers the moments your model needs to learn — defects, transitions, edge cases. Capture fragments at those moments rather than uniformly across a recording.

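Mix with image observations. Video and image observations work together: the first training stage uses image observations, and the second stage combines video fragments with a mix of images, so a video-capable model needs both.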

Use Learn & Complete after labeling a few clips. Once you have annotated a handful of frames, run Learn on that selection to refine the model, then use Complete to help annotate the rest.

For model setup and training settings, see Model Configuration.