This is an automated archive made by the Lemmit Bot.
The original was posted on /r/stablediffusion by /u/FennelFetish on 2024-10-25 21:52:23+00:00.
I’ve been working on a tool for creating image datasets.
Initially built as an image viewer with comparison and quick cropping functions, qapyq now includes a captioning interface and supports multi-modal models and LLMs for automated batch processing.
A key concept is storing multiple captions in intermediate .json files, which can then be combined and refined with your favourite LLM and custom prompt(s).
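As a hedged illustration of that workflow (the actual .json schema qapyq uses is not shown in this post, so the field names below are assumptions), a sidecar file can hold captions from several sources, which are then merged into a single refinement prompt for an LLM:

```python
import json

# Hypothetical sidecar data: one .json per image, holding captions
# from multiple sources. Field names here are illustrative only.
sidecar = {
    "captions": {
        "tags": "1girl, outdoors, smiling",       # e.g. from a WD-style tagger
        "caption": "A woman smiling in a park.",  # e.g. from a multi-modal model
    }
}

def build_refine_prompt(data: dict) -> str:
    """Combine all stored captions into one prompt for an LLM to refine."""
    parts = [f"{name}: {text}" for name, text in data["captions"].items()]
    return "Merge these descriptions into one caption:\n" + "\n".join(parts)

# Round-trip through JSON as a sidecar file would, then build the prompt.
prompt = build_refine_prompt(json.loads(json.dumps(sidecar)))
```

The refined output could then be written back to the same .json file, keeping the original captions intact for later re-runs with different prompts.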
Features:
Tabbed image viewer
- Zoom/pan and fullscreen mode
- Gallery, Slideshow
- Crop, compare, take measurements
Manual and automated captioning/tagging
- Drag-and-drop interface and colored text highlighting
- Tag sorting and filtering rules
- Further refinement with LLMs
- GPU acceleration with CPU offload support
- On-the-fly NF4 and INT8 quantization
Supports JoyTag and WD for tagging; InternVL2, MiniCPM, Molmo, Ovis and Qwen2-VL for automatic captioning; and GGUF-format models for LLM-based refinement.
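For context on the quantization and offload bullets above: with the Hugging Face Transformers and bitsandbytes stack, on-the-fly NF4 or INT8 quantization is typically configured as below. This is a sketch of the general technique, not qapyq's confirmed implementation, and the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4: 4-bit NormalFloat quantization applied while the weights are loaded.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# INT8 with CPU offload: layers that do not fit in VRAM stay on the CPU.
int8_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Hypothetical model id; device_map="auto" spreads layers across GPU and CPU.
# model = AutoModelForCausalLM.from_pretrained(
#     "org/vision-language-model",
#     quantization_config=nf4_config,
#     device_map="auto",
# )
```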
Download and further information are available on GitHub.
Given the importance of quality datasets in training, I hope this tool can assist creators of models, finetunes and LoRAs.
Looking forward to your feedback! Do you have any good prompts to share?
Screenshots:
Overview of qapyq’s modular interface
Apply sorting and filtering rules
Edit quickly with drag-and-drop support
Batch caption with multiple prompts sent sequentially