ByteDance launches Bagel, a new AI model it claims excels at image generation and editing
Bagel: ByteDance claims that Bagel edits images better than other existing open-source VLMs. It can add emotions to an image, remove, change or add an element, transfer styles, and perform free-form editing, i.e. make changes without being confined to a fixed framework.

Chinese tech giant ByteDance has introduced its new multimodal artificial intelligence (AI) model, Bagel. It is a visual language model (VLM) that can not only comprehend images but also create and edit them. The biggest news is that the firm has open-sourced the model, which is now available to download from popular AI platforms such as GitHub and Hugging Face.
Features of Bagel:
- Multimodal input: Capable of understanding and processing both text and images simultaneously.
- 14 billion parameters: Only 7 billion of them are active at a time.
- Interleaved training data: The model is trained on text and images together, allowing Bagel to make better connections between the two.
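To put the parameter counts listed above in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone might need. It assumes 16-bit weights (2 bytes per parameter), which the article does not state, and it ignores activations and other runtime overhead; the helper name `weight_memory_gb` is purely illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 14e9   # total parameters, as reported in the article
ACTIVE_PARAMS = 7e9   # parameters active at a time, as reported in the article

# Share of the model that is active at any one time.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: 2 bytes per parameter (16-bit weights such as bfloat16 or float16).
total_gb = weight_memory_gb(TOTAL_PARAMS, 2)
active_gb = weight_memory_gb(ACTIVE_PARAMS, 2)

print(f"Active fraction: {active_fraction:.0%}")
print(f"All weights at 16-bit: ~{total_gb:.0f} GB")
print(f"Active weights at 16-bit: ~{active_gb:.0f} GB")
```

Under these assumptions, half of the parameters are active at a time, but all 14 billion would presumably still need to be stored, so the full figure is the more relevant one when judging whether the model fits on a given machine.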
ByteDance claims that Bagel edits images better than other existing open-source VLMs. It can handle tasks such as adding emotions to an image, removing, replacing or adding an element, style transfer, and free-form editing, i.e. making changes without being confined to a fixed framework.
Bagel has been trained to understand the world in visual form, including the relationships between objects and the effects of natural factors such as light and gravity. ByteDance says that in its internal tests, Bagel outperformed Qwen2.5-VL-7B at image understanding, Janus-Pro-7B and Flux-1-dev at image generation, and Gemini-2-exp at image editing on the GEdit-Bench test.