Draw Attention - translate your message into a remarkable drawing

Anshu Raj

5th Jan, 2026

When Language Meets Visual Design: Can Large Language Models Improve Automated Image Composition?

This paper explores how Large Language Models can improve automated image composition by applying visual design principles such as balance, hierarchy, and spatial organization. It discusses how language-driven reasoning can guide AI image generation systems to create more structured and aesthetically coherent designs. The study also highlights challenges, evaluation methods, and future research directions for multimodal design-aware AI systems.

Tags: Large Language Models, Multimodal AI, Visual Design, Automated Image Composition

Anshu Raj

5th Jan, 2026

Can Large Language Models Understand Visual Aesthetics for Intelligent Image Editing?

This paper explores how Large Language Models can support intelligent image editing through aesthetic reasoning and multimodal understanding. It focuses on how AI systems can analyze composition, lighting, color harmony, and visual balance to generate structured editing recommendations. The study proposes a language-guided editing framework that improves image quality while aligning more closely with human aesthetic preferences and creative design principles.

Tags: Large Language Models, Multimodal AI, Visual Design, AI Image Editing

Anshu Raj

5th Jan, 2026

A Structured Three-Stage Pipeline for Compositional Text-to-Image Generation with Editable Layouts and Object-Wise Attention

This paper presents a structured three-stage pipeline for controllable text-to-image generation using language understanding, editable layouts, and object-wise attention control. The framework improves compositional grounding by separating prompt parsing, layout planning, and image synthesis into independent stages. Experimental results show stronger object accuracy, spatial consistency, and attribute fidelity compared to existing diffusion-based generation methods, while maintaining high visual quality.
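The separation into three stages can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the `Box` type, the equal-width column layout, and the text-only "synthesis" step are all hypothetical stand-ins chosen only to show how an editable layout sits between prompt parsing and image synthesis.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One object's region on a unit canvas (hypothetical layout record)."""
    label: str
    x: float
    y: float
    w: float
    h: float


def parse_prompt(prompt: str) -> list[str]:
    # Stage 1 (toy): split the prompt into object phrases on "and" and commas.
    parts = prompt.replace(",", " and ").split(" and ")
    return [p.strip() for p in parts if p.strip()]


def plan_layout(objects: list[str]) -> list[Box]:
    # Stage 2 (toy): give each object an equal-width column of the canvas.
    n = len(objects)
    width = 1.0 / n if n else 0.0
    return [Box(obj, i * width, 0.0, width, 1.0) for i, obj in enumerate(objects)]


def synthesize(layout: list[Box]) -> list[str]:
    # Stage 3 (placeholder): describe each region instead of rendering pixels.
    return [f"{b.label} @ x={b.x:.2f}, w={b.w:.2f}" for b in layout]


objects = parse_prompt("a red cat and a blue dog")
layout = plan_layout(objects)
layout[1].x = 0.25  # the layout is plain data, so it can be edited before synthesis
regions = synthesize(layout)
```

Because each stage only consumes the previous stage's output, the layout can be inspected or adjusted by hand without re-running prompt parsing.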

Tags: Multimodal AI, Text-to-Image Generation, Diffusion Models, Structured Layout Modeling

Anshu Raj

5th Jan, 2026

DrawBench: A Benchmark for High-Level Intent Multi-Format Creative Outputs

This paper introduces DrawBench, a benchmark framework designed to evaluate how generative AI systems handle real-world design workflows across raster images, vector graphics, and editable infographic formats. The study focuses on measuring creative intent understanding, layout precision, structural consistency, and multi-step editing performance rather than only visual quality. Experimental analysis highlights the strengths and limitations of diffusion, vector-based, and instruction-tuned multimodal models in professional design-oriented tasks.

Tags: Multimodal AI, Visual Design, Design Benchmarking, Multi-Format AI

Drawify Publication
