SGLang Diffusion Cookbook

A comprehensive cookbook for diffusion models in SGLang, demonstrating SGLang's performance advantages for image and video generation workloads.

🎯 What You'll Find Here

This cookbook aggregates battle-tested SGLang recipes covering:

Models: Mainstream image and video generation models
Use Cases: Inference serving, deployment strategies
Hardware: GPU and CPU configurations, optimization for different accelerators
Best Practices: Configuration templates, performance tuning, troubleshooting guides

Each recipe provides step-by-step instructions to help you quickly implement SGLang solutions for your specific requirements.

🚀 Quick Start

Browse the recipe index above to find your model
Follow the step-by-step instructions in each guide
Adapt configurations to your specific hardware and requirements
Join our community to share feedback and improvements

The SGLang diffusion cookbook directory structure is shown below:

sgl-cookbook/docs/diffusion/
├── README.md              # Main cookbook (this file)
├── Qwen-Image/            # Qwen-Image series models docs
│   ├── Qwen-Image.md
│   └── Qwen-Image-Edit.md
├── Wan/                   # Wan series models docs
│   ├── Wan2.1.md
│   └── Wan2.2.md
├── Z-Image/               # Z-Image series models docs
│   └── Z-Image-Turbo.md
└── ...

🤝 Contributing

We believe the best documentation comes from practitioners. Whether you've optimized SGLang for a specific model, solved a tricky deployment challenge, or discovered performance improvements, we encourage you to contribute your recipes!

💪 How to Contribute

Comment below if interested (mention which role)
Join discussion on implementation details
Fork repo and work on assigned section
Submit PR following SGLang cookbook standards
Iterate based on review feedback

To contribute:

# Fork the repo and clone locally
git clone https://github.com/YOUR_USERNAME/sglang-cookbook.git
cd sglang-cookbook

# Create a new branch
git checkout -b add-my-recipe

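# Add your recipe following the template in DeepSeek-V3.2,
# then commit and push the branch to your fork
git add docs/diffusion/
git commit -m "Add diffusion recipe for <model>"
git push origin add-my-recipe

# Open a PR from your fork on GitHub!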

Tips for Best Practices

If you have sufficient VRAM, consider disabling CPU offload options for better performance. You can check the console output to determine which components can safely remain resident on the GPU:

Peak GPU memory: 52.51 GB, Remaining GPU memory at peak: 27.14 GB. Components that can stay resident: [text_encoder]
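If you save the console output to a file, a small shell helper can pull these numbers out so you can decide which offload options to turn off. This is a minimal sketch that assumes the summary line has exactly the format shown above; `parse_resident_hint` is a hypothetical helper, not part of SGLang.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SGLang): parse the memory summary line.
# Assumption: the line format matches the example printed above exactly.

# Reads log text on stdin and sets three globals:
#   peak       - peak GPU memory in GB
#   remaining  - remaining GPU memory at peak in GB
#   resident   - components reported as able to stay on the GPU
parse_resident_hint() {
  local line
  # Keep only the last summary line in case the log contains several runs.
  line=$(grep 'Components that can stay resident:' | tail -n 1)
  if [ -z "$line" ]; then
    return 1
  fi
  peak=$(printf '%s\n' "$line" | sed -n 's/.*Peak GPU memory: \([0-9.]*\) GB.*/\1/p')
  remaining=$(printf '%s\n' "$line" | sed -n 's/.*Remaining GPU memory at peak: \([0-9.]*\) GB.*/\1/p')
  resident=$(printf '%s\n' "$line" | sed -n 's/.*Components that can stay resident: \[\(.*\)].*/\1/p')
}

# Demo on the sample line shown above.
sample='Peak GPU memory: 52.51 GB, Remaining GPU memory at peak: 27.14 GB. Components that can stay resident: [text_encoder]'
if parse_resident_hint <<< "$sample"; then
  echo "Peak: ${peak} GB, headroom at peak: ${remaining} GB"
  echo "Candidates to keep on the GPU: ${resident}"
fi
```

To use it on a real run, redirect a saved log into the function (for example `parse_resident_hint < run.log`, where `run.log` is a hypothetical file name) rather than piping into it, so the variables stay set in the current shell.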

--dit-layerwise-offload is enabled by default for video models, but it does not always improve performance. Feel free to toggle this option and compare results on your hardware.

📖 Resources

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Let's build this resource together! 🚀 Star the repo and contribute your recipes to help the SGLang community grow.
