Google DeepMind Engineer Generates Isometric Pixel-Art NYC Map Using Qwen

Google DeepMind Senior Staff Engineer Andy Coenen generated a detailed isometric pixel-art map of Manhattan. He fine-tuned Qwen-Image-Edit on 40 custom image pairs and ran it across 50 GPU instances, processing roughly 40,000 tiles in a few hours.
Impact: Medium
Why it matters
The project shows that an open-source image-editing model, fine-tuned on a very small but high-quality dataset, can generate stylized assets at scale.
TL;DR
- A Google DeepMind engineer built a full pixel-art map of Manhattan for very low GPU cost.
- Fine-tuning Qwen-Image-Edit required only 40 high-quality training pairs.
- The project combined the Google Maps 3D tiles API with custom image processing at scale.
Key facts
- Fine-tuning dataset size: 40 hand-paired examples
- Estimated tiles generated: 40,000
- Parallel GPU instances: 50
- Base model: Qwen-Image-Edit
The Pipeline: From Satellite to Pixels
Andy Coenen pulled New York's real-world geometry from the Google Maps 3D tiles API and sliced the result into individual tiles. To turn these photographic, perspective-heavy tiles into clean isometric pixel art, he used Qwen-Image-Edit, an open-source image-editing model.
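The slicing step can be pictured as laying a fixed grid over a rendered canvas. The following is a minimal sketch under stated assumptions: the article does not give tile dimensions or describe how the slicing was done, so the 512-pixel tile size, the canvas size, and the names `TileBox` and `iter_tile_boxes` are all hypothetical.

```python
from typing import Iterator, NamedTuple


class TileBox(NamedTuple):
    """Grid position and pixel bounds of one tile (hypothetical structure)."""
    col: int
    row: int
    left: int
    top: int
    right: int
    bottom: int


def iter_tile_boxes(width: int, height: int, tile_size: int) -> Iterator[TileBox]:
    """Yield pixel bounding boxes that cover a width x height canvas in a grid."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    cols = -(-width // tile_size)   # ceiling division
    rows = -(-height // tile_size)
    for row in range(rows):
        for col in range(cols):
            left = col * tile_size
            top = row * tile_size
            # Clamp edge tiles so no box extends past the canvas.
            yield TileBox(
                col, row, left, top,
                min(left + tile_size, width),
                min(top + tile_size, height),
            )


# Assumed canvas and tile size, chosen only for illustration.
boxes = list(iter_tile_boxes(width=2048, height=1536, tile_size=512))
print(len(boxes))  # 4 columns x 3 rows = 12
```

Each box could then be passed to an image library's crop call; the generator itself stays independent of any particular imaging package.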
Ultra-Lean Fine-Tuning
Instead of training a model from scratch or compiling thousands of images, Coenen hand-crafted 40 pairs of training images, each showing "satellite tile → pixel art tile". That small dataset was enough to teach the model the target visual style, which illustrates how little data an instruction-based image-editing model can need for a narrow style-transfer task.
Processing at Scale
Rendering the full map requires approximately 40,000 distinct tiles. Rather than run the job on consumer hardware, Coenen rented 50 GPU instances and ran them in parallel, which works out to about 800 tiles per instance if the work is split evenly. The run took only a few hours and cost little, and the output captures detail ranging from Midtown skyscrapers to specific corporate signage.
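One simple way to divide roughly 40,000 tiles among 50 workers is round-robin sharding. This is a sketch only: the article does not say how the work was actually distributed, and `shard_round_robin` and the integer tile IDs are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_round_robin(items: Sequence[T], num_workers: int) -> List[List[T]]:
    """Split items into num_workers shards, dealing them out one at a time."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Worker i receives items i, i + num_workers, i + 2 * num_workers, ...
    return [list(items[i::num_workers]) for i in range(num_workers)]


# Assumed: tiles are identified by integers 0..39,999.
tile_ids = list(range(40_000))
shards = shard_round_robin(tile_ids, num_workers=50)
print(len(shards), len(shards[0]))  # 50 shards of 800 tiles each
```

Round-robin keeps shard sizes within one item of each other even when the tile count is not a multiple of the worker count, so no instance finishes far later than the rest for that reason alone.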
Try it in 2 minutes
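# Conceptual dataset format for Qwen-Image-Edit fine-tuning.
# Field names and file names are illustrative, not a confirmed schema.
dataset = [
    {
        "image": "satellite_tile_1.png",                   # source tile
        "prompt": "convert to isometric pixel art style",  # edit instruction
        "output": "pixel_tile_1.png",                      # hand-made target tile
    }
]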
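Records in the conceptual format above can be built by matching source and target tiles on a shared index. This sketch assumes the illustrative `satellite_tile_<n>.png` / `pixel_tile_<n>.png` naming from the example; the helper names `tile_index` and `build_records` are hypothetical, and the field names are not a confirmed Qwen-Image-Edit training schema.

```python
import json
from typing import Dict, Iterable, List

PROMPT = "convert to isometric pixel art style"


def tile_index(filename: str) -> str:
    """Return the trailing index of a name like 'satellite_tile_12.png'."""
    stem = filename.rsplit(".", 1)[0]
    return stem.rsplit("_", 1)[-1]


def build_records(inputs: Iterable[str], targets: Iterable[str]) -> List[Dict[str, str]]:
    """Pair each input tile with the target tile that has the same index."""
    targets_by_index = {tile_index(name): name for name in targets}
    records = []
    for name in sorted(inputs, key=lambda n: int(tile_index(n))):
        idx = tile_index(name)
        if idx not in targets_by_index:
            # Fail loudly: with so few pairs, a silent gap would be costly.
            raise KeyError(f"no target tile for {name}")
        records.append(
            {"image": name, "prompt": PROMPT, "output": targets_by_index[idx]}
        )
    return records


records = build_records(
    ["satellite_tile_2.png", "satellite_tile_1.png"],
    ["pixel_tile_1.png", "pixel_tile_2.png"],
)
# One JSON object per line, a common layout for small training manifests.
print("\n".join(json.dumps(r) for r in records))
```

Raising on a missing target rather than skipping it is a deliberate choice here: with only 40 pairs, every example matters.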
✓ When to use
- When translating real-world spatial or photographic data into stylized gaming assets.
- When fine-tuning open-source image-editing models with only a few dozen training pairs.
✕ When NOT to use
- When real-time, interactive generation is needed inside the client browser.
- When pixel-perfect structural accuracy matters more than a consistent visual style.
What to do today
- Explore Qwen-Image-Edit for customized image translation pipelines.
- Use 3D tiles APIs to extract real-world geometry for mockups or game maps.
Sources