Turn Any Image into Perfect JSON for 100% Consistent Brand Visuals

People have been creating amazing things with Nano Banana Pro, powered by Gemini 3. One thing that often gets missed is consistency. Creating a striking visual once is one thing; practical applications such as brand awareness, marketing, and media require creating it again and again, with minor tweaks and consistent quality. The same approach comes in handy when you see an immersive or engaging ad and want to create a similar-looking campaign for your start-up brand or personal project at very low cost.

Vision-to-JSON is a custom Gem that turns any image into structured JSON. Upload any picture (a movie poster, a product render, a fashion photograph, a logo) and in seconds you get a large, structured JSON object that describes the image's visual details in a machine-readable way.

Feed that JSON into Midjourney, Flux, Ideogram, Leonardo, or Nano Banana Pro, and you can recreate the original image with almost perfect fidelity, sometimes with cleaner results than the original.

Why Vision-to-JSON Beats Every Other Method

Traditional image-to-prompt methods give you a paragraph. Vision-to-JSON gives you a database.

Marketers, UI designers, and prompt engineers are using it daily to clone competitor ads, recreate client references, and build entire mood boards that can be regenerated at different resolutions or aspect ratios without losing a single highlight or shadow gradient.

Full Setup, Takes Less Than 5 Minutes

You will need the ability to create custom Gems on Gemini 3.

Step 1. Go to Gemini

https://gemini.google.com -> Click “Gem manager” (left sidebar)

Step 2. Create a New Gem

Click “New gem” -> Choose “Create from scratch”

Step 3. Name and Add Avatar (optional)

Name: Vision-to-JSON
Avatar: Upload something clean (a black JSON icon works well)

Step 4. Paste the Exact Instructions Below

Copy and paste this entire block into the “Instructions” field (do not change a single word if you want maximum consistency):

You are Vision-to-JSON, an expert forensic image analyst that converts any uploaded image into an extremely detailed, strictly structured JSON object designed for perfect replication in AI image generators.

NEVER summarize. NEVER omit micro-details. NEVER use vague words.

Always output ONLY valid JSON. No extra text, no markdown, no explanations before or after.

Use this exact JSON schema (never change property names):

{
  "metadata": {
    "original_width_px": integer,
    "original_height_px": integer,
    "aspect_ratio": "string like 16:9 or 3:4",
    "dominant_art_style": "string",
    "overall_mood": "string"
  },
  "color_palette": {
    "dominant_colors_hex": ["#hex", "#hex", ... top 6],
    "accent_colors_hex": ["#hex", ...],
    "gradient_directions": ["string description"]
  },
  "lighting": {
    "key_light": "direction + temperature + softness",
    "fill_light": "direction + ratio to key",
    "rim_back_light": "present or absent + color",
    "ambient_light_color": "#hex or description",
    "shadow_hardness": "hard | medium | soft",
    "global_contrast": "low | medium | high | very high"
  },
  "composition": {
    "rule_of_thirds": "subject placement description",
    "leading_lines": "description or none",
    "symmetry": "perfect | approximate | none",
    "negative_space_usage": "description",
    "depth_layers": ["foreground", "midground", "background"]
  },
  "camera": {
    "focal_length_equivalent": "24mm | 50mm | 85mm | 200mm etc.",
    "aperture_visual_effect": "deep DOF | shallow DOF",
    "lens_type": "prime | zoom | anamorphic | tilt-shift etc.",
    "camera_angle": "eye-level | low | high | dutch | overhead",
    "distance_to_main_subject": "close-up | medium | long shot"
  },
  "subjects": [
    {
      "id": 1,
      "type": "person | object | animal | text | logo etc.",
      "gender_appearance": "if person",
      "age_appearance": "if person",
      "ethnicity_appearance": "if relevant",
      "pose": "exact description",
      "clothing": "exact materials, colors, fit, layers",
      "expression": "if person",
      "position_in_frame_x_percent": 0-100,
      "position_in_frame_y_percent": 0-100,
      "relative_size_percent_of_frame": number,
      "detailed_visual_description": "extremely verbose paragraph, every texture and micro-detail"
    }
  ],
  "background": {
    "type": "studio | outdoor | indoor | blurred | bokeh | gradient",
    "detailed_description": "paragraph, never summarize",
    "visible_text_elements": ["exact text + font + color + position"],
    "environmental_details": "weather, time of day, particles, etc."
  },
  "post_processing": {
    "film_grain": "none | light | medium | heavy",
    "vignette": "strength and color",
    "color_grading": "teal-orange | warm | cold | pastel | cinematic etc.",
    "sharpness": "level",
    "chromatic_aberration": "present or absent"
  },
  "micro_details": [
    "every tiny element not covered above: specular highlights, fabric threads, skin pores, dust particles, lens flares, etc. – list as separate strings"
  ],
  "recommended_generators": ["Flux Pro", "Midjourney v6", "Ideogram v2", "SD3", "Leonardo", "Nano-Pro"],
  "exact_prompt_for_nano_pro": "single line prompt that combines everything above, optimized for Nano-Pro 1.5",
  "exact_prompt_for_flux": "single line prompt optimized for Flux",
  "exact_prompt_for_midjourney": "single line prompt with --ar and --stylize values"
}

Analyze the uploaded image with forensic precision and fill every field.

Step 5. Save the Gem

Click Save. You now have a permanent Vision-to-JSON button in your Gemini sidebar.
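
Because the instructions ask for JSON only, a reply from the Gem can be sanity-checked before it goes anywhere else. The following is a minimal Python sketch, not part of the Gem itself. It assumes you have copied the reply into a string; the function names are made up for illustration, and the fence-stripping step is a precaution in case the reply arrives wrapped in a code fence despite the instructions. Only the top-level keys from the schema above are checked.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the listing readable

# Top-level keys and their expected JSON types, taken from the schema
# in the Gem instructions.
REQUIRED_KEYS = {
    "metadata": dict,
    "color_palette": dict,
    "lighting": dict,
    "composition": dict,
    "camera": dict,
    "subjects": list,
    "background": dict,
    "post_processing": dict,
    "micro_details": list,
    "recommended_generators": list,
    "exact_prompt_for_nano_pro": str,
    "exact_prompt_for_flux": str,
    "exact_prompt_for_midjourney": str,
}


def strip_code_fence(reply: str) -> str:
    """Remove a surrounding code fence if the reply was wrapped in one."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def check_reply(reply: str) -> list[str]:
    """Return a list of problems; an empty list means the reply looks usable."""
    try:
        data = json.loads(strip_code_fence(reply))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems
```

If `check_reply` returns anything other than an empty list, asking the Gem to regenerate is usually simpler than repairing the JSON by hand.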

How to Use It Daily (5-Step Workflow)

Open Vision-to-JSON from the sidebar
Drag or upload any image
Wait 15–40 seconds
Copy the giant JSON (Gemini puts a “Copy code” button top-right)
Paste into your favorite image generator with the included ready-made prompts

Most users paste the “exact_prompt_for_nano_pro” or “exact_prompt_for_flux” value directly into the generator and are done in one shot.
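
If you save the JSON to a file or clipboard, pulling out the ready-made prompt can be scripted. This is a rough Python sketch under a few assumptions: the three field names come from the schema above, while the short generator labels (`nano_pro`, `flux`, `midjourney`) and the function name `pick_prompt` are invented for this example.

```python
import json

# Field names come from the schema in the Gem instructions; the short
# generator labels on the left are hypothetical, chosen for this sketch.
PROMPT_FIELDS = {
    "nano_pro": "exact_prompt_for_nano_pro",
    "flux": "exact_prompt_for_flux",
    "midjourney": "exact_prompt_for_midjourney",
}


def pick_prompt(json_text: str, generator: str) -> str:
    """Return the ready-made prompt for one generator as a single line."""
    data = json.loads(json_text)
    field = PROMPT_FIELDS[generator]
    prompt = data.get(field, "")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError(f"{field} is missing or empty")
    # Collapse stray line breaks and repeated spaces into single spaces.
    return " ".join(prompt.split())
```

For example, `pick_prompt(json_text, "flux")` returns the Flux prompt ready to paste, and raises `ValueError` if the Gem left that field empty.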

Real-World Use Cases

Recreate a Vogue cover exactly, then generate 50 variations in different seasons.
Clone AAA game key art for an indie game studio in 11 minutes.
Turn a single Instagram photo into a car-wrap design template for Flux.
Recreate competitor Facebook ads with close visual fidelity.
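
Variations like the seasonal ones above come down to changing one field and leaving the rest of the JSON untouched. Here is a minimal Python sketch of that idea. The helper `make_variants` is hypothetical, and the sample values are invented placeholders, not output from the Gem; only the field names (`background`, `environmental_details`, `lighting`, `key_light`) come from the schema.

```python
import copy
import json


def make_variants(base: dict, section: str, field: str, values: list[str]) -> list[dict]:
    """Copy the base description once per value, changing a single field."""
    variants = []
    for value in values:
        variant = copy.deepcopy(base)  # keep the original untouched
        variant.setdefault(section, {})[field] = value
        variants.append(variant)
    return variants


# Trimmed-down stand-in for a real Vision-to-JSON result (placeholder values).
base = {
    "lighting": {"key_light": "left, warm, soft"},
    "background": {
        "type": "outdoor",
        "environmental_details": "summer noon, clear sky",
    },
}

seasons = [
    "spring morning, light rain",
    "autumn dusk, falling leaves",
    "winter noon, light snow",
]

variants = make_variants(base, "background", "environmental_details", seasons)
for variant in variants:
    print(json.dumps(variant["background"]))
```

Note that this sketch does not rewrite the `exact_prompt_for_*` strings, so for variations you would paste the edited JSON itself rather than the ready-made prompt lines.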

Advantages Summary

100% replicable results across different tools
No more “close but not quite” recreations
Works with photographs, 3D renders, illustrations, logos, screenshots
Saves hours of manual prompt writing
Completely private, runs inside your Gemini account
Improves as the underlying Gemini model gets better

Conclusion

If you create visuals for a living, or simply want god-level control over AI image generation, Vision-to-JSON is the closest thing we have to a universal “copy image” button in 2025. Set it up once, keep it forever, and never again settle for “almost the same” when you need exactly the same.

Five minutes of setup today will save you hundreds of hours (and thousands of dollars) in the coming year. Create your Vision-to-JSON Gem now and start cloning any image perfectly.