Documentation: Gemini Action (AI Analysis)

Overview

The Gemini Action is an automation node that uses Google Gemini AI models to process and analyze different types of content. It can process text, analyze or generate images, process audio, and analyze video.

When to Use This Action

Use this action when you need to:

  • Analyze images automatically (detect objects, read text, identify scenes)
  • Generate images from text descriptions
  • Analyze videos to extract information or detect events
  • Process text with advanced AI capabilities
  • Process audio with recognition and analysis
  • Integrate AI capabilities into your automations

Node Configuration

Node configuration is split into two main sections that you can switch between using the selector at the top:

Section 1: Basic Configuration

Step 1: Configure API Key

  1. In the "API Key" field, enter your Google Gemini API key
  2. This key is required to authenticate with Gemini services
  3. Important: Keep your API key secure and do not share it

How to get an API key:

  1. Go to Google AI Studio
  2. Sign in with your Google account
  3. Create a new API key
  4. Copy the key and paste it into this field

Step 2: Select Resource Type

  1. In the "Resource Type" field, select the type of content you want to process:
     • Text: For text processing
     • Image: For image analysis or generation
     • Audio: For audio processing
     • Video: For video analysis

Note: The selected resource type determines which operations are available and which additional fields appear.

Step 3: Select Model

  1. In the "Model" field, select the Gemini model you want to use:

Available models:

  • Gemini 2.5 Pro: Most capable model, ideal for complex tasks
  • Gemini 2.5 Flash: Fast and efficient, ideal for most use cases
  • Gemini 2.5 Flash Image: Optimized for image processing
  • Gemini 2.5 Flash Lite: Lighter, faster, and more economical
  • Gemini 2.0 Flash: Previous version of the Flash model
  • Gemini 2.0 Flash Lite: Previous lightweight version

Recommendations:

  • For image analysis: Use Gemini 2.5 Flash Image or Gemini 2.5 Flash
  • For image generation: Use Gemini 2.5 Flash
  • For video analysis: Use Gemini 2.5 Pro or Gemini 2.5 Flash
  • For text processing: Use Gemini 2.5 Flash (faster) or Gemini 2.5 Pro (more capable)
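If you generate JSON Editor configurations from a script, the recommendations above can be captured as a small lookup table. This is a sketch only. The mapping from display name to `model_id` assumes the node accepts the public Gemini API model IDs; only `gemini-2.5-flash` is confirmed by the JSON Editor example on this page. `recommend_model` and the task names are hypothetical.

```python
# Hypothetical helper encoding this page's model recommendations.
# Assumption: the node accepts these public Gemini API model IDs;
# only "gemini-2.5-flash" appears in this page's JSON example.
MODEL_IDS = {
    "Gemini 2.5 Pro": "gemini-2.5-pro",
    "Gemini 2.5 Flash": "gemini-2.5-flash",
    "Gemini 2.5 Flash Image": "gemini-2.5-flash-image",
    "Gemini 2.5 Flash Lite": "gemini-2.5-flash-lite",
    "Gemini 2.0 Flash": "gemini-2.0-flash",
    "Gemini 2.0 Flash Lite": "gemini-2.0-flash-lite",
}

# Task -> (default display name, alternative display name or None).
RECOMMENDATIONS = {
    "image_analysis": ("Gemini 2.5 Flash Image", "Gemini 2.5 Flash"),
    "image_generation": ("Gemini 2.5 Flash", None),
    "video_analysis": ("Gemini 2.5 Pro", "Gemini 2.5 Flash"),
    "text": ("Gemini 2.5 Flash", "Gemini 2.5 Pro"),
}


def recommend_model(task: str, prefer_alternative: bool = False) -> str:
    """Return the model_id recommended for a task (hypothetical task names)."""
    if task not in RECOMMENDATIONS:
        raise ValueError(f"unknown task: {task!r}")
    default, alternative = RECOMMENDATIONS[task]
    name = alternative if (prefer_alternative and alternative) else default
    return MODEL_IDS[name]


print(recommend_model("video_analysis"))                 # gemini-2.5-pro
print(recommend_model("text", prefer_alternative=True))  # gemini-2.5-pro
```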

Section 2: Prompt Configuration

Step 4: Select Operation (Image and Video Only)

If you selected Image as the resource type:

  1. In the "Operation" field, select one of:
     • Generate Image: To create images from descriptions
     • Analyze Image: To analyze existing images

If you selected Video as the resource type:

  1. In the "Operation" field, select:
     • Analyze Video: To analyze videos

Note: For Text and Audio resources, this field does not appear since only one operation is available.

Step 5: Configure Image URLs (Image Analysis Only)

If you selected Image as the resource type and Analyze Image as the operation:

  1. In the "Image URLs" field, enter the URLs of the images you want to analyze
  2. Enter one URL per line
  3. Example:
    https://example.com/image1.jpg
    https://example.com/image2.png
    https://example.com/image3.jpg
    

Requirements:

  • URLs must be publicly accessible or otherwise reachable from Gemini’s servers
  • Supported formats: JPG, PNG, GIF, WebP
  • You can analyze multiple images in a single run

Step 6: Configure Video URLs (Video Analysis Only)

If you selected Video as the resource type and Analyze Video as the operation:

  1. In the "Video URLs" field, enter the URLs of the videos you want to analyze
  2. Enter one URL per line
  3. Example:
    https://example.com/video1.mp4
    https://example.com/video2.mov
    

Requirements:

  • URLs must be publicly accessible or otherwise reachable from Gemini’s servers
  • Supported formats: MP4, MOV, AVI, WebM
  • You can analyze multiple videos in a single run
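In the JSON Editor, the "one URL per line" fields from Steps 5 and 6 become the `image_urls` and `video_urls` arrays. The sketch below turns the field text into such an array and pre-checks each entry against the formats listed above before a run. It inspects only the URL text (scheme and file extension). It cannot tell whether Gemini's servers can actually reach the file, and URLs without a file extension would need a different check. `parse_url_field` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats from the requirements above (".jpeg" is the alternative spelling of JPG).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm"}


def parse_url_field(text: str, allowed_extensions: set[str]) -> list[str]:
    """Split a one-URL-per-line field into a list, validating each entry."""
    urls = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        url = raw.strip()
        if not url:
            continue  # ignore blank lines
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"line {line_no}: not an http(s) URL: {url!r}")
        # Check the extension of the path only (query strings are ignored).
        suffix = PurePosixPath(parsed.path).suffix.lower()
        if suffix not in allowed_extensions:
            raise ValueError(f"line {line_no}: unsupported file type {suffix!r}")
        urls.append(url)
    return urls


field = """
https://example.com/video1.mp4
https://example.com/video2.mov
"""
print(parse_url_field(field, VIDEO_EXTENSIONS))
# ['https://example.com/video1.mp4', 'https://example.com/video2.mov']
```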

Step 7: Write the Prompt

  1. In the "Prompt" field, write the instruction or question you want Gemini to process
  2. Prompt content varies by operation:

For Generate Image:

Describe the image you want in detail. Example:
"A Siamese cat sitting on a windowsill with an ocean view at sunset, realistic style, high quality"

For Analyze Image:

Ask specific questions about the image. Examples:
"What objects appear in this image?"
"Describe the scene in detail"
"Is there any visible text in the image? If so, what does it say?"
"How many people are in the image and what are they doing?"

For Analyze Video:

Specify what you want analyzed in the video. Examples:
"Describe the main actions that occur in the video"
"Are there people in the video? What are they doing?"
"Detect any unusual events or suspicious movement"
"Summarize the video content in 3–5 key points"

For Process Text:

Enter the text to process or your instructions. Examples:
"Summarize the following text: [your text here]"
"Translate to English: [your text here]"
"Extract keywords from: [your text here]"

For Process Audio:

Specify what you want to do with the audio. Examples:
"Transcribe this audio to text"
"Identify the spoken language"
"Summarize the audio content"

Tips for effective prompts:

  • Be specific and clear in your instructions
  • Include context when relevant
  • For analysis, ask concrete questions
  • For generation, provide specific visual details
  • You can use multiple instructions separated by periods or line breaks

Usage Examples

Example 1: Analyze Security Image

Basic configuration:

  • API Key: your-api-key-here
  • Resource Type: Image
  • Model: Gemini 2.5 Flash Image

Prompt configuration:

  • Operation: Analyze Image
  • Image URLs:
      https://security.example.com/camera1/snapshot.jpg
  • Prompt:
      Analyze this security image. Are there any people visible?
      If so, describe their position and activity.
      Is there any suspicious object or unusual activity?

Use case: When motion is detected, the automation runs and analyzes the current camera snapshot.

Example 2: Generate Image for Notification

Basic configuration:

  • API Key: your-api-key-here
  • Resource Type: Image
  • Model: Gemini 2.5 Flash

Prompt configuration:

  • Operation: Generate Image
  • Prompt:
      A modern illustration of a home automation system,
      showing connected devices, minimalist style, blue and white colors,
      professional high quality

Use case: Generates a custom image for use in notifications or dashboards.

Example 3: Analyze Event Video

Basic configuration:

  • API Key: your-api-key-here
  • Resource Type: Video
  • Model: Gemini 2.5 Pro

Prompt configuration:

  • Operation: Analyze Video
  • Video URLs:
      https://events.example.com/recordings/event-2024.mp4
  • Prompt:
      Analyze this event video. Summarize the main activities,
      identify highlight moments, and estimate the approximate number of attendees.
      Is there any moment that requires special attention?

Use case: Automatically analyzes event videos to produce reports.

Example 4: Process Document Text

Basic configuration:

  • API Key: your-api-key-here
  • Resource Type: Text
  • Model: Gemini 2.5 Flash

Prompt configuration:

  • Prompt:
      Analyze the following text and extract:
      1. The main ideas
      2. Important keywords
      3. A summary in 3 sentences

      Text: {{context.document_text}}

Use case: Automatically processes documents and extracts key information.
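Example 4 depends on the automation replacing `{{context.document_text}}` with a value before the prompt is sent. This page does not document the engine's exact substitution rules, so the following is only a sketch of the general idea. It assumes that dotted names resolve through nested dictionaries and that an undefined variable should raise an error rather than be sent to Gemini literally. `render_prompt` is hypothetical.

```python
import re
from typing import Any

# Matches {{name}} or {{ dotted.name }}; whitespace inside the braces is allowed.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_prompt(template: str, variables: dict[str, Any]) -> str:
    """Replace {{dotted.name}} placeholders with values from nested dicts."""

    def lookup(match: re.Match) -> str:
        value: Any = variables
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                raise KeyError(f"undefined variable: {match.group(1)}")
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


prompt = render_prompt(
    "Summarize in 3 sentences.\n\nText: {{context.document_text}}",
    {"context": {"document_text": "Quarterly report ..."}},
)
print(prompt)
```

Raising on an undefined variable is a design choice. It stops the automation from sending a prompt that still contains a literal placeholder.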

Example 5: Transcribe Audio

Basic configuration:

  • API Key: your-api-key-here
  • Resource Type: Audio
  • Model: Gemini 2.5 Flash

Prompt configuration:

  • Prompt:
      Transcribe this audio to text. If there are multiple speakers,
      identify who says what. Include appropriate punctuation and formatting.

Use case: Converts audio recordings to text automatically.

Automation Workflow

Typical Structure

  1. Trigger (e.g., Object State Change Trigger)
  2. Gemini Action (processes the content)
  3. Result Action (uses the output, e.g., sends a notification)

Accessing Results

Gemini action results are available in the automation context and can be used by downstream nodes. Results typically include:

  • Image analysis: Textual description, detected objects, extracted text
  • Image generation: URL of the generated image
  • Video analysis: Summary, detected events, scene descriptions
  • Text processing: Processed text, summary, translation, etc.
  • Audio processing: Transcription, summary, extracted information

JSON Editor (Advanced)

If you are comfortable with technical configuration, you can edit the setup directly in JSON using the "JSON Editor" tab. The structure is:

{
  "api_key": "your-api-key-here",
  "resource": "image",
  "operation": "analyze",
  "model_id": "gemini-2.5-flash",
  "image_urls": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "video_urls": [],
  "prompt": "Describe this image in detail"
}

Available fields:

  • api_key: Your Gemini API key (required)
  • resource: Resource type: "text", "image", "audio", or "video" (required)
  • operation: Specific operation (required for image and video)
      • For image: "generate" or "analyze"
      • For video: "analyze_video"
  • model_id: ID of the model to use (required)
  • image_urls: Array of image URLs (for image analysis)
  • video_urls: Array of video URLs (for video analysis)
  • prompt: The prompt or instruction (required)
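You can also generate or check JSON Editor configurations outside the editor, for example in version-controlled templates, by encoding the field rules above in a short validator. This sketch is based only on the field list on this page; the real node may apply additional checks. `build_config` is hypothetical. The rule that an analysis run needs at least one URL is an assumption drawn from Steps 5 and 6.

```python
import json
from typing import Optional

# Operation values per resource, from the "Available fields" list.
# An empty set means the resource has no "operation" field.
VALID_OPERATIONS = {
    "text": set(),
    "audio": set(),
    "image": {"generate", "analyze"},
    "video": {"analyze_video"},
}


def build_config(
    api_key: str,
    resource: str,
    model_id: str,
    prompt: str,
    operation: Optional[str] = None,
    image_urls: Optional[list[str]] = None,
    video_urls: Optional[list[str]] = None,
) -> dict:
    """Assemble a JSON Editor config, enforcing the field rules on this page."""
    if resource not in VALID_OPERATIONS:
        raise ValueError(f"resource must be one of {sorted(VALID_OPERATIONS)}")
    for name, value in (("api_key", api_key), ("model_id", model_id), ("prompt", prompt)):
        if not value or not value.strip():
            raise ValueError(f"{name} is required")

    allowed = VALID_OPERATIONS[resource]
    if not allowed and operation is not None:
        raise ValueError(f"{resource!r} does not take an operation")
    if allowed and operation not in allowed:
        raise ValueError(f"operation for {resource!r} must be one of {sorted(allowed)}")
    # Assumption: analysis runs need at least one URL to analyze.
    if operation == "analyze" and not image_urls:
        raise ValueError("image analysis needs at least one image URL")
    if operation == "analyze_video" and not video_urls:
        raise ValueError("video analysis needs at least one video URL")

    # Keys are inserted in the same order as the JSON example above.
    config: dict = {"api_key": api_key.strip(), "resource": resource}
    if allowed:
        config["operation"] = operation
    config.update(
        model_id=model_id,
        image_urls=list(image_urls or []),
        video_urls=list(video_urls or []),
        prompt=prompt,
    )
    return config


config = build_config(
    api_key="your-api-key-here",
    resource="image",
    operation="analyze",
    model_id="gemini-2.5-flash",
    image_urls=["https://example.com/image1.jpg", "https://example.com/image2.jpg"],
    prompt="Describe this image in detail",
)
print(json.dumps(config, indent=2))
```

With these arguments, the printed JSON has the same fields and order as the example above.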

Troubleshooting

Authentication Error

Problem: "Invalid API key" or authentication errors

Solutions:

  1. Check that the API key is correct and active
  2. Ensure there are no extra spaces when copying and pasting
  3. Verify the API key has the required permissions in Google Cloud Console
  4. Confirm you have not exceeded your API key usage limits
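One way to isolate an authentication problem is to test the same key against the Gemini API directly, outside the automation. The sketch uses only the Python standard library. It calls the public `models.list` REST endpoint and passes the key in the `x-goog-api-key` header, assuming the key was created in Google AI Studio for the Gemini API. A 200 status means the key was accepted. An error status (for example 400 or 403) suggests an invalid or restricted key. `check_api_key` and `build_key_check_request` are hypothetical helper names.

```python
import urllib.error
import urllib.request

MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Build a models.list request, stripping whitespace left by copy/paste."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"x-goog-api-key": api_key.strip()},
        method="GET",
    )


def check_api_key(api_key: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the Gemini API answers with for this key."""
    request = build_key_check_request(api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 400 or 403 when the key is rejected


# Usage (makes a network call):
# print(check_api_key("your-api-key-here"))
```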

Image/Video URLs Do Not Work

Problem: Analysis fails or cannot find the images/videos

Solutions:

  1. Verify the URLs are publicly accessible
  2. Ensure the URLs are valid and point to existing files
  3. Check the file format (JPG, PNG for images; MP4, MOV for videos)
  4. Confirm the servers do not block access from Gemini services
  5. For private URLs, consider using temporary URLs or public storage services

Prompt Does Not Produce Expected Results

Problem: Results are not what you expected

Solutions:

  1. Be more specific: Add more detail to your prompt
  2. Rephrase: Try different ways to ask the same question
  3. Provide context: Include relevant information about what you are analyzing
  4. Break into steps: For complex tasks, consider using multiple Gemini actions in sequence
  5. Try different models: Some models perform better for certain tasks

Usage Limits Exceeded

Problem: "Rate limit exceeded" or "Quota exceeded"

Solutions:

  1. Review your Gemini API plan limits
  2. Consider using a lighter model (Flash Lite) to reduce usage
  3. Add delays between calls when processing multiple items
  4. Contact Google to increase your limits if needed
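If you call Gemini from a script or custom step, solution 3 can take the form of retries with exponential backoff. This sketch assumes the caller can tell a rate-limit failure apart from other errors. `RateLimitError` is a hypothetical stand-in for whatever your client raises; the Gemini API reports rate limiting with HTTP status 429. `with_backoff` is also hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 "Rate limit exceeded" / "Quota exceeded" error."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait each time, plus random jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last rate-limit error
            delay = base_delay * 2 ** (attempt - 1)      # 1 s, 2 s, 4 s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


# Usage (call_gemini is hypothetical):
# result = with_backoff(lambda: call_gemini(config))
```

The `sleep` parameter exists so the waiting can be replaced in tests or in environments with their own scheduler.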

Model Not Available

Problem: Error when selecting a specific model

Solutions:

  1. Verify the model is available in your region
  2. Try an alternative model (e.g., Flash instead of Pro)
  3. Check Gemini documentation for model availability
  4. Update to the latest system version

Best Practices

Security

  1. Never share your API key in public code or documentation
  2. Use environment variables or secure storage for API keys in production
  3. Rotate API keys periodically
  4. Monitor usage of your API key to detect unauthorized use
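For points 2 and 4, a script that produces node configurations can read the key from the environment instead of hard-coding it, and mask it whenever it appears in logs. The variable name `GEMINI_API_KEY` is an assumption: it is the name used in Google's Gemini API quickstarts, but this node may store keys differently. Both helper names are hypothetical.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment (assumed variable name); fail loudly if missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, e.g. for logs and error messages."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


print(mask_key("abcd1234wxyz"))  # ********wxyz
```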

Cost Optimization

  1. Choose the right model: Use lighter models (Flash Lite) when possible
  2. Combine multiple analyses in a single prompt when appropriate
  3. Cache results when processing the same content multiple times
  4. Monitor consumption to avoid billing surprises
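For point 3, results can be cached under a key derived from everything that affects the answer: resource, operation, model, URLs, and prompt. The API key is excluded. This sketch keeps the cache in an in-memory dictionary, so it is lost on restart. It assumes the content behind each URL does not change. A camera snapshot URL that is overwritten in place, as in Example 1, should not be cached this way. `cached_run` and the `run` callable are hypothetical.

```python
import hashlib
import json
from typing import Callable

_cache: dict[str, str] = {}


def cache_key(config: dict) -> str:
    """Hash the fields that determine the result (the API key is excluded)."""
    relevant = {k: v for k, v in config.items() if k != "api_key"}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_run(config: dict, run: Callable[[dict], str]) -> str:
    """Call `run(config)` only if an identical request has not been seen before."""
    key = cache_key(config)
    if key not in _cache:
        _cache[key] = run(config)
    return _cache[key]
```

Because Gemini's output can vary between calls, a cache hit returns the first answer produced for that request rather than a fresh one.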

Result Quality

  1. Write clear, specific prompts for better results
  2. Provide examples in your prompt when helpful
  3. Iterate and improve prompts based on the results you get
  4. Use the right model for each task (Pro for complex, Flash for simpler tasks)

Integration in Automations

  1. Handle errors using conditional nodes after the Gemini action
  2. Validate results before using them in downstream actions
  3. Use timeouts to avoid automations hanging
  4. Log results for debugging and continuous improvement
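For point 3, if you call Gemini from a script step, a timeout wrapper keeps a slow request from holding up the rest of the automation. This is a standard-library sketch; `run_with_timeout` is hypothetical. Python cannot forcibly stop the worker thread. After the timeout the caller continues with the fallback value, but the request may still finish in the background, so set a timeout on the HTTP client as well.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_timeout(call: Callable[[], T], timeout: float, fallback: T) -> T:
    """Return call()'s result, or `fallback` if it takes longer than `timeout` seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(call)
    try:
        return future.result(timeout=timeout)  # re-raises any exception from call()
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        # Don't block on the worker thread; it finishes on its own.
        executor.shutdown(wait=False)


# A stand-in for a slow Gemini call that takes 0.2 s.
print(run_with_timeout(lambda: time.sleep(0.2) or "done", timeout=0.05, fallback="timed out"))
# timed out
```

A downstream conditional node can then branch on the fallback value, as point 1 suggests.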

Frequently Asked Questions

Q: Can I use multiple images in a single analysis?
A: Yes. You can enter multiple image URLs (one per line), and Gemini will analyze them together.

Q: What image/video formats are supported?
A: For images: JPG, PNG, GIF, WebP. For videos: MP4, MOV, AVI, WebM. See Gemini’s official documentation for the full list.

Q: How long does it take to process an image/video?
A: It depends on file size and the model used. Typically, small images take seconds, while videos can take minutes.

Q: Can I use automation context variables in the prompt?
A: Yes. You can use context variables with the {{variable.name}} syntax in the prompt field.

Q: Are there limits on the size of files I can process?
A: Yes. Gemini has file size limits. Check the official documentation for current limits.

Q: Can I use this node without an API key?
A: No. A valid API key is required to use Gemini services.

Q: Where are the results stored?
A: Results are stored in the automation context, where downstream nodes can use them. See Accessing Results above for what each operation typically returns.

Q: What’s the difference between Pro and Flash models?
A: Pro is more capable and accurate but slower and more expensive. Flash is faster and cheaper but may be less accurate on very complex tasks.