Documentation: Gemini Action (AI Analysis)
Overview
The Gemini Action is an automation node that uses Google Gemini AI models to process and analyze different types of content. It can process text, analyze and generate images, process audio, and analyze video.
When to Use This Action
Use this action when you need to:
- Analyze images automatically (detect objects, read text, identify scenes)
- Generate images from text descriptions
- Analyze videos to extract information or detect events
- Process text with advanced AI capabilities
- Process audio (speech recognition, transcription, and analysis)
- Integrate AI capabilities into your automations
Node Configuration
Node configuration is split into two main sections that you can switch between using the selector at the top:
Section 1: Basic Configuration
Step 1: Configure API Key
- In the "API Key" field, enter your Google Gemini API key
- This key is required to authenticate with Gemini services
- Important: Keep your API key secure and do not share it
How to get an API key:
- Go to Google AI Studio
- Sign in with your Google account
- Create a new API key
- Copy the key and paste it into this field
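For reference, the sketch below shows how an API key authenticates a direct call to Gemini's public REST API. It targets the `generateContent` endpoint and passes the key in the `x-goog-api-key` header. The node may make a similar request internally, but this page does not confirm that. The helper name `build_text_request` is hypothetical, and the function only builds the request; it does not send it.

```python
import json
from typing import Dict

# Public Gemini REST endpoint; that the node calls it this way is an assumption.
ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{model_id}:generateContent"


def build_text_request(api_key: str, model_id: str, prompt: str) -> Dict[str, object]:
    """Build, but do not send, a text-only generateContent request (hypothetical helper)."""
    key = api_key.strip()  # stray spaces from copy/paste would invalidate the key
    if not key:
        raise ValueError("API key is empty")
    return {
        "url": ENDPOINT.format(model_id=model_id),
        "headers": {"x-goog-api-key": key, "Content-Type": "application/json"},
        "body": json.dumps({"contents": [{"parts": [{"text": prompt}]}]}),
    }


req = build_text_request("your-api-key-here", "gemini-2.5-flash", "Summarize: ...")
```

To send it yourself, you could pass these values to `urllib.request.Request(req["url"], data=req["body"].encode(), headers=req["headers"], method="POST")` and open it with `urllib.request.urlopen`.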
Step 2: Select Resource Type
- In the "Resource Type" field, select the type of content you want to process:
- Text: For text processing
- Image: For image analysis or generation
- Audio: For audio processing
- Video: For video analysis
Note: The selected resource type determines which operations are available and which additional fields appear.
Step 3: Select Model
- In the "Model" field, select the Gemini model you want to use:
Available models:
- Gemini 2.5 Pro: Most capable model, ideal for complex tasks
- Gemini 2.5 Flash: Fast and efficient, ideal for most use cases
- Gemini 2.5 Flash Image: Optimized for image processing
- Gemini 2.5 Flash Lite: Lighter, faster, and more economical
- Gemini 2.0 Flash: Previous version of the Flash model
- Gemini 2.0 Flash Lite: Previous lightweight version
Recommendations:
- For image analysis: Use Gemini 2.5 Flash Image or Gemini 2.5 Flash
- For image generation: Use Gemini 2.5 Flash
- For video analysis: Use Gemini 2.5 Pro or Gemini 2.5 Flash
- For text processing: Use Gemini 2.5 Flash (faster) or Gemini 2.5 Pro (more capable)
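The JSON Editor's `model_id` field expects an ID, not a display name. Only `gemini-2.5-flash` appears on this page. The other IDs below follow Google's public Gemini API naming, and whether this node accepts them is an assumption. The hypothetical `lighter_model` helper encodes the fallback advice from Troubleshooting: step down from Pro to Flash, or from Flash to Flash Lite.

```python
from typing import Optional

# Display name -> model_id. Only "gemini-2.5-flash" is confirmed by this page;
# the others follow Google's public naming (assumption).
MODEL_IDS = {
    "Gemini 2.5 Pro": "gemini-2.5-pro",
    "Gemini 2.5 Flash": "gemini-2.5-flash",
    "Gemini 2.5 Flash Image": "gemini-2.5-flash-image",
    "Gemini 2.5 Flash Lite": "gemini-2.5-flash-lite",
    "Gemini 2.0 Flash": "gemini-2.0-flash",
    "Gemini 2.0 Flash Lite": "gemini-2.0-flash-lite",
}

# Cheaper/faster fallback for each model (hypothetical policy).
LIGHTER = {
    "gemini-2.5-pro": "gemini-2.5-flash",
    "gemini-2.5-flash": "gemini-2.5-flash-lite",
    "gemini-2.0-flash": "gemini-2.0-flash-lite",
}


def model_id_for(display_name: str) -> str:
    """Translate a display name from the Model dropdown into a model_id."""
    try:
        return MODEL_IDS[display_name]
    except KeyError:
        raise ValueError(
            f"unknown model {display_name!r}; choose one of {sorted(MODEL_IDS)}"
        ) from None


def lighter_model(model_id: str) -> Optional[str]:
    """Return a lighter alternative, or None when there is no lighter option."""
    return LIGHTER.get(model_id)
```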
Section 2: Prompt Configuration
Step 4: Select Operation (Image and Video Only)
If you selected Image as the resource type:
- In the "Operation" field, select:
- Generate Image: To create images from descriptions
- Analyze Image: To analyze existing images
If you selected Video as the resource type:
- In the "Operation" field, select:
- Analyze Video: To analyze videos
Note: For Text and Audio resources, this field does not appear since only one operation is available.
Step 5: Configure Image URLs (Image Analysis Only)
If you selected Image → Analyze Image:
- In the "Image URLs" field, enter the URLs of the images you want to analyze
- Enter one URL per line
- Example:
https://example.com/image1.jpg
https://example.com/image2.png
https://example.com/image3.jpg
Requirements:
- URLs must be publicly accessible or reachable from Gemini’s servers
- Supported formats: JPG, PNG, GIF, WebP
- You can analyze multiple images in a single run
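You can check the one-URL-per-line field before a run. The sketch below uses a hypothetical helper, `parse_url_field`. It splits the field, skips blank lines, and accepts only http(s) URLs whose path ends in a supported image extension. Treating `.jpeg` the same as `.jpg` is an assumption. Because it checks extensions only, a valid image served from a URL without an extension (for example, some camera snapshot endpoints) would be rejected.

```python
import posixpath
from typing import List, Set, Tuple
from urllib.parse import urlparse

# Extensions for the supported formats; accepting ".jpeg" as JPG is an assumption.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def parse_url_field(text: str, allowed_exts: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a one-URL-per-line field into (accepted, rejected) lists."""
    accepted: List[str] = []
    rejected: List[str] = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        parsed = urlparse(url)
        # Look at the path only, so query strings like ?token=... are ignored.
        ext = posixpath.splitext(parsed.path)[1].lower()
        if parsed.scheme in ("http", "https") and ext in allowed_exts:
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected
```

For the Video URLs field, the same function could be called with `{".mp4", ".mov", ".avi", ".webm"}`.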
Step 6: Configure Video URLs (Video Analysis Only)
If you selected Video → Analyze Video:
- In the "Video URLs" field, enter the URLs of the videos you want to analyze
- Enter one URL per line
- Example:
https://example.com/video1.mp4
https://example.com/video2.mov
Requirements:
- URLs must be publicly accessible or reachable from Gemini’s servers
- Supported formats: MP4, MOV, AVI, WebM
- You can analyze multiple videos in a single run
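A common cause of failures is a URL that only resolves inside your local network. The heuristic below (hypothetical `looks_publicly_reachable`) flags URLs that Gemini's servers cannot reach:
- non-HTTP(S) schemes
- `localhost` and `.local` names
- literal private, loopback, or link-local IP addresses

A `True` result does not guarantee access, because authentication or firewall rules can still block the request. The same check applies to image URLs.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_reachable(url: str) -> bool:
    """Heuristic: False for URLs an external service clearly cannot fetch."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname  # already lowercased by urlparse
    if host == "localhost" or host.endswith(".local"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; cannot judge without resolving it
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```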
Step 7: Write the Prompt
- In the "Prompt" field, write the instruction or question you want Gemini to process
- Prompt content varies by operation:
For Generate Image:
Describe the image you want in detail. Example:
"A Siamese cat sitting on a windowsill with an ocean view at sunset, realistic style, high quality"
For Analyze Image:
Ask specific questions about the image. Examples:
"What objects appear in this image?"
"Describe the scene in detail"
"Is there any visible text in the image? If so, what does it say?"
"How many people are in the image and what are they doing?"
For Analyze Video:
Specify what you want analyzed in the video. Examples:
"Describe the main actions that occur in the video"
"Are there people in the video? What are they doing?"
"Detect any unusual events or suspicious movement"
"Summarize the video content in 3–5 key points"
For Process Text:
Enter the text to process or your instructions. Examples:
"Summarize the following text: [your text here]"
"Translate to English: [your text here]"
"Extract keywords from: [your text here]"
For Process Audio:
Specify what you want to do with the audio. Examples:
"Transcribe this audio to text"
"Identify the spoken language"
"Summarize the audio content"
Tips for effective prompts:
- Be specific and clear in your instructions
- Include context when relevant
- For analysis, ask concrete questions
- For generation, provide specific visual details
- You can combine multiple instructions, separated by periods or line breaks
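When a prompt is built from several questions, putting one concrete question per line follows the tips above. Below is a minimal sketch with a hypothetical helper, `build_analysis_prompt`. The numbering and the `Context:` label are stylistic choices, not a format the node requires.

```python
from typing import List


def build_analysis_prompt(task: str, questions: List[str], context: str = "") -> str:
    """Combine a task, optional context, and numbered questions, one per line."""
    lines = [task.strip()]
    if context.strip():
        lines.append(f"Context: {context.strip()}")
    # One concrete question per line, numbered so answers can refer back to them.
    lines += [f"{i}. {q.strip()}" for i, q in enumerate(questions, start=1)]
    return "\n".join(lines)


prompt = build_analysis_prompt(
    "Analyze this security image.",
    ["Are there any people visible?", "Is there any suspicious object or unusual activity?"],
    context="Front door camera, recorded at night",
)
```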
Usage Examples
Example 1: Analyze Security Image
Basic configuration:
- API Key: your-api-key-here
- Resource Type: Image
- Model: Gemini 2.5 Flash Image
Prompt configuration:
- Operation: Analyze Image
- Image URLs:
https://security.example.com/camera1/snapshot.jpg
- Prompt:
Analyze this security image. Are there any people visible?
If so, describe their position and activity.
Is there any suspicious object or unusual activity?
Use case: When motion is detected, the automation runs this action to analyze the camera snapshot automatically.
Example 2: Generate Image for Notification
Basic configuration:
- API Key: your-api-key-here
- Resource Type: Image
- Model: Gemini 2.5 Flash
Prompt configuration:
- Operation: Generate Image
- Prompt:
A modern illustration of a home automation system,
showing connected devices, minimalist style, blue and white colors,
professional high quality
Use case: Generates a custom image for use in notifications or dashboards.
Example 3: Analyze Event Video
Basic configuration:
- API Key: your-api-key-here
- Resource Type: Video
- Model: Gemini 2.5 Pro
Prompt configuration:
- Operation: Analyze Video
- Video URLs:
https://events.example.com/recordings/event-2024.mp4
- Prompt:
Analyze this event video. Summarize the main activities,
identify highlight moments, and estimate the approximate number of attendees.
Is there any moment that requires special attention?
Use case: Automatically analyzes event videos to produce reports.
Example 4: Process Document Text
Basic configuration:
- API Key: your-api-key-here
- Resource Type: Text
- Model: Gemini 2.5 Flash
Prompt configuration:
- Prompt:
Analyze the following text and extract:
1. The main ideas
2. Important keywords
3. A summary in 3 sentences
Text: {{context.document_text}}
Use case: Automatically processes documents and extracts key information.
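Example 4 relies on the `{{context.document_text}}` placeholder being replaced before the prompt is sent. The sketch below shows one way to resolve dotted placeholders against nested data. It is not the node's actual implementation. Leaving unknown placeholders unchanged is an assumed behavior, chosen so that missing data stays visible in the prompt.

```python
import re
from typing import Any, Dict

# Matches {{ name }} or {{ a.b.c }}: letters, digits, underscores, and dots.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_prompt(template: str, variables: Dict[str, Any]) -> str:
    """Replace each {{a.b.c}} by walking nested dicts; keep unknown paths as-is."""

    def lookup(match: "re.Match[str]") -> str:
        value: Any = variables
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # leave the placeholder visible
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)
```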
Example 5: Transcribe Audio
Basic configuration:
- API Key: your-api-key-here
- Resource Type: Audio
- Model: Gemini 2.5 Flash
Prompt configuration:
- Prompt:
Transcribe this audio to text. If there are multiple speakers,
identify who says what. Include appropriate punctuation and formatting.
Use case: Converts audio recordings to text automatically.
Automation Workflow
Typical Structure
- Trigger (e.g., Object State Change Trigger)
  ↓
- Gemini Action (processes the content)
  ↓
- Result Action (uses the output, e.g., sends a notification)
Accessing Results
Gemini action results are available in the automation context and can be used by downstream nodes. Results typically include:
- Image analysis: Textual description, detected objects, extracted text
- Image generation: URL of the generated image
- Video analysis: Summary, detected events, scene descriptions
- Text processing: Processed text, summary, translation, etc.
- Audio processing: Transcription, summary, extracted information
JSON Editor (Advanced)
If you are comfortable with technical configuration, you can edit the setup directly in JSON using the "JSON Editor" tab. The structure is:
{
  "api_key": "your-api-key-here",
  "resource": "image",
  "operation": "analyze",
  "model_id": "gemini-2.5-flash",
  "image_urls": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "video_urls": [],
  "prompt": "Describe this image in detail"
}
Available fields:
- api_key: Your Gemini API key (required)
- resource: Resource type: "text", "image", "audio", "video" (required)
- operation: Specific operation (required for image and video)
- For image: "generate" or "analyze"
- For video: "analyze_video"
- model_id: ID of the model to use (required)
- image_urls: Array of image URLs (for image analysis)
- video_urls: Array of video URLs (for video analysis)
- prompt: The prompt or instruction (required)
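You can check the field rules above before saving a JSON configuration. This sketch uses a hypothetical validator, `validate_config`. It requires at least one entry in `image_urls` or `video_urls` for analysis operations; that rule is an assumption inferred from the field descriptions, not a documented requirement.

```python
import json
from typing import List

RESOURCES = {"text", "image", "audio", "video"}
# Operations per resource, as listed under "Available fields".
OPERATIONS = {"image": {"generate", "analyze"}, "video": {"analyze_video"}}
REQUIRED = ("api_key", "resource", "model_id", "prompt")


def validate_config(raw: str) -> List[str]:
    """Return a list of problems in a JSON Editor config (empty list = OK)."""
    cfg = json.loads(raw)
    errors: List[str] = []
    for field in REQUIRED:
        value = cfg.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing required field: {field}")

    resource = cfg.get("resource")
    if not isinstance(resource, str) or not resource.strip():
        return errors  # already reported as missing
    if resource not in RESOURCES:
        errors.append(f"unknown resource: {resource!r}")
        return errors

    if resource in OPERATIONS:
        op = cfg.get("operation")
        if not isinstance(op, str) or op not in OPERATIONS[resource]:
            errors.append(f"invalid operation {op!r} for resource {resource!r}")
        # Assumption: analysis needs at least one input URL.
        if resource == "image" and op == "analyze" and not cfg.get("image_urls"):
            errors.append("image analysis needs at least one entry in image_urls")
        if resource == "video" and op == "analyze_video" and not cfg.get("video_urls"):
            errors.append("video analysis needs at least one entry in video_urls")
    return errors
```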
Troubleshooting
Authentication Error
Problem: "Invalid API key" or authentication errors
Solutions:
1. Check that the API key is correct and active
2. Ensure there are no extra spaces when copying and pasting
3. Verify the API key has the required permissions in Google Cloud Console
4. Confirm you have not exceeded your API key usage limits
Image/Video URLs Do Not Work
Problem: Analysis fails or cannot find the images/videos
Solutions:
1. Verify the URLs are publicly accessible
2. Ensure the URLs are valid and point to existing files
3. Check the file format (JPG, PNG, GIF, WebP for images; MP4, MOV, AVI, WebM for videos)
4. Confirm the servers do not block access from Gemini services
5. For private URLs, consider using temporary URLs or public storage services
Prompt Does Not Produce Expected Results
Problem: Results are not what you expected
Solutions:
1. Be more specific: Add more detail to your prompt
2. Rephrase: Try different ways to ask the same question
3. Provide context: Include relevant information about what you are analyzing
4. Break into steps: For complex tasks, consider using multiple Gemini actions in sequence
5. Try different models: Some models perform better for certain tasks
Usage Limits Exceeded
Problem: "Rate limit exceeded" or "Quota exceeded"
Solutions:
1. Review your Gemini API plan limits
2. Consider using a lighter model (Flash Lite) to reduce usage
3. Add delays between calls when processing multiple items
4. Contact Google to increase your limits if needed
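If you call Gemini from your own scripts or loop over many items, "add delays between calls" is often implemented as retry with exponential backoff. Below is a minimal sketch. `RateLimitError` is a hypothetical stand-in for however your client reports a rate-limit or quota error. The delays (1 s, 2 s, 4 s, and so on, plus up to 50% random jitter) are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a 'Rate limit exceeded' / 'Quota exceeded' error."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimitError, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the automation's error handling see it
            delay = base_delay * (2 ** attempt)
            # Random jitter spreads out retries from parallel automations.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter lets tests run without real waiting.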
Model Not Available
Problem: Error when selecting a specific model
Solutions:
1. Verify the model is available in your region
2. Try an alternative model (e.g., Flash instead of Pro)
3. Check Gemini documentation for model availability
4. Update to the latest system version
Best Practices
Security
- Never share your API key in public code or documentation
- Use environment variables or secure storage for API keys in production
- Rotate API keys periodically
- Monitor usage of your API key to detect unauthorized use
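For scripts or services that call Gemini outside the node, a common pattern is to read the key from an environment variable and never log it in full. `GEMINI_API_KEY` is used here as an example variable name, and `load_api_key` and `mask_key` are hypothetical helpers. Stripping whitespace also avoids the copy/paste problem mentioned under Troubleshooting.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name, "").strip()  # drop stray copy/paste whitespace
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set or empty")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, e.g. for logs."""
    if visible <= 0 or len(key) <= visible:
        return "*" * len(key)  # too short to reveal anything safely
    return "*" * (len(key) - visible) + key[-visible:]
```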
Cost Optimization
- Choose the right model: Use lighter models (Flash Lite) when possible
- Combine multiple analyses in a single prompt when appropriate
- Cache results when processing the same content multiple times
- Monitor consumption to avoid billing surprises
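A cache can be keyed on everything that determines the output: model, prompt, and input URLs. Below is a minimal in-memory sketch with hypothetical helpers `cache_key` and `cached_call`. It is only safe when the content behind each URL never changes. A URL such as a live camera snapshot returns a new image on every request and should not be cached this way.

```python
import hashlib
import json
from typing import Callable, Dict, List

# In-memory cache; a longer-lived setup might use a file or database instead.
_cache: Dict[str, str] = {}


def cache_key(model_id: str, prompt: str, urls: List[str]) -> str:
    """Stable SHA-256 key for identical requests (same model, prompt, URLs)."""
    payload = json.dumps({"model": model_id, "prompt": prompt, "urls": urls}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(model_id: str, prompt: str, urls: List[str], call: Callable[[], str]) -> str:
    """Return a cached result; invoke call() only on a cache miss."""
    key = cache_key(model_id, prompt, urls)
    if key not in _cache:
        _cache[key] = call()
    return _cache[key]
```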
Result Quality
- Write clear, specific prompts for better results
- Provide examples in your prompt when helpful
- Iterate and improve prompts based on the results you get
- Use the right model for each task (Pro for complex, Flash for simpler tasks)
Integration in Automations
- Handle errors using conditional nodes after the Gemini action
- Validate results before using them in downstream actions
- Use timeouts to avoid automations hanging
- Log results for debugging and continuous improvement
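The timeout and validation advice can be combined in a wrapper around the call. In this sketch, `run_with_timeout` is hypothetical. Treating an empty or non-text result as a failure is an assumed validation rule; adapt it to what your downstream nodes expect.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Optional

# Shared pool: a `with ThreadPoolExecutor()` block would wait for slow calls on exit.
_pool = ThreadPoolExecutor(max_workers=4)


def run_with_timeout(
    call: Callable[[], Any], timeout_s: float, fallback: Optional[str] = None
) -> Optional[str]:
    """Return call()'s text result, or `fallback` on timeout or empty output."""
    future = _pool.submit(call)
    try:
        result = future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback  # the worker keeps running; its result is ignored
    # Assumed validation rule: downstream nodes need non-empty text.
    if not isinstance(result, str) or not result.strip():
        return fallback
    return result
```

The pool is module-level on purpose. A `with ThreadPoolExecutor()` block waits for all tasks on exit, which would defeat the timeout. Exceptions raised inside `call` propagate to the caller, so a conditional node can still branch on them.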
Frequently Asked Questions
Q: Can I use multiple images in a single analysis?
A: Yes. You can enter multiple image URLs (one per line), and Gemini will analyze them together.
Q: What image/video formats are supported?
A: For images: JPG, PNG, GIF, WebP. For videos: MP4, MOV, AVI, WebM. See Gemini’s official documentation for the full list.
Q: How long does it take to process an image/video?
A: It depends on file size and the model used. Typically, small images take seconds, while videos can take minutes.
Q: Can I use automation context variables in the prompt?
A: Yes. You can use context variables with the {{variable.name}} syntax in the prompt field.
Q: Are there limits on the size of files I can process?
A: Yes. Gemini has file size limits. Check the official documentation for current limits.
Q: Can I use this node without an API key?
A: No. A valid API key is required to use Gemini services.
Q: Where are the results stored?
A: Results are available in the automation context and can be used by downstream nodes. See "Accessing Results" above for what each resource type typically returns.
Q: What’s the difference between Pro and Flash models?
A: Pro is more capable and accurate but slower and more expensive. Flash is faster and cheaper but may be less accurate on very complex tasks.