
Documentation: Grok Action (AI Analysis with xAI)

Overview

The Grok Action is an automation node that allows you to use xAI's Grok artificial intelligence models to process and analyze content: text, images, audio, and video. It belongs to the AI Models node family and shares the same structure as actions from other providers (Gemini, GPT, Claude).

In IoT and security environments, it is useful for analyzing camera images with AI when an event occurs, generating structured descriptions (e.g., in JSON format), or processing text.


When to use this action?

Use this action when you need to:

  • Analyze images from cameras (detect objects/people, describe the scene).
  • Generate images from text descriptions.
  • Analyze video to extract information or detect events.
  • Process text with Grok models.
  • Integrate xAI capabilities into your automations.

Node Configuration

The configuration is divided into two sections, switchable with the top selector: Basic Configuration and Prompt Configuration. It also includes the JSON Editor tab.

[Screenshot: Empty basic configuration of the Grok node]

Section: Basic Configuration

1. API Key *Required

Select the Grok (xAI) credential that authenticates access (managed in a centralized and secure way).

2. Resource Type *Required

The type of content to process: Text, Image, Audio, or Video.

3. Model *Required

The Grok model to use:

Model            | Value
Grok 2           | grok-2-1212
Grok 2 Vision    | grok-2-vision-1212
Grok Beta        | grok-beta
Grok Vision Beta | grok-vision-beta

[Screenshot: Basic configuration of the Grok node with Image resource]

Section: Prompt Configuration

4. Operation *For Image and Video

  • For Image: Generate Image or Analyze Image.
  • For Video: Analyze Video.

5. Image / Video URLs

For analysis, enter the URLs (one per line). Supports template expressions (e.g., {{get_snapshot_node.url}}).

6. Prompt *Required

The instruction/question for the model. Supports template expressions.
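
To illustrate how a template expression such as {{get_snapshot_node.url}} resolves to a concrete value, here is a minimal Python sketch. It is not the platform's template engine, whose behavior is not documented here. The render function, the context dictionary, and the sample values are all hypothetical. The sketch assumes that each expression is a dotted path into the outputs of upstream nodes.

```python
import re

# Matches {{ dotted.path }} placeholders such as {{get_snapshot_node.url}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def render(template: str, context: dict) -> str:
    """Replace each {{a.b.c}} with the value found by walking `context`."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the path does not exist
        return str(value)
    return PLACEHOLDER.sub(lookup, template)

# Hypothetical outputs of upstream nodes, keyed by node key.
context = {
    "get_snapshot_node": {"url": "https://example.com/snap.jpg"},
    "trigger": {"body": {"text": "Pump 3 pressure is low."}},
}

image_url = render("{{get_snapshot_node.url}}", context)
prompt = render("Summarize the following text: {{trigger.body.text}}", context)
```

In this sketch a missing path raises a KeyError. How the real node treats an unresolved expression is not stated in this documentation.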

[Screenshot: Prompt configuration of the Grok node]


JSON Editor View

[Screenshot: JSON Editor view of the Grok node]


JSON Structure (Input Parameters)

{
  "api_key": "",
  "resource": "image",
  "operation": "analyze",
  "model_id": "grok-2-vision-1212",
  "image_urls": [
    "{{get_snapshot_node.url}}"
  ],
  "video_urls": [],
  "prompt": "Describe in JSON format the objects detected in this image from the industrial camera."
}

JSON Fields

Field      | Type           | Description
api_key    | string         | Reference to the Grok/xAI credential (managed securely).
resource   | string         | Resource type: text, image, audio, video.
operation  | string         | Operation (image: generate/analyze; video: analyze_video).
model_id   | string         | Grok model ID (e.g. grok-2-vision-1212).
image_urls | array (string) | URLs of images to analyze.
video_urls | array (string) | URLs of videos to analyze.
prompt     | string         | The instruction/question for the model.
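
The field rules above can be checked before a parameter set is pasted into the JSON Editor. The following Python sketch is one way to do that. It is not the node's own validation, which is not documented here. The check_params function is hypothetical, and the set of vision-capable models is an assumption: it treats the two model IDs containing "vision" as the vision models.

```python
# Assumption: the two model IDs containing "vision" are the vision-capable ones.
VISION_MODELS = {"grok-2-vision-1212", "grok-vision-beta"}
RESOURCES = {"text", "image", "audio", "video"}
# Operation values per resource, as listed in the JSON Fields table.
OPERATIONS = {"image": {"generate", "analyze"}, "video": {"analyze_video"}}

def check_params(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    resource = params.get("resource")
    operation = params.get("operation")
    if resource not in RESOURCES:
        problems.append(f"unknown resource: {resource!r}")
    if not params.get("model_id"):
        problems.append("model_id is required")
    if not params.get("prompt"):
        problems.append("prompt is required")
    allowed = OPERATIONS.get(resource)
    if allowed is not None and operation not in allowed:
        problems.append(f"operation {operation!r} is not valid for {resource}")
    if resource == "image" and operation == "analyze":
        if not params.get("image_urls"):
            problems.append("image_urls is empty")
        if params.get("model_id") not in VISION_MODELS:
            problems.append("image analysis needs a vision-capable model")
    if resource == "video" and not params.get("video_urls"):
        problems.append("video_urls is empty")
    return problems

# The parameter set from the JSON structure shown earlier.
example = {
    "api_key": "",
    "resource": "image",
    "operation": "analyze",
    "model_id": "grok-2-vision-1212",
    "image_urls": ["{{get_snapshot_node.url}}"],
    "video_urls": [],
    "prompt": "Describe in JSON format the objects detected in this image "
              "from the industrial camera.",
}
problems = check_params(example)
```

Swapping model_id for grok-2-1212 in the same dictionary would make the check report that image analysis needs a vision-capable model.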

Output: Where the node's data comes from

The analysis result is available in the node's output and can be referenced in downstream nodes with a template expression of the form {{node_key}}, where node_key is the key of this Grok node in the flow.
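
When the prompt asks for JSON, downstream logic usually has to parse the reply before using it. The exact shape of the node's output is not documented here, so the following Python sketch assumes the reply arrives as a plain string that may or may not be wrapped in a Markdown code fence. The extract_json helper and the sample reply are hypothetical.

```python
import json
import re

# Three backticks, built at runtime to keep a literal fence out of this listing.
FENCE = "`" * 3
# Captures the body of a fenced block, with an optional "json" language tag.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

def extract_json(reply: str):
    """Parse JSON from a model reply; return None if it is not valid JSON."""
    text = reply.strip()
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        text = fenced.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Hypothetical reply from the model for an industrial camera image.
reply = FENCE + 'json\n{"objects": ["forklift", "pallet"]}\n' + FENCE
result = extract_json(reply)
```

Returning None on invalid JSON lets a downstream condition branch on a failed parse instead of stopping the flow. Whether that suits a given automation is a design choice.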


Usage Examples

Example 1: Structured description of objects on an industrial camera

Use case: Analyze the image from a plant camera and obtain in JSON the detected objects, to feed into downstream logic.

  • Resource Type: Image | Model: grok-2-vision-1212
  • Operation: Analyze Image
  • Image URLs: {{get_snapshot_node.url}}
  • Prompt: Describe in JSON format the objects detected in this image from the industrial camera.

(see JSON structure above)

Example 2: Process text

Use case: Summarize or classify a text received in the flow.

  • Resource Type: Text
  • Prompt: Summarize the following text: {{trigger.body.text}}
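
For reference, the parameters for this text example might look as follows when written out as JSON. This Python sketch builds them with the field names from the JSON Fields table. Two points are assumptions: that operation can be left out for text (the Operation setting is described only for Image and Video), and that grok-2-1212 is a suitable model for text.

```python
import json

# Hypothetical parameter set for a text request, using the documented fields.
params = {
    "api_key": "",              # credential reference, selected in the node
    "resource": "text",
    "model_id": "grok-2-1212",  # assumption: a non-vision model for text
    "image_urls": [],
    "video_urls": [],
    "prompt": "Summarize the following text: {{trigger.body.text}}",
}

# Serialize to the JSON form shown in the JSON Editor.
payload = json.dumps(params, indent=2)
print(payload)
```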

Validation and Errors

Condition            | Common cause / fix
Authentication error | The Grok/xAI credential is invalid or lacks permissions/balance.
URLs not working     | Make sure the URLs are publicly accessible.
Model without vision | For image analysis, use a vision-capable model (e.g. grok-2-vision-1212).

Best Practices

  • Use centralized credentials: Do not write the API key in the node.
  • Vision model for images: Select a Vision model when analyzing images.
  • Specific prompts: Request structured outputs (JSON) when you plan to process the result in downstream nodes.
  • Chain with snapshot capture: Typical pattern: Get snapshot → Grok (Analyze Image) → condition/notification.