n8n Workflows SFT Dataset

A curated dataset of n8n workflow examples paired with natural language descriptions, designed for supervised fine-tuning (SFT) of code generation models.

Dataset Description

This dataset contains instruction-workflow pairs where each example consists of:

A natural language description of an automation task
The corresponding valid n8n workflow JSON configuration

The dataset is specifically formatted for training models to generate n8n workflows from user prompts.

Property	Value
Format	JSON
Size	1K-10K examples
Language	English
License	Apache 2.0

Dataset Structure

Data Fields

{
  "instruction": "string - Natural language description of the desired workflow",
  "output": "string - n8n workflow configuration, serialized as a JSON string"
}

Example

{
  "instruction": "Create a workflow that triggers on a webhook, filters incoming data based on a status field, and sends a notification to Slack",
  "output": "{\"name\":\"Webhook to Slack\",\"nodes\":[{\"parameters\":{\"path\":\"status-webhook\"},\"name\":\"Webhook\",\"type\":\"n8n-nodes-base.webhook\",\"typeVersion\":1,\"position\":[250,300]},{\"parameters\":{\"conditions\":{\"string\":[{\"value1\":\"={{$json[\\\"status\\\"]}}\",\"value2\":\"active\"}]}},\"name\":\"Filter\",\"type\":\"n8n-nodes-base.filter\",\"typeVersion\":1,\"position\":[450,300]},{\"parameters\":{\"channel\":\"#notifications\",\"text\":\"New active status received\"},\"name\":\"Slack\",\"type\":\"n8n-nodes-base.slack\",\"typeVersion\":1,\"position\":[650,300]}],\"connections\":{\"Webhook\":{\"main\":[[{\"node\":\"Filter\",\"type\":\"main\",\"index\":0}]]},\"Filter\":{\"main\":[[{\"node\":\"Slack\",\"type\":\"main\",\"index\":0}]]}}}"
}
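
Because `output` holds the workflow as a JSON-encoded string rather than a nested object, it has to be decoded before the nodes or connections can be inspected. The following is a minimal sketch; the record is a shortened, hypothetical one in the same shape as the example above, and `parse_workflow` is an illustrative helper name, not part of the dataset or of n8n.

```python
import json

# Hypothetical, shortened record shaped like the example above.
record = {
    "instruction": "Trigger on a webhook and send a notification to Slack",
    "output": json.dumps({
        "name": "Webhook to Slack",
        "nodes": [
            {"parameters": {"path": "status-webhook"}, "name": "Webhook",
             "type": "n8n-nodes-base.webhook", "typeVersion": 1,
             "position": [250, 300]},
            {"parameters": {"channel": "#notifications"}, "name": "Slack",
             "type": "n8n-nodes-base.slack", "typeVersion": 1,
             "position": [450, 300]},
        ],
        "connections": {
            "Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}
        },
    }),
}


def parse_workflow(example):
    """Decode the JSON string stored in the `output` field into a dict."""
    return json.loads(example["output"])


workflow = parse_workflow(record)
node_names = [node["name"] for node in workflow["nodes"]]
print(workflow["name"], node_names)
```

The same helper should apply unchanged to rows loaded with `load_dataset`, since each row exposes the same two fields.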

Usage

Loading with 🤗 Datasets

from datasets import load_dataset

dataset = load_dataset("eclaude/n8n-workflows-sft")

# Access training data
print(dataset["train"][0])

Loading with Pandas

import pandas as pd

df = pd.read_json("hf://datasets/eclaude/n8n-workflows-sft/data.json")
print(df.head())

Preparing for SFT Training

from datasets import load_dataset

dataset = load_dataset("eclaude/n8n-workflows-sft")

def format_for_chat(example):
    """Format examples for chat-style fine-tuning."""
    return {
        "messages": [
            {
                "role": "system",
                "content": "You are an n8n workflow expert. Generate valid n8n workflow JSON configurations based on user requirements."
            },
            {
                "role": "user", 
                "content": example["instruction"]
            },
            {
                "role": "assistant",
                "content": example["output"]
            }
        ]
    }

formatted_dataset = dataset.map(format_for_chat)

Training with TRL

from datasets import load_dataset
from trl import SFTTrainer, SFTConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-3B-Instruct"
dataset = load_dataset("eclaude/n8n-workflows-sft")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def formatting_func(example):
    return f"""<|im_start|>system
You are an n8n workflow expert. Generate valid n8n workflow JSON configurations.<|im_end|>
<|im_start|>user
{example['instruction']}<|im_end|>
<|im_start|>assistant
{example['output']}<|im_end|>"""

training_args = SFTConfig(
    output_dir="./n8n-sft-model",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)

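# Note: passing `tokenizer=` and `max_seq_length=` to SFTTrainer matches older
# TRL releases. Newer releases are understood to take `processing_class=` on the
# trainer and the sequence-length setting on SFTConfig instead; check the TRL
# documentation for the version you have installed.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    formatting_func=formatting_func,
    tokenizer=tokenizer,
    max_seq_length=2048,
)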

trainer.train()

Covered n8n Nodes

The dataset includes workflows featuring common n8n integrations:

Category	Nodes
Triggers	Webhook, Schedule, Manual
Core	HTTP Request, Code, Function, Set, Filter, Switch, Merge
Communication	Slack, Discord, Email, Telegram
Data	PostgreSQL, MySQL, MongoDB, Airtable, Google Sheets
Dev Tools	GitHub, GitLab, Jira
Storage	AWS S3, Google Drive, Dropbox
CRM	HubSpot, Salesforce
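
To see how the categories above are distributed in practice, the node types can be tallied directly from the `output` field. This is a rough sketch: `count_node_types` is a hypothetical helper, and the two sample records are invented for illustration rather than taken from the dataset.

```python
import json
from collections import Counter


def count_node_types(examples):
    """Count how often each n8n node type appears across the given examples."""
    counts = Counter()
    for example in examples:
        workflow = json.loads(example["output"])
        for node in workflow.get("nodes", []):
            counts[node.get("type", "unknown")] += 1
    return counts


# Hypothetical records; with the real data, pass dataset["train"] instead.
sample = [
    {"instruction": "a", "output": json.dumps({
        "nodes": [
            {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
            {"name": "Slack", "type": "n8n-nodes-base.slack"},
        ],
        "connections": {},
    })},
    {"instruction": "b", "output": json.dumps({
        "nodes": [{"name": "Webhook", "type": "n8n-nodes-base.webhook"}],
        "connections": {},
    })},
]

node_counts = count_node_types(sample)
for node_type, n in node_counts.most_common():
    print(node_type, n)
```

The length of each workflow's `nodes` list can be collected in the same loop to check the per-workflow node count.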

Intended Uses

Primary Use

Fine-tuning language models for n8n workflow generation
Training code assistants specialized in automation

Out-of-Scope Use

Direct production deployment without validation
Training models for other automation platforms (Zapier, Make, etc.)

Limitations

Node Coverage: Not all 400+ n8n nodes are represented equally
Complexity: Most workflows are simple to medium complexity (2-8 nodes)
Validation: Workflows are structurally valid but may require credential configuration
Version: Based on n8n workflow schema as of late 2024; may need updates for future n8n versions
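
Since generated workflows should be validated before use, a basic structural check is a reasonable first gate. The sketch below assumes the layout shown in the example record (a `nodes` list and a `connections` mapping keyed by node name); `find_structural_problems` is a hypothetical helper, and it does not check credentials, node parameters, or n8n version compatibility.

```python
import json


def find_structural_problems(workflow_json):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        workflow = json.loads(workflow_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(workflow, dict):
        return ["top-level value is not an object"]

    nodes = workflow.get("nodes")
    if not isinstance(nodes, list) or not nodes:
        return ["missing or empty 'nodes' list"]

    problems = []
    names = set()
    for i, node in enumerate(nodes):
        if not isinstance(node, dict):
            problems.append(f"node {i} is not an object")
            continue
        for key in ("name", "type"):
            if key not in node:
                problems.append(f"node {i} has no '{key}'")
        if isinstance(node.get("name"), str):
            names.add(node["name"])

    connections = workflow.get("connections", {})
    if not isinstance(connections, dict):
        return problems + ["'connections' is not an object"]

    # Every connection must start and end at a node that actually exists.
    for source, outputs in connections.items():
        if source not in names:
            problems.append(f"connection source '{source}' is not a node")
        for branches in outputs.values():
            for branch in branches:
                for link in branch or []:
                    if link.get("node") not in names:
                        problems.append(
                            f"connection target '{link.get('node')}' is not a node"
                        )
    return problems


# Hypothetical inputs: one well-formed workflow, one with a dangling connection.
good = json.dumps({
    "nodes": [{"name": "Webhook", "type": "n8n-nodes-base.webhook"},
              {"name": "Slack", "type": "n8n-nodes-base.slack"}],
    "connections": {"Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}},
})
bad = json.dumps({
    "nodes": [{"name": "Webhook", "type": "n8n-nodes-base.webhook"}],
    "connections": {"Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}},
})

good_problems = find_structural_problems(good)
bad_problems = find_structural_problems(bad)
print(good_problems)
print(bad_problems)
```

A workflow that passes these checks can still fail on import, so testing in a real n8n instance remains the final step.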

Dataset Creation

Source Data

Workflows were collected and curated from:

Public n8n workflow templates
Community-shared automations
Synthetically generated examples with manual validation

Curation Process

Collection of raw workflow JSON files
Extraction and normalization of workflow structure
Generation of natural language descriptions
Manual review for quality and accuracy
Deduplication and filtering
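
The card does not say how deduplication was implemented. One plausible approach, sketched here purely as an assumption, is to hash a normalized form of each workflow so that copies differing only in cosmetic details collapse to one entry. The ignored keys (`position`, `id`) and the helper names are illustrative choices, not the author's documented method.

```python
import hashlib
import json

# Assumption: these node keys are cosmetic and should not affect equality.
IGNORED_NODE_KEYS = {"position", "id"}


def workflow_fingerprint(workflow_json):
    """Hash a workflow after dropping cosmetic keys and fixing the key order."""
    workflow = json.loads(workflow_json)
    nodes = [
        {k: v for k, v in node.items() if k not in IGNORED_NODE_KEYS}
        for node in workflow.get("nodes", [])
    ]
    normalized = {
        "nodes": sorted(nodes, key=lambda n: str(n.get("name", ""))),
        "connections": workflow.get("connections", {}),
    }
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(examples):
    """Keep the first example for each distinct workflow fingerprint."""
    seen = set()
    unique = []
    for example in examples:
        fingerprint = workflow_fingerprint(example["output"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(example)
    return unique


def _record(instruction, node_type, position):
    """Build a hypothetical single-node record for the demonstration below."""
    workflow = {
        "nodes": [{"name": "Start", "type": node_type, "position": position}],
        "connections": {},
    }
    return {"instruction": instruction, "output": json.dumps(workflow)}


examples = [
    _record("first", "n8n-nodes-base.webhook", [250, 300]),
    _record("same workflow, moved on the canvas", "n8n-nodes-base.webhook", [900, 120]),
    _record("different trigger", "n8n-nodes-base.manualTrigger", [250, 300]),
]
unique_examples = deduplicate(examples)
print(len(examples), "->", len(unique_examples))
```

This only catches exact structural duplicates. Near-duplicates, such as the same workflow with a renamed node, would need a fuzzier comparison.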

Models Trained on This Dataset

eclaude/qwen-coder-3b-n8n-sft

Citation

@dataset{n8n_workflows_sft_2025,
  author = {eclaude},
  title = {n8n Workflows SFT Dataset},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/eclaude/n8n-workflows-sft}
}

Contact

For questions, suggestions, or contributions, open a discussion on this repository or contact via Hugging Face.
