n8n Workflows SFT Dataset
A curated dataset of n8n workflow examples paired with natural language descriptions, designed for supervised fine-tuning (SFT) of code generation models.
Dataset Description
This dataset contains instruction-workflow pairs where each example consists of:
- A natural language description of an automation task
- The corresponding valid n8n workflow JSON configuration
The dataset is specifically formatted for training models to generate n8n workflows from user prompts.
| Property | Value |
|---|---|
| Format | JSON |
| Size | 1K-10K examples |
| Language | English |
| License | Apache 2.0 |
Dataset Structure
Data Fields
{
"instruction": "string - Natural language description of the desired workflow",
"output": "string - Valid n8n workflow JSON configuration"
}
Example
{
"instruction": "Create a workflow that triggers on a webhook, filters incoming data based on a status field, and sends a notification to Slack",
"output": "{\"name\":\"Webhook to Slack\",\"nodes\":[{\"parameters\":{\"path\":\"status-webhook\"},\"name\":\"Webhook\",\"type\":\"n8n-nodes-base.webhook\",\"typeVersion\":1,\"position\":[250,300]},{\"parameters\":{\"conditions\":{\"string\":[{\"value1\":\"={{$json[\\\"status\\\"]}}\",\"value2\":\"active\"}]}},\"name\":\"Filter\",\"type\":\"n8n-nodes-base.filter\",\"typeVersion\":1,\"position\":[450,300]},{\"parameters\":{\"channel\":\"#notifications\",\"text\":\"New active status received\"},\"name\":\"Slack\",\"type\":\"n8n-nodes-base.slack\",\"typeVersion\":1,\"position\":[650,300]}],\"connections\":{\"Webhook\":{\"main\":[[{\"node\":\"Filter\",\"type\":\"main\",\"index\":0}]]},\"Filter\":{\"main\":[[{\"node\":\"Slack\",\"type\":\"main\",\"index\":0}]]}}}"
}
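Because `output` is stored as a JSON string rather than a nested object, it has to be decoded before the workflow can be inspected. The following is a minimal sketch, assuming the record layout shown above; `summarize_workflow` is a hypothetical helper, and the sample record is abbreviated for illustration rather than taken from the dataset.

```python
import json


def summarize_workflow(record):
    """Decode the `output` string and list node names, node types, and edges."""
    workflow = json.loads(record["output"])
    nodes = [(n["name"], n["type"]) for n in workflow.get("nodes", [])]
    edges = []
    for source, outputs in workflow.get("connections", {}).items():
        for branch in outputs.get("main", []):
            # Each branch is a list of targets; tolerate an empty/None branch.
            for target in branch or []:
                edges.append((source, target["node"]))
    return {"name": workflow.get("name"), "nodes": nodes, "edges": edges}


# Abbreviated, hypothetical record in the same shape as the example above.
sample = {
    "instruction": "Trigger on a webhook and post to Slack",
    "output": json.dumps({
        "name": "Webhook to Slack",
        "nodes": [
            {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
            {"name": "Slack", "type": "n8n-nodes-base.slack"},
        ],
        "connections": {
            "Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}
        },
    }),
}

summary = summarize_workflow(sample)
print(summary["nodes"])
print(summary["edges"])
```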
Usage
Loading with 🤗 Datasets
from datasets import load_dataset
dataset = load_dataset("eclaude/n8n-workflows-sft")
# Access training data
print(dataset["train"][0])
Loading with Pandas
import pandas as pd
df = pd.read_json("hf://datasets/eclaude/n8n-workflows-sft/data.json")
print(df.head())
Preparing for SFT Training
from datasets import load_dataset
dataset = load_dataset("eclaude/n8n-workflows-sft")
def format_for_chat(example):
    """Format examples for chat-style fine-tuning."""
    return {
        "messages": [
            {
                "role": "system",
                "content": "You are an n8n workflow expert. Generate valid n8n workflow JSON configurations based on user requirements."
            },
            {
                "role": "user",
                "content": example["instruction"]
            },
            {
                "role": "assistant",
                "content": example["output"]
            }
        ]
    }
formatted_dataset = dataset.map(format_for_chat)
Training with TRL
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Qwen/Qwen2.5-Coder-3B-Instruct"
dataset = load_dataset("eclaude/n8n-workflows-sft")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
def formatting_func(example):
    return f"""<|im_start|>system
You are an n8n workflow expert. Generate valid n8n workflow JSON configurations.<|im_end|>
<|im_start|>user
{example['instruction']}<|im_end|>
<|im_start|>assistant
{example['output']}<|im_end|>"""
training_args = SFTConfig(
    output_dir="./n8n-sft-model",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)
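trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    formatting_func=formatting_func,
    # The two arguments below follow older TRL releases. Newer releases take
    # `processing_class` instead of `tokenizer` and set the sequence length in SFTConfig.
    tokenizer=tokenizer,
    max_seq_length=2048,
)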
trainer.train()
Covered n8n Nodes
The dataset includes workflows featuring common n8n integrations:
| Category | Nodes |
|---|---|
| Triggers | Webhook, Schedule, Manual |
| Core | HTTP Request, Code, Function, Set, Filter, Switch, Merge |
| Communication | Slack, Discord, Email, Telegram |
| Data | PostgreSQL, MySQL, MongoDB, Airtable, Google Sheets |
| Dev Tools | GitHub, GitLab, Jira |
| Storage | AWS S3, Google Drive, Dropbox |
| CRM | HubSpot, Salesforce |
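To check how evenly these nodes are represented in a given split, one option is to tally the `type` field of every node across records. This is a sketch under the assumption that each record's `output` decodes to an object with a `nodes` list, as in the example record; `count_node_types` is a hypothetical helper and the records below are made up for illustration.

```python
import json
from collections import Counter


def count_node_types(records):
    """Count how often each n8n node type appears across records."""
    counts = Counter()
    for record in records:
        try:
            workflow = json.loads(record["output"])
        except json.JSONDecodeError:
            continue  # skip records whose output is not valid JSON
        if not isinstance(workflow, dict):
            continue
        for node in workflow.get("nodes", []):
            counts[node.get("type", "unknown")] += 1
    return counts


# Made-up records for illustration only.
records = [
    {"output": json.dumps({"nodes": [
        {"type": "n8n-nodes-base.webhook"},
        {"type": "n8n-nodes-base.slack"},
    ]})},
    {"output": json.dumps({"nodes": [{"type": "n8n-nodes-base.webhook"}]})},
    {"output": "not json"},
]

counts = count_node_types(records)
for node_type, n in counts.most_common():
    print(node_type, n)
```

The same function should work on a loaded split, since iterating over `dataset["train"]` yields one dictionary per record.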
Intended Uses
Primary Use
- Fine-tuning language models for n8n workflow generation
- Training code assistants specialized in automation
Out-of-Scope Use
- Direct production deployment without validation
- Training models for other automation platforms (Zapier, Make, etc.)
Limitations
- Node Coverage: Not all 400+ n8n nodes are represented equally
- Complexity: Most workflows are simple to medium complexity (2-8 nodes)
- Validation: Workflows are structurally valid but may require credential configuration
- Version: Based on n8n workflow schema as of late 2024; may need updates for future n8n versions
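Given these limitations, it can help to run a lightweight structural check on workflows before using them, whether they come from the dataset or from a fine-tuned model. The sketch below is an illustration only: it does not validate against n8n's actual schema, and `find_structural_problems` is a hypothetical helper that checks only that the JSON parses, that nodes carry a `name` and `type`, and that connections refer to existing nodes.

```python
import json


def find_structural_problems(output):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        workflow = json.loads(output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(workflow, dict):
        return ["top-level value is not an object"]

    nodes = workflow.get("nodes")
    if not isinstance(nodes, list) or not nodes:
        return ["missing or empty 'nodes' list"]

    problems = []
    names = set()
    for node in nodes:
        for key in ("name", "type"):
            if key not in node:
                problems.append(f"node missing '{key}'")
        names.add(node.get("name"))

    # Every connection source and target should be a declared node name.
    for source, outputs in workflow.get("connections", {}).items():
        if source not in names:
            problems.append(f"connection from unknown node '{source}'")
        for branches in outputs.values():
            for branch in branches:
                for target in branch or []:
                    target_name = target.get("node")
                    if target_name not in names:
                        problems.append(f"connection to unknown node '{target_name}'")
    return problems


# Hypothetical workflow whose connection points at a node that does not exist.
broken = json.dumps({
    "nodes": [{"name": "Webhook", "type": "n8n-nodes-base.webhook"}],
    "connections": {"Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}},
})
print(find_structural_problems(broken))
```

A check like this does not cover credentials or node parameters, so importing the workflow into an n8n instance remains the more reliable test.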
Dataset Creation
Source Data
Workflows were collected and curated from:
- Public n8n workflow templates
- Community-shared automations
- Synthetically generated examples with manual validation
Curation Process
- Collection of raw workflow JSON files
- Extraction and normalization of workflow structure
- Generation of natural language descriptions
- Manual review for quality and accuracy
- Deduplication and filtering
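The card does not say how deduplication was performed. One possible approach, shown here purely as an assumption, is to fingerprint each workflow after dropping layout-only fields and serializing with sorted keys, so that two workflows differing only in node positions or key order collapse to the same hash. `workflow_fingerprint` and `deduplicate` are hypothetical names.

```python
import hashlib
import json


def workflow_fingerprint(output):
    """Hash a workflow after removing layout-only fields and normalizing key order."""
    workflow = json.loads(output)
    for node in workflow.get("nodes", []):
        node.pop("position", None)  # canvas coordinates do not affect behavior
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record for each distinct workflow fingerprint."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = workflow_fingerprint(record["output"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Made-up records: the first two differ only in key order and node position.
a = {"instruction": "x", "output": json.dumps(
    {"name": "W", "nodes": [{"name": "A", "type": "t", "position": [0, 0]}]})}
b = {"instruction": "y", "output": json.dumps(
    {"nodes": [{"type": "t", "position": [5, 5], "name": "A"}], "name": "W"})}
c = {"instruction": "z", "output": json.dumps(
    {"name": "W2", "nodes": [{"name": "B", "type": "t", "position": [0, 0]}]})}

unique = deduplicate([a, b, c])
print(len(unique))
```

Exact hashing only catches structural duplicates; near-duplicates with renamed nodes or slightly different parameters would need a fuzzier comparison.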
Citation
@dataset{n8n_workflows_sft_2025,
author = {eclaude},
title = {n8n Workflows SFT Dataset},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/eclaude/n8n-workflows-sft}
}
Contact
For questions, suggestions, or contributions, open a discussion on this repository or contact via Hugging Face.