I Built an n8n Workflow to Fetch APIs & Clean Data Automatically (Full Guide)

Let me tell you a quick story first. Three months ago I was building an automation that pulled product data from a third-party REST API. The HTTP Request node fetched 200+ items in one shot — beautiful. Except the JSON response was a nightmare. Nested three levels deep, half the fields had null values, dates were in Unix timestamps, and the price field was a string with a currency symbol attached. My PostgreSQL node took one look at that data and basically said “bro, no.”

That one workflow taught me more about data cleaning in n8n than any tutorial ever could. And now, I’m passing all of that knowledge to you.

Today’s session is going to be a deep, practical guide on how to fetch APIs properly in n8n, handle all the weird edge cases, and clean your data so it’s ready for storage, AI analysis, or whatever downstream node is waiting for it. This isn’t a “click here, then click there” tutorial. We’re going to understand why things break and how to think like a developer when building n8n workflows.

For those who are new here — hello, I goes by the nickname axiomcompute, and welcome to the session. For the returning folks, you guys already know the vibe — let’s get straight into it.

Why APIs Are the Backbone of n8n Automation

Before we touch a single node, let me set the context. Everything in modern automation revolves around APIs. Your CRM has an API. Your payment gateway has an API. The weather forecast? API. Even that random meme generator you found at 2 AM? That’s an API too.

n8n has dedicated nodes for popular services — Slack, Google Sheets, Notion, etc. But here’s the reality — 80% of real-world integrations don’t have a dedicated node. They have an API documentation page and an endpoint URL. That’s it. And for those, you need the HTTP Request node.

The HTTP Request node is honestly the most important node in n8n after the Webhook. Master this, and you can connect n8n to literally anything that exists on the internet.

The HTTP Request Node: Deep Dive

Let’s break down this node — not just what buttons to click, but what’s actually happening under the hood.

Method Types — Picking the Right One

When you open the HTTP Request node, the first thing you see is the Method dropdown. This is where most beginners go wrong — they pick GET for everything or POST for everything without understanding what each method means.

MethodWhat It DoesWhen to Use in n8n
GETFetches data from a serverReading data — list of products, user profiles, weather info
POSTSends data to create something newSubmitting forms, creating records, sending messages
PUTReplaces entire resourceUpdating a full record (overwrites everything)
PATCHUpdates specific fieldsChanging just the status or a single field
DELETERemoves a resourceDeleting a record from external service

Pro Tip: Always check the API documentation for which method to use. I once used POST instead of PATCH on a CRM API and accidentally created 50 duplicate contacts instead of updating existing ones. Fun times.

URL Configuration

The URL field accepts both static and dynamic URLs. This is where n8n’s expression system becomes your best friend.

// Static URL
https://api.example.com/v1/products

// Dynamic URL using expressions
https://api.example.com/v1/products/{{ $json.product_id }}

// With query parameters built dynamically
https://api.example.com/v1/search?q={{ $json.search_term }}&limit=50

Note:- If you’re dealing with special characters in your dynamic values (spaces, emojis, Indian language text), you’ll need to encode them. Use the encodeURIComponent() function inside expressions:

https://api.example.com/search?q={{ encodeURIComponent($json.search_term) }}

I learned this the hard way when a user’s name contained “é” and the API returned a 400 error. Encoding fixes that instantly.

Authentication — The Part Everyone Struggles With

This is where 60% of beginners give up. The API works in Postman but breaks in n8n. I’ve been there. Let me save you those hours.

n8n supports multiple authentication types:

  • Predefined Credential Type: If n8n has a built-in credential for the service (like OpenAI, Slack), always use this. It handles token refresh, base URLs, everything.
  • Generic Header Auth: For APIs that need an Authorization: Bearer your_token header. Most common one you’ll use.
  • Basic Auth: Username and password. Some older APIs still use this.
  • OAuth2: The complex one. For services like Google, Microsoft, etc. n8n handles the token refresh automatically once set up.
  • Custom Auth: When the API has a non-standard auth method (like API key in query params, or custom headers).

My recommendation: Always use Predefined Credential Type if available. It’s pre-configured, tested, and handles edge cases like token expiry. Only go custom when there’s no built-in option.

Headers and Query Parameters

Beyond the URL and auth, sometimes you need to send specific headers. The most common ones:

Content-Type: application/json       // Telling API you're sending JSON
Accept: application/json              // Telling API you want JSON back
X-API-Version: 2024-01-01            // Some APIs use versioning in headers
User-Agent: n8n-workflow/1.0          // Some APIs require this

Important:- Some APIs (especially Indian payment gateways like Razorpay, Cashfree) require specific headers for every request. Always read the docs carefully. I once spent 4 hours debugging a Cashfree integration only to realize I was missing a x-client-id header. The error message? “Unauthorized.” That’s it. No helpful hint. Just “Unauthorized.”

Handling Pagination

This is a big one. Most APIs don’t return all data at once. They paginate — give you 20, 50, or 100 records per page. If you need all records, you must handle pagination.

n8n’s HTTP Request node has a built-in Pagination option under the Options section. There are three modes:

  • Offset-Based: The API uses ?offset=0&limit=50, then ?offset=50&limit=50, etc.
  • Cursor-Based: The API returns a cursor/token for the next page. More modern APIs use this.
  • Response Contains Link to Next Page: The API response itself includes the URL for the next page.

Example of offset-based pagination config:

// Under Options → Pagination
Mode: Offset
Limit per page: 50
Offset parameter: offset
Limit parameter: limit
Max pages: 10  // Safety limit to prevent infinite loops

Pro Tip: Always set a Max Pages limit. I once forgot this, and my workflow fetched 47,000 records from an API in a loop. My Neon database almost cried, and my n8n instance became unresponsive for 15 minutes. Learn from my mistakes, yeah?

Understanding API Responses — The Messy Reality

Here’s what tutorials don’t show you — real API responses are messy. Let me show you what I mean. Take a typical e-commerce API response:

{
  "status": "success",
  "data": {
    "products": [
      {
        "id": 1024,
        "title": "Wireless Mouse  ",
        "price": "$20.00",
        "stock": null,
        "category": {
          "id": 5,
          "name": "Electronics",
          "parent": { "id": 1, "name": "All Products" }
        },
        "created": 1702569600,
        "tags": ["wireless", "mouse", "computer"],
        "description": "<p>High quality <b>wireless</b> mouse</p>",
        "variants": []
      }
    ],
    "pagination": {
      "total": 234,
      "page": 1,
      "per_page": 50
    }
  }
}

Look at the problems here:

  • The actual data is nested inside data.products — not at root level
  • Title has trailing whitespace: "Wireless Mouse "
  • Price is a string with currency symbol: "$20.00"
  • stock is null — will crash your calculations
  • Category is deeply nested — 3 levels deep
  • Date is Unix timestamp, not human-readable
  • Description has HTML tags mixed in
  • variants is an empty array — might cause issues downstream

If you pass this raw data to a PostgreSQL Insert node, half your columns will be wrong, some will be null, and some will throw type mismatch errors. This is why data cleaning exists.

Data Cleaning in n8n: Three Approaches

Now we get to the main event. There are three ways to clean data in n8n, each with its own strengths. I use all three depending on the situation.

Approach 1: Expressions (Quick Fixes)

For simple transformations, you can use expressions directly in the field mappings of any node. No extra node needed.

// Remove whitespace
{{ $json.title.trim() }}

// Extract number from price string
{{ parseFloat($json.price.replace(/[^0-9.]/g, '')) }}

// Handle null with fallback
{{ $json.stock ?? 0 }}

// Access nested data
{{ $json.category.name }}

// Format Unix timestamp to ISO date
{{ new Date($json.created * 1000).toISOString() }}

// Join array to string
{{ $json.tags.join(', ') }}

When to use expressions: When you need to fix 1-3 fields and the transformations are simple. Don’t overcomplicate it.

When NOT to use: When you’re transforming more than 5 fields, or when the logic involves conditions (if-else). At that point, switch to the Code node.

Approach 2: The Edit Fields (Set) Node

This node is highly underrated. It lets you restructure data completely — rename fields, set new values using expressions, drop fields you don’t need. No code required.

How I use it:

  • Set the mode to “Manual Mapping”
  • Add only the fields you want in the output
  • Toggle “Keep Only Set Fields” — this removes all the junk

Example configuration:

// Field mappings in the Edit Fields node:
product_name  →  {{ $json.title.trim() }}
price_inr     →  {{ parseFloat($json.price.replace('$','').replace(',','')) }}
stock_count   →  {{ $json.stock ?? 0 }}
category      →  {{ $json.category.name }}
created_date  →  {{ new Date($json.created * 1000).toISOString().split('T')[0] }}
has_variants  →  {{ $json.variants.length > 0 }}

See? No code, no JavaScript knowledge needed. Just expressions. This is the sweet spot for most data cleaning tasks.

Pro Tip: The Edit Fields node also has a “Dot Notation” option. If enabled, you can access nested fields like category.name directly. If disabled, it treats dots as literal characters in field names. I keep it enabled 99% of the time.

Approach 3: The Code Node (Full Developer Power)

When things get serious — complex conditionals, loops, regex operations, or processing arrays — the Code node is where you go. It supports both JavaScript and Python.

Here’s a real-world cleaning script I use for processing API responses:

// Code Node — Clean e-commerce API data
const items = $input.all();
const cleaned = [];

for (const item of items) {
  const raw = item.json;
  
  // Skip if product is missing essential fields
  if (!raw.title || !raw.price) continue;
  
  // Clean price — handle ₹, commas, random spaces
  const priceStr = String(raw.price)
    .replace(/[₹$€,\s]/g, '')
    .trim();
  const price = parseFloat(priceStr);
  
  // Skip invalid prices
  if (isNaN(price) || price <= 0) continue;
  
  // Strip HTML from description
  const cleanDescription = (raw.description || '')
    .replace(/<[^>]*>/g, '')
    .replace(/&nbsp;/g, ' ')
    .trim();
  
  // Normalize category — handle nested or flat
  let categoryName = 'Uncategorized';
  if (typeof raw.category === 'object' && raw.category?.name) {
    categoryName = raw.category.name;
  } else if (typeof raw.category === 'string') {
    categoryName = raw.category;
  }
  
  // Convert Unix timestamp or ISO string
  let createdDate;
  if (typeof raw.created === 'number') {
    createdDate = new Date(raw.created * 1000).toISOString();
  } else if (typeof raw.created === 'string') {
    createdDate = new Date(raw.created).toISOString();
  } else {
    createdDate = new Date().toISOString();
  }
  
  cleaned.push({
    json: {
      product_id: raw.id,
      name: raw.title.trim(),
      price_inr: price,
      stock: raw.stock ?? 0,
      category: categoryName,
      tags: Array.isArray(raw.tags) ? raw.tags.join(', ') : '',
      description: cleanDescription.substring(0, 500),
      has_variants: Array.isArray(raw.variants) && raw.variants.length > 0,
      created_at: createdDate,
      fetched_at: new Date().toISOString()
    }
  });
}

return cleaned;

That script handles everything — null values, type mismatches, nested objects, HTML stripping, price normalization, date conversion, and even skips invalid records. This is how a developer thinks about data cleaning.

Note:- The Code node has two modes — “Run Once for All Items” and “Run Once for Each Item”. For data cleaning, almost always use “Run Once for All Items” because you can filter, merge, and restructure the entire dataset. The “Each Item” mode processes one at a time — useful for independent API calls but wasteful for cleaning.

Real-World Scenario: Weather API to Database Pipeline

Theory ke baad practice. Let’s build a real workflow — fetch weather data for Indian cities from an API, clean it, and store it in PostgreSQL. This is something you can genuinely use.

Step 1: The API Call

We’ll use Open-Meteo API — it’s free, no API key needed, and perfect for learning.

// HTTP Request Node Configuration
Method: GET
URL: https://api.open-meteo.com/v1/forecast
Query Parameters:
  latitude: 28.6139       
  longitude: 77.2090
  current_weather: true
  hourly: temperature_2m,relative_humidity_2m
  timezone: Asia/Kolkata

The response you’ll get looks something like:

{
"latitude": 28.625,
"longitude": 77.21875,
"current_weather": {
"temperature": 34.2,
"windspeed": 12.5,
"weathercode": 2,
"time": "2026-04-11T14:00"
},
"hourly": {
"time": ["2026-04-11T00:00", "2026-04-11T01:00", ...],
"temperature_2m": [28.1, 27.3, 26.8, ...],
"relative_humidity_2m": [45, 48, 52, ...]
}
}

See the problem? The hourly data is in separate arrays — time[]temperature_2m[]humidity[]. They’re parallel arrays — index 0 of each corresponds to the same hour. But n8n nodes work with objects, not parallel arrays. We need to zip them together.

Step 2: Clean & Transform

// Code Node — Transform weather data
const weather = $input.first().json;
const results = [];

const times = weather.hourly.time;
const temps = weather.hourly.temperature_2m;
const humidity = weather.hourly.relative_humidity_2m;

for (let i = 0; i < times.length; i++) {
  // Parse the time
  const dateObj = new Date(times[i]);
  const hour = dateObj.getHours();
  
  // Only keep daytime hours (6 AM to 10 PM)
  if (hour < 6 || hour > 22) continue;
  
  // Categorize temperature for Indian context
  let tempCategory;
  if (temps[i] >= 42) tempCategory = 'extreme_heat';
  else if (temps[i] >= 35) tempCategory = 'hot';
  else if (temps[i] >= 25) tempCategory = 'warm';
  else if (temps[i] >= 15) tempCategory = 'pleasant';
  else tempCategory = 'cold';
  
  results.push({
    json: {
      city: 'Delhi',
      recorded_at: times[i],
      temperature_c: temps[i],
      humidity_percent: humidity[i],
      temp_category: tempCategory,
      heat_index: calculateHeatIndex(temps[i], humidity[i]),
      fetched_at: new Date().toISOString()
    }
  });
}

function calculateHeatIndex(temp, hum) {
  // Simplified heat index for high temperatures
  if (temp < 27) return temp;
  const hi = -8.785 + 1.611*temp + 2.339*hum 
    - 0.146*temp*hum - 0.012*(temp**2) 
    - 0.016*(hum**2) + 0.002*(temp**2)*hum;
  return Math.round(hi * 10) / 10;
}

return results;

See what we did there? We zipped parallel arrays, filtered night hours, categorized temperatures for Indian weather context (because “warm” in Delhi is very different from “warm” in London), and even calculated a heat index. This is the kind of data cleaning that makes your downstream nodes happy.

Step 3: Store in PostgreSQL

-- Create the table first (run in Neon SQL Editor)
CREATE TABLE weather_logs (
    id SERIAL PRIMARY KEY,
    city VARCHAR(50) NOT NULL,
    recorded_at TIMESTAMP WITH TIME ZONE NOT NULL,
    temperature_c DECIMAL(5,2),
    humidity_percent INTEGER,
    temp_category VARCHAR(20),
    heat_index DECIMAL(5,2),
    fetched_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(city, recorded_at)
);

CREATE INDEX idx_weather_city ON weather_logs(city);
CREATE INDEX idx_weather_recorded ON weather_logs(recorded_at);

Then connect a PostgreSQL node with Insert or Update operation, matching on city + recorded_at. This way if you run the workflow multiple times, it updates existing records instead of creating duplicates. Clean.

10 Data Cleaning Patterns I Use Daily

This is the section I wish someone wrote for me when I started. These are copy-paste ready snippets you can drop into any Code node or expression field.

// 1. Phone number normalization (Indian)
const phone = raw.phone.replace(/[\s-()]/g, '');
const normalized = phone.startsWith('+91') ? phone 
  : phone.startsWith('91') ? '+' + phone 
  : '+91' + phone;

// 2. Email validation & lowercase
const email = (raw.email || '').trim().toLowerCase();
const isValid = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);

// 3. Name title case
const titleCase = raw.name.replace(/\w\S*/g, 
  txt => txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase()
);

// 4. Remove emoji from text
const noEmoji = raw.text.replace(/[\u{1F600}-\u{1F9FF}]/gu, '').trim();

// 5. Parse Indian currency
const amount = parseFloat(
  raw.price.replace(/[₹,\s]/g, '')
);

// 6. Flatten nested object
const flat = {
  name: raw.user.profile.name,
  city: raw.user.address?.city || 'Unknown',
  pin: raw.user.address?.pincode || ''
};

// 7. Date format: DD/MM/YYYY to ISO
const [day, month, year] = raw.date.split('/');
const isoDate = new Date(year, month - 1, day).toISOString();

// 8. Deduplicate array
const unique = [...new Set(raw.tags)];

// 9. Safe JSON parse (from stringified fields)
let parsed;
try { parsed = JSON.parse(raw.metadata); } 
catch { parsed = {}; }

// 10. Truncate long text with ellipsis
const short = raw.description.length > 200 
  ? raw.description.substring(0, 197) + '...' 
  : raw.description;

Bookmark this section. Seriously. You’ll come back to it more than you think.

Error Handling — Because APIs Will Fail

This is the section that separates hobby workflows from production-grade ones. APIs go down. They return 500 errors. They rate-limit you. They change their response format without telling you. You need to be ready.

Retry on Failure

The HTTP Request node has a built-in Retry on Fail option under Settings. Always enable this for critical API calls.

  • Max Retries: 3 (don’t go higher, you’ll just hammer the API)
  • Wait Between Retries: 1000ms minimum (some APIs ban you for rapid retries)
  • Retry On: “All Errors” or specific status codes like 429 (rate limit), 500, 502, 503

The Error Workflow Pattern

Set up an Error Workflow that catches failures and notifies you. I covered this concept briefly in my WhatsApp Lead Agent blog — the Error Handler node (Node 27). Same principle applies here.

// Code Node — Validate API response before processing
const response = $input.first().json;

// Check if response is valid
if (!response || !response.data) {
  throw new Error('API returned empty or invalid response');
}

// Check for API-level errors
if (response.status === 'error' || response.error) {
  throw new Error('API error: ' + (response.message || 'Unknown error'));
}

// Check if data array exists and has items
if (!Array.isArray(response.data.products) || response.data.products.length === 0) {
  // Return empty but valid result instead of crashing
  return [{ json: { _empty: true, reason: 'No products found' } }];
}

// Data is valid — pass through
return $input.all();

My rule of thumb: Always validate the API response before cleaning. A Code node right after the HTTP Request node that checks for errors, empty responses, and unexpected formats. 30 seconds to set up, saves hours of debugging later.

Rate Limiting and Batch Processing

Many APIs limit how many requests you can make per minute. If your workflow processes 500 items and each needs an API call, you’ll get rate-limited real quick.

Solutions:

  • Split In Batches node: Process items in groups of 10 or 20 instead of all at once
  • Wait node: Add a delay between batches (even 1 second helps)
  • HTTP Request node’s “Batch Size” option: Built-in batching that limits concurrent requests
// Workflow structure for rate-limited APIs:
[Trigger] → [Split In Batches (10)] → [HTTP Request] → [Wait 1s] → [Loop Back]
                                            ↓
                                      [Code: Clean Data]
                                            ↓
                                      [PostgreSQL: Store]

Pro Tip: If the API returns rate-limit headers (X-RateLimit-RemainingRetry-After), you can read these in the Code node and dynamically adjust your wait time. That’s advanced stuff, but incredibly useful for high-volume workflows.

When to Use Which Approach — Decision Framework

I know I’ve thrown a lot at you. Let me simplify the decision with this table:

ScenarioBest ApproachWhy
Fix 1-2 fieldsExpressionsNo extra node needed
Restructure/rename 3-8 fieldsEdit Fields nodeVisual, no code, easy to debug
Complex logic, filtering, loopsCode nodeFull programming power
Validate API responseCode nodeNeed conditionals and error throwing
Merge data from 2+ sourcesMerge node + Edit FieldsBuilt-in merge logic handles most cases
Split arrays into itemsSplit Out nodeOne-click, no code needed

Quick Homework (Optional, But Rewarding)

If you’ve read this far, you’re serious about learning. So here’s a mini challenge for you:

  1. Pick any free public API from this list
  2. Fetch data using HTTP Request node
  3. Clean it using at least 2 of the 3 approaches I taught
  4. Store the cleaned data in your Neon PostgreSQL database
  5. Run the workflow 3 times and verify no duplicates were created

If you get stuck, email me or drop a comment. I reply to every single one — I promise. You can also refer to my PostgreSQL operations guide for the database part.

Wrapping Up

Fetching APIs in n8n is easy. Fetching APIs properly — with authentication, pagination, error handling, and clean data pipelines — that’s where the skill is. And now you have that skill.

Let me give you the TL;DR one more time:

  • Use the HTTP Request node for any API without a dedicated n8n node
  • Always check authentication type in the API docs first
  • Handle pagination with built-in options (and always set a max limit)
  • Clean data using Expressions for quick fixes, Edit Fields for restructuring, Code node for complex transformations
  • Always validate responses before processing
  • Enable Retry on Fail for production workflows
  • Use parameterized queries when storing data in PostgreSQL

That’s it for today’s session. This one was a long one — but I genuinely believe if you practice these patterns, you’ll build automations that actually work in production, not just in tutorials. Keep grinding, keep automating. See you in the next blog!

By axiomcompute

I’m a developer who’s into tech, automation, and figuring things out in my own way. I like thinking beyond the usual approach and building systems that actually work in real life. I pick things up fast, so I’m always experimenting with new tools and ideas. Lately, I’ve also started writing blogs to share what I’m learning and building along the way.

Leave a Reply

Your email address will not be published. Required fields are marked *