Shopify Integrations: The Stuff Nobody Warns You About

I’ve been in software dev for about 10 years now, mostly on product teams. For the last few years I’ve been deep in Shopify, and honestly, it’s the most interesting platform I’ve worked with from a DX perspective. Completely different from anything else I’ve done.

Most of my day-to-day work is integrations: syncs between Shopify and ERPs, CRMs, and fulfillment systems. In short, third party to third party.

It’s creative work. Sometimes frustrating. Sometimes hilarious.

Like when you get a response with status code 200… and the error message is hiding in body.data.error.message.

What integration work actually looks like

When people hear “Shopify integration,” they picture a REST API call and a webhook. Easy, right?

Here’s what it actually looks like for a single order:

1. Shopify fires an orders/create webhook
2. You validate the HMAC signature
3. You don’t process the order immediately. You enqueue a background task with a delay (because the webhook payload is a snapshot that’s already stale)
4. The task fires, you fetch the fresh order from Shopify’s GraphQL API
5. You check if it needs VAT validation. If yes, you parse the VAT ID from the order’s event messages and call an external validation service
6. You transform the order into whatever format the ERP expects
7. You send it to the ERP
8. If it fails, the task queue retries automatically with backoff
9. You log everything, because you will need it at 2 AM

That’s the happy path. Nine steps for a single order. And that’s just create — now add fulfillment syncs, inventory updates, customer data, price lists, and returns flowing in both directions.
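Step 2 is easy to get subtly wrong, so here is a minimal sketch of it. It assumes Node’s built-in crypto module, that Shopify sends the base64-encoded HMAC-SHA256 of the raw request body in the X-Shopify-Hmac-Sha256 header, and that you still have the raw body (not re-serialized JSON). The function name and parameters are made up for illustration.

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: returns true when the header matches the HMAC of the raw body.
// rawBody must be the exact bytes Shopify sent (string or Buffer), not re-serialized JSON.
function isValidShopifyWebhook(rawBody, hmacHeader, secret) {
  if (typeof hmacHeader !== "string" || hmacHeader.length === 0) return false;

  const expected = createHmac("sha256", secret).update(rawBody).digest(); // Buffer
  const received = Buffer.from(hmacHeader, "base64");

  // timingSafeEqual throws on a length mismatch, so compare lengths first
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The constant-time comparison is the design choice worth keeping: a plain === on the two strings can leak timing information about how much of the signature matched.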

The webhook-to-task pattern

One thing I learned early: never process a webhook synchronously. The payload is already outdated by the time you receive it. We enqueue everything through a task queue with a short delay:

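import { createHash } from "node:crypto";

// webhook handler: don't do the work here, just enqueue it
// (taskQueue is our own queue client, not shown here)
async function onOrderCreated(shop, webhookPayload) {
  const endpoint = "/orders/process";
  const payload = { shop, order: webhookPayload };
  await taskQueue.enqueue({
    queue: "order-create",
    endpoint,
    payload,
    delay: 120,                              // seconds, lets Shopify finalize the order
    // same webhook twice? same task ID, so the second enqueue is ignored
    taskId: createHash("sha256").update(endpoint + JSON.stringify(payload)).digest("hex"),
  });
}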

The 2-minute delay is key. It gives Shopify time to finalize the order — payment capture, fraud analysis, tag rules. By the time we process it, we’re working with the real state, not a snapshot.

The task ID is a hash of the endpoint + payload. If the same webhook fires twice (which happens), the second enqueue produces the same ID and the queue silently ignores it, as long as the queue deduplicates by task ID. No extra code needed.
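To make the deduplication concrete, here is a toy in-memory stand-in for a queue that rejects duplicate task IDs. It is only a sketch of the idea: InMemoryTaskQueue and taskIdFor are hypothetical names, the shop domain is a placeholder, and a real task queue will have its own rules for how long it remembers an ID.

```javascript
import { createHash } from "node:crypto";

// Hypothetical helper: a deterministic ID from endpoint + serialized payload.
// JSON.stringify depends on key order, which is fine for byte-identical redeliveries.
function taskIdFor(endpoint, payload) {
  return createHash("sha256").update(endpoint + JSON.stringify(payload)).digest("hex");
}

// Toy stand-in for a task queue that rejects duplicate task IDs.
class InMemoryTaskQueue {
  constructor() {
    this.tasks = new Map();
  }

  // Returns true if the task was accepted, false if the ID was already seen.
  enqueue(task) {
    if (this.tasks.has(task.taskId)) return false;
    this.tasks.set(task.taskId, task);
    return true;
  }
}

const queue = new InMemoryTaskQueue();
const endpoint = "/orders/process";
const payload = { shop: "example.myshopify.com", order: { id: 1001 } };

const first = queue.enqueue({ endpoint, payload, taskId: taskIdFor(endpoint, payload) });
const second = queue.enqueue({ endpoint, payload, taskId: taskIdFor(endpoint, payload) });
console.log(first, second); // true false
```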

Throttling that handles itself

Shopify’s GraphQL API will throttle you when you go too fast. Instead of building a manual rate limiter, we baked retry logic into a wrapper that sits around every single GraphQL operation:

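const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// wraps every shopify graphql call automatically
// (isThrottleError is our own check for a throttled response, not shown here)
async function withRetry(action) {
  try {
    return await action();
  } catch (err) {
    if (isThrottleError(err)) {
      // use the suggested wait if there is one, otherwise 2-5s; jitter prevents stampede
      const wait = err.retryAfter ?? (2 + Math.random() * 3);
      await sleep(wait * 1000);
      return withRetry(action);  // try again
    }
    throw err;  // not a throttle? let it blow up
  }
}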

The jitter (Math.random() * 3) prevents a thundering herd when multiple tasks hit the limit simultaneously. It only kicks in when the error carries no retryAfter value; otherwise the wrapper waits exactly as long as it was told to. Every GraphQL call (admin, storefront, queries, mutations) goes through this wrapper. You never have to think about throttling in application code.
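For the throttle check itself, here is a rough sketch of how it could be built from the response body. It assumes a throttled Admin GraphQL response carries an error with extensions.code set to "THROTTLED" plus a top-level extensions.cost.throttleStatus object, which is worth verifying against the API version you target. Both function names are hypothetical, and the sample response is hand-written for illustration.

```javascript
// Hypothetical helpers, assuming the parsed GraphQL response body looks like
// { errors: [{ extensions: { code: "THROTTLED" } }], extensions: { cost: { ... } } }.
function isThrottledResponse(body) {
  const errors = Array.isArray(body?.errors) ? body.errors : [];
  return errors.some((e) => e?.extensions?.code === "THROTTLED");
}

// Seconds to wait until the bucket has refilled enough for the query, or null if unknown.
function secondsUntilAffordable(body) {
  const cost = body?.extensions?.cost;
  const status = cost?.throttleStatus;
  if (!cost || !status || !(status.restoreRate > 0)) return null;

  const missing = cost.requestedQueryCost - status.currentlyAvailable;
  if (!Number.isFinite(missing)) return null;
  return Math.max(0, missing / status.restoreRate);
}

// Hand-written sample, not a captured response
const throttled = {
  errors: [{ message: "Throttled", extensions: { code: "THROTTLED" } }],
  extensions: {
    cost: {
      requestedQueryCost: 200,
      throttleStatus: { maximumAvailable: 1000, currentlyAvailable: 100, restoreRate: 50 },
    },
  },
};

console.log(isThrottledResponse(throttled));    // true
console.log(secondsUntilAffordable(throttled)); // 2
```

A value computed this way is one possible source for the retryAfter field that the wrapper reads. Note that the HTTP status is still 200 here, which is the "error hiding in the body" situation again.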

For heavier workloads (bulk syncs of thousands of products), we add another layer — exponential backoff with configurable retries and per-request timeouts. Think: 5s → 10s → 20s → 40s, with a max of 5 attempts before giving up.
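A minimal sketch of that heavier layer, under a few assumptions: the names withBackoff and withTimeout are made up, the defaults mirror the 5s → 10s → 20s → 40s schedule above, and every error is treated as retryable (real code would likely be pickier). The timeout only stops waiting; it does not cancel the underlying request.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Rejects if the promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical wrapper: retries action up to maxAttempts times,
// waiting baseDelayMs, then 2x, 4x, ... between attempts.
async function withBackoff(action, { maxAttempts = 5, baseDelayMs = 5000, timeoutMs = 30000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await withTimeout(action(), timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break; // out of attempts, give up
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 5s, 10s, 20s, 40s with the defaults
    }
  }
  throw lastError;
}
```

Usage would look something like await withBackoff(() => syncProductBatch(batch)), where syncProductBatch stands in for whatever bulk operation you run.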

The ERP side is where it gets wild

Shopify’s API is well-documented, versioned, and mostly predictable. The systems you’re integrating with are a different story.

I’ve worked with ERPs that:

Return XML wrapped in a JSON string
Use SOAP in 2026 (yes, still SOAP)
Have a “sandbox” that’s actually last month’s production snapshot
Document an endpoint that doesn’t exist yet
Wrap arrays in unpredictable ways — sometimes items is an array, sometimes it’s { value: { value: [...] } }

That last one was fun. We actually had to write an unwrapper that digs up to two levels deep, because one ERP’s responses are structurally unpredictable:

// the ERP returns arrays in... creative ways
// (log is our own logger, not shown here)
function findArraySomewhere(response) {
  if (Array.isArray(response)) return response;              // lucky day

  // null, undefined and primitives have nothing to search (Object.values(null) would throw)
  if (typeof response === "object" && response !== null) {
    for (const val of Object.values(response)) {
      if (Array.isArray(val)) return val;                      // found at depth 1
      if (typeof val === "object" && val !== null) {
        for (const inner of Object.values(val)) {
          if (Array.isArray(inner)) return inner;              // found at depth 2
        }
      }
    }
  }

  log.error("could not find array in response, returning []");
  return [];
}

This runs in production. Every day.
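The unwrapper above stops at a fixed depth. If the nesting ever gets deeper, a truly recursive variant is one option. This is a sketch, not what runs in production: findArrayDeep and maxDepth are made-up names, and it returns null instead of logging so the caller decides what a miss means.

```javascript
// Hypothetical recursive variant: depth-first search for the first array,
// giving up after maxDepth levels so a pathological response cannot recurse forever.
function findArrayDeep(value, maxDepth = 5) {
  if (Array.isArray(value)) return value;
  if (value === null || typeof value !== "object" || maxDepth <= 0) return null;

  for (const inner of Object.values(value)) {
    const found = findArrayDeep(inner, maxDepth - 1);
    if (found) return found; // an empty array is still truthy, so [] counts as found
  }
  return null;
}

console.log(findArrayDeep({ items: { value: { value: [1, 2] } } })); // [ 1, 2 ]
console.log(findArrayDeep({ status: "ok" }));                        // null
```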

What I’ve learned

The most important thing in integration work isn’t the code. It’s observability. You need to know what happened, when, and why. Every sync, every API call, every transformation — logged and traceable. We even built safe logging utilities that handle circular references and redact sensitive fields automatically.
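To give a flavor of what such a logging utility can look like, here is a minimal sketch built on JSON.stringify with a replacer function. The function name and the list of sensitive keys are assumptions for illustration, not our actual implementation.

```javascript
// Hypothetical list of field names to redact; adjust to your own payloads.
const SENSITIVE_KEYS = new Set(["password", "token", "access_token", "authorization", "email"]);

// Serializes any value for logging: redacts sensitive keys and replaces
// circular references instead of throwing like plain JSON.stringify would.
function safeStringify(value) {
  const ancestors = [];
  return JSON.stringify(value, function (key, val) {
    if (SENSITIVE_KEYS.has(key.toLowerCase())) return "[REDACTED]";
    if (typeof val !== "object" || val === null) return val;

    // `this` is the object holding `key`; drop ancestors that are no longer on the path
    while (ancestors.length > 0 && ancestors[ancestors.length - 1] !== this) ancestors.pop();
    if (ancestors.includes(val)) return "[Circular]";
    ancestors.push(val);
    return val;
  });
}

const order = { id: 1001, customer: { email: "a@example.com" }, token: "secret" };
order.self = order; // circular reference
console.log(safeStringify(order));
// {"id":1001,"customer":{"email":"[REDACTED]"},"token":"[REDACTED]","self":"[Circular]"}
```

Tracking ancestors rather than every object seen means the same object appearing twice in sibling positions is still logged in full; only true cycles get replaced.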

The second most important thing is idempotency. Every operation should be safe to retry. Because it will be retried, whether you planned for it or not. Our task architecture makes this explicit — we have two special exception types: one that says “done, don’t retry” (returns 200), and another that says “failed, please try again” (returns 503). Everything else is a real error.
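The two exception types can be sketched roughly like this. The class names and the runTask wrapper are hypothetical; the only parts taken from the description above are the 200 and 503 status codes and the rule that anything else is a real error.

```javascript
// Hypothetical error types; the names are made up for this sketch.
class TaskDoneError extends Error {}   // nothing left to do, retrying would not help
class TaskRetryError extends Error {}  // transient failure, ask the queue to retry

// Wraps a task handler and maps its outcome to the HTTP status the queue sees.
async function runTask(handler, payload) {
  try {
    await handler(payload);
    return 200;
  } catch (err) {
    if (err instanceof TaskDoneError) return 200;  // done, don't retry
    if (err instanceof TaskRetryError) return 503; // failed, please try again
    throw err;                                     // everything else is a real error
  }
}
```

A handler that finds the order already exists in the ERP would throw TaskDoneError, while one that hits an ERP outage would throw TaskRetryError and let the queue’s backoff take over.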

And the third? Communication. Half of integration bugs aren’t bugs at all — they’re misunderstandings between teams about what a field means or when a webhook fires. A 30-minute call with the other team’s developer saves you 3 days of debugging.

I’ll be sharing more of these patterns — the actual thinking, the actual failures, how AI is changing the way we build these systems. The stuff you only learn after your first sync blows up production on a Friday afternoon.