I once replaced a sleepy foreach with Parallel.ForEach expecting fireworks. The only thing that lit up was my laptop fan. The app got slower. Plot twist. If you have felt that sting, this post is your friendly tour through when parallel loops rock and when they troll you.

A tiny mental model

Think of moving couches with the Fellowship. One hobbit can move a cushion fine. Ten hobbits in a narrow hallway spend more time coordinating than lifting. Parallelism buys speed only when the work per worker is chunky and independent.

What Parallel.ForEach actually does

Parallel.ForEach slices your input into chunks, schedules those chunks on thread pool threads, runs them concurrently, and coordinates until every chunk finishes. That partitioning and scheduling cost real time and memory. So you want enough useful work per iteration to pay for the overhead.
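You can watch the slicing happen by counting how many items each worker thread handles. This is just a quick sketch for a console app; the exact thread IDs and the split vary from run to run.

```csharp
using System.Collections.Concurrent;

var hits = new ConcurrentDictionary<int, int>();
Parallel.ForEach(Enumerable.Range(1, 1000), n =>
{
    // Tally items per worker thread to see how the input was divided
    hits.AddOrUpdate(Environment.CurrentManagedThreadId, 1, (_, count) => count + 1);
});
foreach (var (threadId, count) in hits)
    Console.WriteLine($"thread {threadId} handled {count} items");
```

On a multi-core machine you will usually see several thread IDs, each with a share of the thousand items.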

To make the examples concrete, here is a fake CPU hungry function. It keeps the CPU busy without touching I/O.

static int ForcePush(int n)
{
    var sum = 0d;
    for (int i = 0; i < 8_000; i++) sum += Math.Sqrt(n + i);
    return (int)sum;
}

When Parallel.ForEach helps: chunky, independent, CPU bound

If each item is expensive and does not touch shared state, parallel loops can shine. Here is a baseline that runs the work sequentially so we have something to compare against.

var numbers = Enumerable.Range(1, 200);
var sw = Stopwatch.StartNew();
var total = 0;
foreach (var n in numbers)
{
    total += ForcePush(n);
}
Console.WriteLine($"Seq ms {sw.ElapsedMilliseconds}, total {total}");

Now a parallel version. Note the use of Interlocked to avoid races when updating a shared total.

var numbers = Enumerable.Range(1, 200);
var sw = Stopwatch.StartNew();
var total = 0;
Parallel.ForEach(numbers, n => { Interlocked.Add(ref total, ForcePush(n)); });
Console.WriteLine($"Parallel ms {sw.ElapsedMilliseconds}, total {total}");

If your CPU has several cores, the parallel version often wins because each iteration is chunky and independent.

When it hurts: tiny work and coordination overhead

If each iteration is basically a shrug, the coordination cost dominates. The hallway is too narrow for that many hobbits.

var stuff = Enumerable.Range(1, 1_000_000);
var sw = Stopwatch.StartNew();
foreach (var _ in stuff) { /* tiny work */ }
Console.WriteLine($"Seq {sw.ElapsedMilliseconds} ms");
sw.Restart();
Parallel.ForEach(stuff, _ => { /* tiny work */ });
Console.WriteLine($"Parallel {sw.ElapsedMilliseconds} ms");

Expect sequential to win here. Threads and context switches are not free.
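If the items are tiny but you still want parallelism, hand each worker a chunk instead of a single item. Partitioner.Create splits an index range into sub-ranges, so each worker loops locally and commits once per chunk. A sketch, summing a million integers:

```csharp
using System.Collections.Concurrent;

var sum = 0L;
Parallel.ForEach(Partitioner.Create(0, 1_000_000), range =>
{
    // Each worker processes a whole [Item1, Item2) range locally...
    long local = 0;
    for (var i = range.Item1; i < range.Item2; i++) local += i;
    // ...then touches shared state exactly once per chunk
    Interlocked.Add(ref sum, local);
});
Console.WriteLine(sum); // prints 499999500000
```

The per-iteration work is still tiny, but the coordination now happens per chunk, not per item, which is what amortizes the overhead.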

The shared state trap: one list to rule them all

Parallel loops that update the same object are trouble. List<T> is not thread safe, so concurrent Add calls can drop items or throw. Thread safe collections fix correctness, but every worker still contends on the same spot, which can turn your shiny parallel loop into a slower, lock filled version of a foreach.

This is risky:

var rebels = new List<int>();
Parallel.ForEach(Enumerable.Range(1, 1000), n => rebels.Add(n));
Console.WriteLine(rebels.Count);

Safer with a thread safe collection:

var rebels = new ConcurrentBag<int>();
Parallel.ForEach(Enumerable.Range(1, 1000), n => rebels.Add(n));
Console.WriteLine(rebels.Count);

Even better, avoid contended writes. Use thread local aggregation, then merge once.

var sum = 0;
Parallel.ForEach(
    Enumerable.Range(1, 1000),
    () => 0,
    (n, _, local) => local + n,
    local => Interlocked.Add(ref sum, local)
);
Console.WriteLine(sum);

The I/O trap: faster threads do not make faster networks

Parallel.ForEach spins up work on thread pool threads. That helps when threads compute. It hurts when threads wait on sockets or disks.

Here is the foot gun: blocking on async inside Parallel.ForEach.

var http = new HttpClient();
var urls = new[] { "https://example.com", "https://example.org" };
Parallel.ForEach(urls, url =>
{
    var html = http.GetStringAsync(url).Result; // blocks threads
    Console.WriteLine(html.Length);
});

Prefer async concurrency. In .NET 6+, Parallel.ForEachAsync makes this simple and lets you cap concurrency.

var http = new HttpClient();
var urls = new[] { "https://example.com", "https://example.org" };
await Parallel.ForEachAsync(
    urls,
    new ParallelOptions { MaxDegreeOfParallelism = 8 },
    async (url, ct) =>
    {
        var html = await http.GetStringAsync(url, ct);
        Console.WriteLine(html.Length);
    });

If you are on older versions or want custom control, throttle with SemaphoreSlim and Task.WhenAll.

var throttler = new SemaphoreSlim(8);
var tasks = urls.Select(async url =>
{
    await throttler.WaitAsync();
    try { Console.WriteLine((await http.GetStringAsync(url)).Length); }
    finally { throttler.Release(); }
});
await Task.WhenAll(tasks);

Tuning without tears

A few small dials go a long way.

  • Limit workers with MaxDegreeOfParallelism to match cores or protect a resource
  • Pass a CancellationToken so you can bail out during spikes
  • Keep iterations chunky to amortize overhead
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount,
    CancellationToken = cts.Token
};
// Throws OperationCanceledException if the token fires mid-loop
Parallel.ForEach(Enumerable.Range(1, 500), options, i => ForcePush(i));

Quick checklist

  • Is each iteration CPU heavy and independent? Parallel loops can help
  • Is each iteration tiny? Sequential might be faster
  • Is there shared state? Use thread local state or thread safe types
  • Is the work I/O bound? Go async and throttle
  • Did you measure? Stopwatch or BenchmarkDotNet beats vibes

Wrap up

Parallel.ForEach is a power tool. Used on the right material it cuts build time. Used on drywall it makes a mess. Keep the work per iteration chunky, keep your state isolated, keep I/O async, and keep the number of workers reasonable. Your code will move couches like a team and your laptop fan can retire from its nightclub gig.