
LINQ

Definition

LINQ (Language Integrated Query) is a set of language features and APIs that allow you to query data sources (collections, databases, XML, etc.) using a uniform, type-safe syntax directly within C#. It bridges the gap between the object-oriented world and the data world by treating queries as first-class citizens in the language.

Why LINQ Exists

| Problem without LINQ | Solution with LINQ |
| --- | --- |
| Different APIs for every data source (SQL, XML, collections) | One unified query syntax for all sources |
| Manual loops and conditions are verbose and error-prone | Declarative, readable queries |
| Type mismatches discovered at runtime | Compile-time type checking |
| Database queries built with string concatenation (SQL injection risk) | LINQ to Entities generates parameterized SQL |

// Without LINQ: verbose, imperative
List<string> result = new List<string>();
foreach (var student in students)
{
    if (student.Age > 18 && student.Grade > 80)
    {
        result.Add(student.Name.ToUpper());
    }
}
result.Sort();

// With LINQ: declarative, concise
var result = students
    .Where(s => s.Age > 18 && s.Grade > 80)
    .Select(s => s.Name.ToUpper())
    .OrderBy(n => n)
    .ToList();

Key Benefits
  • Unified syntax across all data sources
  • Compile-time checking of queries and types
  • Declarative style — describe what you want, not how to get it
  • Composable — chain operations fluently
  • Readable — intent is clear from the query itself

Core Concepts

How LINQ Works Under the Hood

Extension Methods

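Most LINQ operators are extension methods on IEnumerable<T> and IQueryable<T>, defined in the static Enumerable and Queryable classes in the System.Linq namespace. This is why they appear as instance methods on any collection: the compiler resolves source.Where(...) to Enumerable.Where(source, ...). A few, such as Enumerable.Range and Enumerable.Repeat, are ordinary static methods.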

// What you write
var result = students.Where(s => s.Grade > 90);

// What the compiler generates
var result = Enumerable.Where(students, s => s.Grade > 90);

Extension methods cannot access private members of the type they extend, and they don't modify the original type. They are purely syntactic sugar that the compiler resolves at compile time.
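
As a small sketch of that rewrite, the two call forms below should produce the same sequence. The `students` array and its anonymous-type shape are invented for illustration, since the document's Student type is not shown.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; an anonymous type stands in for Student.
var students = new[]
{
    new { Name = "Ann", Grade = 95 },
    new { Name = "Bob", Grade = 72 },
    new { Name = "Cy",  Grade = 91 },
};

// Extension-method form: reads like an instance call.
var viaExtension = students
    .Where(s => s.Grade > 90)
    .Select(s => s.Name)
    .ToList();

// Static-call form: the same methods, invoked explicitly on the Enumerable class.
var viaStatic = Enumerable.ToList(
    Enumerable.Select(
        Enumerable.Where(students, s => s.Grade > 90),
        s => s.Name));

Console.WriteLine(string.Join(", ", viaExtension));        // Ann, Cy
Console.WriteLine(viaExtension.SequenceEqual(viaStatic));  // True
```

The nested static form makes it easy to see why the fluent form is preferred: the operators read in the order they run.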

Query Syntax Translation

The C# compiler translates query syntax into method syntax before generating IL. Every query keyword maps to a corresponding method:

| Query Keyword | Method Equivalent |
| --- | --- |
| where | .Where() |
| select | .Select() |
| orderby | .OrderBy() / .OrderByDescending() |
| join ... on ... equals | .Join() |
| join ... into | .GroupJoin() |
| group ... by | .GroupBy() |
| let | .Select() with transparent identifier |

// Query syntax
var query = from s in students
            let average = s.Grade
            where average > 90
            orderby average descending
            select new { s.Name, average };

// Compiler translates to (approximately)
var query = students
    .Select(s => new { s, average = s.Grade })
    .Where(x => x.average > 90)
    .OrderByDescending(x => x.average)
    .Select(x => new { x.s.Name, x.average });

Query syntax is fully removed during compilation — only method calls remain in the IL.
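
To check the equivalence on real data, the sketch below runs a `let` query and a hand-written approximation of its translation side by side. The sample `students` array is hypothetical, and the method chain mirrors the approximate translation described above rather than the compiler's exact output.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; an anonymous type stands in for Student.
var students = new[]
{
    new { Name = "Ann", Grade = 95 },
    new { Name = "Bob", Grade = 72 },
    new { Name = "Cy",  Grade = 98 },
};

// Query syntax with a `let` range variable.
var viaQuery = from s in students
               let average = s.Grade
               where average > 90
               orderby average descending
               select new { s.Name, average };

// Approximate method-syntax translation: the first Select carries
// both `s` and `average` forward so later operators can see them.
var viaMethods = students
    .Select(s => new { s, average = s.Grade })
    .Where(x => x.average > 90)
    .OrderByDescending(x => x.average)
    .Select(x => new { x.s.Name, x.average });

Console.WriteLine(string.Join(", ", viaQuery.Select(r => $"{r.Name}:{r.average}")));   // Cy:98, Ann:95
Console.WriteLine(string.Join(", ", viaMethods.Select(r => $"{r.Name}:{r.average}"))); // Cy:98, Ann:95
```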

Iterator Pattern and Lazy Evaluation

Deferred execution works because LINQ operators return iterators built with yield return. When you call Where(), no filtering happens — it returns an iterator object that will filter elements when you enumerate it.

// Simplified implementation of Where
public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (T item in source)
    {
        if (predicate(item))
        {
            yield return item; // lazy: returns one element at a time
        }
    }
}

Each LINQ operator wraps the previous one, forming a chain of iterators. When you enumerate the final result, each element flows through the entire chain one at a time — pull-based, not push-based.

This means:

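  • Streaming operators (Where, Select, Take) create no intermediate collections; elements flow through one at a time
  • Buffering operators (OrderBy, GroupBy, Reverse) must read their entire input before yielding the first element, so they hold it in memory
  • Memory efficient for streaming pipelines: the full result set is never held in memory unless you materialize it (for example, with ToList())
  • Re-evaluated on every enumeration: the chain runs again from scratch each time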
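
One way to observe this behavior is to log from inside the lambdas. The sketch below is illustrative only: the `log` list and the sample numbers are assumptions, and the side effects exist purely to make the execution order visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();
int[] numbers = { 1, 2, 3 };

// Building the pipeline runs neither lambda.
var pipeline = numbers
    .Where(n => { log.Add($"Where({n})"); return n % 2 == 1; })
    .Select(n => { log.Add($"Select({n})"); return n * 10; });

Console.WriteLine(log.Count); // 0

// Enumerating pulls each element through the whole chain before the next one starts.
var firstPass = pipeline.ToList();
Console.WriteLine(string.Join(", ", firstPass)); // 10, 30
Console.WriteLine(string.Join(", ", log));
// Where(1), Select(1), Where(2), Where(3), Select(3)

// A second enumeration runs the chain again from scratch.
var secondPass = pipeline.ToList();
Console.WriteLine(log.Count); // 10
```

The interleaved log (Where, Select, Where, ...) is what streaming looks like; putting an OrderBy between the two operators would make every Where call run before the first Select call.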

Chaining and Composition

Because each operator returns IEnumerable<T>, they compose naturally. The output of one becomes the input of the next, forming a pipeline.

// This creates a pipeline, not a series of collections
var pipeline = students   // IEnumerable<Student>
    .Where(Filter)        // IEnumerable<Student> (deferred)
    .OrderBy(Sort)        // IOrderedEnumerable<Student> (deferred)
    .Select(Project)      // IEnumerable<string> (deferred)
    .Take(10);            // IEnumerable<string> (deferred)

// Nothing executes until enumeration
foreach (var name in pipeline) // NOW the pipeline runs end-to-end
{
    Console.WriteLine(name);
}

Query Syntax vs Method Syntax

LINQ offers two equivalent syntaxes. Method syntax (fluent API) is more common in practice, but query syntax can be more readable for complex joins and groupings.

// Query syntax
var honorRoll = from s in students
where s.Grade >= 90
orderby s.Name
select new { s.Name, s.Grade };

// Method syntax (equivalent)
var honorRoll = students
.Where(s => s.Grade >= 90)
.OrderBy(s => s.Name)
.Select(s => new { s.Name, s.Grade });
Which to Use?
  • Use method syntax for most queries — it's the dominant style in .NET codebases.
  • Use query syntax when you need let, or when a join or group ... into reads more clearly as a query. Only let has no direct method equivalent; in method syntax it requires an intermediate projection.

Deferred vs Immediate Execution

One of the most important concepts in LINQ is understanding when a query actually runs. LINQ operators are divided into two categories based on when they execute.

Deferred execution means the query is not run when you define it — it runs only when you enumerate the results (e.g., with foreach). The operators simply build a pipeline; nothing happens until you consume the output. This is powerful because you can compose complex queries without any cost until you actually need the data.

Immediate execution means the query runs right at the point where you call the operator. Operators like ToList(), Count(), First(), and Any() force the pipeline to run because they need a concrete result: a list, a number, or a single element. First() and Any() still stop pulling elements as soon as they have an answer.

Deferred Execution

Most LINQ operators (Where, Select, OrderBy, SelectMany, etc.) are lazy — they don't execute until you enumerate the result.

var query = students.Where(s => s.Grade > 90); // Nothing runs yet

// Executes when enumerated
foreach (var student in query) // Query runs NOW
{
Console.WriteLine(student.Name);
}

// Executes AGAIN on second enumeration
foreach (var student in query) // Query runs AGAIN
{
Console.WriteLine(student.Name);
}
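
A related consequence is that a deferred query sees changes made to its source after the query was defined, while a materialized copy does not. The sketch below uses a hypothetical list of grades to show the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var grades = new List<int> { 95, 80 };

var high = grades.Where(g => g > 90);  // deferred: only a description of the query
var snapshot = high.ToList();          // immediate: copies the current matches

grades.Add(99);                        // source changes after the query was defined

Console.WriteLine(string.Join(", ", high));     // 95, 99  (re-evaluated against the updated list)
Console.WriteLine(string.Join(", ", snapshot)); // 95      (fixed at the time ToList() ran)
```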

Immediate Execution

Operators that return a single value or a concrete collection execute immediately.

// Immediate — returns a concrete List<T>
var list = students.Where(s => s.Grade > 90).ToList();

// Immediate — returns a single value
int count = students.Count(s => s.Grade > 90);
Student first = students.First(s => s.Grade > 90);
bool any = students.Any(s => s.Grade > 90);
int max = students.Max(s => s.Grade);

// Immediate — aggregate
double average = students.Average(s => s.Grade);
Common Mistake

Enumerating a deferred query multiple times re-executes the entire pipeline. If the source is a database, this means multiple round trips. Cache results with ToList() or ToArray() if you need to iterate more than once.

Architecture and Providers

LINQ Provider Model

LINQ is not a single technology — it's a pattern with multiple providers that implement the same interface for different data sources.

| Provider | Interface | Data Source | How It Works |
| --- | --- | --- | --- |
| LINQ to Objects | IEnumerable<T> | In-memory collections | Extension methods with delegates |
| LINQ to Entities | IQueryable<T> | Database (EF Core) | Expression tree to SQL translation |
| LINQ to XML | IEnumerable<XElement> | XML documents | Traverses XElement tree |
| PLINQ | ParallelQuery<T> | In-memory (parallel) | Partitions work across threads |
| Custom | IQueryable<T> | Any source | Implement IQueryProvider |

How Providers Translate Queries

An IQueryable<T> has an associated IQueryProvider that decides how to execute the query:

  1. Build — each .Where(), .Select() call wraps the expression in a larger expression tree
  2. Translate — when enumeration starts, the provider traverses the expression tree and translates it (e.g., to SQL)
  3. Execute — the provider executes the translated query against the data source
  4. Materialize — results are converted back to objects
// IQueryable builds an expression tree
IQueryable<Student> query = db.Students
.Where(s => s.Grade > 90)
.OrderBy(s => s.Name)
.Select(s => s);

// The provider (EF Core) translates the expression tree to SQL roughly like:
// SELECT ... FROM Students WHERE Grade > 90 ORDER BY Name
// (EF Core lists the mapped columns explicitly rather than emitting SELECT *)
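
The build step can be observed without a database by using AsQueryable() on an array, which attaches the built-in in-memory provider. This is only a sketch: that provider compiles the tree back into delegates instead of translating it to SQL, and the `grades` data is invented.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

int[] grades = { 95, 72, 91 };

// AsQueryable() wraps the array in an IQueryable<int> with an in-memory provider.
IQueryable<int> query = grades.AsQueryable()
    .Where(g => g > 90)
    .OrderBy(g => g);

// Build: each operator call wrapped the previous expression in a method-call node.
var outermost = (MethodCallExpression)query.Expression;
var inner = (MethodCallExpression)outermost.Arguments[0];
Console.WriteLine(outermost.Method.Name); // OrderBy
Console.WriteLine(inner.Method.Name);     // Where

// Translate, execute, materialize: all triggered by enumeration.
var result = query.ToList();
Console.WriteLine(string.Join(", ", result)); // 91, 95
```

A database provider such as EF Core walks the same kind of tree, but emits SQL for it instead of running it in process.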

IEnumerable vs IQueryable

IEnumerable<T> is the standard interface for in-memory collections. When you call LINQ operators on it, the compiler uses the Enumerable static class — each operator takes a delegate (Func<T, bool>) and executes the lambda directly in memory. The entire pipeline runs as .NET code.

IQueryable<T> extends IEnumerable<T> but adds an expression tree and a query provider. When you call LINQ operators on it, the compiler uses the Queryable static class — each operator takes an expression (Expression<Func<T, bool>>) and appends it to the expression tree. The provider (e.g., EF Core) later traverses this tree and translates it to SQL or another query language.

When to use which:

  • Use IEnumerable<T> when working with in-memory data (Lists, Arrays, etc.) — all LINQ to Objects queries.
  • Use IQueryable<T> when working with a remote data source (databases, APIs) — you want the server to do the filtering/sorting, not your app.
  • Switch from IQueryable to IEnumerable with .AsEnumerable() when you need to run C# code that can't be translated to SQL.
// IQueryable<T> — composable, translates to SQL
IQueryable<Student> query = db.Students.Where(s => s.Grade > 90);
query = query.OrderBy(s => s.Name); // Added to SQL expression tree
var result = query.ToList(); // Single SQL query

// IEnumerable<T>: in-memory, uses delegates
IEnumerable<Student> items = db.Students.Where(s => s.Grade > 90).AsEnumerable();
items = items.OrderBy(s => s.Name); // In-memory sort: the WHERE still runs in SQL, but the sorting happens in your app
var result = items.ToList();
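
Which Where overload is chosen depends on the static type of the variable, not on the object behind it. The sketch below illustrates this with an in-memory IQueryable created via AsQueryable(); the variable names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] grades = { 95, 72, 91 };

IQueryable<int> asQueryable = grades.AsQueryable();
IEnumerable<int> asEnumerable = asQueryable; // same object, different static type

// Binds to Queryable.Where: the lambda becomes an expression tree.
var stillQueryable = asQueryable.Where(g => g > 90);

// Binds to Enumerable.Where: the lambda becomes a delegate.
var inMemoryOnly = asEnumerable.Where(g => g > 90);

Console.WriteLine(stillQueryable is IQueryable<int>); // True
Console.WriteLine(inMemoryOnly is IQueryable<int>);   // False
```

This is why assigning a database query to an IEnumerable<T> variable too early quietly moves all later operators into application memory.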

Expression Trees

An expression tree is a data structure that represents code as a tree of nodes, where each node is an operation (method call, binary operator, property access, etc.). Unlike a delegate (which is executable compiled code), an expression tree can be inspected, analyzed, and translated at runtime.

This is the key mechanism that makes LINQ to Entities work: when you write s => s.Grade > 90 in an IQueryable context, the compiler doesn't turn the lambda into an executable method. Instead, it emits code that builds a tree that says "parameter s, access property Grade, compare with 90 using greater-than." The EF Core provider can then walk this tree and emit WHERE Grade > 90 in SQL.

Expression trees are also used in dynamic querying, rule engines, and building APIs that analyze user-defined filters at runtime.

// Func<T> — executable delegate (LINQ to Objects)
Func<Student, bool> predicate = s => s.Grade > 90;

// Expression<T>: data structure representing the code (LINQ to Entities)
Expression<Func<Student, bool>> expression = s => s.Grade > 90;

// The expression tree can be inspected, rewritten into a new tree, and translated to SQL
string body = expression.Body.ToString(); // "(s.Grade > 90)"
Tip

Queryable.Where() accepts Expression<Func<T, bool>>, while Enumerable.Where() accepts Func<T, bool>. The compiler converts a lambda into an expression tree whenever the parameter it is passed to has type Expression<TDelegate>, which is the case for the IQueryable<T> overloads.
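
To make the "code as data" idea concrete, the sketch below takes a small expression apart node by node and then compiles it into a delegate. It uses string.Length in place of a Grade property so that no custom type is needed; the variable names are illustrative.

```csharp
using System;
using System.Linq.Expressions;

// string.Length stands in for a property such as Student.Grade.
Expression<Func<string, bool>> isLong = s => s.Length > 3;

// Walk the tree: a binary comparison with a property access on the left
// and a constant on the right.
var comparison = (BinaryExpression)isLong.Body;
var left = (MemberExpression)comparison.Left;
var right = (ConstantExpression)comparison.Right;

Console.WriteLine(comparison.NodeType); // GreaterThan
Console.WriteLine(left.Member.Name);    // Length
Console.WriteLine(right.Value);         // 3

// Compile() turns the tree into an ordinary executable delegate.
Func<string, bool> compiled = isLong.Compile();
Console.WriteLine(compiled("LINQ"));    // True
Console.WriteLine(compiled("EF"));      // False
```

A query provider performs the same kind of walk, but maps each node to a fragment of its target language instead of printing it.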

LINQ to Objects vs LINQ to Entities

| Aspect | LINQ to Objects | LINQ to Entities (EF Core) |
| --- | --- | --- |
| Source | IEnumerable<T> in memory | IQueryable<T> mapped to database |
| Execution | Delegates in memory | Translated to SQL and executed on database |
| Predicate | Any C# code | Only expressions translatable to SQL |
| Performance | In-memory speed | Depends on generated SQL and indexes |

// LINQ to Objects — works with any C# method
var result = students.Where(s => CustomFilter(s)).ToList();

// LINQ to Entities — must be translatable to SQL
var result = db.Students
.Where(s => s.Grade > 90) // OK — translates to SQL WHERE
.Where(s => s.Name.StartsWith("A")) // OK — translates to LIKE 'A%'
.Where(s => CustomFilter(s)) // ERROR — client-side method can't be translated
.ToList();
Client-Side Evaluation

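Since EF Core 3.0, a query that can't be translated to SQL throws an InvalidOperationException at runtime; only the final Select projection may be evaluated on the client. Earlier versions (EF Core 2.x) silently evaluated untranslatable parts on the client and logged a warning, which could load the entire table into memory. To run C# code that has no SQL translation, switch to client-side evaluation explicitly with .AsEnumerable() or .ToList() after the filters that can run in the database, so the SQL boundary sits where you intend it to be.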

PLINQ (Parallel LINQ)

PLINQ is a parallel implementation of LINQ to Objects that distributes work across multiple CPU cores. When you add .AsParallel() to a query, the source collection is partitioned into chunks, each chunk is processed on a separate thread, and the results are merged back.

PLINQ is useful when you have a large collection and each element requires expensive CPU work (e.g., complex calculations, image processing, encryption). For small collections or cheap operations, PLINQ can actually be slower due to the overhead of partitioning, thread coordination, and merging results.

// Convert to parallel execution
var result = students
.AsParallel()
.Where(s => ExpensiveFilter(s))
.Select(s => Transform(s))
.ToList();

// With degree of parallelism
var result = students
.AsParallel()
.WithDegreeOfParallelism(4)
.Where(s => ExpensiveFilter(s));

// Preserve ordering (has performance cost)
var result = students
.AsParallel()
.AsOrdered()
.Select(s => Transform(s));

// Force sequential for parts that don't benefit from parallelism
var result = students
.AsParallel()
.Where(s => ExpensiveFilter(s))
.AsSequential()
.Take(10);
PLINQ Caveats
  • PLINQ is useful for CPU-bound operations, not I/O-bound (use Task.WhenAll instead).
  • There is overhead in partitioning, merging, and thread coordination — small or fast collections may run slower with PLINQ.
  • Thread safety is your responsibility — the lambda body must be safe for concurrent execution.
  • Order is not preserved by default — use .AsOrdered() if you need it.
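
The ordering caveat can be checked with a small experiment. In the sketch below, `SumOfSquares` is a hypothetical stand-in for expensive CPU-bound work, and the input size is arbitrary; whether PLINQ is actually faster for a given workload has to be measured.

```csharp
using System;
using System.Linq;

// Hypothetical CPU-bound work: pure, with no shared state, so it is safe to run concurrently.
static long SumOfSquares(int n)
{
    long total = 0;
    for (int i = 1; i <= n; i++) total += (long)i * i;
    return total;
}

int[] inputs = Enumerable.Range(1, 2_000).ToArray();

var sequential = inputs.Select(n => SumOfSquares(n)).ToList();

// AsOrdered() makes the parallel result line up with the source order.
var ordered = inputs.AsParallel().AsOrdered().Select(n => SumOfSquares(n)).ToList();
Console.WriteLine(sequential.SequenceEqual(ordered)); // True

// Without AsOrdered() the same elements come back, but their order is not guaranteed,
// so compare something order-independent such as the total.
var unordered = inputs.AsParallel().Select(n => SumOfSquares(n)).ToList();
Console.WriteLine(sequential.Sum() == unordered.Sum()); // True
```

Collecting results through ToList() rather than adding to a shared List<T> inside the lambda keeps the query thread-safe.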

Standard Query Operators

Filtering

// Where — filter by predicate
var adults = people.Where(p => p.Age >= 18);

// OfType — filter by type
var errors = mixedList.OfType<Exception>();

Projection

// Select — transform each element
var names = students.Select(s => s.Name);

// SelectMany — flatten nested collections
var allCourses = students.SelectMany(s => s.Courses);

// Anonymous type projection
var summary = students.Select(s => new { s.Name, s.Grade, Status = s.Grade >= 60 ? "Pass" : "Fail" });

Ordering

var sorted = students.OrderBy(s => s.Grade); // ascending
var sorted = students.OrderByDescending(s => s.Grade); // descending
var sorted = students
.OrderBy(s => s.LastName)
.ThenBy(s => s.FirstName); // multi-level

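// Reverse (if students is a List<T>, a plain students.Reverse() binds to the in-place List<T>.Reverse(), which returns void)
var reversed = students.AsEnumerable().Reverse();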

Aggregation

int count = students.Count();
int count = students.Count(s => s.Grade > 90);
int sum = orders.Sum(o => o.Total);
double avg = students.Average(s => s.Grade);
int min = students.Min(s => s.Grade);
int max = students.Max(s => s.Grade);

// Aggregate — custom accumulation
string combined = words.Aggregate((a, b) => a + ", " + b);

// Aggregate with seed
int totalLength = words.Aggregate(0, (sum, word) => sum + word.Length);
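
As a worked example with made-up words, the sketch below shows the results of both Aggregate forms and one difference between them: on an empty sequence the seeded overload returns the seed, while the unseeded overload throws.

```csharp
using System;
using System.Linq;

string[] words = { "alpha", "beta", "gamma" };

// No seed: the first element is the starting accumulator.
string combined = words.Aggregate((a, b) => a + ", " + b);
Console.WriteLine(combined);    // alpha, beta, gamma

// With seed: the accumulator can have a different type than the elements.
int totalLength = words.Aggregate(0, (sum, word) => sum + word.Length);
Console.WriteLine(totalLength); // 14

// Empty input: the seeded overload returns the seed...
string[] none = Array.Empty<string>();
int seeded = none.Aggregate(0, (sum, word) => sum + word.Length);
Console.WriteLine(seeded);      // 0

// ...while the unseeded overload throws InvalidOperationException.
bool threw = false;
try { none.Aggregate((a, b) => a + b); }
catch (InvalidOperationException) { threw = true; }
Console.WriteLine(threw);       // True
```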

Grouping

// Group by property
var byGrade = students.GroupBy(s => s.Grade >= 90 ? "A" : "Other");

foreach (var group in byGrade)
{
Console.WriteLine($"Group: {group.Key}, Count: {group.Count()}");
}

// GroupBy with projection
var byDept = employees
.GroupBy(e => e.Department, e => e.Name);

// Query syntax grouping
var grouped = from s in students
group s by s.Major into g
select new { Major = g.Key, Count = g.Count(), AvgGrade = g.Average(s => s.Grade) };

Joining

// Inner join (method syntax)
var result = students.Join(
enrollments,
s => s.Id,
e => e.StudentId,
(s, e) => new { s.Name, e.CourseId });

// Inner join (query syntax)
var result = from s in students
join e in enrollments on s.Id equals e.StudentId
select new { s.Name, e.CourseId };

// Group join — produces hierarchical results
var result = students.GroupJoin(
enrollments,
s => s.Id,
e => e.StudentId,
(s, es) => new { Student = s, Enrollments = es });

// Left outer join
var result = from s in students
join e in enrollments on s.Id equals e.StudentId into se
from e in se.DefaultIfEmpty()
select new { s.Name, CourseId = e?.CourseId };
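
To see what the left outer join pattern returns, the sketch below runs it over two small in-memory arrays. The data is invented, and the "(none)" placeholder is an illustrative choice for students without enrollments.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; anonymous types stand in for Student and Enrollment.
var students = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" },   // has no enrollments
};
var enrollments = new[]
{
    new { StudentId = 1, CourseId = "CS101" },
    new { StudentId = 1, CourseId = "MA201" },
};

// DefaultIfEmpty() yields a single null when a student has no matching
// enrollments, so the student still appears once in the output.
var rows = (from s in students
            join e in enrollments on s.Id equals e.StudentId into se
            from e in se.DefaultIfEmpty()
            select new { s.Name, CourseId = e?.CourseId ?? "(none)" })
           .ToList();

foreach (var row in rows)
{
    Console.WriteLine($"{row.Name}: {row.CourseId}");
}
// Ann: CS101
// Ann: MA201
// Bob: (none)
```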

Set Operations

var a = new[] { 1, 2, 3, 4 };
var b = new[] { 3, 4, 5, 6 };

a.Union(b); // { 1, 2, 3, 4, 5, 6 }
a.Concat(b); // { 1, 2, 3, 4, 3, 4, 5, 6 }
a.Intersect(b); // { 3, 4 }
a.Except(b); // { 1, 2 }
a.Distinct(); // removes duplicates

Element Operators

Student first = students.First(); // throws if empty
Student? first = students.FirstOrDefault(); // returns default if empty
Student last = students.Last(s => s.Grade > 90);
Student single = students.Single(s => s.Id == 42); // throws if 0 or 2+ matches
Student? single = students.SingleOrDefault(s => s.Id == 42);
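
The sketch below exercises the throwing and non-throwing variants on a hypothetical array of ids. Note that for a value type such as int, the "default" is 0 rather than null.

```csharp
using System;
using System.Linq;

int[] ids = { 7, 42, 42 };

Console.WriteLine(ids.First(id => id == 42));          // 42
Console.WriteLine(ids.FirstOrDefault(id => id == 99)); // 0, which is default(int)

// Single() throws when more than one element matches.
bool singleThrew = false;
try { ids.Single(id => id == 42); }
catch (InvalidOperationException) { singleThrew = true; }
Console.WriteLine(singleThrew); // True

// First() throws when nothing matches.
bool firstThrew = false;
try { ids.First(id => id == 99); }
catch (InvalidOperationException) { firstThrew = true; }
Console.WriteLine(firstThrew);  // True
```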

Quantifiers

bool anyHigh = students.Any(s => s.Grade > 90); // at least one matches
bool allPassed = students.All(s => s.Grade >= 60); // every element matches
bool contains = grades.Contains(95); // specific element exists

Partitioning

var top10 = students.OrderByDescending(s => s.Grade).Take(10);
var skip5 = students.Skip(5);
var page2 = students.Skip(20).Take(10); // pagination

// TakeWhile / SkipWhile
var good = grades.TakeWhile(g => g >= 60); // take until condition fails
var rest = grades.SkipWhile(g => g >= 60); // skip until condition fails
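
TakeWhile and SkipWhile are position-sensitive, which makes them easy to confuse with Where. The sketch below compares them on a made-up array of grades and also shows zero-based Skip/Take paging; `pageIndex` and `pageSize` are illustrative names.

```csharp
using System;
using System.Linq;

int[] grades = { 90, 75, 50, 80 };

// TakeWhile stops at the first failing element and never looks further.
var leading = grades.TakeWhile(g => g >= 60).ToList();
Console.WriteLine(string.Join(", ", leading));    // 90, 75

// SkipWhile drops the leading run, then yields everything after it, passing or not.
var rest = grades.SkipWhile(g => g >= 60).ToList();
Console.WriteLine(string.Join(", ", rest));       // 50, 80

// Where tests every element independently.
var allPassing = grades.Where(g => g >= 60).ToList();
Console.WriteLine(string.Join(", ", allPassing)); // 90, 75, 80

// Paging: skip the earlier pages, then take one page.
int pageSize = 2, pageIndex = 1;
var page = grades.Skip(pageIndex * pageSize).Take(pageSize).ToList();
Console.WriteLine(string.Join(", ", page));       // 50, 80
```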

Generation

// DefaultIfEmpty — provides a default element for empty sequences
var result = students.Where(s => s.Grade > 100).DefaultIfEmpty();

// Range (static method)
var numbers = Enumerable.Range(1, 10); // { 1, 2, 3, ..., 10 }

// Repeat
var zeros = Enumerable.Repeat(0, 5); // { 0, 0, 0, 0, 0 }

// Empty
var empty = Enumerable.Empty<Student>();

Performance Considerations

Avoid Multiple Enumeration

// Bad — query executes twice
var query = GetExpensiveQuery();
var count = query.Count(); // Executes once
var list = query.ToList(); // Executes AGAIN

// Good — materialize once
var list = GetExpensiveQuery().ToList();
var count = list.Count;
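
The cost of multiple enumeration can be made visible with a counter. In this sketch, the local iterator `GetExpensiveQuery` is a hypothetical stand-in for an expensive source such as a database query, and `runs` counts how often its body starts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int runs = 0;

// Hypothetical stand-in for an expensive source.
IEnumerable<int> GetExpensiveQuery()
{
    runs++;            // executes at the start of every enumeration
    yield return 1;
    yield return 2;
    yield return 3;
}

// Two consuming operators on the same deferred query: the source runs twice.
var query = GetExpensiveQuery();
int count = query.Count();
var list = query.ToList();
Console.WriteLine(runs); // 2

// Materialize once, then reuse the list: the source runs once.
runs = 0;
var cached = GetExpensiveQuery().ToList();
int cachedCount = cached.Count;
Console.WriteLine(runs); // 1
```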

Use the Right Operator

// Verbose: Where is lazy, so this also stops at the first match, but the extra operator adds nothing
bool exists = items.Where(x => x.Id == target).Any();

// Better: same short-circuit behavior, stated directly
bool exists = items.Any(x => x.Id == target);

Be Careful with Index-Based Access

// Bad: on a sequence that is not an IList<T>, ElementAt(i) walks i elements on every call
var item = items.ElementAt(i); // iterates from the start each time

// Good: materialize once, then index as often as needed
var list = items.ToList();
var item = list[i]; // O(1)

Optimize Count Checks

// Bad: on a lazy sequence, Count() enumerates every element (it is only O(1) for ICollection<T> sources such as List<T>)
bool hasItems = items.Count() > 0;

// Good: stops after the first element
bool hasItems = items.Any();
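
The difference is easy to measure on a lazy source. In the sketch below, `Source` is a hypothetical iterator that counts how many elements it has produced; it is not a collection, so Count() has no shortcut available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Hypothetical lazy source that records how many elements were pulled from it.
IEnumerable<int> Source()
{
    for (int i = 0; i < 1_000; i++)
    {
        produced++;
        yield return i;
    }
}

bool viaCount = Source().Count() > 0;
Console.WriteLine($"{viaCount}, produced {produced}"); // True, produced 1000

produced = 0;
bool viaAny = Source().Any();
Console.WriteLine($"{viaAny}, produced {produced}");   // True, produced 1
```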

Query Optimization for EF Core

// Bad — N+1 problem
var orders = db.Orders.ToList(); // loads ALL orders
foreach (var order in orders)
{
var items = db.Items.Where(i => i.OrderId == order.Id).ToList(); // query per order
}

// Good — eager loading with Include
var orders = db.Orders.Include(o => o.Items).ToList(); // single query with JOIN

// Good — projection to only what you need
var orderSummaries = db.Orders
.Where(o => o.Date >= startDate)
.Select(o => new { o.Id, o.Total, ItemCount = o.Items.Count })
.ToList();

When to Use

  • In-memory data transformation — filtering, sorting, projecting collections of any size
  • Database queries — EF Core LINQ translates to SQL, giving type-safe database access
  • Data aggregation — counting, summing, averaging, grouping across datasets
  • Complex joins — combining multiple data sources with inner/left outer/group joins
  • Pagination: Skip + Take for any data source
  • Set operations — union, intersection, difference between collections
  • Avoid when a simple foreach loop is clearer for trivial single-step operations with no transformation

Common Pitfalls

Modifying Collections During Enumeration

// Bad — throws InvalidOperationException
foreach (var item in list)
{
if (item.ShouldBeRemoved)
list.Remove(item); // Cannot modify collection during enumeration
}

// Good — materialize removals
var toRemove = list.Where(x => x.ShouldBeRemoved).ToList();
toRemove.ForEach(x => list.Remove(x));

// Good — create a new filtered list
var filtered = list.Where(x => !x.ShouldBeRemoved).ToList();
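
When the collection is a List<T>, its own RemoveAll method is another option for in-place removal with a predicate. A minimal sketch with made-up integers:

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3, 4, 5, 6 };

// RemoveAll removes every matching element in place and returns how many were removed.
int removed = list.RemoveAll(x => x % 2 == 0);

Console.WriteLine(removed);                 // 3
Console.WriteLine(string.Join(", ", list)); // 1, 3, 5
```

RemoveAll is a List<T> method rather than a LINQ operator, so it is not available on a plain IEnumerable<T>.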

Capturing Loop Variables

// Bad — all lambdas capture the SAME variable
var actions = new List<Action>();
for (int i = 0; i < 5; i++)
{
actions.Add(() => Console.WriteLine(i)); // captures variable 'i', not its value
}
actions.ForEach(a => a()); // prints 5, 5, 5, 5, 5

// Good — copy to local variable
for (int i = 0; i < 5; i++)
{
int copy = i;
actions.Add(() => Console.WriteLine(copy));
}
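
The shared-variable behavior applies to for loops. A foreach loop behaves differently in C# 5 and later, because each iteration gets its own copy of the loop variable. The sketch below contrasts the two; returning the captured value through Func<int> is just a convenient way to inspect it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var fromFor = new List<Func<int>>();
for (int i = 0; i < 3; i++)
{
    fromFor.Add(() => i);       // one shared 'i' for the whole loop
}

var fromForeach = new List<Func<int>>();
foreach (int n in new[] { 0, 1, 2 })
{
    fromForeach.Add(() => n);   // a fresh 'n' for each iteration
}

Console.WriteLine(string.Join(", ", fromFor.Select(f => f())));     // 3, 3, 3
Console.WriteLine(string.Join(", ", fromForeach.Select(f => f()))); // 0, 1, 2
```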

Unexpected Null Results

// FirstOrDefault can return null — always handle it
var student = students.FirstOrDefault(s => s.Id == 999);
Console.WriteLine(student.Name); // NullReferenceException if not found

// Safe approach
var student = students.FirstOrDefault(s => s.Id == 999);
if (student is not null)
{
Console.WriteLine(student.Name);
}

// Or use pattern matching
if (students.FirstOrDefault(s => s.Id == 999) is { Name: var name })
{
Console.WriteLine(name);
}

Chaining After Materialization

// Right order: filter in the database first, then AsEnumerable() switches to client-side for the part SQL can't express
var result = db.Students
.Where(s => s.Grade > 90)
.AsEnumerable()
.Where(s => CustomFilter(s)) // client-side filter is fine
.ToList();

// Wrong order — filters in memory instead of database
var result = db.Students
.ToList() // loads ENTIRE table into memory
.Where(s => s.Grade > 90); // filters in memory

Key Takeaways

  • LINQ provides a unified, declarative query syntax that works across in-memory collections, databases, XML, and more.
  • All LINQ operators are extension methods — the compiler translates query syntax to method calls.
  • Deferred execution works via iterator chains (yield return): elements flow one at a time through the pipeline, and streaming operators create no intermediate collections.
  • Understand deferred vs immediate execution — deferred queries re-execute on every enumeration.
  • Use method syntax for most queries; use query syntax for complex joins and groupings.
  • Know the difference between IEnumerable<T> (in-memory, delegates) and IQueryable<T> (translatable, expression trees).
  • LINQ providers implement the same interface for different data sources — the same query can target in-memory collections, databases, or XML.
  • Avoid the N+1 query problem in EF Core by using .Include() or projection.
  • Use .Any() instead of .Count() > 0, and materialize with .ToList() to avoid multiple enumeration.

Interview Questions

Q: What is the difference between deferred and immediate execution in LINQ? Deferred execution means the query is not evaluated until you enumerate the results (e.g., Where, Select, OrderBy). Immediate execution means the query runs right away (e.g., ToList, Count, First, Any). Deferred queries re-execute every time they are enumerated.

Q: What is the difference between IEnumerable<T> and IQueryable<T>? IEnumerable<T> executes queries in memory using delegates (LINQ to Objects). IQueryable<T> represents a query as an expression tree that a provider (like EF Core) can translate to SQL and execute on the database.

Q: What are expression trees and why do they matter in LINQ? Expression trees are data structures that represent code as traversable objects. In LINQ, they allow IQueryable providers to inspect lambda expressions and translate them into other languages like SQL, rather than executing them directly. IEnumerable uses Func<T> (executable delegates), while IQueryable uses Expression<Func<T>> (inspectable trees).

Q: What is the N+1 query problem and how do you solve it? The N+1 problem occurs when you load a list of entities (1 query) and then load a related entity for each one separately (N queries). Solve it by using .Include() for eager loading or projection with .Select(). When a single JOIN across several collections would return too many duplicated rows, .AsSplitQuery() splits the eager load into a small, fixed number of queries.

Q: What is the difference between First() and SingleOrDefault()? First() returns the first matching element and throws if no element matches. SingleOrDefault() returns the only matching element, the default if none match, and throws if more than one matches. Use FirstOrDefault() when several elements may match and you want the first one or a default; use SingleOrDefault() when more than one match would indicate a bug.

Q: What is the difference between Select and SelectMany? Select maps each element to a new value (1-to-1). SelectMany flattens nested collections — it maps each element to a sequence and then flattens all sequences into one (1-to-many).

Q: How does deferred execution work internally? LINQ operators return iterators built with yield return. Each call wraps the previous iterator, forming a chain. No code runs until you enumerate; then each element is pulled through the chain one at a time. For streaming operators this means no intermediate collections and memory-efficient streaming (buffering operators such as OrderBy read their whole input first), but also re-execution on every enumeration.

Q: What is PLINQ and when should you use it? PLINQ (AsParallel()) parallelizes LINQ queries across multiple CPU cores. Use it for CPU-bound operations on large in-memory collections where the per-element work is expensive. Avoid it for I/O-bound work (use Task.WhenAll) and small collections where overhead exceeds benefit.
