Sometimes you need join, sometimes you can use where
This blog post shows the way Linq2Sql translates language elements that look like Join statements. (I chose that way of phrasing it because not all queries result in the JOIN command in SQL.) So, once again, I’m going to skimp on the output, instead concentrate on the code and the SQL query.
There is one other important key concept here. Some of the queries use ‘where’ clauses. And some use ‘join … in … on … equals .. into …” clauses. The difference is simple. You can use ‘where’ when your db to object mapping classes have defined the entity relationships. You must use join when you create queries that don’t have that built in relationship.
So, the first Join sample finds all the orders for customers in a given city (London).
var q =
from c in db.Customers
from o in c.Orders
where c.City == "London"
select o;
That query uses the key navigation to find customers in the target city, and then retrieve all their orders. Here’s the query:
SELECT [t1].[OrderID], [t1].[CustomerID],
[t1].[EmployeeID], [t1].[OrderDate],
[t1].[RequiredDate], [t1].[ShippedDate],
[t1].[ShipVia], [t1].[Freight], [t1].[ShipName],
[t1].[ShipAddress], [t1].[ShipCity],
[t1].[ShipRegion], [t1].[ShipPostalCode],
[t1].[ShipCountry]
FROM [Customers] AS [t0], [Orders] AS [t1]
WHERE ([t0].[City] = @p0) AND ([t1].[CustomerID] = [t0].[CustomerID])
Which produces the following text:
OrderID=10289 CustomerID=BSBEV
EmployeeID=7 OrderDate=8/26/1996
RequiredDate=9/23/1996 ShippedDate=8/28/1996
ShipVia=3 Freight=22.7700 ShipName=B's Beverages
ShipAddress=Fauntleroy Circus ShipCity=London
ShipRegion=null ShipPostalCode=EC2 5NT
ShipCountry=UK OrderDetails=... Customer={ } Employee={ } Shipper={ }
Well, obviously, if you can look at one field, you can look at multiple fields. This query looks at the suppliers table and the products table, matching on the supplier ID. The goal is to find all products from suppliers in the US, where the product is in stock.
var q =
from p in db.Products
where p.Supplier.Country == "USA" && p.UnitsInStock == 0
select p;
Producing this SQL:
SELECT [t0].[ProductID], [t0].[ProductName],
[t0].[SupplierID], [t0].[CategoryID],
[t0].[QuantityPerUnit], [t0].[UnitPrice],
[t0].[UnitsInStock], [t0].[UnitsOnOrder],
[t0].[ReorderLevel], [t0].[Discontinued]
FROM [Products] AS [t0]
LEFT OUTER JOIN [Suppliers] AS [t1]
ON [t1].[SupplierID] = [t0].[SupplierID]
WHERE ([t1].[Country] = @p0) AND ([t0].[UnitsInStock] = @p1)
And this output:
ProductID=5 ProductName=Chef Anton's Gumbo Mix SupplierID=2
CategoryID=2 QuantityPerUnit=36 boxes
UnitPrice=21.3500 UnitsInStock=0 UnitsOnOrder=0
ReorderLevel=0 Discontinued=True OrderDetails=...
Category={ } Supplier={ }
This is a similar variation, but the where clause matches multiple records (multiple employees are from Seattle), and an employee may have multiple territories.
var q =
from e in db.Employees
from et in e.EmployeeTerritories
where e.City == "Seattle"
select new {e.FirstName, e.LastName, et.Territory.TerritoryDescription};
And LINQ turns that C# into the following SQL:
SELECT [t0].[FirstName], [t0].[LastName], [t3].[TerritoryDescription]
FROM [Employees] AS [t0]
CROSS APPLY ((
SELECT [t1].[TerritoryID]
FROM [EmployeeTerritories] AS [t1]
WHERE ([t0].[City] = @p0) AND ([t1].[EmployeeID] = [t0].[EmployeeID])
) AS [t2]
LEFT OUTER JOIN [Territories] AS [t3] ON [t3].[TerritoryID] = [t2].[TerritoryID])
And the output is:
FirstName=Nancy LastName=Davolio TerritoryDescription=Wilton
FirstName=Nancy LastName=Davolio TerritoryDescription=Neward
FirstName=Laura LastName=Callahan TerritoryDescription=Philadelphia
FirstName=Laura LastName=Callahan TerritoryDescription=Beachwood
FirstName=Laura LastName=Callahan TerritoryDescription=Findlay
FirstName=Laura LastName=Callahan TerritoryDescription=Racine
This next Join query finds pairs of employees where the second employee reports to the first employee, and both are from the same city.
var q =
from e1 in db.Employees
from e2 in e1.Employees
where e1.City == e2.City
select new {
FirstName1 = e1.FirstName, LastName1 = e1.LastName,
FirstName2 = e2.FirstName, LastName2 = e2.LastName,
e1.City
};
For those happier with SQL than C# (strange folk you may be)
SELECT [t0].[FirstName] AS [FirstName1], [t0].[LastName] AS [LastName1],
[t1].[FirstName] AS [FirstName2], [t1].[LastName] AS [LastName2],
[t0].[City]
FROM [Employees] AS [t0], [Employees] AS [t1]
WHERE ([t0].[City] = [t1].[City]) AND
([t1].[ReportsTo] = [t0].[EmployeeID])
And, the output:
FirstName1=Steven LastName1=Buchanan FirstName2=Michael
LastName2=Suyama City=London
FirstName1=Steven LastName1=Buchanan FirstName2=Robert
LastName2=King City=London
FirstName1=Steven LastName1=Buchanan FirstName2=Anne
LastName2=Dodsworth City=London
This next query executes an explicit join, and prints the number of orders from each customer:
var q =
from c in db.Customers
join o in db.Orders on c.CustomerID equals o.CustomerID into orders
select new {c.ContactName, OrderCount = orders.Count()};
The SQL is:
SELECT [t0].[ContactName], (
SELECT COUNT(*)
FROM [Orders] AS [t1]
WHERE [t0].[CustomerID] = [t1].[CustomerID]
) AS [OrderCount]
FROM [Customers] AS [t0]
Note that the results set only contains those columns you requested:
ContactName=Maria Anders OrderCount=6
ContactName=Ana Trujillo OrderCount=4
Etc.
And, if you can join two tables, you can join three (or more):
var q =
from c in db.Customers
join o in db.Orders on c.CustomerID equals o.CustomerID into ords
join e in db.Employees on c.City equals e.City into emps
select new {c.ContactName, ords=ords.Count(), emps=emps.Count()};
That code joins the customers, orders, and employees table in order to produce the customer’s contact name, the number of orders from that customer, and the number of employees in the same city as the customer. Which looks like this in SQL:
SELECT [t0].[ContactName], (
SELECT COUNT(*)
FROM [Orders] AS [t1]
WHERE [t0].[CustomerID] = [t1].[CustomerID]
) AS [ords], (
SELECT COUNT(*)
FROM [Employees] AS [t2]
WHERE [t0].[City] = [t2].[City]
) AS [emps]
FROM [Customers] AS [t0]
And these results:
ContactName=Maria Anders ords=6 emps=0
ContactName=Ana Trujillo ords=4 emps=0
Etc.
Now, it gets a bit more complicated. This next query enumerates all the employees, joined to all that employee’s orders. If there are any orders, the result contains the employee’s first and last name, and the order. If there are no orders, you have the employee’s first and last name, concatenated with an empty order.
var q =
from e in db.Employees
join o in db.Orders on e equals o.Employee into ords
from o in ords.DefaultIfEmpty()
select new {e.FirstName, e.LastName, Order = o};
The SQL is getting more complicated, and here it is:
SELECT [t0].[FirstName], [t0].[LastName],
[t2].[test], [t2].[OrderID], [t2].[CustomerID],
[t2].[EmployeeID], [t2].[OrderDate], [t2].[RequiredDate],
[t2].[ShippedDate], [t2].[ShipVia], [t2].[Freight],
[t2].[ShipName], [t2].[ShipAddress], [t2].[ShipCity],
[t2].[ShipRegion], [t2].[ShipPostalCode], [t2].[ShipCountry]
FROM [Employees] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[OrderID], [t1].[CustomerID],
[t1].[EmployeeID], [t1].[OrderDate], [t1].[RequiredDate],
[t1].[ShippedDate], [t1].[ShipVia], [t1].[Freight],
[t1].[ShipName], [t1].[ShipAddress], [t1].[ShipCity],
[t1].[ShipRegion], [t1].[ShipPostalCode], [t1].[ShipCountry]
FROM [Orders] AS [t1]
) AS [t2] ON [t0].[EmployeeID] = [t2].[EmployeeID]
I’m eliding the results, because it’s really long and boring. So, let’s go on to the next query, which shows how you can create new columns in the result set using some operation on existing columns:
var q =
from c in db.Customers
join o in db.Orders on c.CustomerID
equals o.CustomerID into ords
let z = c.City + c.Country
from o in ords
select new {c.ContactName, o.OrderID, z};
Which produces this slightly simpler SQL:
SELECT [t1].[ContactName], [t2].[OrderID], [t1].[value] AS [z]
FROM (
SELECT [t0].[City] + [t0].[Country] AS [value],
[t0].[CustomerID], [t0].[ContactName]
FROM [Customers] AS [t0]
) AS [t1]
CROSS JOIN [Orders] AS [t2]
WHERE [t1].[CustomerID] = [t2].[CustomerID]
And, if you need to join tables by comparing more than one column, it’s easy. You create compatible anonymous types and compare them:
var q =
from o in db.Orders
from p in db.Products
join d in db.OrderDetails
on new {o.OrderID, p.ProductID} equals new {d.OrderID, d.ProductID}
into details
from d in details
select new {o.OrderID, p.ProductID, d.UnitPrice};
Which expands into this SQL:
SELECT [t0].[OrderID], [t1].[ProductID], [t2].[UnitPrice]
FROM [Orders] AS [t0], [Products] AS [t1], [Order Details] AS [t2]
WHERE ([t0].[OrderID] = [t2].[OrderID]) AND ([t1].[ProductID] = [t2].[ProductID])
Finally, there’s one wrinkle on the JOIN related methods. The following query shows how you need to modify the C# query when one of the columns (Orders.EmployeeID) may be null. You need to cast the other side of the comparison to an appropriate nullable type.
var q =
from o in db.Orders
join e in db.Employees
on o.EmployeeID equals (int?)e.EmployeeID into emps
from e in emps
select new {o.OrderID, e.FirstName};
Which becomes:
SELECT [t0].[OrderID], [t1].[FirstName]
FROM [Orders] AS [t0], [Employees] AS [t1]
WHERE [t0].[EmployeeID] = [t1].[EmployeeID]
Join commands in LinqToSql exist so you can perform queries and projections outside of your original data relationships. And, because the Linq2Sql libraries write the T-SQL for you, you don't have to break apart everything, and create new queries and stored procs. Cool.
Next up: Ordering and Grouping.