In Lesson 1, you learned about the three base join algorithms that SQL Server uses. You also learned that the hash join can be further optimized by performing partial joins in multiple threads, and optimized even more by using bitmap filters to eliminate rows from the bigger table early, before the join is performed. This is the bitmap filtered hash join. It is typically used in data warehousing scenarios, where queries have large inputs and there are few concurrent users, so SQL Server can execute a query in parallel. This kind of join is sometimes also called a star join, named after the typical data warehousing schema, which resembles a star: one central table on the many side of the relationships, surrounded by multiple related tables on the one side of the relationships. The following code creates four tables in a very simple star schema and populates them with a large number of rows. It uses an auxiliary table function that returns a table of numbers used to populate the data warehouse tables.
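The code that follows assumes the dbo.GetNums helper function from the book's TSQL2012 sample database. If your copy of the database does not include it, the following is a minimal sketch of such a number-generating inline table function; the name and signature match the helper the listings call, and the body is one common cross-join-based implementation.

```sql
-- Minimal sketch of the dbo.GetNums auxiliary function (assumed helper):
-- returns the integers from @low to @high as a column named n.
CREATE FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS (SELECT 1 AS c FROM (VALUES(1),(1)) AS D(c)),      -- 2 rows
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),   -- 4 rows
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),   -- 16 rows
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),   -- 256 rows
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),   -- 65,536 rows
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),   -- 4,294,967,296 rows
    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
             FROM L5)
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
```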
-- Data distribution settings for DW
DECLARE
@dim1rows AS INT = 100,
@dim2rows AS INT = 50,
@dim3rows AS INT = 200;
-- First dimension
CREATE TABLE dbo.Dim1
(
key1 INT NOT NULL CONSTRAINT PK_Dim1 PRIMARY KEY,
attr1 INT NOT NULL,
filler BINARY(100) NOT NULL DEFAULT (0x)
);
-- Second dimension
CREATE TABLE dbo.Dim2
(
key2 INT NOT NULL CONSTRAINT PK_Dim2 PRIMARY KEY,
attr1 INT NOT NULL,
filler BINARY(100) NOT NULL DEFAULT (0x)
);
-- Third dimension
CREATE TABLE dbo.Dim3
(
key3 INT NOT NULL CONSTRAINT PK_Dim3 PRIMARY KEY,
attr1 INT NOT NULL,
filler BINARY(100) NOT NULL DEFAULT (0x)
);
-- Fact table
CREATE TABLE dbo.Fact
(
key1 INT NOT NULL CONSTRAINT FK_Fact_Dim1 FOREIGN KEY REFERENCES dbo.Dim1,
key2 INT NOT NULL CONSTRAINT FK_Fact_Dim2 FOREIGN KEY REFERENCES dbo.Dim2,
key3 INT NOT NULL CONSTRAINT FK_Fact_Dim3 FOREIGN KEY REFERENCES dbo.Dim3,
measure1 INT NOT NULL,
measure2 INT NOT NULL,
measure3 INT NOT NULL,
filler BINARY(100) NOT NULL DEFAULT (0x),
CONSTRAINT PK_Fact PRIMARY KEY(key1, key2, key3)
);

-- Populating the first dimension
INSERT INTO dbo.Dim1(key1, attr1)
SELECT n, ABS(CHECKSUM(NEWID())) % 20 + 1
FROM dbo.GetNums(1, @dim1rows);
-- Populating the second dimension
INSERT INTO dbo.Dim2(key2, attr1)
SELECT n, ABS(CHECKSUM(NEWID())) % 10 + 1
FROM dbo.GetNums(1, @dim2rows);
-- Populating the third dimension
INSERT INTO dbo.Dim3(key3, attr1)
SELECT n, ABS(CHECKSUM(NEWID())) % 40 + 1
FROM dbo.GetNums(1, @dim3rows);
-- Populating the fact table
INSERT INTO dbo.Fact WITH (TABLOCK)
(key1, key2, key3, measure1, measure2, measure3)
SELECT N1.n, N2.n, N3.n,
ABS(CHECKSUM(NEWID())) % 1000000 + 1,
ABS(CHECKSUM(NEWID())) % 1000000 + 1,
ABS(CHECKSUM(NEWID())) % 1000000 + 1
FROM dbo.GetNums(1, @dim1rows) AS N1
CROSS JOIN dbo.GetNums(1, @dim2rows) AS N2
CROSS JOIN dbo.GetNums(1, @dim3rows) AS N3;
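If you want to confirm that the population succeeded, a quick sanity check is to count the fact rows. The cross join of the three number ranges should produce @dim1rows * @dim2rows * @dim3rows = 100 * 50 * 200 = 1,000,000 rows in dbo.Fact.

```sql
-- Sanity check: the fact table should contain 1,000,000 rows
-- (100 * 50 * 200 combinations of the three dimension keys).
SELECT COUNT(*) AS factrows FROM dbo.Fact;
```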
Figure 17-18 shows the schema for these four tables. Note that the schema resembles a star. This is why it is called a Star Schema.


Figure 17-18 A Star Schema example.

The following query joins all four tables and aggregates the data. STATISTICS IO and STATISTICS TIME are measured as well.
-- Measuring IO and time
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Query demonstrating star join
SELECT D1.attr1 AS x, D2.attr1 AS y, D3.attr1 AS z,
COUNT(*) AS cnt, SUM(F.measure1) AS total
FROM dbo.Fact AS F
INNER JOIN dbo.Dim1 AS D1
ON F.key1 = D1.key1
INNER JOIN dbo.Dim2 AS D2
ON F.key2 = D2.key2
INNER JOIN dbo.Dim3 AS D3
ON F.key3 = D3.key3
WHERE D1.attr1 <= 10
AND D2.attr1 <= 15
AND D3.attr1 <= 10
GROUP BY D1.attr1, D2.attr1, D3.attr1;
The query was executed on a computer with a quad-core processor with hyper-threading, so SQL Server used eight logical processors. Figure 17-19 shows the partial execution plan of this query. Note that the query was executed in parallel (the Parallelism iterator), that the hash join was used (the Hash Match iterator), and that in one case a bitmap filter was applied before the join was performed (the Bitmap operator). Note also that SQL Server 2012 introduces two new properties for plan iterators: Actual Execution Mode and Estimated Execution Mode. In Figure 17-19, you can see that both properties have the value Row for the Hash Match (Inner Join) operator.
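Your numbers will differ if your machine exposes a different number of logical processors. You can check how many logical processors your SQL Server instance sees by querying the standard sys.dm_os_sys_info dynamic management view.

```sql
-- Number of logical processors available to this SQL Server instance
SELECT cpu_count AS logical_processors
FROM sys.dm_os_sys_info;
```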


Figure 17-19 Partial plan of a star join query.

The abbreviated STATISTICS IO and STATISTICS TIME results for this query are as follows.
Table 'Dim2'. Scan count 1, logical reads 2, …
Table 'Dim3'. Scan count 1, logical reads 5, …
Table 'Dim1'. Scan count 1, logical reads 4, …
Table 'Worktable'. Scan count 0, logical reads 0, …
Table 'Fact'. Scan count 47, logical reads 8152, …
Table 'Worktable'. Scan count 0, logical reads 0, …
SQL Server Execution Times:
CPU time = 671 ms, elapsed time = 255 ms.
Note the huge number of logical reads against the dbo.Fact table. In addition, note that the CPU time was nearly three times the elapsed time for this query. Because the query was processed in parallel, the CPU burden was very high, and the CPU could become a bottleneck in this scenario. Imagine what would happen if the table were compressed as well; SQL Server would then also need to decompress it. You could also create a columnstore index on the table; SQL Server would then need to re-create rows for the output. Altogether, the CPU can become a bottleneck in data warehousing scenarios.
SQL Server 2012 brings a solution to this CPU burden: it introduces iterators that process a batch of rows at a time instead of row by row. This way, the CPU needs to deal with row metadata only once per batch. Batch processing is orthogonal to columnstore indexes; it works with row storage as well. However, the best results come from combining it with columnstore indexes.

With columnstore indexes, SQL Server can sometimes perform batch operations directly on compressed data, thus skipping the decompression step as well. SQL Server 2012 can mix batch and row operators and can dynamically switch from batch to row mode.
The following operators support batch mode processing in SQL Server 2012:
- Filter
- Project
- Scan
- Local hash (partial) aggregation
- Hash inner join
- Batch hash table build
To test batch operations, the following code builds a columnstore index on the dbo.Fact table.
CREATE COLUMNSTORE INDEX idx_cs_fact
ON dbo.Fact(key1, key2, key3, measure1, measure2, measure3);
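One side effect worth knowing: in SQL Server 2012, a table with a nonclustered columnstore index becomes read-only. INSERT, UPDATE, and DELETE statements against it fail until the index is dropped (or new data is switched in through table partitioning). So if you wanted to load more rows into dbo.Fact at this point, you would first have to drop the index, as the following sketch shows.

```sql
-- To make dbo.Fact updatable again, drop the columnstore index.
-- Don't run this yet; the next example relies on the index being in place.
DROP INDEX idx_cs_fact ON dbo.Fact;
```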
After creating the columnstore index, execute the same star query again.
SELECT D1.attr1 AS x, D2.attr1 AS y, D3.attr1 AS z,
COUNT(*) AS cnt, SUM(F.measure1) AS total
FROM dbo.Fact AS F
INNER JOIN dbo.Dim1 AS D1
ON F.key1 = D1.key1
INNER JOIN dbo.Dim2 AS D2
ON F.key2 = D2.key2
INNER JOIN dbo.Dim3 AS D3
ON F.key3 = D3.key3
WHERE D1.attr1 <= 10
AND D2.attr1 <= 15
AND D3.attr1 <= 10
GROUP BY D1.attr1, D2.attr1, D3.attr1;
Figure 17-20 shows the partial execution plan for this execution of the star query. Note that the query uses the Columnstore Index Scan operator. In addition, as you can see from the properties of one of the Hash Match (Inner Join) operators, SQL Server used the batch execution mode.


Figure 17-20 Partial plan of a star join query that uses a columnstore index and batch processing.

Also check the abbreviated STATISTICS IO and STATISTICS TIME results for this execution.
Table 'Dim3'. Scan count 1, logical reads 5, …
Table 'Dim2'. Scan count 1, logical reads 2, …
Table 'Dim1'. Scan count 1, logical reads 4, …
Table 'Fact'. Scan count 8, logical reads 993, …
Table 'Worktable'. Scan count 0, logical reads 0, …
Table 'Worktable'. Scan count 0, logical reads 0, …
SQL Server Execution Times:
CPU time = 63 ms, elapsed time = 186 ms.
You can see that the number of logical reads against the dbo.Fact table is much lower than it was for the first execution, when the columnstore index did not exist. The elapsed time is also lower than it was for the first execution of the query. However, note especially the CPU time: it is approximately 10 times lower than it was when batch processing mode was not used.

The following code sets STATISTICS IO and STATISTICS TIME to OFF and cleans up the TSQL2012 database.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
DROP TABLE dbo.Fact;
DROP TABLE dbo.Dim1;
DROP TABLE dbo.Dim2;
DROP TABLE dbo.Dim3;