Rabu, 05 Januari 2011

SQL Server: Calculating Running Totals, Subtotals and Grand Total Without a Cursor

Sample Data Used by Examples

Prior to showing you my examples, I will first need to establish a set of test data, which all my examples will use. My test data will consist of an "Orders" table with the following format:
create table Orders
(OrderID int identity,
 OrderAmt Decimal(8,2),
 OrderDate SmallDatetime)
I've populated this test Orders table with the following set of records:
OrderID     OrderAmt   OrderDate                                              
----------- ---------- --------------------
1           10.50      2003-10-11 08:00:00
2           11.50      2003-10-11 10:00:00
3           1.25       2003-10-11 12:00:00
4           100.57     2003-10-12 09:00:00
5           19.99      2003-10-12 11:00:00
6           47.14      2003-10-13 10:00:00
7           10.08      2003-10-13 12:00:00
8           7.50       2003-10-13 19:00:00
9           9.50       2003-10-13 21:00:00
All my examples will be using this table to produce the running totals, sub totals, and grand total reports shown below. Basically this table contains a number of different orders that where created over time. Each order has an ID (OrderID) which uniquely identifies each record, an order amount (OrderAmt) that holds a decimal amount for the order, and a timestamp (OrderDate) that identifies when the order was placed.

Running Total On Each Record

This first example will display a simple method of calculating the running total of the OrderAmt. The calculated running total will be displayed along with each record in the Orders table. The "Running Total" column will be created with a simple SELECT statement and a correlated sub query. The correlated sub query is the part of the statement that does the heavy lifting to produce the running total.
select OrderId, OrderDate, O.OrderAmt
 ,(select sum(OrderAmt) from Orders 
                          where OrderID <= O.OrderID)
   'Running Total'
from Orders O
When I run this query against my Orders table I get the following results:
OrderId     OrderDate            OrderAmt   Running Total                            
----------- -------------------- ---------- ------------- 
1           2003-10-11 08:00:00  10.50      10.50
2           2003-10-11 10:00:00  11.50      22.00
3           2003-10-11 12:00:00  1.25       23.25
4           2003-10-12 09:00:00  100.57     123.82
5           2003-10-12 11:00:00  19.99      143.81
6           2003-10-13 10:00:00  47.14      190.95
7           2003-10-13 12:00:00  10.08      201.03
8           2003-10-13 19:00:00  7.50       208.53
9           2003-10-13 21:00:00  9.50       218.03
As you can see, there is a "Running Total" column that displays the running total along with the other column information associated with each Orders table record. This running total column is calculated, by summing up the OrderAmt for all Orders where the OrderID is less than or equal to the OrderID of the current ID being displayed.

Running Total for Each OrderDate

This example is similar to the one above, but I will calculate a running total for each record, but only if the OrderDate for the records are on the same date. Once the OrderDate is for a different day, then a new running total will be started and accumulated for the new day. Here is the code to accomplish this:
select O.OrderId,
    convert(char(10),O.OrderDate,101) as 'Order Date',
    (select sum(OrderAmt) from Orders 
                          where OrderID <= O.OrderID and 
                             = convert(char(10),O.OrderDate,101))
                                'Running Total' 
from Orders O
  order by OrderID
Here are the results returned from the query using my sample Orders Table:
OrderId     Order Date OrderAmt   Running Total                            
----------- ---------- ---------- ---------------
1           10/11/2003 10.50      10.50
2           10/11/2003 11.50      22.00
3           10/11/2003 1.25       23.25
4           10/12/2003 100.57     100.57
5           10/12/2003 19.99      120.56
6           10/13/2003 47.14      47.14
7           10/13/2003 10.08      57.22
8           10/13/2003 7.50       64.72
9           10/13/2003 9.50       74.22
Note that the "Running Total" starts out with a value of 10.50, and then becomes 22.00, and finally becomes 23.25 for OrderID 3, since all these records have the same OrderDate (10/11/2003). But when OrderID 4 is displayed the running total is reset, and the running total starts over again. This is because OrderID 4 has a different date for its OrderDate, then OrderID 1, 2, and 3. Calculating this running total for each unique date is once again accomplished by using a correlated sub query, although an extra WHERE condition is required, which identified that the OrderDate's on different records need to be the same day. This WHERE condition is accomplished by using the CONVERT function to truncate the OrderDate into a MM/DD/YYYY format.

Running Totals With Subtotals and Grand totals

In this example, I will calculate a single sub totals for all Orders that were created on the same day and a Grand Total for all Orders. This will be done using a CASE clause in the SELECT statement. Here is my example.
select O.OrderID,convert(char(10),O.OrderDate,101) 'Order Date',O.OrderAmt, 
       case when OrderID = (select top 1 OrderId from Orders 
                           where convert(char(10),OrderDate,101)
                              = convert(char(10),O.OrderDate,101)
                          order by OrderID desc)
            then (select cast(sum(OrderAmt) as char(10)) 
                     from Orders
                     where OrderID <= O.OrderID
                        and convert(char(10),OrderDate,101)
                           = convert(char(10),O.OrderDate,101))
            else ' ' end as 'Sub Total',
       case when OrderID = (select top 1 OrderId from Orders 
                           order by OrderDate desc)
            then (select cast(sum(OrderAmt) as char(10)) 
                      from Orders) 
             else ' ' end as 'Grand Total'
from Orders O
  order by OrderID
Output from the SELECT statement looks like this:
OrderID     Order Date OrderAmt   Sub Total  Grand Total 
----------- ---------- ---------- ---------- ----------- 
1           10/11/2003 10.50                           
2           10/11/2003 11.50                           
3           10/11/2003 1.25       23.25                
4           10/12/2003 100.57                          
5           10/12/2003 19.99      120.56               
6           10/13/2003 47.14                           
7           10/13/2003 10.08                           
8           10/13/2003 7.50                            
9           10/13/2003 9.50       74.22      218.03
In this example the first CASE statement controls the printing of the "Sub Total' column. As you can see, the sub total is printed only on the last order of the day, which is determined by using a correlated sub query. The second CASE statement prints the "Grand Total", which is only printed along with the very last order. Each of these CASE statements uses the TOP clause to determine which OrderID is the correct order for which to print out the "Grand Total".

 For example:
day   sales   cumu_total
1     120     120
2     60      180
3     125     305
4     40      345
In order to calculte the cumulative total I need to know the previous cumulative total (cumulative total could be any other calculation). I did this in MySQL by using variables to temporarily hold values between rows but SQL server doesn't assinging variables in retrieval statements. Is it possible to do this in a Select statement?"

The answer to the questions is "yes", you can solve this problem with a single SELECT statement. Instead of just showing the solution that came to mind, though, I want to take a look at three ways to solve the problem.

Creating the Test Data

I used the code shown below to create the base table and a few thousand rows of test data. It is important to add enough data so that the efficiency of the three solutions can be accurately measured. In other words, with only a few rows of data they all seem efficient.
CREATE TABLE Sales (DayCount smallint, Sales money)
CREATE CLUSTERED INDEX ndx_DayCount ON Sales(DayCount)
INSERT Sales VALUES (1,120)
INSERT Sales VALUES (2,60)
INSERT Sales VALUES (3,125)
INSERT Sales VALUES (4,40)

DECLARE @DayCount smallint, @Sales money
SET @DayCount = 5
SET @Sales = 10

WHILE @DayCount < 5000
  INSERT Sales VALUES (@DayCount,@Sales)
  SET @DayCount = @DayCount + 1
  SET @Sales = @Sales + 15

The Three Solutions

The three different solutions I tested are shown below. The execution time with and without a clustered index on DayCount is shown at the top of each batch.

Solution 1: Temp Table/Cursor

(NoIndex = 2 secs, Index = 2 secs)
CREATE TABLE #Sales (DayCount smallint, Sales money, RunningTotal money)

DECLARE @DayCount smallint,
        @Sales money,
        @RunningTotal money

SET @RunningTotal = 0

SELECT DayCount, Sales
FROM Sales

OPEN rt_cursor

FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales

  SET @RunningTotal = @RunningTotal + @Sales
  INSERT #Sales VALUES (@DayCount,@Sales,@RunningTotal)
  FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales

CLOSE rt_cursor
DEALLOCATE rt_cursor



Solution 2: The "Celko" Solution

(NoIndex = 25 secs, Index = 20 secs)
SELECT DayCount,
       Sales+COALESCE((SELECT SUM(Sales) 
                      FROM Sales b 
                      WHERE b.DayCount < a.DayCount),0)
                         AS RunningTotal
FROM Sales a

Solution 3: The "Guru's Guide" Solution

(NoIndex = 38 secs, Index = 17 secs)
SELECT a.DayCount,
FROM Sales a
WHERE (b.DayCount <= a.DayCount) AS RunningTotal
GROUP BY a.DayCount,a.Sales
ORDER BY a.DayCount,a.Sales
For any given account, you sum the debits (deposits) and credit (withdrawals) at a given point in time. After each transaction, you want to know the current balance. Listing A creates a simple example of such a table.
Here are sample rows:
1     2006-11-03 02:33:42.340 10000.00
2     2006-11-03 02:34:50.467 -500.00
3     2006-11-03 02:35:04.857 250.00
4     2006-11-03 02:42:19.763 -124.25 Since the date is defaulted, all you need to do is add a few amounts. The example keeps it simple, assuming only one bank account.

Now you can create the query that contains the current balance. Since you are recording deposits and withdrawals in the same column as negatives and positives, the sum is straightforward. To derive the current balance, you sum all the previous transactions and add this sum to the value of the current transaction.
The following query accomplishes this:
(SELECT SUM(amount)
FROM dbo.bankaccount as D1
WHERE D1.transactiondatetime <= D0.transactiondatetime) AS balance
FROM dbo.bankaccount AS D0 This results in the following result set:
1     2006-11-03 02:33:42.340 10000.00    10000.00
2     2006-11-03 02:34:50.467 -500.00     9500.00
3     2006-11-03 02:35:04.857 250.00      9750.00
4     2006-11-03 02:42:19.763 -124.25     9625.75 As this example demonstrates, running totals are simple to create once you understand the requirements. The example presented assumes that the table contains only one account, but it's easy to deal with many accounts—you would just need to add a column for BankAccountNumber and a WHERE predicate that specifies the account of interest.
You can also turn this example upside down and create a running difference (as in an inventory count). You begin with an inventory of 1,000, and then subtract various purchases and receipts.

Tidak ada komentar:

Posting Komentar