Wednesday, February 14, 2018

This SQL Just Saved My Butt


I live by the phrase, "Save early and often!" Throughout my career I have preached it to others with the kind of fatherly tone that conveys "been-there" as well as "I know better." It's a good axiom to remember and practice, and yesterday I was kicking myself all over my office because I didn't.
I was writing code in SQL Server Management Studio (SSMS) in the typical plan, write, test cycle. I spent a good part of the day working with a particularly nasty data set, sometimes running the operations in whole, other times in segments. Then, after hours of intense focus and concentration, a colleague IM'd me a simple question. To answer her, I opened a SQL file containing the logic, typed out a quick response, and closed the SQL file quickly, dismissing the warning. It was then that I realized I had closed the wrong file. Instant panic hit me: a half day of work had just been thrown away, and there is no auto-save in SSMS. It hit me like a ton of bricks that all my hard work was lost.

Luckily, Bing (I'm sure Google could have done it too) came to my rescue when I searched for an answer.
https://stackoverflow.com/questions/15034832/recover-unsaved-sql-query-scripts

USE <database>;

SELECT execquery.last_execution_time AS [Date Time],
       execsql.text AS [Script]
FROM   sys.dm_exec_query_stats AS execquery
       CROSS APPLY sys.dm_exec_sql_text(execquery.sql_handle) AS execsql
ORDER BY execquery.last_execution_time DESC
After running the query, I had to scroll through many, many results until I found the last time I had run my code completely. When I copied and pasted it into a new query window, I had my work back! With a little formatting (it pasted in as a single line), I was back in business, and this time I made sure to save it right away. My grateful thanks to the Microsoft techs who made this possible for me!
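If you remember a distinctive table or column name from the lost script, you can filter the cached query text instead of scrolling through everything. A sketch of that variant, where 'DimCustomer' is just a hypothetical placeholder for a name you know appeared in your code:

```sql
USE <database>;

-- Search the plan cache only for scripts containing a known identifier.
-- 'DimCustomer' is a made-up placeholder; substitute your own.
SELECT execquery.last_execution_time AS [Date Time],
       execsql.text AS [Script]
FROM   sys.dm_exec_query_stats AS execquery
       CROSS APPLY sys.dm_exec_sql_text(execquery.sql_handle) AS execsql
WHERE  execsql.text LIKE '%DimCustomer%'
ORDER BY execquery.last_execution_time DESC
```

Keep in mind that the plan cache is cleared on a server restart and entries age out, so this trick only works soon after the loss.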

Monday, January 29, 2018

Helpful Data Warehouse Essentials

Dimensional Modeling can be a bit confusing to application and transactional database developers. The approach and design of a Data Warehouse is often counterintuitive compared to the approach and design of transactional processing databases. If you are looking for a primer or refresher on Data Warehousing, specifically star-schema dimensional modeling, these are some of my favorite articles on philosophy, architecture and testing. I've highlighted, along with each article, some quotes that resonate with my experience.

Philosophy

  • It would be criminal not to mention the foremost authority on Dimensional modeling, The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball and Margy Ross. 
  • An excellent summary of the essentials of data warehousing is contained in the valuable SQL Server MVP Deep Dives by Paul Nielsen et al., Chapter 50, “BI for the Relational Guy.”
    • “A business intelligence solution needs to be able to address not just the needs of today, but those of the future, and that can only be accomplished by obtaining a core understanding of the underlying business processes.”
    • “Departments in an organization tend to vary in their readiness for data warehousing, and cooperation from all departments is critical [for determining the overall strategy].”
    • “It is true that [IT staff] are intimate with the business rules that underlie the technology solutions that run much of the business, but that should not be confused with a core understanding of the business, including insights into where the business is heading.”
  • Tried and True Concepts for DW-BI Success (http://www.kimballgroup.com/2015/10/design-tip-178-tried-and-true-concepts-for-dw-bi-success/)
    • “Never forget to maintain a laser-like focus on achieving business value. The success of a DW/BI effort is totally dependent on business user engagement…”
    • “The DW/BI effort cannot be identified as just an IT effort. The business community (not IT) needs to take ownership of the vision, strategy, roadmap (priorities), scope, timelines, governance, and quality…”
    • “Exhibit patience – The initial phase of a DW/BI initiative usually takes a disproportionately long time…”
  • What’s In a name? (http://www.kimballgroup.com/2014/07/design-tip-168-whats-name/ )
    • “As much as possible, strive to have the names in the DW/BI system be unchanged in the semantic layer and unchanged by report designers.”
    • “If your organization already has naming conventions, you may be faced with a problem: most existing naming conventions were developed for technical people. But names in the DW/BI environment should be oriented to the business users.”
  • The 10 Essential Rules of Dimensional Modeling (http://www.kimballgroup.com/2009/05/the-10-essential-rules-of-dimensional-modeling/ )
    • “Rule #1: Load detailed atomic data into dimensional structures.” –"…business users cannot live on summary data alone; they need the gory details to answer their ever-changing questions.”
    • “Rule #2: Structure dimensional models around business processes.”
    • “Rule #10: Continuously balance requirements and realities to deliver a DW/BI solution that’s accepted by business users and that supports their decision-making.”
  • Kimball Lifecycle In a Nutshell (http://www.kimballgroup.com/2009/08/design-tip-115-kimball-lifecycle-in-a-nutshell/ )
    • A thirty-thousand-foot overview of the whole process and construct of Business Intelligence.
    • “Regardless of the platform, dimensional models attempt to address two simultaneous goals: ease of use from the users’ perspective and fast query performance.” (emphasis mine)

Testing

  • Data Warehouse Testing Recommendations (http://www.kimballgroup.com/2011/05/design-tip-134-data-warehouse-testing-recommendations/ )
    • “Standard testing methodology tests one little thing at a time, but a DW/BI system is all about integration and complexity, not to mention large data volumes.”
    • “Enlisting business users in the test specification process will ensure better testing than if the DW/BI team just made up tests based on what they think is interesting. Engaging key business users in the quality assurance process also provides a huge credibility boost.”

Data Architecture

  • Dimensional Modeling Matrix (http://www.kimballgroup.com/2005/12/the-matrix-revisited/ )
    • “Reusable conformed dimensions supply potent integration glue, letting businesses drill across core processes with consistent, unified views.”
    • “Yes, it’s difficult to get people in different corners of the business to agree on common attribute names, definitions and values, but that’s the crux of unified, integrated data. If everyone demands their own labels and business rules, then there’s no chance of delivering the single version of the truth DW promises.”
    • “You see the logical relationships and complex interplay between the organization’s conformed reference dimensions and key business processes. By looking across a row, you quickly understand its dimensionality. Looking down a column gives immediate feedback about conformed dimension opportunities and obstacles…”
  • A Dimensional Modeling Manifesto (http://www.kimballgroup.com/1997/08/a-dimensional-modeling-manifesto/ )
    • “First, the dimensional model is a predictable, standard framework. Report writers, query tools, and user interfaces can all make strong assumptions about the dimensional model to make the user interfaces more understandable and to make processing more efficient.”
    • “A third strength of the dimensional model is that it is gracefully extensible to accommodate unexpected new data elements and new design decisions.”
  • Two Powerful Ideas (http://www.kimballgroup.com/2002/09/two-powerful-ideas/ )
    • “There are two powerful ideas at the foundation of most successful data warehouses. First, separate your systems. Second, build stars and cubes.”

Thursday, January 18, 2018

Quick, Easy, Database Views For The Lazy Typist

I am not great at typing. Yes, I had the requisite course on typing in high school (that dates me, I know). Back then it was simply "Typing"; now my kids take "Keyboarding," and they take it much earlier than high school. I didn't do that well in my typing class and still have known bad habits. All that to say, I don't like typing out, letter by laborious letter, the views that lay atop my Data Warehouse tables. By the way, creating views is smart even if your business users and analysts are accustomed to accessing the underlying tables. Views provide a level of abstraction and a way to semantically beautify column names from the underlying tables.

To get around typing, I created some quick SQL to interrogate the INFORMATION_SCHEMA views and build the views for me. It includes some tweaks to limit the scope of which tables to create views for and which audit columns I want hidden from the users. Since I am working in a Data Warehouse environment, the views I want to create are simple SELECT views over my Dimension, Fact and Bridge tables. As long as your Data Warehouse follows a standard naming practice, you should be able to adjust the following SQL to suit your environment.

/******************************************************************
Create Views from Base Dimensional Model tables
******************************************************************/

DECLARE @Schema VARCHAR(120),
  @TableName    VARCHAR(120),
  @ViewName     VARCHAR(120),
  @colName      VARCHAR(120),
  @sql          VARCHAR(max)
-- Use tables matching my criteria
DECLARE table_cursor
CURSOR STATIC --Use STATIC to make a temp copy of the data for the cursor.
  FOR

    SELECT table_schema,
         table_name,
         ViewName = Replace(table_name,'tbl','')
    FROM   information_schema.tables
    WHERE  table_schema IN ('Core','Sales')
--Loop through the tables

OPEN table_cursor
FETCH next
FROM  table_cursor
INTO  @Schema,
      @TableName,
      @ViewName


WHILE @@FETCH_STATUS = 0
BEGIN 

  --Table Loop
  SET @sql = 'IF OBJECT_ID(''' + @Schema + '.' + @ViewName + ''') IS NOT NULL 
DROP VIEW ' + @Schema + '.' + @ViewName + Char(13)
  PRINT @sql
  EXEC(@sql)
  SET @sql = 'CREATE VIEW ' + @Schema + '.' + @ViewName + ' AS ' + Char(13) + '     SELECT '

  DECLARE col_cursor
  CURSOR static 

    FOR
        SELECT   column_name
        FROM     information_schema.columns
        WHERE    table_schema = @Schema
            AND      table_name = @TableName
            AND      column_name NOT LIKE 'Edw%'
            AND      column_name NOT LIKE '%RowVersionStamp%'
        ORDER BY ordinal_position


        --Loop through the columns of my table
        OPEN col_cursor
        FETCH next
        FROM  col_cursor
        INTO  @colName
        WHILE @@FETCH_STATUS = 0
        BEGIN
          SET @sql = @sql + Char(13) + '          ' + @colName
          FETCH next
          FROM  col_cursor
          INTO  @colName
          IF @@FETCH_STATUS = 0
          SET @sql = @sql + ',';
        END
    --Column Loop End
    CLOSE col_cursor
    DEALLOCATE col_cursor


  SET @sql = @sql + Char(13) + '     FROM ' + @schema + '.' + @TableName
  PRINT @sql
  PRINT ''
  EXEC(@sql)
  FETCH next
  FROM  table_cursor
  INTO  @Schema,
        @TableName,
        @ViewName
END
--Table Loop END


CLOSE table_cursor
DEALLOCATE table_cursor;
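To illustrate what the script generates, suppose a table Core.tblCustomer with a few business columns plus audit columns matching the excluded patterns (all names here are made up for illustration). The printed and executed SQL would look roughly like this:

```sql
IF OBJECT_ID('Core.Customer') IS NOT NULL 
DROP VIEW Core.Customer

CREATE VIEW Core.Customer AS 
     SELECT 
          CustomerKey,
          CustomerName,
          CustomerSince
     FROM Core.tblCustomer
```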

Friday, November 21, 2014

Catchpoint OAuth C#

I've been working recently on pulling web response performance data from Catchpoint. We need this to combine with website traffic as well as sales to see if response time hinders conversion rate. This is my first time programming OAuth2 authentication, so I had a little trouble figuring out the necessary construct. Catchpoint's Data Pull API and its documentation are still under development, so there is not much to go on. They give you one curl example and the text of the request.

curl https://io.catchpoint.com/ui/api/token \
--data 'grant_type=client_credentials&client_id=<key>&client_secret=<secret>' 

POST /ui/api/token
Accept: application/json
Host: io.catchpoint.com
grant_type=client_credentials&client_id=<key>&client_secret=<secret>

So, using the above information and some trial and error, I was able to build a working C# use of their OAuth implementation to get the token I needed. I suppose if I were a Linux programmer I would have known some of the defaults inherent in the curl command, but then, I wouldn't have been writing this in C# :)


                // Create the data to send
                StringBuilder data = new StringBuilder();
                data.Append("grant_type=" + Uri.EscapeDataString(grantType));
                data.Append("&client_id=" + Uri.EscapeDataString(clientString));
                data.Append("&client_secret=" + Uri.EscapeDataString(clientSecret));

                // Create a byte array of the data to be sent
                byte[] byteArray = Encoding.UTF8.GetBytes(data.ToString());

                // Setup the Request
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                request.Method = "POST";
                request.Accept = "application/json";
                request.ContentType = "application/x-www-form-urlencoded";
                request.ContentLength = byteArray.Length;

                // Write data
                Stream postStream = request.GetRequestStream();
                postStream.Write(byteArray, 0, byteArray.Length);
                postStream.Close();

                // Send Request & Get Response
                response = (HttpWebResponse)request.GetResponse();

                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    // Get the Response Stream
                    string json = reader.ReadLine();
                    Console.WriteLine(json);

                    // Retrieve and Return the Access Token
                    JavaScriptSerializer ser = new JavaScriptSerializer();
                    Dictionary<string, object> x = (Dictionary<string, object>)ser.DeserializeObject(json);
                    string accessToken = x["access_token"].ToString();
                    //Console.WriteLine(accessToken);
                }

Thursday, May 15, 2014

Hadoop: Questions I Am Asking

I have close to 14 years’ experience with SQL Server for ETL around Data Warehousing. I lead a team of very talented Data Warehouse developers who have developed and maintain a multi-terabyte data warehouse. We ETL and dimensionally model data daily describing tens of thousands of orders, millions of dollars of sales, millions of web site visitor metrics and tens of millions of web page views. And we do this each night, and more, and have it all ready for the C-suite execs to drink in with their morning coffee! I’m not saying this to brag (well, maybe a little), but because, despite that experience, Hadoop puts me in an alien world where the normal tools of my trade don’t seem to make sense.

At this point in time, the questions I am asking myself are:

  • How much of my Data Warehouse environment and processes will eventually be replaced by Hadoop related technologies and processes?
  • What ETL processes are best done in Hadoop and which in SQL/SSIS?
  • How much of my storage will transfer to Hadoop: Archive, Raw Staged, Operational Stores and Modeled Data?
  • How big of a Hadoop environment do I need to surpass the power of my current SQL environment?
  • Does Hadoop mean adapting new technology to the existing BI strategy, or do we need a new BI strategy?

I am tenacious, so it is not a matter of “if” but “when” I’ll know which of my old tools will work, and how to use new tools and new strategies to conquer the next generation of data challenges.

Monday, April 7, 2014

Two Big Data Observations

I may be suffering from selection bias, but two articles came across my email recently, and I thought they were relevant to my own exploration of Big Data. I came away from these articles with two main conclusions:

  1. A good Business Intelligence strategy could put you in an excellent position. You must have the right technology, people and purpose to find value in your company's Big Data exploration. A team's Data Warehouse experience will enhance how they are able to leverage new data technologies. 
  2. The human element involved in business purpose, interpretation, and molding good data tends to be sacrificed in the Big Data marketing hoopla. The old adage I learned in my first programming class, “garbage in, garbage out,” still applies. Big Data is not a magic solution that solves problems on its own.

Here are the articles and my own observations:

What Makes Big Data Projects Succeed 

  1. Technology
    “I found that companies can program big data applications with existing languages like SQL. I also learned that companies with existing data warehouse environments tend to create value faster with big data projects than those without them.”
  2. People
    “The large companies I interviewed about big data projects said they were not hiring Ph.D. level data scientists on a large scale. Instead they were forming teams of people with quantitative, computational, or business expertise backgrounds.”
  3. Good Change Management
  4. A clear business objective
    “it will be an unproductive fishing expedition unless a company has a business problem in mind.”
  5. Good project management


Google Flu Trends’ Failure Shows Good Data > Big Data
“The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.”

“more data in itself does not lead to better analysis, as amply demonstrated with Flu Trends. Large datasets don’t guarantee valid datasets. That’s a bad assumption, but one that’s used all the time to justify the use of and results from big data projects.”

"Progress will come when the companies involved in generating and crunching OCCAM datasets restrain themselves from overstating their capabilities without properly measuring their results."

Saturday, October 8, 2011

SQL Round To Quarter


This is a reminder to myself how to round to the nearest fraction, in this case a quarter.
Next time I won't spend 15 minutes trying to recall the rounding formula. I'll just look it up here.

DECLARE @target smallmoney
SET @target = 597.80


--round to quarter
SELECT ROUND(@Target/25.0, 2) * 25

-- Result = 597.75
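The same trick generalizes to any fraction: divide by the step, round to a whole number, and multiply back. A sketch rounding to the nearest nickel (the 0.05 step and the sample value are just examples):

```sql
DECLARE @Target smallmoney, @Step smallmoney
SET @Target = 597.83
SET @Step = 0.05

--divide by the step, round to a whole number, multiply back
SELECT ROUND(@Target / @Step, 0) * @Step

-- Result = 597.85
```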



(Originally posted 8/3/2007)