Friday, November 21, 2014

Catchpoint OAuth C#

I've been working recently on pulling web response performance data from Catchpoint. We need to combine it with website traffic and sales data to see whether response time hinders conversion rate. This is my first time programming OAuth2 authentication, so I had a little trouble figuring out the necessary constructs. Catchpoint's Data Pull API and documentation are still under development, so there is not much to go on. They give you one curl example and the text of the request.

curl https://io.catchpoint.com/ui/api/token \
--data 'grant_type=client_credentials&client_id=<key>&client_secret=<secret>' 

POST /ui/api/token
Accept: application/json
Host: io.catchpoint.com
grant_type=client_credentials&client_id=<key>&client_secret=<secret>

So using the above information and some trial and error, I was able to build a working C# call to their OAuth implementation to get the token I needed. I suppose if I were a Linux programmer I would have known some of the defaults inherent in the curl command (for instance, that --data implies a POST with a Content-Type of application/x-www-form-urlencoded), but then, I wouldn't have been writing this in C# :)


                // Required namespaces: System, System.Collections.Generic, System.IO,
                // System.Net, System.Text, System.Web.Script.Serialization

                // Build the form-encoded body from the grant type and client credentials
                StringBuilder data = new StringBuilder();
                data.Append("grant_type=" + Uri.EscapeDataString(grantType));
                data.Append("&client_id=" + Uri.EscapeDataString(clientString));
                data.Append("&client_secret=" + Uri.EscapeDataString(clientSecret));

                // Encode the body as a UTF-8 byte array
                byte[] byteArray = Encoding.UTF8.GetBytes(data.ToString());

                // Set up the POST request to match the headers in Catchpoint's example
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                request.Method = "POST";
                request.Accept = "application/json";
                request.ContentType = "application/x-www-form-urlencoded";
                request.ContentLength = byteArray.Length;

                // Write the body to the request stream
                using (Stream postStream = request.GetRequestStream())
                {
                    postStream.Write(byteArray, 0, byteArray.Length);
                }

                // Send the request and get the response
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();

                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    // Read the full JSON response
                    string json = reader.ReadToEnd();
                    Console.WriteLine(json);

                    // Deserialize the JSON and pull out the access token
                    JavaScriptSerializer ser = new JavaScriptSerializer();
                    Dictionary<string, object> tokenResponse = (Dictionary<string, object>)ser.DeserializeObject(json);
                    string accessToken = tokenResponse["access_token"].ToString();
                    //Console.WriteLine(accessToken);
                }
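
With the token in hand, the Data Pull calls themselves just need it in an Authorization header. Here is a minimal sketch of how I'd attach it; the endpoint path is a placeholder (the real ones come from Catchpoint's Data Pull documentation), and I'm assuming the standard OAuth2 Bearer scheme, so check their docs for the exact header format they expect.

                // Minimal sketch: call a Data Pull endpoint with the token obtained above.
                // Assumes 'accessToken' from the token response is still in scope.
                // The URL below is a placeholder; substitute a real endpoint from
                // Catchpoint's Data Pull API documentation.
                string apiUrl = "https://io.catchpoint.com/ui/api/<version>/<endpoint>";

                HttpWebRequest apiRequest = (HttpWebRequest)WebRequest.Create(apiUrl);
                apiRequest.Method = "GET";
                apiRequest.Accept = "application/json";
                apiRequest.Headers.Add("Authorization", "Bearer " + accessToken);

                using (HttpWebResponse apiResponse = (HttpWebResponse)apiRequest.GetResponse())
                using (StreamReader apiReader = new StreamReader(apiResponse.GetResponseStream()))
                {
                    // Dump the JSON payload so it can be parsed downstream
                    string result = apiReader.ReadToEnd();
                    Console.WriteLine(result);
                }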

Thursday, May 15, 2014

Hadoop: Questions I Am Asking

I have close to 14 years’ experience with SQL Server for ETL around Data Warehousing. I lead a team of very talented Data Warehouse Developers who have developed and maintain a multi-terabyte data warehouse. We ETL and dimensionally model data daily describing tens of thousands of orders, millions of dollars of sales, millions of website visitor metrics, and tens of millions of web page views. We do this each night, and more, and have it all ready for the C-suite execs to drink in with their morning coffee! I’m not saying this to brag (well, maybe a little), but because despite that experience, Hadoop puts me in an alien world where the normal tools of my trade don’t seem to make sense.

At this point in time, the questions I am asking myself are:

  • How much of my Data Warehouse environment and processes will eventually be replaced by Hadoop-related technologies and processes?
  • Which ETL processes are best done in Hadoop, and which in SQL/SSIS?
  • How much of my storage (archive, raw staged data, operational stores, and modeled data) will transfer to Hadoop?
  • How big of a Hadoop environment do I need to surpass the power of my current SQL environment?
  • Does Hadoop mean adapting new technology to the existing BI strategy, or do we need a new BI strategy?

I am tenacious, so it’s not a matter of “if” but “when” I’ll know which of my old tools will still work and how to use new tools and new strategies to conquer the next generation of data challenges.

Monday, April 7, 2014

Two Big Data Observations

I may be suffering from selection bias, but two articles came across my email recently and I thought they were relevant to my own exploration of Big Data. I came away from these articles with two main conclusions:

  1. A good Business Intelligence strategy could put you in an excellent position. You must have the right technology, people, and purpose to find value in your company's Big Data exploration, and a team's Data Warehouse experience will enhance its ability to leverage new data technologies.
  2. The human element involved in business purpose, interpretation, and molding good data tends to be sacrificed in the Big Data marketing hoopla. The old adage I learned in my first programming class, “garbage in, garbage out,” still applies. Big Data is not a magic solution that just solves problems on its own.

Here are the articles and my own observations:

What Makes Big Data Projects Succeed 

  1. Technology
    “I found that companies can program big data applications with existing languages like SQL. I also learned that companies with existing data warehouse environments tend to create value faster with big data projects than those without them.”
  2. People
    “The large companies I interviewed about big data projects said they were not hiring Ph.D. level data scientists on a large scale. Instead they were forming teams of people with quantitative, computational, or business expertise backgrounds.”
  3. Good Change Management
  4. A clear business objective
    “it will be an unproductive fishing expedition unless a company has a business problem in mind.”
  5. Good project management


Google Flu Trends’ Failure Shows Good Data > Big Data
“The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.”

“more data in itself does not lead to better analysis, as amply demonstrated with Flu Trends. Large datasets don’t guarantee valid datasets. That’s a bad assumption, but one that’s used all the time to justify the use of and results from big data projects.”

"Progress will come when the companies involved in generating and crunching OCCAM datasets restrain themselves from overstating their capabilities without properly measuring their results."