Understanding Analytics Types and Needs
By Neil Raden, January, 2013
Purpose and Intent
“Analytics” is a critical component of enterprise architecture capabilities, though most organizations have only recently begun to develop experience using quantitative methods. As Information Technology emerges from a scarcity-based mentality of constrained and costly resources to a commodity consumption model of data, processors and tools, analytics is quickly becoming table stakes for competition.
This report is the first of a two-part series. (Part II will cover analytic functionality and matching the right technology to the proper analytic tools and best practices.) It discusses the importance of understanding the role of analytics, why it is a difficult topic for many, and what actions you should take. It will explore the various meanings of analytics and provide a framework for aligning the various types of analytics with the roles and skill sets they require.
Using quantitative methods is rapidly becoming not an option for competitive advantage but, at the very least, barely enough to keep up. Everyone needs to understand what’s involved in analytics, what your particular organization needs and how to do it.
Few people are comfortable with the concepts of advanced analytic methods. In fact, most people cannot explain the difference between a mean, a median and a sample mean. The misapplication of statistics is widespread, but today’s explosion of data sources, and the intriguing technologies that deal with them, have changed the calculus. Embedded quantitative methods may relieve analysts of the actual construction of predictive models, but applying those models correctly requires understanding the different analytical types, roles and skills.
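To make the distinction between a mean, a median and a sample mean concrete, here is a minimal sketch in Python. The salary figures are invented purely for illustration and are not drawn from this report.

```python
import random
import statistics

# Hypothetical population of nine annual salaries (in thousands), with one outlier.
population = [30, 32, 35, 38, 40, 45, 50, 60, 250]

population_mean = statistics.mean(population)      # sum of all values / count
population_median = statistics.median(population)  # middle value once sorted

# A sample mean is the mean of a randomly drawn subset. It only estimates the
# population mean and varies from one sample to the next.
rng = random.Random(7)  # fixed seed so the run is repeatable
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print(f"mean of all values: {population_mean:.1f}")
print(f"median:             {population_median}")
print(f"sample {sample} has mean {sample_mean:.1f}")
```

In this made-up data the single outlier pulls the mean (about 64.4) well above the median (40), which is why the two should not be used interchangeably. The sample mean depends on which values happen to be drawn.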
Analytics in the Enterprise
The emphasis of analytics is changing from one of long-range planning based on historical data, to dynamic and adaptive response based on timely information from multiple contexts, augmented and interpreted through various degrees of quantitative analysis. Analytics now permeates every aspect of leading organizations’ operations. Competitive, technological and economic factors combine to require more precision and less lag time in discovery and decision-making.
For example, operational processing (the orchestration of business processes and the secure capture of transactional data) is merging with analytical processing (the gathering and processing of data for reporting and analysis). Analytics in commercial organizations has historically been limited to special groups working more or less off-line. Platforms for transaction processing were separated for performance and security reasons, an effect of “managing from scarcity.” But scarcity is not the issue anymore as the relative cost of computing has plummeted. Driven equally by technology and competition, operational systems are either absorbing or at least cooperating with analytical processes. This convergence elevates the visibility of all forms of analytics.
Confusion and mistakes in deploying analytics are common due to imprecise understanding of the various forms and types. Uncertainty about the staff and skills needed for various “types” of analytics is also common. Messaging from technology vendors, service providers and analysts is murky and misleading, sometimes deliberately so.
The urgency behind implementing an analytics program, however, can be driven not by getting a leg up, but rather by not falling behind.
Analytics and the Red Queen Effect
Analytics are crucial because the barriers to getting started are lower than ever. Everyone can engage in analytics now, of one type or another. As analytic capabilities increase across competitors, everyone must step up – it’s a Red Queen[i] effect. When everyone was shooting from the hip, efficiency was a matter of degree. If everyone used crude models and unreliable data, then everyone should, more or less, work within the same margin of error. What separated competitors was good strategy and good execution. But now that everyone can employ quantitative methods and techniques like Naive Bayes, C4.5 and support vector machines, it will still be the strategy and execution that count. Companies must improve just to stay in place. Each new level of analytics becomes the “table stakes” for the next.
Can You Compete on Analytics? Analytics Are Necessary – but Not Sufficient
Statistical methods using software have been shown to be useful in many aspects of an organization, such as fraud detection, demand forecasting and inventory management, but simply using analytics has not been shown to improve the fortunes or effectiveness of the overall organization. In 2007, Davenport and Harris released their influential book[ii], Competing on Analytics, which described how a dozen or so companies used “analytics” not only to advise decision-makers, but to play a major role in the development of strategy and implementation of business initiatives. The book found a huge following and was a bestseller on the business book lists. It certainly placed the word “analytics” at the top of mind for many decision makers. However, when comparing the fortunes of the twelve companies highlighted in the book, their performance in the stock market is less than spectacular, as illustrated in Figure 2:
This scenario is often repeated – good work is performed inside an organization, but the benefits of the discipline do not permeate other parts of the business and, hence, have little effect on the organization as a whole. In another example, statistical methods have been used in the U.S. in agriculture for decades, and yields have improved dramatically, but the quality of the food supply has clearly degraded along with the fortunes of individual farmers.
Too many organizations, despite good intentions, do not see dramatic improvement in their fortunes after adopting wider-based analytical methods because:
First, rarely does one thing change a company. Analytics are a powerful tool, but it takes execution to realize the benefits. Perhaps if good analytical technique had been applied across the board along with a clear strategy to drive decisions based on quantitative models, better results might have followed. Instead, as is often the case, a visible project shows great promise and early results, but the follow-through is wanting.
Data mining tools can actually be predictive, showing what is likely to happen or not happen. But what is often misunderstood is that data mining tools are usually poor at specifying when things will happen. In this case, too much faith is placed in the models, imbuing them with fortune-telling capabilities they simply lack. The correct approach is to test, run proofs of concept, and once in production engage in continuous improvement through mechanisms like champion/challenger and A/B testing.
Most of the companies try to understand customer behavior – which you can do with data mining – but it rarely captures the randomness of people’s behavior, leading to overconfidence in the models. Given that a customer is likely to purchase a car, when is the correct time to reach out? Perhaps right away, perhaps not. Data mining tools are not very good at individual propensities derived from behavior due to the randomness of human behavior. It is pretty common for inexperienced modelers to put too much faith in model results. The solution is to engage experienced talent to get a program started on the right track.
Return on investment in analytics is difficult to measure because there isn’t often a straight line from the model to results. Other parts of the organization contribute. An analytical process can inform decisions, either human or machine-driven, but the execution of those decisions is beyond the reach of an analytical system. People and process have to perform too. In addition, a successful analytical program can be the result of a well-defined strategy. Positive results from analytics would not have been possible without the formation of that strategy.
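The champion/challenger and A/B testing mentioned above can be sketched as a simple comparison of two conversion rates. The following is only an illustration, assuming a two-proportion z-test (a common textbook choice, not a method this report prescribes). The function name and the campaign counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and trials for the champion (current model).
    conv_b / n_b: conversions and trials for the challenger (candidate model).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both models perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (every trial converted, or none did).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical campaign: champion converts 200 of 10,000, challenger 260 of 10,000.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the challenger's lift is unlikely to be chance alone, but as the text cautions, the result still needs continuous re-testing once the model is in production.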
Professionals skilled in statistics, data mining, predictive modeling and optimization have been a part of many organizations for some time, but their contribution, and even an awareness of what they do, is sometimes poorly understood, and their work faces many impediments to success. By categorizing analytics by the quantitative techniques used and the level of skill of the practitioners who use these techniques (the business applications that they support are detailed in Part II of the series), companies can begin to understand when and how to use analytics effectively and deploy their analytic resources to achieve better results.
The Four Types of Analytics
There are related and unrelated disciplines that are all combined under the term analytics. There is advanced analytics, descriptive analytics, predictive analytics and business analytics, all defined in a pretty murky way. It cries out for some precision. What follows is a way to characterize the many types of analytics by the quantitative techniques used and the level of skill of the practitioners who use these techniques.
Figure 4: The Four Types of Analytics
| Type I | Quantitative Research | PhD or equivalent | Creation of theory, development of algorithms. Academic/research. Often employed in business or government for very specialized roles |
| Type II | Data Scientist or Quantitative | Advanced Math/Stat, not necessarily PhD | Internal expert in statistical and mathematical modeling and development, with solid business domain knowledge |
| Type III | Operational Analytics | Good business domain, background in statistics optional | Running and managing analytical models. Strong skills in and/or project management of analytical systems implementation |
| Type IV | Business Intelligence/Discovery | Data and numbers oriented, but no special advanced statistical skills | Reporting, dashboard, OLAP and visualization use, possibly design. Performing posterior analysis of results driven by quantitative methods |
Type I Analytics: Quantitative Research
The creation of theory and development of algorithms for all forms of quantitative analysis deserves the title Type I. Quantitative Research analytics are performed by mathematicians, statisticians and other pure quantitative scientists. They discover new ideas and concepts in mathematical terms and develop new algorithms with names like Hidden Markov Support Vector Machines, Linear Dynamical Systems, Spectral Clustering, Machine Learning and a host of other exotic models. The discovery and enhancement of computer-based algorithms for these concepts is mostly the realm of academia and other research institutions (though not exclusively). Commercial, governmental and other organizations (Google or Wall Street for example) employ staff with these very advanced skills; but in general, most organizations are able to conduct their necessary analytics without them, or by employing the results of their research. An obvious example is the FICO score, developed by Quantitative Research experts at FICO (formerly Fair Isaac) but employed widely in credit-granting institutions and even human resource organizations.
Type II Analytics: “Data Scientists”
More practical than theoretical, Type II is the incorporation of advanced analytical approaches derived from Type I activities. This includes commercial software companies, vertical software implementations, and even the heavy “quants” in industry who apply these methods specifically to the work they do, such as fraud detection, failure analysis and propensity-to-consume models, among hundreds of other examples. They operate in much the same way as commercial software companies but for just one customer (though they often start their own software companies too). The popular term for this role is “data scientist.”
“Heavy” Data Scientists. The Type II category could actually be broken down into two subtypes, Type II-A and Type II-B. While both perform roughly the same function – providing guidance and expertise in the application of quantitative analysis – they are differentiated by the sophistication of the techniques applied. II-A practitioners understand the mathematics behind the analytics and may apply very complex tools such as Lucene wrapper, loopy logic, path analysis, root cause analysis, synthetic time series or Naïve Bayes derivatives that are understood by a small number of practitioners. What differentiates the Type II-A from Type I is not necessarily the depth of knowledge they have about the formal methods of analytics (it is not uncommon for Type IIs to have a PhD, for example); it is that they also possess the business domain knowledge they apply, and their goal is to develop specific models for the enterprise, not for the general case as Type Is usually do.
“Light” Data Scientists. Type II-Bs, on the other hand, may work with more common and well-understood techniques such as logistic regression, ANOVA, CHAID and various forms of linear regression. They approach the problems they deal with using more conventional best practices and/or packaged analytical solutions from third parties.
Data Scientist Confusion. “Data Scientist” is a relatively new title for quantitatively adept people with accompanying business skills. The ability to formulate and apply tools to classification, prediction and even optimization, coupled with fairly deep understanding of the business itself, is clearly in the realm of Type II efforts. However, it seems pretty likely that most so-called data scientists will lean more towards the quantitative and data-oriented subjects than business planning and strategy. The reason for this is that the term data scientist emerged from those businesses like Google or Facebook where the data is the business; so understanding the data is equivalent to understanding the business. This is clearly not the case for most organizations. We see very few Type II data scientists with the same in-depth knowledge of the whole business as, say, actuaries in the insurance business, whose extensive training should be a model for the newly designated data scientists (see our blog at: “What is a Data Scientist and What Isn’t”).
Though the point is not universally accepted, data scientists must be able to effectively communicate their work to non-technical people. This is a major discriminator between a data scientist and a statistician. It is absolutely essential that someone in the analytics process have the role of chief communicator, someone who is comfortable working with quants, analysts and programmers, deconstructing their methodologies and processes, distilling them, and then rendering them in language that other stakeholders understand. Companies often fail to see that there is almost never anything to be gained by trying to put a PhD statistician into the role of managing a group of analysts and developers. It is safe to say that this role is represented more by a collaborative group of professionals than by a single individual.
Type III Analytics: Operational Analytics
Historically, this is the part of analytics we’re most familiar with. For example, a data scientist may develop a scoring model for his/her company. In Type III activity, parameters are chosen by the operational analytics expert and are input into the model, generating the scores calculated by the Type II models, which are embedded into an operational system that, say, generates offers for credit cards. Models developed by data scientists can be applied and embedded in an almost infinite number of ways today. The application of Type II models to real work is the realm of operational analysts. In very complex applications, real-time data can be streamed into applications based on Type II models with outcomes instantaneously derived through decision-making tools such as rules engines.
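The division of labor described above, where a Type II model produces a score and a Type III analyst chooses the operating parameters, might look something like the following sketch. Everything in it is hypothetical: the coefficients, field names, threshold and offer labels are invented for illustration, and a logistic score is assumed only because it is a familiar form.

```python
import math

# Hypothetical coefficients, standing in for a model fitted by a Type II data scientist.
COEFFICIENTS = {"intercept": -3.0, "income_k": 0.02, "late_payments": -0.8}

def score(applicant):
    """Return a logistic score between 0 and 1 for one applicant record."""
    z = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["income_k"] * applicant["income_k"]
        + COEFFICIENTS["late_payments"] * applicant["late_payments"]
    )
    return 1.0 / (1.0 + math.exp(-z))

def decide_offer(applicant, approve_threshold=0.5):
    """Type III step: the operational analyst picks the threshold, and a
    simple rule turns the model's score into an action."""
    if score(applicant) >= approve_threshold:
        return "send_offer"
    return "no_offer"

# Hypothetical applicants flowing through the operational system.
strong = {"income_k": 200, "late_payments": 0}
weak = {"income_k": 50, "late_payments": 2}
print(decide_offer(strong), decide_offer(weak))
print(decide_offer(strong, approve_threshold=0.9))  # a stricter policy
```

The operator never touches the coefficients. Changing `approve_threshold` changes business behavior without changing the underlying model, which is the sense in which the statistical method sits inside a "black box."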
Packaged applications that embed quantitative methods such as predictive modeling or optimizations are also Type III in that the intricacies and the operation of the statistical or stochastic method are mostly hidden in a sort of “black box.” As analytics using advanced quantitative methods becomes more acceptable to management over time, these packages become more popular.
Decision-making systems that rely on quantitative methods not well understood by their operators can lead to trouble. They must be carefully designed (and improved) to avoid overly burdening the recipients with useless or irrelevant information. This was a lesson learned in the early days of data mining: generating “interesting” results without understanding what was relevant usually led to flagging interest in the technology. In today’s business environment, time is perhaps the scarcest commodity of all. Whether a decision-making system notifies people or machines, it must confine those messages to those that are the most relevant and useful.
Irrelevant alerts of this kind are false positives. False negatives are quite a bit more problematic, as they can allow transactions to pass through that should have been stopped. Large banks have gone under by not catching trades that cost billions of dollars. Think of false negatives as being asleep at the wheel.
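As a rough illustration of the false negatives discussed above, and of their counterpart, false positives (alerts raised on items that were actually fine), the sketch below tallies both for a batch of transactions. The function name and the sample data are hypothetical.

```python
def confusion_counts(actual_bad, flagged):
    """Count outcomes of a screening system.

    actual_bad: True where the transaction really should have been stopped.
    flagged:    True where the system raised an alert.
    """
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for is_bad, was_flagged in zip(actual_bad, flagged):
        if is_bad and was_flagged:
            counts["true_pos"] += 1    # caught a bad transaction
        elif not is_bad and was_flagged:
            counts["false_pos"] += 1   # needless alert: wastes the recipient's time
        elif is_bad and not was_flagged:
            counts["false_neg"] += 1   # bad transaction slipped through
        else:
            counts["true_neg"] += 1    # good transaction left alone
    return counts

# Hypothetical batch of six transactions.
actual_bad = [True, False, False, True, False, True]
flagged = [True, True, False, False, False, True]
print(confusion_counts(actual_bad, flagged))
```

Tracking the two error types separately matters because, as the text notes, their costs are very different: a false positive costs attention, while a false negative can cost the business.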
Type IV Analytics: Business Intelligence & Discovery
Type III analytics aren’t of much value if their application in real business situations cannot be evaluated for their effectiveness. This is the analytical work we are most familiar with via reports, OLAP, dashboards and visualizations. This includes almost any activity that reviews information to understand what happened or how something performed, or to scan and free associate what patterns appear from analysis. The mathematics involved is simple. But pulling the right information – and understanding what information means – is still an art and requires both business sense and knowledge about sources and uses of the data.
Know Your Needs First
The scope of analytics is vast, ranging from the familiar features of business intelligence to the arcane and mysterious world of applied mathematics. Organizations need to be clear on their objectives and capabilities before funding and staffing an analytic program. Predictive modeling to dramatically improve your results makes for good reading, but the reality is quite different. The four types are meant to help you understand where you can begin or advance.
These categories are not hard and fast. Some activities are clearly a blend of various types. But the point is to add some clarity to the term “analytics” in order to understand its various use cases. Tom Davenport, for example, advocated creating a cadre of “PhDs with personality” in order to become an analytically competitive organization. That is one approach. Implementing analytics as part of other enterprise software you already have – or purchasing a specialized application that is already used and vetted in your industry – is a better place to start.
Use of some clear terminology can avoid confusion, not just within your organization, but also in communication with vendors and service providers. To get the most out of analytics:
- Be clear about what you need. Having clarity on the meaning of analytics has clear benefits. Because the nature of analytics is a little mysterious to most people, a vendor statement that they provide “embedded predictive analytics” can no longer be taken at face value. You should look closely to see if those capabilities line up with your needs.
- Don’t assume high value means high resource costs. In the same vein, you needn’t hesitate to begin analytical projects because you believe you need to source a dozen PhDs, when in fact, your needs are in the Type II category.
- Formulate specific vendor questions based on what level of sophistication and resources you need. By more clearly specifying what type of analytics you need, it becomes very easy to ask: Is this tool designed to discover and create predictive models, or to deploy them from other sources? Do you offer training in quantitative methods or only in the use of your product? Is the tool designed for authoring scoring models or just using scored values?
- Use analytic knowledge to start to prepare for Big Data. Understanding what type of analytics – and results – you need will even help you in your soon-to-be-serious consideration of Big Data solutions, including Hadoop, its variants and its competitors, all of which use variants of the above techniques to process large quantities of information.
Analytics is a catchall phrase, but understanding the various uses and types should help in implementing the right approach for accomplishing the tasks at hand. It should also help in discerning what is meant when the term is used, as almost anything can be called analytics.
Part II of this series will examine in depth the forms that analytics take in the organization and the business purposes it serves, and demonstrate through examples and case studies how analytics of all types are successfully employed. But analytics are a step in the process. Without effective decision-making practices the value in analytics is lost. Part III of this series will deal with decision making and decision management.
Author Bio: Neil Raden
Analyst, Consultant and Author in Analytics and Decision Science
Neil Raden (firstname.lastname@example.org) is the founder and Principal Analyst at Hired Brains Research, a provider of consulting and implementation services in business intelligence, analytics and decision management. Hired Brains focuses on the needs of organizations and capabilities of technology. He began his career as a property and casualty actuary with AIG in New York before moving into predictive analytics services, software engineering, and systems integration, with experience in delivering environments for decision making in fields as diverse as health care, nuclear waste management and cosmetics marketing, and many others in between.
[i] The Red Queen is a concept from evolutionary biology first used in Matt Ridley, The Red Queen: Sex and the Evolution of Human Nature, (New York: Macmillan Publishing Co, 1994). The allusion is to the Red Queen in Lewis Carroll’s Through the Looking-Glass, who had to keep running just to stay in place.
[ii] Davenport, Harris, et al, “Competing on Analytics: The New Science of Winning,” New York, Harvard Business Press, 2007.