Pervasive Analytics: Needs Organizational Change, Better Software and Training
By Neil Raden email@example.com
Principal Analyst, Hired Brains Research, LLC
The hunt for data scientists has reached its logical conclusion: there are not enough qualified ones to go around. The pull for analytics, driven by a number of factors including big data and the march of Moore’s Law, is irresistible. As a result, industry analysts, software providers and other influencers are turning to the idea of the “democratization of analytics” as a solution. At Hired Brains, we believe this is not only a good idea (and have been writing and speaking about it for four years), but that it is inevitable. Unfortunately, simply turning business analysts loose on quantitative methods is an unworkable solution. As the title says, three things that are not currently in place need to be: organizational change, better software and training/mentoring for sustained periods.
From the middle of the twentieth century until nearly its end, computers in business were mostly consumed with the process of capturing operational transactions for audit and regulatory purposes. Reporting for decision-making was repetitive and not interactive. Some interactivity with the computer began to emerge in the eighties, but it was applied mostly to data input forms. By the end of the century, mostly as a result of the push from personal computers, tools for interacting with data, such as Decision Support Systems, reporting tools and Business Intelligence, allowed business analysts to finally use computing power for analytical, as opposed to operational, purposes.
Nevertheless, these tools were under constant stress because of the cost and scarcity of computing power. The repository of the data, mostly data warehouses, dwarfed the size of the operational systems that fed them. As BI software providers pressed for “pervasive BI,” so that a much broader group of people in the organization would actively use the tools (and the vendors would sell more licenses of course), the movement met resistance from three areas: 1) physical resources (CPU, RAM, Disk), 2) IT concerns that a much broader user community would wreak havoc with the established security and control and 3) people themselves who, beyond the existing users, showed little interest in “self-service” so long as there were others willing to do it for them.
In 2007, Tom Davenport published his landmark book, “Competing on Analytics,” and suddenly, every CEO wanted to find out how to compete on analytics. Beyond the more or less thin advice about why this was a good idea, the book was actually anemic when it came to providing any kind of specific, prescriptive advice on transforming an organization to an “analytically-driven” one.
Fast forward to 2015 and analytics has morphed from a meme to a mania. Pervasive BI is a relic not even discussed, but pervasive analytics or, more recently, the “democratization of analytics” is widely held to be the salvation of every organization. Granted, two of the three reasons pervasive BI failed to ignite are no longer an issue in this era of big data and Hadoop, but the third, the people, looms even larger: 1) people still are not motivated to do work that was previously done by others and 2) an even greater problem, the academic prerequisites to do the work are absent in the vast majority of workers. Pulling a Naïve Bayes or C4.5 icon over some data and getting a really pretty diagram or chart is dangerous. Software providers are making it terrifyingly easy for people to DO advanced quantitative analysis without knowing what they are doing.
Pervasive analytics? It can happen. It will happen; it’s inevitable and even a good idea, but most of the messaging about it has been perilously thin, from Gartner’s “Citizen Data Scientists” to Davenport’s “Light Quants” (who would ever want to be a “light” anything?). What is lacking is some formality about what kind of training organizations need to commit to, what analytical software vendors need to do to provide extraordinarily better software for neophytes to use productively and how organizations need to restructure for all of this to be worthwhile and effective.
How to Move to Pervasive Analytics
“Pervasive analytics,” or “the democratization of analytics,” requires much more than just technology to succeed. The most prominent obstacle is a lack of training and skills on the part of the wide audience that is expected to be “pervaded,” if you will. The shortage of “data scientists” is well documented, and it is the motivation for pushing advanced analytics down in the organization to business analysts. The availability of new forms of data provides an opportunity to gain a better understanding of your customers and business environment (among a multitude of other opportunities), which implies a need to analyze data at a level of complexity beyond current skills, and beyond the capabilities of your current BI tools.
Much work is needed to develop realistic game plans for this. In particular, our research at Hired Brains shows that there are three critical areas that need to be addressed:
- Skills and training: A three-day course is not sufficient and organizations need to make a long-term commitment to the guiding of analysts
- Organizing for pervasive analytics: Existing IT relationships with business analysts need reconstruction and senior analysts and data scientists need to supervise the roles of governance, mentoring and vetting
- Vastly upgraded software from the analytics vendors: In reaction to this rapidly unfolding situation, software vendors are beginning to provide packaged predictive capabilities. This raises a whole host of concerns about casually dragging statistical and predictive icons onto a palette and almost randomly generating plausible output that is completely wrong.
Skills and Training
Of course it’s unrealistic to think that existing analysts who can build reports and dashboards will learn to integrate moment generating functions and understand the underlying math behind probability distributions and quantitative algorithms. However, with a little help (a lot actually) from software providers, a good man-machine mix is possible where analysts can explore data and use quantitative techniques while being guided, warned and corrected.
A more long-term problem is training people to be able to build models and make decisions based on probability, not a “single version of the truth.” This process will take longer and require more assistance from those with the training and experience to recognize what makes sense and what doesn’t. Here is an example:
The chart shows a correlation between a stock market index and the number of times Jennifer Lawrence was mentioned in the media. Although it is not shown, the correlation coefficient is a robust 0.80, which means the variables are tightly correlated. Be honest with yourself and think about what could explain this. After you’ve thought about a few confounding variables, did you consider that they are both slightly increasing time series, which is actually the basis of the correlation, not the phenomena themselves? Remove the time element and the correlation drops to almost zero.
The point here is that analysts don’t need to understand the algorithms that create this spurious correlation; they just need enough experience to know that the effect of the time trend has to be filtered out. But how would they know that?
The fact is that statistical errors are far more insidious than spreadsheet or BI errors when underlying concepts are hidden. Turning business analysts into analytical analysts is possible, but not automatic.
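To make the Jennifer Lawrence example concrete, here is a minimal sketch in Python using made-up data, not the series behind the chart. The two series, their drift rates and noise levels, and the helper names pearson and detrend are all assumptions chosen for illustration. Each series is generated independently and shares nothing with the other except a gentle upward drift over time. The sketch compares the correlation of the raw values with the correlation after a fitted straight-line trend is subtracted from each.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def detrend(xs):
    """Subtract a least-squares straight line fitted against the time index."""
    n = len(xs)
    ts = range(n)
    mt, mx = (n - 1) / 2, sum(xs) / n
    slope = (sum((t - mt) * (x - mx) for t, x in zip(ts, xs))
             / sum((t - mt) ** 2 for t in ts))
    return [x - (mx + slope * (t - mt)) for t, x in zip(ts, xs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500

# Two invented series (hypothetical values) with independent noise;
# the only thing they have in common is that both drift upward.
index_level = [100 + 0.5 * t + rng.gauss(0, 2.0) for t in range(n)]
media_mentions = [20 + 0.1 * t + rng.gauss(0, 0.5) for t in range(n)]

raw_r = pearson(index_level, media_mentions)
detrended_r = pearson(detrend(index_level), detrend(media_mentions))

print(f"correlation of raw series:    {raw_r:.2f}")
print(f"correlation after detrending: {detrended_r:.2f}")
```

The raw series should correlate almost perfectly while the detrended ones should not, which is the kind of check an experienced mentor, or better software, would prompt an analyst to make. Subtracting a linear trend is only one way to remove the time element; differencing each series is another common choice.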
Consider how actuaries learn their craft. Organizations hire people with an aptitude for math, demonstrated by doing well in things like Calculus and Linear Algebra, but not requiring a PhD. As they join an insurance or reinsurance or consulting organization, they are given study time at work to prepare for the exams, a process that takes years, and have ample access to mentors to help them along because the firm has a vested interest in them succeeding. Being an analyst in a firm is a less extensive learning process, but the model still makes sense.
Organizational: How organizations should deal with DIY analytics
We’re just beginning our research in this area, but one thing is certain: the BI user pyramid has got to go. In many BI implementations, the work of creating datasets fell onto the shoulders of BI Competency Centers, while a handful of “power users” worked with the most useful features of the toolsets. The remainder of the users, dependent on the two tiers above them, generated simple reports or dashboards for themselves or departments (an amusing anecdote from a client of ours was, “The most used feature of our BI tool was ‘Export to Excel.’”). Creating “Pervasive BI” would have entailed doing a dead lift of the “business users” into the “power user” class, but no feasible approach was ever put forward.
Pervasive analytics cannot depend on the efforts of a few “go-to guys.” It has to evolve into an analytically centered organization where a combination of training and better software can be effective. That involves a continuing commitment to longer-term training and learning; governance of models, so that models developed by professional business analysts can be monitored and vetted before finding their way into production; and a wholesale effort to change the analytics workflow: where do these analyses go beyond the analyst?
Expectations from Software Providers
Packaged analytical tools are sorely lacking in advice and error catching. It is very easy to take an icon and drop it on some data, but when something goes wrong the tools offer, at best, a cryptic error message or, at worst, a “help” system that displays 500 words from a statistics textbook to describe the workings of the tool. But this is 2015 and computers are a jillion times more powerful than they were a few years ago. It will take some hard work for the engineers, but there is no reason why a tool should not be able to respond to its use with:
- Those parameters are not likely to work in this model; why don’t you try these
- Hey, “Texas Sharpshooter,” you drew the boundaries around the data to fit the category model
- I see you’re using a p-value but haven’t verified that the distribution is normal. Shall I check for you?
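As a rough illustration of the third response, the sketch below shows how little code a basic guardrail needs. It is an assumption-laden example, not a description of any vendor's product: the function names are hypothetical, the Jarque-Bera test is just one of several normality checks a tool might choose, and its p-value is an asymptotic approximation that is unreliable for small samples.

```python
import math
import random


def jarque_bera(sample):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality, JB is approximately chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def normality_advice(sample, alpha=0.05):
    """Hypothetical tool response shown before a normal-theory test runs."""
    jb, p_value = jarque_bera(sample)
    if p_value < alpha:
        return (f"Warning: data look non-normal (JB={jb:.1f}); "
                "consider a rank-based test instead.")
    return f"No strong evidence against normality (JB={jb:.1f})."


rng = random.Random(1)  # fixed seed so the run is repeatable
bell_shaped = [rng.gauss(50, 10) for _ in range(1000)]   # invented data
skewed = [rng.expovariate(1.0) for _ in range(1000)]     # invented data

print(normality_advice(bell_shaped))
print(normality_advice(skewed))
```

A real product would need to go further, for example by recognizing which statistical test the analyst chose and which of its assumptions actually matter, but even a check this simple turns a silent error into a conversation.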
We will be continuing our research in the areas of skills/training, organization and software for Pervasive Analytics. Please feel free to comment at firstname.lastname@example.org