Metrics Can Lead in the Wrong Direction

Is it really possible to use measurement — or “metrics,” in the current parlance — to drive an organization? There are two points of view, one widely accepted and current, the other opposing and more abstract.

The conventional wisdom on performance management is that our technology is perfectly capable of providing detailed, current and relevant performance information to stakeholders in an enterprise, including executives, managers, functional people, customers, vendors and regulators. Because we are blessed with abundant computing resources, connectivity, bandwidth and even standards, it is possible to present this information in cognitively effective ways (dashboards and visualization, for example). Recipients are able to receive the information in the manner in which they choose, and the whole process pays dividends by supporting the notion that “If you can’t measure it, you can’t manage it.” It is hard to imagine how anyone could manage a large undertaking without measurement, isn’t it? And most presentations I’ve heard quickly stress that measurement is only part of the solution.

The first step is knowing what to measure; then measuring it accurately; then finding a way to disseminate the information for maximum impact (figuring out how to keep it current and relevant); and then being able to actually do something about the results. A different way of saying this is that technology is never a solution to social problems, and interactions between human beings are inherently social. This is why performance management is a very complex discipline, not just the implementation of dashboard or scorecard technology. Luckily, the business community seems to be plugged into this concept in a way they never were in the old context of business intelligence. In this new context, organizations understand that measurement tools only imply remediation and that business intelligence is most often applied merely to inform people, not to catalyze change. In practice, such undertakings almost always lack a change-management methodology or portfolio.

But there is an argument against measurement, too. Unlike machines or chemical reactions in a beaker, human beings are aware that they are being measured. In the realm of physics, Heisenberg’s Uncertainty Principle demonstrates that the act of measurement itself can very often distort the phenomena one is attempting to measure. When it comes to sub-atomic particles, we can pretty much assume it is a physical law that underlies this behavior. With people, the unseen subtext is clearly conscious. People find the most ingenious ways to distort measurement systems to generate the numbers that are desired. Thus, the effort to measure can not only discourage desired behavior; it can promote dysfunctional behavior. There are excellent, documented examples of this phenomenon in Measuring and Managing Performance in Organizations by Robert D. Austin. The author’s contention is that measurement of people always introduces distortion and often brings dysfunction because measurement is never more than a proxy or an approximation of the real phenomena.

In a particularly colorful analogy, Austin writes:

“Kaplan and Norton’s cockpit analogy would be accurate if it included a multitude of tiny gremlins controlling wing flaps, fuel flow, and so on of a plane being buffeted by winds and generally struggling against nature, but with the gremlins always controlling information flow back to the cockpit instruments, for fear that the pilot might find gremlin replacements. It would not be surprising if airplanes guided this way occasionally flew into mountains when they seemed to be progressing smoothly toward their destinations.”

We all know that incomplete proxies are all too easy to exploit, in the same way that inadequate software with programming gaps beckons unscrupulous hackers. However, one doesn’t have to be malicious to subvert a measurement system. After all, voluntary compliance with the tax code encourages a national obsession with “loopholes.” And what salesperson hasn’t “sandbagged” a few deals for the next quarter after meeting the quota for the current one?

The solution is not to discard measurement but rather to be conscious of this tendency and to be vigilant and thorough in the design of measurement systems. We all have a tendency toward simplifying things; but in some cases, it appears better to not measure at all than to produce something inadequate. Performance management, to achieve its goals, has to be applied effectively, which is to say, with superior execution of technology, implementation and management. It has to be designed to be responsive to both incremental and unpredicted changes in the organization and the environment. There are no road maps for this. This is truly the first time that analytical and measurement technology can be embedded in day-to-day, instantaneous decision-making and tracking; and the industry is sorely lacking in skills and experience to pull it off. Those organizations that have been successful so far have relied on existing methodologies (activity-based costing or balanced scorecard, for example) to guide them through the more uncertain steps of metric formulation and change management to close the loop.

The question of whether you can ever adequately measure an organization is still open. To the extent that there are statutory and regulatory requirements, such as taxation, SEC or specific industry regulations, the answer is clearly yes. But those measurements are dictated. To measure performance after the fact, at aggregated levels, is only useful to a point. The closer and closer a measurement system gets to the actual events and actions that drive the higher-level numbers, the less reliable the cause-effect relationship becomes, just like Heisenberg found so long ago. There are many examples in the management literature of everyone “doing the right thing” while the wheels are coming off the organization.

Recommended Reading:

Measuring and Managing Performance in Organizations, by Robert D. Austin (New York: Dorset House Publishing, 1996)
In the Age of the Smart Machine: The Future of Work and Power, by Shoshana Zuboff (New York: Basic Books, 1988)
“The New Productivity Challenge,” by Peter Drucker, Harvard Business Review (Nov.-Dec. 1991): p. 70.


A Bit About Storytelling

My take on storytelling

1. Must be a “story” with a beginning, middle and end that is relevant to the listeners.
2. Must be highly compressed
3. Must have a hero – the story must be about a person who accomplished something notable or noteworthy.
4. Must include a surprising element – the story should shock the listener out of their complacency. It should shake up their model of reality.
5. Must stimulate an “of course!” reaction – once the surprise is delivered, the listener should see the obvious path to the future.
6. Must embody the change process desired, be relatively recent and “pretty much” true.
7. Must have a happy ending.

In Stephen Denning’s words, “When a springboard story does its job, the listeners’ minds race ahead, to imagine the further implications of elaborating the same idea in different contexts, more intimately known to the listeners. In this way, through extrapolation from the narrative, the re-creation of the change idea can be successfully brought to birth, with the concept of it planted in listeners’ minds, not as a vague, abstract inert thing, but an idea that is pulsing, kicking, breathing, exciting – and alive.”

That may be a little too much excitement on a daily basis, something you save for the really important things, but turning data into a story is nonetheless a valid and necessary skill. But is it for everyone?

Not really. Actual storytelling is a craft. Not everyone knows how to do it or can even learn it. But everyone can tell a story. It just may not be of the caliber of storytelling. But to get a point across and have it stick (even if it’s just in your own mind, not to an audience), learn to apply metaphor.

More on metaphor later


When Are Decisions Driven by Analytics, or Merely Informed by Them?

In http://www.dataintegrationblog.com/data-quality/from-tactical-to-strategic-action-operational-decision-management-2/

Boy did Julie Hunt ever hit the nail on the head: 

“But – real-time decision-making also has to be vetted with domain knowledge, human experience and common sense, to validate the viability of analytics results. Decisions make a positive difference for the enterprise only if they are based on accurate intelligence. While many things are possible with predictive analytics, there is always the danger of trying to force ‘reality’ to fit the model. This can be deadly to real-time operational decision-making.”

When it comes to decisions that can be made via models, you have to separate them into two categories: those that do not require 100% precision, and those that are too important to get wrong. 

For example, routing a call center call, approving a credit line increase, rating a car insurance premium – these are all “decisions” that are made in high volume, but getting some of them wrong, in the aggregate, causes little harm. Obviously, the closer you get to perfect performance the better, but you can allow these decisions to be made without human interference. Obviously you track the result and continuously improve the models.

On the other hand, many decisions in an enterprise are too important to turn over to some algorithms. In these cases, the quantitative analysis can be a part of the decision process, but ultimately the decision vests with the person or persons who take responsibility for it. In point of fact, very few managers are comfortable with answers based on probability. The difference between 80% probability and 95% probability simply doesn’t resonate. For important decisions, managers want one answer, and that requires discussion and consensus. 
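The split between the two categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Decision` fields, the `HIGH_STAKES_COST` threshold and the `approve_at` cutoff are all hypothetical, invented here to show the routing idea, and do not come from any particular decision management product.

```python
from dataclasses import dataclass

# Hypothetical threshold: above this cost of being wrong, a person decides.
HIGH_STAKES_COST = 10_000.0


@dataclass
class Decision:
    name: str
    model_score: float    # model's estimated probability of a good outcome
    cost_if_wrong: float  # rough business cost of a bad call, in dollars


def route(decision: Decision, approve_at: float = 0.8) -> str:
    """Say who makes the call; the model acts alone only on low-stakes decisions."""
    if decision.cost_if_wrong >= HIGH_STAKES_COST:
        # The score becomes one input to the discussion; a person owns the outcome.
        return "human_review"
    return "auto_approve" if decision.model_score >= approve_at else "auto_decline"


credit_line = Decision("credit line increase", model_score=0.91, cost_if_wrong=500.0)
acquisition = Decision("acquire a supplier", model_score=0.95, cost_if_wrong=5_000_000.0)

print(route(credit_line))  # auto_approve
print(route(acquisition))  # human_review
```

Note that the high-stakes case goes to a person even at a 95% score, which is the point: for those decisions the probability informs the discussion but does not make the call.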

We have to be very careful not to over-promise on analytics. 


Personalized Medicine World Conference

This is the fourth or fifth year for this conference, and each year there are some surprises. The first couple of years it was a diverse collection of researchers, entrepreneurs and vendors (Oracle, Deloitte, etc.). The number of exhibitors seems about the same as last year, but there was a small booth for SAP HANA, which was sort of a surprise. I learned they are aggressively going after the life sciences sector, and Hasso Plattner was a keynote speaker. That’s a pretty good sign this conference is getting pretty commercial.

Like any conference, a few stars emerge and become familiar and repeat presenters. Atul Butte is one example. Here is his bio:

Atul Butte, MD, PhD is Chief of the Division of Systems Medicine and Associate Professor of Pediatrics, Medicine, and by courtesy, Computer Science, at Stanford University and Lucile Packard Children’s Hospital. Dr. Butte trained in Computer Science at Brown University, worked as a software engineer at Apple and Microsoft, received his MD at Brown University, trained in Pediatrics and Pediatric Endocrinology at Children’s Hospital Boston, then received his PhD in Health Sciences and Technology from Harvard Medical School and MIT. Dr. Butte has authored more than 100 publications and delivered more than 120 invited presentations in personalized and systems medicine, biomedical informatics, and molecular diabetes, including 20 at the National Institutes of Health or NIH-related meetings.

Dr. Butte’s presentation covered how bioinformatics tools applied to big public data have yielded new uses for drugs and new prototype drugs and diagnostics for type 2 diabetes. It was an interesting discussion of what we call big data analytics, but in the end, it just came back to making more drugs.

When you attend medical conferences, speakers always have these extensive pedigrees, but what I wonder is, with all of the esteem, what sort of doctors are they? Are they too distanced from day-to-day clinical work to see the problems and possibilities? Are their decisions made within a bubble that excludes consideration of alternatives? That is the sense I get listening to them. 

A common term used by many of the speakers was “omics.” First we had genomics, then epigenomics followed by proteomics or metabolomics. All of these areas combine both bench science and informatics on a huge scale. The hope is that the digital examination of these minute measurements can lead to cures for diabetes, cancer, heart disease and Alzheimer’s.

Michael Snyder, Ph.D., Professor & Chair, Stanford Center of Genomics & Personalized Medicine, gave a notable and introspective presentation about the use of a combination of omics methods to assess health states in a single individual (himself) over the course of almost three years. Genome sequencing was used to determine disease risk. Longitudinal personal profiling of transcriptome, proteome and metabolome was used to monitor disease, including viral infections and the onset of diabetes. His premise is that these approaches can transform personalized medicine. The sequencing showed that he carried genes for diabetes, and he in fact developed it during the period, but, in my opinion, he failed to see the causal effect of poor sleep from repeated respiratory infections that corresponded with the spike in blood sugar. In some ways, it seems these brilliant scientists just don’t see the forest for the trees, and that hurts us.

 

Steven C. Quay, M.D., Ph.D., FCAP, Founder, Atossa Genetics, Inc., pitched his own company, which is devoted to three things: obtaining routine, repeated, “painless” breast biopsy samples non-invasively for cytopathology, NGS, proteome, and transcriptome analysis of precursors to breast cancer; using breast specimens obtained non-invasively for biomarker discovery, clinical trial support, and patient selection, and to inform personalized medical therapy; and cancer prevention using intraductal treatment of reversible hyperplastic lesions.

Two problems with his presentation. NO ONE KNOWS HOW TO PREVENT BREAST CANCER. Also, the “painless” techniques are almost medieval. If you don’t believe me, look up “ductal lavage” and let me know if you’d want to submit to that repeatedly.

The problem is no one ever seems to use the word cure, or to speculate why these diseases exist at all. All of the research presented seems to end with the following refrain: “Hopefully leading to the development of new drugs…” Well, follow the money. 

I don’t know if I’ll go next year. 

 

 


Transcript of Focus Research Roundtable on Pervasive BI

 This is Copyright 2011 Focus Research

A transcript of a live, one-hour discussion from about 18 months ago. I don’t think big data was mentioned once, because it was irrelevant.

There is some really good discussion in here

 

 

Focus Research, Inc

 

Moderator: Jonathan Wu

August 9, 2011

9:30am PT

 

 

 

About the Roundtable:

Focus Expert Roundtables are 45-minute teleconferences where 3-5 members of the Focus Expert Network talk about hot topics in a particular category each week. On August 9, 2011, Focus Experts Herschel Chandler, Neil Raden, Lyndsay Wise and Jonathan Wu sat down to discuss self-service business intelligence.

Meeting the information needs of the business community is challenging for most IT support organizations because the requirements are often changing. As individuals learn the features and functionality of their BI application as well as the content and business rules of the information, they frequently ask for new data sets and capabilities. From the business needs and technical support perspectives, self-service BI is the ultimate goal. In this roundtable, we addressed the following items:

  • What is Self-Service BI?
  • What is the value of Self-Service BI?
  • What infrastructure is needed to support a Self-Service BI environment?
  • How can your organization evolve towards Self-Service BI?

 

Jonathan Wu         Welcome to the Focus Roundtable, Self-service Business Intelligence, what does it look like and how do you evolve towards self-service BI? Focus.com’s five thousand industry experts help millions of professionals make better business and technology decisions by answering questions, publishing research and speaking at events.

                                      

                                       Visit focus.com to learn more and become a member today. Please visit our events page at http://www.focus.com/events, to post and view comments and/or questions that you may have through this event.

                                      

                                       So, let’s get started. I’m Jonathan Wu, and I’ll be your moderator today. We have brought together some of the top Focus experts in this area to share their insight on self-service BI.

                                      

                                       First we have Herschel Chandler, managing principal of Visionary Solutions Today, LLC. Visionary Solutions Today is a management consulting firm that helps clients achieve their strategic IT investment goals by transforming and enhancing the client’s organizational mindset, IT infrastructure, and business processes.

                                      

                                       Next we have Neil Raden, analyst and consultant for Hired Brains. Hired Brains provides services in market research, product marketing, messaging, positioning, and product launch assistance to technology vendors in the business intelligence, analytics, information integration and semantic technology areas.

                                      

                                       We also have Lyndsay Wise, industry analyst and President of WiseAnalytics. WiseAnalytics helps small and mid-sized organizations navigate the business intelligence and data visualization markets through thought leadership, industry research, online events, and advisory services.

                                      

                                       Okay, so let’s talk about self-service BI by defining it and going from there. How about we start off with you, Herschel, what is your perspective of self-service BI?

                                      

Herschel Chandler         Well, thank you Jonathan. My perspective on self-service BI is really about when a knowledge worker or an information consumer has a need for your information, that information is available to them right then. It’s kind of this concept of having information at your fingertips when you need it. You don’t have to call IT, don’t have to go through a whole development process to get the information you need to run your business or execute your process.

                                      

Jonathan Wu         Okay. So, Neil or Lyndsay, any other perspectives that are different from what Herschel just said, in terms of defining, you know, Self-Service BI?

                                        

Neil Raden         Ladies first.

                                      

Lyndsay Wise         Oh, okay, thanks Neil. I was actually going to, I mean, I definitely agree, but I was also recently reading a report that Claudia Imhoff and Colin White wrote and they have their definition, or the main objective, of self-service BI which also, almost expands beyond the end user, but also towards making data warehousing solutions faster to deploy and easier to manage as well as easier to access data sources on the back end.

                                      

                                       So whether or not it’s actually possible to get there yet in terms of the back end aspects, it definitely does help broaden the range of what self-service should be.

                                      

Jonathan Wu         Oh. Okay. So that’s definitely a broader definition. When I think of self service BI I typically think of, also encompassing the back end, dealing with the access to the data and how it’s structured, is that correct?

                                      

Lyndsay Wise         Right, so that’s one of the things that I also found interesting and actually thought it was nice to include and create more of a holistic view, even though right now normally when we think of self-service in the industry we really do tend to think of making it easier for consumers to consume and interact with.

                                      

Jonathan Wu         Right. Neil, how about you?

                                      

Neil Raden         Well, I love that term easier or ease of use because, we’ve been talking about that for thirty years. And, I’ve actually done some research on ease of use, and it has nothing to do with GUIs, or dashboards, or anything like that. Ease of use means, is this relevant to the work I do, can I incorporate it into the work I do without having to bang my head against the wall, and, do I understand it?

                                      

                                       Because people don’t want data, they want answers, and for the last fifteen years or so, we’ve really focused on giving them data, and the problem with data is, you know, data’s just a footprint. It’s the remnant of something that happened, and there are a lot of models and processes behind it that aren’t explicit.

                                      

                                       And that’s why people have a hard time with BI.

                                      

Jonathan Wu         You know, I would agree. I think when, you know, you take a look collectively at how we’re defining self-service BI, you know, what we’re looking at is allowing individuals not having to necessarily go through to the IT function of their organization in order to get access to data, to information, to the answers that they’re looking for.

                                      

                                       So, using that as the foundation, I mean, clearly we need to talk about, what are some of the benefits and drawbacks of this type of environment?

                                      

Neil Raden         Yeah, I had a lot of specific recommendations I would make, because I think that mainstream BI has really missed the mark in a lot of ways. And ironically, fifteen years before the major BI products arrived, a lot of that had been done with previous kinds of software that got obliterated by spreadsheets.

                                      

Jonathan Wu         Can you expand on that?

                                      

Neil Raden         Well, back in the early eighties you had mainframe and host-based decision support systems that allowed people to fairly easily create non-procedural models, do sensitivity analysis and what-if analysis, and look at scenarios and so forth. But when you look at BI, it’s largely reading from a relational database and generating one kind of report or another.

                                      

Jonathan Wu         Right, so being able to ask a question. Yeah. And then from there, you have to go out and dig again.

                                      

Neil Raden         Well, sure, which is why, which is why spreadsheets are so popular, because they’re so much more expressive than our BI tools.

                                      

Jonathan Wu         Yeah. Well you know, they’re just easier to use.

                                      

Neil Raden         Yeah.

                                      

Jonathan Wu         For the most part, you know. I mean it’s very easy to create a function, or a calculation and be, expand beyond that because of the way it’s laid out.

                                      

Herschel Chandler         Yeah, I think spreadsheets actually tie, tie into this topic because it also, a user has all the information they need in their little spreadsheet. So that kind of goes back to, you know, whatever I need to do my spreadsheet I don’t have to talk to anyone else to get. I think that’s also a benefit and a challenge to having people run their BI off of spreadsheets.

                                      

Jonathan Wu         Yeah. Lyndsay?

                                      

Lyndsay Wise         Yeah, I was actually going to say, I totally agree with Neil, and I think that’s why spreadsheets, as an example, have remained so, kind of, important, and just an essential part of why people will use spreadsheets over BI tools. Because, as you mentioned, Neil, BI really, it almost, it tries to structure everything, but it really limits the way people interact with data.

                                      

                                       So even when you bring it back to self-service specifically, do people really understand what they’re looking at because all BI is really delivering are specific analytic or data set trends, but they’re not really getting into the depth and being able to be interacted with, in a way that end users want to be able to interact with them.

                                      

                                       So in terms of, you know, as long as you define your questions in advance and you know exactly what you want to get, then yes BI will deliver that information to you but, once, you know, there’s an outlier, or something additional that might not fit, or you come up with a new problem, or you need a new set of answers, well, is BI really dynamic enough the way it’s set up?

                                      

                                       Is it really available and able to provide that self-service model where an organization or an end user can actually go in and get the answers to what they need? And right now, in most cases, it’s not.

                                      

Jonathan Wu         Yeah. You know, I think we’re going to have to break this into two components. The first is, let’s talk about the front end aspect of it, and then we’ll talk about the back end, because I think we’re mixing both of them together which…

                                      

Lyndsay Wise         Right

                                      

Jonathan Wu         …is right on, and I firmly believe that, but if we were to just take a look at, say, you know, for Self-Service BI, the interface, the front end interface, you know, it’s supposedly, it’s easy to use. And that’s the draw for Self Service BI. Herschel, do you have any thoughts or comments regarding just the, the front end access?

                                      

                                       The interface, what makes, what’s the magic behind it?

                                      

Herschel Chandler         Well I think that the, I, it’s a little bit hard for me to separate the two to a degree because I think some of the best front end interfaces are the ones that enable users to go after additional data sets. In other words, going from a canned report, where I’ve got a certain set of data in it to actually go out and say, “well, let me look at this column or this metric, in relation to another data set”.

                                      

                                       And then, on the fly be able to pull in additional data points to see how that affects the overall answer to the question. So to me, the interface is part of enabling the user to go after and add in additional data quickly to help answer their questions. So, I see it somewhat tied together.

                                      

Jonathan Wu         Yeah, so, being able to add to it, so if an individual is taking a look at, say, their customer list, and then wanted to be able to add to that, customer list by sales, and then analyze customers by sales, and then they realize, you know what?

                                      

                                       Let’s take a look at what they purchased, so adding that in as another data set, and then maybe taking a look at it, by profitability, adding that in. I believe that’s what you’re referring to?

                                      

Herschel Chandler         Yeah, precisely, precisely.

                                      

Jonathan Wu         Yeah, well, what about these vendors that promote self-service BI for the organization, because of all the rich features and functionality that they’re providing? What are your thoughts behind that?

                                      

Neil Raden         Well again, John, I always come back to the model because the data means nothing, and they have to understand the model behind the data, or they have to be able to express their own model.

                                      

Jonathan Wu         Yep.

                                      

Neil Raden         And that’s why people shy away from BI, because the kind of information they’re looking for today, really doesn’t match what’s presented to them. I think what they need instead are the kind of widgets and templates and pieces that they can assemble to do something of interest, rather than just, you know, drill down, or select, or restrict.

                                      

Jonathan Wu         Yeah, so, I, Neil you’re saying that with respect to the features and functionalities that the software vendors are providing it really doesn’t matter in terms of ease of use. Because what I’m finding is, with all the features and functionality that are thrown out there, with a lot of these software products, it becomes very confusing for a lot of these individuals.

                                      

                                       And if anything, it becomes an intimidation factor because it’s very overwhelming.

                                      

Neil Raden         It’s confusing for me. I think I’m a pretty sophisticated user. And I look at some of these GUI screens sometimes, and I’m supposed to click a button, and I’m really not sure what’s going to happen next.

                                      

Jonathan Wu         Right.

                                      

Lyndsay Wise         Well, I think almost one of the problems is that, you know, BI vendors, a lot of them have taken what they’ve been developing that have been directed at super users for all of these years and trying to expand that into a way that, you know, it can be deployed by everybody or used by everybody, and the fact is it can’t.

                                      

                                       And, so a lot of these…even when vendors will show me their solutions and they’ll say, “Oh, you know, this is for everybody. It’s self service.” Yes, it’s self service if you understand data, if you understand and have, you know, experience with BI and understand even how, you know, relational databases work.

                                      

                                       But if you’re just going in and you need questions answered, then no, all it is is confusing.

                                      

Neil Raden         Good point.

                                      

Jonathan Wu         Yeah, yeah, that is a good point. But, you know, I also find that with a lot of the features and functionality of these front end tools, you know, it becomes very complex, almost to the point where you’ve got to go through a whole set of training in order to understand how to use these tools, even though these software vendors are saying it’s very easy to use.

                                      

                                       It’s not as intuitive, let’s just say, as spreadsheet software. Are you finding that, Herschel, with your experiences?

                                      

Herschel Chandler         Yeah, I would agree. I mean, all the bells and whistles that we’re seeing lately here in the last few years and the tools, you know, for a techy person or someone like us that’s been doing BI, they’re really great. I mean, you look at, you know, kind of the masses, the people that consume the data, they don’t use those things.

                                      

                                       They use spreadsheets. They relate to spreadsheets. And I think all the things they’re adding are great. They’re great for data analysts, but I don’t think that they’re being used as much as they maybe claim they’re used. I don’t think they have necessarily as much value to the general user base as they kind of make it out to be.

                                      

                                       I mean, they’re great for the people that are focused on the data areas, focused on their niche, but as far as the masses, it’s kind of what Lyndsay was saying. You know, they don’t use it. They want their report, they want their data, they want their question answered. They don’t care that much about all the bells and whistles.

                                      

Jonathan Wu         Yeah.

                                      

Neil Raden         Yeah.

                                      

                                       And let’s talk about spreadsheets for a minute. Excel of 2011 is nothing like Excel of 1995. It’s a very robust system. It has collaboration, it has PowerPivot, it has SharePoint, it has connectors to data. It’s a…and the way we malign spreadsheets in BI by saying, “Oh, all they want to do is move the data to the spreadsheet.” Well, it’s actually a pretty robust system now.

                                      

                                       In fact, if you truly think about it, it is BI.

                                      

Jonathan Wu         Right. Well, I would agree. In terms of manipulation of information which you have at your disposal, without a doubt.

                                      

                                       So any other thoughts in terms of the front end interface for ease of use?

                                      

Neil Raden         Well, I like the idea of people being able to work with their software and have the software be personalized the same way good web apps are. That sort of already understands what the person wants, the kind of things that they do, and that they should be able to approach it, you know, maybe not in natural language but in a more natural kind of way.

                                      

                                       Because it really is easier to ask for what you want in language. You know, so let me give you an example. Where do you find a richer set of data than Major League Baseball? So suppose you had a question and you said, “I really want to know the number of wins of the five most winning pitchers of each team this year and last year.” Now that’s a very easy thing to ask for.

                                      

                                       Now try to ask for that in SQL. Or in Point and Click.
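
As a rough illustration of the gap Neil describes, the sketch below expresses the pitcher question as SQL, run from Python with the standard-library sqlite3 module. The table pitcher_wins, its columns, the team and pitcher names, and the win totals are all invented for this example; it is one possible formulation under those assumptions, not the query any particular BI tool would produce.

```python
import sqlite3

# Hypothetical schema and made-up data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pitcher_wins ("
    "season INTEGER, team TEXT, pitcher TEXT, wins INTEGER)"
)

# Seven invented pitchers per team per season, each with a distinct win total.
rows = [
    (season, team, f"Pitcher {i}", 2 * i + (season - 2010))
    for season in (2010, 2011)
    for team in ("Team A", "Team B")
    for i in range(1, 8)
]
conn.executemany("INSERT INTO pitcher_wins VALUES (?, ?, ?, ?)", rows)

# "The five most winning pitchers of each team, this year and last year."
# Uses a window function, which assumes SQLite 3.25 or newer.
TOP_WINNERS_SQL = """
SELECT season, team, pitcher, wins
FROM (
    SELECT season, team, pitcher, wins,
           ROW_NUMBER() OVER (
               PARTITION BY season, team
               ORDER BY wins DESC, pitcher
           ) AS rank_in_team
    FROM pitcher_wins
    WHERE season IN (?, ?)
)
WHERE rank_in_team <= ?
ORDER BY season, team, wins DESC
"""


def top_winners(conn, this_year, last_year, n=5):
    """Return (season, team, pitcher, wins) for the top n pitchers per team."""
    return conn.execute(TOP_WINNERS_SQL, (this_year, last_year, n)).fetchall()


result = top_winners(conn, 2011, 2010)
for row in result:
    print(row)
```

One sentence of plain English becomes a nested query with a partitioned ranking, which is the kind of distance between question and tool that the panel is talking about.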

                                      

Jonathan Wu         Yeah, yeah, that’s a very good point.

                                      

Neil Raden         And we have to get closer to that. If not natural language, which I know is a very difficult problem, then, you know, phrases, or this is what I mean by widgets and so forth, things that talk about top ten or top five, or compare this to this, or this to last year. It’s, you know, it’s not an easy problem.

                                      

                                       But I think that the way the vendors have tried to attack this is by adding more and more functionality, different interfaces and global interfaces and so forth, but the basic problem they need to solve is, How do you make this useful and relevant to people?

                                      

Jonathan Wu         Which then moves to the back end to a certain extent. I mean, you’ve got the front end which, you know, we’ve talked about some of the challenges there. But what has the back end got to look like in terms of data structure?

                                      

Herschel Chandler         Well, I think, kind of tagging on to what Neil just said, I think that, you know, the tool vendors are giving us, you know, adding tools to our tool kit. But ultimately, it is the, you know, whatever the role is, the architect’s role to take those tools and have them have meaning in context within a given environment.

                                      

                                       So, for instance, using your Major League Baseball example, you know, if they’re always asking these types of questions, you know, what tools do I have in my tool kit from the tool vendors that I can deploy out to my user base to help them answer their questions? All the tool, you know, the tool vendors aren’t solutions, the tool vendors are just tools.

                                      

                                       I think we still have to have the role of the person to actually develop a solution that uses that tool and then give them context.

                                      

Jonathan Wu         Yeah, so what is that solution on the back-end?

                                      

Herschel Chandler         Well, I think it really gets down to…I definitely don’t want to say it is building a data warehouse. It kind of depends a little bit on what the needs are. I mean, if I’m looking for an organization in terms of…for instance, going back to the Major League Baseball example, you know, that’s a fairly standard question that doesn’t really have a lot of timeliness needs to it.

                                      

                                       You can really develop a technical back end any number of ways to answer that question. I hate to use the old phrase, but it really depends on the requirements. I don’t think that, anymore, it’s this concept of building a big back-end data warehouse. I think all of the tools and technology exist out there so that you can have any number of differing, heterogeneous infrastructure components on the back end and still answer that same question.

                                      

                                       I don’t think there’s one answer to that, Mike, you know.

                                      

Jonathan Wu         Yeah.

                                      

Neil Raden         Herschel, I don’t know if you…I don’t know that you’ve built data warehouses, but I know that Jonathan has. Jonathan, you know that those ideas are great, but at the end of the day, if it doesn’t perform, you’re in deep doo-doo.

                                      

Jonathan Wu         Right.

                                      

Neil Raden         And the problem with federation at this point is it’s a nice idea. I love the idea. But you have to apply it carefully because in a lot of situations the performance is not very good. The data structures are not optimized for those queries.

                                      

Lyndsay Wise         See, that’s what I was going to say: you know, everybody can talk about the back end, but really it’s about developing a back end, or trying to develop a back end, that really supports what the business needs. And that’s something that is still really limited, just based on the structure of building out a data warehouse and the limitations that it imposes based on development efforts, as opposed to, instead of focusing on all of the technical requirements, really developing something backwards, based on what business people in different departments need to get their jobs done.

                                      

                                       And kind of having it as a supporting role so that the back-end really does support a front end that enables people to do that.

                                      

Jonathan Wu         Yeah, which requires probably a couple of aspects. That is data that’s structured so that you can analyze it in a structured sort of format, but the other aspect is somehow integrating unstructured data sets, which, you know, it ranges anything from a series of documents, to images, to everything else, which is a whole other challenge that I don’t think we necessarily need to address at this point.

                                      

                                       But when it comes to, you know, self-service BI, you’ve got to be able to create an environment which allows individuals to easily access the data and have the performance that’s required. Which means, you know, spending time and energy creating that environment that provides those capabilities.

                                      

                                       Yeah, so, what does that look like? Lyndsay?

                                      

Lyndsay Wise         In terms of creating the environment?

                                      

Jonathan Wu         Yeah. Yeah. Because if you’re going to provide self-service BI, where individuals have a host of tools at their disposal to ask the questions to obtain the answers that they’re looking for, what do you have to do on the back end?

                                      

Lyndsay Wise         Right. It’s actually…I mean, to expand on Herschel in terms of “it really depends,” but at the same time there’s also a debate in terms of, you know, once you…and I think it really…it almost expands on the point that Neil’s been making in terms of, you know, based on the way BI is now, a lot of what’s provided even as self-service really begs the question of the information or the data that’s provided and business users…Is there really an easy enough way to understand and decipher that data?

                                      

                                       So, for example, with all of the different statistical models, are people really using what is now self-service in a way that can be used on a business level, or do you really have to understand the statistical models that go behind it? And so it’s almost something that needs to be developed on the back end that almost takes that and provides solutions in such a way that organizations can do that.

                                      

                                       So it really does go back to Neil’s example almost in terms of, How do you develop solutions on the back end that really can be used using natural language or be used by asking questions? And I don’t necessarily know, with BI the way it’s structured now, whether it really can be done without having to, you know, develop something entirely new.

                                      

Jonathan Wu         Yeah.

                                        

Neil Raden         I wouldn’t disagree with that. I think that the whole notion…you know, there’s a lot of products out there. We think of the big four or five, and then there’s a second tier, maybe another dozen, but there’s a bunch of other products out there too. But in general, the lion’s share of people who are using BI have purchased it from the big four.

                                      

                                       And those products, their underlying concept is to inform people and to stop there. But people don’t need to just be informed. They need to be able to interact and ask questions like what if, or did you consider this, or let’s add this variable or take this variable out. And they need to get to some kind of decision, generally through some sort of collaboration, meaning they have to share this somehow with other people.

                                      

                                       It has to be group authored or a group effort. And traditionally people have used their spreadsheets for that. Because spreadsheets are, I like to use the word subversive, meaning they don’t follow the rules of IT. They don’t even necessarily follow the organizational hierarchy. You can send a spreadsheet to somebody, you know, across the world.

                                      

                                       And current BI: I think it’s just not aligned for that, and Lyndsay, I do wonder if the current vendors are going to be able to do this.

                                      

Lyndsay Wise         I almost wonder, because it’s interesting: you see some new solutions emerging, which are mostly front-end dashboards that are starting to incorporate, you know, the social networking and the collaboration. But really the problem is that all of these vendors, as we’ve mentioned, you know, as they start to develop features and functions, they’re just basically adding on to the infrastructure that they already have in house, which really won’t help companies answer the questions they need.

                                      

Neil Raden         Yeah. Jonathan, can we mention vendor names?

                                      

Jonathan Wu         Oh yeah, by all means.

                                      

Neil Raden         Well, look at Lyzasoft or Yellowfin. These are two really nice products that were developed from scratch. They don’t carry around the legacy of, you know, 15 or 20 years of acquired companies and products, trying to, you know, stitch them together. And it shows, they’re beautiful.

                                      

Lyndsay Wise         Well, it’s interesting, because yesterday while I was at TDWI I met with another company that’s newer that you also might want to check out, Neil: Metric Insights, which also has that concept and is building a lot of collaborative features, and it reminded me of the Yellowfins and Lyzasofts.

                                      

Neil Raden         Yeah.

                                      

Lyndsay Wise         Like, so it’s starting to happen, but it’s on a small kind of plane, and these are smaller players, so. You know, it’ll be interesting to see if other vendors kind of take heed and see the value in this.

                                      

Neil Raden         And they’re like little mammals eating the dinosaur eggs.

                                      

Lyndsay Wise         Right.

                                      

Jonathan Wu         Nice analogy. But you still have to have data in a format that can be accessed. You know, if you take a look at, say, a complex organization that may have multiple different systems used to run the operations, the business itself, they’ve got five or six different systems. And in order to get a comprehensive perspective of, say, your customer, in terms of, you know, Who is responsible for that customer?

                                      

                                       What are their purchases? What is the profitability? What sort of field representation has taken place in terms of service calls, what have you? You’ve got to access all these different systems in order to get the data, because each system has a very finite, defined set of data. You’ve got to be able to create some sort of environment.

                                      

                                       And that environment must be, you know, a data warehouse. Wouldn’t you agree? Neil?

                                      

Neil Raden         Well, depending on how you define a data warehouse, yeah. Well, the analogy that I like is, you know what a Mechanical Turk is, right?

                                      

Jonathan Wu         I don’t.

                                      

Neil Raden         Well, for listeners, I’ll explain. Back in, I think, the seventeenth century, some guy from Turkey invented this machine to play chess, and it beat Benjamin Franklin and some other very famous people, but it turned out there was actually a chess master in the box itself running the machine, and that was called the Mechanical Turk.

                                      

                                       Well, Amazon has used that name now to define their service, where you can embed a web service in an application and it goes out to actual, physical people to do the work, people who, it’s understood, can do a better job than a computer can. Say, you know, deduping a file, or sorting something, or whatever.

                                      

                                       And that’s called their Mechanical Turk. And to me, I think the best thing for people would be…I’ve got five or six or seven definitions of the same thing around the company. I need something like a Mechanical Turk that says, you know, “I want a mid-range customer.” And the Mechanical Turk goes out and says, “Oh, I’ll get it from this place.” Right?

                                      

                                       And that’s I guess federation. Although federation I think is more physical than it is logical. And I think the data warehouse will probably live forever, but probably not in the full role it was initially envisioned. Because it’s not agile enough.

                                      

Jonathan Wu         It’s not agile enough, but at the same time you’ve got to have some sort of data repository that collects information from, say, these five different systems and puts it into a common environment so that individuals can easily access it to ask questions.

                                      

Neil Raden         Agreed. But the danger in that is that you take data from its source and you redefine it into a different model, and when you do that, you lose a lot of the semantics of the original data.

                                      

                                       But you have to do that for performance reasons. But once the performance reasons are minimized, then there’s probably less of a need to use a data warehouse.

                                      

Jonathan Wu         You know, I would agree. I think it depends on how you bring that data over, and how you structure it. I mean, clearly you are going to have to have some consistency, because if you’re pulling from five different transactional systems that all contain elements of, say, customer data, you’ve got to be able to put it on the same plane, so that you can access information in a meaningful manner.
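
As a hedged sketch of what putting customer data “on the same plane” could mean in practice, the Python below conforms records from three hypothetical systems to common field names while remembering which source supplied each record. Every system name, field name, and value here is an assumption made up for illustration; a real integration would also need matching rules for customers whose identifiers differ across systems.

```python
from collections import defaultdict

# Hypothetical extracts from three operational systems. Each system uses
# its own field names for the same customer.
crm = [{"cust_id": "C1", "owner": "Dana"}]
billing = [{"customer": "C1", "revenue": 1200.0, "cost": 900.0}]
service = [{"customerNo": "C1", "calls": 3}]

# Per-source mapping: source field name -> common field name (all invented).
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "owner": "account_owner"},
    "billing": {"customer": "customer_id", "revenue": "revenue", "cost": "cost"},
    "service": {"customerNo": "customer_id", "calls": "service_calls"},
}


def conform(source, record):
    """Rename one source record's fields to the common names."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def customer_view(extracts):
    """Merge conformed records into one row per customer.

    The 'sources' list keeps track of where each row's data came from,
    so the link back to the original systems is not lost.
    """
    view = defaultdict(dict)
    for source, records in extracts.items():
        for record in records:
            row = conform(source, record)
            customer_id = row.pop("customer_id")
            view[customer_id].update(row)
            view[customer_id].setdefault("sources", []).append(source)
    return dict(view)


view = customer_view({"crm": crm, "billing": billing, "service": service})
profit = view["C1"]["revenue"] - view["C1"]["cost"]
print(view["C1"], profit)
```

Keeping the list of contributing sources alongside the merged row is one small way to preserve some of the original context that is otherwise lost when data is remodeled.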

                                      

Neil Raden         Well somebody still has to count the beans, right?

                                      

Jonathan Wu         You got it.

                                      

Neil Raden         Right? Somebody still has to generate external reports that are right. So, that’s what I’m saying. That kind of stuff is perfect for the warehouse, but the more fluffy stuff, which involves external data and other things: to get that model into a data warehouse is too difficult and too time-consuming, because, you know, once you have to change it, you have to jackhammer it up and start over.

                                      

Jonathan Wu         Right.

                                      

Neil Raden         So, I think we’re going to see the data warehouse evolve away from the single version of the truth to a single version of the truth for certain functions.

                                      

Jonathan Wu         Yeah, I would agree, but, you know, again, going back to self-service BI. You know, the way the tools currently exist, and being able to provide individuals with this capability, you know, two things have got to take place.

                                      

                                       One is making sure that the front end meets their needs, and second, that you’ve got back-end access to that data in a format that they can easily access.

                                      

Herschel Chandler         I think there’s one other component. If I can…there is one other component that we’ve not really talked about. We’re talking a lot about tools and technology, but when you start thinking about self-service BI and meeting the information needs of your information consumers, there’s just also a lot of process that needs to be put in place.

                                      

                                        You know we’re talking about, you know, integrating data from multiple sources into one version of the truth, or determining you know, what, what is a customer? You’ve got data governance aspects to that. You’ve got, you know, strategic needs to, you know, what are the information needs of the organization?

                                      

                                       So, we need to come up with a plan to build out an information capability in segments. You know, your multi-year phased plan where it can pull data in from these sources into a data warehouse, or whatever your repository is called, to actually meet the information needs.

                                      

                                       And so, I think even though there are tool components, where we need ways to go out and pull the data and integrate the data, there are also people components: people who actually have to get involved and touch the data that flows over to the information, which we really hadn’t touched upon when we were talking about tools.

                                      

Jonathan Wu         You know, I would agree. I think it’s one of the things where, you know, it’s taking a look at, say, an organization’s existing environment, and say they want to move towards self-service BI, you know, how do you evolve? How do you go from where you’re currently at (which is very limited – it may be solely reliant upon IT to provide you access to the data) to an environment that is self-service BI?

                                      

                                       Lyndsay?

                                      

Lyndsay Wise         Sure, it’s actually an interesting question because I think that lately I’ve been lucky enough to be working with companies that are actually new to BI, so they almost don’t have the stumbling blocks of having, you know, a data warehouse and some tools that they’re trying to become more efficient in.

                                      

                                       They actually right away want that self-service model. But in terms of going to a self-service model, it’s almost like you really have to reevaluate what you have in house. And one of the stumbling blocks to that is that companies have already invested a lot of time and money and are kind of stuck in support kind of models, depending on how they’ve been using these solutions in the past.

                                      

                                       So it’s like, “Can you really use what you have now to get there? Or do you have to start from scratch, or kind of take a subset and look at new solutions, whether it’s like using (Neil was using examples of) Yellowfin and Lyzasoft on top of what you already have, to kind of access information in a different way?”

                                      

                                       But in terms of really expanding or evolving, a lot of what’s already in-house almost becomes a roadblock to really enabling organizations to get there unless they reevaluate the way they’ve been using tools and the way they’ve actually structured their data warehousing or data infrastructure on the back end and how they’re using solutions.

                                      

                                       So, I’m definitely not advocating a rip-and-replace or anything like that, but definitely, you know, you have to look beyond what you have if you’ve been using a traditional model for years and years and actually want to expand out and not just have superusers use the solution.

                                      

Jonathan Wu         Right. Neil, your thoughts?

                                      

Neil Raden         Well, I just think about how people get trained to use a BI tool: they go to a two- or three-day training course. And on the first day they more or less listen. And by the morning of the second day they’re leaving the room to take calls on their BlackBerrys or iPhones. And then they don’t come back after lunch on the second day.

                                      

                                       And then the third day, which they miss, is basically all the things they needed to know. I think that’s why people don’t adopt a product, because within a week or two they even forget what they learned in day one and they just don’t see it as viable. So, as far as I’m concerned, the tools need to maybe bring people along.

                                      

                                       Look, we have Darwin, right? No, Watson. We have Watson. And Watson can win at Jeopardy. Now it’s gonna start doing medical diagnosis. Now, everybody can’t afford a Watson, but the point is, for God’s sake, it’s 2011. I like to say, “Where’s my robot?” You know? I was a kid in the 60s and there were all kinds of things on TV about we’re gonna get robots.

                                      

                                       Where is my robot? Why are we still doing things the same way?

                                      

                                       And we just have to get a lot smarter about it. I don’t know how to do that, I just think I know that it needs to be done. And the other thing is that people of my age – well, they’re all retired, but people who are younger than me, they look at computers with kind of a jaundiced eye. But the generation of people that are coming into companies to use this stuff, they live on computers, or mobile devices or whatever.

                                      

                                       And that’s a good thing and a bad thing. It’s a good thing in that they don’t shy away from it, they are not afraid of it. The bad thing is that they expect it to just work. And when you look at the things that happen on the web and on the internet, they just work. But BI doesn’t, so they are not going to be very happy. We have to give them a better experience.

                                      

Jonathan Wu         Right, which I think goes back to what Herschel was saying, and that is making sure that you’ve got the policies and procedures dealing with the information governance, master data management in place so that you can provide somewhat of an environment. Now, granted, the magic’s not there at this point in time, in terms of, you know, providing these robots to answer the question. People still have to go through their own discovery process just with the data, which is a challenge.

                                      

                                       So Herschel, continue talking about how do you evolve?

                                      

Herschel Chandler         Well, I think the two keys to evolving an organization are change management and communication. Most of the organizations that I’ve worked with had some sort of BI capability. They may not call it BI, but most places today have some sort of reporting infrastructure in place. With the organizations that I work with, the way I typically like to go in and start to evolve them to self-service BI is to start bringing in the concept of a center of excellence or a competency center. There will be a centralized plan for BI, and one of the center’s jobs is to ascertain what it would take for that organization to change, the impact of that change and how best to communicate that change.

                                      

                                       So it’s really a process of bringing the…because we can build the best widget ever, build the best data warehouse ever, or put the information out there as quickly as possible. But until an information worker, an information consumer, actually uses that information, or actually goes out and uses self-service access to BI to impact a process, we’ve not really achieved any of that.

                                      

                                        So, a big part of self-service BI is when we do deliver it, we need to show the value. And the business usually needs to go back in and modify existing process to take advantage of this new capability we’ve built. So, part of it is beyond just building the widget, it’s also going out and impacting the business process so they can take advantage of a new capability.

                                      

                                       And I think that once we show how that’s done incrementally – obviously you don’t go out and do it all at once – but pick an area, show how it’s done well and let that success kinda build upon its own and snowball from there.

                                      

Jonathan Wu         So pick an area meaning… a defined subject area, say customer, for example. Which clearly provides tremendous value to any organization understanding who their customers are, what services or products you’re selling to them, their profitability, any sort of characteristics that allows you to gain greater insights.

                                      

                                       That’s what you’re talking about?

                                      

Herschel Chandler         Exactly. It’s something, and it depends on the organization: pick something that has meaning to it. So for instance, one of the big mortgage companies here, mortgage backers here in D.C., they’re all about data, financial data, so they were more worried about loans or things along those lines. So, pick something that has meaning, that has broad meaning across the organization.

                                      

                                       Pick something that also has a lot of meaning but is easily achievable, because what you want to do is you want to have a quick win and you want to have a success. Don’t take the hardest problem in the world as your first problem. Pick something that has broad meaning. In your example, customer, most every organization out there has a customer in some way, so yeah, that would be a good way to start.

                                      

Jonathan Wu         And then from there, providing that environment, which is the data sets, the rules that are associated with it, but, going back to Neil’s point, not changing the data so much that you lose some of the inherent value that’s captured from the transactional systems.

                                        

Herschel Chandler         Exactly.

                                      

Jonathan Wu         Yes. Those are some of the challenges that are there. Which still drives the need for some sort of comprehensive information plan that then, I would think, segregates responsibility. If you’re gonna take a look at self-service BI, clearly IT is gonna have to manage the back end at this point in time.

                                      

                                       Would you agree, Lyndsay?

                                      

Lyndsay Wise         Yeah, definitely. I mean, I was almost trying to merge both what Herschel and Neil were saying in terms of developing out a competency center and really developing the processes that surround it. And then on the other side really having that younger generation that almost wants Facebook-type access to BI.

                                      

                                       And how do you get there in terms of a self-service model, where right now we really do require so much that goes on in the back end, and even business requirements in terms of what processes, what are the business rules, how do we integrate that and kind of merge our business needs and what we need to get done in a process way to the technology?

                                      

                                       Versus the fact that now companies and people, just because of the way we interact with technology, want things immediately.

                                      

                                       So, there’s definitely going to have to be a way at some point to bridge that gap, to be able to provide new process integration and kind of all of the development that needs to go into things with an ease of use that really does kind of mimic the way people are using technology now and will continue to use technology, and actually expect it to be more efficient, quicker, with just better access and more interactive access.

                                      

Jonathan Wu         Yep. Yeah, I would agree. So, let me just summarize this from the different perspectives. If you were to evolve to a self-service, business-intelligence-type capability, clearly, starting off with a subject area that provides the greatest value to the organization. You know, you don’t want to, you know, be distracted by data sets that provide limited value.

                                      

                                       Say, fixed assets, for example, would not necessarily be a good use, but maybe customer, which provides great value, and then creating an environment with the BI tool on top of it that allows individuals to easily access it, which means making sure you’ve got data standards in place, where IT is spending the time maintaining the environment while allowing the individuals to freely access the data.

                                      

                                       Is that a good summary, or are there other perspectives that I may have left out?

                                      

Lyndsay Wise         Sure, and I think even going beyond what you said, it’s not necessarily only just accessing it, but also sharing it and collaborating on it and being able to use it to make decisions and to ask other questions and to share the information and to be able to create discussions about it, you know, on a broader level.

                                      

                                       Even in some cases, you know, when you’re looking at customer, if you’re looking at product, it could be speaking with outside partners, or customers, or really being able to identify how do we make the customer experience better? How do we increase retention? How do we make our supply chain more efficient?

                                      

                                       All of these things that really do require an aspect of collaboration as well.

                                      

Jonathan Wu         Right. Well, you’re touching on a subject that is dear to me and that is: extending the value of business intelligence beyond just providing you access to the data, but being able to take a look at it from a collaborative perspective and make it more actionable. Which I think is a whole nother conversation but I appreciate what you have to say.

                                      

                                        We’ve got a series of questions that have come up. And let me go through and take a look at these and start posing them to the group. We’ve got one here which says: Does a single underlying deployment of a data warehouse platform work for all self-service BI applications? And then it goes on to say, if you need BI views for multiple functions within your business, does a single deployment of the underlying data warehouse service infrastructure work?

                                      

                                       If so, which is it and what are the best practices?

                                      

Neil Raden         Jonathan, I’d say that the tools and technology to do that are certainly out there, but a lot of implementations I’ve seen have fallen far short.

                                      

Jonathan Wu         What do they need to do in order to be successful, Neil?

                                      

Neil Raden         They’re just poorly designed. They’re based on some limited thinking. You know, they’re constrained by previous ways of doing things and the whole idea just gets watered down as it’s implemented.

                                      

Jonathan Wu         So are you saying that in the process of extracting data out of these operational systems and putting it into a central repository there is, I would say, business value, business rules that are lost in the translation of that data into a new environment?

                                      

Neil Raden         Well, yeah, I suppose that’s it, but I mean, there’s other things that happen too, like limiting access to the data warehouse and, you know, putting governors on queries and, you know, understaffing and underpowering the process and, you know, building a bunch of independent data marts or even just building data marts in general when, if they had a better database, they could run the queries against it.

                                      

                                       Just, you know, a million things.

                                      

Jonathan Wu         Well, yeah, but I don’t think you’re advocating for an enterprise data warehouse, are you?

                                      

Neil Raden         Sometimes, depends on the company. Some companies, they’ve been very successful.

                                      

Jonathan Wu         Yeah, but trying to go back and answering this question from a single data warehouse, you know, I can see your point is, you know, case by case you gotta take a look at what you’re trying to do.

                                      

Neil Raden         Yeah, well, if you create data marts you are automatically creating extra maintenance, and that adds to longer development time and longer maintenance time and so forth. Look. Every company is different. Which, by the way, this isn’t the answer to the question, but it reminds me of something else I wanted to say.

                                      

                                       And that is, if you took 10 companies and you lined them up together and looked at their data requirements, what you would find is that they have a lot of stuff in common. But what we do is we go in and model every single company separately. And I’m a big proponent of semantic technologies and ontologies, where you can take common ontologies and snap them together and just do the incremental stuff you have to do that’s special to that organization.

                                      

Jonathan Wu         You know, I would agree, but I think it goes back to that 80/20 rule where you take a look at business processes by business functions and 80% are probably standard across most companies. I mean, you’ve got generally accepted accounting principles as well as other requirements that are out there that require that standardization.

                                      

                                       The 20% make it unique by, I would say, industry as well as organization. And that’s where value takes place, where you may be more efficient in your business processes, or you may be able to provide your competitive advantage within that. Those are the differentiators that I’ve seen in the past. Lyndsay or Herschel, any thoughts in terms of that last question?

                                      

Lyndsay Wise         No, I pretty much agree with what both of you have been saying. I’m trying to think if there’s anything additional that, you know, I see different, but there’s nothing really that’s coming to mind.

                                      

Herschel Chandler         Yeah, I would only say that, you know, in theory having that one single data, enterprise data warehouse that is the oracle to ask all questions is a great idea, but in practice, I, you know, there are exceptions, in practice I haven’t seen that. You know, I love going into organizations and, you know, there are three or four enterprise data warehouses.

                                      

                                       So it’s a good idea, but I think, in practice, it doesn’t always play out that way.

                                      

Neil Raden         Well, Lyndsay and I, Lyndsay and I saw one last week from eBay. What was it? Thirty-seven petabytes?

                                      

Lyndsay Wise         Yes.

                                      

Neil Raden         One database.

                                      

Lyndsay Wise         Yeah.

                                      

Jonathan Wu         And performance?

                                      

Neil Raden         Fantastic. I mean, it’s in this place the size of eleven football fields.

                                      

Jonathan Wu         Okay.

                                      

Lyndsay Wise         It was quite impressive.

                                      

Neil Raden         With a bunch of ex-Marines guarding it.

                                      

Jonathan Wu         Yeah, which is unrealistic for most organizations.

                                      

Neil Raden         That’s right. Not everybody’s eBay, I agree.

                                      

Jonathan Wu         All right. So, here’s another question that we have, and that is: how do you overcome any sort of privacy or security concerns? Neil, you want to kick this one off?

                                      

Neil Raden         Well, I’m going to go back to ontologies and that is, if you make the data smart, you can make the applications dumb, and that means if a piece of information or a string of information can be defined in a certain way, and then its attributes are definable that can relate them to some mailed out scheme or something, then you don’t have to do anything in particular.

                                      

                                       But we insist on building metadata in relational databases where there are no semantics. The semantics are only added in the WHERE clause of a query. So I think we’re going down the wrong path, and that’s why I’m not that crazy about master data management either. I know that the Department of Defense and the intelligence community have made huge strides with ontology and I don’t know why it hasn’t had a bigger impact in the commercial sector.

                                      

Jonathan Wu         Yeah. Herschel, what are your thoughts in terms of privacy and security? First off.

                                      

Herschel Chandler         Well, I think, you know, I think that most of the mature tools out there today, and even going back to the database level, have this kind of built in. It’s there. And you have to have, obviously, when we start talking about privacy and security, you’ve got to have policies in place, which is kind of outside the realm of this, but the actual implementation of those policies is part and parcel of even the mature tools out there.

                                      

                                       So I don’t think it’s a concern. I don’t think it’s something that isn’t already handled in the BI vendors’ tool base.

                                      

Jonathan Wu         Right.

                                      

                                       So you’re saying the tools have the capability. It’s just making sure that you utilize those capabilities in order to secure the data itself.

                                      

                                        

Herschel Chandler         Correct. Defining the policies and then having the tool implement those policies.

                                      

Jonathan Wu         Yeah. Lyndsay, any?

                                      

Lyndsay Wise         Yeah, I agree with Herschel. I mean, I think that especially organizations who, you know, have sensitive data or have different requirements or compliance really need to address these issues. I think over time, you know, the tools and products that are out there really do allow them to address it; it’s just how the organization itself actually adopts and implements it.

                                      

Jonathan Wu         Yeah, and enforces it. I would completely agree. I think there is, taking a look at the data itself, who has access to it as well as what levels of information they can actually take a look at. It reminds me of several organizations where we’ve built, you know, a finance solution and these were publicly traded companies.

                                      

                                       Yeah, you needed to provide a period of blackout when it came to the information, especially when they were taking a look at quarter end results because, you know, being publicly traded there was a fear of insider trading if individuals knew how well the company was performing.

                                      

                                       Yeah.

                                      

                                       Yeah, yeah.

                                      

                                       So, you do have to provide that level of security, especially when you’ve got a self-service environment where anyone can ask a question and ideally receive the information there. They want it now.

                                      

                                       What else do we have? So we talked about privacy concerns.

                                      

Neil Raden         Hey, Jonathan, there’s something I wanted to bring up, but I didn’t get a chance. What’s the most successful piece of software in the world? In fact, what’s the one piece of software that’s actually changed the world, from Tunisia to Egypt to China?

                                      

Jonathan Wu         Well, I would have to say Twitter. And you?

                                      

Neil Raden         I would have said Google, but Twitter is fine too, because the question I’m going to ask would have the same answer anyway, and that is, how much time did you spend in a training class to learn Twitter or Google?

                                      

Jonathan Wu         Right.

                                      

Neil Raden         And I think that’s where we’re going wrong. We need to aim for that.

                                      

Jonathan Wu         You know, I would agree. When it comes to the self-service element of business intelligence, it’s being able to allow individuals to easily access or answer the questions that they have in a manner that is very intuitive. To the point that we were making earlier.

                                      

                                       A lot of these BI tools, they’re not very intuitive. They require extensive training.

                                      

Neil Raden         Yeah.

                                      

Jonathan Wu         Training on, you know, what are these features, the functionality that’s provided? So that’s one set, which is how that interface works. The second set is on the back end: what are the corresponding business rules? To your point, you know, a lot of these environments have inherent business rules.

                                      

                                       You know, I remember one of the first finance data warehouses that we built; you know, it was self-service. Individuals had access to it and even though they went through training, a lot of them forgot that, you know, this finance data warehouse was multi-currency. So, if they didn’t clearly state, you know, as a condition what currency they were dealing with, they were getting everything.

                                      

Neil Raden         Yeah.

                                      

Jonathan Wu         So, it created results that far exceeded their expectations. When they started tuning down the currency it narrowed down the results and gave them the information that they wanted. Those were inherent business rules in the solutions which, you know, even for some of these finance professionals that we were working with, they would forget.

                                      

                                       It wasn’t that intuitive, you know.

                                      

                                       Yeah.
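The multi-currency pitfall described above can be illustrated with a minimal sketch. The ledger rows, account names and function names are hypothetical and are not taken from the warehouse discussed in the roundtable; the point is only that an omitted currency condition silently returns "everything," while a stricter interface can refuse to answer.

```python
# Hypothetical ledger rows: (account, currency, amount).
LEDGER = [
    ("sales", "USD", 100.0),
    ("sales", "EUR", 80.0),
    ("sales", "JPY", 9000.0),
]


def total(rows, account, currency=None):
    """Sum amounts for an account; with no currency given, all currencies are mixed."""
    return sum(
        amount
        for acct, cur, amount in rows
        if acct == account and (currency is None or cur == currency)
    )


def total_strict(rows, account, currency):
    """Same sum, but the implicit business rule (one currency at a time) is enforced."""
    if not currency:
        raise ValueError("a currency must be specified for a multi-currency ledger")
    return total(rows, account, currency)


mixed = total(LEDGER, "sales")               # adds USD, EUR and JPY together
usd_only = total_strict(LEDGER, "sales", "USD")
```

Here `mixed` is a meaningless sum across three currencies, which is the kind of result that "far exceeded expectations," whereas `total_strict` makes the business rule part of the interface rather than something users must remember from training.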

                                      

Neil Raden         Well, the other thing I would say is compare the success of Google with Wolfram Alpha which hit the ground like a safe, and if you look at the two of them you can see the reason why. It’s that it wasn’t at all intuitive when you went to Wolfram Alpha. I would ask ten questions and not get an answer to any of them.

                                      

Jonathan Wu         Yeah. Yep, so those are some of the challenges. Taking a look at the time and trying to wrap this up, concluding thoughts? Herschel, you want to kick those off? Any concluding thoughts on self-service BI?

                                      

Herschel Chandler         Well, I guess the closing thought I would have with self-service BI is again, don’t get so caught up in the tools and technology. I think with self-service BI, the tools are out there to deliver. They have to be architected well; they have to be set up well. But I think the piece that’s often missed is, we have to have the people processes around self-service BI to ensure that this new capability is used to best advantage.

                                      

                                       It’s not all about tools and technology. It’s also a lot about process.

                                      

Jonathan Wu         Yup, very good. Lyndsay Wise?

                                      

Lyndsay Wise         Yeah, I agree. And then, on another note, I was kind of thinking, for people that really wanted answers in terms of how to get started with self-service BI, it’s almost been a depressing roundtable just because we’ve been talking a lot about the challenges.

                                      

                                       So, I also think that it’s possible and I think that it’s definitely you know, possible right now based on certain solutions that are out there and I also think that the market’s going to really have to change, because as organizations start to demand things, and as people demand certain levels of interaction and certain ways of asking questions and expecting the answers that are relevant in a timely fashion, that, you know, solutions are going to have to be developed that can actually achieve that.

                                      

                                       Otherwise, you know, the value proposition of BI really won’t be able to be justified on a long-term continuum, at least not from a self-service perspective. Obviously there are different uses.

                                      

Jonathan Wu         Right. And Neil?

                                      

Neil Raden         Well, we’re sort of like Moses at the Jordan River, right? We can get them that far, but we haven’t gotten them across. And I think that people aren’t interested in data; they’re interested in answers. And in order to do that we have to spend some time not just on making it easier to use, whatever that means, because ease of use is dependent on getting the right result, not just how easy it is to ask the question.

                                      

                                       And I think we need to spend more time finding ways for them to understand the relationships and the models. I hate to use the term models, cause that scares people, but the relationships between the data and how it all pulls together. And then they’ll be able to ask the kinds of questions they want.

                                      

Jonathan Wu         Very good. Well, thanks, everyone, for participating in this roundtable. All the questions will be posted to this event page on Focus so the conversation can live on. Questions from the audience that we weren’t able to answer during the time allotted will be posted for further discussion. Thank you again to our experts for this lively and engaging conversation.

                                      

 

 

 

 

Herschel Chandler

Managing Principal, Visionary Solutions Today, LLC

http://www.focus.com/profiles/herschel-chandler/public

 

 

 

 

Neil Raden

Analyst & Consultant, Hired Brains

http://www.focus.com/profiles/neil-raden/public

 

 

 

 

Lyndsay Wise

Industry Analyst, President, WiseAnalytics

http://www.focus.com/profiles/lyndsay-wise/public

 

 

 

 

Jonathan Wu

President, NAVinture, Inc.

http://www.focus.com/profiles/jonathan-wu/public

 

 

 

 

 

 

 

 

 

 

 


Forked SQL: Informatica Gets It

By Neil Raden

About fifteen years ago, Microstrategy cofounder Sanju Bansal told me, “SQL is the best hope for leveraging the latent value from databases.” Fifteen years later, it’s extraordinary how correct Bansal was. Microstrategy is still a robust, free-standing Business Intelligence company while most of the proprietary multidimensional databases have disappeared. But what about the next fifteen years?

At least for business analytics, SQL is under attack. So much so, that there is an entire emergent market segment called NoSQL. If SQL itself is under siege, what about the myriad technologies that in one way or another are part of the SQL ecosystem like Informatica? Are they obsolete? Will we need to throw away the baby with the bathwater?

This whole dustup has been brewing for a decade or more. Since we started using computers in business 60+ years ago, the big machines were managed by a separate group of people with specialized skills, now generically referred to as IT. Though separate and decidedly non-businesslike, IT eventually became bureaucratic, fixed in its mission to control everything data. Even SQL was adopted very slowly, but it is solidly the tool of choice for most applications.

About ten years ago, though, a renegade group of people I called “the pony-tail-haired guys” (PTH for short) appeared on the scene with their externally-focused web sites and gradually, development tools, methods and monitoring software. At first, IT paid no attention to them because they didn’t interfere with the inwardly-focused enterprise computing environment, but as the perceived value of “e-business” developed, a great deal of friction and turf war fighting erupted. Computing bifurcated inside the firewall.

The PTH preferred web-oriented tools, open source software and search. That’s why Big Data/Hadoop and NoSQL are so divergent from enterprise computing. Different people, different applications, different brains.

But when it comes to Business Analytics, the lines are not so clear. The PTH found, just as enterprise people did (reluctantly), that analytics is key to everything else. But while enterprise apps measure sales and revenue, the PTH guys are looking at really strange things like sentiment analysis. Today, I can download a Fortune 500 general ledger to my watch, but things like sentiment analysis look at hundreds of thousands of times more data. Loading that data into a relational database and analyzing it with a language designed for set operations and transactions just doesn’t work. So is there a justification for something other than SQL for this? Of course there is.

But the PTH guys, now that they’ve grown up a little, are starting to show the same sclerotic tendencies as their IT colleagues, assuming that their tools and methodologies are the ONLY tools for analytics and that SQL needs to be put in the dustbin. That’s just silly. Existing data warehouse, data integration, data analysis and presentation tools are well suited to lots of tasks that aren’t going away any time soon, though methodologies and implementations are in dire need of renovation, something I call a Surround Strategy as opposed to the ridiculous, outdated idea of the Single Version of the Truth.

Here is where Informatica enters the picture. With all the breathless enthusiasm over Big Data, one thing is often lost in the rush: no fundamental laws of physics concerning data integration have been altered. This places Informatica squarely in the middle of every trend grabbing headlines today: big data, social analytics, virtualization and cloud.

Any type of reporting or analysis, whether through traditional ETL and data warehousing, Hadoop, Complex Event Processing or even Master Data Management, deals with “used data,” meaning data created through some original process. All used data has one attribute in common: it doesn’t like to play with other used data. Data extracted from a primary source typically has hidden semantics and rules in the application logic so that, on its own, it often doesn’t make sense.

This has been the most difficult part of assembling useful data for analysis historically, and with the entrance of mountains of non-enterprise data, the problem has only grown larger.

Luckily for the fortunes of Informatica, this is a big opportunity and they have stepped up.

For in-depth descriptions of the following Big Data, cloud, Hadoop and SaaS innovations coming from Informatica, refer to their materials at http://www.informatica.com. Briefly, they include:

HParser, a facility to visually prepare very large datasets for processing in Hadoop, saving developers and analysts a substantial amount of time on what is otherwise mostly hand coding.

Complex Event Processing (CEP) which isn’t new for Informatica, but given the expanded complexity of data integration in the Big Data era, they have incorporated CEP into their own platform to detect and inform of events that can affect the performance, accuracy and management of the data from teratogenic processes.

Informatica pioneered the GUI diagram for building integration mappings, but 9.5 implements an integration optimizer which detects the optimal mapping rather than processing the diagram literally.

With each new release of a complex software product comes the time-consuming and nerve-wracking routine of upgrades. 9.5 now provides automated regression testing to dramatically reduce the time and pain of upgrades.

In Information Lifecycle Management, 9.5 provides “Intelligent Partitions” to distribute data across hot, warm and cold storage devices.
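To make the hot/warm/cold idea concrete, here is a minimal sketch of age-based tiering. It is not Informatica’s implementation or API; the function name, the choice of "days since last access" as the criterion, and the 30-day and 365-day thresholds are all assumptions made for illustration.

```python
from datetime import date


def storage_tier(last_accessed, today, hot_days=30, warm_days=365):
    """Pick a storage tier from how long ago a partition was last accessed.

    The thresholds are illustrative assumptions, not product defaults.
    """
    age_in_days = (today - last_accessed).days
    if age_in_days <= hot_days:
        return "hot"    # fast, expensive storage for recently used data
    if age_in_days <= warm_days:
        return "warm"   # mid-priced storage for occasionally used data
    return "cold"       # cheap archival storage for rarely used data


today = date(2012, 6, 15)
tiers = {
    "current_quarter": storage_tier(date(2012, 6, 1), today),
    "start_of_year": storage_tier(date(2012, 1, 1), today),
    "two_years_back": storage_tier(date(2010, 1, 1), today),
}
```

A real product would presumably weigh more than access age (query frequency, retention policy, cost), but the routing decision itself has this shape.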

Add to this, facilities for virtualization, replication and a slew of offerings for cloud-based applications – there is an inescapable logic for Informatica exploiting the new opportunities of Big Data.


BI Is Dead! Long Live BI!

 

Executive Summary

We suggest a dozen best practices needed to move Business Intelligence (BI) software products into the next decade. While five “elephants” occupy the lion’s share of the market, the real innovation in BI appears to be coming from smaller companies. What is missing from BI today is the ability for business analysts to create their own models in an expressive way. Spreadsheet tools exposed this deficiency in BI a long time ago, but their inherent weaknesses in data quality, governance and collaboration make them poor candidates to fill this need. BI is well-positioned to add these features, but must first shed its reliance on fixed-schema data warehouses and read-only reporting modes. Instead, it must provide businesspeople with the tools to quickly and fully develop their models for decision-making.

 

Why BI Must Transition From The Past Century

In this avant-garde era of Big Data, cloud, mobile and social, the whole topic of BI is a little “derriere.” BI is a phenomenon of the previous two decades, but the worldwide market for BI tools (not including services or surrounding technologies such as data warehousing and data integration) is greater than $10 billion per year. It’s still a very significant market and will, for some time, dwarf spending on the Big Data top gun, Hadoop, which is an open source distribution.  The lion’s share of revenue in the Big Data market will continue to be hardware and services, not software, unless you consider the application of existing technologies, especially database and data integration software, as part of Big Data.

Because BI is still alive, it’s worth revisiting some concepts I wrote about a few years ago — abstraction and model-driven design in BI. In the proto-BI days, when it was still known as Decision Support Systems (DSS), high-level declarative languages were used to create business models that could handle user-entered (or background-loaded) data. These systems were interactive, allowing for what-if analysis, testing sensitivity of the models and even the application of advanced statistical techniques.

Data warehousing changed all of that. Performance of relational databases for interactive reporting from a static schema was a challenge. Adding data to the warehouse interactively through iterative analysis was out of the question. Adding entities to the schema to accommodate new models required data modeling and schema changes, and often reloading data and associated testing before implementation.

As data warehouses became the preferred way to provide data for reporting and analysis, BI vendors that focused on the read-only nature of data warehousing prospered and dominated the field. Business modeling fell from favor, or, more precisely, fell to Excel. The only exception was found in budgeting and planning packages, mostly based on Multidimensional Online Analytical Processing (MOLAP) databases, predominantly Essbase and Microsoft BI.

 

The Era Of Big Data Means Business Analytics

The major BI vendors of the late 1990s through today are BusinessObjects (SAP), Cognos (IBM), SAS, Hyperion (Oracle), Microsoft and Microstrategy. At these six vendors (who together comprise more than two-thirds of the BI market), only about 20 percent of combined revenue came from tools that provided modeling capabilities. The rest came from strictly read-only data warehouses and marts[i].

Now that the era of Big Data is here, modeling has to move up a notch. While newer technologies are in play for capturing and massaging Big Data, the need for business analytics is greater than ever. In the past, the BI calculus more or less ended with informing people. Today, BI must enable actions and decisions that are supported by deep insight. Excel may be a good container for interacting with models, but its internal capabilities aren’t sufficient. The need for BI tools will not disappear, but it’s time to break the read-only mold.

 

Moving From Data To Decisions Through Business Modeling

Business modeling can be an imprecise term. But in general, it means creating descriptive replicas of a part of a business — such as assets, processes or optimizations — in terms that are consonant with the people (and processes) that use them. Usually, the goal is to render these models into computer-based applications. In general, a business model is created by someone who has certain knowledge about a process or function in the business. This can range from a single fact, such as how operating cash flow is calculated, to something as broad as how the manufacturing plants operate.

To be effective, businesspeople need more than access to data: They require a seamless process that lets them interact with the data and drive their own models and processes. In today’s environment, these steps are typically disconnected and, therefore, expensive and slow to maintain without the data quality controls from the IT department. The solution: a better approach to business modeling coupled with an effective architecture that separates physical data models from semantic ones. In other words, businesspeople need tools to address physical data through an abstraction layer that allows them to address only the meaning of data — not its structure, location or format.
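To make the idea of an abstraction layer concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the column names (`rgn_cd`, `net_rev_amt`), the business terms and the `total_by` helper are all hypothetical, and a real semantic layer would sit on top of a database rather than an in-memory list.

```python
from collections import defaultdict

# Physical data: cryptic column names, as they might appear in a warehouse table.
# These names are invented for illustration.
PHYSICAL_ROWS = [
    {"rgn_cd": "EAST", "net_rev_amt": 120.0},
    {"rgn_cd": "WEST", "net_rev_amt": 80.0},
    {"rgn_cd": "EAST", "net_rev_amt": 50.0},
]

# Semantic layer: business terms mapped to physical columns (hypothetical mapping).
SEMANTIC_LAYER = {
    "region": "rgn_cd",
    "net revenue": "net_rev_amt",
}


def total_by(measure, dimension, rows=PHYSICAL_ROWS):
    """Sum a business measure by a business dimension.

    Both terms are resolved through the semantic layer, so the caller
    never needs to know the physical column names.
    """
    measure_col = SEMANTIC_LAYER[measure]
    dimension_col = SEMANTIC_LAYER[dimension]
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension_col]] += row[measure_col]
    return dict(totals)


result = total_by("net revenue", "region")
print(result)
```

If IT renames `net_rev_amt` or moves it to another table, only the mapping changes; the businessperson's request for "net revenue" by "region" stays the same.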

Abstraction is applied routinely to systems that are somewhat complex and especially to systems that frequently change. A 2012 model car contains more processing power than most computers had only a decade ago. Driving the car, even under extreme conditions, is a perfect example of abstraction. Stepping on the gas doesn’t really pump gas to the engine; it alerts the engine management system to increase speed by sampling and alerting dozens of circuits, relays and microprocessors to achieve the desired effect. These actions are subject to many constraints, such as limiting engine speed and watching the fuel-air mixture for maximum economy or minimum emissions. If the driver needed to attend to all of these things directly, he would never get out of the driveway.

Data warehouses and BI tools still rely on at least some of the business users understanding the data models and semantics, and sometimes the intricacies of crafting queries. This is a huge barrier to progress. Businesspeople need to define their work in their own terms. A business modeling environment is needed for designing and maintaining structures, such as data warehouses and all of the other structures associated with them. It is especially important to have business modeling for the inevitable changes in those structures. It is likewise important for leveraging the latent value of those structures through analytical work. This analysis is enhanced by understandable models that are relevant and useful to businesspeople.

BI isn’t standalone anymore. In order to close the loop, it has to be implemented in an architecture that is standards-based. And it has to be extensible and resilient, since the boundaries of BI are more fuzzy and porous than other software. While most BI applications are fairly static, the ones of most value to companies are the flexible and adaptable ones. In the past, it was acceptable for individual “power users” to build BI applications for themselves or their group without regard to any other technology or architecture in the organization. Over time, these tools became brittle and difficult to maintain because the initial design was not robust enough to adapt to the continuous refinements and changes needed by the organization. Because change is a constant, BI tools need to provide the same adaptability at the definitional level that they currently do at the informational level. The answer is to provide tools that allow businesspeople to model.

 

Business Modeling Emerges As a Critical Skill

All businesspeople use models, though most of them are tacit or implied, not described explicitly. Evidence of these models can be found in the way workers go about their jobs (tacit models) or how they have authored models in a spreadsheet. The problem with tacit models is that they can’t be communicated or shared easily. The problem with making them explicit is that it just isn’t convenient enough yet. Most businesspeople can conceptualize models. Any person with an incentive compensation plan can explain a very complicated model. But most people will not make the effort to learn how to build a model if the technology is not accessible.

There are certain models that almost every business employs in one form or another. Pricing is a good example and, in a much more sophisticated form, yield management, such as the way airlines price seats. Most organizations look at risk and contingencies, a hope-for-the-best-prepare-for-the-worst exercise. Allocation of capital spending or, in general, allocation of any scarce resource is a form of trade-off analysis that has to be modeled. Decisions about partnering and alliances as well as merger or acquisition analysis are also common modeling problems. Models are also characterized by their structure. Simple models are built from data inputs and arithmetic. More complicated models use formulas and even multi-pass calculations, such as allocations. The formulas themselves can be statistical functions and can perform projections or smoothing of data. Beyond this, probabilistic modeling is used to model uncertainty, such as calculating reserves for claims or bad loans.

When logic is introduced to a model, it becomes procedural. Mixing calculations and logic yields a very potent approach. The downside is that procedural models are difficult to develop with most tools today and are even more difficult to maintain and modify because they require the modeler to interact with the system at a coding or scripting level; most business people lack the temperament or training, or both, to do so. It is not reasonable to assume that businesspeople, even the power users, will employ good software engineering technique, nor should they be expected to. Instead, the onus is on the vendors of BI software to provide robust tools that facilitate good design technique through wizards, robots and agents.

 

Learn From A Dozen Best Practices In Modeling

For any kind of modeling tool to be useful to businesspeople, supportable by the IT organization and durable enough over time to be economically justifiable, it must provide or allow the following capabilities:  

  1. Level of expressiveness. This must be sufficient for the specification, assembly and modification of common and complex business models without code; it should accommodate all but the most esoteric kinds of modeling.
  2. Declarative method. Such a method means that each “statement” is incorporated into the model without regard to its order, sequence or dependencies. The software, freeing modelers to design whatever they can conceive, handles issues of calculation optimization.
  3. Model visibility.  This enables the inspection, operation and communication of models without extra effort or resources. Models are collaborative and unless they can be published and understood, no collaboration is possible.
  4. Abstraction from data sources. This allows models to be made and shared in language and terms unconnected to the physical characteristics of data; it gives managers of the physical data much greater freedom to pursue and implement optimization and to improve performance efforts.
  5. Extensibility. Extensibility means that the native capabilities of the modeling tool are robust enough to extend to virtually any business vertical, industry or function. Most of the leading BI tools are owned by much larger corporate parents, potentially limiting or directing the development roadmap of the BI offering, in alignment with the wider vision of the parent. Smaller BI and analytics pure-plays tend to have a truer vision about BI (until they are acquired). Because analysts who are thinking out of the box gain valuable insight, the BI tool cannot impose vendor-specific semantics and functions, or lock analysts in and limit distribution of insights with expensive licenses.
  6. Visualization. Early BI tools relied on scarce computing resources, but with today’s abundance of processing power, an effective BI platform should include visualization. It is a proven fact that single-click visualization of models and results aids in understanding and communicating complicated models.
  7. Closed-loop processing. This is essential because business modeling is not an end-game exercise, or at least it shouldn’t be. It is part of a continuous execute-track-measure-analyze-refine-execute loop. A modeling tool must be able to operate cooperatively in a distributed environment, consuming and providing information and services through a standards-based protocol. The closed-loop aspect may be punctuated by steps managed by people, or it may operate as an unattended agent, or both.
  8. Continuous enhancement. This requirement is born of two factors. First, with the emerging standards of service-oriented architectures, web services and XML, the often talked-about phenomenon of organizations linked in a continuous value chain with suppliers and customers will become a reality soon and will put great pressure on organizations to be more nimble. Second, it has finally crept into the collective consciousness that development projects involving computers are consistently under-budgeted for maintenance and enhancement. The mindset of “phases” or “releases” is already beginning to fray, and forward-looking organizations are beginning to differentiate tool vendors by their ability to enable enhancement without extensive development and test phases.
  9. Zero code. In addition to the fact that most businesspeople are not capable of and/or interested in writing code, there is sufficient computing power at reasonable costs to allow for more and more sophisticated layers of abstraction between modelers and computers. Code implies labor, error and maintenance. Abstraction and declarative modeling imply flexibility and sustainability. Most software “bugs” are iatrogenic; that is, they are introduced by the programming process itself. When code is generated by another program, the range of programmatic errors is limited to the latent errors in the code generator, not the errors introduced by programmers.
  10. Core semantic information model (ontology). Abstraction between data and the people or programs that access the data isn’t very useful unless the meaning of the data and its relationships to everything else are available in a repository.
  11. Collaboration and workflow. These capabilities are essential to connecting analytics to every other process within and beyond the enterprise. A complete set of collaboration and workflow capabilities supplied natively within a BI tool is not necessary, though. Instead, the ability to integrate (this does not mean “be integrated,” which implies lots of time and money) with collaboration and workflow services across the network, without latency or conversion problems, is preferable.
  12. Policy. This may be the most difficult requirement of them all. Developing software to model business policy is tricky. For example, “Do not allow contractor hours in budget to exceed 10 percent of non-exempt hours.” Simple calculations through statistical and probabilistic functions have been around for over three decades. Logic models that can make decisions and branch are more difficult to develop, but still not beyond the reach of today’s tools. But a software tool that allows businesspeople to develop models in a declarative way to actually implement policies is on a different plane. Today’s rules engines are barely capable enough and they require expert programmers to set them up. Policy in modeling tools is in the future, but it will depend on all of the above requirements.
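Two of the requirements above, the declarative method and policy, can be sketched together in a few lines of Python. This is a toy under stated assumptions, not a description of any vendor's tool: the statement names, the hourly rate and the `evaluate` and `violated_policies` helpers are hypothetical. The statements are listed in a deliberately "wrong" order, and the software (here the standard library's `graphlib.TopologicalSorter`) works out the calculation sequence.

```python
from graphlib import TopologicalSorter

# Declarative statements: name -> (names it depends on, formula).
# labor_cost is listed before total_hours on purpose; order should not matter.
STATEMENTS = {
    "labor_cost": (("total_hours", "hourly_rate"), lambda h, r: h * r),
    "contractor_share": (("contractor_hours", "non_exempt_hours"),
                         lambda c, n: c / n),
    "total_hours": (("contractor_hours", "non_exempt_hours"),
                    lambda c, n: c + n),
}

# Policy expressed as a named condition over the computed values.
POLICIES = {
    "contractor hours must not exceed 10 percent of non-exempt hours":
        lambda v: v["contractor_share"] <= 0.10,
}


def evaluate(inputs, statements=STATEMENTS):
    """Compute every statement in dependency order, whatever order it was written in."""
    graph = {name: set(deps) for name, (deps, _) in statements.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in statements:  # plain inputs are already in `values`
            deps, formula = statements[name]
            values[name] = formula(*(values[d] for d in deps))
    return values


def violated_policies(values, policies=POLICIES):
    """Return the text of every policy the computed values break."""
    return [text for text, check in policies.items() if not check(values)]


budget = evaluate({"contractor_hours": 150, "non_exempt_hours": 1000,
                   "hourly_rate": 40.0})
print(budget["labor_cost"], violated_policies(budget))
```

The modeler states what each quantity is, not when to compute it. The policy is just another statement, which is the easy part; letting a businessperson author it without writing the lambda is the hard part the list above describes.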

 

Today’s BI Will Not Be Tomorrow’s BI

It is an open question whether BI has been, in the long run, successful or not. The take-up of BI in large organizations has stalled at 10 to 20 percent, depending on which survey you believe. I believe that expectations of broad acceptance of BI were overly optimistic and that the degree to which it has been adopted is probably at the right level for the functionality it delivered.

Will BI survive? Yes, but we may not recognize it. The need to analyze and use data that are produced in other systems will never go away, but BI will be wrapped in new technologies that provide a more complete set of tools. Instead of managing from scarcity of computing resources, BI will be part of a “decision management” continuum — the amalgam of predictive modeling, machine learning, natural language processing, business rules, traditional BI and visualization and collaboration capabilities.

Pieces of this “new” BI are already here, but within two to three years, it will be in full deployment.


[i] These numbers are representative estimates only and are based on our discussions with vendors and other published information. It is difficult to be precise with BI because the market leaders offer a wide variety of software and services, some for licensing and some embedded in non-BI products. In addition, BI itself has a number of fuzzy definitions. For the purposes of this discussion, we use a definition of query and reporting tools, including online analytical processing (OLAP), but exclusive of data warehousing, extract/transform/load (ETL), analytic databases and advanced analytics/statistics packages.

 Copyright 2012 Neil Raden and Hired Brains Inc


New World Order: Hadoop and Relational Databases

By Neil Raden Hired Brains Research  nraden@hiredbrains.com

Hadoop “data warehouses” do not resemble the data warehouse/analytics that are common in organizations today. They exist in businesses like Google and Amazon for web log parsing, indexing, and other batch data processing, as well as for storing enormous amounts of unfiltered data. Petabyte-size data warehouses in Hadoop are not data warehouses as we know them; they are a collection of files on a distributed file system designed for parallel processing. To call these file systems “a data warehouse” is misleading because a data warehouse exists to serve a broad swath of uses and people, particularly in business intelligence, which is both interactive and iterative.

MapReduce is a programming paradigm with a single data flow type that takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which makes them quite different from the operations of a relational database. To put it in layman’s terms, there are things that Hadoop is exceptionally well designed for that relational databases would struggle to do. Conversely, a relational database data warehouse performs a multitude of useful functions that Hadoop does not yet possess.

Hadoop is described as a solution to a myriad of applications in web log analysis, visitor behavior, image processing, search indexes, analyzing and indexing textual content, research in natural language processing and machine learning, scientific applications in physics, biology and genomics, and all forms of data mining. While it is demonstrable that Hadoop has been applied to all of these domains and more, it is important to distinguish between supporting these applications and actually performing them. Hadoop comes out of the box with no facilities at all to do most of this analysis. Instead, it requires the application of libraries available either through the open source community at forge.com or from the commercial distributions of Hadoop, or custom development by scarce programmers. In no case can these be considered a seamless bundle of software that is easy to deploy in the enterprise. A more accurate description is that Hadoop facilitates these applications by grinding through data sources that were previously too expensive to mine. In many cases, the end result of a MapReduce job is the creation of a new data set that is either loaded into a data warehouse or used directly by programs such as SAS or Tableau.

The MapReduce architecture provides automatic parallelization and distribution, fault recovery, I/O scheduling, monitoring, and status updates. It is both a programming model and a framework for massively parallel processing of large datasets in batch across many low-end nodes. Its ability to spread very large jobs across a cluster of ordinary servers is perhaps its best feature, and certainly its most distinctive. In addition, it has excellent retry/failure semantics. MapReduce at the programming level is simple and easy to use. Programmers code only Map() and Reduce() functions and are not involved with how the job is distributed. There is no data model, and there is no schema. The subject of a MapReduce job can be any irregular data. Because the assumption is that MapReduce clusters are composed of commodity hardware, and there are so many nodes, it is normal for faults to occur during a job, and Hadoop handles a few faults automatically, shifting the work to other resources.

But there are some drawbacks. Because MapReduce is a single fixed data flow and lacks a schema, indexes and a high-level language, one could consider it a hammer, not a precision machine tool. It requires data parsing and full scans in its operation; it sacrifices disk I/O to avoid schemas, indexes, and optimizers; intermediate results are materialized on local disks. Runtime scheduling is based on speculative execution, considerably less sophisticated than today’s relational analytical platforms. Even though Hadoop is evolving, and the community is adding capabilities rapidly, it lacks most of the security, resource management, concurrency, reliability and interactive capabilities of a data warehouse. Hadoop’s most basic components – the Hadoop Distributed File System (HDFS) and MapReduce framework – are purpose-built for understanding and processing multi-structured data. The file system is crude in comparison to a mature relational database system, and given the universal use of SQL, the lack of it is a limiting factor. However, Hadoop’s capabilities, which have just begun to be appreciated, override these limitations, and tremendous energy is apparent in the community that continues to enhance and expand Hadoop.
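For readers who have never seen the programming model, the following is a single-process Python sketch of the Map()/Reduce() split described above. It is not the Hadoop API and does nothing in parallel; `run_job`, `map_fn`, `reduce_fn` and the sample log lines are hypothetical names and data, chosen only to show that the programmer writes two small functions while the framework handles grouping and distribution.

```python
from collections import defaultdict


def map_fn(line):
    """Map(): parse one raw log line and emit (status_code, 1).

    There is no schema; the mapper decides how to interpret each record
    and silently skips lines it cannot parse.
    """
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_fn(key, values):
    """Reduce(): combine all values emitted for one key."""
    return key, sum(values)


def run_job(records, mapper, reducer):
    """Stand-in for the framework: map every record, group by key, reduce.

    A real cluster would spread the map and reduce calls across many nodes
    and retry failed tasks; this loop only mimics the data flow.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


LOG = ["/index.html 200", "/missing 404", "/about.html 200", "malformed"]
counts = run_job(LOG, map_fn, reduce_fn)
print(counts)
```

Note that every record is read and parsed on every run. That is the full-scan behavior mentioned above: nothing here plays the role of an index or an optimizer.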

Hadoop MapReduce with the HDFS is not an integrated data management system. In fact, though it processes data across multiple nodes in parallel, it is not a complete massively parallel processing (MPP) system. It lacks almost every characteristic of an MPP system, with the exception of scalability and reliability. Hadoop stores multiple copies of the data it is processing, and a failed node’s work can fail over to another node with the same data, though there is also a single point of failure at the HDFS Name Node, which the Hadoop community is looking to address in the long term. (Today, NetApp provides a hardware-centric fail-over solution for the Name Node.) It lacks security, load balancing and an optimizer. Data warehouse operators today will find Hadoop to be primitive and brittle to set up and operate, and users will find its performance lacking. In fact, its interactive features are limited to a pseudo-relational database, Hive, whose performance would be unacceptable to those accustomed to today’s data warehouse standards. In fairness, MapReduce was never conceived as an interactive knowledge worker tool, and the Hadoop community is making progress, but HDFS, which is the core data management feature of Hadoop, is simply not architected to provide the services that relational databases do today. And those relational database platforms for analytics are innovating just as rapidly with:

• Hybrid row and columnar orientation.

• Temporal and spatial data types.

• Dynamic workload management.

• Large memory and solid-state drives.

• Hot/warm/cold storage.

• Almost limitless scalability.

The ability to provide almost endless scalability and parallelism for batch jobs is a unique distinction for Hadoop. The only platforms that were previously able to provide this sort of massive parallelism were relational databases, and they are not limited to batch operation. So what happens next? My guess is that Hadoop survives and flourishes as the first responder to incoming data, making sense of it and handing it off to other processes, including data warehouses, in whatever form they take. However, unless petabytes of historical data are needed for interactive analysis, Hadoop will be the favored location for storing history. The Hadoop community, along with its imitators and competitors, will play an important role in analytics, but not the only role.

Posted in Big Data, Decision Management, Research

Decision Management on Steroids: Will Big Data Tools Trump Rules?


Can the ability to extract meaning and sentiment from previously unconventional data sources reorient the role of business rules?

In a typical customer application, scoring models are created by finding patterns and relationships from attributes using various statistical techniques and the customer records are scored for propensity or eligibility. Rules then apply policy – what to do with the scored records.
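The division of labor between scoring and rules is easy to show in miniature. The Python sketch below assumes a logistic scoring model whose weights were already fitted elsewhere; the attribute names, the weights, the 0.6 threshold and the opt-out rule are all invented for illustration and do not come from any real application.

```python
import math

# Hypothetical coefficients from a previously fitted propensity model.
WEIGHTS = {"visits_last_30d": 0.08, "has_open_ticket": -1.2}
INTERCEPT = -1.0


def propensity(record):
    """Score a customer record: logistic function of a weighted attribute sum."""
    z = INTERCEPT + sum(weight * record[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(record, score):
    """Rules apply policy to the scored record; the first matching rule wins."""
    if record["opted_out"]:
        return "do not contact"  # policy overrides any score
    if score >= 0.6:
        return "send offer"
    return "no action"


customers = [
    {"id": 1, "visits_last_30d": 30, "has_open_ticket": 0, "opted_out": False},
    {"id": 2, "visits_last_30d": 30, "has_open_ticket": 0, "opted_out": True},
    {"id": 3, "visits_last_30d": 5, "has_open_ticket": 1, "opted_out": False},
]
decisions = {c["id"]: decide(c, propensity(c)) for c in customers}
print(decisions)
```

Customers 1 and 2 receive the same score, yet only one gets an offer. The model estimates propensity; the rule encodes what the business is willing or permitted to do about it.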

But the promise of Big Data is to deliver insight not possible with tools of even five years ago. Does the newer technology, which can to some extent detect sentiment and propensity, examine relationships among hundreds of millions of IDs and construct path analysis in real time, eliminate the need for rules? In other words, does the data speak for itself?

On the other hand, can quantitative methods really implement policy, or are we just in the early stages of the hype cycle? Is attended or unattended quantitative analysis of Big Data a sufficient model for implementing policy?

New information usually comes from unexpected places. Big leaps in understanding arise from unanticipated discoveries—but “unanticipated” does not imply a sloppy or accidental process. On the contrary, usable discoveries have to be verifiable, but the desire for knowledge requires a drive for innovation and the exploration of new sources of information that can alter our perceptions and outlooks. Unraveling the content of “big data” that lacks obvious structure begs for some new approaches. Big data is positioned to provide that insight.

But data doesn’t speak for itself. At some point, there will be expert failure: Solutions require data, but may degrade with too much. The largest annoyance is the overblown concept of the data scientist. Data scientists, in the traditional sense, are academic researchers. In the Big Data industry they apply existing algorithms and techniques to data from traditional and new sources. Unfortunately, they usually report to people who have no idea what they are talking about.

In subsequent research I will describe the changes in “predictive” modeling brought about by Big Data and draw some conclusions about how it affects the construction, delivery and uses of decision management.

 

Posted in Big Data, Decision Management, Research

NoSQL: What’s the Buzz About Graph Databases?

I attended the NoSQLNow conference in San Jose and had the opportunity to speak one-on-one with a number of principals of NoSQL database concerns, including Emil Eifrém of Neo Technology. For those of you who aren’t familiar with the concept, graph databases are based on an arrangement of nodes, edges and properties, with the edges carrying the relationships between nodes, not rows and columns with primary and foreign key relationships. In practice this allows them to traverse graphs of information more efficiently than reading pages of data and finding the rows that match the query.

Interestingly, Graph Theory (Euler) predates Set Theory (Cantor/Dedekind), on which the relational model is based, by well over a century. Of historical interest, the development of the relational database at IBM was conceived as a method to get data out of databases, not get data in. This turned out, in the early ’70s, to be a problem for IBM, so they redirected Ted Codd’s efforts to making relational databases fast transaction processors. Enter the concept of “normal form,” a horribly misleading term that has derailed a zillion projects, with data modelers who have a thin understanding of the concept insisting on “normal” purity no matter the cost. The rest is history. The whole DSS/BI/Analytics movement grew out of the fact that the relational databases were poor performers at non-transaction processing.

According to the NoSQL movement, and I’m not entirely convinced of this but I’m listening, the rigidity of a physical schema needed in relational databases is their undoing in an era of agility, speed and volume.  Here is a quote from Wikipedia:

Compared with relational databases, graph databases are often faster for associative data sets, and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets as they do not typically require expensive join operations. As they depend less on a rigid schema, they are more suitable to manage ad-hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements.

The key characteristic of graph databases is this notion of index-free adjacency, meaning that each node knows the location of its adjacent nodes, so an index is unnecessary. Obviously, a semantic interpretation of this is that the graph is a representation of relationship. Paradoxically, there are no relationships in a “relational” database; they are applied at run time from the query.
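Index-free adjacency is easier to see than to describe. The Python sketch below contrasts the two styles on a four-person toy graph: in the first, each node holds direct references to its neighbors; in the second, relationships live in a separate edge list that has to be searched at query time. The `Node` class, helper functions and names are hypothetical, and a real graph database stores record pointers on disk rather than Python object references.

```python
class Node:
    """A graph node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # no index: the node itself knows where its neighbors are


def connect(a, b):
    """Create an undirected relationship by storing references on both nodes."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def friends_of_friends(node):
    """Two-hop traversal that only follows references; cost depends on local degree."""
    excluded = {node.name} | {n.name for n in node.neighbors}
    return {m.name for n in node.neighbors for m in n.neighbors
            if m.name not in excluded}


def neighbors_by_scan(name, edges):
    """Relational-style lookup: search the whole edge table for matching rows.

    A real RDBMS would use an index rather than a scan, but the relationship
    is still resolved at query time rather than stored on the row.
    """
    return {b if a == name else a for a, b in edges if name in (a, b)}


EDGES = [("ann", "bob"), ("bob", "cy"), ("cy", "dee")]
nodes = {name: Node(name) for pair in EDGES for name in pair}
for a, b in EDGES:
    connect(nodes[a], nodes[b])

print(friends_of_friends(nodes["ann"]))
print(neighbors_by_scan("bob", EDGES))
```

The traversal touches only the nodes it walks through, however large the rest of the graph is, while the scan's cost grows with the size of the edge table. That is the trade the Wikipedia passage above is describing.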

Emil seems to think that graph databases are superior to RDB in every way and will eventually supplant them. The concept that RDB are based on sound and proven mathematical principles is interesting, but relational theory is only about 40 years old. Graph theory goes back to the 18th century!

This all sounds good, but there are only a billion applications out there that rely on things NOT changing and for which the relational model is well-suited. As William McKnight said in his keynote, look to NoSQL as additive not replacement technology.

For those old enough to remember, lots of database systems in the ’80s tried to jump on the bandwagon by getting certified as relational, and we’re already seeing this with graph databases. To name just a few, Twitter’s FlockDB is only a thin graph database on top of MySQL and therefore lacks index-free adjacency. Microsoft’s Trinity does not store graphs natively.

Most of the NoSQL vendors are pushing the notion that their products are so much less complex than the current RDBs. This is undoubtedly true, but they probably also lack much of the functionality that has been built on the RDB model over the decades. In fact, RDBs were pretty simple in the beginning too.

To sum it up, most of the NoSQL products I’ve seen are clearly aimed at high-speed, low-complexity transaction or streaming processing, usually with unconventional data. They are not analytical tools. But they could play a very useful, even indispensable role in analytics: getting meaning into the process.

There has been a schism between semantic technology and graph databases, probably because the former still can’t figure out how to market their technologies while simultaneously trying to prove how smart they are. Their message is muddled and their most visible promoters are not, shall we say, enterprise ready. Oddly, the notion of a triple is fundamental to graphs, but graph database vendors are steering clear of the whole ontology/RDF/OWL thing and finding their customers in other pursuits. Good move.

Posted in Big Data