Forked SQL: Informatica Gets It

By Neil Raden

About fifteen years ago, MicroStrategy cofounder Sanju Bansal told me, “SQL is the best hope for leveraging the latent value from databases.” Fifteen years later, it’s extraordinary how right Bansal was. MicroStrategy is still a robust, free-standing Business Intelligence company, while most of the proprietary multidimensional databases have disappeared. But what about the next fifteen years?

At least for business analytics, SQL is under attack. So much so that there is an entire emergent market segment called NoSQL. If SQL itself is under siege, what about the myriad technologies that in one way or another are part of the SQL ecosystem, like Informatica? Are they obsolete? Will we end up throwing the baby out with the bathwater?

This whole dustup has been brewing for a decade or more. Since businesses started using computers 60-plus years ago, the big machines have been managed by a separate group of people with specialized skills, now generically referred to as IT. Though separate and decidedly non-businesslike, IT eventually became bureaucratic, fixed in its mission to control everything to do with data. Even SQL was adopted very slowly, but it is now solidly the tool of choice for most applications.

About ten years ago, though, a renegade group of people I called “the pony-tail-haired guys” (PTH for short) appeared on the scene with their externally-focused web sites and, gradually, their own development tools, methods and monitoring software. At first, IT paid no attention to them because they didn’t interfere with the inwardly-focused enterprise computing environment, but as the perceived value of “e-business” grew, a great deal of friction erupted and turf wars broke out. Computing bifurcated inside the firewall.

The PTH preferred web-oriented tools, open source software and search. That’s why Big Data/Hadoop and NoSQL are so divergent from enterprise computing. Different people, different applications, different brains.

But when it comes to Business Analytics, the lines are not so clear. The PTH found, just as enterprise people did (reluctantly), that analytics is key to everything else. But while enterprise apps measure sales and revenue, the PTH guys are looking at really strange things like sentiment analysis. Today, I can download a Fortune 500 general ledger to my watch, but things like sentiment analysis look at hundreds of thousands of times more data. Loading that into a relational database and analyzing it with a language designed for set operations and transactions just doesn’t work. So is there a justification for something other than SQL for this? Of course there is.

But the PTH guys, now that they’ve grown up a little, are starting to show the same sclerotic tendencies as their IT colleagues, assuming that their tools and methodologies are the ONLY way to do analytics and that SQL belongs in the dustbin. That’s just silly. Existing data warehouse, data integration, and data analysis and presentation tools are well-suited to lots of tasks that aren’t going away any time soon, though methodologies and implementations are in dire need of renovation, something I call a Surround Strategy as opposed to the ridiculous, outdated idea of the Single Version of the Truth.

Here is where Informatica enters the picture. With all the breathless enthusiasm over Big Data, one thing is often lost in the rush: no fundamental laws of physics concerning data integration have been altered. This places Informatica squarely in the middle of every trend grabbing headlines today: big data, social analytics, virtualization and cloud.

Any type of reporting or analysis, whether through traditional ETL and data warehousing, Hadoop, Complex Event Processing or even Master Data Management, deals with “used data,” meaning data created through some original process. All used data has one attribute in common: it doesn’t like to play with other used data. Data extracted from a primary source typically has semantics and rules hidden in the application logic, so that, on its own, it often doesn’t make sense.
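To make the point concrete, here is a minimal sketch of what “hidden semantics” looks like in practice. Everything in it — the record layout, the status codes, the cents convention — is invented for illustration, not taken from any particular system:

```python
# Hypothetical raw record extracted from a source order system.
# The codes mean nothing outside the application that wrote them.
raw_order = {"status": "03", "amt": "1999", "cust": "C-0042"}

# The hidden semantics: rules that live in the source application's
# code and never travel with the extracted data.
STATUS_CODES = {"01": "open", "02": "shipped", "03": "cancelled"}

def decode_order(rec):
    """Re-apply the source application's rules so the record
    makes sense on its own, outside the system that created it."""
    return {
        "status": STATUS_CODES[rec["status"]],
        "amount_usd": int(rec["amt"]) / 100,   # amounts stored in cents
        "customer_id": rec["cust"],
    }

print(decode_order(raw_order))
```

Multiply this by hundreds of sources, each with its own undocumented conventions, and you have the integration problem that no amount of raw horsepower makes go away.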

This has been the most difficult part of assembling useful data for analysis historically, and with the entrance of mountains of non-enterprise data, the problem has only grown larger.

Luckily for the fortunes of Informatica, this is a big opportunity and they have stepped up.

For in-depth descriptions of the Big Data, cloud, Hadoop and SaaS innovations coming from Informatica, refer to their materials at http://www.informatica.com. Briefly, they include:

HParser, a facility for visually preparing very large datasets for processing in Hadoop, saving the developer/analyst a substantial amount of time over what is today mostly hand coding.
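For a sense of what that hand coding looks like, here is a sketch of a Hadoop-streaming-style mapper that flattens one record format by hand. The pipe-delimited layout and field names are invented; the point is that every source format needs plumbing like this, which a visual parser generates instead:

```python
import csv
import sys

def map_line(line):
    """Hand-coded parsing of one pipe-delimited log line, emitting
    (event, 1) pairs for a downstream count. This per-format plumbing
    is exactly what a visual parsing tool aims to eliminate."""
    fields = next(csv.reader([line], delimiter="|"))
    if len(fields) < 3:
        return None                       # skip malformed records
    user_id, event, timestamp = fields[:3]
    return f"{event}\t1"

if __name__ == "__main__":
    # Standard Hadoop streaming contract: records on stdin, pairs on stdout.
    for line in sys.stdin:
        out = map_line(line.strip())
        if out:
            print(out)
```

One such script per format, per change to a format, adds up quickly; that is the time HParser claims back.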

Complex Event Processing (CEP), which isn’t new for Informatica, but given the expanded complexity of data integration in the Big Data era, they have incorporated CEP into their own platform to detect and report events that can affect the performance, accuracy and management of data flowing from heterogeneous processes.
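At its core, CEP is pattern detection over a stream of events. A toy sketch of the idea — the window size, threshold, and the notion of watching a job’s load metric are all invented for illustration, not Informatica’s actual rules:

```python
from collections import deque

def detect_spikes(values, window=3, threshold=100):
    """Flag each position where the sum of the last `window` readings
    exceeds `threshold` -- a stand-in for a CEP rule that alerts when
    an integration job's error rate or latency surges."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(values):
        recent.append(value)
        if len(recent) == window and sum(recent) > threshold:
            alerts.append(i)
    return alerts

# A quiet stream, then a surge at readings 3-4.
print(detect_spikes([10, 20, 30, 90, 5, 5]))
```

Real CEP engines evaluate far richer patterns (sequences, absences, correlations across streams), but the shape is the same: a standing rule watching events as they arrive, rather than a query run after the fact.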

Informatica pioneered the GUI diagram for building integration mappings, but 9.5 adds an integration optimizer that determines an optimal execution of the mapping rather than processing the diagram literally.
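The idea is familiar from database query planning: the diagram as drawn is not necessarily the cheapest order of operations. A toy sketch with invented data, showing the classic case of pushing a filter ahead of a join so fewer rows flow through the pipeline:

```python
orders = [("C1", 500), ("C2", 75), ("C1", 20), ("C3", 900)]
customers = {"C1": "Acme", "C2": "Globex", "C3": "Initech"}

def literal_plan():
    """Execute the diagram as drawn: join every row first, filter last."""
    joined = [(customers[c], amt) for c, amt in orders]   # join all 4 rows
    return [row for row in joined if row[1] >= 100]       # then filter

def optimized_plan():
    """Same result, but the filter is pushed before the join,
    so only the qualifying rows are ever joined."""
    big = [(c, amt) for c, amt in orders if amt >= 100]   # filter first: 2 rows
    return [(customers[c], amt) for c, amt in big]        # join only those

assert literal_plan() == optimized_plan()
print(optimized_plan())
```

On four rows the difference is invisible; on four billion, doing the join after the filter is the difference between a job that finishes and one that doesn’t.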

With each new release of a complex software product comes the time-consuming and nerve-wracking routine of upgrades. 9.5 now provides automated regression testing to dramatically reduce the time and pain of upgrades.

In Information Lifecycle Management, 9.5 provides “Intelligent Partitions” to distribute data across hot, warm and cold storage devices.
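The tiering decision itself is simple policy; the value is in automating it across thousands of partitions. A sketch of one such policy — the thresholds and the recency-based rule are illustrative assumptions, not Informatica’s actual algorithm:

```python
from datetime import date

def storage_tier(last_access, today=date(2012, 6, 1),
                 warm_after=30, cold_after=365):
    """Classify a partition as hot, warm or cold storage by the number
    of days since it was last queried (thresholds are illustrative)."""
    age_days = (today - last_access).days
    if age_days <= warm_after:
        return "hot"
    if age_days <= cold_after:
        return "warm"
    return "cold"

print(storage_tier(date(2012, 5, 25)))   # queried a week ago
print(storage_tier(date(2012, 1, 15)))   # a few months back
print(storage_tier(date(2010, 3, 1)))    # years untouched
```

A real implementation would weigh access frequency and business rules as well as recency, but the payoff is the same: the expensive fast storage holds only the data that earns its keep.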

Add to this facilities for virtualization and replication and a slew of offerings for cloud-based applications, and there is an inescapable logic to Informatica exploiting the new opportunities of Big Data.


Responses to Forked SQL: Informatica Gets It

  1. Claire McFarlen says:

    Neil,
    Your article started out well and I was expecting to find out about some groundbreaking work for analytics that would allow business folks to leverage their knowledge of the business into new tools that really are business-user friendly. You are correct that too often IT has a closed fist when it comes to data (there’s lots of blame on the business side for this too, but that’s another matter). But as I read further, your article appears to drift off into technical jargon about a few technical Informatica improvements that are geared towards IT but have little to offer the business user. Not surprising, considering that an upgrade is only an upgrade and the only way to really address the user issue would be to start over with a completely new framework.

    I worked with many groups of what you call the PTH crowd and, for many of them, their love of open source (code for free tools, easy to use, fast, possibly untested and impossible to integrate with existing technologies) definitely insulated them from other techs – and trust me when I say that from the outset, they believed their tools and approach were the only way to go. This did not evolve over time; it was built into their methodologies and mindset from the beginning. The real problem, though, was that the speed with which they could get objects up and running made them appear to understand far more about the business than they actually did. By and large their development tools had all kinds of limitations and worked off predefined object parameters that quickly became very difficult to change to meet the needs of the business. Like most product upgrades, they could be enhanced by adding more bells and whistles, but if the core product was designed to meet a particular need then that would continue to define how the product could or could not be modified over time.

    Many techs originally also thought the PTH guys would deliver some amazing results with intuitive query tools that would blow our minds and allow users to get at the data in any way they wanted, but it soon became obvious that their approach was no more likely to hit that target than was canonical IT, because at the heart of it remains some query language that has to be technically robust enough to access highly technical databases. I think Sanju’s statement about SQL being the best hope was not a compliment, it was a lament. SQL was never intended for users and has always been a clunky language with high overhead, and until something completely new comes along users will continue to be at the mercy of the IT shop, no matter what technically ideological group rules the roost.

    • nraden says:

      Claire, I think your comment diverged a bit too. Anyway, the PTH guys’ open source tools were generally not easy to use. Witness Hadoop. And as for Sanju Bansal’s comment about SQL, he did not mean that SQL coding was meant for the masses. MicroStrategy generates SQL from a metamodel.


