By Neil Raden
About fifteen years ago, Microstrategy cofounder Sanju Bansal told me, “SQL is the best hope for leveraging the latent value from databases.” Fifteen years later, it’s extraordinary how correct Bansal was. Microstrategy is still a robust, free-standing Business Intelligence company while most of the proprietary multidimensional databases have disappeared. But what about the next fifteen years?
At least for business analytics, SQL is under attack. So much so, that there is an entire emergent market segment called NoSQL. If SQL itself is under siege, what about the myriad technologies that in one way or another are part of the SQL ecosystem like Informatica? Are they obsolete? Will we need to throw away the baby with the bathwater?
This whole dustup has been brewing for a decade or more. Since we started using computers in business 60+ years ago, the big machines were managed by a separate group of people with specialized skills, now generically referred to as IT. Though separate and decidedly non-businesslike, IT eventually became bureaucratic, fixed in its mission to control everything data. Even SQL was adopted very slowly, but it is solidly the tool of choice for most applications.
About ten years ago, though, a renegade group of people I called “the pony-tail-haired guys” (PTH for short) appeared on the scene with their externally-focused web sites and gradually, development tools, methods and monitoring software. At first, IT paid no attention to them because they didn’t interfere with the inwardly-focused enterprise computing environment, but as the perceived value of “e-business” developed, a great deal of friction and turf war fighting erupted. Computing bifurcated inside the firewall.
The PTH preferred web-oriented tools, open source software and search. That’s why Big Data/Hadoop and NoSQL are so divergent from enterprise computing. Different people, different applications, different brains.
But when it comes to Business Analytics, the lines are not so clear. The PTH found, just as enterprise people did (reluctantly), analytics is key to everything else. But while enterprise apps measure sales and revenue, the PTH guys are looking at really strange things like sentiment analysis. Today, I can download a Fortune 500 general ledger to my watch, but things like sentiment analysis look at 100′s of 1000′s times more data. Loading into a relational database and analyzing the data with a language designed for set operations and transactions just doesn’t work. So is there a justification for something other than SQL for this? Of course there is.
But the PTH guys, now that they’ve grown up a little, are starting to show the same sclerotic tendencies as their IT colleagues, assuming that their tools and methodologies are the ONLY tool for analytics and SQL needs to be put in the dustbin. That’s just silly. Existing data warehouse, data integration and data analysis and presentation tools are well-suited to lots of tasks that aren’t going away any time soon, though methodologies and implementations are in dire need for renovation, something I call a Surround Strategy as opposed to the ridiculous outdated idea of the Single Version of the Truth.
Here is where Informatica enters the picture. With all the breathless enthusiasm over Big Data, one thing is often lost in the rush: no fundamental laws of physics concerning data integration have been altered. This places Informatica squarely in the middle of every trend grabbing headlines today: big data, social analytics, virtualization and cloud.
Any type of reporting or analysis, whether through traditional ETL and data warehousing, Hadoop, Complex Event Processing or even Master Data Management deals with “used data,” meaning, data created through some original process. All used data has one attribute in common – it doesn’t like to play with other used data. Data extracted from a primary source typically has hidden semantics and rules in the application logic so that, on it’s own, it often doesn’t make sense.
This has been the most difficult part of assembling useful data for analysis historically, and with the entrance of mountains of non-enterprise data, the problem has only grown larger.
Luckily for the fortunes of Informatica, this is a big opportunity and they have stepped up.
For in depth descriptions of the following Big Data, cloud, Hadoop SaaS innovations coming from Informatica, refer to their materials at http://www.informatica.com. Briefly, they include:
Hparser, at facility to visually prepare very large datasets for processing in Hadoop, saving developer/analyst a substantial amount of time from what is mostly hand coding.
Complex Event Processing (CEP) which isn’t new for Informatica, but given the expanded complexity of data integration in the Big Data era, they have incorporated CEP into their own platform to detect and inform of events that can affect the performance, accuracy and management of the data from teratogenic processes.
Informatica pioneered the GUI diagram for building integration mappings, but 9.5 implements an integration optimizer which detects the optimal mapping rather than processing the diagram literally.
With each new release of a complex software product is the time-consuming and nerve-wracking routine of upgrades. 9,5 now provides automated regression testing to dramatically reduce the time and pain of upgrades.
In Information Lifecycle Management, 9.5 provides “Intelligent Partitions” to distribute data across devices hot, warm and cold storage.
Add to this, facilities for virtualization, replication and a slew of offerings for cloud-based applications – there is an inescapable logic for Informatica exploiting the new opportunities of Big Data.