Archive for the 'General BI' Category

DAMA-PS Session: Forget Federated

Thursday, March 2nd, 2006

I listened to Stephen A. Brobst, CTO at NCR Teradata, speak about “Best Practices in Meta Data Management and Enterprise Data Warehouse Deployment” this morning. I was hoping to grok some details about new metadata management techniques, but the presentation was much more focused on the “deployment/architecture” side. That being said, I think it was MUCH more useful for the audience as a whole to cover deployment in that much detail; I personally didn’t find it all that groundbreaking.

Summary: Build an EDW based on a 20,000 ft blueprint, integrate your data into a single location (say, perhaps, a massively scalable single system image shared nothing database) using a relational schema, build a subject area at a time, and build star schemas only when performance is an issue. Clearly the architecture is advocating the Oracle/Teradata/etc view of the world that says ONE GIGUNDOUS RELATIONAL warehouse with various semantic (dimensional) and performance (materialized denormalized structures) views into that world. I’m not being sarcastic; I model most of my customer implementations off the CIF and think it’s a good approach from a TCO perspective.
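To make the “one relational warehouse, dimensional views on top” idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, the view name, and the sample rows are all made up for illustration; nothing here comes from the presentation itself. The point is only that the data lives once in normalized tables, and the dimensional shape is a view over it.

```python
import sqlite3

# Hypothetical normalized warehouse tables; every name below is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country  (country_id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                           country_id INTEGER REFERENCES country(country_id));
    CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id),
                           amount REAL);

    INSERT INTO country  VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Maple Co', 2), (12, 'Zenith', 1);
    INSERT INTO sale     VALUES (100, 10, 250.0), (101, 11, 100.0), (102, 12, 50.0);

    -- Semantic (dimensional) view: a flattened customer dimension
    -- defined over the normalized tables, with no data copied.
    CREATE VIEW dim_customer AS
        SELECT c.customer_id, c.customer_name, k.country_name
        FROM customer c JOIN country k ON k.country_id = c.country_id;
""")


def sales_by_country(connection):
    """Total sales per country, queried through the dimensional view."""
    rows = connection.execute("""
        SELECT d.country_name, SUM(s.amount)
        FROM sale s JOIN dim_customer d ON d.customer_id = s.customer_id
        GROUP BY d.country_name
        ORDER BY d.country_name
    """).fetchall()
    return dict(rows)


result = sales_by_country(conn)
```

If the view ever became too slow, that is the moment the summary above says to consider a physical star schema instead.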

The key takeaway remains: if the data isn’t integrated, it won’t be useful. An EDW promotes this, but it’s not the only way. You start to realize more value as the richness of relationships and entities increases in the integrated view.

One of the things I have a beef with is the idea that the “Semantic Layer” (metadata for dimensional modeling, data freshness, etc) can be used instead of ETL and building physical star schemas. I don’t dispute that the reporting tools, database, and platform can adequately do this; my question is, how is it managed? For example, if I define my dimension LOGICALLY, and let the REPORTING tool build a cross tab based on that dimension, that should work. BUT, how is that managed as part of the information lifecycle? I’ve seen VERY few tools that can tell you: on day X I was generating this dimension this way, and provided it to these 20 reports using this particular method. ETL tools building PHYSICAL structures are usually managed (think source code, or some sort of build and release system). In other words, if you see a report based on the “CUSTOMER COUNTRY” and a date, one can say PRECISELY how that was generated because there’s a managed solution (ETL, Database Structures) in a repository somewhere that tells you what your logical and physical ETL were at that point in time. Good luck doing that when someone is able to change this on the fly in a clever “report writing” tool.
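As a rough sketch of what “managed” means here, the snippet below keeps a dated log of dimension definitions and answers the question “how was this dimension generated on day X, and which reports used it?” The class names, fields, and sample definitions are hypothetical; a real shop would more likely get this from source control or an ETL repository than from a hand-rolled class.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DimensionVersion:
    name: str                         # e.g. "CUSTOMER COUNTRY"
    effective_from: date              # first day this definition was in use
    definition: str                   # how the dimension was derived
    used_by_reports: Tuple[str, ...]  # reports fed by this version


class DefinitionLog:
    """Append-only history of dimension definitions (illustrative only)."""

    def __init__(self) -> None:
        self._versions: List[DimensionVersion] = []

    def record(self, version: DimensionVersion) -> None:
        self._versions.append(version)
        self._versions.sort(key=lambda v: v.effective_from)

    def as_of(self, name: str, day: date) -> Optional[DimensionVersion]:
        """Return the definition of `name` that was in effect on `day`."""
        matches = [v for v in self._versions
                   if v.name == name and v.effective_from <= day]
        return matches[-1] if matches else None


log = DefinitionLog()
log.record(DimensionVersion("CUSTOMER COUNTRY", date(2006, 1, 1),
                            "country taken from billing address",
                            ("Sales by Country",)))
log.record(DimensionVersion("CUSTOMER COUNTRY", date(2006, 3, 1),
                            "country taken from shipping address",
                            ("Sales by Country", "Returns by Country")))

found = log.as_of("CUSTOMER COUNTRY", date(2006, 2, 15))
```

A report dated mid-February resolves to the billing-address definition, which is exactly the kind of answer an on-the-fly change in a report writing tool usually can’t give you.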

Sorry Discoverer gurus… I’ve never been a fan of “faking” real dimensional structures with clever SQL generation. Not because it doesn’t work or won’t perform, but because the management life cycle of reporting tools and configuration is ages behind the ETL tools. Not saying they’re great, but… you get the point.

Overall I enjoyed Stephen’s presentation. GREAT SPEAKER, actually! My favorite line from the day: “Data Marts are kind of like rabbits in the data center. They just start multiplying!” :)

DAMA-PS March 2nd

Wednesday, February 15th, 2006

I’ve been a member of DAMA-PS since I moved to Seattle in 2004. I wasn’t able to make last year’s DAMA day but I’m looking forward to DAMA day 2006 in Seattle. If you’re in the area, check it out:

Morning Keynote:
Stephen Brobst
Best Practices in Enterprise Meta Data Management
and Data Warehouse Deployments

This session provides a taxonomy of data warehouse
topologies and discussion of best practices for enterprise
data warehouse deployment. Characterization
of performance, total cost of ownership, and business
functionality will be used to describe tradeoffs
among various choices in topology and architecture
deployment. Implementation techniques using integrated,
federated, and data mart architectures will
be discussed as well as deployment of four distinct
classes of meta data (end user, design, technical and
semantic) which will be described in the context of
creating a single source of truth for enterprise decision
making across multiple lines of business- and
functionally-oriented organizational boundaries.

Afternoon Keynote:
David Loshin
Master Data Management and Data Standards:
Building From Consensus

The hot topic these days is master data management,
or being able to capture and manage reference
data as a shared corporate asset that
feeds into both production operational and analytical
applications. However, in the absence of
an agreement as to the semantics of the data
objects being “master-managed,” the perpetual
problems of misunderstanding the data will infiltrate
your applications. Developing a master
data management program provides an opportunity
for the business and technical teams to
discuss and agree to standards for the structure,
form, and most importantly, meanings of the
data elements to be managed. In this session,
we’ll discuss using a data standards approach
to successfully support a master data management
program.

Mark Madsen ETL evaluation slides

Wednesday, February 8th, 2006

Mark Madsen, who has an excellent blog on various DW/BI topics, has posted slides from his Portland DAMA presentation on selecting ETL vendors. The slides are EXTREMELY informative and I suggest that anyone considering some ETL work browse through them. Mark, thanks for such an excellent contribution, and I’m sorry I missed your presentation! I believe Mark works with TDWI on ETL tool topics, so consider signing up for one of his sessions.

My favorite from the slides, which had me laughing out loud, was the image used to describe the proclamation that EII can be a “virtual data warehouse” on page 12! Excellent!

Must have IE to evaluate SQL Server?

Monday, November 28th, 2005

Microsoft is spending millions upon millions to launch and promote their new SQL Server 2005 release. I’m guessing they want every developer and nerdy IT type to check it out. They want to get into the VLDB and HA corporate data centers, and claim some of those vi-using, “I can write x86 assembly if I want to,” Firefox-using developers and DBAs.

The irony?

10% of the web surfing population won’t be able to evaluate it because the SQL Server 2005 homepage doesn’t load with Firefox.
http://www.microsoft.com/sql/default.mspx

Are any other Firefox users able to load the page? Or is this another example of “Drink the MSFT Kool-Aid or be gone with you!”?

Voted NUMBER ONE!

Monday, November 21st, 2005

Working daily with people who are trying to measure and understand their world through the use of technology and BI methodologies, I often hear lots of “things” that are important to determine.

  • What are this year’s top 5 products and what is their annual sales growth for the last 5 years?
  • Which company division has the most profitable customers, and which division has the least profitable customers?
  • At what time of day, in a registered website visitor’s home time zone, are pages viewed on our website, split by category?

In other words, there are some very specific things people want to know and brag about both within the company and externally to investors, analysts, and the media.

This predisposes me to question numbers I hear anywhere. What’s the qualification, what little keyword allows this company to say they are the top in their category? Company XYZ is the Number 1 in Sales (in Asia Pacific small to midsized healthcare providers not owned by government and groups exceeding 1 billion market cap for fiscal year 2003). We’ve all seen it…
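Here is a toy sketch of how the fine print changes the winner. The companies, regions, segments, and amounts are invented, and the `leader` helper is just an illustration of filtering before ranking, not anybody’s actual reporting logic.

```python
# Invented sales figures; nothing here is real data.
sales = [
    {"company": "XYZ", "region": "Asia Pacific", "segment": "healthcare", "amount": 40},
    {"company": "XYZ", "region": "Europe",       "segment": "retail",     "amount": 10},
    {"company": "ABC", "region": "Asia Pacific", "segment": "healthcare", "amount": 25},
    {"company": "ABC", "region": "Europe",       "segment": "retail",     "amount": 90},
]


def leader(rows, **qualifiers):
    """Return the company with the highest total sales among the rows
    that match every qualifier (e.g. region="Europe")."""
    totals = {}
    for row in rows:
        if all(row[key] == value for key, value in qualifiers.items()):
            totals[row["company"]] = totals.get(row["company"], 0) + row["amount"]
    return max(totals, key=totals.get) if totals else None


overall = leader(sales)
qualified = leader(sales, region="Asia Pacific", segment="healthcare")
```

With no qualifiers ABC leads on total sales, but once you narrow it to Asia Pacific healthcare, XYZ gets to print “Number 1” on the brochure.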

One of my online music stations had a refreshingly simple claim to fame today that made me laugh out loud:

“Total Country. Rated #1 amongst people who really like us!”

A refreshingly honest figure!

Ingres sails from Computer Associates

Monday, November 7th, 2005

I’ve just started playing with Ingres recently (last 12 months). It’s a powerful DBMS that was released under an Open Source license last year. From what I gather about the history of the database it has been kind of a “hot potato” being passed from university to company to company to company, etc.

Feature for feature, Ingres appears to be the most advanced Open Source database available. However, since it has been released under the CATOSL it has not resembled a community driven OSS project. There is still no public access to the source code repository, and as far as I know, there have not been source contributions from anyone outside of CA. The CATOSL is a “funny” OSI approved license that I think also hinders the uptake of Ingres.

However, all that could change, starting today.

A venture capital firm has purchased “Ingres” from CA and launched a company focusing entirely on the Open Source database. This company has an opportunity to capitalize on a starting point most OSS projects could only dream of (starting with a product that is deployed with mission critical applications at more than 5000 customer sites). That’s just where they start though… their future must include turning Ingres into a full scale Open Source project and community. This means public discussion forums, public source code control, welcoming third party contributors, peer to peer information sharing, user based support, etc. I think Ingres (company and project) would also be VERY well served to trade the off color CATOSL license for a commercial friendly OSI approved license.

Welcome Ingres, Inc. to the marketplace! It’s an interesting one with Oracle, Microsoft, and IBM all providing “free” versions of their DB now and passionate communities in the MySQL and Postgres projects.

As a die hard Oracle consultant I need much more information to draw conclusions about Ingres… I’ve been in touch with CA and Ingres, Inc. I hope to provide more information and a more detailed evaluation as time permits. Stay tuned for more!

Thoughts on “BI for the masses”

Monday, October 10th, 2005

Oracle has a star database. As Charles Phillips refers to the database, it’s the 747 of databases. The products that the Oracle Data Warehousing/Business Intelligence teams pump out are quite capable and are feature rich. There is little that I can NOT provide for my customers using this stack of powerful tools (Oracle DB, Oracle Warehouse Builder, Oracle Discoverer/Portal/BI Beans).

That being said, I’ve realized how inaccessible these tools are for “the masses”: the qualified, smart, analytical masses that need easy to use tools to help them collect, analyze, and report on their organization’s information. The tools are complicated, require rather extensive knowledge of “Oracle-isms,” etc. To date, there are very few BOOKs on any Business Intelligence specific Oracle product. Books reflect a large network of solution providers and consultants, i.e., other providers who have picked their preferred tool and are committing to learning, using, and teaching it to customers. Large communities of providers, training resources, and books reflect a support network and make uptake of a technology MUCH MUCH easier for customers. They don’t have to learn from the manual, which is VERY difficult… They can learn from the distilled knowledge of others, in a more participatory manner.

From an ideological perspective, I’m not exactly drawn to Microsoft products. However, they are having significant success in building this ecosystem around their BI/SQL Server offering. While there exists NO BOOK on Oracle Warehouse Builder and only ONE BOOK on Discoverer, here is a list of the scheduled books for Analysis Services 2005 release in BETA.

That’s right, there are 11 books being written about similar Microsoft products while their product is still in BETA! What does the Oracle BI/DW community think about this? I REALLY REALLY have to get comments going on my site… :(

Pentaho Milestone 2 release

Sunday, August 7th, 2005

Since I probably piqued some interest with this blog, I figured I should post an update…

The folks at Pentaho have released some actual software. I’m head deep in an OWB Paris project so I’ve had ZERO time to have a look. I’d love for anyone who’s had a look to email me and let me know their impressions.

From their release briefing:

Using this release, you will be able to experience the streamlined install process and interact with a number of components and samples.

  • Reporting
    how to run reports, burst different content to different users, and parameterize reports.

  • Business rules
    how to include and use business rules in the creation and delivery of content.

  • Email
    how to send the results of a business rule or report creation to an email address, and how to do email bursting.

  • Printing
    how to print a report to a selected printer, how to do batch printing, and how to print bursting (applying different report parameters to individual printers).

  • Workflow
    how to initiate a workflow and pass parameters to it.

  • Bursting
    how to deliver customized versions of a generic report to different email addresses or printers

  • Scheduler
    how to schedule the actions of the Pentaho BI Platform

  • Web Services
    how to access the actions of the Pentaho BI Platform using web services

  • Navigation
    how to organize and describe content to users using Java Server Pages or portlets  

Many of the visual features, such as wizards, that you may have heard discussed or seen demonstrated are not scheduled for delivery until the next milestone release. Please bear this in mind as you use the product.

Open Source BI - I like Pentaho

Friday, June 10th, 2005

Business Intelligence software, databases, and their supporting hardware are expensive. I mean really, really expensive (hundreds of thousands to millions of dollars). Many people working in the Business Intelligence/Data Warehousing fields have seen their “operational application” colleagues adopting open source solutions (Linux, JBoss, Eclipse, Apache, etc.) but have seen little attention paid to the software required to build and deliver Business Intelligence. That is beginning to change.

I’ve blogged about this before, specifically my experiences with downloading and testing Mondrian, an open source ROLAP server written in Java. It appears as if there is growing momentum and maturity among projects suitable for BI in the Open Source (OS) world. I’ve felt for some time that the open source community had not embraced BI in quite the same way they have other applications of technology. It is, in earnest, a technology stack to make bigger companies bigger and smart companies smarter. While these precepts aren’t in opposition to open source ideals, they aren’t what typically motivates communities of developers to band together to make software for free (i.e., change the world, provide a framework used by 10,000 websites, etc.).

The state of open source BI was relatively slim not too long ago. There were a variety of possible tool sets one could use for ETL (Clover, Enhydra Octopus), some initial OLAP components (Mondrian, JPivot), some portal frameworks for dashboards (JetSpeed, JBoss Portal), and some databases with maturity for DW situations with smaller volumes (MySQL, Postgres). Things have been heating up this past year, and we should review what’s going on in the Open Source BI realm. The lede is buried; make sure you check out Pentaho at the bottom.

CA’s Open Source release of Ingres
Albeit under a funny OSI approved license (there are many provisions which will scare away the OS purists, and make others at least think twice about including it in their products or service), Ingres is officially open source and free. Ingres has some pretty significant “enterprise” features including replication, partitioning, and “in the works” Linux clustering (a la RAC). This is great news because Ingres is a rather mature database and is better suited for large DW volumes than MySQL and Postgres. It is noticeably (and perhaps critically) lacking the vibrant community required to increase uptake. At this point it feels like CA is still the only one “interested” in Ingres. This might change, but I believe the funny CATOSL has hindered acceptance from open source communities.

Netezza/DATAllegro are using open source
These two providers of DW appliances are using open source databases as part of their solution. It’s a mixed technology stack, which means that unless you purchase the appliances you will benefit from none of the work that these two companies have put into their implementations. One uses Postgres, the other uses Ingres. There must be quite a bit of technology surrounding it to make it actually work for corporate DW environments. Netezza is actually doing rather well I believe, and some of the bigger vendors are starting to “see them on the radar” as a player in the space.

GreenPlum (aka Metapa) takes another shot
When Metapa wasn’t getting traction marketing their inexpensive proprietary clustered DB implementation, they figured they needed something to get more attention. Open Source is powerful enough that even a few years into the hype it still attracts attention. They relaunched themselves as an Open Source solution and are sponsoring the BizGres project (a few extensions to Postgres that are useful for BI environments) along with allowing the single instance version of their product to be used for free. I don’t think they’ll get the OS community embrace they desire because people are discerning these days; the only interesting work GreenPlum is doing is related to their MPP and shared nothing clustering technology, which is very much NOT open source. They are only opening their kimono an inch, not even to the halfway mark.

Mondrian/JPivot releases
These two projects underwent new releases this year that gave the most visible part of an open source DW/BI system its legs. While not comparable to commercial OLAP interfaces, they are certainly suited for ISVs/developers to embed in their applications. These are great components for including in a project, and if your report consumers don’t really care to write their own reports (a la graphical report builder) and just want to pivot and page, this could be an excellent, inexpensive solution.

BIRT and JasperReports are actually pretty good
Two commercially backed (one by Actuate, the other by JasperSoft) projects that are building the basis for business quality reports. Don’t turn off your Crystal installation yet because these both have a way to go, but they’re improving at a steady pace.

Pentaho Nation
This is truly the most exciting thing I’ve found in the Open Source BI space, and they’ve just begun their work so I’m running on faith at this point. Industry veterans who are passionate about BI and open source have pooled their minds and money (they’ve made $$ from previous entrepreneurial activities) to build a pure, 100% open source distribution for BI. They are collecting various open source projects, building their own components and releasing the whole thing as open source. A partial list of the projects they are planning to include (no official distro yet): Mondrian OLAP server, JPivot, Firebird RDBMS, Enhydra ETL, Shark and JaWE, JBoss, Hibernate, JBoss Portal, Weka Data Mining, Eclipse, BIRT, JOSSO, Mozilla Rhino.
The company will follow in Red Hat’s footsteps and make money on support, training, and consulting. Their plans are ambitious, but they are focused on assembling and configuring all these disparate projects into a comprehensive platform that will be at least comparable to the “big boys” at Hyperion, Cognos, Microstrategy, etc.


They are engaging the community, clearly understand the need in the space, and are committed to the ideals of getting paid for solutions instead of software. They are certainly strong in the presentation, dashboard, BPM/workflow, OLAP end of the spectrum but don’t appear to be including much in the ETL/DW end (there is some, but it appears to be for data movement and loading as opposed to building a DW). I’m not sure if it’s strategic or not, but it might make sense. Most people adopting an open source BI platform for their reporting users will feel comfortable rolling their own ETL/DW for the backroom. It should also be noted that they haven’t made any releases yet, so what we’re seeing is all conceptual now but they’ll be rolling something out sometime in 2005. It appears as if the founders have a track record of “doing what they say they’ll do.”

What does this all mean?
There are three things that will happen as the Open Source and BI worlds start dating.

  1. Hardly anything will change for your current BI projects and technologies. Open source BI is still emerging and is just now being utilized by early adopters.
  2. Cost pressure on the “big boys” will occur as the maturity of these components provides at least comparable options. Currently the small number of vendors along with their constantly increasing prices will show up as an area to be trimmed (ironically enough, probably in a financial report provided inside the software in question). I don’t believe that it will have a significant impact, but it will have a small impact over the next 3-5 years. It will also affect prices of BI OEM and inclusion of BI capabilities in vertical applications (more BI in existing products).
  3. Increased adoption of BI at small and mid sized businesses that can now afford to enter the BI space. Previously inhibited by the exorbitant software costs, businesses can now spend a few thousand dollars to start their foray into BI.

BI Data goes Interactive

Thursday, June 9th, 2005

Netflix has been using the old Amazon trick for a while now. You know, people who liked book X also like this rubber spatula. Infamous and lucrative…

Netflix has taken this to a whole new level. They’ve mined the suggestion data, cross referenced it with my friends’ data, and provided me recommendations based on what my friends liked and I like. As if that wasn’t hip enough, they’ve turned this mined information into an interactive experience:

Two things that keep people engaged in your site: relevant content (mined) and interactive media (participation). Netflix hit this one out of the park, in my opinion!
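I have no idea how Netflix actually does it, but the general idea of “recommend what my friends liked, weighted by how much our tastes overlap” can be sketched in a few lines. Everything below (the function name, the rating threshold, the sample ratings) is made up for illustration.

```python
def recommend(my_ratings, friends_ratings, min_rating=4, top_n=3):
    """Suggest titles I haven't rated, scored by friends who share my taste.

    A friend's weight is the number of titles we BOTH rated at or above
    min_rating; friends with no overlap contribute nothing.
    """
    my_likes = {title for title, r in my_ratings.items() if r >= min_rating}
    scores = {}
    for ratings in friends_ratings.values():
        friend_likes = {title for title, r in ratings.items() if r >= min_rating}
        weight = len(my_likes & friend_likes)
        if weight == 0:
            continue
        for title in friend_likes:
            if title not in my_ratings:
                scores[title] = scores.get(title, 0) + weight * ratings[title]
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


me = {"Alien": 5, "Amelie": 2}
friends = {
    "ann": {"Alien": 5, "Blade Runner": 5, "Amelie": 4},
    "bob": {"Amelie": 5, "Notting Hill": 5},
}

suggestions = recommend(me, friends)
```

Ann and I both loved Alien, so her other favorite gets suggested; Bob and I share no favorites, so his picks are ignored.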