OWB PARIS : Early Review

Oracle was kind enough to invite Bayon Technologies to their World Headquarters in Redwood Shores, CA for testing and feedback on the next OWB release, codenamed Paris. During the week, a panel of expert partners and customers was asked to review new features, test those features, and provide feedback to the OWB product team ahead of the public release later this year.

I will post “bite-size” reviews and observations about new or improved areas in OWB. This series will evaluate the new features as they would be applied in practical, ongoing development and management of traditional Business Intelligence and Decision Support Systems. At times it will be anecdotal, subjective, and opinionated; other times it will merely convey acronyms and bullet lists of information. Whether the posts turn out to be informative or rants, I hope they are useful to you. Please feel free to get in touch with me with any comments or corrections to the series.

How-to: Make If/Then/Case with OWB Process Flows

There are significant limitations to the implementation of workflow processing in OWB 10g R1 and prior. However, there are methods available to developers that can add at least some logic and conditionals, so that a process flow can behave more like an application. This section will walk through how to use the facilities currently in OWB to build an IF/ELSE workflow pattern and a CASE workflow pattern.

One might ask: why use the tool for this particular processing pattern when it was not originally intended to operate this way? In a new BI environment there is a significant need to simplify the toolset used to develop the BI solution. Standardizing on one development tool and one platform has certain “soft” benefits in terms of project staffing, training, maintenance, and extension. Why recruit and train developers on multiple tools and packages when one package covers 95% of the requirements and the remaining 5% can be accomplished with some “creative” application of the general tools available? Encapsulating nearly all application work into the OWB repository has significant benefits for change management and metadata maintenance, and it minimizes TCO by consolidating the BI development environment into one tool.

OWB developers are eager for improvements in upcoming releases of OWB. Engineers know you can never “wait for the perpetual next version,” and while we all look forward to the improvements in future versions, we need to meet our project requirements with what we have today. Knowing that some projects need to standardize on OWB to lower the TCO of BI, we must be able to accomplish these patterns with current versions.

Accomplishing an IF/ELSE or CASE workflow pattern is contingent on the ability of OWB to do the following:
  • Call a custom PL/SQL function or procedure as a TRANSFORMATION activity inside a process flow.
  • Configure that activity to use the return value of the function as its “Status”. Flow developers in OWB are very familiar with the Success/Error/Warning flows that all OWB workflow activities have; they are the flow control for changing execution paths. When this configuration setting is on, the function or procedure must return a NUMBER equal to one of the following three values (1 = SUCCESS, 2 = WARNING, 3 = ERROR).
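
To make the convention concrete, here is a minimal sketch of what such a transformation function might look like. The function name and the condition inside it are purely illustrative; the only part that matters for the pattern is returning a NUMBER of 1, 2, or 3.

CREATE OR REPLACE FUNCTION check_some_condition RETURN NUMBER IS
  c_success CONSTANT NUMBER := 1;
  c_warning CONSTANT NUMBER := 2;
  c_error   CONSTANT NUMBER := 3;
BEGIN
  -- Evaluate whatever condition should drive the branch in the flow.
  -- Placeholder condition: treat Sundays as a WARNING outcome.
  IF TO_CHAR(SYSDATE, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH') = 'SUN' THEN
    RETURN c_warning;
  END IF;
  RETURN c_success;
EXCEPTION
  WHEN OTHERS THEN
    RETURN c_error;  -- any failure routes down the ERROR transition
END check_some_condition;
/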

CASE PATTERN:

A classic example of the need for dispatching is a Job Management System. Nearly all OWB projects require the ability to execute several process flows and mappings in an orchestrated fashion to produce their end result: data transformed and loaded. These systems might have several different types of jobs to execute, each requiring a different set of process flows and mappings. The CASE pattern is a common way to dispatch a job to the corresponding process flow that will execute and accomplish it.

In pseudo-code, the logic would look like:
CASE
  WHEN JOB_TYPE = 101 THEN EXECUTE_101;
  WHEN JOB_TYPE = 102 THEN EXECUTE_102;
  ELSE ERROR;  -- should always be a known job type
END CASE;

Consider the following diagram in the OWB process flow GUI. Those familiar with OWB will recognize the PROCESS FLOW and TRANSFORMATION operators used to accomplish the pattern.

A custom function (IS_JOB_TYPE) is built to check the job type of the currently running job and return a 1 if the job type matches the parameter. This indicates that the currently running job is of the specified type and the system should execute the process flow that corresponds to that type. In the example above, the job is first checked to see whether it is type 101. If it is not, the flow proceeds to check whether it is type 102, and so on. If it is, the flow proceeds to the process flow designated for that job type (say, 102) and continues. Note: the transformation MUST be configured as previously mentioned in the Configure portion of the OWB GUI, or else it will not dispatch according to the return value but rather according to whether the PL/SQL ran successfully. The PL/SQL will almost always “run” successfully, so that ends up nearly always choosing the “SUCCESS” branch.
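
For illustration only, here is a sketch of what IS_JOB_TYPE might look like. The JOB_QUEUE table, its columns, and the choice of WARNING as the “not this type” outcome are assumptions made up for the example; the real lookup depends on how the job management tables are modeled and on how the transitions are wired in the flow.

CREATE OR REPLACE FUNCTION is_job_type (p_job_type IN NUMBER) RETURN NUMBER IS
  v_current_type NUMBER;
BEGIN
  -- Hypothetical lookup of the currently running job's type.
  SELECT job_type
    INTO v_current_type
    FROM job_queue
   WHERE status = 'RUNNING'
     AND ROWNUM = 1;

  IF v_current_type = p_job_type THEN
    RETURN 1;  -- SUCCESS: dispatch to this job type's process flow
  ELSE
    RETURN 2;  -- WARNING: wired in the flow to fall through to the next check
  END IF;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    RETURN 3;  -- ERROR: no running job found, take the error branch
END is_job_type;
/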

The PROCESS FLOW activity contains all the logic for a particular job type and is the “body” of what you want to do for the CASE pattern.

Having explained the more detailed CASE pattern, one can easily see how it might be used to build a simple IF/ELSE pattern using the configuration already mentioned…

Open defined by customer

Found an interesting post on Jonathan Schwartz’s weblog about the true definition of “open.” While I don’t agree 100% with everything in his article, I think he’s hit the nail on the head.

He’s absolutely right.

…the definition of open that matters most is the one experienced by a customer or end user – “openness” is defined by the ease with which a customer can substitute one product for another

It’s interesting reading, and I hope to have some more time to add my own thoughts here…

Take a look in the Mirra

Found a neat-looking network appliance today. Mirra is a network backup appliance that does continuous backup, from wherever you are on the network, to your Mirra drive.

It stores up to 8 previous versions of changes to a file. It does so continuously and in the background, so you are always backed up. As a consultant often at client sites, my notebook computer is my lifeblood. This could save me from having to lug my 250 GB Maxtor OneTouch with me. According to the manual, the Mirra software queues up data changes (i.e., backup information) when detached from the network and then sends those along once reconnected. Perhaps I’ll buy myself this for Christmas, or my birthday; heck, Arbor Day would probably be enough of an excuse. 🙂

 

This wheel invented for the ten billionth time

One of my customers recently initiated an expenditure-tightening exercise. They had clear goals for a reduction of operational expenditures. While this particular exercise did not cut directly into the BI project I’m architecting for them, they have taken a comprehensive look at their web site infrastructure.

This customer had always been willing to pay for labor-saving software, and it appears they own about one of everything from most vendors: Interwoven, ATG Dynamo, JRun, Oracle, Oracle AS, Veritas, Red Hat, Windows Server, IIS, etc. When they took an earnest look at how much these high-powered applications were costing them, they realized they weren’t leveraging the features of the applications enough to warrant the ongoing upgrade/support costs.

All seems logical and straightforward to this point, right? No huge gaps in common sense yet. The technical architecture steering group put forth a philosophy of moving off vendor software (closed source, purchased) to more straightforward, simple, free software when appropriate for the situation. Still no common-sense alarms.

The web development team decided they wanted a general, flexible way to build and deliver content to their various web pages spread across several internet properties. Prepare to cringe: they decided to build, from the ground up, a web application framework. I’ve been around the block when it comes to web applications (I developed my first Java-delivered web application in 1998 with the original Servlet API), and there have been great advancements in the way developers build apps on the web.

For approximately 80% of what any one web site does, there are 10,000 other sites doing something nearly identical. Display some content, all over the page. Content can come in varied formats, from rich media to tabular numeric data. Which content is displayed might vary by certain rules (time of day, the user viewing, what is scheduled for that section, the referring page, etc.). Nearly all forms are identical from the GUI perspective, including a handful of wizards for step-by-step processing and error checking (or just a wizard with one page, the simplest form). There is usually some interface for business users and non-techie types to manage the content of the site. None of this will sound foreign to you; it’s nearly standard for every website ever built. Which is why…

There are MANY, MANY open source web application frameworks available to accomplish nearly all of the heavy lifting for the most common 80% of website requirements. There are so many of these projects (because the need is so common and so many developers are aware of it) that there are even projects to evaluate the projects. Some of the frameworks and APIs are considered rather mature (Tapestry, Struts, JSF, Cocoon).

I’m not an expert on these packages… I haven’t even used them on a real production project (too busy these days building BI systems). However, as with most open source projects, the most useful and most mature are the ones born from developers resenting having to rebuild the wheel on project after project.

Given the availability and maturity of these frameworks, one wonders why IT departments would still be trying to roll their own. In my humble opinion, I’d rather take 100,000 lines written by 100 people working part time than 100,000 lines written by 1 person working full time. I urge IT departments to give open source, and yes, even vendor packages, their due diligence when developing their TCO and ROI evaluations.

From what I can see, the wheel is good enough. Remember, perfect is the enemy of good enough!

Why bother over bits?

I have a customer that recently experienced the tense situation of one of their most crucial Oracle production databases running out of disk space. Their archive logs were growing larger with the amount of access and updates occurring, and the logs were starting to chew up some of their already scarce space. While not involved in their OLTP Oracle environment, I observed their DBA group manage these Oracle instances and had some thoughts intended to broaden the perspective of groups in similar situations.
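
As an aside, the generation rate itself is easy to profile. A query along these lines against V$ARCHIVED_LOG, sizing each log from BLOCKS * BLOCK_SIZE, gives an hourly history you can take the peak from; the query is a sketch, not a tuned monitoring script.

-- Gigabytes of archive log generated per hour, most recent first.
SELECT TRUNC(completion_time, 'HH24')                          AS hour_completed,
       ROUND(SUM(blocks * block_size) / 1024 / 1024 / 1024, 2) AS gb_generated
  FROM v$archived_log
 GROUP BY TRUNC(completion_time, 'HH24')
 ORDER BY hour_completed DESC;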

The rate of archive log growth peaked at 1 gigabyte per hour (GB/hr). They keep 8 days of archive logs, so figure a maximum log requirement of 192 GB (1 GB/hr × 24 hrs/day × 8 days). This is the maximum because it assumes every period behaves like the peak, which will almost certainly not be the case. OK, so we’ve now determined a very conservative maximum log storage requirement of (rounding up) 200 GB. Setting aside their specifics for a second, perhaps we generalize:

My advice to DBA groups out there is that DISK SPACE IS CHEAP; over-provision liberally and ADD MORE WHENEVER POSSIBLE.

Why?

  • There is no cost for Oracle to use more space. Oracle is licensed on a per-CPU or per-user basis, and no licensing metric is directly affected by the amount of storage, in terms of the Oracle software cost.
  • Backup applications don’t store unused disk space. The cost of unused, over-provisioned disk space is just the cost of the disk itself; the multiplier effect often figured for backups (fulls, incrementals, etc.) does not apply to over-provisioned, unused disk space.
  • Disk Space is Cheap. Disk Space is Cheap. Disk space is Cheap.
  • Disk space can be VERY cheap. Google has established an operational miracle: fast, redundant, massively scalable disk space costs them approximately $2.33/GB on an annual basis! WOW!

According to the resources mentioned, mirrored fast disk space can be procured for $2.33 to $2.60 per gigabyte per year. Assuming that most IT departments can’t achieve these great operational efficiencies, double that number so that we pay a bit extra for some vendor-provided services, training, and packaging. At roughly $5.20 per gigabyte per year, the 200 GB above works out to about $1,000/yr of peace of mind for my customer. If you look at the operational budgets for these systems and applications, you’d understand how nominal a figure this is…

Think of the hours lost to clients, the admin time spent on patchwork to make the gigabytes stretch, the DBA time to change database parameters, and the list goes on… It’s just not worth it when, you guessed it, DISK SPACE IS CHEAP!

Competitive Advantage for Open Source?

I was reading on Slashdot this morning that Mozilla has been officially recognized as a 501(c)(3) by the United States federal government. Qualifying as a charitable non-profit that gives software to the world can be a significant competitive advantage for Mozilla directly and for Open Source in general.

Being a non-profit can provide significant advantages to Mozilla and its aims. There will be opportunities both to decrease outlays on goods (hardware, servers, etc.) and on services (professionals donating time may be able to deduct the hourly rate for that pro-bono work). Mozilla could, depending on how far they wish to stretch the limits of the non-profit, provide tax breaks to open source developers in the US contributing at a reasonable rate. I have no idea if they plan on doing this, but it’s an interesting premise all the same, and I think it would be just brilliant. There are also advantages from a revenue perspective.

Companies wishing to support Open Source initiatives previously had to fund that internally through developer time, etc. While this time is deductible as a business expense, it counts directly against the business unit, department, or project making the contribution. Companies now have the ability to make a greater contribution and have that contribution to the world of science and humanity reflected in their tax bill. In theory, if Mozilla manages their fundraising efforts properly, they may be able to significantly increase the amount of money they can spend on a central development team, adding clarity and continuity to projects that are full of heart but sometimes lack focus.

I’m not saying that Open Source is just as worthy a cause as many of the other humanitarian and charitable organizations. At the end of the year, I’d still likely spend a few hundred dollars where it has a direct effect on saving and improving lives. Open Source does that too, but in a different and proportionately smaller way. However, providing this logistical benefit to companies wishing to support Open Source is a move in the right direction.

UML from vi

Picked this up from orablogs.com this morning.

The many UML editors have a lot of whizbang features. Useful, for sure, especially for environments that are building intricate applications using methodologies that benefit from the full variety of UML diagrams (Activity, Collaboration, Component, Deployment, Model Diagram, Sequence, Statechart, Static Structure, Use Case).

In practice, most of us use just a few and use them in a repetitive fashion. Build a sequence diagram that documents a use case, print it, include it in the documentation. Repeat for all 10 use cases. In my humble opinion, doing repetitive tasks in a GUI is time wasted.

Consider using this tool rather than messing with a GUI for building simple sequence diagrams. Since it takes text as input, you can even run your current .java files through it to render HTML. It generates the image for documentation, and since the “graph” is actually just a text file, you can check it into CVS. I’ve not worked in Java for some time now, but if I ever need to get back on that bicycle, I’d strongly consider using a “UNIX shell programmer’s” UML tool.

2gig + 2gig = 50gig

I was recently writing up a volume and performance specification for a customer project when a discussion arose with the current DBA staff about volume projections. The intuitive thinking was that the volume requirements for the BI/Data Warehouse would be the sum of the systems from which it sourced data; the group thought this would be a good way to approximate the required space for the system. I asserted that this method is flawed and had to explain why: BI volume is proportional to source volume, but not necessarily directly proportional.

BI systems require much greater storage than the sum of their sources because:

  • Data are denormalized. Denormalizing data to increase query performance can multiply storage anywhere from 1x to 10,000x, depending on the data, and perhaps more (a simplified sketch follows this list).
  • New data are created. There are many analytically significant events that never even show up in source systems. For instance, a “Sales Fact” will be closely related in volume to “Order Line Items” in a source system, but many BI solutions also track business events like “Customer Acquired” or “Customer Lost.” These new business events are derived from source system data but had not previously existed anywhere else.
  • Summaries and aggregates are built for query performance. Much of the data storage requirement is a function of how much performance is required from common data access patterns (i.e., user reports and ad-hoc analysis). If the data sets are small and end-user performance requirements are minimal, then summaries won’t require a great deal of space.
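
To illustrate the denormalization point, here is a deliberately simplified sketch: the same order line held as one narrow row in the normalized source ends up repeated and widened in the warehouse. All table and column names are made up for the example.

-- Normalized source holds narrow rows and foreign keys:
--   ORDER_LINES(order_id, product_id, qty, amount)
-- The denormalized fact repeats customer, product, and date attributes
-- on every line so that reporting queries can avoid joins.
CREATE TABLE sales_fact AS
SELECT o.order_id,
       o.order_date,
       c.customer_name,
       c.customer_region,
       p.product_name,
       p.product_category,
       ol.qty,
       ol.amount
  FROM order_lines ol
  JOIN orders      o ON o.order_id    = ol.order_id
  JOIN customers   c ON c.customer_id = o.customer_id
  JOIN products    p ON p.product_id  = ol.product_id;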

So, to make the point concrete: a good way to estimate the actual storage requirements is to run sample datasets. Load a month or two of data using the summary/aggregate parameters you think will be required in your deployment. Examine the database for its storage utilization and take measurements. Measure it at one day, one week, one month, one year (if possible). Graph it. See what the data looks like. You could even fit a function to project future growth based on the curve.
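
One simple way to take those measurements (assuming a dedicated warehouse schema, called DW_OWNER here purely for illustration) is to snapshot segment sizes after each sample load and difference the totals.

-- Warehouse space by segment type; run after each sample load and
-- record the totals. The deltas give you the growth curve to extrapolate.
SELECT segment_type,
       ROUND(SUM(bytes) / 1024 / 1024 / 1024, 2) AS gb_used
  FROM dba_segments
 WHERE owner = 'DW_OWNER'
 GROUP BY segment_type
 ORDER BY gb_used DESC;
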
Most importantly, be ready for it to be different than what you expect! A few reports will require a summary that makes your predictions seem way off base. It’s to be expected; BI is a system, not an application!