Category Archives: Oracle

owa_pattern.amatch gotcha

I’ve been using regular expressions for years. A few handy cuts, sorts, egreps, and uniqs work wonders on data files for quick, easy troubleshooting. I spent an hour or so tracking down what I think is a funny implementation of regular expression matching. Thought I would quickly post it, and hope google picks it up so that perhaps I can save some poor soul the time I just lost to it.

There is limited information about owa_pattern available; one of the better resources I’ve found is this page on stormloader.com. One of the bits of advice on that page is to:

Or use the second function in owa_pattern, amatch. This function has an IN parameter that dictates where you want to start to match. For our purpose, we don’t care so set it to 0. What we want from it is the type of the return value, integer.

Examine the next two commands and their results, and perhaps you can glean what the issue is with setting the offset to 0:
select DATE_UTC, owa_pattern.amatch(date_utc, 0, '2004') from xxxx yields

[12/Jul/2004:00:07:27 0
[12/Jul/2004:00:07:37 0

select DATE_UTC, owa_pattern.amatch(date_utc, 9, '2004') from xxxx yields

[12/Jul/2004:00:07:27 13
[12/Jul/2004:00:07:37 13

The offset, or where to start, does not mean that this is where the pattern matching will begin scanning for the pattern. It quite literally means that the pattern must start at that offset. I’m not a huge expert on regex, but I wonder why a function that is supposed to tell YOU where the regular expression exists in the string requires that the match start at the specified offset. Anyone have a perspective or a different experience (perhaps I’m looking at this improperly)?
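
For reference, here is a minimal, standalone sketch of the behavior described above (runnable in SQL*Plus with server output enabled); the literal string is taken from the query results shown earlier:

SET SERVEROUTPUT ON

DECLARE
  l_line VARCHAR2(64) := '[12/Jul/2004:00:07:27';
BEGIN
  -- amatch anchors the pattern at the given offset: '2004' does not
  -- begin at offset 0, so no match is found and 0 is returned.
  DBMS_OUTPUT.PUT_LINE(owa_pattern.amatch(l_line, 0, '2004'));

  -- '2004' does begin at offset 9, so amatch succeeds and returns the
  -- position just past the end of the match (13, as in the results above).
  DBMS_OUTPUT.PUT_LINE(owa_pattern.amatch(l_line, 9, '2004'));
END;
/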

OWB10g Paris : UI Improvements

Of course, software vendors’ product shots always show their UI configured in the most efficient way possible, leading you to believe that the development tool they’re pitching will be the greatest thing for your efficiency since Venti quadruple non-fat lattes. Many times the actual product falls short and requires a significant learning curve to grok how the product designers thought that YOU would be going about your job. The OWB product team has experience with doing a significant rewrite of their GUI (9.0.3 to 9.2 was significantly different). If those OWB veterans thought that was significant, prepare to be shocked.

The OWB10gR2 GUI will delight those that have not used OWB before as it has embraced some UI paradigms of other IDEs. Those familiar with JDeveloper, JBuilder, NetBeans, etc. will have a much easier time learning OWB. The OWB product team has accomplished this by making two significant UI changes:

  • Smaller Fonts are pervasive throughout the product: I dread having to return to OWB 10gR1 with its monstrous fonts, which make displaying even the simplest of mappings difficult. The smaller fonts on dialogs, wizards, editors, and navigators are a great enhancement to the product. This would not have made the short list of features I would have submitted for the 10gR2 version of the product; I am very thankful that I was not the one composing the short list, because the smaller fonts make the entire product much more usable on the whole.
  • Dockable Panels throughout: The editors and the OWB navigator now provide a much improved display to support the real work that occurs in OWB. They present a set of screens that keep the focus on the primary work at hand (process flow, mapping, table definition, etc.) while keeping the peripheral, oft-used parts of the definition available in a set of fast-refreshing panels surrounding the main event. One panel displays the properties of the highlighted object, eliminating MANY, MANY redundant clicks, scrolls, highlights, and drills down to the attribute. Another panel displays a palette of objects that can be added to the process or mapping. The panels are dockable (make them float) and collapsible (running out of space? quickly expand and collapse them).

While most of the benefit of their UI efforts is captured in these two major changes, there are other beneficial UI changes as well. Overall, OWB feels quicker and more attuned to the work at hand. Bird’s-eye views of the object being edited and automatic layout buttons all add a certain refinement and really lead me to believe that OWB is maturing into a robust, comprehensive product. Of course, it must be, for us BI consultants, ETL developers, architects, and DBAs to spend entire days with our noses pressed to the screen.

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

OWB PARIS : Early Review

Oracle was kind enough to invite bayon technologies to their World Headquarters in Redwood Shores, CA for testing and feedback on their next OWB release, codenamed Paris. During the week, a panel of expert partners and customers were asked to review new features, test these new features, and provide feedback to the OWB product team ahead of their release to the public later this year.

I will post “bite-size” reviews/observations about a new or improved area in OWB. This series will evaluate the new features as they would be applied in practical, ongoing development and management of traditional Business Intelligence and Decision Support Systems. At times it will be anecdotal, subjective, and opinionated. Other times they will merely convey acronyms and bullet lists of information. Whether they’re informative or rantings, I hope they are useful to you. Please feel free to be in touch with me with any comments/corrections to the series.

Howto: Make If/Then/Case with OWB Processflows

There are significant limitations to the implementation of Workflow processing in OWB 10g r1 and prior. There are methods available to developers that can add at least some amount of logic and conditionals such that it might be able to behave like an application. This section will enumerate how to use the facilities in OWB currently to build an IF/ELSE workflow pattern and a CASE workflow pattern.

One might ask: why use the tool for this particular processing pattern when it was not originally intended to operate as such? In a new BI environment there is a significant need to simplify the tools for developing the BI solution. Standardizing on one development tool and one platform has certain “soft” benefits in terms of project staffing, training, maintenance, and extension. Why recruit and train developers on multiple tools and packages when one package provides 95% of the requirements and the remaining 5% can be accomplished with some “creative” application of the general tools available? Encapsulating nearly all application work in the OWB repository has significant benefits with regard to change management and metadata maintenance, and it minimizes TCO by consolidating the BI development environment to one tool.

OWB developers are anxious for improvements in upcoming releases of OWB. Engineers know you can never “wait for the perpetual next version,” and while we all look forward to the improvements in future versions, we need to be able to meet our project requirements with what we have today. Knowing that some projects need to standardize on OWB for a lower BI TCO, we must be able to accomplish these patterns with current versions.

Accomplishing an IF/ELSE or CASE workflow pattern is contingent on the ability of OWB to execute a custom function or procedure as a TRANSFORMATION activity and to use its return value as the “Status.” Flow developers in OWB are very familiar with the Success/Error/Warning flows that all OWB WF activities have; they are the flow control for changing execution paths. When this configuration setting is on, the function or procedure must return a NUMBER with one of the following three values: 1 = SUCCESS, 2 = WARNING, 3 = ERROR.

CASE PATTERN:

A classic example of the need for dispatching relates to a Job Management System. Nearly all OWB projects require the ability to execute several process flows and mappings in an orchestrated fashion to produce their end result: data transformed and loaded. These systems might have several different types of jobs they need to execute, each requiring a different set of process flows and mappings. The CASE pattern is a common way to handle this need to dispatch a job to the corresponding process flow that will execute and accomplish the job.

In pseudo-code, the logic would look like:
CASE
WHEN JOB_TYPE = 101 THEN EXECUTE_101;
WHEN JOB_TYPE = 102 THEN EXECUTE_102;
DEFAULT THEN ERROR (should always be a known job type)
END CASE

Consider the following diagram in the OWB process flow GUI. Those familiar with OWB will recognize the PROCESS FLOW and TRANSFORMATION operators used to accomplish the pattern.

A custom function (IS_JOB_TYPE) is built to check the job type of the currently running job and return a 1 if the job type matches the parameter. This indicates that the currently running job is of the specified type and the system should execute the process flow that corresponds to that type. In the above example, the job is first checked to see whether it is type 101. If it is, the flow proceeds to the workflow designated for that job type and continues; if it is not, the flow proceeds to check whether it is 102, and so on. Note: the transformation MUST be configured as previously mentioned in the configure portion of the OWB GUI, or else it will not dispatch according to the return value but rather according to the successful running of the PL/SQL. The PL/SQL will almost always “run” successfully, so this ends up nearly always choosing the “SUCCESS” branch.
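
For illustration, here is a minimal sketch of what IS_JOB_TYPE might look like. The JOB_QUEUE table, its columns, and the choice of 3 (ERROR) as the non-matching return value are assumptions for the sketch, not the actual implementation; wire whichever outcome you choose to the next check in the flow, and remember that the transformation activity must be configured to use its return value as the status, as described above.

CREATE OR REPLACE FUNCTION is_job_type (p_job_type IN NUMBER)
  RETURN NUMBER
IS
  l_current_type NUMBER;
BEGIN
  -- Hypothetical lookup of the type of the currently running job.
  SELECT job_type
    INTO l_current_type
    FROM job_queue
   WHERE status = 'RUNNING';

  IF l_current_type = p_job_type THEN
    RETURN 1;  -- SUCCESS: dispatch to this job type's process flow
  ELSE
    RETURN 3;  -- ERROR (assumed here): route the flow to the next type check
  END IF;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    RETURN 3;  -- no running job found: treat as ERROR
END is_job_type;
/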

The PROCESSFLOW activity contains all of the logic for a particular job type and is the “body” of what you want to do for the CASE pattern.

Having explained the more detailed CASE pattern, one can easily see how it might be used to build a simple IF/ELSE pattern using the configuration already mentioned…

Why bother over bits?

I have a customer that recently experienced the tense situation of one of their most crucial Oracle production databases running out of disk space. Their archive logs were growing larger with the amount of access/updates occurring, and the logs were starting to chew up some of their already scarce space. While not involved in their OLTP Oracle environment, I observed their DBA group manage these Oracle instances and had some thoughts intended to broaden the perspective of groups in similar situations.

The rate of archive log growth peaked at 1 gigabyte (GB) per hour. They keep 8 days of archive logs, so figure a maximum log requirement of 192 GB (1 GB/hr × 24 hrs/day × 8 days). This is the maximum because it assumes that every period behaves like the peak, which will almost certainly not be the case. OK, so we’ve now determined a very conservative maximum log storage requirement of (rounding up) 200 GB. Setting aside their specifics for a second, let’s generalize:

My advice to DBA groups out there is that DISK SPACE IS CHEAP; over-provision liberally and ADD MORE WHENEVER POSSIBLE.

Why?

  • There is no cost for Oracle to use more space. Oracle is licensed on a CPU/user basis and there is no metric that is directly affected by the amount of storage in terms of the Oracle software cost.
  • Backup applications don’t store unused disk space. The cost of unused, over-provisioned disk space is just that. The multiplier effect often calculated for backups (fulls/incrementals/etc.) does not apply to over-provisioned, unused disk space.
  • Disk Space is Cheap. Disk Space is Cheap. Disk space is Cheap.
  • Disk Space can be VERY cheap. Google has established an operational miracle: fast, redundant, massively scalable disk space costs them approximately $2.33/GB on an annual basis! WOW!

    According to the resources mentioned, mirrored fast disk space can be procured for $2.33 to $2.60 per gigabyte on an annual basis. Assuming that most IT departments can’t achieve these great operational efficiencies, double that number to account for paying a bit extra for vendor-provided services/training/packaging. At $5.20 per gigabyte per year, my customer could have purchased some peace of mind (that 200 GB of headroom) for approximately $1,000/yr. If you look at the operational budgets for these systems and applications you’d understand how nominal a figure this becomes…

    Count the hours lost to clients, the admin time spent on patchwork to make the gigabytes stretch, the DBA time to change DB parameters, and the list goes on… It’s just not worth it when, you guessed it, DISK SPACE IS CHEAP!

  • 2gig + 2gig = 50gig

    I was recently writing up a volume and performance specification for a customer project when a discussion arose with the current DBA staff about volume projections. The intuitive thinking was that the volume requirements for the BI/data warehouse would be the sum of the systems from which it sourced data; the group thought this would be a good way to approximate the required space for the system. I asserted that this method is flawed and had to explain why BI volume is proportional to source volume but not necessarily directly proportional.
    BI systems require much greater storage than the sum of their sources because:

    • Data are denormalized. Denormalizing data to increase query performance can increase storage anywhere from 1 to 10,000 times (it depends on the data, perhaps more).
    • New data are created. There are many analytically significant items that never even show up in source systems. For instance, a “Sales Fact” will be closely related in volume to “Order Line Items” in a source system. However, many BI solutions have business events like “Customer Acquired” or “Customer Lost.” These new business events are derived from the source system data but had not previously existed anywhere else (they are newly created).
    • Summaries/aggregates for query performance. Much of the data storage requirement is a factor of how much performance is required from common data access patterns (i.e., user reports and ad-hoc analysis). If the data sets are small and end-user performance requirements are minimal, then summaries won’t require a great deal of space.

    So, to the point: a good way to estimate the actual storage requirements is to run sample datasets. Load a month or two of data using the summary/aggregate parameters you think will be required in your deployment. Examine the database for its storage utilization and take measurements (a minimal measurement query follows this list). Measure it at one day, one week, one month, one year (if possible). Graph it. See what the data looks like. You could even build a function to project future requirements based on the growth curve.
    Most importantly, be ready for it to be different than what you expect! A few reports will require a summary that will make your predictions seem way off base. It’s to be expected; BI is a system not an application!
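
As a minimal sketch of the kind of measurement involved (the warehouse schema name ‘DW’ is a placeholder; substitute your own), run something like the following after each sample load and record the results:

-- Space used by the warehouse schema, broken out by segment type.
-- Run after loading each sample period and graph the trend over time.
SELECT segment_type,
       ROUND(SUM(bytes) / 1024 / 1024) AS mb_used
  FROM dba_segments
 WHERE owner = 'DW'
 GROUP BY segment_type
 ORDER BY mb_used DESC;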

Database Operation Complexity Reference

I mentioned this previously, but I’ve been reading “Principles of Distributed Database Systems.” I’m enjoying it, and it’s helping me solidify many of the concepts that I apply daily in my capacity as a Principal BI Solutions consultant. Database theory, specifically as it relates to tuning, is part of any professional’s work with Oracle. We’ve all deciphered the performance implications and clues for improvement from EXPLAIN PLAN. I’ve always been told you want to give Oracle the clues/configurations to enable filtering of results (selection) before joining. It always made sense, but I had never fully understood the concepts behind these recommendations. Until now…
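
As a hypothetical illustration (the ORDERS/CUSTOMERS schema below is made up, not from any particular system): with a selective predicate available, the optimizer can apply the cheap selection first and feed a much smaller row set into the more expensive join.

-- The date predicate lets Oracle filter ORDERS (selection) before joining
-- to CUSTOMERS (join), shrinking the input to the costlier operation.
SELECT c.customer_name, o.order_total
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
 WHERE o.order_date >= TO_DATE('01-JUL-2004', 'DD-MON-YYYY');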

I don’t claim to understand all of the complexity behind these issues, but I ran across a great reference chart. I wanted to post it here, along with some numbers for demonstration, as a quick reference so that any professional with an understanding of quadratic, logarithmic, and linear scales can match those up with the operations we use on a day-to-day basis.

This is from the book mentioned above; however, I’m sure it’s rather common knowledge.

OPERATION                  COMPLEXITY
SELECT                     O(n)
PROJECT (without dedup)    O(n)
PROJECT (with dedup)       O(n*log n)
GROUP                      O(n*log n)
JOIN                       O(n*log n)
SEMIJOIN                   O(n*log n)
DIVISION                   O(n*log n)
SET OPERATORS              O(n*log n)
CARTESIAN PRODUCT          O(n²)

A quick look at some numbers at the orders mentioned yields the following “costs” in operation counts (using base-10 logarithms for the O(n*log n) column).

n          O(n)        O(n*log n)   O(n²)
5          5           3.49         25
10         10          10           100
100        100         200          10000
1000       1000        3000         1000000
1000000    1000000     6000000      1E+12

Hope this helps provide a useful way to match up the database operations we use on a daily basis with the theoretical cost of such operations. Cheers!

Quick Search Solution in Oracle

Found a nice post on rittman.net in reference to a simple search solution posted on Eric Mortensen’s blog.

It allows for a simple interface and leverages the basic Oracle SQL wildcard (%) to implement a search on a full-table basis. It’s a nice, straightforward solution that concatenates all fields into one delimited field. Using this format, one can quickly write simple SQL that does simple searches based on a simple term substitution.
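
A minimal sketch of the idea, using a hypothetical CUSTOMERS table (the column names, delimiter, and search table are illustrative, not taken from the original post):

-- Concatenate the searchable columns into one delimited field per row.
CREATE TABLE customer_search AS
SELECT customer_id,
       first_name || '|' || last_name || '|' || city || '|' || email AS search_text
  FROM customers;

-- A single wildcard predicate then searches every field at once.
SELECT customer_id
  FROM customer_search
 WHERE UPPER(search_text) LIKE '%' || UPPER('smith') || '%';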

I wonder whether this couldn’t be extended to use regular expressions and, if so, which would provide better performance. I’m certain that Oracle is tuned for text scanning on %, so I’m not sure that regular expressions would beat out the wildcard searching.

Perhaps this could even be implemented as a view, but there are likely trade-offs in having to access the records in the underlying table. Perhaps a fast-refresh materialized view, or an actual materialized view, could solve that problem.

10g, Productivity with Choice

During a recent conversation about Open Source/Industry Standards/Vendor solutions, I ended up spouting out the fact that Oracle’s philosophy for its offering is “productivity with choice.” I then realized I had no idea what that actually meant. So I looked it up… I am by no means proficient with JDeveloper (I’ve fired it up a time or two to look at some BI Beans components). I must say I don’t fully know the practicalities of the JDeveloper package, but I can say I understand what Oracle is doing: for 80% of what you will do day to day, make it wizard-based and leverage RAD environments. For the rest, roll your own; Oracle has a laundry list of acronyms to suit.