Category Archives: Oracle

OWB10g Paris : Data Profiling

The new Data Profiling features of the upcoming OWB release will knock your socks off. Other new features are applicable, useful, and successfully augment the product and make it better. Data Profiling opens up a whole new world to OWB with it’s wide applicability outside of Oracle Business Intelligence. In other words, readers should note that this feature is not just for OWB Developers but rather could merit firing up OWB purely to use this feature. OWB developers might be a bit disappointed with how OWB works to manage corrections, which are based on the output from Data Profiling but that shouldn’t dissuade anyone from using this feature.

Most BI and Data Warehousing professionals fancy themselves experts of their organizations’ data; they think they understand their structures, volume, and understand how they live and breathe. In other words, if you are curious which data in your enterprise is least clean consult your DW operations staff and they’ll provide anecdotal tales of why the POS system is a POS (different acronym) and that they “often get erroneous transactions with special adjustment code 99.” What is typical, and ironic given the DW mandate of providing insight and information, is that the DW team can not quantify and measure the quality of their data. What kind of questions could you use to drive data quality initiatives in your enterprise? Knowing the cost of scrubbing and manually restating (Sarbanes-Oxley consultants cringe here) data and the overall cost of making decisions by delayed or improper data wouldn’t you want to know where to start working?

Data Profiling addresses many needs by providing a set of functionality to examine, profile, and measure the quality of data. It provides a set of screens that allow you to point the application at a Data Source and immediately get some valuable information about your data. The Data Profiling features of OWB fall into three broad categories:

Attribute Analysis : This type of profiling revolves around the content and structure of particular attributes in your data sets. This involves numerical precisions, distinct values, outlier identification, min/max/avg, declared types versus actual types, and pattern typing.
Referential Analysis : Understanding how data relates to other data is a very important component of data profiling. They wouldn’t call it a relational database if the relationships amoungst data weren’t one of the fundamental and most important concepts of data organization. Identifying orphans, joins, childless entries, and redundancy identification all fall under the umbrella of Referential Analysis.
Functional Analysis : Beyond the pure join relationships, there are also ways of surmising how the data functionally relates to each other. One can surmise the relationship between attributes and relationships for determination and dependencies. Normalization candidate identification can be achieved by leveraging this analysis (identify which attributes might be prime candidates for factoring out and normalizing into an exterior table).

While it might not be as small as a Fax Wizard the Data Profiling features of OWB require a sort of linear progression, especially early on in the process. This makes sense; you have to connect to your data, identify which data is of interest, retrieve and analyze your data, explore and understand your data, and optionally deploy code to monitor and make corrections to your data.

I’ve included a few screenshots, which speak quite clearly to the types of analysis that OWB can perform on your data. It should be noted, that since Oracle retrieves and analyzes the data in a temporary workspace that one can perform this analysis on data contained in nearly any heterogenous supported connection, including plain old flat files.

Figure 1. Screenshot of some of the aggregation statistics available. Note the ability to actually see the detail rows here

Figure 2. If you don’t feel like consuming the analysis in numerical form OWB provides a way to visualize characteristics of your data

Figure 3. Domain stats are very useful for determining status codes, valid product categories, etc. It also shows details and percentages

Probably the feature Oracle is hoping you’ll use most, and why Data Profiling is included in OWB instead of an alternate tool or standalone product is OWB’s ability to monitor and make corrections to data based on the work one does in the Data Profiling section of the product. OWB endeavors to allow you to quickly, and programatically leverage the results of an analysis into working monitors and data cleansing routines. It’s a noble start at implementing easy rule based automated ETL generation and cleansing of data but has some difficulties in ongoing management. See Note Number 1 Below

Figure 4. Based on a rule you’ve defined as part of your profiling, launch the Create Correction wizard

Figure 5. One can define several actions that are implemented to “correct” the data. There are some excellent capabilities for the correction action (Match Merge, Remove, etc)

One can create rules and corrections from inside the Data Profiler that can be deployed to correct and audit your data. Consider a real life example of a data source that contains order information, including order status history records. An order will contain information about the Customer, order amount, etc. Order History records contain information about what occurred to the order and when (received by warehouse, shipped, etc). You’ve analyzed your current data set using the data profiler and you’ve found that Order History records arrive with status codes that are two letter codes, but occasionally) when orders are forced to a different status an operator uses mixed case (“Sh” instead of “SH”) for the status codes. You’ve got 100k order statuses conforming to your dictionary of status codes, and a couple hundred that are off by one lower case character. Your company is entitled to a 10k discount from your fulfillment partner company every month the error rate exceeds .5% off the agreed upon data formats.

How can you leverage the new OWB feature to: audit the error rates of the received data; and correct the simple case of mixed case status codes?

Connect to and identify the data of interest. Depending on the arriving format for the data, you’ll need to decide how to get it imported into the land of OWB and the Data Profiler. If it’s in a SQL Server or Oracle database you can enter in some connection information and connect directly and browse the data dictionary. If it’s in a flat file, you’ll need to sample and describe the data file.
Run the profiler. Review the information that you’re seeing about the particular attribute(s) in question. In our example we’re interested in the STATUS CODE column of the order history records. The profiler will indicate a suggested domain (valid uppercase status codes) and then indicate how many (and %) records do not conform to those specifications. You can drill into these (VERY USEFUL feature) records to see that nearly all are still valid two letter codes, but have mixed case.
Derive a Rule based on the domain of the records. You select the attribute, and use the Derive Rule wizard to create a rule that says the STATUS CODE must be one of the valid upper case codes.
Create a correction based on the rule. You’ve now created a rule that specifies valid values for that column. How does one make corrections to these records so that they can be processed along with their compliant peers? Using the create correction wizard you can specify how to react to the rule violation. In this case one would use some of the provided logic (match merge capabilities of OWB) and define the logic for identifying the mixed case codes, and correct them by applying some TO UPPER CASE logic. This will create a mapping and other requisite structures for executing the logic. Note: The fact that the corrections are themselves OWB objects is a blessing and a curse. More on that toward the end of this article
Add the mapping to your process flows. This will execute, in the place you deem it necessary where the data will be evaluated and possibly corrected.
Schedule an auditing process. Outside of the correction execution you might just want to evaluate the state of the data so that you have a record of data received, it’s error rates, why it failed it, etc. You can do an audit ad-hoc, scheduled, or within a process flow. Naturally many locations will like to retrieve data from their source systems, perform the audit as part of the process flow, and then proceed with any correction and further warehouse processing.

Data Profiling is a very worthwhile feature in the upcoming release of OWB. It is certainly useful to a BI consultant who needs to quickly learn and understand a customers data. I hope readers find it as useful as I have.

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

NOTE 1: Corrections are generated in a one time process and it generates all sorts of OWB objects (mappings, tables, etc) to accomplish a correction. While this is handy, I’m guessing the ongoing management will be difficult. OWB creates one of their new “nested” mappings for implementing the cleansing rule and then the OWB developer includes this pluggable mapping in your ETL(you can run it standalone if you so choose). It’s a one time generation and you can’t manage the rule. If you change your correction (include this NEW particular code or type) you have to regenerate the correction and possibly re-add it to your ETL.
NOTE 2: Thanks to Paul and Nikolai for their contribution to this blog.

OWB performance, start here

There are literally millions of bits of data and pages in books on how to tune Oracle and PL/SQL. Most important to BI professionals is the need to beat the clock, and have their data ready by start of business(8am) at a minimum. Of course if you’re a right time environment you have even more stringent requirements. It’s likely your data processing takes a while and OWB can give you some great information on where it’s spending time. This is useful if you’re trying to pick a few mappings to “tune” and get the most bang for the buck.

Connect to the OWB_RUNTIME schema, or use select access from another use such as OWB_USER.

select
OBJECT_NAME name,
trunc(min(elapse_time)) min,
trunc(max(elapse_time)) max,
trunc(sum(elapse_time)) sum,
trunc(avg(elapse_time)) avg,
count(elapse_time) count
from all_rt_audit_executions
where 1=1
AND task_type = ‘PLSQL’
AND created_on >= to_date(’12/10/2004′, ‘MM/DD/YYYY’)
group by OBJECT_NAME
order by avg desc;

Expressed in Seconds… Yields the following results (maps changed to protect the innocent)

NAME	MIN	MAX	SUM	AVG	COUNT
map1	0	31988	165372	20671	8
map2	5494	68905	135111	19301	7
map3	1672	3509	20542	2567	8
map4	316	3511	14502	1812	8
map5	1018	2170	13089	1636	8
map6	436	1353	6784	848	8
map7	478	1272	6476	809	8
map8	309	2243	5351	668	8
 ..................

You’ll see this identifies several candidates for tuning. You also have an idea of what kind of performance gain you can hope to expect. If you improve the avg time of map1 by 10% you’ll save yourself approximatley 20671 * .9 / 60 = approx 35min per execution. This is not hard and fast, seeing as though some tasks run in parallel (if map 4 is always waiting for map3 in a process flow, you won’t cut down your time).

Howard Rogers is blogging

Howard Rogers has always had some excellent articles posted on his site but now offers additional published nuggets on his new blog. Always a fan of great content no matter the form…

50K SQL Statements for 100 rows of data

Unlike a great deal of Oracle professionals I started my career building Software using OO methodologies and Java. I approach the Oracle world with an application developer paradigm, and have had to unlearn certain OO precepts. Mostly I’ve had to concede that in practical development of technology solutions (greatest benefit, least cost, highest ROI) that the persistence layer can not be a big dumb chalkboard exclusively. Perhaps some of my Java colleagues may disagree but there are significant cost, implementation, and maintenance advantages to leveraging the power of a full featured RDBMS.

As I was reading Mark Rittman’s article on his Performance Tuning Excursion I was reminded of a SWAT task I was given working with the DBA group at an online retailer. The data access patterns of OO, their OO-Relational mapping components, etc can sometimes be downright counterproductive towards building a sound technology solution. I was working with an original implementation of this and witnessed a seldom used feature end up nearly bringing the entire site to it’s knees. The page received a brand name, and was meant to display the products and their respective primary categories. The result sets would range from 1 to about 100 products. I witnessed a similar behavior to what Mark describes in that the persistence layer was querying thousands of times for smaller atomic units to build the composite objects on the application server. I’ll leave it to the reader to delve into Mark’s thorough explanation…

Consider using the extended SQL trace instructions from Mark or further consider p6spy if you don’t have access to Oracle (and it’s a Java application of course).

OWB10g Paris : Experts

In my opinion the name does not adequately capture the functionality but I’m not sure I can suggest anything better. The Expert feature is a combination macro language, best practices scripting, OWB extension SDK, and labor saver. While it might be tough to name the new feature which does a great deal, it is not tough to see the direct applicability of Experts for OWB customers.

OWB Experts is a new OWB feature that allows for building guided experiences within OWB. Experts allow one to expose limited pieces of OWB, and OWB functionality a guided manner. Many programs have this type of functionality, with Microsoft Office being famous for it’s effective use of Fax Wizards, Mail Merge Wizards, etc. NOTE: The OWB product team assures me they have no plans to consider, ever, never in a millions years add a dancing paper clip to OWB.

Why would Oracle provide Experts functionality, to throttle their product back and provide views of it that make it appear to do less? While OWB is an immensely powerful tool it’s learning curve is not insignificant, and I’ve witnessed many competent Engineers/DBAs users overwhelmed by the complexity of the product. Beyond that, as part of their modest Developer Suite seat it’s an inexpensive product to involve many hands throughout an enterprise to make light work.

There are two things you can do with experts, and there are a variety of options on how to do both. You can design an expert, and you can run experts. Designing an expert is typically done by power OWB users. These are your Data Warehouse Architects, ETL Engineers, and DBAs. Experts can be run by anyone, from domain experts who happen to be novices in OWB to ETL engineers adhering to certain best practices. Obviously experts have to be designed before they can be run but it’s helpful to understand where they’ll be run and used before discussing how they’re built.

RUNNING EXPERTS:
Experts can be run by either executing them inside of OWB, or embedding them in third party applications. Embedding them in third party applications opens up a whole slew of possibilities for advanced customer implementations and Oracle partners. While you can embed them in your applications, most OWB experts will be run inside of OWB. There is a nice feature of the Expert implementation that allows execution of a specific expert on OWB client startup. This would be useful if you wanted to provide a limited view so that users would only see the expert, and not really see OWB at all. Of course, you can run an expert the good old fashioned way by executing it from within the OWB tree. There are plenty of options to suit, and suffice to say that you’ll have the capability to provide your experts to users in a method that suits your users.

DESIGNING EXPERTS:
Experts are designed inside of OWB using a graphical interface that is similar to the editors for other OWB objects. An expert is composed of Tasks (things to do) and transitions (process and flow control). In my opinion, it most closely resembles the Process Flow editor in that you are defining a sequence, combination, and processing logic. There is a palette for adding tasks to the expert, a method for creating transitions between Experts, a properties box for setting specific values for task. There are forks and dispatching transitions, which allow the wizard to take a different path based on user input. This feature allows for screens 2 and 3 to request different information, based on the input the user entered on the first screen. The transitions are instrumental in defining order and flow control, however the real usefulness comes from the palette of possible Tasks.

There are numerous tasks that are on the Beta version palette, and I’m not sure the list has been entirely finalized or publicized(there were approximately 20 in the build I used). These are the actual “steps” of the Expert and are usually observed by a particular screen in the application. A few to help you mull over possibilities of Experts in your projects include:

Custom Dialog : This is immensely useful, as it allows for a quick way to collect responses from the user. It basically pops up a screen that is configured by the Expert designer and receives information from the user (perhaps a table name, or some Oracle connection information).
Data Viewer : The OWB product has exposed their “data viewer” component on the palette. In some use cases, one needs to display the data from a data table to help the user either confirm they’ve got the correct table, or understand a bit more about the contents of the table.
Impact Analysis : This helps the user understand what the impact is of modifying an object. This would be useful to guide a user through the modification of a dimension, to ensure they are thinking and noting any impacts before actually making the change.
Java : This is really a “blank check” in the Expert functionality. This task allows you to embed Java application code directly into this point within the Expert. The possibilities really become quite endless when you consider the full power of Java available in your Expert creation. There will likely be some limitations of how you interact with OWB (or more accurately, how easily one interacts with OWB) but this feature demonstrates the OWB understanding that they are never going to be your only tool/system.

Perhaps it’s appropriate to wrap up with an example use of an expert:

Let us consider a situation where you are using a third party ROLAP tool that has a metadata repository. This is where the OLAP tool goes to understand the structure, hierarchy, cubes and dimensions that are stored as tables inside Oracle. It has it’s own specific metadata format/API and one can make Java calls to a web service to define and describe the metadata of your marts. When your BI team adds a dimension in OWB they must also run through some series of steps to add it to ROLAP tool metadata so that end users can see it in their reporting tool. Every time they follow the same process, add dimension, go to tool and add it.

Consider creating an “Add Dimension” expert in the upcoming release of OWB. The expert would simply:

Launch the Add Dimension Wizard
Allow the user to fill in the OWB Dimension, hierarchies, levels, attributes, etc
Use Java to call some web service and describe the new Dimension to the ROLAP tool

Steps 1 and 2 are easy to just drop into the Expert and configure. Step 3 would require the authoring of custom Java code that would be executed. The key here though is that it is done so in the context of OWB and so has immediate knowledge of what dimension was just added, at what location, with what underlying table name, etc. The benefits of automating these types of tasks has always been clear; minimize actual time spent to accomplish repetitive tasks and minimize errors by programatically enforcing standards and best practices.

Experts hold great potential for power customers, consulting firms, and Oracle Partners to add best practices and efficiencies to an already efficient and powerful tool.

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

NEED A LOT OF ROWS?

Ok, this is not earth shattering, but perhaps google might find this blog for someone in need of a quick and dirty method for generating a bunch of “dummy” rows in oracle without heading into the land of PL/SQL.

Use the table ALL_SOURCE, which always has several hundred thousand rows (just checked one DB and it has 435608).

insert into dummy_table (key, value)
select rownum, type from all_source where rownum

Oracle on VMWare Linux, Part 2

I’ve written about Oracle on VMWare before, but thought I would share an alternative perspective with readers. Howard Rogers has published a nice article on how to install VMWare, install White Box Linux, and suggests it works perfectly well. While I wouldn’t now suggest that production environments consider it suitable, I’m glad that someone has had a better experience with this product than I’ve had.
Note: There’s also an interesting discussion about his favorite font, for those looking for a digression.

PSOUG : 10/20/2004 : DBMS_RECTIFIER, Instead of Triggers, ORA_HASH

As a recent transplant to Seattle, I’m keen on getting connected with my local community of Linux, Open Source, BI, and Oracle brethren. I took in my first Puget Sound Oracle Users Group meeting last night. The group draws between 10-50 people per week, appears well organized, and based on my observations from my first meeting, provides substantial substance in a peer-to-peer setting.

Daniel Morgan, from the UW extension school, presented three mini-lectures on DBMS_RECTIFIER, Instead of Triggers, and ORA_HASH. I highly recommend you visit these pages on PSOUG, they have examples that demonstrate the following features quite nicely.

DBMS_RECTIFIER is a rather interesting package, especially for someone who spends a great deal of my Oracle time building CDC routines for Data Warehousing. At a basic level, it’s two procedures that provide a quick method to easily identify differences in data tables, and correct those changes. The rectify has an interesting behavior, and it’s not particularly well suited to CDC. However, the differences procedure quickly identifies differences (can be scoped to only a few columns as well) and places the different ROWIDs in a seperate table. In a pure Oracle environment, a BI professional could make use of this since ROWID provides a very quick way to access rows.

Instead of Triggers apparently have been availabe since Oracle 8. Some views are updateable, if they match a certain criteria. However, those that do not match this criteria, there is still a way to allow for updates/deletes/inserts on these views. Oracle allows you to make a special trigger, an Instead Of trigger that only applies to Views. They allow someone to specify exactly the DML to occur, and can include procedure and function calls, and DML on completely different tables. This could be used to :

Audit all inserts/updates through a defined view to a seperate audit table. I know this is old news, but this is a pretty slick way to accomplish this.

Provide an interface where certain DML statements become impossible. One can provide a view to a table that limits the DML so that certain columns are never updated.

Interface with an external system. One could create views in Oracle that provide an interface to a remote system. Data inserted/updated/deleted from these views could send these inserts/updates/delete to an external system via anything you can write in PL/SQL.

ORA_HASH. Quick. Flexible. Hash Function. Useful for tons of things. If you know how to use HASHing, you’ll find this convenient. It too, has been in Oracle for ages. 🙂

OWB Runtime : How to purge audit data easily

The OWB runtime generates runtime audit data for every process, mapping, and file loading that it invokes. If you have intricate flows and load your warehouse/mart several times a day (every 5 minutes perhaps?) you’ll see these pile up quickly. The easiest user method for purging this audit data is through the OWB Runtime Browser that allows you to select (checkbox).

Sometimes you don’t want to manually do this, but rather have a way to do this programatically. Perhaps because you want to keep a clean house (always purge greater than 30 days old). Perhaps you generate a lot of audit data in a development environment that’s a hassle to delete from OWB RT Browser. Perhaps you rarely visit the OWB RT and don’t want to fuss with keeping it up.

Whatever your reason, please feel free to use the following little script to systematically purge old audit data. It must be run as your OWB Runtime Owner. I’ve not run this by Oracle, or seen them document this API so use it at your own risk. It works, though.

begin
for purge_ids in (select distinct top_level_execution_audit_id id
from ALL_RT_AUDIT_EXECUTIONS
where updated_on

OWB10g Paris : Installation and Security

The OWB team has clearly heard the masses and responded to their difficulty in installing and getting “up and running” with OWB. I have to think back to those first few days of head scratching over the installation and “quick” start guides to remember how tough it was to just get to a point where you’re building tables and composing mappings. Thinking back it seems painless but as I’ve observed at customers and conferences alike there are great difficulties and a learning curve to installing OWB. Smart, intelligent DBAs and BI architects had to spend some amount of time to understand the concepts to get going.

Previously there were numerous steps (I believe the OWB team mentioned a total of 60 steps, but those included “double click on this” and the like). They’ve been simplified by the consolidation of the two repositories. OWB veterans are aware there were previously two primary repositories that one interacts with to get things done. There was the DESIGN repository which held the metadata, ETL definitions, table structures, etc Then there was the RUNTIME repository which invoked and audited the acutal flows, mappings, and deployments being executed. Beyond that, every schema that was going to execute any mappings, or have any data objects also required the installation of a TARGET runtime components so the runtime repository could execute and monitor ETL in that schema. In a stage/warehouse/multiple mart setting, this sometimes meant stepping through the Repository Assistant many times (24 times at one of my customers for a set of development/stage/production environments).

OWB10g Paris consolidates the two primary repositories (DESIGN and RUNTIME) into one unified repository. This greatly eases the burden of installation, which can be done from a simple “New User” wizard on the client. During my testing of the product, we were working with a single database so I was not able to test a multiple database installation. The OWB product team indicates that the original topology is still available for customers wishing to use only portions of certain repositories (ie, use one design, with development/testing/production runtimes).

There is also an augmented OWB user security, permissions, and role based system that supports another great feature of OWB installation. Target schemas (which often make up the bulk of installation time in new environments) allow for new target schemas to be added as a simple afterthought with on the last screen of an add User wizard. OWB objects now have permissions and can be assigned throttled permissions (full ctl, edit, read) by both user and role.

It was noted by OWB product team that the security provided in OWB is enforced for access to the metadata, not the actual deployed objects. If one restricts the “Customer” table in OWB to only those with “INTERNAL_EMPLOYEES” this means that only those users in INTERNAL_EMPLOYEES are allowed to utilized, examine, and otherwise edit the CUSTOMER OWB table. When that table is generated and deployed, it is then subject to the normal Oracle permissions and OWB does not have a method for managing that.

My fellow testers echoed the sentiment that if OWB is really positioning itself to be an inclusive, central location for BI work in an enterprise it will also need to figure out how to capture security and access rules. OWB team product team acknowledged need… We’ll see if it ends up in a major revision or two out.

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

Goodman on BI

Thoughts on Open Source, Analytics

Category Archives: Oracle

OWB10g Paris : Data Profiling

OWB performance, start here

Howard Rogers is blogging

50K SQL Statements for 100 rows of data

OWB10g Paris : Experts

NEED A LOT OF ROWS?

Oracle on VMWare Linux, Part 2

PSOUG : 10/20/2004 : DBMS_RECTIFIER, Instead of Triggers, ORA_HASH

OWB Runtime : How to purge audit data easily

OWB10g Paris : Installation and Security