Database Operation Complexity Reference

I mentioned this previously, but I’ve been reading “Principles of Distributed Database Systems.” I’m enjoying it, and it’s helping me solidify many of the concepts that I apply daily in my capacity as a Principal BI Solutions consultant. Database theory, and specifically as it relates to tuning is part of any professionals work with Oracle. We’ve all deciphered the performance implications and clues for improvements from EXPLAIN PLAN. I’ve always been told you want to be able to give Oracle the clues/configurations to enable filter of results (selection) before joining. It always made sense but I had never understood fully the concepts behind these recommendations. Until now…

I don’t espouse to understand all of the complexities behind these issues but I ran across a great reference chart. I wanted to post it here along with some numbers for demonstration as a quick reference for any professionals with understanding of quadratic, logarithmic, and linear scales to match those up with the operations we use on a day to day basis.

This is from the book mentioned above, however I’m sure it’s rather common knowledge.

OPERATION COMPLEXITY
SELECT O(n)
PROJECT (without dedup)
PROJECT (with dedup) O(n*log n)
GROUP
JOIN
SEMIJOIN
DIVISION
SET OPERATORS
CARTESIAN PRODUCT O(n²)

A quick look at some numbers in the orders mentioned yield the following “costs” in n operation times.

n O(n) O(n*log n) O(n²)
5 5 3.49 25
10 10 10 100
100 100 200 10000
1000 1000 3000 1000000
1000000 1000000 6000000 1E+12

Hope this helps provide a useful way to match up the database operations we use on a daily basis with the theoretical cost of such operations. Cheers!

Quick Search Solution in Oracle

Found a nice post on rittman.net in reference to a simple search solution posted on Eric Mortensen’s blog.

It allows for a simple interface and a leverages the basic Oracle SQL wildcard (%) to implement a search on a full table basis. It’s a nice straightforward solution that takes all fields and concats them into one field that is delimited. Using this format, one can quickly right simple SQL that does simple searches based on a simple field replacement.

I wonder if this couldn’t be extended to use regular expressions, and if so which would provide better performance. I’m certain that oracle is tuned for text scanning on %, so I’m not sure that regular expressions would beat out the wildcard searching.

Perhaps this could even be implemented as a view, but I’m sure there might be trade offs by actually having to access the records in to the other table. Perhaps a fast refresh materialized view or an actual materialized view could solve that problem.

10g, Productivity with Choice

During a recent conversation about Open Source/Industry Standards/Vendor solutions I ended up spouting out the fact that Oracle’s philosophy on it’s offering is “productivity with choice.” I further realized I had no idea what that actually meant. So I looked it up… I am by no means very proficient on JDeveloper (I’ve fired it up a time or two to look at some BI Beans components). I must say I don’t fully know the practicalities about the JDeveloper package but I can say I understand what Oracle is doing. For 80% of what you will do on a day to day, make that wizard based and leverage RAD environments. For the rest, roll your own and Oracle has a laundry list of acronyms to suit.

Web Analytics meets SimCity

I have a customer with a few web properties. I am helping them sort out some of their web reporting needs and integrating them into a BI Environment. A few weeks back I mentioned to them, mostly jokingly, they should take a look at VisitorVille. They took the advice (althought it twasn’t really advice, just water cooler talk) and gave it a go. They pulled it straight away for technical reasons but I think they are still planning on hooking it up at some point.

It’s a rather unique product/service, and I’m not sure it’s applicable for many businesses, but I must say it is impressive looking. I think it just goes to show, there are many different and appropriate ways to view data enriched by a Business Intelligence system. I think there will always be wonderful and impressive ways to view data based on the particular data set, metaphor, and audience. I’m betting that the in between (data consumption, extraction, transformation, derivation, application of real time business logic, etc) will remain much the same, but presentation is where the creative types can add immense value and impact.

Too bad the business users can’t use VisitorVille like SimCity. Build another skyscraper (web page), a bus terminal from out of town (referral from an outside site) as easy as the real SimCity game allows. 🙂 Website owners would LOVE that… Manage your website from a game terminal.

CVS is pretty cool

Ok, this is by no means timely or newsworthy… But all the same I wanted to put some thoughts together about release management using CVS. I had a recent conversation with one of the Application Developers for a customer of mine in Boston, MA. This client had been using some pieces of CVS for some of their website release management. With the recent departure of their BUILD-MEISTER for greener pastures the group was left without a real CVS guru. It never hurts to write about a few of the great features of CVS that are powerful when embraced by developement teams.

CVS is Concurrent as it’s name suggest. OK, so this isn’t really a surprise, but everything about CVS is built around the fact that there are going to be multiple people needing to do work on the contents of the repository.

CVS builds on some great UNIX tools. Diff/merge and the likes are leveraged as part of the CVS system so that you can use some pretty cool packages and viewing mechanisms to resolve conflicts. Because it leverages some of these things, you have some flexibility for extensions. Also, one can use it remotely and over ssh to further enhance the availability and security of the system.

CVS is widely used, so it will be easy to use with other products. Developers/Vendors will likely provide support for CVS if they are building packages that are meant to integrate with a Source Control System. As it pertains to this particular client, the easy integration with Ant for building, jarring, and deploying their applications. I’ve built some rather robust Ant build scripts that can centralize most of a QA/build process. There’s plenty of web/windows/*nix clients for interfacing with a CVS repository which means your not going to get locked into just one vendors particular client/repository.

CVS can help you manage releases. Do development on 2.0 and be able to fix critical bugs in the 1x branch. Then, fold it in using some of the merging capabilities. The CVS tool is immensely useful for the administrator who is aware of branching, merging, and tagging.

Now, if only Oracle Development could be “managed in text files” so I could put it into CVS… 🙂

Jini, the silent coming of age

I have always been fascinated by Jini. I’ve attended two of the Jini community meetings and kicked the tires on many of the research projects at jini.org. In many ways it was a technology ahead of it’s time and since it didn’t make a huge SPLASH during the dot com boom, it hasn’t been adopted en masse.
NOTE: many of these links may require registration for the jini.org community and acceptance of the Sun SCSL (which is also a deterrent to the growth of this wonderful technology)

There are many community members much more familiar than I am on the state of Jini adoption. However I do continue to hope for something to happen with regards to it’s uptake. It still seems to be very much on the fringe. Surprising, there are fortune 100 companies using Jini.

There are more airline reservations made on Jini based systems (orbitz, aa.com, nwa.com) than any other electronic system (according to some information from Orbitz). They even won the

Duke’s Choice Awards — Orbitz has been selected as a winner of the 2004 Duke’s Choice Award, recognizing the “best of the best” from among all the cool projects going on in the world of Java technology. Orbitz team members are presenting TS-2614 at JavaOne. See why they won this award.

It’s a good technology that didn’t originally come with the whizbang set of installation wizards that the current frenzy of the dot com era required. It originally required the skill and aptitude of distributed computing engineers to recognize it’s benefits which were firmly placed on the fringe. Some of the projects that are being built on top of Jini offer some great additions to problems that Jini has a competitive advantage in solving.

My personal interest, and if I ever have one of those things that consultants refer to as “Extended Research Periods” would be of how it could address some of the data warehouseing issues I face on a day to day basis. As someone knowledgable with the domain knowledge of a problem (OLAP, huge fact tables, distributed query processing) could I use the current Jini technology and enhancements (such as rio, computefarm, etc) to build a wonderful distributed BI infrastructure? 🙂 Stay tuned… perhaps I’ll have one of those periods of time coming up!

Time and Money MUST be part of an IT architecture recommendation

Why – good architecture is good architecture right? I don’t ever remember the advanced OO design and analysis courses I’ve taken covering cost as part of their curriculum. OO, Design Patterns are supposed to be universal. The scope of the project shouldn’t matter – encapsulating data is good, no matter what!

GOOD ARCHITECTURE IS GOOD ARCHITECTURE NO MATTER THE CONTEXT, right?

  • That’s what the engineers say.
  • That’s what computer science and programming books teach us.
  • As a practitioner of good OO I used to evangelize that.
  • Academics and architects say that.

How would one arrive at the conclusion that good architecture concepts, such as encapsulation, polymorphism, etc are good architectural concepts regardless of the context? We must FIRST look at the definition of what GOOD ARCHITECTURE is.

ACCORDING TO WHOM DEFINITION OF GOOD ARCHITECTURE EXAMPLES
ENGINEER A collection of systems, programs, data that supports leading application and system development methodologies, concepts, patterns, and leading thinkers to minimize defects and increase reliability and extensibility Polymorphism, encapsulation, web services, component based architecture, standards based development, etc.
CEO/SHAREHOLDER (BIZ) Capital investment(of time/money) to:

  • decrease deployment and maintenance costs and increase reliability
  • decrease time to market for internal and external applications
Virtual Clusters, enterprise monitoring, standardize on J2EE, investing in coding standards, building infrastructure services and components

Fundamentally these are not in opposition. They are complementary and hardly ever are they directly opposed. The engineers view has pieces that will enable many of the BIZ requirements for good architecture. If they line up, and they are mostly in line and are complementary, when might they be in conflict?

BIZ users evaluate their capital expense in a variety of methods… Nearly all boil down to a very simple question of ROI. What is the benefit to me over time if I spend this money today? The strategy will be (no real surprise here), to invest capital that will have maximize benefit and minimal investment (do more with less). There is no shortage of the differing calculations, tweaks, estimations involved in this process.

So let us say that as an engineer or an organization engaged in commerce (ie, an IT department at a corporation) that given there are no motives for academic excellence or intellectual achievement that the true definition of GOOD ARCHITECTURE outside those endeavors is that of the BIZ:

  • Decrease costs
  • Increase reliability
  • Decrease time to market for new technology

Again, the engineers view is not in direct contradiction. They line up quite nicely and indeed are mostly congruent. Good engineering principles, from an academic perspective, can have immense benefits to a business including decreased costs, increased reliability, and shorter implementation times.

Let us, for the sake of clarity, label these two camps that are nearly always in agreement the Academic(predominant view of an engineer) and Applied(view of a business owner or executive management) architectures.

Yeah… but you promised to tell us when they are in conflict… We’re getting there…

Ok, so when might an architectural decision be good Applied architecture but bad Academic architecture? Let’s take a practical example:

Two particular architectures supporting two separate methods (A & B) for producing widgets:

  • METHOD A : Requires X staff to produce Y widgets. Rates an 8.5 on an abstract ACADEMIC QUALITY SCALE.
  • METHOD B : Requires X-10% staff to produce Y widgets. Rates a 9.5 on the ACADEMIC QUALITY SCALE.

Let’s look at some example numbers. There is a proposal to expend capital C (10k/year) on architecture/infrastructure to enable method B. X is 5k per year.

The engineer inherently evaluates on the ACADEMIC quality, and straightforwardly evaluates (according their engineer precepts) that CAPITAL C is money well spent on GOOD ARCHITECTURE because it increases the academic quality and saves 5% on X per year.

The biz evaluates on the APPLIED model, looking for ROI. In this case, it would take 20 years to recoup the cost of the capital with no increased revenue (still Y widgets). In this example the cost required to implement method B, while significantly improving productivity and the academic quality of the architecture, is actually greater than the benefit. In this case, doing the right thing (from an academic perspective) is actually a bad architecture based on the BIZ definition.

In cases where the projected COST OF BAD ACADEMIC QUALITY (decreased output, slower response times, difficult QA periods etc) is less than the estimated/projected COST SAVINGS OR BENEFIT from the invested capital in the GOOD ACADEMIC QUALITY ARCHITECTURE then it’s a better business decision to go for poor academic quality.

GET TO A POINT!!!

I’m not saying engineers striving for good architectural concepts is bad. It’s noble in precept and may also make great business sense. One can’t put a solution up on a white board and say, trust me, it’s good architecture because depending on what your IT charter is, it might not actually be. You have to prove it – add projects estimates, efficiencies, use statistics from trade organizations. Tell the people funding the cost of the shiny well tuned machine that it’s in their best interest because they’ll get 30% more application throughput, 30% reduction in time to delivery for requests, 75% cost of maintaining systems X,Y, and Z.

UPDATE: I was thinking it would be prudent to point out that I’ve observed the most successful implementations of sound engineering techniques has always been done by groups of people who fundamentally understand the concepts, where they are currently are, where they want to go, and then just consistently and opportunistically “realize” that vision through projects over time. If you see opportunities to make things better; do well.

Oracle 10g on VMWare

I’m working on a customer site that is using VMWare. While the regular DBA group insisted that none of their production databases are hosted on VMWare, a new BI project was scheduled to be the “test” to see if Oracle products (AS/EE DB) could run successfully on these “virtual machines.”

As a developer who is aware of the burstable nature of resource utilization, the promise of VMWare is immense. Our applications often use immense amounts of I/O, CPU, and Memory while doing their real work. However, a great deal of the time they sit waiting for human operators to catch up. The idea of housing many enterprise class servers on one physical system with a central location of management/provisioning/performance evaluation/etc sounds brilliant.

Now I’ve read the VMWare product material, and an overall positive set of opinions and benchmarks from people running on vmware. From where I sit, the VMWare stack is a good solutions for most systems. If I were running a farm of JBoss, Apache, MySql, IIS or other commodity-esque servers VMWare would be an ideal solution. Actually, I should rephrase, the programs that I refer to are those that have a VERY WIDE installation base and have been installed on a diverse and very hetergenous set of “Linux” or “Windows” hardware combinations. These programs are the ones that I believe would do very well in VMWare because they are used to “funny” hardware readings, etc.

Again, this is the experience of one consultant on one project, but I feel it forms a valid recommendation all the same…

Oracle on VMWare is possible but not recommended, currently. Oracle’s official position (note 249212.1 on Metalink) is that it is supported as long as you prove to them the same bug exists on a regular PC (having to have an identical physical system to prove to Oracle that it’s not VMWare sort of defeats the purpose of VMWare now doesn’t it!)

My primary reasoning being that “Oracle is an application built to determine intimately, the structure and performance of the underlying hardware in an effort to increase it’s own performance.” It has hooks into system areas trying to poll and read Memory Usage, I/O performance, CPU utilization, etc. Oracle endeavors to understand, at a low level, the configuration and the performance of the platform that supports it. VMWare, in my experience, tends to report some “funny” readings when it comes to Hardware.

    On my project, the laundry list of “funny” behavior include:

  • df commands that hang with no Ctrl-C response. Must log in on another term and kill -9.
  • applications do not use swap space. Linux sees the swap space but does not use it.
  • CPU readings show no idle time, full system time, with oracle Idle.
  • random startup/shutdown errors with Oracle with regards to obtaining network information
  • 10g AS EM Website has difficulty keeping track of processes (nearly always, the control panels show the component as down/unavailable/etc) when the components are operating properly.
  • Server can be “rolled back” to a save point in the past without any visible warnings of such an event (when a physical server is lost, the backup team typically notifies users that it was restored from X backup). VMWare can be rolled back with a simple reboot (obivously this is configured but it has to play into the physical deployment of your Oracle to ensure you get the right behavior).

Note: I’m no OS expert, and many of these can likely be resolved by a RH 2.1 Admin with proper VMWare knowledgable. Really though, are the admin/consulting dollars well spent resolving issues?

Some may flame me for being such a “business solution” biggot, but hardware is a cheap commodity and should be factored as such The price of a few 2way Xeon servers with RAID and copious amounts of RAM could easily be overshadowed by enough admin/consulting days lost on tracking down “funny” behavior.

The saving grace for VMWare may be a feature which I have not used. Live, development snapshotting. What I wouldn’t give for a great, easy way to take live snapshots throughout time while doing my work. Development is many times, exploration, and the ability to “play it safe” with the click of a button is brilliant.

I’d love to hear from anyone out there on their experiences with Oracle on VMWare. Perhaps I’ve just had a bad go at it… Hope, for reasons of happy Oracle professionals, your experience has been better!

UPDATE: The servers have also “FROZEN ENTIRELY” a la Blue Screen of death however this is RH Enterprise Linux. Just another item on the laundry list.
UPDATE 2: Servers have frozen up again, and the systems staff can not, for the life of them, figure out why the phase of the clock is so awful.

Distributed Database Systems

dds_coverI am currently making my way through the book, “Principles of Distributed Database Systems.” It reads like an academic course textbook, as I imagine was the authors intent.

I find it fascinating… It is also a bit challenging to try and remember what it’s like to be learning new notations and abstracted academic concepts. My day to day is so grounded in building customer solutions (very practical, with good applied technique and concepts) that I have to be deliberate to keep the mind sharp.

One thing that I’m particularly enjoying about this book is that I am seeing some of the concepts that I use my role as a BI Consultant from their starting points. I’m accustomed to interpreting Oracle plans, statistics, etc and now I’m able to relate that to the abstract concepts they represent.

I have to admit though, my interest does have a particular project in mind… I’m always wanting to build something that is more clever, and better than what’s out there now… This book might help me solidify some of those thoughts and add direction to my company R&D focus.