Are Star Schemas actually De-Normalized?

May 16, 2013

When we discuss differences between Star Schemas and ER models, one of the first points that gets to the mind is that “Star Schemas are de-normalized”.

Is this statement completely accurate?

Star schemas are built with 2 major types of building blocks; Facts and Dimensions.

When we take dimensions, yes, some data items are duplicated. When a dimension describes a data set that is organized as a hierarchy (e.g.: Location; Country – State – District – City – Post code), there will be a lot of data duplications. That would keep the dimensions valid within the definition of Star Schemas.

What about Facts? We always try to de-normalize facts to the maximum by describing measurable contents of facts with dimensions. There are no data duplications within a detailed fact table. Of course, there are occurrences that we keep some dimension attributes in the fact table itself for performance. But that’s not as a requirement. So, fact tables are normalized.

That makes a star schema a partially de-normalized model. But we can call it de-normalized since it’s not a 100% normalized model.


How to Push BI Project Success Rates Up?

April 17, 2013

Researchers say it and a generally accepted fact that over 60% of BI projects fail. To be fair, a high percentage of overall IT projects fail. But why BI is a top domain to produce failed projects?

BI has to win 2 worlds; technical and business. As any other IT project, BI has to win CIO approval. Then we struggle to mitigate all sorts of technical issues. They would start from simple network connectivity issues between servers and go on to unsupported data types to missing links between data sets. But all such technical issues are resolvable even though at additional expenses and additional time.

But harder to conquer would be the business world. Even with 100% successful technical implementation backed up by strong business sponsorship, if a BI system isn’t used by the business users actively, then it is not successful.

How should we win users? We should work hard to get the acceptance from the crowd. By nature, for many of the users a new BI system may look like as another additional tool/ additional work/ additional burden. They have all sort of work to manage during the day. And BI is not there to help day-to-day business directly.

Of course they will be thrilled when they see the benefits they will get using BI. But we have to get them started. What we can do? A little additional effort on;

  • Making the BI applications look nice. Look and feel is important as much as the accuracy of the contents delivered. After all it’s natural to look at nicer things.
  • Delivering high performance. BI systems eliminate the conventional Business-IT-Business report request cycles. But as soon as the BI system starts to fire those “timeout” error business users go back to the old ways.
  • Making it easier to get something done. Going towards self service BI won’t be easy. It would give disastrous results. Performance will be a key concern. One user can run a query to select all the rows of a multi terabyte transaction table and kill the whole BI server. To avoid that administrator will have to make extra effort. But that’s an investment we have to make protect our BI system.

OWB Error; RTC-5325

December 7, 2012

Another error caused by little mistakes.

Sometimes, when deploying a process flow, Control Center Manager will complain;

RTC-5325: The generator was unable to produce any valid objects, please correct the source object(s) before trying to deploy again.

There isn’t much help even from the detailed error message.  Actually it’s little misguiding too; at least in some cases.

Process flow is validated successfully.

All the dependant objects; mapping/ tables/ procedures are fine. Re-deploying all the dependant objects also won’t solve the problem.

Issue is not with the dependant objects of the process flow being deployed.

When a process flow is deployed, whole process flow package that the process flow belongs to is deployed. Therefore even if the process flow we are deploying is perfectly fine, different process flow from the same process flow package can cause errors.

That’s exactly what happened. There was a different process flow package that won’t be validated.

[Solution] Easiest way to track the error is to validate the whole process flow package. In the validation log, trouble maker will be highlighted.


More OWB Mapping Variables

November 30, 2012

In addition to “get_audit_id” discussed in a previous article; Retrieving Audit ID within an OWB Mapping itself, much more information can be retrieved through variables defined in OWB mappings.

These variables are not detailed in the documentation. At the same time, wrapped OWB repository packages are not readable. So we get only a little about the mapping variables right away. But we can learn a lot with a little experimenting.

As same as “get_audit_id”, rest of them also can be retrieved through Expression operators.

Here’s some of the variables that can be useful.

Auditing variables;

  • get_runtime_audit_id
  • get_model_name

Processing variables (regarding the data processed by the mapping);

  • get_selected
  • get_inserted
  • get_updated
  • get_deleted
  • get_merged
  • get_errors
  • get_logical_errors
  • get_processed
  • get_total_processed_rowcount

“get_model_name” is a very useful one. It returns name of the mapping being executed. With this we can eliminate the need for passing mapping names as constants within mappings to be used in further processing.

But we have to remember mapping name returned by “get_model_name” is enclosed with double-quotes. So to be useful double-quotes should be removed first. Replace function/ transformation can be used to remove double-quotes from mapping name.

Complete list of mapping variables can be checked by opening DB packages created when mappings are deployed.


Modern Business Intelligence Landscape

November 21, 2012

Today, Business Intelligence is complex. Many new technologies/ techniques/ subjects has been developed. BI is really high-tech nowadays.

At times it can be confusing; at different places, different domains/ subjects are referred under Business Intelligence.

BI product vendors have a lot of influence on domain categorization. E.g.:Some of them refer data warehousing is out of BI domain. Some vendors refer data warehousing and data integration also as part of BI.

In my opinion, I think data integration and data warehousing also are parts of BI.

With that, this is how I see ( or I like to see :)) BI in high level.

At least enterprise level, large-scale implementation will fit into this model. In such implementation, each of the high level domain should be considered independently but keeping enough space for integration.


Installing ODI 11.1.1.6.0 on Windows 7 64bit

November 19, 2012

I’m going to try ODI again. Current ODI version is 11.1.1.6.0. Now, it has come under Oracle Fusion Middleware domain. A major difference from 10g.

My environment for ODI installation is a Intel Core i5 machine with 8 GB memory. It’s running on Windows 7 Professional 64 bit.

ODI is 11g is supported on my system according to the System Certification matrix; http://www.oracle.com/technetwork/middleware/data-integrator/odi-11gr1certmatrix-163773.xls

I got the installation media downloaded from; http://www.oracle.com/technetwork/middleware/data-integrator/downloads/index.html?ssSourceSiteId=ocomen

Since I’m running a 64 bit OS, required installation files are;

ofm_odi_generic_11.1.1.6.0_disk1_1of2.zip

ofm_odi_generic_11.1.1.6.0_disk1_2of2.zip

According to the new Middleware Architecture, now we have to create a database schemas to store a repository before starting the actual ODI installation. To create database schemas Oracle provides RCU; Repository Creation Utility.

RCU installation is available at the same URL mentioned above.

ofm_rcu_win_11.1.1.6.0_disk1_1of1.zip

A pretty structured Installation Guide is available at; http://docs.oracle.com/cd/E23943_01/core.1111/e16453/overview.htm#CEGJJIFH

So there are 2 parts of my installation.

  1. Repository creation
  2. ODI installation

The repository is created in a RDBMS. So having access to a running database is a prerequisite.

To start with the repository creation, ofm_rcu_win_11.1.1.6.0_disk1_1of1.zip is extracted.

When extracted, within bin directory, rcu.bat can be found.

rcu.bat is executed;

Welcome screen appears and “Next >” is clicked.

We can use RCU to create or drop repositories. We want to create one here.

Next >

We define where we want the repository to be created. I’ve got an Oracle 11g DB on my machine. So we can use that.

I prefer to use SYS account in scenarios like this as it can avoid a lot of issues ( caused by bugs :) ).

Next >

RCU checks the DB for requirements for creating the repository. And here, it complains that the DB hasn’t been created with character set it supposed to be for repository creation.

My DB was not created solely for the ODI installation So I cannot do much about it now. Hope we can ignore this error.

Ignore

Other than the character set issue, my DB is ready for the repository creation.

OK

We have to select which repositories to be created. Since RCU is a common tool that supports many products, we has to select what we want; Oracle Data Integrator in out case.

I’d leave the default prefix as it is. I’m building a development environment anyway.

Next >

Prerequisites are checked in another level.

OK

Passwords for the repository schemas are defined. We have only one schema here. So the it doesn’t matter the option selection here.

Next >

Some ODI internal parameter settings; I’d accept the default value except the passwords.

Next >

The tablespaces where the schemas are created are defined. Even though we can change the tablespaces through “Manage Tablespaces”, I’m going to work with the default tablespaces RCU suggests.

Next >

Of course the tablespaces RCU suggested were not on my DB. They should be created fresh.

New tablespaces are created.

OK

All set to launch the repository creation.

Create

Repository creation is going on. It won’t take much time.

It’s done. No errors :)

Close

RCU has created a tablespace of 200MB out of which 15.6 MB has been used for the repository.

Repository creation is completed. Now, ODI installation can be started.

Also, according to the Certification Matrix, I should have Oracle JDK 1.7.0+ 64bit installed on Windows 7 64 bit. So I had to download the JDK and install since I had only JRE before starting ODI installation.

Now the system is ready for the installation; ODI installation files should be extracted.

When both the files are extracted; setup.exe in Disk1 is executed. In my case, I needed to execute it with “Run as administrator” as the account I had logged in with is not in the Administrators group.

When setup.exe is executed, an error is fired;

“ERROR: Provided the JDK/JRE location while launching the installer. Invoke the setup.exe with -jreLoc parameter(s)”

According to the error message, it seems to be the setup.exe cannot find the JDK installation.

So I called the installer as instructed in the error message; from the command prompt (started with “Run as administrator :) )

setup.exe -jreLoc “C:\Program Files\Java\jdk1.7.0_09″

Nothing happened. Prompt was just returned.

In the JDK path, there is a space. This could be the issue here.

So I tried the path eliminating the space.

setup.exe -jreLoc “C:\Progra~1\Java\jdk1.7.0_09″

Same :(. Just returned to the prompt.

Another way; without the double-quotes;

setup.exe -jreLoc C:\Progra~1\Java\jdk1.7.0_09

Now something is happening.

ODI installer was launched :)

It’s a regular “Welcome” screen.

Next >

I’d skip the software updates for the moment.

Next >

Selecting what’s needed to be installed. The machine I’m installing on is my development environment as well as the data locations (sources and targets). So I’d select both “Developer Installation” and “Standalone Installation”.

Next >

My system is through with installation prerequisites check.

Next >

I usually keep big installation away from C: drive. I changed installation directory to E: drive.

Next >

Better to let the installer do as much as work. So I select to “Configure Repositories”.

Next >

Connection detailed to the repository created by RCU is provided.

Next >

Supervisor password is provided as defined during setting ODI parameters.

Next >

I’d need a work repository as I have to do my developments here.

Next >

I defined my agent as “ODIAgent1″ with the port 60000.

Next >

I’m skipping security updates for the moment.

Next >

I’m sure about it.

Yes

Almost there. Better to save the configuration details for future reference.

Install

installation goes on. Hope it’ll complete without errors. Hardly have that comfort.

So far so good…

Installation completes. And configuration process is started automatically. Configurations also has been completed successfully.

Next >

Installation completed. Better to save configurations details also.

Finish

Successful installation in the first attempt itself !!!

Now I have an ODI environment with a Master Repository and a Work Repository named WORKREP ready for development. Both repositories were created in a single database schema.

I wish ODI development also will be this easy :)

Elapsed time is showed here as 45+ minutes. But this time is with a lot of disturbances. Actual installation only took around 15 minutes on my Core i5, 8GB machine.


Retrieving Audit ID within an OWB Mapping itself

November 16, 2012

OWB generates and stores a lot of Audit data on deployments and executions in its own repository. All OWB audit data is accessed through an “Audit_ID”.

We can find the Audit_ID of a mapping execution by searching audit tables using mapping name and execution time. But at times it can be not so accurate. It’d be much better if we can record the Audit_ID within each mapping execution itself.

How can we retrieve Audit_ID within a mapping? Not that hard; OWB generates/ processes the information we need. We just have to pick what we need.

Here’s a simple mapping to retrieve Audit ID within a mapping itself.

Primarily, this mapping loads data from T1 to T2. It’s a simple one-to-one loading.

At the same time, it loads Audit_ID into a separate audit table; tAudit.

EXPRESSION operator retrieves “Audit_ID” as “OUT1” Output attribute. “OUT1” is loaded into tAudit.

CONSTANT operator returns a constant “C1” as a dummy input attribute for “Expression” operator. C1 is never used actively.

The picture below, shows details the EXPRESSION operator.

OUT1 attribute has been defined as “get_audit_id“; “get_audit_id” is a variable defined in the mapping (the PL/SQL package created on the database for the mapping). During the mapping execution, Audit_ID generated is passed to OUT1.

When the mapping is executed; it shows the total number of records loaded by the mapping; (data records + audit records).

Note the Audit ID of the mapping execution; 335. This should be available in the tAudit table too.

There it is… :)


Follow

Get every new post delivered to your Inbox.