Tuesday, November 17, 2009

More on Matching Techniques…

As I mentioned in a previous post, I will be documenting various matching techniques that work across multiple data sources at the same time, so I thought I would post an update on my research so far.

If you have worked with any sizable multi-data-source OLAP systems, data marts or data warehouses (we are talking more than 10 sources here), you will have encountered Merge-Match techniques before and, more importantly, rule-based Merge-Match processing.

As I research more and more, I am leaning towards creating a simple API that enables rule-based Merge-Match processing, initially across both SQL Server and Oracle databases.

This weekend I will post a mock-up of some simple Merge-Match rules that could be used in this type of API, along with the kind of results and usage an API such as this could provide.

Key features I suspect would be:

  • Simple, data-focused logic to evaluate Merge-Match candidates (notes on possibilities for this in my next post on Merge-Match rules/conditional logic)
  • Multi-threaded for performance reasons
  • Full exception and error reporting/logging
  • Potential hooks for extending the logic with things like Soundex, fuzzy and type-mismatch Merge-Match logic
  • Performance metric recording
  • A very simple-to-implement API for developers to use
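
To make the first and fourth bullets a little more concrete, here is a minimal Python sketch of what rule-based Merge-Match evaluation with an extension hook might look like. Everything in it is an assumption for illustration: the names `exact`, `fuzzy` and `merge_match`, the record layout and the 0.85 threshold are hypothetical, Python is used only because no language has been chosen yet, and the standard library's `difflib.SequenceMatcher` stands in for a real fuzzy or Soundex matcher. It is single-threaded and compares every pair, so it says nothing about the performance goals.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# A rule takes one record from each source and returns True when they agree.
Rule = Callable[[Record, Record], bool]


def _norm(record: Record, field: str) -> str:
    """Trim and case-fold a field so trivial differences do not block a match."""
    return record.get(field, "").strip().casefold()


def exact(field: str) -> Rule:
    """Hypothetical rule: the normalised field values are identical."""
    def rule(a: Record, b: Record) -> bool:
        return _norm(a, field) == _norm(b, field)
    return rule


def fuzzy(field: str, threshold: float = 0.85) -> Rule:
    """Hypothetical extension hook: similarity ratio is at least `threshold`."""
    def rule(a: Record, b: Record) -> bool:
        ratio = SequenceMatcher(None, _norm(a, field), _norm(b, field)).ratio()
        return ratio >= threshold
    return rule


def merge_match(source_a: List[Record], source_b: List[Record],
                rules: List[Rule]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) for which every rule passes."""
    return [
        (i, j)
        for i, a in enumerate(source_a)
        for j, b in enumerate(source_b)
        if all(rule(a, b) for rule in rules)
    ]


# Made-up rows standing in for two different data sources.
sql_rows = [{"name": "Jon Smith", "zip": "30301"},
            {"name": "Ann Lee", "zip": "10001"}]
ora_rows = [{"name": "John Smith", "zip": "30301"},
            {"name": "Anne Leigh", "zip": "94105"}]

matches = merge_match(sql_rows, ora_rows, [exact("zip"), fuzzy("name")])
print(matches)  # [(0, 0)]
```

Because each rule is just a function of two records, a Soundex or type-mismatch rule could be plugged in the same way without touching `merge_match`.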

I have not yet decided which language to do this in. I would like to do it in F#, but I am not sure my F# coding skills are sufficient.

Further information on Merge-Match techniques can be found by searching for ‘data warehouse merge match’ on Google or Bing.

Or, if I get enough requests, I will do a few blog entries on the subject ^^.
