As I mentioned in a previous post, I will be documenting various techniques for matching data across multiple sources at the same time, so I thought I would post an update on my research so far.
If you have worked on any sizable multi-source OLAP system, Data Mart or Data Warehouse (I mean more than 10 sources here), you will have encountered Merge-Match techniques before and, more importantly, rule-based Merge-Match processing.
The more I research, the more I am leaning towards creating a simple API that enables rule-based Merge-Match processing across both SQL Server and Oracle databases, initially.
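To make rule-based Merge-Match processing a little more concrete, here is a minimal Python sketch (Python only for illustration; it is not the language the API would necessarily use). It assumes rows have already been extracted from each source into dictionaries, so no SQL Server or Oracle access is shown. A rule is modelled as a named predicate over one row from each source, rules are tried in priority order, and the first rule that succeeds is recorded for the pair. `MatchRule`, `norm`, `merge_match` and the columns `customer_no`, `name` and `postcode` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


@dataclass
class MatchRule:
    """Hypothetical rule type: a name plus a predicate over one row from each source."""
    name: str
    predicate: Callable[[Row, Row], bool]


def norm(value: Optional[str]) -> str:
    """Normalise a value for comparison: drop all whitespace and lower-case it."""
    return "".join((value or "").split()).lower()


def merge_match(left: List[Row], right: List[Row], rules: List[MatchRule]) -> List[Tuple[str, str, str]]:
    """Compare every left row with every right row (naive O(n*m) for clarity).
    Rules are tried in priority order; the first rule that succeeds is recorded."""
    matches = []
    for l in left:
        for r in right:
            for rule in rules:
                if rule.predicate(l, r):
                    matches.append((l["id"], r["id"], rule.name))
                    break  # stop at the highest-priority rule that matched
    return matches


# Hypothetical rules, highest priority first.
rules = [
    MatchRule("exact_customer_no",
              lambda l, r: l["customer_no"] is not None and l["customer_no"] == r["customer_no"]),
    MatchRule("name_and_postcode",
              lambda l, r: norm(l["name"]) == norm(r["name"]) and norm(l["postcode"]) == norm(r["postcode"])),
]

# Hypothetical rows, as if already extracted from SQL Server and Oracle.
sql_server_rows = [
    {"id": "S1", "customer_no": "C001", "name": "Jane Smith", "postcode": "AB1 2CD"},
    {"id": "S2", "customer_no": None, "name": "John Brown", "postcode": "EF3 4GH"},
]
oracle_rows = [
    {"id": "O1", "customer_no": "C001", "name": "JANE SMITH", "postcode": "AB12CD"},
    {"id": "O2", "customer_no": "C999", "name": "john brown", "postcode": "EF3 4GH"},
]

print(merge_match(sql_server_rows, oracle_rows, rules))
# [('S1', 'O1', 'exact_customer_no'), ('S2', 'O2', 'name_and_postcode')]
```

Rule order matters in this design: S1/O1 would also pass the name-and-postcode rule, but the exact key match is recorded because it is tried first.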
This weekend I will post a rough mock-up of some simple Merge-Match rules that could be used in this type of API, along with the kind of results and usage such an API could provide.
I suspect the key features would be:
- Simple, data-focused logic for evaluating Merge-Match candidates (more on the possibilities here in my next post on Merge-Match rules and conditional logic)
- Multi-threaded for performance reasons
- Full exception and error reporting/logging
- Potential hooks for extending the logic with things like Soundex, fuzzy and type-mismatch Merge-Match rules
- Performance metric recording
- An API that is very simple for developers to implement and use
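Several of the features listed above (an extension hook such as Soundex, multi-threaded evaluation, error logging and a basic performance metric) can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the planned API: `soundex_rule`, `evaluate_pair` and `run` are hypothetical names, the `surname` column is made up, and `soundex` implements the standard American Soundex algorithm. One caveat: CPython threads are constrained by the GIL for CPU-bound work, so in Python a `ProcessPoolExecutor` would be the usual route to a real speed-up; the thread pool here only shows the structure of evaluating candidate pairs in parallel.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("merge_match")

Row = Dict[str, str]
Rule = Callable[[Row, Row], bool]

# American Soundex digit groups; vowels, Y, H and W have no code.
_CODES = {ch: digit
          for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                 "4": "L", "5": "MN", "6": "R"}.items()
          for ch in letters}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def soundex_rule(field: str) -> Rule:
    """Hypothetical extension hook: two rows match if `field` sounds alike."""
    return lambda l, r: soundex(l[field]) == soundex(r[field])


def evaluate_pair(pair: Tuple[Row, Row], rule: Rule) -> Optional[Tuple[str, str]]:
    """Apply one rule to one candidate pair, logging (not raising) any error."""
    l, r = pair
    try:
        return (l["id"], r["id"]) if rule(l, r) else None
    except Exception:
        log.exception("Rule failed for %s / %s", l.get("id"), r.get("id"))
        return None


def run(left: List[Row], right: List[Row], rule: Rule, workers: int = 4) -> List[Tuple[str, str]]:
    """Evaluate all candidate pairs on a thread pool and log a simple timing metric."""
    start = time.perf_counter()
    pairs = [(l, r) for l in left for r in right]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: evaluate_pair(p, rule), pairs))
    matches = [m for m in results if m is not None]
    log.info("Compared %d pairs, found %d match(es) in %.4f s",
             len(pairs), len(matches), time.perf_counter() - start)
    return matches


left = [{"id": "S1", "surname": "Robert"}, {"id": "S2", "surname": "Tymczak"}]
right = [{"id": "O1", "surname": "Rupert"}, {"id": "O2", "surname": "Smith"},
         {"id": "O3"}]  # O3 has no surname, so its comparisons are logged as errors
print(run(left, right, soundex_rule("surname")))
# [('S1', 'O1')]
```

Because a failing rule is logged and treated as "no match" rather than raised, one bad row (O3 here) does not abort the whole run, which fits the goal of full exception reporting without halting processing.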
I have not yet decided which language to write this in. I would like to use F#, but I am not sure my F# skills are up to it yet.
Further information on Merge-Match techniques can be found by searching for ‘data warehouse merge match’ on Google or Bing.
Alternatively, if enough people ask, I will write a few blog entries on the subject ^^.