Another key aspect of matching is data cleansing. By cleansing I mean normalizing and correcting the data as much as possible. Wherever you can, you want the data in the sources being matched to be reliable.
If it is not, every match is potentially incorrect.
Yes, it seems obvious, but many people do not take this into account even when doing simple duplicate matching within a single source, let alone across multiple data sources.
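To make the point concrete, here is a minimal sketch in Python of what cleansing before matching could look like. The function names `normalize` and `find_duplicates` and the sample records are hypothetical, made up purely for illustration. The cleansing rules shown (accent stripping, case folding, punctuation and whitespace clean-up) are only one possible choice; real rules depend on the data at hand.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(value: str) -> str:
    """Return a cleansed form of a text field, for comparison only."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces so "Martinez." equals "Martinez".
    no_punct = re.sub(r"[^\w\s]", " ", stripped)
    # Fold case and collapse runs of whitespace into single spaces.
    return " ".join(no_punct.casefold().split())


def find_duplicates(records):
    """Group record ids whose normalized name is identical."""
    groups = defaultdict(list)
    for record_id, name in records:
        groups[normalize(name)].append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data: (id, name) pairs from a single source.
records = [
    (1, "José  Martínez"),
    (2, "jose martinez"),
    (3, "Jose Martinez."),
    (4, "Ann O'Neil"),
]
duplicates = find_duplicates(records)
print(duplicates)
```

Comparing the raw strings would find no duplicates among these records at all. Comparing the normalized keys puts records 1, 2 and 3 in the same group, which is the whole argument for cleansing first.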
As part of the merge-match blog entries I am writing, I will also elaborate on various cleansing and data quality checks that can be made. There are a great many techniques here; most come from hardcore ETL and data warehousing systems.
And as always, a lot of reference data, and indeed reference systems, can be found online.