I was asked yesterday just how much data warehousing, and indeed BI, knowledge I have, and how long I have worked on issues relating to workflow systems, logic systems, expert systems, compilers, interpreters and data consolidation technologies and products. So here is a brief overview.
I started out in IT a good 20 years ago, and during that time I must admit I have seen a great many technologies.
I was fortunate in that I was introduced to large and very large data (and sets of data) when I worked as a Principal Consultant for a market-leading workflow and document management solutions provider in the 90s. During this time I worked on what was then regarded as one of the leading workflow systems, used in some banks for document management and also for business process.
Remember that we are talking about the 1990s here, before the advent of tools like WWF (Windows Workflow Foundation). What was also remarkable about that system, which I helped to architect solutions with and whose code base I worked on, is that it worked natively in Java and on the Microsoft platforms as well. It had a C-based core API, with extended APIs in Java and, at the time, Visual Basic of all things.
I will say this, however: it opened my eyes to logic/rules engines in a very big way and started my learning process on how to write:
- complex text parsing/scanning systems
- interpreters (script engines and logic engines)
- workflow systems (automated, and exception driven)
- data integration from multiple sources into a single master source/copy of data.
It was also while working at this company that I helped a large group integrate many sources of information (we are talking between 50 and 100 separate and disparate sets of data) into a large normalized data structure, with sub-structures, that held a single master copy of the data, audit data recording how the master copy was created, and references to all the sources used in that master. (I had emails, data feeds, satellite feeds and direct entry to match/merge for a specific business-driven purpose.) I also worked on implementing a full workflow system for a national postal service, which used our workflow/rule engine APIs and UIs to manage its postal sorting, routing and delivery. That is an immense amount of data and processing that most people take for granted, and it was a very humbling process.
At the time it caused me no end of pain to figure out with the teams I was working with, but in the end I learnt the true power of integrating multiple sources into one, and indeed how a real workflow system and rules engine can be applied.
Ironically, a good 12 years later I am still working in the same area of technology. I have implemented interpreters/compilers and rule engines in many places, as well as parsing engines, integration engines and workflow systems. (And far too many metric-based systems related to data imported, transformed or used over time…)
I guess that’s why I am now documenting some of the more basic and publicly common architectures for building these kinds of systems.
I do admit that, personally, I really enjoy the challenges in these systems, especially combining interpreter architecture with workflow, rules engines and parsing of data. This blog is based on a new journey that I am making as I look further into these areas.
It is also nice to see that the industry as a whole has, in the past 3 years, moved towards adopting DLRs, workflows and business rule engines for most aspects of software delivery.
One of the more notable data warehousing systems I have a lot of understanding of, and respect for, is Oracle’s OWB (Oracle Warehouse Builder) product for 10g and 11g. It has, in my opinion, one of the most flexible matching and single-master-copy processing/rule engine systems I have ever encountered.
In fact, the match/merge API I am thinking of creating here is driven a lot by what I have seen in that product and in many other referenced products in the EDM/MDM/ETL/E-TL and BI marketplace.
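To make the match/merge idea a little more concrete, here is a minimal sketch in Python of consolidating records from several sources into a single master copy, keeping an audit trail of how each master value was chosen and a reference to every contributing source. Everything in it is an assumption for illustration only: the `MasterRecord` class, the `match_key` rule (match on e-mail address) and the "first non-empty value wins" merge rule are hypothetical, and none of it reflects how OWB or any other product actually does it.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """A single master copy plus the audit trail of how it was built."""
    key: str
    values: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)   # (field, value, source) per decision
    sources: set = field(default_factory=set)   # every source that contributed


def match_key(record):
    # Hypothetical match rule: same e-mail address, ignoring case and whitespace.
    return record["email"].strip().lower()


def merge(records):
    """Match records from many sources and merge them into master records.

    Hypothetical merge rule: the first non-empty value seen for a field wins.
    """
    masters = {}
    for rec in records:
        key = match_key(rec)
        master = masters.setdefault(key, MasterRecord(key))
        master.sources.add(rec["source"])
        for name, value in rec.items():
            if name == "source" or value in (None, ""):
                continue  # skip bookkeeping fields and empty values
            if name not in master.values:
                master.values[name] = value
                master.audit.append((name, value, rec["source"]))
    return masters


# Two made-up feeds describing the same person.
feeds = [
    {"source": "crm", "email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""},
    {"source": "billing", "email": "ann@example.com", "name": "A. Lee", "phone": "555-0100"},
]
masters = merge(feeds)
```

In this sketch the two feed rows collapse into one master record: the name comes from the first source, the phone number from the second, and the audit list records which source supplied each value. A real system would need far richer match rules (fuzzy matching, weighting, survivorship by source priority), but the shape of the data is the same.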
All of the techniques and processes I am showing here are actually pretty old (quite a lot originated in the golden computing era of the 70s); I am just using more up-to-date technology.
If you Google any of these technologies and techniques, you will find many solutions to these problems. Almost all are now in the public domain and are food for thought for those creating their own systems.