Tuesday, December 01, 2009

Very Large Transaction Systems…

Well, for the past year I have been working on a data management product that has to deal with a ridiculous amount of data and data transactions.

The product is aimed at the finance industry and is designed to work with financial market data and financial reference data feeds, some of which can run to multiple millions of entries per file in extreme cases.

And of course, the more feeds you process and match-merge, the more data transactions (inserts, updates, removals, indexing, relational mapping and querying) you generate; the transaction volume grows far faster than the raw feed volume, as the sketch below illustrates.
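
As a rough illustration of why the transaction count balloons, here is a minimal Python sketch of a match-merge between two feeds keyed on a security identifier. The field names (isin, price) and the in-memory dictionary are purely hypothetical stand-ins for a real feed schema and store.

```python
# Minimal match-merge sketch: combine two feeds keyed on a
# (hypothetical) security identifier. Every matched row becomes an
# update and every unmatched row an insert, so each extra feed adds
# another full round of insert/update/index work downstream.

def match_merge(master, feed, key="isin"):
    merged = {row[key]: dict(row) for row in master}
    inserts = updates = 0
    for row in feed:
        k = row[key]
        if k in merged:
            merged[k].update(row)   # one UPDATE per matched row
            updates += 1
        else:
            merged[k] = dict(row)   # one INSERT per new row
            inserts += 1
    return list(merged.values()), inserts, updates

master = [{"isin": "US0378331005", "price": 199.9}]
feed = [{"isin": "US0378331005", "price": 201.2},
        {"isin": "GB0002634946", "price": 5.1}]
rows, ins, upd = match_merge(master, feed)
print(f"{ins} insert(s), {upd} update(s), {len(rows)} rows total")
```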

Now the issue is simple: there is only so much processing that can be done at certain parts of a transactional process before you hit some pretty major latency and processor performance issues. And shortly after those are resolved, you tend to hit physical infrastructure limits.

A case in point: using SQL Server bulk copy on an optimized table, you can import data at incredible speed. However, if you need to process that data, mark it up in some way and merge it with other data, performance will nose-dive. Add in basic auditing/logging support and it will drop again.
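
For what it's worth, here is a minimal sketch of the fast-import side from Python using pyodbc against SQL Server; the connection string, staging table and column names are all assumptions. pyodbc's fast_executemany gives bulk-copy-like load throughput, and the point above stands: the raw load is rarely the bottleneck, the markup/merge/audit steps are.

```python
import pyodbc  # assumes a SQL Server ODBC driver is installed

# Hypothetical connection string and staging table.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MarketData;Trusted_Connection=yes;"
)
cur = conn.cursor()

# fast_executemany batches the parameter arrays on the wire,
# giving bulk-copy-like speed into an optimized table.
cur.fast_executemany = True

rows = [("US0378331005", 201.2), ("GB0002634946", 5.1)]
cur.executemany(
    "INSERT INTO staging_prices (isin, price) VALUES (?, ?)", rows
)
conn.commit()
```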

Most people resort to one of two main architectures in this light:

  • BRBR – Bloody Row by Row Processing
    • This is the most flexible way of processing large data sets and provides many ways to improve speed, but it is still a long way behind batch processing in all but a couple of scenarios (see the sketch after this list).
    • BRBR can be optimized in the following ways
      • Multi-threading
      • Parallel Processing
      • Multi-database server
      • Cloud and Grid computing and processing
        • The above is pretty new, and not that mature yet
      • Custom data storage and retrieval tailored to the data and how it is obtained
        • There are a few of these out there, including some very powerful and fast data management and ETL-like tools
    • BRBR can suffer greatly depending upon
      • Skill of engineers
      • Database technology
      • Database latency
      • Network latency
      • Development tool used
  • Batch Processing
    • If you are able to use just a single database server, you can look at the various ways of doing batch processing
      • Batch processing can be optimized by
        • Having a great, god-like DBA
        • Database tool selection
        • Good Cloud/grid computing support
      • Batch processing can suffer greatly from
        • Auditing: any time you need to audit processing that was done as a batch, you usually have to run another batch process or a BRBR pass to generate audit data that a BRBR-based system could have produced far more easily at the point of change (see the sketch after this list)
        • Database dependency
        • System upgrades
        • Visibility into data processing/compliance
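
To make the BRBR-versus-batch trade-off concrete, here is a rough Python/pyodbc sketch of both styles against the hypothetical staging table above. The table and column names, and the audit_log table, are assumptions. The BRBR path can write its audit trail inline at the point of change and can be fanned out across threads; the set-based path is a single statement the database executes far faster, but reconstructing what changed needs a second pass or triggers.

```python
import pyodbc
from concurrent.futures import ThreadPoolExecutor

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=MarketData;Trusted_Connection=yes;")

def brbr_worker(rows):
    """Row-by-row: flexible and auditable at the point of change, but slow."""
    # One connection per worker; pyodbc connections should not be
    # shared across threads.
    conn = pyodbc.connect(CONN_STR)
    cur = conn.cursor()
    for isin, price in rows:
        cur.execute("UPDATE prices SET price = ? WHERE isin = ?",
                    price, isin)
        # Audit record written inline, at the point of change.
        cur.execute("INSERT INTO audit_log (isin, new_price) VALUES (?, ?)",
                    isin, price)
    conn.commit()

def brbr_parallel(rows, workers=4):
    """Multi-threaded BRBR: split the rows into chunks, one worker each."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(brbr_worker, chunks))

def batch_update():
    """Set-based batch: one fast statement, but no per-row audit hook."""
    conn = pyodbc.connect(CONN_STR)
    conn.cursor().execute(
        "UPDATE p SET p.price = s.price "
        "FROM prices p JOIN staging_prices s ON p.isin = s.isin"
    )
    conn.commit()
```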

That's it for now; kind of a messy post.

I will clean this up later and add a post or two on advances in cloud computing in relation to very large database processing issues.
