Sounds obvious, yet many forget this…
Take a fairly recent example: a prototype spike of functionality was built that, for a particular reason, had to read a lot of data from many sources and then write aggregated results a row, and sometimes a column, at a time. There could be millions of pieces of data being written in real time.
The issue here was how to increase performance, especially when doing many thousands of writes to a single database server at a time. In fact, we hit 400 ms latency issues, with SQL Server running out of ports and struggling to process requests and connections.
Of course, we got a performance increase by running multiple threads of updates at once, but we were still at the mercy of the latency of each single update/command.
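To see why threads alone only go so far, here is a rough back-of-the-envelope sketch. It assumes each worker issues one write at a time and waits for it to finish, and it borrows the 400 ms figure purely as an illustrative per-write latency. The function names and the worker count of 16 are made up for the example, not taken from the actual project.

```python
def max_writes_per_second(workers: int, latency_seconds: float) -> float:
    """Upper bound on throughput when every worker waits for each write to finish."""
    # Each worker completes at most 1 / latency writes per second.
    return workers / latency_seconds


def seconds_to_write(rows: int, workers: int, latency_seconds: float) -> float:
    """Best-case wall-clock time to push `rows` single-row writes through."""
    return rows / max_writes_per_second(workers, latency_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: 400 ms per write, 1 worker versus 16 workers.
    for workers in (1, 16):
        rate = max_writes_per_second(workers, 0.4)
        total = seconds_to_write(1_000_000, workers, 0.4)
        print(f"{workers} worker(s): {rate:.1f} writes/s, {total:.0f} s for 1,000,000 writes")
```

Under those assumptions, one worker tops out at 2.5 writes per second and 16 workers at 40, so a million single-row writes would still take close to seven hours. More threads raise the ceiling, but the per-command latency sets how high it can go.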
Another way around this type of processing was to do all the work as batch/set-based updates, but that raises issues with auditing the data.
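As a minimal sketch of the difference between row-at-a-time and batched writes, the example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in. That is an assumption for illustration only: the real system was SQL Server, and the table, column, and function names here are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (name TEXT, total REAL)")
    return conn


def write_row_at_a_time(conn: sqlite3.Connection, rows) -> None:
    """One statement and one commit per row: pays the per-command cost every time."""
    for name, total in rows:
        conn.execute("INSERT INTO results (name, total) VALUES (?, ?)", (name, total))
        conn.commit()


def write_as_batch(conn: sqlite3.Connection, rows) -> None:
    """All rows in a single transaction: the per-command cost is paid far less often."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO results (name, total) VALUES (?, ?)", rows)


def read_all(conn: sqlite3.Connection):
    """Return every stored row in a stable order for comparison."""
    return conn.execute("SELECT name, total FROM results ORDER BY name").fetchall()
```

Both functions leave the same rows in the table. What the batched version gives up is a separate, individually committed statement per row, which is where the auditing concern comes from.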
The best way so far was to create functions inside SQL Server as native .NET, but that tied us to the platform and was also not as useful as expected.
The above scenario, and the many other times I have run into it, is one of the reasons I am looking at cloud/parallel data systems and also Bigtable or NoSQL-based data management systems, both in memory and on disk.
The landscape is getting interesting. As always, in most cases the performance we got with SQL is more than enough, but I wanted to see just how much more performance we can get, especially when looking at very large data integration/merge/match projects, or data management and data updates from many hundreds, thousands, or hundreds of thousands of potential systems or users at once.