Friday, November 7, 2008

Event Stream Processing: Scalable Alternative to Data Warehouses?

I read an InfoQ article, Event Stream Processing: Scalable Alternative to Data Warehouses?, posted by Sadek Drobi, in which he discusses a blog post by Dan Pritchett suggesting an alternative to data warehousing applications. Pritchett acknowledges that data sometimes needs to be aggregated in order to be analyzed, but argues that the way Extract, Transform and Load (ETL) software works incurs costs in terms of scalability and responsiveness. He said: "First, the ETL places a significant load on your production databases. If your business has nice offline windows for the ETL, that's great, but if not, managing the scale becomes a challenge. Second, the freshness of the warehouse is typically 24 hours behind or more. As your business grows this lag will grow as well."

Dan Pritchett believes there could be a less expensive and more scalable solution: processing streams of events with an Event Stream Processor (ESP).

He also said: "ESP analyze streams of events using a language similar to SQL. In the same manner that databases and data warehouses use SQL to perform analysis of data tables, ESP use their query language to analyze streams of events. The simplest way to understand ESP is to think of events as rows in a table and the attributes of an event as the columns. Each event type is the equivalent of a table."
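To make the table analogy concrete, here is a minimal Python sketch of what a continuous query over a stream might look like. It is not Pritchett's code and it does not use any real ESP product. The OrderEvent type, its fields and the SlidingWindowSum class are names I made up for illustration, and I assume events arrive in timestamp order. The event type plays the role of the table, its fields play the role of the columns, and the class behaves roughly like a "SELECT product, SUM(amount) ... GROUP BY product" restricted to the last N seconds.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    # Hypothetical event type: the "table". Its fields are the "columns".
    timestamp: int      # seconds since some arbitrary epoch
    product: str
    amount_cents: int   # integer cents, to avoid float rounding


class SlidingWindowSum:
    """Running per-product totals over the last `window_seconds` seconds.

    Assumes events are delivered in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._events = deque()     # events currently inside the window
        self._totals = Counter()   # product -> sum of amount_cents
        self._counts = Counter()   # product -> number of events in window

    def on_event(self, event: OrderEvent) -> dict:
        # Add the new "row" to the window.
        self._events.append(event)
        self._totals[event.product] += event.amount_cents
        self._counts[event.product] += 1

        # Evict events that have fallen out of the time window.
        cutoff = event.timestamp - self.window_seconds
        while self._events and self._events[0].timestamp <= cutoff:
            old = self._events.popleft()
            self._totals[old.product] -= old.amount_cents
            self._counts[old.product] -= 1
            if self._counts[old.product] == 0:
                del self._totals[old.product]
                del self._counts[old.product]

        # The current answer to the continuous query.
        return dict(self._totals)
```

Each call to on_event returns the up-to-date result, so the answer is refreshed as events arrive instead of once per nightly batch. Notice that evicted events are simply forgotten, which is the price paid for that freshness.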

Dan points out, however, that this approach does not allow historical analysis, which is what gives a perspective on business activity different from the one considered in real time.

I think it could be an interesting approach but, as noted in a comment on the post, you can also use ESP to replace ETL while keeping the data warehouse intact. That preserves the ability to do historical analysis, which you lose if you drop the data warehouse.
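As a rough illustration of that hybrid idea, the Python sketch below loads each event into a warehouse table as it arrives, instead of waiting for a nightly ETL batch. Everything in it is an assumption of mine: an in-memory SQLite database stands in for the warehouse, and the daily_sales table and the open_warehouse, load_event and product_history functions are hypothetical names, not part of any product mentioned in the post.

```python
import sqlite3


def open_warehouse() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real data warehouse (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales ("
        " day TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " total_cents INTEGER NOT NULL,"
        " PRIMARY KEY (day, product))"
    )
    return conn


def load_event(conn: sqlite3.Connection, day: str, product: str,
               amount_cents: int) -> None:
    # Fold one event into the aggregate row as soon as it arrives.
    cur = conn.execute(
        "UPDATE daily_sales SET total_cents = total_cents + ?"
        " WHERE day = ? AND product = ?",
        (amount_cents, day, product),
    )
    if cur.rowcount == 0:
        # First event for this (day, product): create the row.
        conn.execute(
            "INSERT INTO daily_sales (day, product, total_cents)"
            " VALUES (?, ?, ?)",
            (day, product, amount_cents),
        )
    conn.commit()


def product_history(conn: sqlite3.Connection, product: str) -> list:
    # Historical analysis still works, because the warehouse is kept.
    rows = conn.execute(
        "SELECT day, total_cents FROM daily_sales"
        " WHERE product = ? ORDER BY day",
        (product,),
    )
    return rows.fetchall()
```

Here the production database is never queried in bulk, the warehouse lags by one event rather than by a day, and a query such as product_history(conn, "book") can still look back over past days.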
