What is HTAP?
An htap database supports both workloads in one database, providing speed and simplicity. And today, “cloud-native HTAP” is a thing; users want an HTAP database that they can mix and match smoothly with Kafka, Spark, and other technologies in the cloud. Use cases include fraud prevention, recommendation engines for e-commerce, smart power grids, and AI.
HTAP databases work with – and, to a certain degree, are designed for – integration with streaming data sources, such as Kafka, and messaging systems used for advanced analytics, AI and machine learning such as Spark. They serve multiple analytics clients, from business analysts typing in SQL queries, to BI tools, apps, and machine learning models, which generate queries in the scores or thousands per second.
Before HTAP – Separate Transactions and Analytics
HTAP combines different kinds of data processing into one coherent whole. The two types of processing differ considerably. Transaction processing – adding and updating records in a database – demands a very high degree of reliability for single-record operations, along with accuracy and speed. “Update Sandy Brown’s current address” is an example of a transactional update.
Analytics processing, on the other hand, means looking very quickly through one or more database tables for a single record, or many records, or total counts of a type of record. “Find me all the subscribers who live in Colorado and own their own home” is an example of an analytics request.
The first effective databases, first widely used in the 1970s and 1980s, were transaction-oriented. They came to be called online transaction processing (OLTP) systems. OLTP systems were optimized to work on underpowered computers with small hard disks – by today’s standards, of course. The only analytics was through printed reports, which might be sorted on various key fields, such as by state or ZIP code.
When analytics was added on later, the transactional systems were already busy, so the data was copied onto a separate computer, running different software. These databases are called online analytics processing (OLAP) databases. Data warehouses and data marts are specialized OLAP databases that house non-operational data for analysis.
Data on OLAP systems was queried using various languages, which coalesced around structured query language (SQL). At first, analytics queries were entered directly by individual analysts; eventually, business intelligence (BI) programs were used to make querying easier. More recently, software applications generate queries of their own, often at the rate of thousands per second.
An entire process and discipline called extract, transform, and load (ETL) was created, simply to move data from OLTP to OLAP. As part of the ETL process, data owners may mix different databases of their own, externally purchased data, social signals, and other useful information. However, the use of three different silos means that data in the OLAP databases is always out of date – often from one day to one week old.