Change Data Capture vs. Change Tracking: Three Real-World Examples
Businesses need the data behind their daily operations to be reliably up to date. That’s why two relatively new technologies, change tracking and change data capture (CDC), have become vital tools for tracking changes made to the database systems at the heart of modern corporate IT.
Although they are similar in concept and function, change tracking and CDC differ in important ways. Let’s take a brief look at each of them and see how they compare.
Change Tracking | Change Data Capture |
---|---|
Creates a hidden, internal table so users can query the time and type of any change made to a row in the database table | Identifies and captures just the most recent production data and metadata changes, and then enables data replication software to copy those changes to a separate data repository |
Only stores the most recent change to each row – doesn’t maintain change history | Maintains a history of row changes, including the actual data that was changed |
Operates synchronously, so that change information is refreshed in real time, coincident with the incorporation of that change into the database | Operates asynchronously, reading the database’s transaction log to detect when a change has occurred |
Change Tracking (CT)
When Change Tracking is enabled for a particular database table, it creates a hidden internal table that can be queried to ascertain the time and type of any change made to a row in the database table. CT is often described as a “lightweight” solution because it only stores the most recent change to each row – no change history information is maintained.
A particular advantage of CT is that it operates synchronously, so that change information is refreshed in real time, coincident with the incorporation of that change into the database.
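To make the "latest change only" behavior concrete, here is a minimal Python sketch. It is a toy model, not any vendor's implementation: `TrackedTable`, `ChangeRecord`, and the integer `version` counter (standing in for the change time) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChangeRecord:
    version: int    # hypothetical counter standing in for "when" the change happened
    operation: str  # "I" (insert), "U" (update) or "D" (delete)


class TrackedTable:
    """Toy table whose writes also update a side table of latest changes."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.changes: Dict[int, ChangeRecord] = {}  # at most one entry per row key
        self.version = 0

    def _track(self, key: int, operation: str) -> None:
        # Called from inside the write itself, so tracking is synchronous.
        self.version += 1
        # Overwrites any earlier record for this key: no history is kept.
        self.changes[key] = ChangeRecord(self.version, operation)

    def upsert(self, key: int, values: dict) -> None:
        operation = "U" if key in self.rows else "I"
        self.rows[key] = values
        self._track(key, operation)

    def delete(self, key: int) -> None:
        if self.rows.pop(key, None) is not None:
            self._track(key, "D")


table = TrackedTable()
table.upsert(1, {"name": "Ada"})
table.upsert(1, {"name": "Ada L."})  # replaces the earlier "I" record for row 1
table.upsert(2, {"name": "Grace"})
table.delete(2)                      # replaces the "I" record for row 2
```

After these four writes the side table holds only two records, one per row, and nothing about the intermediate states survives.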
Change Data Capture (CDC)
In contrast to CT, CDC maintains a history of row changes, including the actual data that was changed. It does this by asynchronously reading the database’s transaction log to detect when a change has occurred. Since CDC is not directly involved in database update activity, its use does not impose any significant performance penalty.
A potential disadvantage is that because CDC retains history information, storage overhead is higher than with CT. Also, due to CDC’s asynchronous update mechanism, there can be a brief delay between the time when the database is changed and when that change is reflected in the CDC change tables.
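The trade-off described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `Database`, `LogEntry`, and `CaptureJob` are hypothetical names, the "transaction log" is just a list, and a real CDC product reads a far richer log format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LogEntry:
    lsn: int                          # position in the (toy) transaction log
    operation: str                    # "I", "U" or "D"
    key: int
    before: Optional[Dict[str, Any]]  # row image before the change
    after: Optional[Dict[str, Any]]   # row image after the change


@dataclass
class Database:
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)

    def write(self, key: int, values: Optional[Dict[str, Any]]) -> None:
        """Insert or update a row; pass values=None to delete it."""
        before = self.rows.get(key)
        if values is None:
            if before is None:
                return  # nothing to delete, nothing to log
            del self.rows[key]
            operation = "D"
        else:
            self.rows[key] = values
            operation = "U" if before is not None else "I"
        # The write path only appends to the log; it never touches change tables.
        self.log.append(LogEntry(len(self.log) + 1, operation, key, before, values))


@dataclass
class CaptureJob:
    change_table: List[LogEntry] = field(default_factory=list)
    last_lsn: int = 0

    def poll(self, db: Database) -> int:
        """Read log entries past the last position and keep all of them."""
        new_entries = [entry for entry in db.log if entry.lsn > self.last_lsn]
        self.change_table.extend(new_entries)  # history accumulates here
        if new_entries:
            self.last_lsn = new_entries[-1].lsn
        return len(new_entries)


db = Database()
job = CaptureJob()
db.write(1, {"name": "Ada"})
db.write(1, {"name": "Ada L."})
lag = len(db.log) - len(job.change_table)  # changes not yet visible to consumers
captured = job.poll(db)
```

Until `poll` runs, both changes exist only in the log, which is the brief delay mentioned above. Once it runs, the change table keeps every entry with its before and after images, which is where the extra storage goes.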
CT and CDC examples
One advantage common to both CT and CDC is that when distributing or replicating data from a source database, only the changed data (rather than the entire database) needs to be transmitted.
CT is well suited for applications that require notice of a database change, but which don’t need a change history. A good example is an application that runs periodically (as opposed to continuously). Each time such an application is executed, it can query a CT-enabled database to see which rows were changed, and then, if necessary, retrieve data only from the rows that were updated.
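The periodic pattern can be sketched with Python's built-in `sqlite3` module. SQLite has no change tracking feature of its own, so this example emulates one with triggers; the `customers` and `customers_ct` tables, the `version` column, and the `sync` function are assumptions made for illustration, not part of any product.

```python
import sqlite3

# Triggers keep one tracking row per customer, updated within the same write.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers_ct (id INTEGER PRIMARY KEY, op TEXT, version INTEGER);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT OR REPLACE INTO customers_ct (id, op, version)
    VALUES (NEW.id, 'I', (SELECT COALESCE(MAX(version), 0) + 1 FROM customers_ct));
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT OR REPLACE INTO customers_ct (id, op, version)
    VALUES (NEW.id, 'U', (SELECT COALESCE(MAX(version), 0) + 1 FROM customers_ct));
END;
CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT OR REPLACE INTO customers_ct (id, op, version)
    VALUES (OLD.id, 'D', (SELECT COALESCE(MAX(version), 0) + 1 FROM customers_ct));
END;
"""


def sync(source: sqlite3.Connection, target: dict, last_version: int) -> int:
    """Apply rows changed after last_version to target; return the new mark."""
    changes = source.execute(
        "SELECT ct.id, ct.op, ct.version, c.name "
        "FROM customers_ct AS ct LEFT JOIN customers AS c ON c.id = ct.id "
        "WHERE ct.version > ? ORDER BY ct.version",
        (last_version,),
    ).fetchall()
    for row_id, op, version, name in changes:
        if op == "D":
            target.pop(row_id, None)  # row no longer exists at the source
        else:
            target[row_id] = name     # fetch current data for changed rows only
        last_version = version
    return last_version


source = sqlite3.connect(":memory:")
source.executescript(SCHEMA)
target = {}  # stand-in for the downstream copy

source.execute("INSERT INTO customers VALUES (1, 'Ada')")
source.execute("INSERT INTO customers VALUES (2, 'Grace')")
first_mark = sync(source, target, 0)          # first scheduled run

source.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
source.execute("DELETE FROM customers WHERE id = 2")
second_mark = sync(source, target, first_mark)  # next scheduled run
```

Each run remembers the highest version it has seen and asks only for rows changed since then, so the second run touches two rows regardless of how large `customers` is.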
On the other hand, CDC is most useful for transactional applications where maintaining historical data is important.
Here are some examples:
Data Extraction and Synchronization – When only current data is needed for analytics applications, or to keep separate databases in sync, CT may be the best solution. The target application or system can stay current by retrieving data only from those source database rows that have changed.
Data Warehouse and ETL – In general, a data warehouse that incorporates transactional or business intelligence (BI) information is required to maintain historical as well as current data. For these, the ETL processes used to initially load and continuously refresh the warehouse will normally benefit from incorporating CDC.
Business Intelligence (BI) – CDC is particularly useful in extracting data updates from a mainframe into a Hadoop cluster so that BI analytics doesn’t consume costly mainframe CPU cycles.
Our CDC solution
Precisely Connect captures data from IBM Db2 for z/OS, IBM Db2 for i, VSAM data sets, and other sources, and reliably replicates it, in near real time, to data warehouse and database targets such as Hadoop and Microsoft SQL Server. It does so with minimal impact on database performance and network bandwidth.
To learn more, read our eBook: Streaming Legacy Data for Real-Time Insights