Solution Sheet
Databricks and Precisely Connect
Breaking down data silos by integrating legacy, mainframe, and IBM i data into the Databricks Unified Data Analytics Platform for cloud-based AI and ML projects
Pipelines for Legacy Data to the Cloud
Liberate data from legacy sources for use within the Databricks Unified Data Analytics Platform and Delta Lake by building data pipelines with Connect. Connect offers a design once, deploy anywhere approach to ETL workflows, and its interface lets you define both batch and streaming ETL workflows from the same view. This visual approach to data integration means no reformatting or code generation is needed to construct high-performance data pipelines. With point-and-click transformations, you free up time and attention to focus on your rapidly expanding and evolving technology stack.
Build a Data Lakehouse with Databricks and Precisely Connect
Connect helps you build a data lakehouse by efficiently offloading data from legacy data stores to the Databricks Unified Data Analytics Platform. Onboard data from almost any source, including:
- Mainframe data: VSAM, COBOL Copybooks, mainframe fixed and sequential files
- RDBMS: Oracle, SQL, Db2, MySQL, Sybase, PostgreSQL
- Semi-structured data: JSON, XML
- Enterprise data warehouses: Teradata, IBM Netezza, Vertica, Greenplum
- Cloud: Amazon AWS, Microsoft Azure, Google Cloud Platform
- Big Data: Hadoop, Hive
- Streaming platforms: Apache Kafka
- Flat files: Fixed length, variable length, delimited
Connect takes an end-to-end managed approach to offloading data. Regardless of which source you choose, you can replicate hundreds or thousands of tables – including whole database schemas – into Databricks quickly and easily.
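To give a sense of the work involved in onboarding mainframe fixed-length files, here is a minimal Python sketch of decoding EBCDIC records by hand. It is an illustration only, not how Connect works internally: the 20-byte record layout, the field names, and the choice of the cp037 code page are all assumptions made for the example.

```python
# Hypothetical 20-byte record layout, standing in for what a COBOL copybook
# would define (assumed for illustration, not taken from any real system):
#   CUST-ID    PIC X(6)
#   CUST-NAME  PIC X(10)
#   BALANCE    PIC 9(4)   unsigned zoned decimal, whole units
LAYOUT = [("cust_id", 0, 6), ("cust_name", 6, 16), ("balance", 16, 20)]
RECORD_LENGTH = 20
CODE_PAGE = "cp037"  # assumed EBCDIC code page (US/Canada)


def parse_record(raw: bytes) -> dict:
    """Decode one fixed-length EBCDIC record into a dict of fields."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    # cp037 is a single-byte code page, so byte offsets equal character offsets.
    text = raw.decode(CODE_PAGE)
    row = {name: text[start:end].strip() for name, start, end in LAYOUT}
    # Unsigned zoned-decimal digits decode to the characters '0'..'9'.
    row["balance"] = int(row["balance"])
    return row


def parse_file(data: bytes) -> list:
    """Split a sequential file into fixed-length records and decode each one."""
    return [
        parse_record(data[offset:offset + RECORD_LENGTH])
        for offset in range(0, len(data), RECORD_LENGTH)
    ]
```

Even this small case needs an agreed layout, a code page, and numeric conversion for every field. Multiply that by hundreds of tables and the appeal of a managed approach becomes clear.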
Scale for Machine Learning Projects
Your machine learning projects require collection and native integration of complex legacy data stores at scale. Connect collects the data you need from all your legacy data stores and sends it to Databricks, which provides a scalable framework for machine learning. Connect has native integration with the Databricks Runtime. Directly in your Databricks cluster, use Connect’s intelligent engine to cleanse, transform, and prepare petabyte-scale datasets for analytics within your tight SLAs.