Manage your HDFS data in Databricks with Talend

The Hadoop Distributed File System (HDFS) is the storage layer of Apache Hadoop, a software framework for distributed storage and processing of big data. In combination with core modules such as MapReduce and YARN, HDFS lets organizations build Hadoop clusters of hundreds or thousands of nodes that can handle terabyte-scale datasets. A robust ecosystem of other tools can take advantage of data stored in HDFS. With Talend's suite of data integration tools, you can manage your HDFS data in Databricks.

Connecting to HDFS

To connect to HDFS, use the Component tab of the tHDFSExist component. Enter the Hadoop distribution and version, the HDFS directory, and the name of the file you want to check.
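Outside of Talend, the same existence check can be performed against HDFS directly over Hadoop's WebHDFS REST API. The sketch below is a minimal illustration, not Talend's implementation; the host name and file path are placeholders for your own cluster.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def webhdfs_status_url(namenode_host, path, port=9870):
    """Build the WebHDFS GETFILESTATUS URL for an HDFS path.

    9870 is the default NameNode web port in Hadoop 3.x
    (older Hadoop 2.x clusters typically use 50070).
    """
    query = urlencode({"op": "GETFILESTATUS"})
    return f"http://{namenode_host}:{port}/webhdfs/v1{path}?{query}"


def hdfs_file_exists(namenode_host, path, port=9870):
    """Return True if the file exists, False on a 404 from the NameNode."""
    url = webhdfs_status_url(namenode_host, path, port)
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:  # WebHDFS reports a missing file as 404
            return False
        raise


# Example (requires a reachable cluster; host and path are placeholders):
# hdfs_file_exists("namenode.example.com", "/data/input/orders.csv")
```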

Learn more about connecting to HDFS

More about integrating HDFS data

Talend provides detailed documentation on how to ETL your HDFS data for a better view of your business.

Connecting to Databricks

Databricks, the leader in unified analytics, was founded by the original creators of Apache Spark™.

The Talend tDBFSConnection component connects to DBFS, the Databricks File System. DBFS components are designed for quick and straightforward data transfer with Databricks. For more sophisticated scenarios, you can also run Spark Jobs on Databricks.
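For a sense of what a DBFS connection involves under the hood, the sketch below lists a DBFS directory through the Databricks REST API's DBFS List endpoint. This is an illustration, not what tDBFSConnection generates; the workspace URL, personal access token, and path are placeholders for your own workspace.

```python
import json
import urllib.request


def dbfs_list_request(workspace_url, token, path):
    """Build an authenticated request against the DBFS List API
    (GET /api/2.0/dbfs/list). workspace_url and token are
    placeholders for your Databricks workspace and access token."""
    url = f"{workspace_url}/api/2.0/dbfs/list?path={path}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def dbfs_list(workspace_url, token, path):
    """Return the file entries under a DBFS path (requires a live workspace)."""
    req = dbfs_list_request(workspace_url, token, path)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("files", [])


# Example (placeholders; requires a reachable workspace):
# dbfs_list("https://adb-1234567890.0.azuredatabricks.net",
#           "<personal-access-token>", "/FileStore")
```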

Work with your HDFS data

Computing data with the Hadoop Distributed File System
Create a file in a defined directory, load it into HDFS, retrieve it, store it in another local directory, and read it at the end of the Job. See how >

Get more from your HDFS data

Deliver data your organization can trust... Get started today.

Explore Talend's full suite of apps