Manage your Apache Impala data in HDFS with Talend

Apache Impala is an open-source, massively parallel processing (MPP) SQL query engine for data stored in Apache Hadoop. It lets data analysts run low-latency, high-concurrency business intelligence queries directly on Hadoop. You can manage Apache Impala data in HDFS with Talend's suite of data integration tools.

Connecting to Apache Impala

To connect to Impala, use the tImpalaConnection component. Choose a property type (Built-In or Repository), then enter the connection details: host, port, database, and username.
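The fields tImpalaConnection asks for map naturally onto a small configuration object. The Python sketch below is illustrative only and is not how Talend implements the component: the `ImpalaSettings` class and its `jdbc_url` method are hypothetical, the default port 21050 is assumed to be Impala's usual HiveServer2 port, `auth=noSasl` assumes an unsecured (non-Kerberos) cluster, and the optional `open_connection` function assumes the third-party impyla package is installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImpalaSettings:
    """Hypothetical mirror of the fields tImpalaConnection asks for."""
    host: str
    port: int = 21050              # assumption: Impala's default HiveServer2 port
    database: str = "default"
    username: Optional[str] = None
    property_type: str = "Built-In"  # "Built-In" or "Repository"

    def __post_init__(self) -> None:
        # Reject settings the component would not accept.
        if self.property_type not in ("Built-In", "Repository"):
            raise ValueError(f"unknown property type: {self.property_type}")
        if not self.host:
            raise ValueError("host is required")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")

    def jdbc_url(self) -> str:
        # HiveServer2-style JDBC URL; auth=noSasl assumes no Kerberos/LDAP.
        return f"jdbc:hive2://{self.host}:{self.port}/{self.database};auth=noSasl"


def open_connection(settings: ImpalaSettings):
    """Open a DB-API connection with impyla (assumed installed: pip install impyla)."""
    from impala.dbapi import connect  # lazy import so the sketch runs without impyla
    return connect(
        host=settings.host,
        port=settings.port,
        database=settings.database,
        user=settings.username,
        auth_mechanism="NOSASL",
    )


if __name__ == "__main__":
    s = ImpalaSettings(host="impala.example.com", database="sales", username="analyst")
    print(s.jdbc_url())  # jdbc:hive2://impala.example.com:21050/sales;auth=noSasl
```

Validating in `__post_init__` surfaces a bad property type or port before any network call is attempted, which mirrors how a misconfigured component fails at design time rather than at run time.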


More about integrating Apache Impala data

Talend has detailed documentation on how to extract, transform, and load (ETL) your Apache Impala data to get a better view of your business.

Connecting to HDFS

HDFS (Hadoop Distributed File System) is the distributed storage layer of Apache Hadoop. Combined with core modules such as MapReduce and YARN, HDFS lets organizations build Hadoop clusters of hundreds or thousands of nodes that can handle terabyte-scale datasets. A robust ecosystem of other tools can take advantage of data stored in HDFS.
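To make terabyte-scale storage concrete: HDFS splits each file into fixed-size blocks and replicates every block across DataNodes. The sketch below estimates the block count and raw cluster storage for one file, assuming the Hadoop defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); real clusters often override both. `hdfs_footprint` is a hypothetical helper, not part of any Hadoop API.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def hdfs_footprint(file_bytes: int, block_size: int = 128 * MB,
                   replication: int = 3) -> tuple[int, int]:
    """Return (number of blocks, raw bytes stored across the cluster).

    Defaults assume Hadoop's stock dfs.blocksize and dfs.replication.
    """
    if file_bytes < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file size must be >= 0; block size and replication > 0")
    # Ceiling division in integers: a partial final block still counts as a block.
    blocks = (file_bytes + block_size - 1) // block_size
    # The last block only occupies as much disk as it holds, so raw storage is
    # file size times replication, not blocks * block_size.
    raw_bytes = file_bytes * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    blocks, raw = hdfs_footprint(1 * TB)
    print(blocks, raw // TB)  # 8192 3
```

Under these defaults, a single 1 TB file becomes 8,192 blocks and consumes about 3 TB of raw disk, spread across many nodes.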

To connect to HDFS, use the tHDFSExist component, which checks whether a file exists in a given HDFS directory. In its Component tab, enter the Hadoop distribution and version, the HDFS directory, and the name of the file you want to use.
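Outside Talend, a comparable existence check can be made against the WebHDFS REST API with the `GETFILESTATUS` operation, which returns HTTP 404 when the path does not exist. This is a sketch under stated assumptions, not the component's implementation: it assumes WebHDFS is enabled on the NameNode and reachable over plain HTTP (port 9870 is the Hadoop 3 default; Hadoop 2 used 50070) with simple `user.name` authentication. The function names `webhdfs_status_url` and `hdfs_file_exists` are hypothetical.

```python
import json
import posixpath
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_status_url(namenode: str, directory: str, filename: str,
                       user: str, port: int = 9870) -> str:
    """Build a WebHDFS GETFILESTATUS URL for directory/filename (hypothetical helper)."""
    # Normalize the directory so "/data/in/" and "data/in" give the same path.
    path = posixpath.join("/", directory.strip("/"), filename)
    query = urllib.parse.urlencode({"op": "GETFILESTATUS", "user.name": user})
    return f"http://{namenode}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def hdfs_file_exists(namenode: str, directory: str, filename: str,
                     user: str, port: int = 9870, timeout: float = 10.0) -> bool:
    """Return True if the path exists and is a file, False on HTTP 404.

    Other HTTP or network errors are re-raised rather than treated as "missing".
    """
    url = webhdfs_status_url(namenode, directory, filename, user, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = json.load(resp)["FileStatus"]
            return status["type"] == "FILE"  # a directory with that name is not a file
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Re-raising non-404 errors keeps a permissions or connectivity problem from being silently reported as a missing file.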

