Manage your Hive data in HDFS with Talend

Apache Hive is a data warehouse built on top of Apache Hadoop and used for querying, summarizing and analyzing large datasets. Businesses can run Hive on hardware in their own data centers or on cloud platforms. Its HiveQL query language uses syntax similar to that of SQL. Manage Hive data in HDFS with Talend's suite of data integration tools.

Connecting to Hive

To connect to Hive, use the tHiveConnection component. Connection parameters differ depending on where your Hive data is hosted.
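As a rough illustration of the kind of parameters a Hive connection needs (host, port, and database), the sketch below assembles a HiveServer2 JDBC URL. The class name HiveJdbcUrl, the host hive.example.com, and the database name are hypothetical, and port 10000 is only the common HiveServer2 default. This is not Talend's generated code or the exact set of fields tHiveConnection exposes.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not part of Talend or the Hive JDBC driver.
class HiveJdbcUrl {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    static String build(String host, int port, String database) {
        Objects.requireNonNull(host, "host");
        Objects.requireNonNull(database, "database");
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumed values: example host, common default port, "default" database.
        System.out.println(build("hive.example.com", 10000, "default"));
    }
}
```

A hosted or secured Hive deployment would typically need extra settings (for example authentication), which is why the parameters vary by environment.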

More about integrating Hive data

Talend has detailed documentation on how to ETL your Hive data for a better view of the business.

Connecting to HDFS

HDFS (Hadoop Distributed File System) is the distributed storage layer of Apache Hadoop, a software framework for distributed storage and processing of big data. In combination with MapReduce, YARN, and the other core Hadoop modules, HDFS lets organizations build clusters of hundreds or thousands of nodes that can handle datasets of terabyte size. A robust ecosystem of other tools can take advantage of data stored in HDFS.

To connect to HDFS, use the Component tab of the tHDFSExist component, which checks whether a file exists in HDFS. Enter the Hadoop distribution and version, the HDFS directory, and the name of the file you want to use.
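To show how a directory and file name combine into a single HDFS location, here is a minimal sketch that builds an hdfs:// URI with the standard java.net.URI class. The class name HdfsPath, the NameNode host, port 8020, and the sample path are all assumptions for illustration. The sketch only builds the address: it does not contact a cluster and does not reproduce what tHDFSExist does internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of Talend or the Hadoop client API.
class HdfsPath {

    // Joins an absolute HDFS directory and a file name into an hdfs:// URI.
    static URI fileUri(String nameNodeHost, int port, String directory, String fileName) {
        if (!directory.startsWith("/")) {
            throw new IllegalArgumentException("directory must be absolute: " + directory);
        }
        // Avoid a doubled slash when the directory already ends with one.
        String dir = directory.endsWith("/") ? directory : directory + "/";
        try {
            return new URI("hdfs", null, nameNodeHost, port, dir + fileName, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid HDFS location", e);
        }
    }

    public static void main(String[] args) {
        // Assumed values: example NameNode host, a commonly used RPC port, sample path.
        System.out.println(fileUri("namenode.example.com", 8020, "/user/talend/input", "customers.csv"));
    }
}
```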

Get more from your Hive data

Deliver data your organization can trust... Get started today.
