Manage your Greenplum data in HDFS with Talend

Greenplum is a massively parallel processing (MPP) database based on PostgreSQL that delivers analytics on large datasets. Its MPP architecture distributes data and query workload across multiple segment servers so that a system’s resources are used in parallel. The Greenplum Database software can be deployed on on-premises hardware or cloud servers, and parent company Dell EMC also sells it as part of a hardware bundle. Manage Greenplum data in HDFS with Talend's suite of data integration tools.

Connecting to Greenplum

To connect to a Greenplum database, use the tGreenplumConnection component. Choose a property type (Built-In or Repository), then enter the connection details: host, port, database, schema, username, and password.
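Outside Talend Studio, the same connection details can be sketched in a few lines of Python. Because Greenplum is based on PostgreSQL, this sketch assumes a PostgreSQL-style connection URI is accepted by your client driver. The class name `GreenplumSettings`, the host name, and the credentials are hypothetical placeholders and are not part of any Talend API.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class GreenplumSettings:
    """Hypothetical holder for the fields the connection component asks for."""
    host: str
    port: int
    database: str
    schema: str
    username: str
    password: str

    def to_url(self) -> str:
        # Percent-encode credentials so characters such as '@' or '/'
        # do not break the URI structure.
        user = quote(self.username, safe="")
        pwd = quote(self.password, safe="")
        # Assumption: the schema is selected by setting search_path
        # through the PostgreSQL "options" connection parameter.
        options = quote(f"-c search_path={self.schema}", safe="")
        return (
            f"postgresql://{user}:{pwd}@{self.host}:{self.port}"
            f"/{self.database}?options={options}"
        )


# Example values only; replace them with your own environment's details.
settings = GreenplumSettings(
    host="gp-master.example.com",
    port=5432,
    database="analytics",
    schema="public",
    username="etl_user",
    password="p@ss/word",
)
print(settings.to_url())
```

The resulting string can be handed to a PostgreSQL-compatible client library. Keeping the fields in one object mirrors how a single connection definition is reused across a job.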

Learn more about connecting to Greenplum

More about integrating Greenplum data

Talend provides detailed documentation on building ETL (extract, transform, load) jobs for your Greenplum data, giving you a better view of the business.

Connecting to HDFS

Apache HDFS (Hadoop Distributed File System) is the distributed storage layer of Apache Hadoop, a software framework for distributed storage and processing of big data. In combination with MapReduce, YARN, and other core modules, HDFS lets organizations build Apache Hadoop clusters of hundreds or thousands of nodes that can handle datasets of terabyte size. A robust ecosystem of other tools can take advantage of data stored in HDFS.

To connect to HDFS, open the Component tab of the tHDFSExist component, which checks whether a file exists in a given HDFS directory. Enter the Hadoop distribution and version, the HDFS directory, and the name of the file you want to use.
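For comparison, a similar existence check can be sketched outside Talend with Hadoop's WebHDFS REST API and its `GETFILESTATUS` operation. This is a minimal sketch that assumes WebHDFS is enabled on the NameNode and that no Kerberos authentication is required. The function names, host name, and paths are hypothetical, and port 9870 is the usual NameNode HTTP port on Hadoop 3 (your cluster may differ).

```python
import posixpath
import urllib.error
import urllib.request
from urllib.parse import quote


def webhdfs_status_url(namenode_host, http_port, directory, filename):
    """Build the WebHDFS GETFILESTATUS URL for a file in an HDFS directory."""
    # Join directory and file name into one absolute HDFS path.
    path = posixpath.join("/", directory, filename.lstrip("/"))
    # quote() keeps '/' intact and percent-encodes spaces and similar characters.
    return (
        f"http://{namenode_host}:{http_port}/webhdfs/v1{quote(path)}"
        "?op=GETFILESTATUS"
    )


def hdfs_file_exists(namenode_host, http_port, directory, filename):
    """Return True if the file exists and False if WebHDFS answers 404."""
    url = webhdfs_status_url(namenode_host, http_port, directory, filename)
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # Surface permission or server errors instead of hiding them.


# Example values only; no network call is made when building the URL.
print(webhdfs_status_url("namenode.example.com", 9870, "/user/talend/in", "data.csv"))
```

Calling `hdfs_file_exists` requires a reachable cluster, so only the URL builder is exercised here.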

Learn more about connecting to HDFS

Get more from your Greenplum data

Deliver data your organization can trust... Get started today.

Explore Talend's full suite of apps