Manage your Excel data in HDFS with Talend

Virtually every business user knows Microsoft Excel, the spreadsheet component of Microsoft Office. Its ubiquity makes Excel files one of the most common formats for sharing data, and data analysts appreciate Excel's advanced statistics functions and its ability to automate actions with macros. With Talend's suite of data integration tools, you can move Excel data into HDFS and manage it there.

Connecting to Excel

To connect to an Excel spreadsheet as a source for your data, use the tFileInputExcel component and specify the file’s location. To write to an Excel spreadsheet as a destination, use the tFileOutputExcel component instead.

Learn more about connecting to Excel

More about integrating Excel data

Talend has detailed documentation on how to ETL your Excel data for a better view of the business.

Connecting to HDFS

Apache HDFS (Hadoop Distributed File System) is the distributed storage layer of Apache Hadoop, a software framework for distributed storage and processing of big data. In combination with MapReduce, YARN, and other core modules, HDFS lets organizations build Apache Hadoop clusters of hundreds or thousands of nodes that can handle datasets of terabyte size. A robust ecosystem of other tools can take advantage of data stored in HDFS.

To connect to HDFS, use the Component tab of the tHDFSExist component, which checks whether a given file exists in HDFS. Enter the Hadoop distribution and version, the HDFS directory, and the name of the file you want to use.
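As a rough illustration of how the directory and file name settings combine into a single HDFS location, here is a minimal Java sketch. It uses only the standard library and does not call any Talend or Hadoop API. The class name, method name, host, port, directory, and file name are all hypothetical; 8020 is a commonly used NameNode port, but your cluster may use a different one.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: shows how connection settings map onto an hdfs:// URI.
// It only builds a string-like URI; it does not contact a cluster.
class HdfsLocationSketch {

    static URI fileUri(String nameNodeHost, int port, String directory, String fileName) {
        // Normalize the directory so it starts and ends with a slash.
        String dir = directory.startsWith("/") ? directory : "/" + directory;
        if (!dir.endsWith("/")) {
            dir = dir + "/";
        }
        try {
            // The multi-argument constructor escapes characters that are illegal in a path.
            return new URI("hdfs", null, nameNodeHost, port, dir + fileName, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid HDFS location", e);
        }
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from any real cluster.
        URI uri = fileUri("namenode.example.com", 8020, "/user/talend/excel", "sales.xlsx");
        System.out.println(uri);
    }
}
```

In a real Job you would type these values into the component's fields rather than build the URI yourself; the sketch is only meant to show how the pieces relate.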

Learn more about connecting to HDFS

Work with your Excel data

Read an Excel file and extract data

Use tJavaRow to enter customized code and the tLogRow component to display data in the Run console.

How to extract data from specific Excel cells
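To give a feel for the kind of customized code that goes into tJavaRow, here is a minimal Java sketch. tJavaRow exposes the current row as `input_row` and the outgoing row as `output_row`; everything else here is an assumption made so the example runs on its own: the `Row` class stands in for the schema Talend would generate, the column names `customerName` and `amount` are hypothetical, and the `println` call only loosely imitates what tLogRow shows in the Run console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-in for a Talend row schema; the column names are hypothetical.
class Row {
    String customerName;
    Double amount;

    Row(String customerName, Double amount) {
        this.customerName = customerName;
        this.amount = amount;
    }
}

class TJavaRowSketch {

    // Wraps the per-row logic so it can run outside Talend Studio.
    static Row transform(Row input_row) {
        Row output_row = new Row(null, null);

        // The two statements below are the sort of code typed into tJavaRow:
        // trim and upper-case the name, and default a missing amount to 0.0.
        output_row.customerName = input_row.customerName == null
                ? null
                : input_row.customerName.trim().toUpperCase(Locale.ROOT);
        output_row.amount = input_row.amount == null ? 0.0 : input_row.amount;

        return output_row;
    }

    public static void main(String[] args) {
        // Assumed sample rows, as if read from an Excel sheet.
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("  acme ", 12.5));
        rows.add(new Row("Globex", null));

        for (Row row : rows) {
            Row out = transform(row);
            // Rough imitation of tLogRow: one pipe-delimited line per row.
            System.out.println(out.customerName + "|" + out.amount);
        }
    }
}
```

Inside Talend Studio only the two assignment statements would be entered in the component; the surrounding class and `main` method exist purely to make the sketch runnable.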

Get more from your Excel data

Deliver data your organization can trust... Get started today.

Explore Talend's full suite of apps