The Tidal Workload Automation Sqoop Adapter provides easy import and export of data from structured data stores such as relational databases and enterprise data warehouses. Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. The Sqoop Adapter allows users to automate the tasks carried out by Sqoop.
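As a rough illustration of the import and export round trip described above, the sketch below builds the command lines that Sqoop itself accepts for a table import into HDFS and a table export back to an RDBMS. It is not how the Sqoop Adapter is implemented or configured. The helper names, JDBC URL, table names, and HDFS paths are hypothetical; only the `sqoop import` and `sqoop export` options shown (`--connect`, `--table`, `--target-dir`, `--export-dir`, `--num-mappers`, `--username`) are standard Sqoop options.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir, username=None, num_mappers=1):
    """Build the argument list for importing one RDBMS table into HDFS."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string of the source database
        "--table", table,             # table to read
        "--target-dir", target_dir,   # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),
    ]
    if username:
        args += ["--username", username]
    return args


def sqoop_export_args(jdbc_url, table, export_dir, username=None):
    """Build the argument list for exporting an HDFS directory into an RDBMS table."""
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,        # JDBC connection string of the target database
        "--table", table,             # table to populate
        "--export-dir", export_dir,   # HDFS directory that holds the data to export
    ]
    if username:
        args += ["--username", username]
    return args


def as_command_line(args):
    """Render an argument list as a shell-safe command string."""
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical values, for illustration only.
import_cmd = sqoop_import_args(
    "jdbc:mysql://db.example.com/sales", "orders", "/data/raw/orders",
    username="etl", num_mappers=4,
)
export_cmd = sqoop_export_args(
    "jdbc:mysql://db.example.com/sales", "order_summary", "/data/out/order_summary",
    username="etl",
)
print(as_command_line(import_cmd))
print(as_command_line(export_cmd))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the commands are later passed to a process launcher; the transformation step between import and export would run as a separate MapReduce job.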
The Sqoop Adapter allows for the definition of the following job tasks:
Hadoop MapReduce is a software framework for writing applications that process large amounts of data (multi-terabyte data sets) in parallel on large clusters (up to thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A Tidal Workload Automation MapReduce Adapter job divides the input data set into independent chunks that are processed by the map tasks in parallel. The framework sorts the outputs of the map tasks, which are then input to the reduce tasks. Typically, both the input and output of the job are stored in a file system. The framework schedules tasks, monitors them, and re-executes failed tasks. Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of appropriate interfaces and/or abstract classes. These, and other job parameters, comprise the job configuration. The Hadoop job client then submits the job (jar/executable etc.) and configuration to YARN. The client then assumes the following responsibilities:
The MapReduce Adapter serves as the job client to automate the execution of MapReduce jobs as part of a Tidal Workload Automation managed process. The Adapter uses the Apache Hadoop API to submit and monitor MapReduce jobs with full scheduling capabilities and parameter support. As a platform-independent solution, the Adapter can run on any platform where the Tidal master runs.
The MapReduce Adapter provides real-time information on the execution of a MapReduce job as it is running.
Figure 11 Job details for a MapReduce program.
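The map, sort, and reduce flow described above for Hadoop MapReduce can be sketched in a few lines of plain Python. This is a single-process illustration of the data flow only, using a word count as the example job; it does not use the Hadoop API or the MapReduce Adapter, and the function names are hypothetical. On a real cluster the chunks would be processed by parallel map tasks on different nodes.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map function: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce function: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(chunks):
    """Run the map, sort, and reduce phases over independent input chunks."""
    # Map phase: each chunk is processed independently of the others.
    mapped = [
        pair
        for chunk in chunks
        for line in chunk
        for pair in map_words(line)
    ]
    # Sort phase: order the map outputs by key so equal keys are adjacent.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: each group of values sharing a key goes to one reduce call.
    return dict(
        reduce_counts(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )


# Two independent chunks of a small, made-up input data set.
chunks = [
    ["hadoop stores data", "hadoop processes data"],
    ["tidal schedules hadoop jobs"],
]
print(run_job(chunks))
```

The sort step matters because `groupby` only groups adjacent items, which mirrors the way the framework sorts map outputs before handing them to the reduce tasks.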
The Tidal Workload Automation Hive Adapter provides the automation of HiveQL commands as part of the cross-platform process organization between Tidal Workload Automation and the Tidal Hadoop Cluster. The Adapter is designed using the same user interface approach as other Tidal Workload Automation adapter jobs, seamlessly integrating Hadoop Hive data management into existing operation processes.
The Hive Adapter allows you to access and manage data stored in the Hadoop Distributed File System (HDFS™) using Hive’s query language, HiveQL. HiveQL syntax is similar to standard SQL syntax. The Hive Adapter, in conjunction with Tidal Workload Automation, can be used to define, launch, control, and monitor HiveQL commands submitted to Hive via JDBC on a scheduled basis. The Adapter integrates seamlessly in an enterprise scheduling environment.
The Hive Adapter includes the following features:
The Tidal Workload Automation HDFS Data Mover Linux Agent helps to manage file transfers in and out of the Hadoop file system.