Hadoop (Cloudera) Administrator
Location: Buffalo, NY
Duration: Contract-to-Hire or Direct Hire
Pay Rate: TBD
•Responsible for capacity planning and infrastructure planning based on current workloads and future requirements.
•Interact with business users, enterprise architects, and project managers to gather requirements.
•Install Cloudera Hadoop (CDH) from scratch for each environment (Dev, Test, Cert, Production, Disaster Recovery).
•Install and configure Kafka to facilitate real-time streaming applications.
•Provide support and maintenance for the Hadoop cluster and its ecosystem, including HDFS, YARN, Hive, Impala, Spark, Kafka, HBase, Informatica BDM, and Tableau.
•Work on issues and provide EBFs (emergency bug fixes) for Informatica to meet SLAs.
•Work with delivery teams to provision users in Hadoop.
•Implement Hadoop security, including Kerberos, Cloudera Key Trustee Server, and Key Trustee Management Systems.
•Enable Sentry for role-based access control (RBAC), providing privilege-level access to data in HDFS per security policies.
•Enable data encryption at rest and in motion with TLS/SSL to meet security standards.
•Perform upgrades of Cloudera Manager and CDH, along with Linux server patching from RHEL 7.1 to 7.4 (Maipo).
•Install Informatica Big Data Management (BDM) from scratch in the Development, Test, Certification, and Production environments.
•Perform upgrades of Informatica BDM as needed.
•Establish connections between Hadoop and Informatica BDM to support dynamic mappings and Hive updates.
•Enable Informatica as a data-ingestion tool for Hadoop by creating and testing connections to sources such as MySQL, Microsoft SQL Server, Oracle, Hive, HDFS, and Teradata.
•Design and implement a backup and disaster recovery strategy based on the Cloudera BDR utility for batch applications and Kafka MirrorMaker for real-time streaming applications.
•Enable consumers to access data in Hive tables from Tableau Desktop, per requirements.
•Establish connectivity between Teradata Studio Express and Impala to ease the consumer group's migration to Hadoop query engines.
•Integrate the CA-7 enterprise scheduler to run jobs in both Hadoop and Informatica.
•Align with development and architecture teams to propose and deploy new hardware and software environments for Hadoop and to expand existing environments.
•Perform capacity planning for Informatica Big Data Management, including the implementation design for grid execution.
•Optimize and tune cluster performance by adjusting parameters based on benchmarking results such as TeraGen/TeraSort.
•Implement Git version control hosted on an NFS shared drive for Hadoop, and integrate it with the Eclipse IDE.
•Enable Subversion (SVN) as version control.
•Enable connectivity of Tableau and SAS Grid to Hadoop.
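For the disaster recovery duty above, Cloudera BDR typically handles HDFS/Hive replication for batch data, while Kafka MirrorMaker replicates topics to a standby cluster for streaming data. A minimal sketch of the two MirrorMaker client config files follows; the broker hostnames and group id are placeholders, not site-specific values:

```properties
# consumer.properties — reads from the primary cluster (hostnames are placeholders)
bootstrap.servers=primary-broker1:9092,primary-broker2:9092
group.id=dr-mirrormaker
auto.offset.reset=earliest

# producer.properties — writes to the DR cluster
bootstrap.servers=dr-broker1:9092,dr-broker2:9092
acks=all
```

MirrorMaker is then launched with `--consumer.config`, `--producer.config`, and a topic whitelist for the streams that must survive a site failover.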
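The TeraGen/TeraSort benchmarking used for cluster tuning can be driven by a short script. The sketch below is illustrative only: the examples-jar path and HDFS directories are assumptions, and the commands are echoed (dry run) rather than executed so the sizing logic can be checked off-cluster.

```shell
#!/usr/bin/env bash
# Sketch of a TeraGen/TeraSort benchmark run. The jar path and the
# /benchmarks HDFS directories are assumptions for illustration.
set -euo pipefail

SIZE_GB="${1:-100}"                    # target benchmark data size in GB
ROWS=$((SIZE_GB * 10 * 1000 * 1000))   # TeraGen writes 100-byte rows
EXAMPLES_JAR="/opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples.jar"

RUN="echo"   # dry run: print commands; set RUN="" on a real cluster

$RUN hadoop jar "$EXAMPLES_JAR" teragen "$ROWS" /benchmarks/teragen
$RUN hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort
$RUN hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```

After each run, compare elapsed times while varying one parameter at a time (e.g., container memory or reducer count) so the effect of each change is attributable.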
•A minimum of a bachelor’s degree in computer science or equivalent.
•Cloudera Hadoop (CDH), Cloudera Manager, Informatica Big Data Management (BDM), HDFS, YARN, MapReduce, Hive, Impala, Kudu, Sqoop, Spark, Kafka, HBase, Teradata Studio Express, Teradata, Tableau, Kerberos, Active Directory, Sentry, TLS/SSL, Linux/RHEL, Unix, Windows, SBT, Maven, Jenkins, Oracle, MS SQL Server, shell scripting, Eclipse IDE, Git, SVN
•Must have strong problem-solving and analytical skills.
•Must have the ability to identify complex problems, review related information, and develop, evaluate, and implement solutions.
Proud to be an EEO Employer.