Hadoop Admin
Job Responsibilities
- Manage the Hadoop distribution on Linux instances, including configuration, capacity planning, expansion, performance tuning and monitoring
- Work with the data engineering team to support development and deployment of Spark and Hadoop jobs
- Work with end users to troubleshoot and resolve incidents with data accessibility
- Contribute to the architecture design of the cluster to support growing demands and requirements
- Contribute to planning and implementation of software and hardware upgrades
- Recommend and implement standards and best practices related to cluster administration
- Research and recommend automated approaches to cluster administration
- Responsible for capacity and infrastructure planning based on current workloads and future requirements
- Install the Cloudera Hadoop distribution (CDH) for different environments (Dev, Model, Production, Disaster Recovery).
- Install and configure Kafka to facilitate real-time streaming applications.
- Provide support and maintenance for the cluster and its ecosystem, including HDFS, YARN, Hive, Impala, Spark, Kafka, HBase, Cloudera Workbench, Splunk, and Power BI.
- Work with delivery teams to provision users in Hadoop.
- Implement Hadoop security with LDAP, Kerberos, Sentry, and Cloudera Key Trustee Server/Key Trustee KMS.
- Enable Sentry for role-based access control (RBAC) so that access to data in HDFS is granted at the appropriate privilege level, per security policy (see the illustrative grants after this list).
- Enable data encryption at rest and in motion (TLS/SSL) to meet security standards.
- Support OS patching of cluster nodes.
- Design and implement a backup and disaster recovery strategy based on the Cloudera BDR utility for batch applications and Kafka MirrorMaker for real-time streaming applications.
- Enable consumers to access data in Hive tables from Power BI Desktop, as required.
- Establish connectivity between external clients and Impala, enabling consumer groups to migrate easily to Hadoop query engines.
- Work with the Control-M enterprise scheduler to run jobs across Hadoop environments.
- Align with development and architecture teams to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Perform capacity planning for the Hadoop cluster.
- Optimize and tune cluster performance by adjusting parameters based on benchmarking results such as TeraGen/TeraSort.
- Implement Git version control hosted on an NFS shared drive for Hadoop, and integrate it with the Eclipse IDE.
- Enable Subversion (SVN) as a version-control option.
- Enable connectivity of Power BI and SAS Grid to Hadoop.
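For illustration, Sentry role-based grants are typically issued through HiveServer2 with beeline. A minimal sketch, assuming the hostname, realm, database, and role/group names below (none of which come from this posting):

    kinit -kt /etc/security/keytabs/admin.keytab admin@EXAMPLE.COM
    beeline -u "jdbc:hive2://hive-node:10000/default;principal=hive/_HOST@EXAMPLE.COM" <<'SQL'
    CREATE ROLE analyst_ro;
    GRANT SELECT ON DATABASE sales TO ROLE analyst_ro;   -- read-only privilege on one database
    GRANT ROLE analyst_ro TO GROUP analysts;             -- map the role to an LDAP/Linux group
    SHOW ROLE GRANT GROUP analysts;                      -- verify the mapping
    SQL

Because Sentry maps roles to groups rather than to individual users, keeping LDAP group memberships aligned with security policy does most of the access-control work.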
Qualifications
- A minimum of a bachelor's degree in computer science or equivalent.
- 5+ years' experience administering Linux production environments
- 3+ years' experience managing a full-stack Hadoop distribution (preferably Cloudera), including monitoring
- 3+ years' experience implementing and managing Hadoop-related security in Linux environments (Kerberos, SSL, Sentry, encryption)
- Strong knowledge of YARN configuration in a multi-tenant environment; candidates should have experience with the YARN capacity scheduler (a queue-configuration sketch follows this list).
- Strong working knowledge of disaster recovery related to Hadoop platforms
- Working knowledge of automation tools
- 3+ years administration experience with HBase, Spark, Hive, Impala
- Strong written and verbal communication skills
- Excellent analytical and problem-solving skills
- Experience with administering Podium Data
- Must have the ability to identify complex problems and review related information to develop and evaluate options and implement solutions
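As a sketch of the YARN capacity scheduler knowledge expected above: queues and their shares live in capacity-scheduler.xml (on CDH/HDP these values are normally managed through Cloudera Manager or Ambari rather than edited by hand). The queue names and percentages here are assumptions:

    cat > /etc/hadoop/conf/capacity-scheduler.xml <<'EOF'
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>etl,adhoc</value>   <!-- two tenant queues under root -->
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.etl.capacity</name>
        <value>70</value>          <!-- guaranteed share, percent; sibling queues must sum to 100 -->
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
        <value>30</value>
      </property>
    </configuration>
    EOF
    yarn rmadmin -refreshQueues    # apply queue changes without restarting the ResourceManager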
----------------------------------------------------------------------------------------------------------
Manage large-scale Hadoop clusters, including design and capacity planning; partner with Ops admins for cluster setup, etc.
Performance-tune the Hadoop cluster and its components to meet business SLAs in on-prem and cloud environments
Automate monitoring & alerting
Manage the security setup: Kerberos authentication and authorization using Ranger (see the sketch after this list)
Tune Hive query performance and manage Hadoop filesystems
Experience upgrading Hive and related components
Manage multi-tenant business users in the same cluster
Experience with Hortonworks and Azure/GCP is a plus
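A minimal sanity check for the Kerberos/Ranger setup described above; the keytab path, principal, realm, and HDFS path are all illustrative assumptions:

    kinit -kt /etc/security/keytabs/svc-etl.keytab svc-etl@EXAMPLE.COM
    klist                    # confirm a valid TGT was issued by the KDC
    hdfs dfs -ls /data/etl   # authorization is then evaluated against the Ranger HDFS policies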
------------------------------------------------------------------------------------
Bachelor’s degree or technical college diploma in a technical field, or equivalent experience
Working experience in Networking (subnets, vLAN, firewall, security, interface configuration, bonds, DNS, proxy, TLS) and Networking tools
Strong working experience and understanding of Linux based operating systems
Strong working experience with authentication and authorization on Linux-based systems (LDAP, SSSD, Kerberos, PAM); a configuration sketch follows this list
Proven working experience installing, configuring, and troubleshooting UNIX/Linux-based environments.
Proficiency in shell scripting and Python
Knowledge of Hadoop and the Hadoop ecosystem
Familiarity with configuration automation tools (Chef, Puppet, Ansible, SaltStack, etc.; we use Chef)
Familiarity with continuous integration and delivery
Strong communication skills
Absolutely must be a team player (DevOps)
Boundless passion for technology and a willingness to learn and teach
Strong work ethic, attention to detail, and problem-solving skills
In addition to the required skills above, the following qualifications are highly desired:
DevOps focused engineer with experience in LEAN principles
Openstack or other Cloud Management Platform experience
Experience working within an agile framework
Familiarity with encryption at rest and in transit.
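As a sketch of the LDAP/SSSD/Kerberos integration listed above, a minimal /etc/sssd/sssd.conf wiring LDAP identities to Kerberos authentication; the domain, server URIs, and search base are assumptions:

    cat > /etc/sssd/sssd.conf <<'EOF'
    [sssd]
    config_file_version = 2
    services = nss, pam
    domains = example.com

    [domain/example.com]
    id_provider = ldap                  # users and groups come from LDAP
    auth_provider = krb5                # passwords are checked against the KDC
    ldap_uri = ldaps://ldap.example.com
    ldap_search_base = dc=example,dc=com
    krb5_realm = EXAMPLE.COM
    krb5_server = kdc.example.com
    EOF
    chmod 600 /etc/sssd/sssd.conf       # sssd refuses to start with looser permissions
    systemctl restart sssd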
This is an excellent contract opportunity where your experience will be recognized and acknowledged. The Big Data Infrastructure team is a small group of highly-focused, hard-working professionals who love what they do. Your Big Data / Hadoop skills will be respected and embraced by the team, and at the end of this assignment, you will walk out with your head high and with enhanced Big Data experience that will have new clients competing for your talents!
--------------------------------------------------------------------------------------
Support Hadoop upgrades; install, configure, and troubleshoot Hortonworks HDP 2.6.4.
-------------------------------------------------------------------------------
Ability to lead and implement solutions with AWS Virtual Private Cloud, Elastic Compute Cloud, AWS CloudFormation, Auto Scaling, AWS Simple Storage Service, Route 53, and other AWS products.
Build new ETL pipelines using Spark or SQL-based engines on Hadoop to load data into the data lake, transform it, and load it to Redshift and other target platforms (a submission sketch follows this list).
Good experience with the Hadoop stack (MapReduce, Spark, Sqoop, Pig, Hive, Impala, Python) and Airflow
Knowledge of NoSQL platforms such as Hbase, Mongo or Cassandra.
4+ years of hands-on experience in Unix/Linux environments
4+ years with Hadoop (Cloudera/Hortonworks/MapR)
4+ years with programming languages (SQL, Python, Java)
Hive, Sqoop, Spark, Flume, Kafka, Java
Experience with source-control tools such as Bitbucket, and with DevOps practices
Experience leading teams, architecting, and designing solutions
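A hedged sketch of submitting one such ETL job to YARN; the script name, queue, arguments, and executor sizing are illustrative, not from the posting:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --queue etl \
      --num-executors 10 \
      --executor-memory 4g \
      --executor-cores 4 \
      etl_orders.py --source hdfs:///landing/orders --target s3://datalake/orders/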
-------------------------------------------------------------------------------
Hadoop ecosystem experience.
4+ years of admin hands-on experience in all or most of the below components.
Zookeeper
HBASE
HDFS
MapReduce
Hive
Oozie
Flume
Yarn
Spark
Strong knowledge of Linux systems (RHEL/CentOS/OEL 5.x): 5+ years
Able to create and manage big data pipelines, including Pig/MapReduce/Hive jobs (see the Oozie sketch at the end of this section): 4+ years
Solid troubleshooting abilities; able to work with a team to fix large production issues: 2+ years
Preferences:
Ability to learn new technologies quickly (including vendor APIs)
Strong verbal and written skills
Team oriented individual who can work well with others
Help in creating effective reports/dashboards for both operations and high-level executives
Write documentation to assist team members
BS degree in Computer Science.
Streams (Kafka, Storm), Hortonworks HDP, Puppet/Chef: big plus.
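Pipelines built from Pig/MapReduce/Hive actions are commonly orchestrated with Oozie. A minimal submission sketch, assuming the Oozie URL and a job.properties file pointing at a workflow already deployed in HDFS:

    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
    oozie job -oozie http://oozie-host:11000/oozie -info <job-id>   # check workflow status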
--------------------------------------------------------------------------------------
Install and configure Hortonworks clusters
Apply proper architecture guidelines to ensure highly available services
Plan and execute major platform software and operating system upgrades and maintenance across physical environments
Develop and automate processes for maintenance of the environment
Implement security measures for all aspects of the cluster (SSL, disk encryption, role-based access via Apache Ranger policies)
Ensure proper resource utilization between the different development teams and processes
Design and implement a toolset that simplifies provisioning and support of a large cluster environment
Review performance stats and query execution/explain plans
Recommend changes for tuning Apache Hive queries (see the beeline sketch after this list)
Create and maintain detailed, up-to-date technical documentation
Responsible for implementation and ongoing administration of Hadoop infrastructure.
Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
Working knowledge of core Hadoop services including HDFS, Sentry, HBase, Impala, Hue, Spark, Hive, Kafka, YARN, and ZooKeeper
Monitor Hadoop cluster connectivity and security
Manage and review Hadoop log files.
File system management and monitoring.
Collaborating with application teams to install the operating system and Hadoop updates, patches, version upgrades when required.
Point of contact for vendor escalations
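For the query-plan reviews mentioned above, a typical session pulls the plan through beeline before recommending changes; the connection string, engine settings, and table names are assumptions:

    beeline -u "jdbc:hive2://hive-node:10000/default;principal=hive/_HOST@EXAMPLE.COM" <<'SQL'
    SET hive.execution.engine=tez;                -- default engine on HDP
    SET hive.vectorized.execution.enabled=true;   -- candidate tuning knob
    EXPLAIN
    SELECT c.region, COUNT(*)
    FROM sales s JOIN customers c ON s.cust_id = c.id
    GROUP BY c.region;
    SQL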
------------------------------------------------------------------------------------------
Experience working in a Government mission Data Exploitation environment (e.g., acquiring data, storing data, processing data, analyzing data, visualizing data, turning data into intelligence products, disseminating data, and knowledge management).
Experience supporting a cloud computing environment (e.g., private, hybrid, and/or public).
Experience leading and managing cross-functional technology teams.
Experience with major programming/scripting languages on Linux, such as Java, C++, PHP, Ruby, Python, and/or R.
Experience working with ETL tools such as Informatica, Talend, and/or Pentaho
Experience designing solutions for multiple large data warehouses with cluster and parallel architecture as well as high-scale or distributed platforms.
Experience in the research and development of applications/services in disciplines such as: Natural Language Processing, Machine Learning, Conceptual Modeling, Statistical Analysis, Predictive Modeling, and Hypothesis testing.
------------------------------------------------------------------------------------------
Provide Tier 2 production support for the on-premises Hadoop cluster
Experience with ETL (Syncsort DMX-h, Informatica, data warehousing), mainframe skills, and JCL
Experience in a range of big data architectures and frameworks including Hadoop ecosystem, Java MapReduce, Pig, Hive, Spark, Impala etc.
Experience with Hadoop Administration (Cloudera Framework).
Experience with Linux Administration (Unix and Red Hat)
Experience in Big Data Storage and File System Design.
Experience in performing troubleshooting procedures and designing resolution scripts
Experience with Mainframe development including Java Batch development, Architectural Design/Analysis and Database development.
Experience in analytic programming, data discovery, querying databases/data warehouses and data analysis.
Experience in data ingestion technologies for relational databases (e.g., Teradata, DB2, Oracle, SQL Server)
Experience with advanced SQL query writing and data retrieval using “Big Data” Analytics.
Experience with enterprise scale application architectures.
Strong Communication Skills (Written and Verbal)
-----------------------------------------------------------------------------------------
Roles and responsibilities:
Hadoop cluster setup, performance fine-tuning, monitoring, and administration (see the health-check sketch at the end of this section)
Hands-on administration experience with distributions such as Cloudera and Hortonworks.
Required Experience, Skills and Qualifications
Knowledge of distributed Hadoop systems on Linux; technical knowledge of YARN, MapReduce, HDFS, HBase, ZooKeeper, Pig, and Hive; hands-on experience as a Linux sysadmin
Knowledge of Spark and Kafka
Knowledge of relational databases such as Oracle, MySQL, and SQL Server.
Nice to have: experience with AWS and Azure.
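The monitoring and administration duties above usually start from a handful of built-in health checks; an illustrative set:

    hdfs dfsadmin -report                       # capacity and live/dead DataNodes
    hdfs fsck / -blocks -locations              # block-level filesystem health
    yarn node -list -all                        # NodeManager states
    yarn application -list -appStates RUNNING   # currently running jobs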
------------------------------------------------------------------------------------------
Strong experience with Hortonworks and Cloudera administration (Hortonworks preferred)
Experience working with file formats: Avro and Parquet
Experience with public cloud, such as Microsoft Azure or AWS
Strong experience with Spark, Spark Streaming, and Hive
Experience configuring clusters and performance tuning
Experience with Kerberos authentication is a must
Performance tuning of Hive and Spark jobs is a must
Debug performance issues in jobs that fail or abort due to memory or other issues (a tuning sketch follows this list)
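One common pattern when debugging memory-killed jobs is to confirm the failure in the YARN logs and then raise executor memory and off-heap overhead; the values below are assumptions chosen only to show the knobs:

    yarn logs -applicationId <application-id> | grep -i 'killed by yarn'
    spark-submit \
      --master yarn \
      --executor-memory 6g \
      --conf spark.executor.memoryOverhead=1g \
      --conf spark.sql.shuffle.partitions=400 \
      job.py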
--------------------------------------------------------------------------------------------------
Responsible for implementation and ongoing administration of Hadoop infrastructure
Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments
Working with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, HBase, and YARN access for the new users (an onboarding sketch follows this list)
Cluster maintenance as well as creation and removal of nodes using tools like Cloudera Manager Enterprise, etc.
Performance tuning of Hadoop clusters and Hadoop MapReduce routines
Screen Hadoop cluster job performances and capacity planning
Monitor Hadoop cluster connectivity and security
Manage and review Hadoop log files
File system management and monitoring
HDFS support and maintenance
Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability
Collaborating with application teams to perform Hadoop updates, patches, version upgrades when required
Work with Vendor support teams on support tasks
General operational expertise such as good troubleshooting skills, understanding of system's capacity, bottlenecks, basics of memory, CPU, OS, storage, and networks
The most essential requirements: the ability to deploy a Hadoop cluster; add and remove nodes; keep track of jobs; monitor critical parts of the cluster; configure NameNode high availability; schedule and configure jobs; and take backups
Solid understanding of on-premises and cloud network architectures
Additional Hadoop skills such as Sentry, Spark, Kafka, Oozie, etc.
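A sketch of the user-onboarding flow described above (Linux account, Kerberos principal, HDFS home directory, access test); the username, realm, and kadmin credentials are assumptions:

    useradd -m analyst1
    kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey analyst1@EXAMPLE.COM"
    kadmin -p admin/admin@EXAMPLE.COM -q "ktadd -k /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM"
    hdfs dfs -mkdir -p /user/analyst1
    hdfs dfs -chown analyst1:analyst1 /user/analyst1
    # verify access as the new user
    kinit -kt /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM
    hdfs dfs -ls /user/analyst1
    beeline -u "jdbc:hive2://hive-node:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "SHOW DATABASES;"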
Preferred Skills:
Experience with Impala
Kerberos administration skills
Experience with Cloudera distribution
Familiarity with open source configuration management and deployment tools such as Puppet or Chef and Shell scripting
Familiarity with DevOps tools like Ansible, Nagios and HAProxy
Knowledge of Troubleshooting Core Java Applications is a plus
Proven track record in incorporating latest operational tools into big data platforms
What You Need for this Position:
Bachelor's in Computer Science or related discipline
5 years of Linux system administration
3 years of Hadoop infrastructure administration
Hortonworks and MapR distribution experience will also be considered
-------------------------------------------------------------------------------------------------------
Hands-on administration experience with large distributed Hadoop systems on Linux; technical knowledge of YARN, MapReduce, HDFS, HBase, ZooKeeper, Pig, and Hive
Experience in Hadoop cluster setup, performance fine-tuning, monitoring, and administration
Hadoop platform support experience in optimization, capacity planning, and architecture of a large multi-tenant cluster
Expertise in at least one commercial distribution of Hadoop, preferably Cloudera
Experience with multiple open-source tool sets in the big data space
Experience with tool integration, automation, and configuration management in Git
Must have a deep understanding of distributed systems
------------------------------------------------------------------------------------------------------
As a key member of our fast-paced IT team, you will work closely with the Application Development team in a DevOps model in the capacity of Hadoop Administrator. Primary responsibilities include performing routine Hadoop admin tasks such as planning, provisioning, and maintaining the Hadoop cluster environment; monitoring and troubleshooting incidents; enabling security policies; capacity planning; performance tuning; and upgrades. You will manage data storage and compute resources for HDFS, HBase, the reporting cluster, YARN, QUARTZ, and other components in the Hadoop ecosystem.
Responsibilities
1. Contribute to the design, implementation, and support of multiple Cloudera Distributions of Apache Hadoop (CDH) in support of full-lifecycle application development, analytics, and data management.
2. Advise manager of platform risks, issues, and concerns.
3. Provide Production Support during and after business hours, as needed.
4. Work directly with external vendors to resolve issues & perform technical tasks.
5. Provide subject matter expertise on the capabilities and use of cluster components.
6. Project and recommend periodic incremental and organic growth based on ongoing and forecasted workloads.
7. Collaborate with other departments on requirements, design, standards, and architecture of applications.
8. Work closely with infrastructure, network, database, business intelligence and application teams to ensure solutions meet requirements.
9. Contribute to the design, implementation, and support of software and hardware environments and configurations that yield non-functional requirements, such as security, accessibility, auditability, compliance, data retention, usability, and performance.
10. Install, configure, monitor, tune, and troubleshoot all components of the CDH environments, including but not limited to, Cloudera Manager, Cloudera Management Services, HDFS, YARN, Zookeeper, Hive, Spark, Hue, Kudu, Impala, HBase, Key Management Server, Kafka, Flume, Solr, SSL, Sqoop, and Sentry.
Skills:
1. Performing backup and recovery using the Cloudera BDR tool (a snapshot/DistCp sketch follows this list).
2. Effective oral, written, and interpersonal communication skills.
3. Experience working with RDBMS such as Oracle, SQL Server desired.
4. Proven ability to handle multiple tasks and projects simultaneously.
5. Excellent problem-solving skills, including core Java application troubleshooting.
6. Securing Hadoop clusters using Kerberos, LDAP/AD integration, and encryption.
7. Demonstrated ability to establish priorities, organize and plan work to satisfy established timeframes.
8. 3-5 years’ experience with administration of Hortonworks and Cloudera (5.9.x & 5.10.x) distributions of Hadoop required.
9. 3+ years’ experience with OS administration around memory, CPU, system capacity planning, networks and troubleshooting skills.
10. Implement security measures for all aspects of the cluster (SSL, disk encryption, role-based access via Apache Ranger policies)
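In the spirit of Cloudera BDR, which drives HDFS snapshots and DistCp under the hood, a batch DR copy can be sketched as below; the paths and NameNode addresses are assumptions:

    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse nightly     # consistent point-in-time source
    hadoop distcp -update -delete \
      hdfs://prod-nn:8020/data/warehouse/.snapshot/nightly \
      hdfs://dr-nn:8020/data/warehouse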