Big Data Admin

AWS Online Training

Course Introduction

Big data career prospects are on the rise, and Hadoop is quickly becoming a must-know technology among system administrators and IT professionals.

This course will give you the skills to excel in the Big Data Analytics industry. By opting for this course, you will learn different ways of working with the versatile and adaptable frameworks of the Hadoop ecosystem, including:

  • Hadoop installation and configuration
  • Cluster management with Flume, Sqoop, Hive, Pig, Impala, etc.
  • Implementation of big data with exceptional speed, scale, and security.

Learning Domain

  • The Case for Apache Hadoop
  • Hadoop Cluster Installation
  • The Hadoop Distributed File System (HDFS)
  • MapReduce and Spark on YARN
  • Hadoop Configuration and Daemon Logs
  • Planning Your Hadoop Cluster
  • Getting Data Into HDFS
  • Installing and Configuring Hive, Impala, and Pig
  • Hadoop Clients Including Hue
  • Advanced Cluster Configuration
  • Hadoop Security
  • Managing Resources
  • Cluster Maintenance
  • Cluster Monitoring and Troubleshooting

Upcoming Class Schedule

Start Date | Class Timing | Duration | Class Mode | Fees
XXX        | 6pm to 8pm   | 40 hrs   | XXX        | XXX


Course Structure

  • Overview of Big Data technologies and their role in analytics
  • Big Data challenges & solutions
  • Data Science vs Data Engineering
  • Job Roles, Skills & Tools
Setting up Development Environment
  • Setting up the development environment on the user's laptop to develop and execute programs
  • Setting up Eclipse (basics such as importing and creating projects, and adding JARs) for MapReduce and Spark development
  • Installing Maven & Gradle to understand build tools
  • Installing PuTTY and FileZilla/WinSCP to get ready to access EduPristine's Big Data Cloud
  • Case Study: XYZ Telecom needs to set up an appropriate directory structure, along with permissions on various files, on a Linux file system
  • Setting up, accessing, and verifying Linux server access over SSH
  • Transferring files over FTP or SFTP
  • Creating a directory structure and setting up permissions
  • Understanding file name patterns and moving files using regular expressions
  • Changing file owners and permissions
  • Reviewing a mock file generator utility written in shell script, and enhancing it to be more useful
  • Case Study: Developing a simulator to generate mock data using Python
  • Understanding the domain requirements: the required fields, their possible values, the file format, etc.
  • Preparing a configuration file that can be changed to fit any requirement
  • Developing a Python script that generates mock data as described by the configuration file
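The configuration-driven generator described above can be sketched in Python along these lines. The field names and value ranges below are hypothetical placeholders; a real setup would read the configuration from an external JSON or YAML file rather than an inline dict, so it can be re-pointed at any domain without code changes.

```python
import csv
import io
import random

# Hypothetical configuration: field names, types, and possible values.
# In practice this would be loaded from an external config file.
CONFIG = {
    "fields": [
        {"name": "msisdn", "type": "choice",
         "values": ["9198xxxx001", "9198xxxx002", "9198xxxx003"]},
        {"name": "call_type", "type": "choice",
         "values": ["VOICE", "SMS", "DATA"]},
        {"name": "duration_sec", "type": "int", "min": 1, "max": 3600},
    ],
    "delimiter": ",",
}

def generate_record(config):
    """Generate one mock record as a list of field values."""
    record = []
    for field in config["fields"]:
        if field["type"] == "choice":
            record.append(random.choice(field["values"]))
        elif field["type"] == "int":
            record.append(str(random.randint(field["min"], field["max"])))
    return record

def generate_file(config, num_records):
    """Return a header line plus num_records delimited lines as one string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=config["delimiter"])
    writer.writerow([f["name"] for f in config["fields"]])  # header row
    for _ in range(num_records):
        writer.writerow(generate_record(config))
    return buf.getvalue()

if __name__ == "__main__":
    print(generate_file(CONFIG, 5))
```

Changing the fields in `CONFIG` is enough to simulate a different domain, which is the point of keeping the schema in configuration rather than in code.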

Case Study: Design and Develop Phone Book in Java

  • Identifying Classes and Methods for Phone Book
  • Implementing design into Java Code using Eclipse
  • Compiling and Executing Java Program
  • Enhancing the code with each new concept learned, such as inheritance and method overloading
  • Further enhancing the code to initialize the PhoneBook from a text file using Java file reading

Case Study: Handling a huge data set in HDFS to make it accessible to the right users, while addressing non-functional requirements such as backups, cost, and high availability

  • Understanding the problem statement and the challenges of handling such large data sets, to see the need for a distributed file system
  • Understanding the HDFS architecture to solve the problem
  • Understanding configuration and creating a directory structure that solves the given problem statement
  • Setting up appropriate permissions to secure data for the right users
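One way to sketch the directory-and-permissions steps above is to build the corresponding `hdfs dfs` commands in Python. The paths, owner, and group below are hypothetical; on a real cluster each command list would be executed with `subprocess.run(cmd, check=True)` against a running HDFS client.

```python
# Sketch: the directory layout and permission scheme expressed as
# `hdfs dfs` commands. Paths, users, and groups are illustrative only.

def hdfs_setup_commands(base="/data/telecom", owner="etluser", group="analytics"):
    """Build the hdfs dfs command lines for a secured directory tree."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", f"{base}/raw"],
        ["hdfs", "dfs", "-mkdir", "-p", f"{base}/processed"],
        # Hand ownership of the whole tree to the ETL user and analytics group
        ["hdfs", "dfs", "-chown", "-R", f"{owner}:{group}", base],
        # Raw data: owner full access, group read/execute, others nothing
        ["hdfs", "dfs", "-chmod", "-R", "750", f"{base}/raw"],
        # Processed data: owner and group full access, others nothing
        ["hdfs", "dfs", "-chmod", "-R", "770", f"{base}/processed"],
    ]

if __name__ == "__main__":
    for cmd in hdfs_setup_commands():
        print(" ".join(cmd))
```

Keeping the commands as data makes the intended layout reviewable before anything touches the cluster.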

Case Study: Developing automation tool for HDFS file management

  • Setting up Java Development with HDFS libraries to use HDFS Java APIs
  • Developing a menu-driven HDFS file management utility and scheduling it to run for file management on the HDFS cluster


Case Study: Develop an automation utility to migrate a huge RDBMS warehouse implemented in MySQL to a Hadoop cluster

  • Creating and loading data into RDBMS tables to understand the RDBMS setup
  • Preparing data to experiment with Sqoop imports
  • Importing with the Sqoop command into the HDFS file system to understand simple imports
  • Importing with the Sqoop command into a Hive table to load data into a Hive partitioned table and perform ETL
  • Exporting with Sqoop from Hive/HDFS to the RDBMS to store the output of the Hive ETL in the RDBMS
  • Wrapping the Sqoop commands in a Unix shell script to build an automated utility for day-to-day use


Case Study: Processing 4G usage data of a telecom operator to find potential customers for various promotional offers

  • Cleaning data, ETL, and aggregation
  • Exploring the data set with familiar tools such as Linux commands to understand the nature of the data
  • Setting up the Eclipse project and Maven dependencies to add the required MapReduce libraries
  • Coding, packaging, and deploying the project on the Hadoop cluster to understand how to deploy and run MapReduce jobs on a cluster
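The map and reduce phases of such a job can be sketched in Python (runnable locally for testing, or submitted to the cluster via Hadoop Streaming; the course itself develops the job in Java with Eclipse and Maven). The `customer_id,bytes_used` record layout is an assumption for illustration only.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (customer_id, bytes_used) pairs from raw usage records."""
    for line in lines:
        try:
            customer, used = line.strip().split(",")
            yield customer, int(used)
        except ValueError:
            continue  # cleaning step: skip malformed records

def reducer(pairs):
    """Reduce phase: sum usage per customer.

    Pairs must arrive grouped by key, which is what sorting simulates here;
    on a real cluster the shuffle phase guarantees this ordering.
    """
    for customer, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield customer, sum(used for _, used in group)

if __name__ == "__main__":
    sample = ["c1,100", "c2,50", "c1,200", "bad-record", "c2,25"]
    for customer, total in reducer(mapper(sample)):
        print(f"{customer}\t{total}")  # per-customer total usage
```

The local `sorted()` call stands in for the shuffle-and-sort step that Hadoop performs between the two phases; everything else maps one-to-one onto the Java job structure.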

Case Study: Process a structured data set to find some insights

  • Finding each driver's total miles and hours driven
  • Creating tables, loading data, and running select queries to load, query, and clean the data
  • Finding which drivers have driven the maximum and minimum miles
  • Joining tables and saving query results to tables, choosing the right table type, partition scheme, and buckets
  • Discussing the optimum file format for Hive tables
  • Using the right file format, table type, and partition scheme to optimize query performance
  • Using UDFs to reuse domain specific implementations

Case Study: Perform ETL processing on Data Set to find some insights

  • Loading and exploring the Movie-100K data set: loading it, exploring it, and associating a schema with it
  • Using Grunt: loading the data set and defining the schema
  • Finding simple statistics from the data set to clean up the data
  • Filtering and modifying the data schema
  • Finding the gender distribution among users
  • Aggregating and looping
  • Finding the top 25 movies by rating: joining data sets and saving to HDFS to perform aggregation
  • Dumping, storing, joining, and sorting
  • Writing filtering functions for complex conditions to reuse domain-specific functionality and avoid rewriting code
  • Using UDFs

Case Study: Build a model to predict production errors/failures (across huge fleets of servers, applications, and software) with good speed, using computation power efficiently while accounting for processor constraints

  • Loading and pre-processing to convert unstructured data into a structured format
  • Cleaning data, filtering out bad records, and converting data to a more usable format
  • Aggregating data by response code to find out server performance from the logs
  • Filtering, joining, and aggregating data to find the top 20 frequent hosts that generate errors

Spark Project

Case Study: Build a model (using Python) to predict production errors/failures (across huge fleets of servers, applications, and software) with good speed, using computation power efficiently while accounting for processor constraints

  • Loading and pre-processing to convert unstructured data into a structured format
  • Cleaning data, filtering out bad records, and converting data to a more usable format
  • Aggregating data by response code to find out server performance from the logs
  • Filtering, joining, and aggregating data to find the top 20 frequent hosts that generate errors
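The transformation steps above can be sketched in plain Python to make the logic visible; in the actual case study each step would be an RDD or DataFrame operation in a Spark job. The Common Log Format layout assumed here is illustrative, not prescribed by the course.

```python
import re
from collections import Counter

# Common Log Format: host, timestamp, request, status, bytes (assumed layout)
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+')

def parse(line):
    """Extract (host, status_code) from a log line; None for bad records."""
    m = LOG_PATTERN.match(line)
    return (m.group(1), int(m.group(2))) if m else None

def analyze(lines, top_n=20):
    """Clean, aggregate by response code, and rank the top error-producing hosts."""
    records = [r for r in (parse(l) for l in lines) if r]  # filter bad records
    by_status = Counter(status for _, status in records)    # aggregate by response code
    error_hosts = Counter(host for host, status in records if status >= 400)
    return by_status, error_hosts.most_common(top_n)

if __name__ == "__main__":
    sample = [
        'h1 - - [01/Jul/1995:00:00:01 -0400] "GET /a HTTP/1.0" 200 1024',
        'h2 - - [01/Jul/1995:00:00:02 -0400] "GET /b HTTP/1.0" 404 0',
        'h2 - - [01/Jul/1995:00:00:03 -0400] "GET /c HTTP/1.0" 500 0',
        'not a log line',
    ]
    print(analyze(sample))
```

In Spark the same pipeline becomes a `map` (parse), a `filter` (drop `None`), and two aggregations, distributed across the cluster instead of running on one machine.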

Case Study: Setting up a data processing pipeline that runs on a schedule in the Hadoop ecosystem, comprising multiple components such as Sqoop jobs, Hive scripts, Pig scripts, and Spark jobs

  • Setting up an Oozie workflow to trigger a script, then a Sqoop job, followed by a Hive job
  • Executing the workflow to run the complete ETL pipeline

Case Study: Find the top 10 customers by expenditure, the top 10 best-selling brands, and monthly sales from data stored in HBase as key-value pairs

  • Designing the HBase table schema to model the table structure
  • Deciding the column families in the table as per the data
  • Bulk loading & programmatically loading data using Java APIs to populate the HBase table
  • Querying and showing data in a UI to integrate HBase with UI/reporting

Project: ETL processing of retail logs

  • Finding the demand for a given product
  • Identifying the trend and seasonality of a product
  • Understanding the performance of the chain

Project: Creating a 360-degree view (past, present, and future) of the customer for a retail company - avoiding repetition or re-keying of information, viewing customer history, establishing context, and initiating desired actions

  • Exploring the data and checking its basic quality to understand it and the need for filtering/pre-processing
  • Loading data into an RDBMS table to simulate real-world scenarios where data is persisted in an RDBMS
  • Developing & executing a Sqoop job to ingest the data into the Hadoop cluster for further processing
  • Developing & executing a Pig script to perform the required ETL processing on the ingested data
  • Developing & executing Hive queries to produce reports from the processed data

Project: Twitter Sentiment Analytics - Collect real-time data (in JSON format) and perform sentiment analysis on the continuously flowing stream

  • Creating and setting up a Twitter app to generate Twitter auth tokens for API access
  • Building a Flume source to pull tweets from Twitter
  • Setting up a Flume agent with a Kafka sink to persist tweets to a distributed Kafka topic
  • Setting up a Flume agent with a Kafka source and HDFS as the sink to back up tweets on HDFS for batch processing
  • Building & executing a Spark job to perform sentiment analytics in real time on each incoming tweet
  • Creating a Hive table on the tweets and performing basic queries to understand Hive SerDes and dealing with semi-structured data in Hive
  • Writing and executing Impala queries to understand and work around Impala's limitation of not being able to use Hive SerDes
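The per-tweet scoring step can be illustrated with a toy word-list classifier. A real deployment would apply a proper sentiment lexicon or trained model inside the Spark streaming job; the word lists below are small placeholders for illustration.

```python
# Placeholder word lists - a production system would use a full sentiment
# lexicon or a trained model rather than a handful of hard-coded words.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def sentiment(text):
    """Classify a tweet as positive, negative, or neutral by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    print(sentiment("I love this great product"))   # positive
    print(sentiment("terrible support, very sad"))  # negative
```

In the streaming pipeline this function would be mapped over each micro-batch of tweets arriving from the Kafka topic.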

Project: Machine Learning with TensorFlow - Build a solution which can recognize images on search words and can run on distributed computing like Hadoop/ Spark etc. for a photo storage company

  • Setting up the development environment to use the TensorFlow Java APIs
  • Developing Java code around the TensorFlow Inception model, using the Java API to build an image recognition program
  • Developing Python code to train the TensorFlow model for a domain-specific problem

Project: Developing a Chat-bot to offer an artificially intelligent customer help desk for an insurance company

  • Identifying the client's most frequently asked questions and answers
  • Developing a training data set
  • Building a TensorFlow NLP model that can understand questions
  • Training the model
  • Implementing & running the model

Real Time Analytics, Unstructured Data Ingestion

An open source database that uses a document-oriented data model

Exam pattern, CV preparation & important topics

Benefits of Learning Big Data Admin

  • A sound understanding of big data can change your career and help you reach greater heights in the professional hierarchy.
  • Big data certification can play a crucial role in establishing your skills in the industry. It can also give you a noticeable edge over others coming from the same educational background.
  • You will get better pay packages.
  • A certification can also authenticate your hands-on experience of the technology.


  • Total number of reviews: 149
  • Aggregate review score: 4.77
  • Course completion rate: 83%


Our Popular Student Base: Delhi, Mumbai, Bangalore, Pune, Noida, Gurgaon, Chandigarh, and Kolkata.



Our convenient learning system makes it possible for anyone to access the recorded sessions at a time and place of their choosing. And if you are stuck with a doubt in need of further clarification, we have a dedicated team in place to help you out. So, as you can see, a missed session won’t be a problem.

We offer our live sessions to only a limited number of participants to maintain our quality standards. Therefore, we have no provision for anyone to participate without enrollment. However, we may be able to provide a few sample recordings of our live classes to those who contact us directly about it.

We recommend a connection speed of at least 2 Mbps for an uninterrupted live session on DevClass.

All our instructors have a minimum of ten to fifteen years of experience in their relevant fields. They are industry experts and are also further trained by DevClass to provide a seamless learning experience to the participants.

Yes, we do. Based on a few factors like your performance in our exams, session attendance, etc. we will provide you with a certificate upon the completion of the course.