Course description

This comprehensive course will equip you with the skills to tackle the challenges of big data analytics using the powerful Hadoop and Spark frameworks. You'll learn to process and analyze vast amounts of data, build data pipelines, and extract valuable insights using distributed computing techniques.

Topics Covered:

  • Introduction to Big Data and its challenges
  • The Hadoop ecosystem (HDFS, YARN, MapReduce)
  • Working with Hadoop distributions (e.g., Cloudera, Hortonworks)
  • Data processing with Spark (RDDs, DataFrames, SQL)
  • Building data pipelines with Spark
  • Machine learning with Spark MLlib
  • Real-world applications of big data analytics
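To give a taste of the MapReduce model listed above, here is a minimal single-machine sketch of the classic word-count job in plain Python. It only imitates the map, shuffle and reduce phases. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and are not part of Hadoop's API; a real Hadoop job runs these phases in parallel across a cluster, typically reading its input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper (hypothetical helper): emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value by its key before reduction."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, then shuffle, then reduce over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Sort keys so the result has a stable, alphabetical order.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs big tools", "Spark and Hadoop process big data"]
    print(word_count(docs))
    # {'and': 1, 'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'process': 1, 'spark': 1, 'tools': 1}
```

In Spark, the same job is usually written with RDD operations such as `flatMap` and `reduceByKey`, which keep intermediate results in memory rather than writing them to disk between stages.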

What will I learn?

  • Understand the fundamentals of big data and its challenges.
  • Master the Hadoop ecosystem (HDFS, YARN, MapReduce).
  • Gain proficiency in Spark for data processing and analysis (RDDs, DataFrames, SQL).
  • Build data pipelines and perform ETL operations.
  • Apply machine learning techniques to big data using Spark MLlib.
  • Develop skills in data visualization and storytelling.

Requirements

  • A computer with a stable internet connection and sufficient RAM (8GB or more recommended)
  • Basic programming knowledge (Python or Scala preferred)
  • A code editor or an IDE (Integrated Development Environment)
  • (Optional) Access to a Hadoop cluster or a cloud-based Hadoop environment (e.g., AWS EMR, Google Cloud Dataproc)

Frequently asked questions

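Hadoop is a framework for distributed storage and processing of large datasets. Spark is a processing engine that is typically faster and more versatile than Hadoop MapReduce, largely because it keeps intermediate data in memory. Spark can run on top of Hadoop (for example on YARN, reading data from HDFS) or independently.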

Big data analytics is used in various industries, including marketing (customer segmentation, targeted advertising), finance (fraud detection, risk management), healthcare (disease prediction, drug discovery), and more.

A basic understanding of statistics will be helpful, but the course covers the necessary statistical concepts. The focus is on practical application: using Hadoop and Spark for data analysis.

Yes, the course includes hands-on exercises and projects where you'll work with Hadoop and Spark to analyze real-world datasets.

Instructor: Mike Lincoln

Price: £25 (regular price £50)

Lectures: 0
Skill level: Intermediate
Expiry period: Lifetime
Certificate: Yes
