Course: Big Data Analytics with Python


Big Data analytics relies on proficiency in fundamental data exploration techniques: descriptive, predictive, and exploratory statistics. This hands-on course presents methods such as regression and principal component analysis (PCA) and teaches you how to implement them in Python.


Inter
In-house
Custom

Practical course, in person or as a remote class

Ref. BDA
Price: 2860 € excl. tax
Duration: 4 days (28 hours)




Teaching objectives
At the end of the training, the participant will be able to:
  • Understand the principles of statistical modeling
  • Choose between regression and classification depending on the data type
  • Evaluate an algorithm’s predictive performance
  • Create selections and classifications in large volumes of data to reveal trends

Intended audience
Infodesk managers (data mining, marketing, quality, etc.), and business users and managers of databases.

Prerequisites
Basic knowledge of statistics, or completion of the course “Statistics: Proficiency in fundamentals” (code STA). Basic knowledge of Python.

Course schedule

Introduction to modeling

  • Introduction to the Python language.
  • Introduction to Jupyter Notebook.
  • Steps for building a model.
  • Supervised and unsupervised algorithms.
  • Choosing between regression and classification.
Hands-on work
Installing Python 3, Anaconda, and Jupyter Notebook.

Model evaluation procedures

  • Resampling techniques for building training, validation, and test sets.
  • Testing the representativeness of the training data.
  • Performance measures for predictive models.
  • Confusion matrix, cost matrix, ROC curve, and AUC.
Hands-on work
Setting up data set sampling. Conducting evaluation tests on multiple provided models.
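
As an illustration of the evaluation measures listed above, here is a minimal sketch of a random train/test split, a binary confusion matrix, and the AUC. It uses only the Python standard library so that it stays self-contained; in the course itself these steps would more likely be done with scikit-learn. The function names (`split_indices`, `confusion_counts`, `roc_auc`), the labels, and the scores are all hypothetical and chosen for this example only.

```python
import random


def split_indices(n, test_ratio=0.25, seed=0):
    """Shuffle the indices 0..n-1 and return (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_ratio))
    return indices[n_test:], indices[:n_test]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and model scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

# Turn scores into class predictions with a 0.5 decision threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

train_idx, test_idx = split_indices(len(y_true))
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
auc = roc_auc(y_true, scores)
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("AUC:", auc)
```

The confusion matrix depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.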

Supervised algorithms

  • The principle of univariate linear regression.
  • Multivariate regression.
  • Polynomial regression.
  • Regularized regression.
  • Naive Bayes.
  • Logistic regression.
Hands-on work
Implementing regressions and classifications on multiple data types.
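
To make the principle of univariate linear regression concrete, the sketch below fits a line by ordinary least squares using the closed-form solution (slope = covariance of x and y divided by variance of x). It is a standard-library illustration under the assumption of a single explanatory variable; the function name `fit_line` and the data points are hypothetical, and a library such as scikit-learn or statsmodels would normally be used in practice.

```python
def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated exactly from y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(x, y)
print("slope:", slope, "intercept:", intercept)
```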

Unsupervised algorithms

  • Hierarchical clustering.
  • Non-hierarchical clustering.
  • Mixed approaches.
Hands-on work
Applying unsupervised clustering to several datasets.

Component analysis

  • Principal component analysis.
  • Correspondence analysis.
  • Multiple correspondence analysis.
  • Factor analysis for mixed data.
  • Hierarchical clustering on principal components.
Hands-on work
Reducing the number of variables and identifying the factors underlying the dimensions that carry significant variability.

Text data analysis

  • Collecting and preprocessing text data.
  • Extracting primary entities and named entities; coreference resolution.
  • Part-of-speech tagging, syntactic analysis, semantic analysis.
  • Lemmatization.
  • Text vectorization.
  • TF-IDF weighting.
  • Word2Vec.
Hands-on work
Exploring the contents of a text corpus using latent semantic analysis.
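
The TF-IDF weighting mentioned in this module can be sketched in a few lines. The version below uses the textbook definition (term frequency multiplied by the logarithm of the inverse document frequency) with naive whitespace tokenization; it is an assumption-laden illustration, not the formula of any particular library (scikit-learn, for example, applies a smoothed variant by default). The function name `tf_idf` and the three-sentence corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "the model fits the data",
    "the data is large",
    "python fits models",
]

weights = tf_idf(corpus)
for doc, doc_weights in zip(corpus, weights):
    print(doc, "->", doc_weights)
```

Terms that occur in few documents (such as "python" here) receive a higher weight than terms spread across the corpus, which is the property that makes TF-IDF vectors a common input for latent semantic analysis.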


Practical details
Hands-on work
Developing and running analyses in Python with the pandas, NumPy, SciPy, Matplotlib, seaborn, scikit-learn, and statsmodels modules.

Customer reviews
4.5 / 5
Customer reviews are based on end-of-course evaluations. The score is calculated from all evaluations within the past year. Only reviews with a textual comment are displayed.

