Portfolio
Appendix. Portfolio (English)
Portfolio of Dr. Bob van Limburg, PhD, MSc, MA
Date: 11 November 2023
Portfolio projects for a data scientist and machine learning engineer.
I. Exploratory Data Analysis (EDA):
Analysis of car sales in the Netherlands and Europe.
Methodology: Analyzed a client's dataset and extracted significant insights using Google Data Studio, Tableau, and the Python libraries seaborn and matplotlib. The objective was to build data visualizations and dashboards showing the concentrations of car owners, in order to advise on a dealer network built around these concentrations.
Analysis of a client's dataset of customer lifestyles.
Methodology: Analyzed a client's dataset and extracted significant insights using Google Data Studio, Tableau, and the Python libraries seaborn and matplotlib. The aim was to determine which customer groups were most relevant for advising on and planning new lifestyle parks in the Netherlands and Belgium. The visualizations, built mainly with Python's seaborn and matplotlib libraries, were presented to the client.
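A minimal sketch of the kind of concentration chart described above. The region names, owner counts, and column names are hypothetical stand-ins for illustration, not client data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: made-up regions and owner counts, for illustration only.
owners = pd.DataFrame({
    "region": ["North", "East", "South", "West"],
    "car_owners": [120, 340, 210, 90],
})

# Share of all owners per region, sorted so the highest concentration comes first.
owners["share"] = owners["car_owners"] / owners["car_owners"].sum()
owners = owners.sort_values("share", ascending=False).reset_index(drop=True)

# Bar chart of owner concentration per region.
fig, ax = plt.subplots(figsize=(6, 3))
sns.barplot(data=owners, x="region", y="share", ax=ax)
ax.set_ylabel("Share of car owners")
ax.set_title("Owner concentration by region (illustrative data)")
fig.tight_layout()
```

Calling fig.savefig with a file name of your choice writes the chart to disk for use in a report or dashboard.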
II. Predictive Modeling with Linear Regression:
Predict US house prices.
Methodology: Using a dataset from a public library with a defined target variable, property prices, create a basic linear regression model with Python packages such as Scikit-Learn. Document and visualize the model's performance using matplotlib and seaborn. This is a supervised learning task.
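The workflow above can be sketched as follows. This is an illustration under stated assumptions: the data is synthetic (two invented features, living area and number of rooms), not the public dataset used in the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a house price dataset (assumed features and coefficients).
area = rng.uniform(50, 250, size=200)   # living area in square meters
rooms = rng.integers(1, 7, size=200)    # number of rooms
price = 1500 * area + 10000 * rooms + rng.normal(0, 5000, size=200)

X = np.column_stack([area, rooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Fit on the training split, evaluate on the held-out split.
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print(f"R^2 on held-out data: {r2:.3f}")
print("Coefficients (area, rooms):", model.coef_)
```

Evaluating on a held-out split rather than the training data gives a fairer picture of how the model would perform on unseen properties.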
Explaining company profits.
Methodology: Using a customer's dataset, determine which factors are crucial in explaining, and later predicting, companies' profit rates.
Feature selection and feature cleaning.
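One possible way to combine feature cleaning and feature selection is sketched below. The choice of median imputation and univariate F-test selection is an assumption made for illustration, as is the synthetic data; the source does not state which techniques were used.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 300

# Synthetic data: only features 0 and 2 actually drive the target ("profit rate").
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(0, 0.5, size=n)

# Knock out about 5% of the values to mimic missing entries in a real dataset.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.05] = np.nan

# Cleaning step (median imputation) followed by selection of the 2 best features.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    SelectKBest(score_func=f_regression, k=2),
)
pipeline.fit(X_missing, y)

selected = pipeline.named_steps["selectkbest"].get_support(indices=True)
print("Selected feature indices:", selected)
```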
III. Intermediate Portfolio Projects. Machine learning classification:
Classify students applying for a Dutch university of applied sciences.
Methodology: Train a machine learning model to categorize data into distinct groups. Using a client's dataset and a defined target variable, students' study success, predict the probability that a student will finish the program with the MEd qualification. Using Python packages such as Scikit-Learn and NumPy, create a basic linear regression model turned into a classification model that gives probabilities of study success. The statistical technique used was log-linear regression analysis. Document and visualize the model's performance using Python's matplotlib.
Accountants' creativity in finding fraud.
Methodology: Train a machine learning model to categorize data into distinct groups. Determine which background variables discriminate in classifying fraud by clients. Dataset used: data from a large accounting firm in the Netherlands. Packages used: SAS, SPSS, and Python packages such as Scikit-Learn and NumPy, to create a basic linear regression model turned into a classification model that gives probabilities of detecting fraud.
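A linear model that outputs class probabilities can be sketched with Scikit-Learn's LogisticRegression. This assumes the technique described above corresponds to logistic regression; the two predictors (prior grade, weekly study hours) and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 400

# Hypothetical predictors of study success (not the client's variables).
prior_grade = rng.uniform(5, 10, size=n)   # grade on a 10-point scale
study_hours = rng.uniform(0, 30, size=n)   # study hours per week

# Synthetic outcome: success probability follows a logistic curve.
logit = -9.0 + 0.9 * prior_grade + 0.15 * study_hours
success = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([prior_grade, study_hours])
X_train, X_test, y_train, y_test = train_test_split(
    X, success, test_size=0.25, random_state=0, stratify=success
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of success (class 1).
proba = clf.predict_proba(X_test)[:, 1]
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Working with probabilities instead of hard labels lets the decision threshold be set according to the cost of each kind of misclassification.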
Is it a dog or is it a cat?
Methodology: Train a machine learning model with Scikit-Learn to classify photographs into two distinct groups, cats and dogs. Task: image classification. Dataset: cats and dogs images.
Machine Learning Clustering:
Methodology: Machine learning is often used to find groupings of similar data points. Customers, for example, may be clustered based on their purchasing history, or items could be clustered based on their attributes.
Dataset: Online Retail Data.
Scikit-Learn Clustering
Principal Component Analysis. Support vector machine.
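A minimal sketch of clustering customers after a PCA projection. The three purchasing features and the two customer groups are synthetic assumptions, not the Online Retail Data itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Synthetic customers; columns: orders per year, average order value, items per order.
occasional = rng.normal(loc=[5, 20, 2], scale=[1, 4, 0.5], size=(100, 3))
frequent = rng.normal(loc=[40, 150, 8], scale=[5, 20, 1], size=(100, 3))
X = np.vstack([occasional, frequent])

# Standardize so no single feature dominates, then project onto 2 principal components.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Group the customers into 2 clusters in the reduced space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)
print("Cluster sizes:", np.bincount(labels))
```

The two-dimensional projection also makes the clusters easy to inspect in a scatter plot.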
IV. Advanced Portfolio Projects
Portfolio Projects: neural networks.
Methodology: Using the same public-library dataset as before, with a defined target variable, property prices, create a deep learning linear regression model with Python packages such as PyTorch. Document and visualize the model's performance using matplotlib and seaborn. This is a supervised learning task.
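A linear regression model trained with PyTorch can be sketched as below. The data is synthetic and already standardized, which is an assumption made for illustration; real price data would need scaling first.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic standardized features and target (assumed weights and bias).
X = torch.randn(256, 2)
true_w = torch.tensor([[2.0], [-1.0]])
y = X @ true_w + 0.5 + 0.05 * torch.randn(256, 1)

# A single linear layer is a linear regression model.
model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Full-batch gradient descent on the mean squared error.
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"Final training MSE: {loss.item():.4f}")
```

Stacking additional layers with nonlinear activations in place of the single nn.Linear turns this into a deeper network while the training loop stays the same.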
Explaining company profits.
Methodology: Using a customer's dataset, determine which factors are crucial in explaining, and later predicting, companies' profit rates. Package used: PyTorch (deep learning).
Natural language processing (NLP).
Methodology: Create an NLP model to handle tasks like question answering. I built a chatbot in Python to achieve this. Dataset: DailyMail dataset.
Deep learning:
Methodology: Create a deep learning model to handle tasks like image identification. I used the cats and dogs dataset again.
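A small convolutional network for two-class image classification might look like the sketch below. The architecture, the 64x64 input size, and the class name SmallCNN are assumptions for illustration; random tensors stand in for the cat and dog photographs, and no training is shown.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical two-class image classifier for 3x64x64 inputs."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(16 * 16 * 16, 2)  # two classes: cat, dog

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))


torch.manual_seed(0)
model = SmallCNN()

# Random tensors standing in for a batch of 4 RGB images.
batch = torch.randn(4, 3, 64, 64)
logits = model(batch)
probs = torch.softmax(logits, dim=1)  # per-image class probabilities
print(probs.shape)
```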