Pedro Mota
Information Management Portfolio
Goal: Building a predictive model that maximizes profit for the company's next direct marketing campaign by targeting only the customers most likely to purchase the offer.
Data: Groups were given a dataset from a pilot campaign, containing client information, the clients' purchases of company products, and whether each client accepted the pilot campaign offer.
Performed functions: Data access, exploration and understanding; Data preparation; Modelling; Assessment.
Technologies: SAS Enterprise Miner.
Date: May - June 2020
Dataset Variables
Variable Worth
Model Evaluation
Goal: Building a predictive model that answers the question “Which employees are most likely to quit their position at the company?” using the employee dataset provided.
Data: Groups were given a dataset containing employee information and each employee's churn risk, labelled “low”, “medium” or “high”.
Performed functions: Data access, exploration and understanding; Data preparation; Modelling; Assessment.
Technologies: Python, Jupyter Notebooks, NumPy, pandas, ML Algorithms.
Date: February - May 2020
Dataset Variables
Variable Worth
Model Evaluation - Best scores obtained with Random Forest (RF)
Goals: Building a predictive model to determine which clients are at risk of churning. Building a business case around possible actions to help mitigate the increase in customer churn.
Data: Groups were given a dataset containing customer and service information, and whether each client churned.
Performed functions: Data access, exploration and understanding; Data preparation; Modelling; Assessment; Business case definition.
Technologies: Python, Spyder, NumPy, pandas, ML Algorithms.
Date: October 2019 - January 2020
Dataset Variables
Predictive Model
Business Case
Goal: Finding potentially interesting patterns that could provide meaningful insights into the customers and their buying habits.
Data: Groups were given a dataset containing client information and the clients' purchases of company products.
Performed functions: Data access, exploration and understanding; Data clustering; Business case definition.
Technologies: Python, Jupyter Notebooks, NumPy, pandas, SOM.
Date: October 2019 - January 2020
Dataset Variables
KMeans and SOM Clusters
Marketing Campaigns
Goal: Implementing an OLAP cube and developing a varied set of reports, dashboards and dynamic analyses on top of both the cube and the data warehouse.
Data: Groups were provided with a Data Warehouse containing data from a fictional company. The data covered the company's products, employees, customers and sales.
Performed functions: SSAS OLAP cube building; KPI and metric definition and implementation; SSRS report building; Power BI dashboard building.
Technologies: SQL, SQL Server, SSMS, SSAS, SSRS, Power BI.
Date: February - June 2020
OLAP Cube
Total Sales per Month SSRS Report
Manufacturing Cost Analysis Power BI Dashboard
Goal: Design, implementation and explanation of a fully working Data Warehouse solution.
Data: Groups were given a relational database and flat files containing data from a fictional company. The data covered the company's products, employees, customers and sales.
Performed functions: Data access, exploration and understanding; Dimensional model design; Staging Area; ETL processes.
Technologies: SQL, SQL Server, SSMS, SSIS.
Date: October 2019 - January 2020
Database
Data Warehouse
Title: Assessing COVID-19 impact on user opinion towards videogames
Subtitle: Sentiment analysis and structural break detection on Steam data
Goal: Detect whether the emotions brought on by the pandemic, and the role video games played in entertaining individuals, changed the sentiment expressed in user reviews.
Method: User review data was collected from Steam and processed. Sentiment polarity values were extracted from English-language reviews using a set of different algorithms and analysed over time. The last step consisted of testing for structural breaks in the resulting time series.
Technologies: Python, Jupyter Notebooks, R Studio.
Date: October 2020 - November 2021
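The structural-break step in the Method above can be illustrated with a minimal sketch. It is not the procedure used in the project, which is not detailed here; it assumes a Chow test at a single, known candidate break date, and the polarity series is synthetic. The function names `rss_linear` and `chow_f_statistic` are hypothetical helpers introduced for this example.

```python
import numpy as np


def rss_linear(t, y):
    """Residual sum of squares of an OLS line y = a + b*t."""
    X = np.column_stack([np.ones_like(t, dtype=float), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_f_statistic(y, break_index):
    """Chow F statistic for a break at `break_index` (assumed known)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    k = 2  # parameters per segment: intercept and slope
    n = len(y)
    rss_pooled = rss_linear(t, y)
    rss_split = (rss_linear(t[:break_index], y[:break_index])
                 + rss_linear(t[break_index:], y[break_index:]))
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


# Synthetic weekly mean polarity (assumption: values are made up for the demo).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.05, size=120)
flat_series = 0.2 + noise                      # no change in mean sentiment
step = np.where(np.arange(120) >= 60, 0.3, 0.0)
break_series = 0.2 + step + noise              # mean shifts at index 60

f_flat = chow_f_statistic(flat_series, 60)
f_break = chow_f_statistic(break_series, 60)
print(f"F without break: {f_flat:.2f}, F with break: {f_break:.2f}")
```

A large F statistic relative to the F(k, n - 2k) distribution suggests that the two segments follow different regressions. When the break date is unknown, tests that search over candidate dates are the usual choice instead.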