Skillset
Main technical skills
SQL
Data Vault Modeling 2.0
Data Analysis
Python
Containerization
Shell Scripting
Languages
German (mother tongue)
English (proficient)
Vietnamese (strong fundamentals)
Japanese (beginner)
Work Experience
08/2023 – 09/2023
Full-time Intern in Data Science
Oxford University Clinical Research Unit (OUCRU) | Ho Chi Minh City, Vietnam
Preprocessing and Analysis of Raw PPG Data for Dengue Patient Management
- Preprocessed raw PPG wave data for consistency.
- Created time windows of interest to align with clinical events.
- Matched available PPG data with clinical events and created segments through sliding windows.
- Ensured data consistency and quality control, and calculated essential statistical features for each segment.
- Exported collected meta information and valid processed PPG segments for clinician use.
In the context of dengue patient management, the use of user-friendly wearable devices for collecting photoplethysmogram (PPG) signal data can be highly advantageous. The goal of the researching clinicians is to establish a meaningful relationship between specific clinical events or states, such as high fluid volume after administration or low fluid volume before a shock state, and the corresponding PPG wave data. To enable this analysis, preprocessing the PPG data is the essential first step for clinicians.
Outcome: To assist the researching clinicians, a pipeline consisting of a series of robust, documented Jupyter notebooks was developed. These notebooks enable them to match available PPG data with relevant clinical events. The relevant raw PPG data files are then preprocessed to extract valid PPG signal segments that have passed basic quality checks. Additionally, the clinicians can further filter these valid segments based on statistical features, such as kurtosis and skewness, to select signals that align with their specific quality criteria.
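To illustrate the segmentation step, a minimal sketch of sliding-window segmentation with a basic quality check and statistical features follows; the sampling rate, window length, and quality criteria are illustrative assumptions, not values from the actual project.

import numpy as np
from scipy.stats import kurtosis, skew

FS = 100          # assumed PPG sampling rate in Hz (illustrative)
WINDOW = 10 * FS  # assumed 10 s window
STEP = 5 * FS     # assumed 5 s stride, i.e. 50% overlap

def sliding_segments(signal):
    """Yield (start_index, segment) pairs via a sliding window."""
    for start in range(0, len(signal) - WINDOW + 1, STEP):
        yield start, signal[start:start + WINDOW]

def is_valid(segment):
    """Minimal quality check: reject NaN-contaminated or flat windows."""
    return not np.isnan(segment).any() and np.ptp(segment) > 0

def features(segment):
    """Statistical features clinicians can later filter on."""
    return {"kurtosis": float(kurtosis(segment)),
            "skewness": float(skew(segment)),
            "mean": float(np.mean(segment)),
            "std": float(np.std(segment))}

ppg = np.random.randn(60 * FS)  # placeholder for one raw PPG recording
meta = [(start, features(seg))
        for start, seg in sliding_segments(ppg) if is_valid(seg)]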
Technologies:
09/2019 – 02/2023
Working Student in Business Intelligence (14h/week)
Otto Group Solution Provider (OSP) GmbH | Dresden, Germany
Research project: Anomaly detection in semi-structured data
- Used NLP techniques to read in semi-structured data such as JSON files
- Created a language model from scratch to transform the data into a vector representation
- Conducted anomaly detection on the vector representation using conventional methods
- Used Integrated Gradients to pinpoint the specific areas in the data where an anomaly appeared
Outcome: Semi-structured data, particularly in logistics, often consists of numerical values that are challenging to represent as meaningful embeddings. Nevertheless, by leveraging a custom language model pretrained on such data and incorporating an anomaly detection module such as an autoencoder or SVM, it is possible to identify anomalous data. However, Integrated Gradients may not be suitable for this specific use case, as the choice of baseline depends on the input data and is therefore too dynamic.
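As an illustration of the detection step, here is a minimal sketch using a one-class SVM over document embeddings; the embedding dimensionality, the nu parameter, and the use of scikit-learn are assumptions for illustration, not the project's actual setup.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train_vecs = rng.normal(size=(500, 64))  # placeholder embeddings of normal documents
new_vecs = rng.normal(size=(10, 64))     # placeholder embeddings to screen

# nu bounds the expected fraction of outliers; 0.05 is an assumed value.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(train_vecs)

# predict() returns +1 for inliers and -1 for anomalies.
labels = detector.predict(new_vecs)
print(f"flagged {int((labels == -1).sum())} of {len(new_vecs)} documents as anomalous")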
Technologies:
Lead architect in designing and developing a data warehouse framework based on open-source tools
- Designed, developed, and implemented a 3-layer data warehouse (DWH) architecture using PostgreSQL as the DBMS
- Created an ETL process pattern based on Python scripts for entities (Sats, Links, Hubs, effectivity Sats, Bridges, Dimensions, Facts)
- Managed scheduling and orchestration through Airflow
- Created dashboards with Superset
- Implemented deployment of services on Kubernetes in the Oracle Cloud
- Provided Infrastructure as Code through Terraform
- Modelled DWH Core entities based on Data Vault Modeling 2.0 (Sats, Links, Hubs, effectivity Sats)
- Modelled and implemented star schema models
Outcome: A fully functional open-source data warehouse framework, including robust ETL processes and comprehensive logging, was successfully developed. The first POC visualized actual and planned production values on a daily basis, whereas the corresponding report had previously been produced only monthly. Another use case automated invoice calculations, replacing a manual process that relied on a large Excel file.
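As an illustration of the entity load pattern, here is a minimal sketch of a Data Vault 2.0 Hub load in Python against PostgreSQL; the table and column names, the psycopg2 connection settings, and the staging rows are hypothetical, not the framework's actual code.

import hashlib
import psycopg2

def hash_key(business_key: str) -> str:
    """Data Vault 2.0 style hash key: MD5 of the normalized business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

staged = [("ORDER-1001", "erp"), ("ORDER-1002", "erp")]  # hypothetical staging rows

conn = psycopg2.connect("dbname=dwh user=etl")  # assumed connection settings
with conn, conn.cursor() as cur:
    for business_key, record_source in staged:
        # Insert-only pattern: a Hub row is written at most once per business key.
        cur.execute(
            """
            INSERT INTO core.hub_order (hub_order_hk, order_bk, load_dts, record_source)
            VALUES (%s, %s, now(), %s)
            ON CONFLICT (hub_order_hk) DO NOTHING
            """,
            (hash_key(business_key), business_key, record_source),
        )
conn.close()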
Technologies:
10/2018 – 03/2019
Working Student in Business Intelligence (16h/week)
Otto Group Solution Provider (OSP) GmbH | Dresden, Germany
Analysis/operations & upgrade/extension of existing data warehouse
- Within the context of logistics, extended the data warehouse model (Data Vault Modeling 2.0)
- Integrated new source data from incoming shipments
- Supported modelling and implementation of star schema models
- Analyzed current data quality and supported the provision of consistent data quality
Technologies:
Microsoft SQL Server, SQL Server Integration Services
05/2018 – 09/2018
Full-time Intern in BI Data Management & Technology
Otto (GmbH & Co KG) | Hamburg, Germany
Automated provisioning of Big Data for consumers
- Operated and maintained script-based management of big data files
- Imported and provisioned data for consumers through a (legacy) Talend ETL pipeline
- Supported migration of services to the Google Cloud Platform (GCP)
Technologies:
04/2017 – 03/2018
Full-time Intern and Working Student in Business Intelligence
Otto Group Solution Provider (OSP) GmbH
Internship (6 months) followed by a Working Student position (18h/week)
Analysis/operations & upgrade/extension of existing data warehouse
- Within the context of logistics, extended an existing data warehouse model (Data Vault Modeling 2.0)
- Implemented basic Data Vault entities (Hubs, Sats, Links)
- Supported integration of new source data from incoming shipments and quality inspection
Technologies:
Microsoft SQL Server, SQL Server Integration Services
09/2016 – 03/2017
Full-time Intern in Data & Content Driven Services
T-Systems Multi Media Solutions GmbH | Dresden, Germany
Consulting in digital transformation and digital marketing
- Created reports with basic marketing metrics such as visits, visitors, and bounce rate
- Supported SEO analysis and web presence evaluation
- Built up and shared knowledge of development and operations for the SAP Hybris Marketing Cloud
- Created a comprehensive feature comparison of marketing platforms (SAP Hybris Marketing, Salesforce Marketing Cloud, Adobe Marketing Cloud)
Technologies:
Excel, VBA
Education
since 10/2018
TU Dresden | Dresden, Germany
Business Information Systems, Diploma
03/2019 – 09/2019
Keio University | Tokyo, Japan
Exchange Semester
10/2013 – 10/2018
TU Dresden | Dresden, Germany
Business Information Systems, Bachelor | Final grade: 2.3
Major in Production and Logistics
Further relevant experience as part of university modules
- Applied Data Science: Case Studies | grade: 1.0:
- Developing a semi-automated pipeline for object detection based on the TensorFlow API on the Microsoft Azure cloud
- Mobile and Ubiquitous Computing | grade: 1.0:
- Extending a data streaming pipeline in Flink for real-time processing of data gathered by cyclists
- Research Paper | grade: 1.3:
- “Application of machine learning in industrial incident management” (title translated from German)
- Research Paper | grade: 1.0:
- Relevant or irrelevant? Reviewing Approaches to (semi-)automate the Screening Process of a Structured Literature Review