Skillset
Main technical skills
SQL
Data Vault Modeling 2.0
Data Analysis
Python
Containerization
Shell Scripting
Languages
German (mother tongue)
English (proficient)
Vietnamese (strong fundamentals)
Japanese (beginner)
Work Experience
08/2023 – 09/2023
Full-time Intern in Data Science
Oxford University Clinical Research Unit (OUCRU) | Ho Chi Minh City, Vietnam
Preprocessing and Analysis of Raw PPG Data for Dengue Patient Management
- Preprocessed raw PPG wave data for consistency.
- Created time windows of interest to align with clinical events.
- Matched available PPG data with clinical events and created segments through sliding windows.
- Ensured data consistency and quality control, and calculated essential statistical features for each segment.
- Exported collected meta information and valid processed PPG segments for clinician use.
In the context of dengue patient management, the use of user-friendly wearable devices for collecting photoplethysmogram (PPG) signal data can be highly advantageous. The goal of the researching clinicians is to establish a meaningful relationship between specific clinical events or states, such as high fluid volume after administration or low fluid volume before a shock state, and the corresponding PPG wave data. To enable this analysis, preprocessing the PPG data is the essential first step for clinicians.
Outcome: To assist the researching clinicians, a pipeline consisting of a series of robust, documented Jupyter notebooks was developed. These notebooks enable them to match available PPG data with relevant clinical events. The relevant raw PPG data files are then preprocessed to extract valid PPG signal segments that have passed basic quality checks. Additionally, the clinicians can further filter these valid segments based on statistical features, such as kurtosis and skewness, to select signals that align with their specific quality criteria.
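To illustrate the segmentation step, a minimal sketch of sliding-window segmentation with a basic quality check and statistical features follows; the sampling rate, window length, and quality criteria are illustrative assumptions, not values from the actual project.

import numpy as np
from scipy.stats import kurtosis, skew

FS = 100          # assumed PPG sampling rate in Hz (illustrative)
WINDOW = 10 * FS  # assumed 10 s window
STEP = 5 * FS     # assumed 5 s stride, i.e. 50% overlap

def sliding_segments(signal):
    """Yield (start_index, segment) pairs via a sliding window."""
    for start in range(0, len(signal) - WINDOW + 1, STEP):
        yield start, signal[start:start + WINDOW]

def is_valid(segment):
    """Minimal quality check: reject NaN-contaminated or flat windows."""
    return not np.isnan(segment).any() and np.ptp(segment) > 0

def features(segment):
    """Statistical features clinicians can later filter on."""
    return {"kurtosis": float(kurtosis(segment)),
            "skewness": float(skew(segment)),
            "mean": float(np.mean(segment)),
            "std": float(np.std(segment))}

ppg = np.random.randn(60 * FS)  # placeholder for one raw PPG recording
meta = [(start, features(seg))
        for start, seg in sliding_segments(ppg) if is_valid(seg)]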
Technologies:
09/2019 – 02/2023
Working Student in Business Intelligence (14h/week)
Otto Group Solution Provider (OSP) GmbH | Dresden, Germany
Research project: Anomaly detection in semi-structured data
- Used NLP techniques to read in semi-structured data such as JSON files
- Created a language model from scratch to transform the data into a vector representation
- Conducted anomaly detection on the vector representation using conventional methods
- Used Integrated Gradients to pinpoint the specific areas in the data where an anomaly appeared
Outcome: Semi-structured data, particularly in logistics, often consists of numerical values that are challenging to represent as meaningful embeddings. Nevertheless, by leveraging a custom language model pretrained on such data and incorporating an anomaly detection module such as an autoencoder or SVM, it is possible to identify anomalous data. However, Integrated Gradients may not be suitable for this specific use case, as the choice of baseline depends on the input data and is therefore too dynamic.
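As an illustration of the detection step, here is a minimal sketch using a one-class SVM over document embeddings; the embedding dimensionality, the nu parameter, and the use of scikit-learn are assumptions for illustration, not the project's actual setup.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train_vecs = rng.normal(size=(500, 64))  # placeholder embeddings of normal documents
new_vecs = rng.normal(size=(10, 64))     # placeholder embeddings to screen

# nu bounds the expected fraction of outliers; 0.05 is an assumed value.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(train_vecs)

# predict() returns +1 for inliers and -1 for anomalies.
labels = detector.predict(new_vecs)
print(f"flagged {int((labels == -1).sum())} of {len(new_vecs)} documents as anomalous")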
Technologies:
Lead architect in designing and developing a data warehouse framework based on open-source tools
- Designed, developed, and implemented a 3-layer data warehouse (DWH) architecture using PostgreSQL as the DBMS
- Created an ETL process pattern based on Python scripts for entities (Sats, Links, Hubs, effectivity Sats, Bridges, Dimensions, Facts)
- Managed scheduling and orchestration through Airflow
- Created dashboards with Superset
- Implemented deployment of services on Kubernetes in the Oracle Cloud
- Provided Infrastructure as Code through Terraform
- Modelled DWH Core entities based on Data Vault Modeling 2.0 (Sats, Links, Hubs, effectivity Sats)
- Modelled and implemented star schema models
Outcome: A fully functional open-source data warehouse framework, including robust ETL processes and comprehensive logging, was successfully developed. The first POC visualized actual and planned production values on a daily basis, whereas the corresponding report had previously been produced only monthly. Another use case automated invoice calculations, replacing a manual process that relied on a large Excel file.
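As an illustration of the entity load pattern, here is a minimal sketch of a Data Vault 2.0 Hub load in Python against PostgreSQL; the table and column names, the psycopg2 connection settings, and the staging rows are hypothetical, not the framework's actual code.

import hashlib
import psycopg2

def hash_key(business_key: str) -> str:
    """Data Vault 2.0 style hash key: MD5 of the normalized business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

staged = [("ORDER-1001", "erp"), ("ORDER-1002", "erp")]  # hypothetical staging rows

conn = psycopg2.connect("dbname=dwh user=etl")  # assumed connection settings
with conn, conn.cursor() as cur:
    for business_key, record_source in staged:
        # Insert-only pattern: a Hub row is written at most once per business key.
        cur.execute(
            """
            INSERT INTO core.hub_order (hub_order_hk, order_bk, load_dts, record_source)
            VALUES (%s, %s, now(), %s)
            ON CONFLICT (hub_order_hk) DO NOTHING
            """,
            (hash_key(business_key), business_key, record_source),
        )
conn.close()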
Technologies:
10/2018 – 03/2019
Working Student in Business Intelligence (16h/week)
Otto Group Solution Provider (OSP) GmbH | Dresden, Germany
Analysis/operations & upgrade/extension of existing data warehouse
- Within the context of logistics, extended the data warehouse model (Data Vault Modeling 2.0)
- Integrated new source data from incoming shipments
- Supported modelling and implementation of star schema models
- Analyzed current data quality and supported the provision of consistent data quality
Technologies:
Microsoft SQL Server, SQL Server Integration Services
05/2018 – 09/2018
Full-time Intern in BI Data Management & Technology
Otto (GmbH & Co KG) | Hamburg, Germany
Automated provisioning of Big Data for consumers
- Operated and maintained script-based management of big data files
- Imported and provisioned data for consumers through a (legacy) Talend ETL pipeline
- Supported migration of services to the Google Cloud Platform (GCP)
Technologies:
04/2017 – 03/2018
Full-time Intern and Working Student in Business Intelligence
Otto Group Solution Provider (OSP) GmbH
Internship (6 months) followed by a Working Student position (18h/week)
Analysis/operations & upgrade/extension of existing data warehouse
- Within the context of logistics, extended an existing data warehouse model (Data Vault Modeling 2.0)
- Implemented basic Data Vault entities (Hubs, Sats, Links)
- Supported integration of new source data from incoming shipments and quality inspection
Technologies:
Microsoft SQL Server, SQL Server Integration Services
09/2016 – 03/2017
Full-time Intern in Data & Content Driven Services
T-Systems Multi Media Solutions GmbH | Dresden, Germany
Consulting in digital transformation and digital marketing
- Created reports with basic marketing metrics such as visits, visitors, and bounce rate
- Supported SEO analysis and web presence evaluation
- Built up and shared knowledge of development and operations for the SAP Hybris Marketing Cloud
- Created a comprehensive feature comparison of marketing platforms (SAP Hybris Marketing, Salesforce Marketing Cloud, Adobe Marketing Cloud)
Technologies:
Excel, VBA
Education
since 10/2018
TU Dresden | Dresden, Germany
Business Information Systems, Diploma
03/2019 – 09/2019
Keio University | Tokyo, Japan
Exchange Semester
10/2013 – 10/2018
TU Dresden | Dresden, Germany
Business Information Systems, Bachelor | Final grade: 2.3
Major in Production and Logistics
Further relevant experience as part of university modules
- Applied Data Science: Case Studies | grade: 1.0:
- Developing a semi-automated pipeline for object detection based on the TensorFlow API on the Microsoft Azure cloud
- Mobile and Ubiquitous Computing | grade: 1.0:
- Extending a data streaming pipeline in Flink for real-time processing of data gathered by cyclists
- Research Paper | grade: 1.3:
- “Application of machine learning in industrial incident management” (title translated from German)
- Research Paper | grade: 1.0:
- Relevant or irrelevant? Reviewing Approaches to (semi-)automate the Screening Process of a Structured Literature Review