Hey! I am Nguyen, Hung Manh

interested in Data Engineering / Data Science

Dresden, Germany | 15.05.1993

#enthusiastic #motivated #openminded #adventurous

Just do it!! Cautiously; Methodically; Replicably; Documented!

Skillset

Main technical skills

SQL

Data Vault Modeling 2.0

Data Analysis

Python

Containerization

Shell-Scripting

Languages

German (mother tongue)

English (proficient)

Vietnamese (strong fundamentals)

Japanese (beginner)

Work Experience

08/23 – 09/23

Full-time Intern in Data Science

Oxford University Clinical Research Unit (OUCRU) | Ho Chi Minh City, Vietnam

Preprocessing and Analysis of Raw PPG Data for Dengue Patient Management

  • Preprocessed raw PPG wave data for consistency.
  • Created time windows of interest to align with clinical events.
  • Matched available PPG data with clinical events and created segments through sliding windows.
  • Ensured data consistency and quality control, and calculated essential statistical features for each segment.
  • Exported collected meta information and valid processed PPG segments for clinician use.

In the context of Dengue patient management, using user-friendly wearable devices to collect photoplethysmogram (PPG) signal data can be highly advantageous. The researching clinicians aim to establish a meaningful relationship between specific clinical events or states, such as high fluid volume after administration or low fluid volume before a shock state, and the corresponding PPG wave data. To enable this analysis, preprocessing the PPG data is the essential first step.

Outcome: To assist the researching clinicians, a pipeline consisting of a series of robust, documented Jupyter notebooks was developed. These notebooks enable them to match available PPG data with relevant clinical events. The relevant raw PPG data files are then preprocessed to extract valid PPG signal segments that have passed basic quality checks. Additionally, clinicians can further filter these valid segments based on statistical features, such as kurtosis and skewness, to select signals that meet their specific quality criteria.
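
A minimal sketch of the sliding-window segmentation and feature step in Python, assuming a uniformly sampled signal; the sampling rate, window length, stride, and thresholds are illustrative placeholders, not the project's values:

```python
import numpy as np
from scipy.stats import kurtosis, skew

FS = 100       # assumed sampling rate in Hz
WINDOW_S = 10  # window length in seconds
STRIDE_S = 5   # stride in seconds (50 % overlap)

def segment_ppg(signal: np.ndarray):
    """Yield (start_index, window, features) for each sufficiently clean window."""
    win, step = WINDOW_S * FS, STRIDE_S * FS
    for start in range(0, len(signal) - win + 1, step):
        window = signal[start:start + win]
        # Basic quality control: reject windows with gaps (NaNs) or a flat line.
        if np.isnan(window).any() or np.ptp(window) < 1e-6:
            continue
        yield start, window, {
            "kurtosis": float(kurtosis(window)),
            "skewness": float(skew(window)),
            "std": float(np.std(window)),
        }

# Example: keep only segments whose shape statistics pass a (made-up) criterion.
signal = np.sin(np.linspace(0, 60 * np.pi, 30 * FS))  # stand-in for a real PPG trace
valid = [(start, feats) for start, _, feats in segment_ppg(signal)
         if abs(feats["skewness"]) < 2]
```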

Technologies:

09/2019 – 02/2023

Working Student in Business Intelligence (14h/week)

Otto Group Solution Provider (OSP) GmbH | Dresden, Germany

Research project: Anomaly detection in semi-structured data

  • Used NLP techniques to read in semi-structured data such as JSON files
  • Created a language model from scratch to transform the data into a vector representation
  • Conducted anomaly detection on the vector representation using conventional methods
  • Used Integrated Gradients to pinpoint the specific areas in the data where an anomaly appeared

Outcome: Semi-structured data, particularly in logistics, often consists of numerical values that are challenging to represent as meaningful embeddings. Nevertheless, by leveraging a custom language model pretrained on such data and combining it with an anomaly detection module such as an autoencoder or SVM, it is possible to identify anomalous data. Integrated Gradients, however, proved less suitable for this use case, as the choice of baseline depends on the input data and is therefore too dynamic.
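
A rough sketch of the detection stage: flatten JSON-like records into token strings, embed them, and flag outliers with a one-class SVM. The project used a custom pretrained language model for the embeddings; the TfidfVectorizer below is only a stand-in, and all field names and records are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

def flatten(record: dict) -> str:
    """Turn a JSON-like record into a token string, e.g. 'weight=12.5 depot=DD1'."""
    return " ".join(f"{key}={value}" for key, value in record.items())

# Made-up logistics records: the training data is "normal", the test record is not.
train_records = [{"weight": 12.5, "depot": "DD1"}, {"weight": 11.9, "depot": "DD1"}]
test_records = [{"weight": 900.0, "depot": "XX9"}]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(flatten(r) for r in train_records)
X_test = vectorizer.transform(flatten(r) for r in test_records)

# A one-class SVM learns the boundary of the normal data; -1 marks an anomaly.
detector = OneClassSVM(nu=0.05, kernel="rbf").fit(X_train)
print(detector.predict(X_test))
```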

Technologies:


Lead architect in designing and developing a data warehouse framework based on open-source tools

  • Designed, developed and implemented a 3-layer data warehouse (DWH) architecture using PostgreSQL as the DBMS
  • Created an ETL process pattern based on Python scripts for entities (Sats, Links, Hubs, effectivity Sats, Bridges, Dimensions, Facts)
  • Managed scheduling and orchestration through Airflow
  • Created dashboards with Superset
  • Implemented deployment of services on Kubernetes in the Oracle Cloud
  • Provided Infrastructure as Code through Terraform
  • Modelled DWH Core entities based on Data Vault Modeling 2.0 (Sats, Links, Hubs, effectivity Sats)
  • Modelled and implemented star schema models

Outcome: A fully functional open-source data warehouse framework, including robust ETL processes and comprehensive logging, was successfully developed. The first POC visualized actual and planned production values on a daily basis, whereas the corresponding report had previously only been produced monthly. Another use case automated invoice calculations, replacing a manual process that relied on a large Excel file.
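
A sketch of the ETL pattern for one entity type: an idempotent hub load against PostgreSQL that inserts only business keys not yet present in the hub. Schema, table, and column names are hypothetical, and hash keys are assumed to be computed while filling the staging area:

```python
import hashlib

import psycopg2

HUB_LOAD_SQL = """
INSERT INTO dwh_core.hub_shipment (hk_shipment, shipment_id, load_ts, record_source)
SELECT stg.hk_shipment, stg.shipment_id, now(), %(source)s
FROM staging.shipments stg
LEFT JOIN dwh_core.hub_shipment hub ON hub.hk_shipment = stg.hk_shipment
WHERE hub.hk_shipment IS NULL;  -- insert only business keys not seen before
"""

def hash_key(business_key: str) -> str:
    """Data Vault style hash key, computed while filling the staging area."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

def load_hub(dsn: str, record_source: str) -> None:
    """Run the idempotent hub load inside a single transaction."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(HUB_LOAD_SQL, {"source": record_source})
```

The same pattern, parameterized per entity, extends to Links and Sats and can be scheduled as one Airflow task per entity.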

Technologies:

10/2018 – 03/2019

Working Student in Business Intelligence (16h/week)

Otto Group Solution Provider (OSP) GmbH | Dresden, Germany

Analysis/operations & upgrade/extension of existing data warehouse

  • Within the context of logistics, extended the existing data warehouse model (Data Vault Modeling 2.0)
  • Integrated new source data from incoming shipments
  • Supported modelling and implementation of star schema models
  • Analyzed the current data quality and supported efforts to keep it consistent

Technologies:

Microsoft SQL Server, SQL Server Integration Services

05/2018 – 09/2018

Full-time Intern in BI Data Management & Technology

Otto (GmbH & Co KG) | Hamburg, Germany

Automated provisioning of Big Data for consumers

  • Operated and maintained script-based management of big data files
  • Imported and provisioned data for consumers through a (legacy) Talend ETL pipeline
  • Supported migration of services to GCP

Technologies:

04/2017 – 03/2018

Full-time Intern and Working Student in Business Intelligence

Otto Group Solution Provider (OSP) GmbH | Dresden, Germany

Internship (6 months) followed by a Working Student position (18h/week)

Analysis/operations & upgrade/extension of existing data warehouse

  • Within the context of logistics, extended an existing data warehouse model (Data Vault Modeling 2.0)
  • Implemented basic Data Vault entities (Hubs, Sats, Links)
  • Supported integration of new source data from incoming shipments and quality inspection

Technologies:

Microsoft SQL Server, SQL Server Integration Services

09/2016 – 03/2017

Full-time Intern in Data & Content Driven Services

T-Systems Multi Media Solutions GmbH | Dresden, Germany

Consulting in digital transformation and digital marketing

  • Created reports with basic marketing metrics such as visits, visitors and bounce rate
  • Supported SEO analysis and web presence reviews
  • Built up and shared knowledge of development and operations for the SAP Hybris Marketing Cloud
  • Created a comprehensive feature comparison between marketing platforms (SAP Hybris Marketing, Salesforce Marketing Cloud, Adobe Marketing Cloud)

Technologies:

Excel, VBA

Education

since 10/2018

TU Dresden | Dresden, Germany

Business Information Systems, Diploma

03/2019 – 09/2019

Keio University | Tokyo, Japan

Exchange Semester

10/2013 – 10/2018

TU Dresden | Dresden, Germany

Business Information Systems, Bachelor | Final grade: 2.3

Major in Production and Logistics

Further relevant experience as part of university modules

  • Applied Data Science: Case Studies | grade: 1.0:
    • Developing a semi-automated pipeline for object detection based on the TensorFlow API on Microsoft Azure
  • Mobile and Ubiquitous Computing | grade: 1.0:
    • Extending a data streaming pipeline in Flink for real-time processing of data gathered by cyclists
  • Research Paper | grade: 1.3:
    • “Application of machine learning in industrial incident management” (title translated from German)
  • Research Paper | grade: 1.0:
    • Relevant or irrelevant? Reviewing Approaches to (semi-)automate the Screening Process of a Structured Literature Review

Contact Me