Geser Dugarov, Ph.D.

Java Developer,

Big Data Engineer

About

Software engineer developing core functionality of a Data Lakehouse platform that extracts value from petabyte-scale data. Open-source contributor to the Apache Hudi project, with a focus on performance and usability. Extensive experience in research and data analysis (PhD).

Passionate about Big Data and Distributed Systems. Personal mission statement: "Living a balanced life. Helping professionals work smarter, not harder, by building automated systems for their routine tasks."

Technical skills

  • Java
  • Python
  • Maven
  • PostgreSQL
  • Docker
  • Hadoop Ecosystem

Experience

May 2023 - current (2+ yrs)

Java Developer / Big Data Engineer

Huawei Cloud

• Develop core functionality of a Data Lakehouse platform for Big Data processing on scalable, enterprise-level clusters.

• Future Star Award (2024).

Jun 2024 - current (1+ yr)

Apache Hudi Contributor

The Apache Software Foundation

Apache Hudi is a Data Lakehouse platform that brings database functionality to data lakes and enables incremental processing for low-latency analytics.

• Optimized serialization and deserialization of data stream records in Flink stream writing, resulting in a 30% increase in processing speed and 2x reduction in memory usage (design doc, main changes, umbrella ticket). Released in Apache Hudi 1.0.2.

• Implemented 4 local optimizations ([1], [2], [3], [4]) in Flink stream writing, resulting in a 10% increase in processing speed and 30% reduction in garbage collection overhead. Released in Apache Hudi 1.0.1.

• Contributed 40+ merged pull requests.

Feb 2022 - May 2023 (1+ yr)

Python Developer

Digital Research (computer vision startup)

• Designed and implemented an event-based architecture for a truck-monitoring system. Developed server-side image processing that handles ~20,000 images per day. In production, the system reduced fleet idle time by 12%.

• Built a customer-facing web UI featuring reports and data visualizations. Also developed an internal web UI for system monitoring.

Education

PhD, Geophysics

Trofimuk Institute of Petroleum Geology and Geophysics SB RAS

MSc, Computational and Applied Mathematics

Novosibirsk State University

Certificates

• Deep Learning Specialization, Coursera (2021)

Software copyrights

• Software for solving nonstationary thermohydraulic problems for reactors and experimental stands with sodium, lead, and lead-bismuth coolants. Version 1.1. HYDRA-IBRAE/LM/V1.1 (2018)

Hobbies

• Regular gym training (since 2019).

• Reading books on developmental psychology and business management:

- Meg Jay "The Defining Decade"

- Tom DeMarco "The Deadline: A Novel about Project Management"

- Alexey Markov "Hoolinomics" and "Greedology" [in Russian]

- Eliyahu Goldratt and Jeff Cox "The Goal: A Process of Ongoing Improvement"