Geser Dugarov, Ph.D.

Software Engineer / Big Data Engineer

About

Software Engineer developing core functionality on a data lakehouse platform to extract value from petabyte-scale data. Hands-on experience with Apache Spark and Apache Flink, with a focus on Hudi-Spark and Hudi-Flink integrations. Open-source contributor to Apache Hudi, focusing on streaming performance and solution usability.

Extensive experience in research and data analysis, backed by a PhD in Geophysics. Strong interest in big data and distributed systems.

Technical

  • Java
  • Python
  • Maven
  • PostgreSQL
  • Docker
  • Hadoop Ecosystem

Experience

May 2023 – present (2.5+ yrs)

Software Engineer / Big Data Engineer

Huawei

Development of core functionality within Huawei Cloud for big data processing on enterprise-scale clusters.

• Provided a simplified configuration system utilizing commonly used presets to overcome the complexity of managing hundreds of parameters.

• Improved the performance of Flink stream writing, halving processing time.

• Implemented partition-level TTL, enabling customers to automate cloud storage cost management with coarse granularity.

Jun 2024 – present (1.5+ yrs)

Apache Hudi Contributor

The Apache Software Foundation

Apache Hudi is a data lakehouse platform that brings database functionality to data lakes and enables incremental processing for low-latency analytics.

• Optimized serialization and deserialization of records in Flink stream writing to Hudi tables, resulting in a 30% increase in processing speed and a 2x reduction in memory usage (design doc, main changes). Released in Hudi 1.0.2.

• Implemented four local optimizations ([1], [2], [3], [4]), resulting in a 10% increase in processing speed and a 30% reduction in garbage collection overhead. Released in Hudi 1.0.1.

• Contributed 40+ merged pull requests.

Feb 2022 – May 2023 (1+ yr)

Software Engineer / ML Engineer

Digital Research (computer vision startup)

• Designed and implemented an event-based architecture for a truck monitoring system. Developed server-side image processing handling ~20,000 images per day. In production, the system reduced fleet idle time by 12%.

• Built a customer-facing web UI featuring reports and data visualizations. Also developed an internal web UI for system monitoring.

Education

PhD, Geophysics

Trofimuk Institute of Petroleum Geology and Geophysics SB RAS

MSc, Computational and Applied Mathematics

Novosibirsk State University

Certificates

• Deep Learning Specialization, Coursera (2021)

Software copyrights

• Software for solving nonstationary thermohydraulic problems for reactors and experimental test stands with sodium, lead, and lead-bismuth coolants. Version 1.1. HYDRA-IBRAE/LM/V1.1 (2018)

Hobbies

• Regular gym training (since 2019).

• Reading books on developmental psychology and business management:

- Meg Jay "The Defining Decade"

- Tom DeMarco "The Deadline: A Novel about Project Management"

- Alexey Markov "Hoolinomics" and "Greedology" [in Russian]

- Eliyahu Goldratt and Jeff Cox "The Goal: A Process of Ongoing Improvement"