Software Engineer developing core functionality on a data lakehouse platform to extract value from petabyte-scale data. Hands-on experience with Apache Spark and Apache Flink, with a focus on Hudi-Spark and Hudi-Flink integrations. Open-source contributor to Apache Hudi, focusing on streaming performance and solution usability.
PhD with extensive experience in research and data analysis. Strong interest in big data and distributed systems.
Development of core functionality within Huawei Cloud for big data processing on enterprise-scale clusters.
• Built a simplified configuration system based on commonly used presets, reducing the complexity of managing hundreds of parameters.
• Improved performance of Flink stream writing, cutting processing time in half.
• Implemented partition-level TTL, enabling customers to automate cloud storage cost management with coarse granularity.
Apache Hudi is a data lakehouse platform that brings database functionality to data lakes and enables incremental processing for low-latency analytics.
• Optimized serialization and deserialization of records in Flink stream writing to Hudi tables, resulting in a 30% increase in processing speed and a 2x reduction in memory usage (design doc, main changes). Released in Hudi 1.0.2.
• Implemented four local optimizations ([1], [2], [3], [4]), resulting in a 10% increase in processing speed and a 30% reduction in garbage collection overhead. Released in Hudi 1.0.1.
• Contributed 40+ merged pull requests.
• Designed and implemented an event-based architecture for a truck monitoring system. Developed server-side image processing handling ~20,000 images per day. In production, the system reduced fleet idle time by 12%.
• Built a customer-facing web UI featuring reports and data visualizations. Also developed an internal web UI for system monitoring.
• Deep Learning Specialization, Coursera (2021)
• Software for solving nonstationary thermohydraulic problems applied to reactors and experimental stands with sodium, lead, and lead-bismuth coolants. Version 1.1. HYDRA-IBRAE/LM/V1.1 (2018)
• Going to the gym on a regular basis (since 2019).
• Reading developmental psychology and business management books.
- Meg Jay "The Defining Decade"
- Tom DeMarco "The Deadline: A Novel about Project Management"
- Alexey Markov "Hoolinomics" and "Greedology" [in Russian]
- Eliyahu Goldratt and Jeff Cox "The Goal: A Process of Ongoing Improvement"