Kai (James) Zhang | GIBDL, SRIBD

About Me

I am Kai Zhang, a research data engineer at the Shenzhen Research Institute of Big Data (深圳市大数据研究院, SRIBD). I am affiliated with the Government and Industrial Big Data Lab (政企大数据实验室), where I help deliver insight and impact for government and enterprises through a wide range of flexible support models, providing ad hoc, deeply transformational, and ongoing analytics architecture and solutions. Currently, my research focuses primarily on factor investing in the Chinese stock market, and I am responsible for multiple projects on empirical asset pricing, quantitative strategy design, and performance attribution for financial products. Before joining SRIBD, I received a Master's degree in Biostatistics from Cornell University and a Bachelor's degree in Statistics with First-class Honours from the Chinese University of Hong Kong, Shenzhen.

Research Interests

Statistical Learning: time series, quantitative finance, interpretable machine learning
Causal Inference: econometrics, targeted learning, experimental design and analysis

Projects

Comprehensive Python Framework for Factor Investment (Supervised by Prof. Tao Shu at CUHKSZ)
This project aims to provide a comprehensive software framework for factor investing, covering a data interface, a factor library, factor analysis, and strategy backtesting. The data interface module supports storing and importing financial data. The factor analyzer module evaluates the performance of predictive stock factors through Fama-MacBeth regression, time-series regression, Barra regression, information coefficient (IC) analysis, fractile analysis, and more. The backtester module provides a vectorized backtesting framework for investment strategies.
Automated Project Auditing System Based on Semantic Text Matching Algorithms (Supervised by Prof. Tao Shu at CUHKSZ)
This project developed semantic text-matching metrics based on the WordNet lexical database with Wu-Palmer similarity and on the Word2Vec algorithm with cosine similarity, providing the Shenzhen Science and Technology Innovation Commission with a solution for auditing the research outputs of state-funded projects in batches.
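The two similarity measures can be illustrated on toy data. This is a minimal sketch under stated assumptions: the small hypernym tree `PARENT` stands in for WordNet, the 2-d vectors stand in for trained Word2Vec embeddings, and averaging word vectors into a sentence vector is a common baseline assumed here, not necessarily the project's exact scoring. Wu-Palmer similarity is shown in its textbook form, 2 · depth(LCS) / (depth(a) + depth(b)); real WordNet implementations such as NLTK's handle multiple inheritance and depth conventions in their own way.

```python
import math

# Hypothetical toy hypernym tree (child -> parent); WordNet's noun hierarchy
# is far larger and lets a synset have several parents.
PARENT = {
    "entity": None,
    "object": "entity",
    "animal": "object",
    "dog": "animal",
    "cat": "animal",
    "tool": "object",
    "hammer": "tool",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    # Count nodes on the path, so the root has depth 1.
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Walking up from a, the first node that is also above b is the lowest common subsumer.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def sentence_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no in-vocabulary tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    print(wu_palmer("dog", "cat"))     # LCS "animal": 2*3 / (4+4) → 0.75
    print(wu_palmer("dog", "hammer"))  # LCS "object": 2*2 / (4+4) → 0.5
    # Made-up 2-d "embeddings" standing in for trained Word2Vec vectors.
    emb = {"stock": [1.0, 0.0], "equity": [0.9, 0.1], "gene": [0.0, 1.0]}
    print(cosine(sentence_vector(["stock"], emb), sentence_vector(["equity"], emb)))
```

The two measures are complementary: Wu-Palmer relies on a curated taxonomy and only covers words that appear in it, while embedding cosine similarity covers any in-vocabulary word but reflects co-occurrence patterns rather than explicit "is-a" relations.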
Cell Type Identification of Mouse Retinal Pigment Epithelium from scRNA-seq Data (Supervised by Prof. Xi Kathy Zhou at Cornell University)
This project used unsupervised learning methods to identify the cellular composition and activities of mouse retinal pigment epithelium (RPE) from single-cell transcriptomic data, and then detected cell-to-cell variation with statistical methods. We first processed raw sequencing data into high-quality expression data following the Scater workflow, including rigorous pre-processing, quality control, feature selection, and PCA dimensionality reduction. We then applied and compared multiple clustering algorithms, including K-means, DBSCAN, and SNN graph-based clustering, to identify RPE cell types. Finally, we detected potential marker genes for each cell class using pairwise t-tests with Benjamini-Hochberg (BH) adjustment for multiple comparisons.
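The BH adjustment in the final step can be sketched as follows. This is a minimal illustration of the standard step-up procedure with made-up p-values; the helper name `bh_adjust` is hypothetical, and the project itself worked within the R/Scater ecosystem, where `p.adjust(method = "BH")` should give the same adjusted values. Each sorted p-value p₍ᵢ₎ is scaled by m/i, and a cumulative minimum from the largest p-value downward keeps the adjusted values monotone.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure controlling the FDR).

    adjusted p_(i) = min over j >= i of (m / j) * p_(j), capped at 1,
    where p_(1) <= ... <= p_(m) are the sorted raw p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    # Walk from the largest p-value down so the cumulative minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for four candidate marker genes in one cluster.
    raw = [0.01, 0.04, 0.03, 0.20]
    adj = bh_adjust(raw)
    print(adj)
    print([i for i, q in enumerate(adj) if q < 0.05])  # genes kept at a 5% FDR
```

Here only the first gene survives at a 5% false discovery rate, even though three raw p-values are below 0.05; this is the kind of correction needed when thousands of genes are tested per cluster.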
Statistical Mediation Analysis on Components of Frailty with an Ordinal Outcome (Supervised by Prof. Arindam RoyChoudhury at Cornell University)
This project investigated the mechanism of frailty in older adults using the Rancho Bernardo Study, an observational cohort study. Evidence of a mediation relationship among components of frailty (strength, body composition, and body performance) was obtained with the traditional Baron and Kenny three-step method based on proportional odds models.
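The three steps can be written out for an ordinal outcome as below. This is a sketch under assumptions not stated in the description: X denotes the exposure, M a continuous mediator fitted by linear regression, and Y the ordinal outcome with J ordered levels; which frailty component plays which role is left generic here. The proportional odds model uses one slope per covariate shared across all cut points α_j.

```latex
% Assumed roles: X = exposure, M = continuous mediator, Y = ordinal outcome with levels 1..J
\begin{aligned}
\text{Step 1:}\quad & \operatorname{logit} P(Y \le j \mid X) = \alpha_j - c\,X,
  && j = 1, \dots, J-1,\\
\text{Step 2:}\quad & M = \beta_0 + a\,X + \varepsilon,\\
\text{Step 3:}\quad & \operatorname{logit} P(Y \le j \mid X, M) = \alpha'_j - c'\,X - b\,M,
  && j = 1, \dots, J-1.
\end{aligned}
```

Under the Baron and Kenny criteria, mediation is supported when c, a, and b are all significant and the direct effect shrinks, |c'| < |c|; full mediation corresponds to c' no longer being significant. One known subtlety with logit-type models is that adding M rescales the latent error variance, so c and c' are not strictly on the same scale and their comparison should be read with some caution.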