This chapter develops a novel approach to predicting the post-money valuation of startups across various regions and sectors, as well as their probabilities of success. Using startup funding data and descriptions from Crunchbase over a ten-year period, we develop two models linking information such as description, region, and venture capital funding to successful outcomes such as an acquisition or IPO. The first model uses latent Dirichlet allocation, a generative statistical model from natural language processing, to organize the startups in the dataset into clusters representing various sectors of a typical economy. A regressor based on XGBoost, an optimized distributed gradient boosting library, is then trained on the resulting feature set to predict post-money valuation, with Bayesian optimization used to tune the hyperparameters. Our model consistently achieves an accuracy of over 95% on hold-out test sets, even with some continuous features removed. The second model is a feed-forward neural network built with TensorFlow, whose final layer outputs probabilities of success. We find that post-money valuations across regions are typically log-normally distributed, and that startups in regions such as the San Francisco Bay Area typically see higher valuations across most sectors. We also find that startups operating in specific geographical regions and sectors of the economy (e.g., regions and sectors with a higher number of investors) typically have higher predicted probabilities of success. Our approach offers startups, policymakers, and venture funds an empirical way to benchmark and predict valuation and success, reducing some of the opacity in the modern startup economy.
Ang, Yu Qian, Andrew Chia, and Soroush Saghafian. "Using Machine Learning to Demystify Startups Funding, Post-Money Valuation, and Success." HKS Faculty Research Working Paper Series RWP20-028, August 2020.
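The abstract's finding that post-money valuations are typically log-normally distributed can be illustrated with a minimal sketch. It is not the authors' code and uses no Crunchbase data: the valuations are synthetic, and the parameters `TRUE_MU` and `TRUE_SIGMA` and the helper `lognormal_summary` are hypothetical names chosen for illustration. It relies only on the fact that if V is log-normal, then ln(V) is normal with mean mu and standard deviation sigma, so the median of V is exp(mu) and the mean is exp(mu + sigma^2 / 2).

```python
import math
import random
import statistics


def lognormal_summary(valuations):
    """Estimate log-normal parameters from positive values by working on ln(V)."""
    logs = [math.log(v) for v in valuations]
    mu = statistics.fmean(logs)       # sample mean of ln(V)
    sigma = statistics.stdev(logs)    # sample standard deviation of ln(V)
    return {
        "mu": mu,
        "sigma": sigma,
        "median": math.exp(mu),                 # implied median of V
        "mean": math.exp(mu + sigma ** 2 / 2),  # implied mean of V
    }


# Synthetic data only (assumption): a median valuation of 20 million
# and a log-scale standard deviation of 1.0, chosen arbitrarily.
TRUE_MU = math.log(20e6)
TRUE_SIGMA = 1.0

rng = random.Random(0)  # fixed seed so the sketch is reproducible
valuations = [rng.lognormvariate(TRUE_MU, TRUE_SIGMA) for _ in range(20000)]

summary = lognormal_summary(valuations)
print(f"estimated mu:    {summary['mu']:.3f}")
print(f"estimated sigma: {summary['sigma']:.3f}")
print(f"implied median:  {summary['median']:,.0f}")
print(f"implied mean:    {summary['mean']:,.0f}")
```

Because a log-normal distribution is right-skewed, the implied mean always exceeds the implied median whenever sigma is positive. This skew is one plausible reason to model valuations on a log scale, although the abstract does not state which target transformation the authors used.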