This chapter develops a novel approach to predicting the post-money valuation of startups across regions and sectors, as well as their probabilities of success. Using startup funding data and descriptions from Crunchbase over a 10-year period, we develop two models that link information such as description, region, and venture capital funding to successful outcomes such as an acquisition or IPO. The first model uses latent Dirichlet allocation, a generative statistical model from natural language processing, to organize the startups in the dataset into clusters representing sectors of the economy. A distributed gradient boosting regressor (XGBoost), with hyperparameters tuned through Bayesian optimization, then uses the resulting feature set to predict post-money valuation. This model consistently achieves an accuracy of over 95% on hold-out test sets, even with some continuous features removed. The second model is a feed-forward neural network built with TensorFlow, whose final layer outputs predicted probabilities of success. We find that post-money valuations across regions are typically log-normally distributed, and that startups in regions such as the San Francisco Bay Area typically see higher valuations across most sectors. We also find that startups operating in specific geographical regions and sectors of the economy (e.g., regions and sectors with a higher number of investors) typically have higher predicted probabilities of success. Our approach offers startups, policymakers, and venture funds an empirical way to benchmark and predict valuation and success, clearing some of the opacity in the modern startup economy.
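To make the log-normal finding concrete, the following is a minimal sketch of how one might fit a log-normal distribution to a set of valuations and read off its median and mean. It is not the chapter's code: the function names `fit_lognormal` and `lognormal_summary` are hypothetical, and the valuation figures are invented purely for illustration. It assumes only that a quantity is log-normal when its natural logarithm is normally distributed.

```python
import math
import statistics


def fit_lognormal(valuations):
    """Estimate (mu, sigma) of ln(valuation) from strictly positive values.

    Hypothetical helper: fits a normal distribution to the logged data,
    which is the standard way to parameterize a log-normal distribution.
    """
    logs = [math.log(v) for v in valuations]  # raises ValueError if v <= 0
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)  # sample standard deviation of the logs
    return mu, sigma


def lognormal_summary(mu, sigma):
    """Return the (median, mean) implied by a log-normal with parameters mu, sigma."""
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    return median, mean


# Invented post-money valuations in USD millions (NOT taken from the chapter's data).
valuations = [5.0, 10.0, 20.0, 40.0, 80.0]

mu, sigma = fit_lognormal(valuations)
median, mean = lognormal_summary(mu, sigma)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, median={median:.1f}, mean={mean:.1f}")
```

Under a log-normal distribution the implied mean exceeds the median, because a small number of very large valuations pull the average upward. This is one reason a benchmark based on the median (or on log-valuations) can be more representative of a typical startup than a simple average.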
Ang, Yu Qian, Andrew Chia, and Soroush Saghafian. "Using Machine Learning to Demystify Startups’ Funding, Post-Money Valuation, and Success." Innovative Technology at the Interface of Finance and Operations. Ed. Volodymyr Babich, John R. Birge, and Hilary Gilles. Springer, Cham, 2022, 271-296.