Daniel's Blog

Why (not) Big Data?

Catch-22 cycle of getting data and scaling

I was visiting a senior alumna one day who runs a tourism agency. She had been digitalising her business to hedge against Covid-19 risk and improve productivity, and she had achieved moderate success. However, she hit a gap that was nearly impossible for her to cross. She had been approached by a large e-commerce company venturing into selling tourist packages, and the conversation left a strong impression on her about the importance of data. The deal was enticing because the e-commerce company had a lot of data to back their claims, and they could direct customers with the right inclinations to her store using the usage data they had collected. She, by contrast, was limited by the amount of usage data available for her to view and aggregate. As a small and medium enterprise (SME) owner with limited cash, she needed more data to scale her operations without making high-risk decisions. The conundrum: she doesn't have access to a lot of data, but current data analysis tools are built for "Big Data".

So she's now in a catch-22 situation. To scale up without wasting money on the wrong decisions, she needs a lot of data. To get more data, she has to scale up.


I imagine a lot of SMEs end up in this situation and simply stop there. Some get VC funding and burn through money on risky decisions to gather more data, either escaping the cycle or crashing, but most SMEs are not hyper-growth companies and do not get as much love from investors, so they just stagnate.

Small and medium enterprises make up 90% of companies and 50% of overall employment worldwide. [1] Yet these companies are much less productive than large ones: merely halving the productivity gap between SMEs and large enterprises would amount to 15 trillion dollars in value added. [2]

In my opinion, the key question we have to answer here is "Why Big Data?". Hiring a data analytics team is no small investment, and these teams often cannot perform without enough data. Amid recent advances in deep learning and neural networks, we seem to have forgotten that heuristics exist. A heuristic is a mental shortcut humans use to arrive at decisions, and it has worked well enough for humans to apply it in politics, governance and war, which attests, to some extent, to how well it works.

In fact, there is a name for the phenomenon where a heuristic working with less data yields better results: the "less-is-more" effect, observed for example with the "take-the-best" heuristic [3]. This runs contrary to how current deep learning models work, where, as Peter Norvig noted, more data is always better than less, an approach that led to breakthrough Large Language Models.
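To make the contrast concrete, here is a minimal sketch of the take-the-best heuristic: when choosing between two options, check cues in descending order of validity and decide on the first cue that discriminates, ignoring all the rest. The cue names and values below are invented toy data, not from [3].

```python
# Sketch of the "take-the-best" heuristic. Cues are ordered from most to
# least valid; the first cue on which the options differ decides the choice.

def take_the_best(option_a, option_b, cues):
    """Return the option favoured by the first discriminating cue.

    Each option is a dict mapping cue name -> 0/1 (cue absent/present);
    `cues` lists cue names from most to least valid.
    """
    for cue in cues:
        a, b = option_a.get(cue, 0), option_b.get(cue, 0)
        if a != b:                      # this cue discriminates, so stop here
            return option_a if a > b else option_b
    return option_a                     # no cue discriminates: fall back to a guess

# Toy example: which of two cities is larger?
cues = ["has_airport", "has_university", "has_pro_team"]
city_x = {"has_airport": 1, "has_university": 1, "has_pro_team": 0}
city_y = {"has_airport": 1, "has_university": 0, "has_pro_team": 1}

winner = take_the_best(city_x, city_y, cues)  # decided by "has_university"
```

Note how little data the decision uses: one discriminating cue, with everything after it ignored, which is exactly where the "less-is-more" effect can arise.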

To bridge the gap left behind by Big Data, we need to rethink how data analytics and models can take advantage of, or discover, heuristics within smaller amounts of data, or Small Data. Interestingly, some companies have begun this transition and angle of attack, like DuckDB Labs (https://duckdblabs.com/news/2022/11/15/motherduck-partnership.html), though without explicitly framing it this way.

In my opinion, success here will trigger the next AI wave: mass adoption of AI by smaller businesses and increased productivity across the world, leading us to the next industrial revolution.

[1] https://www.worldbank.org/en/topic/smefinance

[2] https://www.mckinsey.com/industries/public-and-social-sector/our-insights/unlocking-growth-in-small-and-medium-size-enterprises

[3] Graefe, Andreas; Armstrong, J. Scott (2012). "Predicting elections from the most important issue: A test of the take‐the‐best heuristic". Journal of Behavioral Decision Making. 25 (1): 41–48. doi:10.1002/bdm.710.
