Marketplaces - Data Science - Machine Learning

by Ramesh Johari

3/4/2024 · 9 min read

The screenshot above is from Lenny's channel. You can watch the full podcast here.

Here I collected the key highlights that I found useful.

How does a marketplace business start?

A marketplace business never starts as a marketplace business, because what we think of as a marketplace business is something which, at scale, is removing the friction of the two sides finding each other. But when you start, you don't have that scale. So when you start, you had better be thinking, "What's my value proposition in a world in which I don't have that scaled liquidity on both sides?"


What is a marketplace business, and why is data so important and such an integral part of building a successful one?

A: Uber and Airbnb are selling you the taking away of something, which is a weird thing to think about. What they're taking away is the friction of finding a place to stay. They're taking away the friction of finding a driver.

In economics, we call those things transaction costs.

When I want to stay somewhere when I'm traveling, a friction is: who's willing to give me their room? I mean, in principle, there are people who are willing to let me stay in their living room, but I don't know who they are.

So those are frictions, and what the marketplaces are selling you is taking the friction away. That's what you're paying them for.

That's fundamentally your value proposition.

How does a marketplace work?

And these frictions that are getting taken away, they're getting taken away because of data and data science. So I really want to highlight three pieces of this for people, which I want you to think of as a cycle. But to start with, let's just lay them out one at a time.

1- Finding potential matches: One of them is finding people to match with. So that's the problem of, "I want to stay somewhere. Who is out there, who's willing to let me stay with them on a given timeframe?" And then if I'm a host, I have a listing. Who is out there, who's willing to stay at my place when I have it available? So that's finding matches.

2- Making these matches: Then there's making the match.

3- Learning from these matches: And then finally, we learn from the matches we've made.

And that's all information that the marketplace should feed back in. So this is where we get to rating systems and feedback systems, and even passive data collection. (Did you leave your booking before you were supposed to leave? Well, maybe that's a sign that something didn't quite work out the way you wanted it to. So that's passive data collection. Did you leave five stars? That's active data collection.)

...finding potential matches, making matches, and then learning about those matches, and then cycling back again, that is the data science in marketplaces.

What's the monetization strategy we want to use? How do we address the issue that longer-term relationships may disintermediate the platform?

My favorite example is we had some stuff delivered from IKEA by a Thumbtack worker once, and my wife is like, "Oh, thanks a lot. You're so reliable." He's like, "Hey, great. Here's my business card. Ever need me again? Just call the number on the back." And that was it. Thumbtack got their one lead gen, and then we didn't need the platform anymore.

Ratings and Reviews

Q: Say a marketplace founder is trying to decide and design how they do ratings, and reviews, and things like that. What's a couple pieces of advice you'd give them for how to do this correctly? And is there a model marketplace you'd point them to like, "These guys really do it really well"? And I know it's super specific based on the marketplace, but is there one just like, "They really nailed it"?

1- Something like: rather than the star ratings just running from poor to excellent, make the top rating actually mean "exceeded expectations." You could go one step further and ask, "How did this compare to an experience you had in the past that you rated really highly?" And Airbnb had something like this in place, where they would actually ask you to compare, or ask you questions about expectations.

I find that that's really valuable because it's easier for people to say, "That was good but didn't exceed my expectations. That was good, but definitely not better than this amazing stay I had two months ago," than it is to say, "Well, I'm going to ding this person and give them four stars." So that's one issue.

2- And I think another thing I want to point out for any marketplace founder is that something you want to be really careful about is the concept of averaging and what the implications of averaging are. And that's because a default for many marketplaces is to just average the ratings that people get. It feels very natural, right? Lenny's got five ratings, let me average them.

And that actually has some pretty important distributional consequences for the marketplace. Distributional in the sense of who wins, who loses. And that's because if you're averaging and you're really established on a platform, think of a restaurant on Yelp with 10,000 reviews, it's irrelevant what the next review is. It doesn't matter. Nothing's moving it at that point.

If you're new and you break into that market, and your first review is negative, you might be completely screwed. In fact, there's some early work on eBay that showed that if your first rating is negative, that could cause an immediate 8% hit on your expected revenue, to say nothing of long-term consequences. Subsequent work has found that that's a significant indicator of potential exit from the platform, just because now it's very hard to find work. And some platforms do things like not showing your ratings until you've accumulated a few.
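To make the averaging point concrete, here's a tiny Python sketch (the numbers are made up, not from the podcast) showing how a single negative review barely moves an established listing's average but defines a newcomer's:

```python
# Hypothetical numbers illustrating the distributional consequences of plain averaging.

def average_after_review(ratings, new_rating):
    """Average rating after one more review arrives."""
    ratings = list(ratings) + [new_rating]
    return sum(ratings) / len(ratings)

established = [5] * 10_000   # e.g., a restaurant with 10,000 five-star reviews
newcomer = []                # a brand-new seller with no reviews yet

print(average_after_review(established, 1))  # ~4.9996 -- essentially unchanged
print(average_after_review(newcomer, 1))     # 1.0 -- the newcomer now *is* its one bad review
```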

I feel like it's incredible that more of us don't spend our time thinking about what we're learning from the matches, and what these rating systems are telling us, and what the impact of that is on who wins and who loses in these markets.

Double-blind reviews: It turned out the biggest impact was that the review rate went up, because people get this email: "Ramesh left you a review. If you want to see it, you should leave a review." And that really increased the review rate, which gave us more data. And it was a really fun experiment to work on.

Steve Tadelis, who's a professor at Berkeley, had a really nice paper with some folks at eBay about what they called effective percent positive, where rather than normalizing just by the ratings, they normalized by including ratings that weren't left. And what they found was that this was much more predictive of a seller's downstream performance. So there's a lot of information in that lack of a response.
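The exact formula isn't spelled out in the conversation, but the idea can be roughly sketched like this: put transactions where no rating was left into the denominator, instead of normalizing only by the ratings that were left. The counts below are invented for illustration:

```python
# Rough sketch of "effective percent positive": include silent (unrated) transactions
# in the denominator rather than normalizing only by the ratings that were left.

def percent_positive(positive, negative):
    rated = positive + negative
    return positive / rated if rated else None

def effective_percent_positive(positive, negative, unrated):
    total = positive + negative + unrated
    return positive / total if total else None

# Two sellers with identical rated feedback, but one leaves many buyers silent.
print(percent_positive(90, 10))                        # 0.90 for both sellers
print(effective_percent_positive(90, 10, unrated=0))   # 0.90
print(effective_percent_positive(90, 10, unrated=400)) # 0.18 -- the silence carries information
```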

How does machine learning work? How is A/B testing useful for my business?

And one of the first things I was asked to think about is, well, okay, someone comes to oDesk and posts a job, and workers apply to that job. Predict which of these workers is most likely to be hired on that job. That was the narrow question. And so why is that a good question? Because we have a whole awesome set of tools now to solve exactly that kind of problem. How do we do it? Take a lot of past data on past jobs, past applicants, and past hires that were made. Then we ask these crazy big black-box algorithms, "All right, do the best job you can predicting who's going to get hired on this job with these applicants." And we use that data to test how well these algorithms are doing. That's machine learning in 30 seconds, basically. So we're working on this problem. Great.

Well, if I could predict who's most likely to be hired, then I should just rank people based on that, and that would be a good matching algorithm. That'd be a good way to sort and triage applicants for employers when they're screening, trying to figure out who to interview and who to hire. Great. Sounds pretty natural.
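To make the "machine learning in 30 seconds" loop concrete, here's a rough Python sketch of the train-then-rank idea. This is not oDesk's actual system; the file names, feature columns, and model choice are all hypothetical:

```python
# Sketch: fit a model on past applications, then rank a new job's applicants
# by predicted hire probability. Data files and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

past = pd.read_csv("past_applications.csv")   # one row per (job, applicant) pair
features = ["proposed_rate", "past_jobs_done", "avg_rating", "skill_match_score"]
X, y = past[features], past["was_hired"]

# Hold out some past data to test how well the black box predicts past hires.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC on held-out past hires:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# The "natural" matching algorithm: sort a new job's applicants by predicted hire probability.
applicants = pd.read_csv("new_job_applicants.csv")
applicants["p_hire"] = model.predict_proba(applicants[features])[:, 1]
print(applicants.sort_values("p_hire", ascending=False).head(10))
```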

If you think about it a little bit, you realize what that algorithm is doing, it's really just picking up on patterns in past data. So yeah, that's great. This person is likely to be hired. But what we really want is something different. We're trying to add value by ranking people.

"Correlation is not causation."

Well, when we teach people to build machine learning models, we're asking them to make predictions, we're asking them to find correlations. Prediction is inherently about correlation. But when we ask people to make decisions, we're asking them to think about causation. "If I make this decision, will I actually increase the net value of my business? Will I, by sending the promotion, have increased the likelihood that this person is going to spend more on my platform?"

The first and most important thing that I feel very strongly about, in terms of what I would get a data scientist to do, is this: no matter who they are, even if it's that person in the weeds thinking about building this prediction model for hiring, get them to always be thinking in the back of their mind that their goal is to help the business make decisions. And the distinction between causation and correlation matters a lot.

So the takeaway here, as a data team and as a data scientist on the team, is: help the business make decisions.

And when I think about the distinction between two different ranking algorithms, I don't want to be only comparing them in terms of how well they recreate the choices people made in the past. The way I'm really going to evaluate those is in my market, does one of those lead to better matches or more matches than the other one, right?
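As an illustration of judging two rankers by market outcomes rather than by how well they recreate past choices, here's a small frequentist sketch: expose jobs to ranker A or ranker B and compare match rates. The counts are invented for illustration:

```python
# Sketch: compare two ranking algorithms by the match rate they produce in an
# A/B test, not by offline predictive accuracy. Counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

matches = [420, 465]   # jobs that ended in a hire under ranker A vs. ranker B
jobs = [5000, 5000]    # jobs exposed to each ranker

z_stat, p_value = proportions_ztest(count=matches, nobs=jobs)
print(f"match rate A = {matches[0]/jobs[0]:.1%}, "
      f"match rate B = {matches[1]/jobs[1]:.1%}, p = {p_value:.3f}")
```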

About experiments: finding a balance between running experiments and creating the opportunity for big new unlocks

If you just run a bunch of experiments, you're kind of going to micro-optimize your way to local maxima, and you may miss big opportunities and big unlocks if you're extremely experiment-driven.

So how do you either be less worried about optimizing and missing something big, or find a balance between running experiments and also creating the opportunity to find a huge new unlock?

So the big lesson comes from this Microsoft paper, it's called "A/B Testing with Fat Tails," which in lay terms just means you're running a business where there are potentially big opportunities out there if you look at the effects of the experiments that you run. There are a couple of lessons there: try a lot more stuff and don't be so risk averse, and don't necessarily run everything for so long. So really get velocity up.

Terminology of Win and Lose in Experiments

To fail big actually requires changing the terminology of wins. This is one of the things I hate most in A/B testing, I have to say. I get where it comes from, but historically, experimentation in science was never about winners and losers. It'd be weird if Ronald Fisher, who's kind of the father of experimentation with his agriculture experiments, talked about winners. I don't think that's necessarily how he talked about things. Experimentation is always very hypothesis-driven.

And that's really an important distinction, because what it means is that if I go with something big and risky and it "fails," meaning it doesn't win, then nevertheless, if I was being rigorous about what hypotheses it was testing about my business, I'm potentially learning a lot.

So learning is a win.

If you cannot run the A/B test long enough, then 'beliefs' come to the table!

And in the end, if you can't do that, you can't run it long enough, or you can't do that data analysis due to sparsity of data or lack of data to address the question, it matters what you bring to the table. What are your beliefs about that?

The statistical methods typically used, P-values and confidence intervals, fall into a branch of statistics known as frequentist statistics. And the idea behind frequentist statistics, without being overly technical, is just: I let the data speak for itself. There are no beliefs brought to the table about where that data came from.

But if you think about this in a company, in A/B testing at a company, it's a weird thing, right? Because I might've run 1,000 A/B tests in the past on this exact same button, or call to action, or color, and now I am going to completely ignore that and focus only on this one.

So there are ways to take the past into account, to build what's called a prior belief before I run an experiment, and then take the data from the experiment, connect it with the prior, and come up with a conclusion: "Okay, in light of the past plus this experiment, what is it telling me about the future?" And that falls broadly under the category of what's called Bayesian A/B testing.
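Here's a minimal sketch of that Bayesian idea: encode what past experiments suggested as a Beta prior on the conversion rate, update it with the new experiment's data, and read off the probability that the variant beats the control. The prior parameters and counts are made-up numbers:

```python
# Sketch of Bayesian A/B testing with a Beta prior built from past experiments.
import numpy as np

rng = np.random.default_rng(0)

# Prior: suppose past tests on this button put conversion around 5% -> Beta(5, 95).
alpha0, beta0 = 5, 95

# New experiment's data: conversions and visitors per arm.
conv_a, n_a = 260, 5000   # control
conv_b, n_b = 300, 5000   # variant

# Beta prior + binomial data gives a Beta posterior for each arm's conversion rate.
post_a = rng.beta(alpha0 + conv_a, beta0 + n_a - conv_a, size=100_000)
post_b = rng.beta(alpha0 + conv_b, beta0 + n_b - conv_b, size=100_000)

print("P(variant beats control) =", (post_b > post_a).mean())
```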

Additional Resources:

• A/B Testing with Fat Tails: https://www.journals.uchicago.edu/doi/abs/10.1086/710607

• The ultimate guide to A/B testing | Ronny Kohavi (Airbnb, Microsoft, Amazon): https://www.lennyspodcast.com/the-ultimate-guide-to-ab-testing-ronny-kohavi-airbnb-microsoft-amazon/

• Bayesian A/B Testing: A More Calculated Approach to an A/B Test: https://blog.hubspot.com/marketing/bayesian-ab-testing

• Designing Informative Rating Systems: Evidence from an Online Labor Market: https://arxiv.org/abs/1810.13028

• Reputation and Feedback Systems in Online Platform Markets: https://faculty.haas.berkeley.edu/stadelis/Annual_Review_Tadelis.pdf

• How to Lie with Statistics: https://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728

• David Freedman’s books on Amazon: https://www.amazon.com/stores/David-Freedman/author/B001IGLSGA

• Four Thousand Weeks: Time Management for Mortals: https://www.amazon.com/Four-Thousand-Weeks-Management-Mortals/dp/0374159122