Episode 8

How to Leverage Data For Exponential Growth – Tarush Aggarwal

Tarush Aggarwal


Description

In this episode we are joined by an industry veteran who has worked for some of the biggest names in the enterprise data world. Tarush Aggarwal shares his journey from his early days at Salesforce.com and then WeWork, right through to the present day.

He reveals how to set up data science and engineering for success in both small and large organisations.

Show Notes

Resources

Episode Summary

How Salesforce leveraged data to grow their company fast
How Marc Benioff ensured his vision was executed effectively at Salesforce.com
What it was like to join WeWork at the start of their data function
The differences between how WeWork and Salesforce.com leveraged data
How to structure a Data function – centralised V decentralised V hybrid model
How Spotify structured their data team to scale the business
Can a Fortune 500 business make the hybrid model work?
The fundamentals for a new startup – how to build a data function right
Product company versus service delivery company – how does that affect the data function structure?
What’s next for data privacy?
The 5x Company – entry-level training program, what it is and who it’s for
Data Mastermind groups – are they the way forward?

Tarush Aggarwal LinkedIn Profile

https://5xdata.co

Transcript

Speaker 1 (00:00:04):

This is the Data Science Conversations podcast with Damien Deighan and Dr. Phillip Diesinger. We feature cutting-edge data science and AI research from the world’s leading academic minds and industry practitioners, so you can expand your knowledge and grow your career. This podcast is sponsored by Data Science Talent, the data science recruitment experts. Welcome to the Data Science Conversations podcast; my name is Damien Deighan. This week we are changing things up a little bit. We’ll be talking about the critical things that data scientists need to get right to ensure that the data science function is successful.

Speaker 1 (00:00:51):

Phillip won’t be joining us today, but don’t worry if you’re a diehard Phillip fan: he will be returning in our next episode. As part of a wider plan to have a little collective of co-hosts on this podcast, with me today is someone new, and my fellow Irishman, Peter Roche. Peter sits firmly at the intersection of data science and engineering. He did his PhD in applied maths and theoretical physics at Cambridge, and he also spent a year as a researcher at CERN. He’s been working in industry since 2004, starting with eight years as a quant at Royal Bank of Scotland. Since leaving financial services in 2013, he has been working as a senior freelance data engineer/scientist contractor for the likes of Mars and, in the pharma sector, Boehringer Ingelheim. Peter, really good to have you on board. And in case you’re worried that our real plan is to flood the podcast with Irish people, fear not: our very, very special guest today is definitely not Irish. We’re very excited to talk to him and delighted to have with us Mr. Tarush Aggarwal. Tarush, welcome to the podcast.

Speaker 2 (00:02:09):

Thank you for having me, excited to be here.

Speaker 1 (00:02:11):

So, by way of introduction, Tarush is undoubtedly one of the world’s leading experts in helping organisations leverage data for exponential growth. Tarush graduated with a degree in computer engineering from Carnegie Mellon in 2011 and then became the first data engineer on the analytics team at Salesforce.com. At that time, data was still in its infancy, and the log metric framework Tarush built was critical in allowing Salesforce to analyze data across customers and provide benchmarks across different industries and verticals. More recently, Tarush led the data function at WeWork, one of the fastest-growing companies in the world. WeWork managed to leverage its data to grow 10x in just three years, supporting a footprint of 800+ offices in over 120 cities in 23+ countries, and scaled to over 12,000 employees in that time. Tarush built the data function from two people to over a hundred, and their unique approach allowed them to stay lean while supporting every functional area of the business. In 2020 Tarush left to found The 5x Company, which supports entrepreneurs in scaling their businesses. The 5x Company’s programs are built on a first-principles approach, helping rapidly growing companies leverage data, automation and self-service to out-execute their competition. Tarush, welcome, and thanks for joining us.

Speaker 2 (00:03:46):

Thank you. Looking forward to being here.

Speaker 1 (00:03:49):

Let’s start then perhaps at Salesforce. So tell us how and why you ended up going to Salesforce, Tarush.

Speaker 2 (00:03:56):

It seems like a lifetime ago. I come from a family where engineering runs in our DNA (my mom had an e-commerce business), so from pretty early on I was always a little more technically oriented, and software engineering was something I was super passionate about. After college I got lucky: back in 2010/2011 it still wasn’t easy to get a job, and I ended up in Silicon Valley at Salesforce as a software engineer. What’s interesting is that I very quickly realised software engineering isn’t really for me. A lot of software engineers are optimizing small parts of the stack; it’s really important that Google gets 0.1 seconds faster, but that’s just something I personally wasn’t interested in. I happened to be in the right place at the right time. As you mentioned, Salesforce didn’t really have a data team back in 2011. Data was very much in its infancy, so I got to work on some of the early frameworks, which allowed Salesforce to analyze data from its logs and measure benchmarks across customers.

Speaker 1 (00:05:15):

What were your biggest learnings from working there?

Speaker 2 (00:05:18):

It was a really interesting time to work at a company like Salesforce, which was very much on a forward trajectory. What Salesforce is known for today is really inventing the concept of software as a service. Back in 2010 it was on its way to proving it, but it still wasn’t as well recognised as it is today. There was still some doubt in people’s minds: is this really going to become a way of selling software, or are customers going to stick to the more traditional models? At that time, Oracle and Microsoft were considered more traditional companies, and people were still unsure whether these companies were going to come and quickly swallow Salesforce’s market. So it was a very exciting time to be in software. Salesforce was still very hungry. I remember when I joined, they had an engineering team of about 1,500, and a few years later they were probably at about 3,000–4,000 on the engineering side alone. So it was still growing extremely quickly. I got a front-row seat to what’s needed at a larger organisation, and to how larger organisations think about the build-versus-buy decision or feature development. That was really interesting.

Speaker 3 (00:06:40):

Did they have a clear plan from the top down? You said that software as a service was in its infancy, but was there a clear plan at Salesforce, like, this is the way we’re going, even though it goes against the trend and isn’t the same approach that Microsoft and these other big companies had? I’m just wondering if that was articulated from the top down.

Speaker 2 (00:06:58):

At that time Salesforce had a process called V2MOM, which reminds me a lot of the OKR process that has become really popular at tech companies. Salesforce would set yearly as well as quarterly goals, spend a painstaking amount of time finalising them at every functional level of the business, and then break them down at the team level. This is something they did a really, really good job of. It was certainly a massive investment, not just in time but in people, including bringing on experts to teach the company how to set these metrics and measure them. One of the reasons they were probably so well aligned, even at scale, was the amount of time they spent on this alignment.

Speaker 1 (00:07:51):

What other key things, Tarush, do you think big companies could learn from what they did at Salesforce?

Speaker 2 (00:07:59):

I think it goes back to the founder, and just how important Marc Benioff was in setting the vision for Salesforce and then having faith that his teams were going to execute on that vision. Marc made some massive acquisitions at that point. I remember we acquired a company called ExactTarget for billions of dollars, and it wasn’t necessarily well received by the media or other industry pundits. It was a bold decision and, in hindsight, it’s obviously paid off extremely well. Salesforce was very forward-thinking. Even during the social media revolution, Salesforce launched a product called Chatter, which let people share what they were working on. It gave you visibility into what your co-workers were doing and which cases they were working on, and let you get updates on various metrics or tasks. Again, that was really forward-thinking for the industry. So Marc made really big bets. He wasn’t afraid to make them, even though Salesforce wasn’t really at the same tier as some of its bigger counterparts, and in hindsight those investments paid off really well.

Speaker 1 (00:09:26):

Moving on to WeWork, Tarush: obviously a completely different animal, much smaller. Talk to us about the differences between what you did there and what was happening at Salesforce.

Speaker 2 (00:09:39):

I think the WeWork journey starts off on a personal level. I was a completely different person by the time I got to WeWork. Salesforce was my first job out of college; by the time I got to WeWork, I was six or seven years into my career and had really spent time focused on the data space. So going into WeWork, I understood the space a whole lot more and was much more aware of some of the subtleties of data and of working with other stakeholders. That was very different from Salesforce, where I happened to stumble across a project that later became a much bigger thing. That’s one of the differences. The second was that WeWork was much, much smaller at the time than Salesforce was when I worked there. When I joined WeWork there were 1,000 employees in the whole company and the tech team was about 100 people, which is 10 times smaller than where Salesforce was a couple of years earlier. In many ways, I think the difference came back down to the CEO and their style of execution. Adam Neumann, like Marc Benioff, was also a very, very big thinker. Adam really embodied the We culture: this concept of treating others the way you want to be treated, of making a life, not just a living, and the idea that the workspace is one of the few industries left that hasn’t been revolutionised by technology. If today you want music, you no longer go and buy a CD; you pay Spotify. If you want a cab, you no longer flag one down; you book an Uber. So why is it that if you want an office, you still have to sign a five-year lease and manage it yourself, and if tomorrow you hire more people or have to let some people go, it’s not very flexible? The idea that technology was going to eat up that space, and that this is how people were going to live in the future, is a super grand vision.
Salesforce had a massive vision, but they stayed focused and stuck to a single lane: a broad space of software that allows businesses to execute faster, to sell more and to service their customers. WeWork gained a lot of traction very quickly, and we started getting into just about everything. Towards the end, we had WeWork, which was obviously the co-working flagship product, but we also had WeLive, we had gyms, we had coffee shops, we had our own school; we just started going super, super broad. Ultimately, that lack of clear focus was one of WeWork’s downfalls. The same level of top-down OKR and goal setting that really kept Salesforce in check was lacking at WeWork, and that, in my opinion, was ultimately the downfall.

Speaker 3 (00:13:03):

From a data perspective, was it very different? I imagine at Salesforce there was a lot more customer data, whereas at WeWork it was maybe more in-house data. I was just wondering what the differences were from a data analysis point of view: what things were being analyzed, and the scale of the data. Can you give us a little handle on that?

Speaker 2 (00:13:28):

Salesforce had a very different architecture: Salesforce is a platform where customers come to store their data, whereas WeWork is in some ways more like a service provider. We did have some customer data, but that was mainly around access. We also had a social network that allowed members to interact with each other, post for help when they needed it and resolve tickets. But again, it wasn’t anywhere close to the order of magnitude of Salesforce. At one point at Salesforce we were processing data in the petabytes a month, whereas for a long time I could probably have fit the WeWork data warehouse on my iPhone. WeWork had on the order of a million members at its highest point: approximately 1,000 members per office across roughly 800 offices. Salesforce, through the customers it served, was probably serving millions, if not billions, of end customers. So there was a massive difference in the volumes of data processed. Also, at Salesforce a lot of the analytics was about how we allow customers to gain more insights from their data, so we were really focused on building tools that let customers do more with their data. At WeWork, we were focused on helping WeWork open offices faster or build other products our customers would love. That difference in context made working at those two companies extremely different.

Speaker 3 (00:15:22):

Purely technically, what platforms did they use, or what were the main tools? People listening to the podcast will be familiar with Spark and things like that, but I’m wondering what tools were used and, leading on from that, how much customisation or in-house development of new tools there was to deal with bespoke problems within each company.

Speaker 2 (00:15:45):

So, when we were doing things at Salesforce and wanted to analyze log data, we didn’t really have frameworks like Kafka or any of the stream processing frameworks that exist today. We were shipping our own log files from app servers into central data storage, and we actually used Pig to process the logs and extract metrics from them. At that point, Salesforce was really starting to invest in a lot of these capabilities in-house. You’ve seen that with the Facebooks, Googles and Netflixes of the world, where the infrastructure requirements become such that they start to build their own in-house tools. During my time at Salesforce, we were just entering that phase of doing a lot more in-house. I remember we looked at Splunk to see whether it could help us with some of this log data, and it just wasn’t capable of dealing with data at Salesforce scale. Fast forward a few years: by the time I was at WeWork, the data architecture was starting to get more standardised. Initially we were on Redshift, which served us up to a point, until it became too slow and wasn’t performant enough. We migrated over to Snowflake, and we also used Looker. We were probably early adopters of some of the data technologies that today are a lot more standardised. If you think about what the data stack is today, names like Fivetran for ingestion, Snowflake for the warehouse, Looker for the BI tool and dbt for data modeling and transformations are the usual suspects, and that’s really the stack we were using at WeWork.
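Tarush describes shipping raw log files to central storage and using Pig to extract per-customer metrics and cross-customer benchmarks. As a rough illustration of that batch pattern (not Salesforce’s actual framework), the Python sketch below assumes a hypothetical space-separated log format of timestamp, customer ID, vertical, endpoint and latency. It aggregates metrics per customer, then computes a per-vertical benchmark as the median of the customers’ mean latencies; the function names and sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log format (an assumption for illustration):
# "<timestamp> <customer_id> <vertical> <endpoint> <latency_ms>"
SAMPLE_LOG = """\
2011-06-01T10:00:00 acme retail /api/query 120
2011-06-01T10:00:01 acme retail /api/query 80
2011-06-01T10:00:02 globex finance /api/report 300
2011-06-01T10:00:03 initech finance /api/query 100
2011-06-01T10:00:04 globex finance /api/query 200
"""


def parse_line(line):
    """Split one log line into a record; return None if it is malformed."""
    parts = line.split()
    if len(parts) != 5:
        return None
    _timestamp, customer, vertical, _endpoint, latency = parts
    try:
        latency_ms = int(latency)
    except ValueError:
        return None
    return {"customer": customer, "vertical": vertical, "latency_ms": latency_ms}


def customer_metrics(lines):
    """Aggregate request count and mean latency per customer."""
    totals = defaultdict(lambda: {"vertical": None, "requests": 0, "latency_sum": 0})
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing the whole batch
        t = totals[record["customer"]]
        t["vertical"] = record["vertical"]
        t["requests"] += 1
        t["latency_sum"] += record["latency_ms"]
    return {
        customer: {
            "vertical": t["vertical"],
            "requests": t["requests"],
            "mean_latency_ms": t["latency_sum"] / t["requests"],
        }
        for customer, t in totals.items()
    }


def vertical_benchmarks(metrics):
    """Benchmark per vertical: median of each customer's mean latency."""
    by_vertical = defaultdict(list)
    for m in metrics.values():
        by_vertical[m["vertical"]].append(m["mean_latency_ms"])
    return {vertical: median(values) for vertical, values in by_vertical.items()}
```

For the sample lines, `vertical_benchmarks(customer_metrics(SAMPLE_LOG.splitlines()))` returns `{'retail': 100.0, 'finance': 175.0}`. Taking the median across customers, rather than pooling every request, keeps one very busy customer from dominating its vertical’s benchmark.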

Speaker 1 (00:17:46):

So in terms of the high-level structure of the data functions at both of those companies, was it very similar, Tarush, or were there big differences in how the function was set up?

Speaker 2 (00:17:58):

At that time Salesforce structured its data teams in a way that was a little isolated from the business, which made a lot of sense, because there were no best practices around how to embed data teams in product teams. So from a risk perspective, it was probably a good decision at that point. Fast forward to WeWork: we initially had a centralised data team, which was the single data team across the company. Then, as we started to scale, we moved more towards Spotify’s mission and tribe model, where each mission had its own data team, but we also had a central data team responsible for the key metrics across the company. At that point data became both centralised and decentralised, so it was both horizontal and vertical.

Speaker 1 (00:18:58):

How well do you think that works, that kind of hybrid model?

Speaker 2 (00:19:02):

Every company goes through this dilemma. There’s a leadership joke I really like: all companies start off centralised, and at some point they go to this management guru and say, everything is taking much longer and the central team has become a bottleneck; what do we do? The guru just looks at them and says, decentralise. So they do that, and everything works amazingly for a while, until at some point they go back to the guru and say, now we’re moving quickly, but we don’t have a single source of truth for anything and there’s no consistency; what do we do now? At this point the guru says, start to centralise. So I think it’s natural to swing between both extremes, and what I’ve seen is that the extremes are where the problems start to arise. With a more hybrid approach, you figure out which of the more important metrics need to be centralised. For these metrics, if somebody needs to change them, it’s going to take longer and involve more process, just because they’re more widely used. At the same time, you give individual missions the autonomy to move much more quickly on datasets that won’t affect everyone. So it takes some of the advantages of both approaches. Having said that, it also comes with some of the disadvantages of both: when you have both a centralised and a decentralised function, it’s a little confusing for people to understand the process for different types of metrics, so it starts to build some debt in the organisation. But again, at scale, I think being hybrid is the only real approach you have.
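The hybrid rule described here (widely used company metrics change slowly and with more process, while missions move quickly on their own datasets) can be expressed as a simple change-approval policy. This is a minimal sketch under assumed rules: the `Metric` type, the `"central"` team name and the non-owner sign-off rule are hypothetical, not WeWork’s actual process.

```python
from __future__ import annotations

from dataclasses import dataclass

CENTRAL_TEAM = "central"  # hypothetical name for the central data team


@dataclass(frozen=True)
class Metric:
    name: str
    owner: str         # team that owns the definition (a mission, or CENTRAL_TEAM)
    centralised: bool  # True for company-wide key metrics


def required_approvers(metric: Metric, requesting_team: str) -> set[str]:
    """Return the teams that must sign off on changing a metric's definition."""
    if metric.centralised:
        # Key metrics are widely used, so changes go through the central team.
        return {CENTRAL_TEAM, requesting_team}
    if requesting_team != metric.owner:
        # Assumed rule: another mission's metric needs its owner's sign-off.
        return {metric.owner, requesting_team}
    # A mission changing its own dataset can move on its own.
    return {requesting_team}
```

Keeping the policy in one place also speaks to the confusion mentioned above: anyone can look up which process applies to a given metric instead of guessing.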

Speaker 3 (00:20:59):

Do you think there’s a sweet spot for that, in terms of the size of the business? A very small startup is going to be centralised by its very nature, and as the business grows you’re going to need to split that. I also wonder: if everyone was sitting in one room, in one office, it’d be much easier to manage, but I can imagine geographical constraints would be a big factor.

Speaker 2 (00:21:22):

I’ve seen that different industries fundamentally invest in data at different times. An e-commerce company, a marketing company or a traditional real estate company is going to want to invest in data at some point, but likely much later than a traditional SaaS company. So how data-driven you need to be depends on what you’re trying to do, and that really dictates the right time to centralise or decentralise, or to be somewhere in between. Starting off centralised makes a lot of sense, because you just don’t have the resources to give every team its own dedicated hires. You start by hiring specialists, and since you don’t have many of them, having these specialists sit centrally and focus on top-level company initiatives makes a lot of sense. By the time you start to scale this team, you have proven success with the business and a lot more top-level company objectives, to the point where one person can no longer focus on all of these initiatives. At that point the divide-and-conquer model becomes necessary, and it makes sense to start to decentralise. Central teams also, more often than not, create single points of failure, because as you centralise, individual people start to get tied to roles. How often do you find yourself in a company where all the transformation jobs are written by one person, and you get into that classic problem of what happens if this person is hit by a bus? I’m a firm believer that organisational structure is a much better predictor of people’s behavior than something like a performance review.
You can speak to this person as much as you want and try to get him or her to change their behavior, but if you move to a decentralised org structure, it’s much more conducive to forcing the organisation to adapt to that different paradigm.

Speaker 3 (00:23:56):

I’m just laughing at the idea of that one person who may get hit by a bus (an unfortunate term), but in every business I’ve worked in there’s been that person. I’m wondering what practical steps you can take, not to remove that person, but to remove that risk from a business perspective, because it is a real risk.

Speaker 2 (00:24:17):

We had one of those people at WeWork, and he was probably one of the smartest people I’ve met.

Speaker 1 (00:24:23):

They always are, yes.

Speaker 2 (00:24:24):

Exactly right. And I was fortunate that he left WeWork at one point and started his own company. In hindsight, that was the best thing that happened to us, because it forced us to change; it would have been very difficult to do that as long as he was there. What this brings to life for me is the concept that, very often, someone who has played that role might not be the right person for the company’s next phase. Very often you have people who are amazing at going from zero to one, and they’re just not the right people to go from one to two, when you need to bring on other people and enable them. Some people just want to run as fast as they can. So my advice is not to let them go, but to keep in mind that as companies evolve, the people you have very often don’t fit into the next part of the equation. That’s not a reflection on you; it’s on them to either adapt and evolve or find something else. If I had to do it again, I would be faster to decide when to move from centralised to decentralised, worry less about how people are going to fit into that model, and worry more about looking at the signs that it’s time to make that shift. And then go ahead and make that shift.

Speaker 1 (00:25:57):

Maybe just winding it back a little bit, you mentioned the Spotify model. I think most people perhaps are familiar with it, but maybe a lot of people won’t be. Could you just give us like the high level overview of how they structure things.

Speaker 2 (00:26:10):

It’s going to be really challenging, because Spotify has made a world-class video on this, so for those of you listening, I would highly recommend you go and watch the Spotify video. But at the highest level, in order to scale the business, Spotify splits the technology team into different chapters. At WeWork we called them missions, and each mission, or chapter, is responsible for a different area of the business. At WeWork we had one for member-facing products, one for billing systems and one for the core platform, so different areas of the technology charter. Inside these missions you had tribes, which were sub-areas of the mission or chapter, so a smaller piece of that pie. A tribe would then be split into the smallest unit, the squad, which is really a 5–10 person team that you can feed with two pizzas, following the two-pizza rule Amazon made famous. So a group of squads makes up a tribe, and a group of tribes makes up a chapter. This sets up a really good hierarchical structure for the organisation to scale. Each squad is made up not only of software engineers and product managers but also of data people and designers, so they are really self-sufficient, and a squad is capable of working on any part of the stack. When you scale this up into tribes and chapters, you have fully functional teams that can own a particular part of the company, with full autonomy over that area of the business, so they can make decisions independently. And since they can make decisions independently, this is a really scalable model for organisations to grow. Obviously, at scale, there are a few more concepts, but I think this is a good starting point for understanding how Spotify organises teams.

Speaker 1 (00:28:33):

Whilst they’re independent teams, they still have this attachment, if that’s the right phrase, to a small centralised team that coaches them and, I guess, keeps them on track. Is that fair?

Speaker 2 (00:28:45):

That’s the concept of guilds. If you have, let’s say, data engineers, it’s really meaningful for data engineers to spend time with other data engineers. So all of the data engineers come together in the form of a guild, and maybe a central mission or chapter that has some of the data people leads the guild, coming up with best practices for how every chapter should standardise. While individual people report into the squads, tribes and chapters, they have a dotted-line reporting into their functional area, the guild, where they share best practices and ensure that all of the chapters, tribes or squads are roughly moving in the same direction.

Speaker 1 (00:29:40):

Then I guess the million-dollar question is about big organisations in particular, which are perhaps very traditional in their structure and maybe mindset. Have you ever seen a Fortune 500 company or a very large company make that hybrid model work, or do you think it’s even possible?

Speaker 2 (00:30:00):

Theoretically, this model makes sense for any type of organisation. Where it becomes really tricky is when these companies acquire another company or, let’s say, start a different department. For instance, at Amazon, the Amazon Web Services team is pretty well organised, and if you pick two squads inside the Amazon Web Services group, they might be roughly similar. But when you start getting into Amazon logistics or a new department, that’s where it gets really tricky. So for Fortune 500 companies, I like the idea of standardising on a chapter- or mission-type charter, and having consistency inside that mission makes a lot of sense. What becomes really difficult is having consistency across missions, especially when you start dealing with acquisitions and new product lines; that becomes really, really hard at massive scale. I think that’s where we can be a little more forgiving about not having that standardisation, and that’s a way Fortune 50 or Fortune 500 companies can start to implement this type of approach, which will still give them a lot of the benefits of structuring for scale.

Speaker 3 (00:31:34):

Do you think it worked at WeWork and Spotify because they’re similar-sized businesses, so that was maybe a sweet spot for this approach?

Speaker 2 (00:31:42):

Yeah, I think there’s a big difference between the WeWork and Spotify scale and the Amazon and Google scale. Also, WeWork and Spotify had the opportunity to implement these systems early on, whereas for some of the larger players it’s much harder to change and move to a system like this than to do it from the start. And the third thing is, I think Google, Amazon, Microsoft and Facebook are doing it well, but you’re also talking about the four best tech companies. This is literally the gold standard of organisation and how to do it correctly. A lot of the Fortune 50 or 500 players are structured very differently from the Facebooks and Googles of the world.

Speaker 3 (00:32:28):

Just to go back to the small chapters: interestingly, you said that within each chapter there would be a suitable range of skills, maybe a data engineer and a data scientist. In my experience it’s been problematic where there hasn’t been a close relationship between those two functions, and that has caused real problems in businesses. The data scientists, maybe grouped together, have worked on something, developed their model and prototyped it, and then it’s: let’s hand it over to the data engineers to get it into production, into the systems, and used by the wider business. That intersection has been a real problem. So presumably this model is an attempt to fix that, or it fixes it by its very nature?

Speaker 2 (00:33:16):

I think the model where somebody builds something and hands it off to someone else is fundamentally flawed. We used to see this model when software engineers would write code and then hand it off to a QA engineer. As soon as they’d written the code, they’d wash their hands of it and be done for the day. This encouraged software engineers to write bad code, because they weren’t responsible for the end-to-end life cycle of that code. I think we’re starting to see the same thing today in the data world, where data scientists can write a model without having tested it on production data and with no idea how it’s going to be deployed. They just expect someone else to do that for them, and this creates a disjointed system. Data engineering, or really data platform, which is the right term for the team building the infrastructure for data scientists, is responsible for building tools that allow data scientists to deploy models at scale. But end-to-end control of deploying the models, and probably even parts of making sure they’re operating correctly at scale, is nobody’s responsibility but the data scientists’. So it’s about moving away from an architecture where this is my role and this is what I’m responsible for, towards end-to-end ownership, with the expectation that you’ll be given the right tools to make your job easier. If a data scientist needs to spend 50-60% of their time deploying their jobs to production or monitoring them, then the infrastructure or platform teams haven’t really done their job, because the platform isn’t easy to use. This is not an easy place to get to, because it means you have to invest in a centralised data platform team that enables all of your other chapters and missions.
And that, again, is a luxury you start to get when you scale up. So it might not be the right paradigm for someone just getting started with data science. But it’s one of the reasons why, in some of the bigger companies, you have the problems we’re seeing today between data engineers, data analysts and data scientists.

Speaker 3 (00:35:52):

I think so, and I think it’s probably larger businesses from a more traditional background that are trying to retrofit a solution to this problem. Whereas younger, smaller businesses that have architected things with this in mind from the start, and got the right people working together, presumably find it much easier to avoid the problem.

Speaker 2 (00:36:16):

Yeah. And going back to the smaller companies and how they solve for this: if you’re a smaller company, you’re probably starting off centralised anyway. And if you’re just getting started, chances are you don’t have a data platform engineer, a separate data engineer and a separate data scientist, right? So you need to be responsible for the end-to-end execution. What I find some of these really small companies do is hack together the infrastructure piece and spend their time on the data modeling side. Building models without some sort of fundamentals is like planning to build a skyscraper and then not digging up the earth for a foundation. So even if you are a really small company and want to get started quickly, you still need to invest some time and effort in building out an architecture and fundamentals that will help you deploy your models and run them, and make sure you have a feedback loop going so that you can keep iterating on your models.

Speaker 3 (00:37:27):

Presumably, then, for someone who’s starting a business, or an executive, having the technical knowledge to get that right is, I imagine, not very common.

Speaker 2 (00:37:40):

It is challenging, right? I’ve spent my whole career in data, and even today it’s sometimes hard to keep track of everything that’s out there. So if you’re an executive and this is not your area of expertise, it becomes really, really challenging to figure out what to do and what not to do. Over time, some of the best practices start to become more standardised. We’re now at the point where data reporting is a pretty well-understood problem: moving towards ELT, having a data warehouse, having a BI tool. Some of the tooling layer, like I mentioned previously, is pretty well standardised. We haven’t yet reached that understanding for the data science part of the stack, and I think the next five years will be particularly interesting as we start to get there. With Amazon, Google and the other clouds investing heavily in their data science stacks, and with the data reporting side now more stable, there’s just a lot more attention on data science. So I’m not sure what that stack really looks like yet, but it’s something that will become clearer in the next few years.

Speaker 1 (00:39:08):

What would you say, then, are the fundamentals for setting it up correctly? If we look at a newish startup that is a SaaS company, or one producing an AI product, what should the owner of that small startup think about?

Speaker 2 (00:39:26):

I’m convinced that an early-stage data team has only got one job, and that is to allow the business to answer questions for itself. If the business can answer questions for itself, all of a sudden every employee in the company has the autonomy to make decisions without depending on the data team. If they depend on the data team, you’re setting a habit that is not scalable: as your organisation grows, you’re going to have to grow your data team with it, and that doesn’t really work. So the first thing a data team really has to do is set up reporting in such a way that the business can answer questions for itself. The sweet spot, I’d say, is answering 80% of questions in a self-service way. That means: if an intern joins tomorrow, can this person answer complex questions about your go-to-market strategy, or about how your customers are using your product, in a self-service way? Once you set up this foundation, the data team can go and focus on the needle-moving work, because the organisation is now self-sufficient and doesn’t depend on the data team to answer these questions. If you jump straight to recommendations or insights, what you’re essentially doing is ad hoc analysis for the business. Just like all the stakeholders who depend on data, you keep going back to the raw data to run an analysis or answer a question, and that breaks very, very quickly.

Speaker 2 (00:41:32):

What this then means is: if an early-stage data team is just responsible for setting up the organisation for self-service, how does it get to that point? There’s a three-step process. The first thing you need to do is ingest your data from all your different data systems into a central place. Even as a small business, you might have multiple data sources. You might have your application database, which powers your website or your app. You might have marketing data on Facebook or other tools. You might have user interaction data in something like Mixpanel. You might have customer information in a CRM like Salesforce. At WeWork we had over 200 different data sources, because we were dealing with physical data plus online data. So even for a small business, it’s not uncommon to have 10 different data sources.

Speaker 2 (00:42:35):

If you have to manually go and pull this data every time, it’s just not going to work, right? A few years ago, 60, 70 or 80% of a data team’s time was spent building these pipelines and moving data. Today, with tools like Fivetran, you can really start to automate this process. So the first thing you have to do is ingest your data centrally. Once you have this data, you have all of this raw data inside your warehouse, but this is not data you want to answer questions from. Number one, it’s structured in a way that makes sense for the different applications, not for answering business questions. And number two, this data can change without any notice. So if you build an analysis on top of it and an engineer changes something in the source system, it’s going to break your analysis.
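
The ingestion step can be sketched in a few lines. This is a minimal illustration, not how Fivetran or any other tool works: it assumes three hypothetical sources (an application database, a CRM and an ad platform) whose extracts are hard-coded, and uses an in-memory SQLite database as a stand-in for the warehouse. Each source lands unmodified in its own `raw_` table.

```python
import sqlite3

# Hypothetical extracts. In practice these rows would be pulled from your
# application database, CRM, ad platforms, etc., ideally by an automated connector.
SOURCES = {
    "app_users": (
        ("user_id", "email", "signup_date"),
        [(1, "ana@example.com", "2021-01-04"), (2, "bo@example.com", "2021-01-05")],
    ),
    "crm_accounts": (
        ("account_id", "name", "tier"),
        [(1, "Acme", "enterprise"), (2, "Initech", "smb")],
    ),
    "ads_spend": (
        ("day", "channel", "spend"),
        [("2021-01-04", "facebook", 120.0), ("2021-01-05", "facebook", 95.5)],
    ),
}


def ingest(conn: sqlite3.Connection) -> None:
    """Land every source, unmodified, in a raw_<source> table in one warehouse."""
    for source, (columns, rows) in SOURCES.items():
        table = f"raw_{source}"
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
ingest(warehouse)
tables = [r[0] for r in warehouse.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['raw_ads_spend', 'raw_app_users', 'raw_crm_accounts']
```

Because each raw table mirrors its source’s own schema, it is a landing zone rather than something to answer business questions from directly.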

Speaker 2 (00:43:31):

So the second thing you have to do is come up with a new layer, a new data model, which is really built to answer questions for the business. What are the questions you’re trying to answer? What are the questions around your go-to-market strategy? What are the questions around your product, and around how your customers are using it? Figure out all those questions and then design a data model, or a few data models, that can answer 80% of them. There’s no need to optimize for 100%; the point is whether you can answer most of these questions from this data model. Once you’ve designed it, you can work forward from your raw data and build the transformations. You’ve now created a business layer that is insulated from changes in the raw data.
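
To make the idea of a business layer concrete, here is a minimal sketch using SQLite. The `raw_orders` table and its columns (amounts in cents, numeric status codes) are hypothetical stand-ins for an application’s own schema; the view `fct_orders` is the business-facing model that renames, converts and filters in one place, so questions are answered from it rather than from the raw table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw layer (hypothetical schema): shaped for the application, not for business
# questions. Amounts are in cents and order status is a numeric code.
conn.execute("CREATE TABLE raw_orders (id, cust, amt_cents, status_cd, created)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 2500, 2, "2021-03-01"),
        (2, 11, 4000, 2, "2021-03-01"),
        (3, 10, 1500, 9, "2021-03-02"),  # 9 = cancelled in this made-up source
    ],
)

# Business layer: the one place where renaming, unit conversion and filtering
# happen. Analysts and BI tools query fct_orders, never raw_orders.
MODEL_SQL = """
CREATE VIEW fct_orders AS
SELECT
    id                AS order_id,
    cust              AS customer_id,
    amt_cents / 100.0 AS revenue_usd,
    DATE(created)     AS order_date
FROM raw_orders
WHERE status_cd = 2  -- 2 = completed in this made-up source
"""
conn.execute(MODEL_SQL)

rows = conn.execute(
    "SELECT order_date, SUM(revenue_usd) FROM fct_orders GROUP BY order_date"
).fetchall()
print(rows)  # [('2021-03-01', 65.0)]
```

If engineers later rename `amt_cents` in the source system, only `MODEL_SQL` needs updating; every query written against `fct_orders` keeps working. Plain SQL views are used here just to show the principle.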

Speaker 2 (00:44:18):

This is step two. The third step is to invest in self-service tools and set them up so that non-technical users can start to answer complicated questions from this data model. This data model is also where your data scientists and data analysts work from. They don’t go all the way back to the raw data and work from there, because if they do, you have all of the same problems: if the source data changes, things break. Or, let’s say, you change a business metric; now you have to change every single job that had its own version of that metric, instead of changing it in a single place. These are the three steps, ingestion, modeling and then self-service, and they should be the only thing an early-stage data team is focused on. Once you can do this, then go and worry about data science, recommendations and insights.
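
The point about changing a metric in a single place, rather than in every job, can be illustrated with a small Python sketch. The `METRICS` registry, the `Order` record and the metric names are hypothetical; in practice the shared definition would usually live in the modeled data layer or the BI tool rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Order:
    customer_id: int
    revenue: float
    is_refunded: bool


# Hypothetical single source of truth for metric definitions: reports refer to
# metrics by name instead of re-implementing them against raw data.
METRICS: dict[str, Callable[[list[Order]], float]] = {
    "net_revenue": lambda rows: sum(o.revenue for o in rows if not o.is_refunded),
    "active_customers": lambda rows: float(len({o.customer_id for o in rows})),
}


def report(metric_names: list[str], rows: list[Order]) -> dict[str, float]:
    """A stand-in for a dashboard: it only looks metrics up by name."""
    return {name: METRICS[name](rows) for name in metric_names}


orders = [Order(1, 100.0, False), Order(2, 50.0, True), Order(1, 30.0, False)]
marketing = report(["net_revenue"], orders)
finance = report(["net_revenue", "active_customers"], orders)
print(marketing)  # {'net_revenue': 130.0}
print(finance)    # {'net_revenue': 130.0, 'active_customers': 2.0}

# The business redefines "active" to mean customers with at least one
# non-refunded order: one change here, and every report picks it up.
METRICS["active_customers"] = lambda rows: float(
    len({o.customer_id for o in rows if not o.is_refunded})
)
print(report(["active_customers"], orders))  # {'active_customers': 1.0}
```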

Speaker 1 (00:45:26):

Are the same fundamentals true for large companies?

Speaker 2 (00:45:29):

Large companies split themselves up into different missions, tribes and chapters. So yes, it is true, because each chapter can have this fundamental data model. This is really how some of the big tech companies build products. They start off with small teams that have autonomy over their area. They’re able to ingest all the data they depend on, model it to answer questions for themselves, and then allow anyone in those squads to answer questions. As they scale up, they might have different layers underneath them, but it follows the same general structure. Once you set up in this way, you can support an organisation of one or a thousand. It’s the same pyramid, and the same three-step process at every part of the stack.

Speaker 3 (00:46:27):

Do you see companies moving to adopt this approach, or is there pushback from competing methodologies?

Speaker 2 (00:46:38):

What we’re seeing is that once these organisations hit a certain point, they bring on experienced leaders, and those leaders organise the company around missions and chapters. With this approach, teams naturally tend to work with autonomy over their area, and the data teams naturally follow that paradigm. But a lot of smaller and medium-sized organisations don’t have this awareness yet. So they keep adding more and more complexity to their stack, and at some point it all comes crashing down, right? At some point it becomes very, very difficult to change even a small metric. And I’m sure you’ve seen this.

Speaker 3 (00:47:41):

I can think of a few horrible examples. Yeah.

Speaker 2 (00:47:44):

Exactly, and that’s not a problem with the people you have or their intelligence; it’s just a problem of how teams are structured. You start spending 80% of your time on small, mundane tasks instead of 80% of your time on the needle-moving work, because you have interdependencies everywhere. So we haven’t quite seen small companies move in this direction yet, and that’s something I’m personally very invested in helping companies with.

Speaker 1 (00:48:17):

I think also, at a personal level, if the data engineers or data scientists are spending a lot of time on that stuff, it can lead to some dissatisfaction as well. It’s not healthy from that perspective either.

Speaker 2 (00:48:31):

For sure. On the data team, you’re trying to discover hidden insights in your data and really enable the organisation to figure out what to build next. If instead you’re spending your time answering ad hoc questions for the marketing team, or staying up late because the CEO has a board meeting the next day and you need to put together some metrics, that’s a high-stress situation. And I would bet it’s not what you had in mind when you signed up for the role. It doesn’t have to be that way, right? What’s really sad is that there is a best practice for how to set things up so you never get to that point, and it’s missed by the majority of companies and the majority of data teams.

Speaker 1 (00:49:16):

I guess one, nearly final, question I’ve got, Tarush: the big difference between Salesforce and WeWork is that Salesforce is obviously a product company and WeWork delivers a service. Did that difference have any impact on the structure of the data function, or on the types of data scientists or data engineers you saw them recruit and be successful there?

Speaker 2 (00:49:44):

Again, it’s a little tricky to answer that, because we didn’t really have many data scientists or data engineers at Salesforce back then, since we were just getting started. But I do know a lot about what these teams look like now. In short, I think bringing on A-plus players matters more than the type of A-plus player. All of these companies, especially in Silicon Valley and New York, are really focused on hiring exceptionally smart people and then figuring out what to do with them, instead of hiring against a preconceived notion of a role. So in that regard both companies are very similar: they’re just trying to bring in A-plus players. A lot of these are moving targets as well, right? As we were just saying, data science doesn’t have established best practices yet, either for the infrastructure layer or for the tooling layer and how to do it. Since these are moving targets that tend to change very quickly, it makes a lot more sense for these companies to optimise for bringing in really smart people rather than a particular type of skill.

Speaker 1 (00:51:09):

The whole area of data privacy is a pretty hot topic right now. We see Apple and Facebook having a bit of a fight at the moment. How do you see the data privacy side of things playing out?

Speaker 2 (00:51:26):

I don’t want to pick favorites here, so I’m going to try to be careful how I answer this question. I think as individuals we have a right to privacy, so this concept of privacy by design is something a lot more companies should be focusing on. At WeWork, we were building features with the idea that users can either opt in or opt out. So we’d ask you: hi, we want to collect your PII, and we’re going to use it to recommend that you connect with other members, so we can facilitate introductions between members. If somebody wants this feature, they’re going to say yes. If somebody doesn’t, they’re going to say no, and it’s okay both ways. Where we’ve ended up with some of these bigger companies is that they sell access to the information, not the information itself (we all know what happened when that happened), by building products that monetize their data. For companies like this, there is no fallback option if they don’t sell these products; there’s no other real way for them to monetize. Hence privacy is not really an option, because if users opt out, these platforms don’t make any money off them. I think that’s really how we’ve ended up in this situation. And I think Apple versus Facebook is ultimately about whether it’s the user’s choice not to have their information monetized.
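
The opt-in pattern described here can be sketched as a consent check that runs before any personal data reaches a feature. This is an illustrative sketch only; the `Member` record, the purpose label `member_introductions` and the shared-interest matching rule are hypothetical and are not WeWork’s actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: int
    name: str
    interests: set[str]
    # Purposes this member has explicitly opted into; empty (opted out) by default.
    consents: set[str] = field(default_factory=set)


INTRO_PURPOSE = "member_introductions"  # hypothetical purpose label


def suggest_introductions(members: list[Member]) -> list[tuple[int, int]]:
    """Pair members who share an interest, using only opted-in members' data."""
    eligible = [m for m in members if INTRO_PURPOSE in m.consents]
    pairs = []
    for i, a in enumerate(eligible):
        for b in eligible[i + 1:]:
            if a.interests & b.interests:
                pairs.append((a.member_id, b.member_id))
    return pairs


members = [
    Member(1, "Ana", {"fintech", "design"}, {INTRO_PURPOSE}),
    Member(2, "Bo", {"fintech"}, {INTRO_PURPOSE}),
    Member(3, "Cy", {"design"}),  # said no, so their data is never considered
]
print(suggest_introductions(members))  # [(1, 2)]
```

Treating missing consent as a no, and filtering before the matching logic runs, is what makes this closer to privacy by design than a filter applied after the fact.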

Speaker 1 (00:53:24):

And maybe there’s a broader philosophical point: these companies obviously need people’s data, they’ve collected vast amounts of user data, and they go on to make an awful lot of money off it. Users benefit from the products, but they don’t necessarily benefit financially. And I guess you could ask: is that correct? Is that fair? I don’t know the solution for paying them back in any way, but I think it’s something people are addressing. There are even technologies, like blockchain, that you read about, where people’s information may be privatized and these companies would in some way have to use it.

Speaker 2 (00:54:04):

What I find really interesting is that, in my experience, the people who say no are far fewer, but they’re the ones who make the most noise. So it seems that if you give users an option and really explain how this data will be used, and your features are compelling enough, most users would say yes anyway. And the ones who say no are the ones who would be shouting the loudest anyway, probably not the ones you want using those features. So, going back to what I said, right now we live in a world where you either monetize everyone or you monetize no one, right? If you’re a Netflix or a Spotify, you are not making money in the traditional sense by selling data. You’re using your data to build a better product for yourself, so you can make better recommendations, but you’re not selling it externally. These are the types of companies doing pretty well right now. On the other side, you have the Facebooks and Twitters of the world, which need to monetize everyone’s data. I personally am excited about a world where you could have a hybrid model, where you could choose whether you want parts of your data to be monetized for recommendations. I personally love some of the recommendations I get on Instagram or Facebook, because those are products I’m interested in buying. Yet there are times when I wouldn’t want to share some of my data with political campaigns or in those realms, because that’s not something I’m very interested in.

Speaker 3 (00:55:57):

Maybe the solution is for these companies to just be open and transparent about data, and to give users the option to opt in or opt out of suitable things.

Speaker 2 (00:56:06):

Exactly. I think it’s about being able to opt into what you’re interested in and opt out of what you don’t like. Using myself as an example, every time I get a political message trying to raise funds for, let’s say, a member of a party, that annoys me, because I didn’t really sign up for that. So I actually end up with more dislike towards the platform for doing that. Whereas when I’m recommended certain products, I actually enjoy that. So it would make a lot of sense for these platforms to open themselves up and allow me to choose what I want to do with my own data. Again, we live in a world where it’s really binary: either monetize or don’t monetize. And that’s really where I see a lot of the struggles coming from. What that also means is that if you choose not to be monetized, some of these platforms might become paid. That’s something a lot of people are just not used to, right? Like, hey, I don’t want you to monetize my data, and I’m not going to pay for it either. That’s just something we’ve subconsciously become used to, which will probably need to change. So I think a lot of reform is needed on the platform side, but it’s also going to have a lot of impact on user behavior, which people are not talking about yet.
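
The contrast between a binary monetize-or-don’t switch and the granular opt-in described here can be shown with a small sketch. The categories, the `preferences` store and the ad names are hypothetical; the point is only that consent is recorded per category, and anything not explicitly opted into is treated as opted out.

```python
from enum import Enum


class AdCategory(Enum):
    PRODUCTS = "products"
    POLITICAL = "political"
    FINANCIAL = "financial"


# Hypothetical consent store: the categories each user has opted into.
# A user or category that is missing counts as opted out.
preferences = {
    "user_42": {AdCategory.PRODUCTS},
}


def may_target(user_id: str, category: AdCategory) -> bool:
    """Granular opt-in per category instead of one all-or-nothing switch."""
    return category in preferences.get(user_id, set())


candidate_ads = [
    ("running shoes", AdCategory.PRODUCTS),
    ("campaign fundraiser", AdCategory.POLITICAL),
]
shown = [name for name, category in candidate_ads if may_target("user_42", category)]
print(shown)  # ['running shoes']
```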

Speaker 1 (00:57:27):

Moving on to the present day, in the last 12 months you’ve launched your own startup, The 5x Company, and there are a couple of main services you provide which are really quite interesting. The second of those is completely unique, and not something I’ve ever come across before in the data science world. We’ll come to that in a second, but maybe start by giving us an overview of your entry-level training program: what it is and who it’s for.

Speaker 2 (00:57:56):

I’ve spent my whole career in data, and what I’ve seen is that it’s difficult, even for me, to keep track of what’s really happening in the space. So if you’re an entrepreneur and you’re not experienced in this, it’s really, really hard to invest correctly in data. What we’ve seen is that 90-plus percent of companies are growing despite data, not because of it. We want to help as many companies as possible leverage data so that they can grow exponentially. So we have programs to help companies answer 80% of questions in a self-service way. That’s our flagship program. We work with companies that have proven out their business model and now want to invest in data to really scale the business, and we help them set up data in such a way that they can answer 80% of questions.

Speaker 2 (00:58:58):

So it’s everything I was already talking about: the three-step process. We have a program that helps companies build these foundations in 12 weeks. If you’re a smaller company and not quite at the stage where you want to invest in bringing on data resources, we also have programs for companies that don’t have the same level of tech resources. So we work with everyone from early-stage companies that are just at the ideation or product launch phase, to bigger companies that have a product and now want to double down on data.

Speaker 3 (00:59:42):

Is it on a consultancy basis, or do you provide staff and resources for that?

Speaker 2 (00:59:48):

The typical consulting model is going in, identifying a problem and then selling a solution. The way I like to describe what we do is that instead of selling you a fish, we teach you how to fish. Once we’ve taught you the three steps and how to set things up, when your business priorities change you know how to change your data models and your BI layer, so that you can keep answering 80% of questions in a self-service way. A consultancy is not necessarily equipping you with the same skills. If needed, we can help with the implementation as well, so if you really want to accelerate your timelines, we can help with that. We still want you to do the program, though, because if we do the implementation for you, we haven’t really taught you how to fish. So we’re pretty clear that we want you to go through the program, and if you want to move quickly and need some help with the implementation, we can help with that too.

Speaker 1 (01:00:58):

And then the second thing you do, Tarush, which from my perspective is completely unique, at least in the data discipline: I believe you provide the only mastermind group in the entire data discipline. I’ve been around the mastermind group concept for many, many years, but it’s typically had a sales or marketing focus, or a business-owner growth focus. So can you explain exactly what a mastermind program or mastermind group actually is?

Speaker 2 (01:01:36):

The general concept of a mastermind is bringing together a group of people at roughly the same level, with the idea that when you are very careful about the group you pick, the group collective is extremely valuable: much more balanced and knowledgeable than any single person’s experience. Once we teach companies these data fundamentals, they hit a certain level of maturity where they can execute pretty quickly, because they have the right visibility. Then it becomes really interesting to bring together leaders from these companies, typically someone on the executive team or, if the company is big enough, a data leader or someone in strategy who is really thinking about how to leverage data to hit that strategy. We bring a group of these people together and discuss how to hit your business goals by leveraging data.

Speaker 2 (01:02:49):

Very often, learning from someone else’s experience, learning what not to do, is more valuable than knowing what to do, right? For example: hey, we’re about to do attribution, what’s worked for everyone? Then, going around the group, someone says: we initially did multi-touch attribution, but it turned out to be overkill, and first-touch attribution was good enough. Or somebody else says: we A/B tested attribution models and didn’t find a lot of difference, so initially go for something that’s simpler to understand. Just learning from the experiences of other companies, which may or may not be in the same space, gives you much more complete knowledge of how to go about your problem. And very often, even if it’s not immediately relevant to you, you listen in, and when it does become relevant you already have the frameworks for how to think about it.
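
The attribution trade-off mentioned here is easy to see in code. This sketch compares first-touch attribution with a linear multi-touch model (one of several multi-touch schemes), which splits each conversion’s credit equally across its touchpoints. The conversion paths are made-up examples.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical conversion paths: the ordered marketing touchpoints that
# preceded each purchase.
journeys = [
    ["facebook", "email", "search"],
    ["search"],
    ["facebook", "search"],
]


def first_touch(paths: list[list[str]]) -> dict[str, float]:
    """All credit for each conversion goes to the first channel touched."""
    credit: dict[str, float] = defaultdict(float)
    for path in paths:
        credit[path[0]] += 1.0
    return dict(credit)


def linear_multi_touch(paths: list[list[str]]) -> dict[str, float]:
    """Each conversion's credit is split equally across all its touchpoints."""
    credit: dict[str, float] = defaultdict(float)
    for path in paths:
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


print(first_touch(journeys))  # {'facebook': 2.0, 'search': 1.0}
print(linear_multi_touch(journeys))
```

Both models hand out the same total credit, one unit per conversion, and differ only in how it is split across channels, which is part of why a simpler model can turn out to be good enough in practice.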

Speaker 2 (01:03:48):

I had never heard of a mastermind when I lived in New York and San Francisco. When I moved to Bali last year, I joined a mastermind that wasn’t especially business-focused, and it turned out to be the best thing that happened to me in 2020. So much so that at the beginning of this year, we decided to add this data business mastermind to what we do at The 5x Company. A lot of the companies that do a fundamentals program then go on to do the mastermind, to connect with a group of other companies at a similar stage, but also to accelerate their own timelines towards their top business goals.

Speaker 1 (01:04:34):

Great. And I think there’s serious power in being able to find out what your peers are doing to solve similar problems, because ultimately none of these problems are new.

Speaker 2 (01:04:47):

What I find really interesting is that there’s a large meetup culture in America, and I know there are a lot of meetups inside Europe too. WeWork actually acquired Meetup.com, so I am pretty familiar with it.

Speaker 1 (01:05:01):

It’s huge in London.

Speaker 2 (01:05:03):

Yeah, exactly. Massive.

Speaker 1 (01:05:04):

Yeah. I didn’t realize how giant it was until recently.

Speaker 2 (01:05:07):

The one area where meetups don’t do a good job is accountability and consistency, right? Learning information is extremely valuable, but learning how to apply it, learning from the experience of other people who have applied it, is insanely more valuable. There are a lot of stats out there showing that when you just learn information, you tend to forget most of it. But when you learn from an experience, or from someone else’s experience, it’s a lot more valuable. So that’s really how, in my opinion, masterminds are so vastly different from meetups, and why they’re so much more successful at actually driving results.

Speaker 1 (01:05:56):

How can people find out more about you, Tarush, and what The 5x Company is up to?

Speaker 2 (01:06:02):

Our website is Fivex.company, where you can find information on all of our programs. You can also reach out to me on LinkedIn, where I’m Tarush Aggarwal, tweet to us, or reach out to us on Instagram. Our email is info@fivex.company. Whichever way you want to reach out, we are here to help you build these fundamentals, because once you really build them, that’s when the really fun stuff starts to happen. That’s when you can focus on A/B testing, on building data products and on data science. At some point we would love to do more programs around these areas, but again, the first step really is these fundamentals.

Speaker 1 (01:07:01):

And that brings to a close this episode of the Data Science Conversations podcast. We hope you enjoyed that fantastic conversation with Tarush and Peter. Thank you so much, Tarush and Peter, for joining us and for your great insights and questions. If you enjoyed this episode, please do leave us a review on your favorite podcasting platform. We look forward to catching up with you on the next episode. Thanks and take care.