
SaaS Backwards - Reverse Engineering SaaS Success
Join us as we interview CEOs and CMOs of fast-growing SaaS firms to reveal what they are doing that’s working, and lessons learned from things that didn’t work as planned. These deep conversations dive into the dynamic world of SaaS B2B marketing, go-to-market strategies, and the SaaS business model. Content focuses on the pragmatic as well as strategic, providing a well-rounded diet for those running SaaS firms today. Hosted by Ken Lempit, Austin Lawrence Group’s president and chief business builder, who brings over 30 years of experience and expertise in helping software companies grow and their founders achieve their visions.
Ep. 164 - Voice, AI & the Future of SaaS: Productivity, Innovation, and Risk
Host: Ken Lempit | Guests: James Ollerenshaw and Rob Curtis
Voice is the new frontier in SaaS — but are we ready for the implications?
In the third episode of our AI Series, Ken Lempit is joined again by James Ollerenshaw and Rob Curtis to explore how voice and AI are colliding to transform the way B2B SaaS companies operate — from the way products are built and sold to how customers are supported.
We dive into:
- The shift to voice-first interfaces and its implications for product design
- Real-world use cases in sales, customer service, compliance, and manufacturing
- Why voice as a data source may be more valuable than voice as an interface
- How voice AI can unlock productivity gains — and which roles are at risk
- The compliance, privacy, and cultural risks of voice-powered surveillance
- Predictions on headless SaaS, ambient computing, and the next big AI disruption
This conversation goes longer than our typical episodes, but trust us — it’s worth every minute. Stick around for smart insights, practical frameworks, and some provocative predictions that CROs, CMOs, and SaaS builders shouldn’t miss.
📢 Learn more about Rob’s venture at standuphiro.com
🧠 Connect with Ken at austinlawrence.com
---
Not Getting Enough Demos?
Your messaging could be turning buyers away before you even get a chance to pitch.
🔗 Get a Free Messaging & Conversion Review
We’ll analyze your website and content through the eyes of your buyers to uncover what’s stopping them from booking a demo. Then, we’ll give you a personalized report with practical recommendations to help you turn more visitors into sales conversations.
And the best part?
💡 It’s completely free.
No commitments, no pressure—just actionable advice to help you book more demos.
Your next demo is just a click away—claim your free review now.
Ken: Welcome to SaaS backwards, the podcast that looks at what's working and what isn't in the world of B2B SaaS. This is the third episode in our new series on the impact of artificial intelligence on the SaaS industry, where we're unpacking what AI really means for builders, sellers, and buyers of enterprise and departmental software solutions.
Our last conversation covered the pace of AI adoption and how SaaS vendors and buyers are investing in AI native applications that enable more advanced workplace intelligence and automation. This includes agentic systems that can interact with other applications on the internet. Today we're gonna delve deeper into a topic that came up a few times in that discussion.
The way that we're interacting with computers is changing thanks to AI powered voice interfaces and voice data analytics.
I'm joined once again by my two brilliant co-conspirators and co-hosts: James Ollerenshaw, a marketing strategist with deep AI expertise and a healthy skepticism earned from watching more than one hype cycle play out, and Rob Curtis, co-founder of Hiro Studio, a venture builder working at the cutting edge of agentic AI and a frequent chronicler of this moment in tech via his Substack. My name's Ken Lempit, I'm your host, and with voice as our topic, today's episode should certainly give us plenty to talk about. Today, James will lead the conversation.
James: Thank you, Ken. So voice has been in computing for a long time, but I think what we're seeing now with AI is it's really accelerating the capabilities and the possibilities. And for SaaS vendors, that means they have a question to think about now: what does voice mean for their products and their business?
And I just thought it might be worth taking a quick run through the state of the art in AI and voice technology. So, very quickly: speech recognition that approaches human levels of accuracy. AI-driven conversations with very lifelike text-to-speech. What's called multimodal integration, where you've got voice agents which integrate not only voice but vision and text.
So these are the kinds of things you might experience in smart kiosks, where you can interact with them in multiple ways. An important thing for privacy, and for functioning with voice offline, is on-device voice AI; Apple, Google, and Samsung are all heavily investing in that. We're starting to see some maturing enterprise use cases.
Voice-enabled productivity assistants, customer service automation that actually sounds human and is capable of resolving some more complex issues, and voice-driven analytics, so you can just talk to your computer and say, what's our sales forecast? And in the pipeline: emotionally aware AI, or at least AI that pretends to be.
So, voice agents that adapt based on tone and sentiment. And then I think a really interesting one is memory in assistants: assistants that remember previous conversations and our preferences across the sessions that we have with them. So I think it's helpful to break down a little bit what we mean when we think about voice and AI.
Because I think there are two main areas. Voice interaction with AI systems is probably the first thing that people think of, and certainly a major area of development. But the less thought about, and perhaps even more interesting, is voice as a data source. This was something that was very difficult for computers to do previously.
Voice data is unstructured, often ephemeral, but now it's being captured like never before, and our computers can readily process it. So we're in a new era of talking with computers, and those computers having memory of everything that we've said. So I wanted to start to pose some questions: who is leading? Which industries are leading in voice innovation?
I've worked quite a lot with AI in regulated industries, in particular financial services, and it's interesting to see how strict compliance regulations have not been an obstacle to innovation. In fact, I'd say on the contrary. A company that I worked at, Digital Reasoning, built AI communications analytics used by compliance organizations at banks, which are probably among the most highly regulated.
And it was that regulation that was driving this kind of technology, and voice was one of the areas we were working on. This was over 10 years ago, in an era where regulations and blank checks from banks made it possible for that kind of technology to exist. But now it's become a lot cheaper.
So less regulated industries, hospitality, customer service, are moving fast and experimenting. I've also seen this come into manufacturing and logistics. And it's also entering the office: we have Microsoft Copilot in Teams, which has voice interfaces. So the question I wanted to pose to the group here: is there any one sector that appears to be leading the charge in terms of meaningful voice technology adoption, or are certain use cases pulling ahead? Ken, what's your take on that?
Ken: So, I mean, I think that it's at the point of interface where this stuff is happening, right? Industries that have a lot of interface have a lot of expense. And the things that occur to me, and they weren't in the list of industries you talked about, were also things like travel, where I think we've all experienced the defenses
that, you know, airlines and rental car companies have put up against having to put a human on the phone. But these systems are getting smarter and smarter, to the point where it's almost preferable, to me anyway, to speak with an intelligent agent that can hear me and then act. So I'd say that travel and hospitality are probably a big place where we're gonna see this happen in a meaningful way.
Like, it's going to affect many of us in our daily life. Also, probably financial services. I can't remember the last time I spoke to a human at Chase Bank, for example. And then on the B2B side, we had a look-see at a company in the financial advisor space called Zocks. I think we featured them on one of our webinars just a week ago.
And these companies are trying to figure out the ideal voice capture and voice engagement, using an AI layer to get data out of these conversations. And I think that's probably gonna be an area that explodes: you know, how do we get the data out of the voice conversations?
And that's, that's almost everywhere. I mean, I don't know if there's a single place where human interaction is more textual than it is voice, on balance.
Rob: You know, Ken, that last point you made about the value of voice: think about what we know about face-to-face communication. The words we use account for about 7% of the information that we put across. Body language, tone, other things like that are completely lost when we interact only with text.
And, you know, we found proxies: we use emojis, we use indicators of tone in emails to try and synthesize some of those more natural forms of communication. But we can imagine now that if voice comes along, and depending on how that voice is stored, and I think this is gonna be important, then we could be able to pull a whole bunch of data out of that.
So imagine if every conversation was captured in real time and recorded. We could be doing sentiment analysis; we could be training LLMs to read between the lines and understand what is unspoken. And even beyond that, I think LLMs are really great at structuring unstructured data, and I think that's probably one of the strongest use cases here.
And as we look at areas where we are seeing deep penetration, it's sales and CRM, it's customer service, it's healthcare. Places where people have to very quickly take an unstructured piece of information, structure it, record it, and get out. And I think this is gonna be really fascinating moving forward.
Healthcare's an area that I'm really interested in, 'cause I started watching The Pit, which is like a new ER, but it's filmed in real time over 15 hours. So you're in there watching in real time what goes on in a hospital, and there is so much communication work to be done.
And, you know, they even have a, a moment where they're like, can you please record this thing for me? Because the doctor is really busy. And so when we look at industries where there are skill shortages where there's incredibly strategic work paired with incredibly non-strategic and administrative work, I think these are the places where we are starting to see some real movement on voice.
And I'm not surprised that it's sales and CRM because every sales team has a little bit of money to experiment with and just think about the throttling factor that each of us has. We only have one voice that we can use at any given time. What's the point of having a queue in a customer services organization if you can deploy a hundred agents to concurrently answer calls without being throttled by human biology?
And I think these then paint some opportunity spaces to think about where can we reduce volumes? Where can we reduce wait times? Where can we simplify unstructured data? Because I think these are the areas that are going to naturally pull ahead. So sales customer service and healthcare.
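Rob's point that LLMs excel at structuring unstructured voice data can be sketched in miniature. The toy Python below is a stand-in for the LLM step: it takes a transcribed status update and emits structured fields a downstream system (CRM, ticketing, standup tool) could consume. The function name and keyword rules are illustrative assumptions, not anything from an actual product; a real system would prompt an LLM instead of matching keywords.

```python
import re

def structure_update(transcript: str) -> dict:
    """Toy stand-in for an LLM that turns a spoken, unstructured
    status update into structured fields. The keyword rules below
    are illustrative only; a real system would prompt an LLM."""
    fields = {"done": [], "doing": [], "blocked": []}
    # Split the transcript into rough sentences on "." and ";".
    for sentence in re.split(r"[.;]\s*", transcript):
        s = sentence.strip()
        if not s:
            continue
        low = s.lower()
        if any(k in low for k in ("finished", "shipped", "completed")):
            fields["done"].append(s)
        elif any(k in low for k in ("blocked", "waiting on", "stuck")):
            fields["blocked"].append(s)
        else:
            fields["doing"].append(s)
    return fields

update = ("I finished the billing migration. "
          "Today I'm working on the invoice emails; "
          "I'm stuck waiting on access to the staging database.")
print(structure_update(update))
```

The point is the shape of the transformation, not the matching logic: free speech in, machine-usable fields out, which is exactly the "structure it, record it, and get out" workflow described above.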
James: You know, I was thinking, what does this then mean for SaaS vendors? How do they adapt to voice and AI coming together? I would argue that SaaS companies have often been voice leaders in many respects. You know, SaaS is very good at building specific solutions for specific problems. And so voice as a data source, combined with analytics, has led to SaaS applications.
You mentioned Zocks, which is a bit like Otter or Fireflies, you know, recording meetings and analyzing them, but built for the financial services industry, which requires a specialist application because of regulation. And I mentioned before how regulation can actually have driven some of the use of voice, but now that kind of technology is becoming much more ubiquitous and affordable.
In a previous episode, we were talking about how AI is going to affect the SaaS industry. And we had some thoughts and predictions around the hollowing out of the middle, where you've got the big vendors, you know, the OpenAIs, Microsofts, and Googles, holding their position and investing very heavily.
Some new AI-native applications are coming along, snapping at the heels of industry incumbents. And then there are those SaaS companies in the middle, figuring out how to adapt, you know, and they've been built for a world of screen and text-based interfaces. And now we're talking about this much more natural language and, you know, talking to the computers.
So do they use voice as kind of a, a bolt on AI or should they be replatforming?
Rob: So I'm gonna share a couple of thought leaders' opinions, and I'm gonna share what I've learned from building with voice, 'cause we're kind of in the weeds with how we build at the studio. So let's maybe start with Mark Zuckerberg. He says voice is the new user interface.
Why? He's shilling smart glasses. Is he wrong? Maybe, maybe not. But you would look at AirPods as being something that many of us wear 12 hours a day now and start to wonder how we are taking advantage of this technology, given that the Apple Watch is a bit of a non-starter for interactions. If you look at a16z, they predicted that in the next 12 to 18 months we will start to see deep penetration in contact centers and in internal productivity.
So this is like operational stuff. How do you get an HR assistant to help you update your mailing address, or what we're doing with Standup Hiro: how do you simplify and reduce the need for expensive internal communication? So you're gonna start seeing, I think, areas where it becomes very obvious to adopt, because either there's money to be made, the risks are low, or there's a high operational cost.
We've been looking at it in a very particular way, kind of almost at a functional level. So just to set the scene for anybody that's listening today: our thesis is that middle managers are going to have a limited shelf life over the next five to 10 years, and we're gonna have to replace what they do.
And internal communication and alignment tasks are one of those areas that are actually really, really important, but are often the hardest to value. So when you're going through and doing headcount reductions, you're often looking at people who deliver clear outputs, but somebody that does alignment in communication, maybe they're less valuable when actually probably it's the opposite.
And so we've been building a product initially called Standup Hiro, which automates some of your internal communication and alignment tasks on a daily basis. So if you've got status meetings, instead of having to go and verbally update with 10 people on a call at the same time, you can replace that with two-minute phone calls.
And then the intelligence that sits behind the agent decides who needs to be at the meeting, who doesn't, and makes sure that everybody is equally informed. That forced a lot of design choices on us, and I want to start with a really big one. There is no voice-only mode. There are too many people who are deaf or hard of hearing for a voice-only method to be viable.
So it's probably going to be one of many tools in a multimodal interaction framework, of which chat is probably going to be a big part. Voice is really great if you've got limited mobility, but equally, if you can't hear the voice, that's not very helpful for you. So I then looked at ways in which voice is perhaps more helpful than just typing something down.
I'm gonna start with number one: hands-free. So I think y'all are working with some clients who do maintenance. These are people whose hands are busy and who, for administrative, billing, and other reasons, have to make sure that what they're doing is documented, that the right client is charged, time sheets are done, and so on and so forth.
And I think when the user's hands or eyes are busy, voice becomes a natural alternative. So I think we're gonna start seeing this in places where there are hands-free industries, and anybody that's running SaaS in one of those industries would probably be at the top of my list to start thinking about voice enablement, just as a UX improvement.
Equally, anything that's super fast: Alexa, set a timer for five minutes. Far easier to do than it is to start typing and clicking through screens. So I think just very quick, low-risk retrieval tasks. And, and this kind of talks to the hands-free thing, maybe your hands might not be free, but anywhere there are routine updates and logging.
What am I working on today? What have I done yesterday? What accounts am I working on? These things again, are really great at capturing unstructured data and structuring it into the way that we then need to use systems further down the value chain. And kind of touching upon this point around non-verbal communication, emotionally rich or personal interactions.
Because there is so much nuance carried in conversation, it's very easy to get lost when you only see transcripts. So, a former investor of mine, Precursor, just invested in a company called VoiceOps. They help consumer companies take all of that rich customer data and synthesize it into actionable points they can use to improve their product, improve their service, and improve their go-to-market.
So we're now seeing people use the kind of emotionally rich data that was previously inaccessible at scale, and they're using that to drive sentiment, product, and other suggestions. So you've kind of got these things: they're regular, they're routine, they're probably low risk, and maybe your hands are busy.
Time savers and simplifiers is probably the best way I would think about those. And then there are tasks voice is just not that great at. Voice on its own can't do complex multi-step processes. I mean, you can have an agent that will run those, but then you probably only need to be talking to the agent. Trying to ask a voice interface to do three or four things in a row is something
they're not that great at, in the same way that most LLMs are not that great at thinking too far ahead. And I would say anything that's super information-dense, where the reading is non-linear: reading reports, visualizing data, looking at financial information. It's rare that on a financial report one goes through a list from 1 to 10 of things they're looking for; more often we're scanning for variances and then drilling in where we see some of those variances.
And so naturally you've got a kind of sense: the more complex it is, the less ready voice probably is right now. The better fits are the more routine, the more administrative, the cases where you want your users to be able to multitask, and particularly where there are emotionally rich conversations to be had and information in that emotion.
That's kind of the matrix that I'm using. And so with Standup Hiro right now, we're just keeping it very functional: quick status updates. It's faster, it's easier, and you let the AI do the structuring for you. As we then think about deepening the utility of that, adding team sentiment analysis is probably the next step.
"Is everyone happy?" is a really important question for a project lead, rather than just "has everybody done their tasks?", because those two things are tightly bound. But with reporting, we only ever look at one side of that. So I'd say, well, let's start moving into emotional management too.
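The team-sentiment step Rob describes could look like this in miniature. This is a sketch under stated assumptions: the word lists and the `team_mood` function are hypothetical, and a production system would run a trained sentiment model or an LLM over the actual call transcripts rather than counting words.

```python
# Minimal sketch of "is everyone happy?" over daily voice check-ins.
# The tiny lexicons below are illustrative assumptions; a real system
# would use a trained sentiment model or an LLM on call transcripts.
POSITIVE = {"great", "good", "happy", "smooth", "excited"}
NEGATIVE = {"frustrated", "stuck", "worried", "tired", "blocked"}

def team_mood(updates: dict) -> dict:
    """Score each person's transcript: +1 per positive word, -1 per negative."""
    scores = {}
    for person, text in updates.items():
        words = [w.strip(".,!?") for w in text.lower().split()]
        scores[person] = (sum(w in POSITIVE for w in words)
                          - sum(w in NEGATIVE for w in words))
    return scores

checkins = {
    "ana": "Sprint is going great, demo went smooth.",
    "ben": "Still blocked on the API and pretty frustrated.",
}
print(team_mood(checkins))  # a project lead could flag anyone trending negative
```

Even this crude version makes Rob's point concrete: the same check-in call that yields task status also yields a mood signal, and reporting tools today only surface the first half.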
Ken: I love the idea of, you know, the hands-free assistant. You were talking about one of our clients, called Manufacturing Asset Solutions, and, you know, they're deploying an AI-native predictive maintenance app into manufacturing. And clearly the people who are using it on the factory floor are not knowledge workers in the traditional sense.
They certainly have a lot of knowledge about the domain of expertise of machine maintenance, but they are not usually bound to a desk and a, you know, a desktop computer. And we've been focused with them, frankly, on the mobile experience, which many of them expect. But I think if they could do their data entry and be shepherded from one task to another by voice, they'd probably even prefer that so that they didn't have to engage with the system.
And maybe that's one of the things we should be thinking about if you put your product management hat on: how do people want, in an ideal world, to engage with, you know, their technology masters? Like, do they wanna be typing? Do they want to look at a screen?
Do they even want to have a handheld? I would bet you're probably right, Rob: voice-assisted, voice-guided work would be much preferable, as long as it wasn't too bossy.
James: So I've got a real example of that from 20 years ago. A company that I worked with was building voice-directed systems for picking and packing in warehouses. Not AI-based, but devices that would talk to the packer, and it was very much quicker for them to be directed by voice than to look at pieces of paper.
And it was interesting: you could adjust the speed of the voice on it, and these guys had the speed turned up so fast that if you listened to it, you'd not be able to make sense of it. But they got used to it. This is how they worked. It made them quicker. It was a much better form of interaction for them.
So I think your framework makes sense for that. The other thought that I had was when I first started working in public relations many years ago, and I had a secretary and my secretary would take dictation for me to write my press releases.
And I'd never worked that way. I'm used to writing on a keyboard, so I just couldn't write a press release in my head and dictate it. And I remember being reprimanded for not giving my secretary work through dictation. And so that was a kind of skill, I think, that used to exist in the workplace
that's gone away. We've all got used to interacting with keyboards. And I wonder now, as voice starts to come back, will we start to gain new skills, or relearn some lost skills, in how we interact with the world around us? Voice is so much more natural to us than writing. So when you think about your framework, Rob, it forces us to ask some really smart questions about where voice is the most useful.
Rob: Yeah, you know, James, there's something, you know, you and I have this constant personal debate, which is that I do not like receiving voice notes on WhatsApp, because what I tend to find is that people who do send voice notes have not done the work of structuring their thoughts in a form that's digestible for me.
And I think in the art of communication, it's not only what you want to express, it's the way in which you express it so that it can be received best by the receiving party. I'd love voice notes with an LLM attached that told me what I needed to know: you have your unstructured thought,
and then I receive it structured. Because actually, I think in an unstructured world, you're passing that work onto somebody else. These are growing in popularity all around the world. But I do think this point around how we then start to think is gonna be very important, because many of us can type faster than we can speak, because we can type in structure.
But we don't tend to speak that way. And I think the ability to translate unstructured thoughts into structured thoughts has been a real key value driver of communications expertise. And that, with an LLM, is going to become so easily automatable that you have to ask: why even structure your thoughts in the first place?
And I wonder if that leads us to better or worse thoughts over time. I'm not sure.
Ken: With all of this voice assist, with all of this taking away of the labor of building our thoughts, are we gonna lose something in our ability to be coherent and to do the work? Much like, I'm sure, we all know people who are lost without their GPS, right?
If I don't have Google Maps or Waze, I can't get from point A to point B. You know, there's a cognitive loss there. And I wonder if the tools are gonna take away our reasoning and communications capabilities in ways that are, you know, analogous to the loss of navigation skills. So I actually have that as a concern here.
James: So we're now exploring how voice assistants can come into the office and become our new coworkers. And, Ken, as you're sort of thinking about how that might affect how we work: how effective do we think voice can be in transforming employee productivity? Is this the big opportunity for SaaS, or will it create some challenges?
Ken: I remember early in my career, there were people that I absorbed a lot from. They shaped my ability to work, my effectiveness at work. And the method of that knowledge transfer was very analog and almost accidental. You know, we worked in shared spaces; in my first job, I was a computer programmer, and we sat side by side in a large room doing our work together.
And lucky for me, I sat next to somebody with 20 years' experience, so I could look over his shoulder, literally, and get that knowledge, like how to be effective, in my first job. When I was further along in my career, you know, we would hunt down subject matter experts when we were trying to reinvent systems for Citibank.
And I think this corporate knowledge, this subject matter expert knowledge, has been one of the Holy Grails of, like, learning management systems. And I just feel that therein lies that unstructured data: those conversations you might have between an earlier-career employee and somebody with great expertise. If these things can be captured, along with so many other conversations that happen in the office, the ability to make an organization much more effective could be transformational.
I mean, nothing short of transformational. You know, I was having these conversations literally one-on-one, but now you could have one to hundreds, one to thousands, one to tens of thousands over time, where one super expert's knowledge is made routinely available, which would never have happened before.
So I think we need to think both larger and smaller about the implications of voice: capturing voice, processing voice, distributing access to the longitudinal insights, the longitudinal connections between conversations and even presentations that are given internally. I just think that there are incredible opportunities that we're nowhere near tapping.
What's happening now, for the most part, is this bolt-on stuff, which, you know, you've heard me talk trash about, to be honest. But I really feel that this is where it's gonna happen. So what we should be looking for is, like, what is monday.com gonna do to radically transform work? Is it a better Kanban board?
I don't think so. You know, I think it's gonna be some kind of longitudinal, over-time, across-the-organization gathering of insights and making them useful. And I think voice is gonna be a big part of making that kind of transformation possible.
Rob: Yeah. You know, Ken, as you say that, it makes me think about what an enterprise really is and about going through the different stages of business growth. You know, I focus on zero-to-one these days, but I've done a lot of restructuring and reorganization work in large corporations. And generally what happens is, out of a desire to scale and improve quality, a process landscape and a business design is put in place that is designed to
minimize variation, let's just put it that way. And so from the moment you engage with a company as a worker, you start with a process. Generally that's a recruitment process, then it's an onboarding process, then it's a user access administration process, and then it's a one-to-one process. Like, you are overwhelmed by process from the beginning.
And on the way out, it is exactly the same: handing off my logins, doing an exit interview. And so if you think about the opportunity for all of those processes to be automated by AI, it would vastly change the employee experience. And, you know, you touch upon one of my personal bugbears: project management. No tool that we have introduced,
no strategy, no methodology has improved project success rates in generations. Something like 70% of projects fail. You can put them in a spreadsheet, you can put them on cards, you can meet every day and talk about them, you can put 'em on a backlog or a Gantt chart or whatever you want. But I think we are missing the human part that these platforms should facilitate, which is as simple as: the best person to do the job is the person who can do it best, is available, and has bandwidth.
You know? And yet we do not have the systems or the platforms to know that James has a little bit of downtime over in HR and he is a great voice recorder, so we need him for the new creative campaign we're doing. We do not have that visibility on people. And I think you are right that the functioning and process overhead of companies is a massive opportunity to scale an AI play. And I think voice is gonna make that easier, because it's a very simple way of engaging with many of the regular ceremonies that we do. But I think there is a big step change we are all going to have to go through to make that work. And I'll start with just a very simple question.
Who said it was okay to record me on all of my Zoom calls? I don't want to be surveilled on both my formal and informal communication at all times. I do not want to know that somebody is recording every single thing that I have said, without context, and that that is usable for any number of legal and non-legal things.
Because the idea of being surveilled at all times is something that I don't think we have all actively thought about. And I think as we see tools like Otter and Fireflies and other things like that, we're giving them an incredible amount of power, and handing that power away to companies that we should not trust.
And I think what is going to be very interesting is: are people willing to have that level of surveillance applied to them? Because I don't know that the answer should be yes, let alone will be. And I think that the internal use case is going to start to hit compliance problems pretty fast. So, Standup Hiro:
great idea. What are you working on today? But if somebody comes back and says, I didn't get this done because Deanna from work touched me inappropriately. Hang on a second. Now that's written down somewhere. Is that a complaint? Is that not a complaint? Do we have a duty of care to act upon that information?
What about if that's synthesized and sent to a user? Do I have to act upon the information that was somewhere in that step? So then there are things where we might have said, don't write that down, which is a normal part of brainstorming, of thinking through things. Maybe we're not ready to commit to something; we are not ready for discoverability, because these are not things that are production-ready.
What does it do to have that constantly surveiled? Does that have a chilling effect on creativity? And what does that mean when we get into some really thorny questions like, we're all sat in a room, we've made a recommendation, and suddenly you hear a voice from the Zoom call say, with all due respect, Ken, I disagree.
I don't know that employees are going to respond very well to that. And I would imagine that in a highly litigious employment landscape, like in the US, we will start to see employee complaints of "I think that was a biased answer." We saw last week, with news from Shopify, that people will be evaluated against their value add above and beyond the AI. So therefore, what the AI says is going to begin to have a personal performance impact. And these are really thorny cultural problems that I think Gen Z will help us open the door to. But I would imagine many of our baby boomer and Gen X friends are probably not gonna like that idea too much.
James: I think it's a fascinating area to think about. Is voice always the better solution, and what problems does it create? It's a huge data source that's been untapped, and we are used to it being ephemeral. Voice is something which disappears. And so that's starting to change, and like you say, it's just sort of creeping into our lives.
So we're thinking: when is it better to use voice than typing something down? There are many instances of that. But do we want all of this voice to be captured at all times and remembered forever? And I'm gonna jump ahead to my prediction from the end. Think about how, with social media, things that people posted years ago (not me, but some people) have caused them problems further down the line, because a drunken night out led to them not being able to get a job, for instance. And I think if we consider all the things that we say that we'd really prefer not to be captured, if that's there indelibly forever, like you say, Rob, often without context, that creates not just legal challenges, but societal challenges. How do we all feel about this?
Thinking more about the SaaS application of voice, and whether it's always better: we see voice bots starting to replace traditional call centers or even drive-through food orders. Huge opportunities there, but I think we've all experienced those kinds of frustrations when talking to chatbots, like phoning the bank, that mishear our question or don't give us the right options. They're getting better, but I do wonder whether certain systems are ready. There are questions about reliability, and, as you sort of said, Rob, about that right balance between the AI and the human touch. So I think it goes back to your framework: you really have to think very, very hard. Is this the right interaction medium? Is this the right data resource?
Rob: Let's talk about some of the tools that a SaaS company can use to experiment in a pretty low-cost way right now, to try and see what it might be capable of. So, you know, ElevenLabs is really good at voice cloning and has a load of voice APIs to play with.
I use Vapi with my business partner to build Standup Hiro. And one of the incredible flexibilities of that is that I can switch the language instantly, and all interactions are then handled in Spanish. So localization, solved. Well, "solved" is probably overstating it somewhat. When we start getting into, I'm going to say, less spoken languages, there's less training data and model accuracy goes down, particularly where there are strong accents on either side. Like, we're doing rigorous testing: how many Indian names can it pick up in a word-salad conversation with a load of jargon? Does it understand those well? But what we're seeing is technology that allows us to have reference data to hold those against. So you might say a tricky name, or you might use some jargon, like "the ERP integration workstream," that only makes sense internally. We're now seeing LLMs capable of reading against lists, documents, definitions, and data dictionaries, so they can both interact more effectively in the moment and also transcribe and analyze that data more accurately.
So I think you're gonna see some really lumpy parts where we've got languages that are really well supported, languages that are less well supported but technology coming up to make that easier.
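Rob's idea of holding transcriptions against reference data can be sketched simply. This is a minimal, illustrative correction pass, not Vapi's or ElevenLabs' actual API: it fuzzy-matches each transcribed word against an invented internal glossary of names and jargon, replacing near-misses with the canonical term.

```python
import difflib

# Hypothetical internal glossary: names and jargon a speech model often mishears.
GLOSSARY = ["Priya", "Standup Hiro", "workstream", "ERP integration"]

def correct_transcript(words, glossary, cutoff=0.8):
    """Replace each transcribed word with its closest glossary term,
    if the similarity ratio clears the cutoff; otherwise keep it as-is."""
    corrected = []
    for word in words:
        match = difflib.get_close_matches(word, glossary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return corrected

# A raw transcript with two near-miss spellings of glossary terms.
raw = ["Pria", "joined", "the", "workstreem", "call"]
print(correct_transcript(raw, GLOSSARY))
# → ['Priya', 'joined', 'the', 'workstream', 'call']
```

Real pipelines pass this kind of reference data to the speech or language model directly (as a custom vocabulary or prompt context); the post-hoc matching here is just the cheapest way to see the idea work.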
James: So how far does voice really take us? Are we ready for a world of AI voice companions and ambient computing? Voice is set to play a key role in the emerging field of ambient or ubiquitous computing, where technology is integrated into the world around us, and we start just to talk to our hotel room, talk to our car.
We've certainly seen that coming into the automotive sector. And then there are the AI companions, which we mentioned, that not only interact with us on our last query, but actually have memories of all past queries and are connected to our personal databases and the public internet. That's a very different world that we would interact in. And I wonder where SaaS exists in this. We talked about this sort of hollowing out of the middle. That kind of joining up of technology suggests that the major tech companies are gonna dominate there; these are very complex systems that need to be knitted together. And so SaaS, which built a whole industry of zillions of apps, exists in a world where really we just talk to personal assistants. Is that the headless SaaS? How does that exist in this new voice future?
Ken: I don't know if we've coined the phrase headless SaaS, but I think it's a good way to think about it. It's been kind of done before, though. I mean, so many leading SaaS applications have deep integration libraries, right? They are already drawing upon the resources of other systems to do their platform's functions. So, you know, we're a HubSpot partner and have been for like 12 years, and that integration library has just exploded. So by definition, HubSpot is absorbing function from these partner applications in furtherance of its own user experience. So I don't know if it's the end of an era so much as it's a different way of platforming, right?
Like, if the voice becomes the platform. So if you have the best commerce optimization plugin for HubSpot, you're gonna be accessed by voice or you're gonna be accessed by keyboard. I don't think that changes too much. I think it's certain applications that wanted primacy of interface that are gonna be most profoundly affected; either they're gonna be replaced by other things that do a great job with voice, or they're gonna be absorbed into other platforms and probably still be okay.
So to my mind, the risk is where you're not able to make the jump: your application isn't voice suitable, and yet most people want to be engaging by voice.
So anyway, I think that voice is gonna become ubiquitous. That's a given. And the question is, do you become a primary voice application or are you platformed? And I think it's okay to be platformed.
There are plenty of things that are not my primary everyday application and they survive in an ecosystem, whether it's Salesforce or HubSpot or SAP or Oracle, you know, they plug into these master applications, if you will, and I think they'll continue.
Rob: So, I would use a few examples to help understand where I think the future's going. So number one, I would say we've learned from the last generation that's had the last 10 years of experimentation with voice, with Alexa, Google Home and Siri. People do not buy things on those platforms.
There's no commercial case for those companies to have a voice integration, because I think the dream was, you know, buy more Lemon Pledge and it would automatically come to your house. But everyone just uses them for the time, and so they've become a loss leader. And I think that probably starts to set some limiting factors around where and when we will be interested in interacting with commerce and selection and things that have a cost associated with them.
We're not at a point yet where things know whether it will fit me well or whether I like it, and it's cumbersome to have a product description read out. So I think there are gonna be some natural areas where we won't see it, where functionally it's just not the best option available, or where we need visual things in particular.
I would say the second limiting factor is, are you ready to be on an airplane with people interacting with their software out loud? I do not like being in a coworking space where I'm speaking out loud on my AirPods to somebody on a Zoom call because I feel rude and I'm the one making the noise.
I cannot imagine a world in which voice-only becomes the way we all choose to interact, 'cause there are just places where we have to be quiet. Imagine, James, you've spent time in London, as have I: anyone saying anything on the tube?
James: Yeah, on the tube that
Rob: On the tube.
James: Yeah. It's like, oh my God.
Rob: Indeed. So I think those are gonna be some limiting factors, and they can probably set the shape of what's confidential, where it is most effective, and where context and environment makes it better or worse. So we've given our SaaS audience some interesting strategic implications to think about: whether they live in this sort of hollowing out of the middle, potentially, where the interface changes. We've asked people to think very carefully about this framework for where voice fits.
James: I think we also talked a lot about how AI voice can help automate or streamline tasks, reduce cost, and increase efficiency. Also, you know, how could you make voice an integral part of your SaaS product? A couple more that I just wanted to share with each of you. Rob, you mentioned using a voice application to help you do different languages. So I was just thinking, for SaaS platforms to expand their audience and cater to a global user base, can voice AI help us do that, do you think?
Rob: I think AI in general can. So, you know, I worked on the branding for a tequila distillery last year, and I wrote that website in English; then Framer, which is the web platform, translated it with an LLM. Its standard model translated the entire website into passable, if not elegant, Spanish.
And so I think that the idea of investing large amounts in localization will, will definitely come down. And I think voice certainly makes things more accessible as much as any text does.
Ken: The only thing I'd say about localization is it depends on how much opportunity there is for the sponsor of that localization. I don't think Mondelez is gonna allow an AI to localize its websites, right? They wanna sell Lorna Doones globally. They wanna speak in the local vernacular. And I think we're a little far from being able to do that without localization experts. But for a small to mid-size business to be able to have a website in 12 languages instead of two? Gigantic opportunity. It could be game changing.
We often interview firms from Europe on SaaS Backwards that want to get into the US market, and, you know, translating into English is for some of them more of a challenge than others. And certainly if you want to be able to communicate to 10 or 12 or 14 countries in their native tongue and you're a small company, there's gigantic leverage there.
Rob: Customer support, like if you're using agents to do it, why not offer support in 27 languages? Help people use your product.
James: Ken, here's what I wanted to put to you. We often start a client project by talking to their customers, talking to their salespeople, capturing that in voice. Analyzing that voice data is great for insights that we can use in business management, product development, and marketing strategies. So how do you think AI-driven voice is gonna change that for us?
Ken: It's really laborious to come up with the insights and to conduct the interviews. I think you almost have to do it in person, 'cause there's spontaneity; there are things that happen you can't plan for in the interview. But say you've interviewed 15 people for an hour each, and then, as a marketing advisor, you're supposed to draw the conclusions and see all the consistencies or inconsistencies in the responses.
I think it would be a pretty obvious use case to run the transcripts of those interviews against a set of prompts so that you can build your insights more effectively and more efficiently. And that's only for the work we do, where it's like, you know, 15, maybe 20 interviews. What if you're doing a hundred or 200 or 500 interviews?
I think where it gets interesting is when it goes beyond the ability of a human to keep all of the stuff straight, right? What did interview 327 say? I have no idea. It almost becomes unmanageable in your own mind, and you become almost roboticized in administering interviews like that.
And in fact, it might take a team of people to administer them. So I'd say the larger the data problem, the more this gets interesting. The larger the data asset, the more this gets interesting. So imagine a thousand interviews that could be summarized, and the implications of the consistent remarks brought to you in like 12 minutes or...
Rob: And we're capable of holding seven things in our short-term memory at the same time. So, like, it's just hard.
Ken: You have to think big. Like, a place to land for me is: you have to think big even if you're thinking small, right? So even if the problem is small, where are the data large? Where do the data sets get big? That's where it gets really interesting. You know, if it's three interviews that are 12 minutes long, well, I could do that; I probably don't even need to write it down. But if it's 3,000 interviews that are 12 minutes long, well, that starts to get interesting. So think big.
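Ken's point about mining hundreds of transcripts can be illustrated with a toy sketch. In practice you would run transcripts through an LLM with a set of prompts, as he describes; the naive keyword tally below (themes and transcripts are invented) just shows the shape of the problem: counting how many interviews mention each theme so consistent remarks surface automatically at a scale no human could track.

```python
from collections import Counter

# Hypothetical themes a marketing team wants to track across interviews.
THEMES = ["pricing", "onboarding", "support"]

def tally_themes(transcripts, themes):
    """Count how many transcripts mention each theme at least once."""
    counts = Counter()
    for text in transcripts:
        lowered = text.lower()
        for theme in themes:
            if theme in lowered:
                counts[theme] += 1
    return counts

# Three invented interview snippets; imagine 3,000 of them.
interviews = [
    "Pricing was confusing but support was great.",
    "Onboarding took weeks; pricing tiers were unclear.",
    "Support responded fast.",
]
print(tally_themes(interviews, THEMES))
```

At 15 interviews this is manageable by hand; at 3,000 only an automated pass, whether a simple tally like this or LLM summarization, keeps interview 327 straight.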
James: So, one of our favorite parts of this podcast is our predictions, and, Rob, you and I gave our predictions earlier on, but I do wanna loop back to them. Rob, just in case you've got anything more to add: you did have concerns about the legal implications of AI and voice. Anything further on that?
Rob: So I briefly suggested before a reality that we are about to come across, which is that we're going to have new actors in the workplace. The voice agent that says, "I politely disagree, Ken" (sorry to pick on you, Ken) is an actor that will start to arrive. And so, applying a little bit of first-principles knowledge to that, we are soon going to feel personally impacted by an actor in the room at work.
And I think, with job disruptions and layoffs coming, we will start to see serious employment litigation against companies based on feelings and perceptions of discrimination by AI. In particular, think about how many employment suits are settled: make lots of loud noise, put a big number in front of somebody, and settle. If that is a large part of how the employment litigation market works, then consider that it's impossible to say whether an AI was biased. It's impossible to tell what it thought before it made a suggestion. And if it says, "I'm sorry, but the quality of this work is not good enough," and somebody feels that they have been discriminated against as a member of a protected class, it is almost impossible for an employer to push back.
And so I think we are going to see a wave of attempts to settle employment claims as it relates to AI, to human interaction. And I think that's going to start this year.
James: Ken, now your prediction.
Ken: Well, I just think that the knowledge base is the thing. You know, I've sort of woven that into some of my conversation in this episode, but the knowledge base typically has been the orphan application. It's never worked; it's a promise unfulfilled. So I think this is the year of the knowledge base, and someone's gonna crack that code and create transformational productivity gains based on harvesting the insight of subject matter experts and making it routinely available on a platform like a Monday or some other broadly used HR or employee management app.
James: Yeah. I gave my reaction to your prediction, Rob, about the legal implications, and I'll just reiterate what I was saying: I think this forever memory of voice creates not just a legal but a societal challenge. We've not yet caught up with what the internet, and all of that information existing forever, means. Social media has amplified that further. And so voice, which is arguably the most ephemeral form of data, now becoming permanent? I think this is a huge area of adaptation for us. I have no idea where that's gonna go, but it's definitely a question for me. I'm gonna give a bit of a positive, or semi-positive, prediction.
So I think the rising quality of voice interactions means it's gonna become ubiquitous in many interactions with big companies, customer service especially, and that will be very much more efficient. But I think it could also lead to a bit of a two-tier experience, where those who are able to pay more get access to real humans. What might be nice out of that is that those humans become better trained and have more time to deal with customers
Rob: properly.
James: Yeah. You know, it's interesting: whenever I see AI being applied in the enterprise, there's real worry about layoffs. And my experience is that that's not happened. People have actually taken that time and used it to train their employees. There are questions about whether it means reduced hiring in the future, but I've not seen it lead to layoffs. It has led to, you know, better employee training and better experiences for customers. So I can see voice doing that for us in the future. So we've got, you know, people you can pay to talk to, who can offer a great experience, and AI voice for the rest of us.
Ken: Yeah, and by the way, that's also a revenue opportunity; I've seen instances of software companies pricing human support as a premium. So I think there's opportunity for everybody in that two-tier model. You know, if you can't afford the human support, you probably would get very capable AI-led support. I think this is a great place to land our episode. This is a two-dog-walk episode again, guys. So I think we'll land it here. James Ollerenshaw, thanks so much for lending your expertise as always. And Rob, if people wanna learn more about what you're doing in your venture studio, how can they get ahold of you?
Rob: Best thing to do is go to www.standuphiro.com
Ken: My demand generation agency for SaaS is Austin Lawrence. We're at austinlawrence.com; that's Lawrence with a W. If you wanna reach me on LinkedIn, it's linkedin.com/in/kenlempit. And if you haven't subscribed to the SaaS Backwards podcast yet, and we haven't scared you off with the length of our AI episodes, please subscribe wherever podcasts are distributed, and of course, a rating of the podcast would be so helpful. Gentlemen, thanks so much for a great episode of SaaS Backwards.
Rob: Thanks again, James.
James: Thank you.