Last week, I discussed how generative AI and Large Language Models enable a fundamental shift away from the search bar and website link list that dominated how we have conducted internet search for the last 30 years. I’d recommend you read that piece as a preamble to this one if you haven’t already.
Last week’s focus centered on conducting internet searches with a keyboard and a screen, as we do with phones, tablets and laptops. This week I want to dig deeper into how our interactions with technology are becoming increasingly “human.”
The keyboard and screen will not be the dominant user interfaces in a voice-initiated, AI-responsive future. Thinking about this topic is timely, in large part due to the recent partnership announcement between Apple (Siri) and Microsoft/OpenAI (ChatGPT) – two powerhouse tech companies that are investing heavily in hardware devices, natural language processing (NLP), large language models (LLMs) and AI.
The combination of NLP with LLMs is a major step towards the “Star Trek” paradigm of computer interfaces. Who remembers Jean-Luc Picard commanding, “Computer. Tea, Earl Grey, hot” to vocally prompt the Enterprise to generate his preferred beverage? My personal favorite voice interface was David Hasselhoff’s Michael Knight talking to his autonomous car KITT via an early form of smartwatch. “KITT, run an analysis of the LA County database and identify the person across the street.” These fictional interactions at times involved a screen, but most frequently did not.
At the recent Grap-a-Palooza conference, Igor Jablokov, Founder of Pryon and a well-known AI expert, accurately noted that humans learn to speak years before we learn to read or write. The launch of Siri, Alexa, Google Home and other voice-interface tools enabled a more natural and human way for us to interact with technology. But to date, those tools only work effectively when our spoken queries mimic the style of query we’d type into a search bar.
The user experience has been less than human
To this point, detailed and conversational prompts completely baffle Siri. Why? Conversational queries are not well suited to the legacy “text search bar” tech upon which early versions of voice assistants were built. That changes with LLM integration.
I’m sure that the first launch of Siri+GPT will have hiccups and hallucinations. These edge cases will make the news and stir up a lot of debate. Ignore the noise. Each component technology, whether it is the microphone sensors, the noise-cancellation algorithms or the language/dialect/accent training, will incrementally improve. We are closer than ever to genuine human-like conversational dialogue with AI. The evolution may feel slow in the moment, but it will advance quickly. A few years to high maturity, not decades.
Leading indicators of our social desire for a shift to voice are already apparent
When was the last time you saw someone holding a phone to their ear to speak on it? Most phone conversations I observe now happen over earbuds or speakerphone. Holding a phone to your head isn’t a natural human thing to do. We want to see the people we speak to (speakerphone/video calls) or to talk while using our limbs for other purposes (earbuds/handsfree).
Holding tools in our hands is natural for tasks like chopping wood. But for most of human history, conversation didn’t require us to hold a tool. The phone forced us to adopt the unnatural task of holding something to our ear to communicate. We hold phones out of necessity, not natural behavior. The same argument can be made for keyboards. Voice (natural) will eventually displace keyboards (unnatural tool). Augmented reality (digital data overlaid on our real-world view) will over time displace screens (looking away from the world to see the screen).
The above argument considers our individual behaviors. But what about when viewed through a broader societal lens?
Is a shift toward natural human-to-technology interaction socially congruent?
I’ve observed a significant uptick in people using speakerphones in public and populated places. While that behavior feels a bit annoying to an old guy like me, it is clearly becoming a social norm. And why shouldn’t it? We’ve had human-to-human conversations in crowded locations – on the subway, in a restaurant, at a park – our whole lives. Just because some conversations pass through a phone doesn’t change our instinctive desire to communicate in a human-like manner.
Talking directly to computers (and cars and appliances and kiosks and robots), in similarly crowded places will seem odd at first, but quickly become natural. Old folks like me will eventually recognize that when technology brings us closer together – digitally enabling human interaction, whether over a phone or with an AI – that is behavior that we should embrace.
I’m not yet sold that we will perceive the AI we interact with through conversational dialogue as human itself. I think technology will quickly get better at understanding context and responding accurately, even for complex query/response style interactions. We’ll very efficiently accomplish administrative tasks, research topics, diagnose problems, write code and conduct analysis with AI assistance. But it will simply be a more natural way to interact with a computer.
As robots further proliferate, we’ll see a lot of advances in voice input and physical, AI-generated output as well. Telling your beverage machine to make Picard’s tea is a simple example. We’ll engage robot waiters, autonomous vehicles and smart houses – all through conversational interfaces.
But I think we are still some distance from using computers to get marital advice, debate philosophy or secure emotional support. The interfaces will be more naturally human, but we’ll still recognize the machines as computers. The integration of ChatGPT into Siri is a big step. We’ll see NLP put into all of our devices, our appliances, all the “things” we engage with every day. NLP enables our front-end inputs. LLMs generate human-like responses.
We are at a tipping point for how we interface with computers. Who knew that artificial intelligence would lead to more natural interaction?
The post Tom Snyder: Voice assistants and the future of human-computer interaction first appeared on WRAL TechWire.