I started this blog about 12 years ago, just after publishing my first book, The Voice in the Machine (I published a second book, AI Assistants, in 2021). The initial idea of this blog was to have a venue for my musings and ruminations on conversational AI. However, I haven’t posted anything in a while, having been busy making other plans. Indeed, a lot has happened in AI in the past 12 years, and while I have continued to work actively in the field, at places like Jibo, Google, and now Uniphore, I feel a renewed desire to share my general thoughts on technology.
Let me start with a quick and highly simplified recap of the past 12 years for those who have not been following the recent developments in AI.
Around 2012, Geoffrey Hinton showed the world that deep learning was indeed practical, and that it could solve problems like speech recognition better than any other technique. Of course deep learning was not new, and Hinton was not the only person working on it: notably Yann LeCun and Yoshua Bengio (who shared the Turing Award with Hinton in 2018), and many others, had created the foundation of modern machine intelligence technology. AI, or rather Machine Learning (ML), assumed an increasingly important role in industry and in the world. Deep learning became the reference technology for many problems, like speech and image recognition, producing results that overshadowed those attainable with the previous, still reputable, ML technologies. The invention of the attention mechanism gave rise to today’s transformers, the foundation of the Large Language Models (LLMs) that everyone now talks about, even outside the circle of scientists. That wave culminated in the unveiling of ChatGPT and the beginning of a process of popularization and democratization of AI. The rest is history.
What I want to briefly discuss here is a more philosophical point: how the way we build intelligent machines has changed during the past 10-15 years. I will start by citing one of the most important cognitive philosophers of our era, Daniel Dennett, who unfortunately passed away recently, on April 19, 2024.
Dennett points out that Darwin showed how evolution by natural selection can build sophisticated things (like viruses and humans) without any knowledge of how those things actually work. Robert Beverley MacKenzie, one of Darwin’s nineteenth-century critics, wrote of Darwin’s theory of evolution: “absolute ignorance is the artificer […] in order to make a perfect and beautiful machine it is not requisite to know how to make it.” MacKenzie called that “a strange inversion of reasoning”. While MacKenzie used that as an argument against Darwin’s theory, Dennett recast it in a positive way, and of course in favor of the evolutionary theory.
Dennett also goes beyond Darwin by placing Alan Turing’s machine in the same category of strange inversions of reasoning, reiterating that, in order to be a beautiful computing machine, it is not requisite to know what arithmetic is. In fact the Turing Machine, a theoretical contraption with no more and no less power than any computer today, does not know arithmetic: it only knows how to read and write symbols on an infinite tape and move its read/write head to the left or right. Everything else, even today’s sophisticated AI, can be built, at least theoretically, on top of that elementary capability.
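To make Dennett’s point tangible, here is a minimal sketch, in Python, of a Turing machine simulator: the machine only reads and writes symbols on a tape and moves its head, yet with the right transition table it can in principle compute anything a modern computer can. The transition table below is a toy example of my own (a unary incrementer), not anything taken from Turing or Dennett.

```python
# A minimal Turing machine simulator: the machine only reads/writes symbols
# on an (effectively) infinite tape and moves its head left or right.
# The transition table below is a hypothetical unary incrementer (it appends
# one more '1' to a run of '1's), chosen only to illustrate "competence
# without comprehension": nothing here knows what a number is.

def run_turing_machine(tape, transitions, state="start", blank="_", max_steps=1000):
    tape = dict(enumerate(tape))   # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, blank)
        state, new_symbol, move = transitions[(state, symbol)]
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
    # render the visited portion of the tape
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, blank) for i in range(lo, hi + 1))

# (state, symbol) -> (next_state, symbol_to_write, head_move)
increment = {
    ("start", "1"): ("start", "1", "R"),  # skip over the existing 1s
    ("start", "_"): ("halt",  "1", "R"),  # write one more 1 and stop
}

print(run_turing_machine("111", increment))  # -> "1111"
```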
Simple and ignorant mechanisms such as evolution by natural selection and the Turing machine (even though Dennett’s statement may be arguable) are apparently enough to build the whole physical world of living things and the whole digital world we are experiencing today. In other words, these mechanisms demonstrate competence without comprehension (even though this concept can be argued too: do the cells of our brain have any comprehension at all?).
However, my point is (and surely I am not the first one to observe this) that the current evolution of deep neural network technology, for instance LLMs, has brought us machines that can appear extremely competent, for example by conversing with humans in a seemingly human way, without actually having any built-in comprehension of what they are talking about. Inside those machines there are no clearly defined modules for lexical, morphological, syntactic, and semantic analysis, but rather a huge number of artificial neurons (the elementary units of neural networks) arranged in multiple layers, with some of them serving different purposes, like encoders, decoders, and attention, yet without any specific comprehension of the things they do.
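As an aside, for readers curious about what an “attention” layer actually computes, here is a minimal sketch of scaled dot-product attention in NumPy. The array sizes and the random “token” vectors are made up for illustration; real models stack many such layers with learned projection weights, and nothing in this code comprehends language.

```python
# A minimal sketch of scaled dot-product attention: each query attends to all
# keys, and the output is a weighted blend of the values. Sizes and the random
# "token" vectors are illustrative assumptions, not a real model.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ V                                 # blend values by those weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))                # 5 "token" vectors of dimension 8
print(attention(tokens, tokens, tokens).shape)  # (5, 8): one mixed vector per token
```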
By comparison, building an intelligent machine, like a speech or natural language processor, only a few years ago required the design of specific modules that coped, in one way or another, with the phenomenology of those signals. Examples of those modules were grammars, parsers, feature extractors, phonetic translators, vocal tract-length estimators, and yes, language models (indeed we had language models before ChatGPT) to predict what the next word would be, as sketched below. Those modules were carefully designed and crafted by ML scientists expert in the particular phenomenon being modeled, such as speech, natural language, or images.
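To give a concrete flavor of what a pre-ChatGPT language model looked like, here is a minimal sketch of a count-based bigram model that predicts the next word. The tiny corpus and function names are invented for illustration; production systems used far larger corpora, longer histories, and smoothing techniques.

```python
# A minimal sketch of a classic count-based bigram language model, the kind of
# hand-designed component pipeline-era systems used to predict the next word.
# The toy corpus below is made up for illustration.
from collections import Counter, defaultdict

corpus = [
    "call me tomorrow morning",
    "call me back later",
    "call the office tomorrow",
]

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev_word, next_word in zip(words, words[1:]):
        bigram_counts[prev_word][next_word] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigram_counts[word]
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

print(predict_next("call"))  # -> ('me', 0.666...)
```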
Today, apparently, none of that is needed: what is needed instead is the wrangling of massive amounts of data, and while the resulting machine may eventually acquire some internal representation of that data, it does not have any built-in comprehension of the underlying phenomenon. I like to point out that, in little more than 10 years, we went from an era where AI was intelligently designed to today’s era, where there is no need for intelligent designers.
Of course, the perceived end of the intelligent design of AI machines did not happen all of a sudden. We saw it coming. The first AI era, which started at the end of the 1950s, was characterized by strong intelligent design. Rule-based systems were painstakingly crafted, rule by rule, by the engineers who built them. At that time people talked of expert systems, where the rules, acquired from experts in a specific area (e.g. linguistics, medicine, etc.), were represented in a digital form to be used by a logic inference engine to formulate hypotheses based on observable facts. As opposed to that, statistical machine learning, which roughly started in the 1970s, advocated the use of data to estimate statistical models of the phenomena in question. For a while the two camps coexisted and argued against each other: the rationalists, who continued to model phenomena with rules, and the empiricists, who instead used data to create models. Eventually the rationalists gave up, mostly because they could not match the results of their competitors on any given task. Yet those statistical models, generally represented in a parametric form, still required a deep understanding of the underlying phenomena to inform their design. Moreover, building the models required a number of simplifying assumptions, often to make them mathematically tractable. The models, in a way, had to have some inbuilt level of comprehension of the phenomenon. The quality of the results was limited by the assumptions made and by how well the resulting model represented the actual reality of the phenomenon.
A third camp appeared in the 1980s, that of the so-called connectionists. They claimed that there was no need to architect a specific model of the phenomenon in question; rather, one could use a large network of highly connected, simple computational elements characterized by a number of free parameters, called weights. These models, which came to be called neural networks, relied on a single, general, simple learning mechanism, the backpropagation algorithm, co-invented by a number of people, notably David Rumelhart and Yann LeCun, and based on a gradient descent strategy that modifies the weights to reduce the inference error. Neural networks and backpropagation know nothing about the problem to be solved; as long as labeled data is provided, one can build any number of beautiful mechanisms, like predictors and classifiers, without knowing anything about the underlying phenomenon: a perfect example of competence without comprehension.
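Here is a minimal sketch of that idea: a tiny two-layer network, trained with backpropagation and gradient descent, learns the XOR function from labeled examples alone. The network size, learning rate, and iteration count are arbitrary choices for illustration; nothing in the code encodes what XOR means.

```python
# A minimal sketch of backpropagation: a tiny two-layer network learns XOR
# purely by nudging its weights along the error gradient. The sizes, learning
# rate, and iteration count are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR labels

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: push the output error back through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient descent: nudge every weight to reduce the error
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # typically close to [[0], [1], [1], [0]]
```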
Neural networks could not demonstrate their superiority at the time they were introduced, in the 1980s, and they were forgotten for a while, while statistical ML evolved towards more and more sophisticated models, such as Hidden Markov Models (HMMs), Support Vector Machines (SVMs), Conditional Random Fields (CRFs), etc. That was the state of affairs until 2012. The rest is history. While statistical ML is still useful for a number of problems, the whole world of AI is embracing sophisticated neural networks, most of them based on the Transformer model, to solve a large variety of problems: competence without comprehension on steroids.
So, is this the end of it? Do we not need intelligent designers anymore?
I don’t think so. There is still a lot of work to do for intelligent designers, aka AI scientists. Generative AI is at the peak of its hype. Thanks to the process of democratization that started with the wild popularity of ChatGPT and its derivatives, many more people than a few years ago have access to AI for the first time, and a good part of them think that Gen AI is magic and can solve all problems. Many other people, especially those who have seriously contributed to the science of AI and studied it in depth, fortunately have a good sense of the limitations and risks (and I am not talking about AI doomsday) of the current technology.
Of course, in reality, Gen AI is not magic, and its problems are quickly revealing themselves; yet it is proving to be an amazing tool, never seen before, that will continue to bring transformational applications to life. However, I strongly feel that we still need intelligent designers in AI, though of a different type than the designers who built the intelligent machines of the past decades. We need intelligent designers who understand how to harness the power of humongous amounts of data and create more advanced models, possibly beyond today’s LLMs. I believe, and hope, that serious science will continue to evolve towards better and smarter models than today’s LLMs, and that those models will help humankind solve bigger and bigger problems and continue to improve the quality of our lives.