The Next Paradigm
This post is the speaker notes for a talk I gave at an event for AI and Machine Learning for Games in New York City, July 6th.
Hello! My name is John Shaughnessy. I'm a Senior Engineering Manager for Hubs at Mozilla. I live here in New York City and I'm excited to be here.
Hubs is open source software for building virtual worlds. People use it for meetups, art galleries, and escape the room games. Our goal with Hubs is to ensure that the barrier to entry into spatial computing is as low as possible, and that there is always a viable open source option for people to run independently.
I want to show a little experiment we ran to integrate AI tools into Hubs. What we did was center the players within a skybox and seed the beginning of a fantasy story -- either in the world of Lord of the Rings, Harry Potter, or Dungeons and Dragons. From there, each action the player took generated the next step of the story, along with a new skybox for them to occupy.
The multiple-choice answers are a kluge because we didn't take time to allow for arbitrary voice responses to the AI, but it's easy to imagine the story being advanced more directly by the player (or DM, in a D&D setting) speaking about what happens next.
This was a fun demo to build. We think there's incredible potential for things like NPCs. We imagine entering a tavern with many simultaneous conversations going on that you can selectively join and leave during your adventure.
The main challenge we faced using AI in this realtime setting was hiding latency. Generating the skybox often took several seconds longer than generating the next bit of text. Folks who want to productize this kind of system will probably want to build more into the meta-layers of the application: for example, generating in advance the valid map nodes the player can travel to, and separating each node's description and skybox from the next stage of the story. That way, each action can be taken immediately (the player is served the prerendered part first, then the generated story step), which results in a more seamless experience.
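To make the prefetching idea concrete, here is a minimal sketch of one way it could work. It is not how our demo was built. The `generate_skybox` and `generate_story_step` functions, the `Prefetcher` class, and the tiny `world_map` are all hypothetical stand-ins, with `asyncio.sleep` simulating the slow generative calls.

```python
import asyncio


# Hypothetical stand-ins for the slow generative services (assumptions, not real APIs).
async def generate_skybox(node: str) -> str:
    await asyncio.sleep(0.05)  # simulated slow image generation
    return f"skybox:{node}"


async def generate_story_step(node: str, action: str) -> str:
    await asyncio.sleep(0.01)  # simulated (faster) text generation
    return f"story:{action}->{node}"


class Prefetcher:
    """Starts skybox generation for reachable nodes before the player picks one."""

    def __init__(self, world_map: dict[str, list[str]]):
        self.world_map = world_map
        self.tasks: dict[str, asyncio.Task] = {}

    def prefetch_neighbors(self, node: str) -> None:
        # Kick off background generation for every node reachable from `node`.
        for neighbor in self.world_map.get(node, []):
            if neighbor not in self.tasks:
                self.tasks[neighbor] = asyncio.create_task(generate_skybox(neighbor))

    async def travel(self, destination: str, action: str) -> tuple[str, str]:
        # Fall back to on-demand generation if the node was never prefetched.
        if destination not in self.tasks:
            self.tasks[destination] = asyncio.create_task(generate_skybox(destination))
        # Start the story step right away so it overlaps with any remaining skybox work.
        story_task = asyncio.create_task(generate_story_step(destination, action))
        skybox = await self.tasks[destination]  # usually already finished
        self.prefetch_neighbors(destination)    # warm up the next hop
        story = await story_task
        return skybox, story


async def demo() -> tuple[str, str]:
    world_map = {"tavern": ["forest", "cellar"], "forest": ["tavern"], "cellar": ["tavern"]}
    prefetcher = Prefetcher(world_map)
    prefetcher.prefetch_neighbors("tavern")
    await asyncio.sleep(0.1)  # the player reads and decides while skyboxes render
    return await prefetcher.travel("forest", "walk into the forest")
```

Calling `asyncio.run(demo())` walks through one move. The time the player spends reading and choosing is what hides the skybox latency; in a real system the set of reachable nodes would need to stay small enough that generating all of them ahead of time is affordable.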
One of the many problems of virtual worlds is content. Virtual worlds remain relatively niche outside of specific games like Minecraft, Roblox, and Fortnite because of the high cost (in money and time) of producing high-quality 3D content. A lack of content leads to the so-called "dead mall problem", where you're exploring an abandoned, usually low-quality "social space" without any people in it.
This was one of our first experiments with using AI with Hubs. We'd also done some texture generation to make creating new scenes easier. But overall we're at the beginning of our journey to explore what's possible and build internal capabilities.
AI and machine learning are going to have a tremendous impact on the art pipeline for games, both for professional artists and amateurs. The next billion-dollar startup could easily grow out of this space.
But beyond art for games, I think what we'll see is "visual artifacts" becoming better embedded in communication and language.
We've already seen how images embed themselves in our language - as emoji (or animoji), as GIFs and memes, and as visual aids in presentations. Images are an incredibly useful shorthand. They "stand in" for many, many words at once. Chalk talk is a time-honored entertainment and pedagogical tool.
AI and spatial computing allow 3D models to occupy a similar role.
This is just the beginning though, and it's a viewpoint born of our existing paradigm, rather than one that becomes possible in the new world. It's like having a computer and saying, "wow, this thing can store so many books and movies" while ignoring that computation enabled hypertext and interactive games.
I think the next paradigm can be summarized as follows:
- From storytelling and narrative to interactive models and systems.
- From personal computing to communal computing.
- From screens and virtual to environment and augmented.
- From manual and artisanal to AI-generated and amateur.
- From specialization to cross-disciplinary.
Communication is deeply linked with storytelling and narrative. But as our understanding of the world becomes more sophisticated, we move beyond narrative to models and systems.
The personal computers we have today were actually born from timesharing on shared computers. Personal computing was an idea that was rejected for a long time, because personal computers were so much less powerful than mainframes and incumbent players in the industry didn't want to give up their spot.
Though Google's Stadia failed commercially, I think it was at least partially correct about how to handle expensive computation. All of today's most popular AI systems are fundamentally cloud compute. I don't think the cloud is the right answer for all cases, but what's clear is that there's a need for powerful computers that are too expensive for most individual consumers.
Where do we look to imagine what's next?
In my opinion, the most lucid imagined future is what Bret Victor is building with the Dynamicland Foundation: Communal computing for 21st century science, Apr 2023.
While I think he's fundamentally correct about the necessity of bringing computation into our environment (where the people are), I still think there's a ton of good work to be done in this direction with the help of headsets -- especially AR headsets for co-located teams.
AI can help generate the kind of quick "throw-away" scripts that people write for Realtalk sessions. It will save time and effort and lower the barrier to entry.
I think the hardware costs will be prohibitive for some time (whether to build a real-world environment or to produce the necessary headsets and on-site GPU/AI "mainframes").
I'm eager to keep building in this direction and hope to explore it more in the coming years.