How I Used A.I. to Create My First Book And Generate a Podcast From It

If you had told me two years ago that I would publish a book of short stories on Amazon, I would never have believed you.

That patience…

The struggle…

The war of art is real, as it’s often hard for artists to wrap up their work and publish it as completed. As Leonardo (da Vinci, not the ninja turtle) said: ‘Art is never finished, only abandoned.’

In my latest side project, a lot of the mundane heavy lifting was done by A.I. tech. Because I could focus on the fun part (generating ideas), I was near the finish line before I knew it. It felt like a cheat code.

Of course those last steps still took some weeks to get done, but no one has to know about that.

While the end result might not turn out to be my Mona Lisa, this experiment was an eye-opener in terms of how creators and artists can interact with A.I. to augment their ideas and get their work out to people with less frustration and struggle.

Here’s how I started…

Writing the content with GPT-3

Encounters at the Ramen Shop started with me fooling around with GPT-3, a pretty impressive language model from OpenAI, similar to the model (LaMDA) that made a Google engineer believe it had become sentient.

With it, you enter a text prompt, say the first sentence of a story, press a button, and a piece of generated text rolls out. You can tweak parameters like ‘temperature’, which controls how random (or wild) the completion will be, and the frequency and presence penalties, which discourage the model from repeating itself (so it can pass the Turing test).
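To make the ‘temperature’ knob a bit more concrete, here is a toy sketch of the general idea: the model's raw scores for candidate next words are divided by the temperature before being turned into probabilities. The candidate words and scores below are made up for illustration, and this is not OpenAI's actual code, just the standard technique in miniature.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_word(words, logits, temperature, rng):
    """Pick one word at random, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical candidates for the next word after "The chef served a bowl of"
words = ["ramen", "noodles", "ninjas"]
logits = [3.0, 2.0, 0.5]  # made-up scores, highest = most expected

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

print(sample_word(words, logits, 2.0, random.Random()))
```

At a low temperature nearly all the probability lands on the safest word; at a high temperature the unlikely options (ninjas) get a real chance of being picked.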

After the A.I. generates a piece of text you can simply submit the generated text again to add more to it (unless the A.I. determines the text is already completed), or edit or add to the text yourself and submit it as a new prompt.
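If you did want to script that back-and-forth instead of clicking a button, the loop could look roughly like this. It is a minimal sketch: `complete` is a hypothetical stand-in for whatever call sends the text to the model and returns the new part, and `scripted_model` is a fake I made up so the example runs without an API key.

```python
from typing import Callable, Iterable


def co_write(opening: str, complete: Callable[[str], str], max_rounds: int = 5) -> str:
    """Resubmit the growing story until the model adds nothing or max_rounds is hit."""
    story = opening
    for _ in range(max_rounds):
        addition = complete(story)
        if not addition.strip():
            break  # an empty completion means the model considers the text finished
        story += addition
    return story


def scripted_model(chunks: Iterable[str]) -> Callable[[str], str]:
    """Fake model (assumption for this demo): replays canned continuations in order."""
    remaining = iter(chunks)

    def complete(prompt: str) -> str:
        return next(remaining, "")

    return complete


fake = scripted_model([" A ninja walked in.", " The kitchen exploded.", ""])
print(co_write("I slurped my ramen.", fake))
```

To put yourself in the loop, edit `story` by hand between rounds before it goes back in as the next prompt.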

This way, you go back and forth with something that feels like a fellow author. OpenAI has a handy Playground text editor that does it all, so there’s no coding needed to get going.

a glance at the text generation of GPT-3

I picked an opening scene for my story (someone enjoying food in a ramen restaurant, don’t ask why) and then set out to see how many versions I could make with the A.I. as my co-author.

For the first few versions I let the A.I. have its way, and I ended up with typical wholesome Hollywood stories about people finding their future soulmate in the restaurant. Every generic prompt I gave was spun into a positive conclusion very quickly, as if the A.I. had been trained on some Disney movie blueprints.

That wasn’t any fun.

I decided to throw some curveballs at it by making the protagonist behave offensively, or by throwing some absurd elements into the mix (explosions in the kitchen, ninjas, fights with other customers) to see how the A.I. would handle it. This instantly became a lot more fun.

It was pretty impressive how GPT-3 could still spin absurd prompts into some kind of resolution that kept to the main storyline, unfazed by whatever had suddenly happened.

The best moments (felt like one in ten) came when the A.I. generated its own good ideas in between, like it recognized the direction you were going in and was happy to have things further escalate in the story.

Plenty of stories read like a teenager raised on eighties action movies wrote them. As the main setting is a ramen restaurant, I couldn’t count the times the yakuza got involved in the plot somehow.

Sometimes it felt like trying to get a 5-star wanted level in GTA, with myself as the player and GPT-3 playing the role of law enforcement.

Of course, OpenAI’s terms and conditions set some boundaries on what you can and can’t use the API for. Overly violent or sexual content is not allowed, for example.

Before I knew it I had a Google Doc full of crazy, short stories.

I created encountersattheramenshop.com and posted about one of those per day.

Creating story and cover art with Dall-E and Midjourney

Using the text-to-image A.I. Dall-E, creating art to go along with each episode was a breeze. At first I used simple scene descriptions from each episode, resulting in prompts like:

Cute waitress from the local ramen shop

A well-dressed man is lying on the floor next to a gun in a ramen restaurant. Hundreds of rats are gathered around the man. Comic book style. Grim and surreal

Eventually I needed a single art style, so after a few iterations I appended ‘One Punch Man art style’ to every prompt and came out with some great-looking results on some of the stories.

a kitsune sitting on top of a kitchen countertop. One Punch Man art style

a hacker with sunglasses is typing on his laptop in a ramen restaurant. Next to his laptop is a plate with gyoza. One Punch Man art style

a yakuza boss making a toast with a glass of sake. One Punch Man art style
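Tacking the same style onto every prompt is easy to automate once you have more than a handful of episodes. A tiny sketch (the function name `with_style` is my own invention, not part of any Dall-E tooling):

```python
STYLE = "One Punch Man art style"


def with_style(scene: str, style: str = STYLE) -> str:
    """Append a fixed art-style suffix to a scene description."""
    scene = scene.strip().rstrip(".")  # avoid a doubled period before the suffix
    return f"{scene}. {style}"


scenes = [
    "a kitsune sitting on top of a kitchen countertop",
    "a yakuza boss making a toast with a glass of sake",
]
for scene in scenes:
    print(with_style(scene))
```

Keeping the style in one place also makes it painless to try a different look across the whole series.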

Anything slightly hinting towards violence is a no-go on Dall-E. On those occasions, Midjourney came to the rescue:

A logo for the website needed to be something different. Something simple, cute and colorful, to offset from the darker tone of the written content.

After running the following prompt a couple of times, I finally made a pick out of the wealth of cute ramen bowls Dall-E generated.

It turns out, taking a prompt like this and using Photoshop or Illustrator for post-processing the results isn’t really a bad way to go about logo design. I understand the sentiment about logo designers being replaced or artists having their style copied by prompters.

Be aware that the A.I. often includes gibberish text as soon as you mention ‘logo’ in the prompt. It takes some trial and error to tweak the results into something to your liking.

After collecting my material into a book format with Kindle Create, I needed a book cover as well. I’m very fond of cyberpunk and Blade Runner’s grim yet colorful aesthetic (I’m sure ‘cyberpunk’ is one of the most overused prompts in A.I. image generation), so this was the style I wanted for the cover art.

Now, a book cover doesn’t really fit in the default square dimensions of these A.I. image generators (1024 x 1024 in Dall-E’s case).

Luckily, Dall-E has something called Outpainting. This allows you to work on a bigger canvas with your square image as a source and little by little, expand your image from the sides outwards, into the required image resolution.
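Before outpainting, it helps to know how much canvas you actually need to add. The arithmetic below is a rough sketch: it finds the smallest canvas with the cover's aspect ratio that still contains the square image, and how many pixels to extend on each side. The 1024-pixel square and the 1600 x 2560 cover size are assumptions for the example, so check the dimensions your publishing platform asks for.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division that rounds up."""
    return -(-a // b)


def cover_canvas(square_side: int, target_width: int, target_height: int):
    """Smallest canvas with the target aspect ratio that still contains the square."""
    if target_width >= target_height:  # landscape or square target: widen the canvas
        return ceil_div(square_side * target_width, target_height), square_side
    # portrait target: keep the width, grow the height
    return square_side, ceil_div(square_side * target_height, target_width)


def padding(square_side: int, canvas_width: int, canvas_height: int) -> dict:
    """Pixels to outpaint on each side when the square sits in the middle."""
    extra_w = canvas_width - square_side
    extra_h = canvas_height - square_side
    return {
        "left": extra_w // 2,
        "right": extra_w - extra_w // 2,
        "top": extra_h // 2,
        "bottom": extra_h - extra_h // 2,
    }


# Assumed numbers: a 1024 px square source and a 1600 x 2560 portrait cover
width, height = cover_canvas(1024, 1600, 2560)
print(width, height, padding(1024, width, height))
```

Once the canvas is filled in, the whole image can be scaled up to the final cover resolution in an image editor.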

Using this and a bit of Pixelmator for cleaning up the results and slapping a book title on top of it, I quickly ended up with some candidates for the book cover.

After gathering some votes, I stuck with option #3 (prompt: “Encounters at the Ramen Shop, sci-fi book cover like blade runner, anime art style”).

I Outpainted the back cover as well (needed for the paperback version) and the final result turned out like this:

with a more disturbing version that didn’t make the cut:

The book is out on Amazon now!

Generating a Podcast with My Own Custom Text-to-speech Voice using Descript

For the bite-sized episodes of Encounters at the Ramen Shop, I thought a narrated audio version of each would work really well as a podcast, to supplement the stories on the blog.

But, being a lazy artist, I didn’t feel like painstakingly narrating each episode myself, in proper recording conditions and making sure that there was consistent audio quality throughout the entire series.

Luckily, text-to-speech technology has come a long way as well. If you don’t know what that is, just think about the audio option inside Google Translate, where you can have a native voice speak back the text you’ve just entered.

I didn’t want those stock, robot-sounding text-to-speech models though…

This is where Descript came in.

Descript is a tool that allows video producers to transcribe their video content and then, by editing the transcript in its text editor, magically cut and edit the associated video to match.

Descript also has a technology called Overdub. This feature allows users to create a text-to-speech model from their own voice, making it possible to type some text and then hear yourself saying it back, with the option to export your script to audiograms and raw audio files, without significant effort.

It’s kind of disturbing in a way, using the kind of tech that’s also responsible for those convincing celebrity deepfakes.

For best results, Descript needs some good-quality training data, meaning between 10 and 30 minutes of audio recordings of your voice.

They have a special training script, which is about 30 minutes of some Planet Earth documentary transcription.

So off I went, recording a 30-minute take about polar bears and penguins, speaking into my wardrobe (to dampen the room reverb).
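If you record in several takes, a quick script can tell you whether you have landed in that 10-to-30-minute window before you upload anything. This is a small sketch using Python's built-in `wave` module, so it assumes your takes are uncompressed WAV files; the function names are my own, not anything Descript provides.

```python
import sys
import wave

RECOMMENDED_MIN_S = 10 * 60  # 10 minutes, the lower bound mentioned above
RECOMMENDED_MAX_S = 30 * 60  # 30 minutes, the upper bound mentioned above


def wav_duration_seconds(source) -> float:
    """Length of a WAV recording; `source` is a file path or a binary file object."""
    with wave.open(source, "rb") as recording:
        return recording.getnframes() / recording.getframerate()


def in_recommended_range(total_seconds: float) -> bool:
    """True if the total recording time falls inside the 10-to-30-minute window."""
    return RECOMMENDED_MIN_S <= total_seconds <= RECOMMENDED_MAX_S


if __name__ == "__main__":
    # Pass your WAV files as command-line arguments
    total = sum(wav_duration_seconds(path) for path in sys.argv[1:])
    print(f"{total / 60:.1f} minutes recorded, in range: {in_recommended_range(total)}")
```

Save it under any name you like and run it with your takes as arguments to get the combined length.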

I submitted my training audio to Descript and the following morning I got an e-mail saying that my Overdub voice was ready.

Hearing my voice model say the first few texts I typed was pretty freaky. There were still hints of robot in it, but it sounded unmistakably like me.

Now I could copy and paste each short story into Descript and export it as an audio file.

As I had never published a podcast before, I resorted to Spotify’s Anchor.fm to publish my episodes on Spotify, which went smoothly, was almost instantaneous, and was also free of charge (yay).

Listen to Encounters at the Ramen Shop on Spotify:

What’s next?

I’m sure that within a year (maybe even a couple of months) this post will be full of antiquated tech and surpassed by whatever the latest models are for text, audio and image generation. It’s all advancing at a break-neck pace at the time of writing.

All the little image artifacts and inconsistencies will no doubt be ironed out, and we’re even getting to the stage where you can prompt to generate video and convincing 3D models.

Tools like GPT-3 and Dall-E have definitely earned a permanent spot in the creative work and side projects I’m doing. I’m burning through credits every month as we speak.

I’m still waiting on a similar A.I. that will do the same for music production, where I prompt with a 4-bar loop of music or even a voice note and have the A.I. complete it into a high-fidelity arrangement of my initial idea. If you’ve found something promising, do let me know.

As for Encounters at the Ramen Shop, who knows. When video-based A.I. becomes available to the public, I could add video as another media format. And if there’s any demand for more, there might be a season 2.
