
Show HN: AI prompt-to-storyboard videos w/ GPT, Coqui voices, StabilityAI images https://ift.tt/LYxotTU

I had two weeks off from work and wanted a pet project before heading back. With GPT and generative AI in the news, I decided to chain multiple AI products together to build something really cool. I set my end goal to be prompt-to-storyboard: fun videos generated purely via generative AI. Some prompt-to-video products already exist, but I wanted to tell stories with audio as well. The end product takes an initial prompt and produces a series of images and audio files, which I then combine (with subtitles) into the final video. Rough code sketches of each step are appended at the end of this post. To showcase videos, there is a basic upvote/downvote leaderboard.

Text | OpenAI https://openai.com/

Text is generated in a few high-level steps that I ask GPT to work through. These are all based on the initial user prompt and are therefore (ideally) indirectly controlled by the user:

- Create a concept for a movie scene based on the prompt, including the theme and setting
- Define each character in the scene
- Define how each character looks
- Define how each character sounds
- Define 'frames' of the storyboard

All of this textual information is defined in a JSON object I describe to GPT. I then take GPT's output and build the storyboard with the tools below.

Voices | Coqui https://coqui.ai/

From the GPT output, I need three major pieces of information to build voices in a way that I find satisfying:

- A description of the voice
- A description of the performance
- The text of the actual dialog spoken

Coqui has a product called 'prompt-to-voice', where you describe how a character should sound and a custom voice is made for that character - this is how GPT defines the characters to use in the storyboard, so every voice is unique per storyboard. GPT will decide that a certain character is an "older man with a raspy voice", and I'll ask Coqui for that type of voice. In addition, to describe the performance, GPT outputs a basic emotion summarizing each line of dialog (happy, sad, angry, etc.), which is also sent to Coqui for each audio clip generated.

Images | Stability AI https://stability.ai/

I originally set up the storyboard generator to use DALL-E, since I was already integrating with OpenAI for GPT, but I found the cost prohibitive. As such, the images generated for the storyboards come from Stability AI's Stable Diffusion (stable-diffusion-512-v2-1). To generate each frame, I combine the frame description that GPT provides with the theme and setting GPT output for the whole storyboard. Since GPT controls the data sent to Stable Diffusion - the frame description as well as the theme and setting - a theme dictated by your prompt should hopefully carry through into your storyboard.

Both the storyboard and the 'prompt enhanced' image generation in the 'Create Content' tab pre-feed a GPT request with a summary of Stability AI's prompt guide. GPT will try to pick keyword weights to improve the image, and much like the setting and theme, those keywords should be influenced by the initial prompt provided to the product.

Conclusion: Have fun and make my two weeks of work seem worth it! Voting on storyboards and creating storyboards both require a simple Google login to get access. https://meyer.id

April 21, 2023 at 06:05AM
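
A minimal sketch of the GPT step described above. This is not the project's actual code: the system prompt, JSON field names, and model choice are illustrative assumptions. It uses the pre-1.0 openai Python package's ChatCompletion API as it existed in early 2023.

```python
import json
import openai  # pip install openai (pre-1.0 API)

openai.api_key = "sk-..."  # your OpenAI key

# Hypothetical schema: field names are illustrative, not the author's actual format.
SYSTEM_PROMPT = """You are a storyboard writer. Given a prompt, return ONLY a JSON object:
{
  "concept": {"theme": str, "setting": str},
  "characters": [{"name": str, "appearance": str, "voice": str}],
  "frames": [{"description": str, "speaker": str, "dialog": str, "emotion": str}]
}"""

def generate_storyboard(user_prompt: str) -> dict:
    """Ask GPT to turn the user's prompt into a structured storyboard."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    # The system prompt asks for JSON only, so parse the reply directly.
    return json.loads(response.choices[0].message.content)

storyboard = generate_storyboard("A noir detective interrogates a sentient toaster")
print(storyboard["concept"]["theme"])
```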
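
The voice step then walks the frames and requests one clip per line of dialog. The endpoint URL and payload fields below are placeholders, not Coqui's real API; the sketch only shows the shape of the data the post describes: a voice description (prompt-to-voice), a per-line emotion, and the dialog text. It continues from the `storyboard` dict in the previous sketch.

```python
import requests

COQUI_API = "https://example.invalid/coqui/tts"  # placeholder, not Coqui's real endpoint
API_KEY = "..."

def synthesize_line(voice_description: str, emotion: str, dialog: str, out_path: str) -> None:
    """Request one audio clip: a prompt-described voice reading one line with an emotion hint."""
    resp = requests.post(
        COQUI_API,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "voice_prompt": voice_description,  # e.g. "older man with a raspy voice"
            "emotion": emotion,                 # e.g. "angry"
            "text": dialog,
        },
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

# One clip per storyboard frame, using the voice GPT described for the speaking character.
for i, frame in enumerate(storyboard["frames"]):
    speaker = next(c for c in storyboard["characters"] if c["name"] == frame["speaker"])
    synthesize_line(speaker["voice"], frame["emotion"], frame["dialog"], f"frame_{i:02d}.wav")
```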
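
For the image step, each frame's prompt is the GPT frame description plus the storyboard-wide theme and setting, with optional keyword weights. A rough sketch, assuming Stability AI's v1 REST text-to-image endpoint (check their docs for the current request shape); again it continues from the `storyboard` dict above.

```python
import base64
import requests

STABILITY_URL = "https://api.stability.ai/v1/generation/stable-diffusion-512-v2-1/text-to-image"
STABILITY_KEY = "sk-..."

def generate_frame_image(frame_description: str, theme: str, setting: str, out_path: str) -> None:
    """Combine the per-frame description with the storyboard-wide theme and setting."""
    prompt = f"{frame_description}, {setting}, {theme}"
    resp = requests.post(
        STABILITY_URL,
        headers={"Authorization": f"Bearer {STABILITY_KEY}", "Accept": "application/json"},
        json={
            "text_prompts": [
                {"text": prompt, "weight": 1.0},
                # GPT-suggested keywords could be appended here with their own weights,
                # e.g. {"text": "film noir lighting", "weight": 0.5}
            ],
            "width": 512,
            "height": 512,
        },
        timeout=120,
    )
    resp.raise_for_status()
    image_b64 = resp.json()["artifacts"][0]["base64"]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64))

concept = storyboard["concept"]
for i, frame in enumerate(storyboard["frames"]):
    generate_frame_image(frame["description"], concept["theme"], concept["setting"], f"frame_{i:02d}.png")
```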
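
Finally, the images and audio clips get stitched into the video. The post doesn't say which tool does this, so here is one way it could be done with moviepy, pairing each frame image with its audio clip (subtitle burning is omitted, since moviepy's TextClip needs ImageMagick installed).

```python
from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips

def assemble_video(frame_count: int, out_path: str = "storyboard.mp4") -> None:
    """Show each frame image for the duration of its audio clip, then concatenate them."""
    clips = []
    for i in range(frame_count):
        audio = AudioFileClip(f"frame_{i:02d}.wav")
        image = ImageClip(f"frame_{i:02d}.png").set_duration(audio.duration).set_audio(audio)
        clips.append(image)
    video = concatenate_videoclips(clips)
    video.write_videofile(out_path, fps=24)

assemble_video(len(storyboard["frames"]))
```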
