Setup and architecture of our Midjourney automation
Introduction
Midjourney is an AI image generation service developed by Midjourney, Inc., an independent research lab. At the time of writing, it is accessible only through a Discord bot: on the official Midjourney Discord server, by direct-messaging the bot, or by inviting the bot to a third-party server. That makes it a challenge to automate, but we were determined to leverage it for our content creation.
When we began our research on automating Midjourney, our first instinct was to create another bot that would interact with the Midjourney bot. This approach quickly failed, because a bot cannot invoke another bot's slash commands, and slash commands are how Midjourney is driven. We needed a different solution, and so our Midjourney automation journey began.
Automation
We started by inspecting the requests that Discord sends when a user invokes a bot command. They turned out to be simple: a plain POST request to Discord’s interactions endpoint (https://discord.com/api/v9/interactions) was enough to simulate such an interaction.
Our bot now has its own slash commands, which trigger these API requests. It sends them as a user would, from one of our accounts that has a Midjourney subscription. We set up dedicated channels for the commands (protocol and bot1) and let it run. Like magic, the Midjourney bot responded with the generated image in our bot channel.
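To make the shape of such a request concrete, here is a minimal sketch of how the JSON body for an imagine interaction might be assembled. The field layout follows Discord's general interaction structure (interaction type 2 for an application command, option type 3 for a string), but the exact fields our bot sends are not listed in this post, so treat every field name and every ID below as an assumption. The function name build_imagine_payload is hypothetical, and the actual HTTP call and authorization are deliberately left out.

```python
import json

# Endpoint named in the text above.
INTERACTIONS_URL = "https://discord.com/api/v9/interactions"


def build_imagine_payload(prompt, *, application_id, guild_id, channel_id,
                          session_id, command_id, command_version):
    """Assemble the JSON body for a slash-command interaction.

    All IDs are placeholders supplied by the caller; the field layout is an
    assumption based on Discord's general interaction structure.
    """
    return {
        "type": 2,  # interaction type 2: application command
        "application_id": application_id,
        "guild_id": guild_id,
        "channel_id": channel_id,
        "session_id": session_id,
        "data": {
            "id": command_id,
            "version": command_version,
            "name": "imagine",
            "type": 1,  # command type 1: chat input (slash) command
            "options": [
                # option type 3: string
                {"type": 3, "name": "prompt", "value": prompt},
            ],
        },
    }


if __name__ == "__main__":
    body = build_imagine_payload(
        "a red fox in the snow",
        application_id="APP_ID", guild_id="GUILD_ID", channel_id="CHANNEL_ID",
        session_id="SESSION_ID", command_id="COMMAND_ID",
        command_version="COMMAND_VERSION",
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a POST to INTERACTIONS_URL. Keeping payload construction separate from sending makes it easy to inspect and test without touching the network.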
Upscaling
The next stage was adding automatic upscaling. Our first attempts at this request failed. After examining the code, we realized that an upscale request has to reference the Midjourney message that contains the generated image and include the appropriate context from it. Once we made these changes, upscaling worked smoothly, and we created a new channel to store all the upscaled images.
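The "appropriate context" can be pictured as follows. This sketch assumes the upscale action is a button on the Midjourney message, so the request is a message component interaction (type 3) that carries the message ID and the button's custom_id. The button label "U1", the helper names, and the exact field set are assumptions for illustration and are not taken from our code.

```python
def find_button_custom_id(message, label):
    """Return the custom_id of the button with the given label, or None.

    `message` is a Discord message as a dict; buttons (component type 2)
    sit inside action rows (component type 1).
    """
    for row in message.get("components", []):
        for component in row.get("components", []):
            if component.get("type") == 2 and component.get("label") == label:
                return component.get("custom_id")
    return None


def build_upscale_payload(*, application_id, guild_id, channel_id,
                          session_id, message_id, custom_id):
    """Assemble the JSON body for pressing a button on an existing message."""
    return {
        "type": 3,  # interaction type 3: message component
        "application_id": application_id,
        "guild_id": guild_id,
        "channel_id": channel_id,
        "session_id": session_id,
        "message_id": message_id,  # the message holding the image grid
        "data": {
            "component_type": 2,  # button
            "custom_id": custom_id,
        },
    }
```

The custom_id is read from the message itself and never constructed by hand, which is why the request only works once it refers to the right message.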
After verifying that all components worked as intended, we set up a Flask application to pass requests to our bot. The bot runs as a subprocess of the Flask application, and a blocking queue relays incoming Flask requests to the bot as commands. One such command is ‘imagine’, which is equivalent to the user-facing imagine command: it places the prompt in a separate ‘imagine’ queue for further processing.
Periodically, the bot checks the queue for entries and looks for an unoccupied channel in which to request an image. Our Midjourney setup runs three concurrent jobs, so we use three request channels.
In addition, each channel listens for an ‘upscale’ command, which automatically upscales the most recently generated image.
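The queue-and-channel logic can be sketched with the standard library alone. This is a simplified illustration, not our production code: ChannelPool, dispatch_pending, the send callback, and the channel names are all hypothetical, and in the real bot a channel would be released once Midjourney delivers the image.

```python
import queue


class ChannelPool:
    """Tracks which request channels currently have a job running."""

    def __init__(self, channel_names):
        self._busy = {name: False for name in channel_names}

    def acquire(self):
        """Mark the first free channel as busy and return it, or None."""
        for name, busy in self._busy.items():
            if not busy:
                self._busy[name] = True
                return name
        return None

    def release(self, name):
        """Mark a channel as free again (e.g. when its image has arrived)."""
        self._busy[name] = False


def dispatch_pending(imagine_queue, pool, send):
    """Send queued prompts while free channels exist; return how many were sent."""
    sent = 0
    while True:
        channel = pool.acquire()
        if channel is None:
            break  # all channels are occupied
        try:
            prompt = imagine_queue.get_nowait()
        except queue.Empty:
            pool.release(channel)  # nothing to do, give the channel back
            break
        send(channel, prompt)
        sent += 1
    return sent


if __name__ == "__main__":
    q = queue.Queue()
    for text in ["a castle", "a forest", "a harbor", "a desert"]:
        q.put(text)
    pool = ChannelPool(["request-1", "request-2", "request-3"])  # hypothetical names
    dispatch_pending(q, pool, lambda ch, p: print(f"{ch}: /imagine {p}"))
```

With three channels and four queued prompts, one call dispatches three prompts and leaves the fourth in the queue until a channel is released.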
Finishing Touches
The final step is to save the generated images to the file system so that they are preserved for later use. Thanks to a built-in feature of the Discord library we used, this step was effortless. The images are stored automatically in a dedicated directory, where our automation systems and administrative user interface can access them.
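As an illustration of this step, the sketch below writes image bytes into a target directory using only the standard library. It does not reproduce the library feature mentioned above. If the library were discord.py (an assumption, since this post does not name it), the attachment's own save method would presumably play this role. The function name save_image and the directory name are hypothetical.

```python
from pathlib import Path


def save_image(data: bytes, filename: str, directory: Path) -> Path:
    """Write image bytes into `directory` and return the resulting path."""
    # Keep only the final path component so a name like "../../x.png"
    # cannot escape the target directory.
    safe_name = Path(filename).name
    if not safe_name:
        raise ValueError("filename must not be empty")
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / safe_name
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    # Hypothetical directory and placeholder bytes, for demonstration only.
    path = save_image(b"placeholder bytes", "example.png", Path("generated_images"))
    print(f"saved to {path}")
```

Returning the final path lets downstream systems record exactly where each image ended up.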
From there, our automation systems and administrative user interface can process the images quickly. This processing generates data for insights and analysis that we use to improve our services.