The 2023 Award for Text-in-Image AI • AI Blog



Aspect Ratio

One notable difference between DALL-E and its competitor, Midjourney, is control over the aspect ratio of the generated images. Midjourney lets users specify the aspect ratio they want; DALL-E lacks this feature. That is a real limitation when a task demands images of specific dimensions: designers and content creators often need images that fit size requirements for web layouts, print media, or social media platforms. Midjourney’s aspect-ratio control makes it the more versatile tool in these scenarios, because the output can be matched to the project’s needs from the start. With DALL-E, users need additional steps such as cropping or resizing the image externally, which can compromise the quality or composition of the AI-generated artwork.
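To make the external cropping step concrete, here is a minimal sketch in Python of how a centered crop to a target aspect ratio could be computed. The function name `center_crop_box` and its parameters are hypothetical, chosen for illustration; the sketch does not call any DALL-E or Midjourney API and only does the arithmetic on pixel dimensions.

```python
def center_crop_box(width, height, ratio_w, ratio_h):
    """Return (left, top, right, bottom) for the largest centered crop
    of a width x height image that has aspect ratio ratio_w:ratio_h.

    Hypothetical helper for illustration; all arguments are positive integers.
    """
    if min(width, height, ratio_w, ratio_h) <= 0:
        raise ValueError("dimensions and ratio parts must be positive")

    # Compare width/height with ratio_w/ratio_h by cross-multiplying,
    # which avoids floating-point rounding.
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target: keep full height, trim the sides.
        new_w = height * ratio_w // ratio_h
        new_h = height
    else:
        # Image is taller than (or equal to) the target: keep full width,
        # trim top and bottom.
        new_w = width
        new_h = width * ratio_h // ratio_w

    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


# A square 1024x1024 image cropped to 16:9 keeps only 576 of its 1024 rows.
print(center_crop_box(1024, 1024, 16, 9))
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it could be passed straight to that method. The example also shows the cost the paragraph above describes: turning a square image into a 16:9 one discards a large part of the original composition.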

Complexity of Text and Positioning

Both DALL-E and Midjourney render common phrases more reliably than niche or specialized ones. A widely recognized phrase like “Happy Birthday” tends to come out correctly on both platforms, likely because such phrases are prevalent in their training datasets. Less common phrases, such as “2023 in AI”, are less reliable: the models may struggle to understand rarely encountered terms and to place them correctly within an appropriate context. Text placement is a particular limitation for Midjourney. DALL-E generally integrates text into the visual narrative more seamlessly, while Midjourney often fails to position it accurately. This difference matters for projects where the spatial arrangement of text is as important as its content, and it underscores the need for continued advances in AI’s understanding of the relationship between textual and visual elements.

In the following examples, DALL-E tends to get the spelling and positioning of the text right more often than Midjourney 6, but both still need significant improvement before their images can be used “in production”. One important caveat is that AI inpainting makes it easy to correct such errors after the fact.
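For readers who want to put a rough number on "getting the spelling right", one option is to compare the requested phrase with the text that actually appears in the image. The sketch below assumes the rendered text has already been transcribed by hand or with an OCR tool of your choice; the helper name `text_similarity` is hypothetical, and the score is a generic string-similarity ratio from Python's standard library, not a metric used in this comparison.

```python
from difflib import SequenceMatcher


def text_similarity(requested, rendered):
    """Return a similarity score in [0, 1] between the phrase that was
    requested in the prompt and the text transcribed from the image.

    Hypothetical helper for illustration. Case and extra whitespace are
    ignored, which is an assumption; drop the normalization if they matter.
    """
    a = " ".join(requested.split()).casefold()
    b = " ".join(rendered.split()).casefold()
    return SequenceMatcher(None, a, b).ratio()


# A correctly rendered phrase scores 1.0; a garbled one scores lower.
print(text_similarity("Happy Birthday", "Happy Birthday"))
print(text_similarity("2023 in AI", "2032 in AL"))
```

A score like this covers spelling only. It says nothing about where the text sits in the image, which still has to be judged by eye.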




asad