Skip to main content

Using AI to Describe an Image - Then DALL-E to Generate a New Image

Creative Project

Pushing AI Vision to the limits

With our ambition to grow our stock photography and videography portfolios even further this year, I thought I’d share some interesting experiments we’ve done with Chat GPT AI and Open AI Vision systems. 

We’ve developed a batch process to use Open AI Vision to describe a stock image or video (yes - video; via inputting a series of thumbnails). It has blown our minds how specific and detailed this AI has been in describing an image. It can detect most objects, time of day, perspective, mood, and more! It doesn’t know the names of places, so location and place name data is additionally input.

That data then can be used in AI at scale to generate powerful Titles, Captions, and Keyword lists that are just as good as a human. Does it mess up sometimes? Yes. But we’ve found the descriptions are only inaccurate about 1-2% of the time, so after a quick spot check it’s good to go!

The experiment we’re sharing today involves taking 10 of our images, having Open AI Vision describe that image, and THEN taking that description into Chat GPT 4 to have it generate an image (which uses DALL-E) solely from that description. How close to the original will it be? Honestly, some are pretty close, and it is a bit eerie! Overall, the generated images are more vibrant and ‘fantastical’ in a way that is not realistic. But looking at the images side-by-side, you can see how well Vision was able to describe the image, given that the generated image has a very similar vibe and composition simply off of the Vision's description of the original.

Before we jump into our 10 images, just a side note on blogs starting this year. Back in November of 2020, we committed to blogging more - twice a month in fact, which we stuck to consistently since then. However, as we push to grow our stock photos and videos, we’re evaluating our time and have decided to start blogging just once per month. Our blogs may get more complex if there are a few topics to discuss, but other than that, all will be the same!

Alright, here are our 10 original images, each followed by the Open AI’s description, then by the AI Generate Image from that description! How do you think it did?

Botanical Gardens Interior

Botanical Gardens Interior

"The image presents a vibrant and lush indoor garden within a greenhouse, characterized by a diverse array of tropical plants and a glass ceiling that allows for ample natural light. The greenhouse seems to be meticulously maintained, with a central pond acting as a focal point. The pond is encircled by a stone border covered in green moss, and features small aquatic plants and a few decorative elements that contribute to its serene atmosphere. A wooden bench is strategically placed to the left of the pond, offering visitors a spot to sit and enjoy the tranquil surroundings. The architecture of the greenhouse is visible, with glass panes held together by a network of metal beams, creating a pattern that contrasts with the organic shapes of the foliage. The photo captures the essence of a peaceful retreat amidst the bustle of daily life, suggesting that this could be a place for relaxation, education, or botanical study. The location is not specified, but it gives off the impression of a well-established botanical garden or conservatory, possibly situated within a city or park that values the cultivation and display of plant life. Potential buyers of this stock image might be attracted to keywords such as greenhouse, tropical plants, indoor garden, botanical, conservatory, relaxation, natural light, greenery, and tranquil. The image conveys a sense of calm and could be used in materials promoting well-being, environment-friendly practices, or educational content on plant species and ecosystems. The eye-level perspective enhances the accessibility and inviting nature of the scene, making it easy for viewers"
DALL-E Output

DALL-E Output

Abstract Mural

Abstract Mural

"This image features a vibrant and colorful geometric mural painted on a wall. The wall is covered with a variety of triangles in shades of green, blue, white, and a splash of orange, creating a lively and dynamic pattern. The bright green base color of the wall provides a fresh and energetic background that enhances the contrasting blue and white triangles. The occasional orange triangle adds a pop of warmth to the otherwise cool color palette. The mural is depicted in a straight-on perspective, suggesting that the photograph was taken at eye-level, and the full frame of the image is filled with the pattern, allowing no surrounding context to determine the specific location. The absence of any discernible landmarks or environmental features suggests that the focus is solely on the art itself, making it an excellent choice for stock photography where the emphasis is on abstract patterns, color, and texture. The image conveys a sense of modernity and urban artistry, suitable for a variety of creative applications such as backgrounds, cover art, or design elements. This type of image would attract buyers looking for keywords like geometric pattern, colorful mural, urban art, abstract background, and vibrant wall texture. The photograph captures the essence of street art and could be used to evoke themes such as creativity, diversity, and visual impact in design and marketing materials."
DALL-E Output

DALL-E Output

Milwaukee Marina

Milwaukee Marina

"This image presents an aerial view of a vibrant marina, bustling with a plethora of docked boats, flanked by a lush, expansive public park and an urban skyline. The marina is neatly organized and filled with various types of boats, primarily sailboats, which suggests a scene of recreational boating and maritime leisure. The park area adds a touch of nature and recreation to the urban environment, with its green open spaces, trees, and pathways, indicating a place where city dwellers can relax and enjoy outdoor activities. The skyline in the background features an assortment of modern high-rise buildings, underlining the proximity of the marina to a bustling city center. The architecture varies, with some buildings under construction, indicating ongoing urban development. The clear blue sky and calm waters suggest a peaceful daytime setting, likely late morning or early afternoon given the well-lit scene and the absence of long shadows. This image encapsulates themes of urban life, leisure, and the balance between nature and city living. It also conveys a sense of tranquility and escape within a metropolitan setting, which is often sought after in both lifestyle choices and as a visual concept for businesses related to real estate, travel, or outdoor activities. The photo's high vantage point provides a comprehensive view that would be valuable for those seeking imagery of urban harbors, community parks, or the interface between developed cityscapes and recreational marine environments. Potential buyers might search for keywords such as city marina, urban skyline,"
DALL-E Output

DALL-E Output

Parking under Train Tracks with Colored Columns

Parking under Train Tracks with Colored Columns

"This image captures a striking urban scene under a bridge or overpass. The perspective is from one end of the underpass looking straight through to the other side, creating a deep vanishing point. The most prominent subjects are the robust columns supporting the overhead structure, painted in vibrant blocks of red, green, and blue, offering a pop of color against the otherwise industrial grey concrete. The repetition of the columns and the shadows cast on the ground contribute to a rhythmic and geometric visual appeal. At the far end of the passage, a person can be seen sitting on a bench, providing a human element to the scene and suggesting a space that serves not just as a thoroughfare but also as a place for rest or contemplation. A vehicle is parked to the right, adding a touch of everyday life to the image. The lighting appears natural and suggests daytime, with sunlight filtering in from the sides, accentuating the textures of the concrete and the painted surfaces. The image evokes concepts of urban design, architecture, infrastructure, and public spaces. Potential stock photography buyers might be drawn to this image for themes related to city life, urban planning, color contrast in architecture, or the use of public spaces. The combination of structural elements and the human presence might appeal to those looking for images that depict the intersection of urbanism and personal experience in city environments."
DALL-E Output

DALL-E Output

Dam in Ann Arbor

Dam in Ann Arbor

“The image captures a serene riverscape, featuring a dam in the foreground, a railway bridge spanning the river, and the calm waterway extending towards a lush treeline in the background. The dam, with its gates partially open, allows water to cascade down, creating a frothy white flow that contrasts with the stillness of the river above. The railway bridge, supported by sturdy piers, stands as a symbol of connectivity and industrial progress against the natural backdrop. Taken during what appears to be late afternoon, the lighting imbues the scene with a warm, golden glow, highlighting the textures of the water and the surrounding greenery. The shadows are soft, suggesting that the sun is not at its peak, possibly hinting at the golden hour when the light is especially favorable for photography. The presence of kayakers on the water adds a sense of leisure and recreation, suggesting the river's role as a venue for outdoor activities. This image could evoke concepts of energy, infrastructure, and the intersection of human engineering with the natural environment. The perspective seems to be from a high vantage point, potentially an elevated position on a bridge or a nearby structure, giving viewers a comprehensive overview of the scene. For businesses buying stock images, keywords related to this image might include river, dam, railway bridge, renewable energy, water management, outdoor recreation, infrastructure, and natural beauty."
Ann Arbor Dam DALL E

Ann Arbor Dam DALL E

Waterfalls in Bahamas

Waterfalls in Bahamas

“The image features a striking natural scene dominated by a cascading waterfall that flows down a rugged, mossy cliffside. The waterfall is the central subject, its white frothy water contrasting vividly with the earthy tones of the rocky surface from which it plunges. The cliff's texture is rich and varied, with hints of green vegetation clinging to its surface, suggesting a humid, tropical environment. Above the waterfall, a clear blue sky with patches of white clouds creates a serene backdrop, adding a sense of tranquility to the scene. Surrounding the waterfall are tall, slender palm trees that sway gently, their fronds tousled by the breeze. The palm trees are a significant part of the composition, as they frame the waterfall and emphasize the exotic and tropical ambiance of the location. The perspective is eye-level, allowing the viewer to feel as if they are standing at the base of the waterfall, looking up at the majestic scene. This image evokes a sense of escape and natural beauty, perfect for businesses related to travel, tourism, nature, and wellness. Potential buyers might search for keywords such as tropical waterfall, nature scenery, palm trees, travel destination, exotic landscape, and natural tranquility when looking for such serene and picturesque stock photography."
DALL-E Output

DALL-E Output

Trains in Chicago at Night

Trains in Chicago at Night

"The image captures a bustling urban scene during the blue hour, where the sky transitions from day to night offering a deep blue backdrop that contrasts with the artificial lights of the city. The view is taken from an elevated perspective, likely from a high vantage point in an adjacent building, overlooking a curving elevated train track that cuts through a canyon of modern and traditional architecture. One of the most striking features of the image is the long exposure capturing the light trails of a train as it moves along the track, creating a fluid ribbon of color that draws the eye through the composition. The train's motion contrasts with the static buildings, adding a sense of dynamism to the otherwise still urban landscape. The surrounding buildings display a variety of functions, with visible office spaces, retail shops at street level, and a mix of architectural styles that suggest a diverse and vibrant city environment, possibly in a major metropolitan area. The image conveys themes of urbanization, transportation, and the perpetual motion of city life, making it a compelling choice for businesses seeking visuals related to progress, technology, or urban development. It encapsulates a moment of daily life in the city, with the potential to resonate with viewers who experience or are fascinated by the energy and complexity of urban spaces."
DALL-E Output

DALL-E Output

Downtown Ann Arbor, Michigan

Downtown Ann Arbor, Michigan

"The image captures a vibrant street scene at dusk, the blue hour settling in with the last hints of daylight fading from the sky. The prominent subjects include the glowing neon signs of the Michigan Theater and the State Theatre, which anchor the composition and add a sense of nostalgia and entertainment to the scene. These theaters, with their classic marquee signs, are reminiscent of a bygone era and suggest a cultural or historical significance within the city. Pedestrians walk along the sidewalk, giving the image a sense of life and activity, while cars line the street, indicating an urban setting. The architecture of the buildings, with their ornate facades and the presence of a tall tower in the background, suggests that this location could be in a historical part of a city in Michigan, United States. The lighting, from both the sky and the artificial neon, creates a contrast that highlights the theaters as the focal points of the image. The atmosphere conveys an evening out in the city, perhaps before or after a movie or a live performance. It's a snapshot that tells a story of urban entertainment, community engagement, and the timeless appeal of local landmarks. This image would resonate with businesses looking for stock photography conveying themes of urban nightlife, entertainment, historical preservation, or community events. Potential buyers might search with keywords like city dusk, neon signs, historic theater, urban street scene, and blue hour ambiance."
DALL-E Output

DALL-E Output

Downtown Fort Wayne, Indiana

Downtown Fort Wayne, Indiana

"The image presents an expansive view of an urban landscape during what appears to be the golden hour, with the sun casting a warm glow that enhances the autumnal colors of the foliage. A distinctive feature is the bridge in the foreground, which spans a tranquil river reflecting the sky and trees along its banks. The bridge's modern design, with its series of arches, contrasts with the natural elements and is a testament to contemporary engineering. Looking beyond the bridge, the city's skyline is characterized by a mix of mid-rise buildings, which suggests a mid-sized metropolitan area. The concentration of taller structures in the center likely indicates the city's commercial or business district. The varying hues of the trees, transitioning from green to vibrant reds and oranges, signify the changing of seasons and add a rich tapestry of color to the urban environment. This photograph would likely appeal to those searching for stock images related to urban planning, architecture, infrastructure, seasonal change, and cityscapes. The aerial perspective gives a comprehensive view of the urban and natural harmony, making it suitable for editorial content or marketing materials that aim to convey growth, transformation, or the beauty of urban life amidst nature's cycle."
Fort Wayne Downtown DALL E

Fort Wayne Downtown DALL E

Frogs on Branch

Frogs on Branch

"The image captures a vibrant scene featuring two colorful frogs in what appears to be a tropical habitat. The foreground is dominated by a bright yellow frog with black spots, positioned on a dark, textured piece of wood. The frog's striking color and glossy skin make it a prominent subject, standing out against the more muted colors of its surroundings. In the background, there is a second frog, this one boasting a deep blue hue, almost blending into the shadows. It is perched on a leaf that curls over the wooden surface, adding a layer of depth to the composition. The environment suggests a humid and lush ecosystem, possibly a rainforest or a controlled habitat mimicking such conditions. The rich green leaves interspersed around the frogs contribute to the tropical ambiance. There is a sense of tranquility and natural beauty, with the frogs appearing calm and at ease within their environment. The perspective is at eye-level with the yellow frog, fostering a sense of connection with the viewer. This image would be suitable for themes related to wildlife, biodiversity, nature conservation, and the beauty of amphibians. The photograph's vivid colors and focus on the frogs in their naturalistic setting would attract businesses in the fields of education, environmental protection, or pet care. Potential buyers might search for keywords such as tropical frog, rainforest wildlife, colorful amphibian, or nature conservation."
Frogs In Greenhouse DALL E

Frogs In Greenhouse DALL E

Comment below how you think AI is progressing, and if you are also impressed and/or creeped out by Open AI’s Vision system.

About the Author

Nicholas Klein

Founder and visual artist, specializing in all aspects of a businesses presence from imagery and video to graphics and web. A graduate of IPFW with a Bachelor in Fine Arts, Concentration Photography as well as an Associates Degree in Business. His personal photography works are focused on landscape, travel, and aerial photography.