The Future of AI Image Recognition: Unveiling What’s Next

Three questions and answers: AI image recognition - what's next?

AI image generators contain systems that can process language, recognize images and weave both into new content. Publicly available systems such as DALL-E and Stable Diffusion make such applications accessible to the general public. With Firefly, Adobe is already bringing such generators to Photoshop for enterprise customers. The image generators have triggered a broad debate about copyright, and many artists are actively speaking out in their communities.

In an interview with Dr. Gerhard Heinzerling, AI specialist at the Arineo Group, iX looks at whether these systems can also produce technical drawings, where image recognition is headed and what role multimodal learning plays in understanding prompts.

Many people are already using AI image generators instead of illustrators or graphic designers. How quickly will such systems also replace technical draftsmen?

The use of AI models to generate technical drawings has made progress in recent years. There are already models that can generate simple technical drawings or complete existing ones. These models can automatically create drawings based on input parameters and specific requirements.

However, technical drafting is a profession that requires extensive expertise and problem-solving skills in addition to drawing. It’s not just about generating images, but also about understanding technical concepts, considering the customer’s needs and adapting the drawings accordingly.

Currently, AI models are not advanced enough to replace humans when it comes to generating complex technical drawings. It still takes human expertise to understand the technical details, make changes and come up with creative solutions. I don’t see a real and complete replacement of technical draftsmen by AI or image generators in the near future.

Dr. Gerhard Heinzerling completed his doctorate in 1999 on the question of how words are stored in the brain.
He then worked as an SAP consultant and is now employed at Arineo in the field of image recognition using AI.

Apart from drawings and the now officially canceled biometric mass surveillance: What new uses of image recognition systems can we expect in the near future?

There’s a lot coming our way. In the automotive industry, image recognition systems play an important role in the development of autonomous vehicles. AI is used to recognize and track traffic signs, pedestrians, vehicles and other objects on the road.

In healthcare, image recognition systems are discussed primarily for the diagnosis of diseases and injuries. The aim is to detect tumors, abnormalities or other health conditions on X-rays, CT scans, MRIs and other medical images.

Image recognition systems can also be used outdoors. In agriculture, they can identify plant diseases, pest infestations or nutrient deficiencies. Such systems are also suitable for monitoring plant growth, planning harvests and optimizing irrigation. Image recognition can also help to monitor pollution, deforestation, biodiversity and other environmental factors. The AI then supports environmental protection measures and the analysis of environmental changes.

Retail can benefit as well: image recognition systems analyze customer behavior and customer flows, help to optimize product placement and improve inventory management.

Images and language are very different and complex in the way they convey meaning. How can we be sure that an AI system can process all the information from a human language prompt or an image?

Processing information from a prompt is a complex task for AI systems. However, there are several approaches to ensure that an AI system can properly process all relevant information.

The first is training with a wide range of data. To deal with information from different sources, an AI has to be trained on varied material. This can mean that developers train the system on a variety of text, image and multimodal data sources so that it develops a thorough understanding of the different information formats.

This then leads to multimodal learning, in which AI systems process visual and verbal information together. By training on multimodal data, AI systems can learn to establish relationships between images and their associated texts and thus develop a better understanding of the information.

For special purposes, the AI systems can then be fine-tuned to specific tasks or domains. Transfer learning also enables AI systems to apply knowledge learned in one context to new, similar contexts. The results of models always depend on data quality, and especially with complex topics you should always draw on human expertise.

Mr. Heinzerling, thank you very much for your answers!

In the “Three Questions and Answers” series, iX wants to get to the heart of today’s IT challenges, whether from the user’s point of view in front of the PC, the manager’s point of view or the everyday life of an administrator. Do you have suggestions from your daily practice or that of your users? Whose tips on which topic would you like to read in a nutshell? Then please write to us or leave a comment in the forum. (pst)
