Generative neural
networks are being actively adopted in many areas of human activity and have a
significant impact on work processes. They show impressive capabilities in
creating new data similar to the data on which they were trained, from
realistic images and videos to music and text. Generative networks have
demonstrated a capacity for creativity and innovation beyond that of
traditional algorithms. In the field of design, for example, they are used to
create new product concepts, architectural designs, and fashion collections [1-3].
In addition to
entertainment and media industries, generative neural networks are actively
being implemented in high-tech industries such as banking, law, and industrial
production. In medicine, research is underway on the use of neural network
technologies for analyzing medical images and developing personalized treatment
plans.
However, despite this
impressive potential, the application of generative neural networks to
manufacturing tasks, especially computer-aided design (CAD), faces serious
challenges [4]. Although neural networks are potentially capable of generating
design documentation and 3D models, their "black box" nature, that is, the
unpredictability and opacity of the generation process, becomes a significant
and dangerous limitation.
Unlike traditional CAD
methods, where each design step is controlled and documented, neural networks
often produce results without a clear explanation of how they were obtained.
This makes it difficult to validate the results, find and correct errors, and
make adjustments during the modeling process. It also undermines trust in
the results and limits their use in critical projects that require strict
quality control and compliance with standards. As a result, despite the
potential to automate and accelerate the design process, the adoption of
generative neural networks in CAD production processes requires solving the
"black box" problem and developing methods that ensure deterministic control and
transparency of the generation process.
This study proposes a
concept of a hybrid methodological approach designed to overcome these
limitations. The approach is based on the synergy of natural language
processing (NLP) and verified engineering software packages. It is assumed that
the combination of these two approaches will minimize the likelihood of errors
and inaccuracies in the design process, while ensuring the necessary level of
control by specialists.
The proposed methodology
integrates the capabilities of artificial intelligence systems in natural
language processing and rapid generation of solution variants with existing
algorithms for constructing CAD models in Russian computer-aided design
systems such as KOMPAS-3D [5] and T-FLEX CAD [6].
The proposed approach is
motivated by the fact that the exclusive use of neural network technologies
neither guarantees that all design features are taken into account nor allows
manual correction of identified discrepancies. During the experimental
verification of the concept, the open software platform Blender was used to
synthesize basic three-dimensional geometric shapes (a sphere, a cube, a cone,
and a gear wheel) by combining the two technological approaches. Each
geometric object could be modified parametrically.
Thus, this study aims to
develop and verify a hybrid methodological approach to automated design that
combines the capabilities of AI and traditional CAD systems. The results of the
study can contribute to improving the efficiency and accuracy of the design
process, as well as expanding the control capabilities of specialists.
The proposed
methodology is a hybrid approach to automated 3D modeling that combines natural
language processing (NLP) with proven engineering (CAD) software packages such
as KOMPAS-3D or T-FLEX CAD. The approach is aimed at minimizing errors and
increasing the accuracy of the modeling process compared with using generative
neural networks alone. The key advantage is that the parameters of the
AI-generated script are verified, instead of the entire generated model being
checked.
Instead of
directly using a neural network to generate a 3D model, which is fraught with
hidden errors, text AI is used to create a control script in a programming
language compatible with the selected CAD system. This allows shifting the
focus of control from checking the finished model to verifying the parameters
specified in the script, ensuring earlier detection and correction of potential
errors. The iterative nature of the process involves adjusting the prompt and
script based on the analysis of intermediate results, which ensures flexibility
and high accuracy of the final 3D model.
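As an illustration of what verifying script parameters, rather than the finished model, could look like, the following minimal sketch parses a generated script without executing it and checks the keyword arguments of a modeling call against allowed ranges. The limits in LIMITS, the helper names, and the sample script are assumptions made for this example and are not part of the described system.

```python
import ast

# Hypothetical engineering limits in meters (assumed for illustration only).
LIMITS = {
    "radius1": (0.01, 1.0),  # base radius
    "radius2": (0.0, 1.0),   # top radius
    "depth": (0.01, 2.0),    # height
}


def _callee_name(func):
    """Return the last component of the called name, e.g. 'primitive_cone_add'."""
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def extract_call_kwargs(script, func_name):
    """Return the literal keyword arguments of the first call to func_name.

    The script is only parsed, never executed, so it is safe to inspect
    AI-generated code that imports modules unavailable outside the CAD system.
    """
    for node in ast.walk(ast.parse(script)):
        if isinstance(node, ast.Call) and _callee_name(node.func) == func_name:
            return {kw.arg: ast.literal_eval(kw.value)
                    for kw in node.keywords if kw.arg is not None}
    raise ValueError(f"no call to {func_name} found")


def check_parameters(params, limits):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    for name, (low, high) in limits.items():
        if name not in params:
            problems.append(f"{name}: missing")
        elif not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]}: outside [{low}, {high}]")
    return problems


# Sample of what a text AI might return (hypothetical).
generated_script = """
import bpy
bpy.ops.mesh.primitive_cone_add(radius1=0.1, radius2=0, depth=0.2, location=(0, 0, 0.1))
"""

params = extract_call_kwargs(generated_script, "primitive_cone_add")
problems = check_parameters(params, LIMITS)
print(params)
print(problems)
```

A specialist would review the reported problems and either edit the script or refine the prompt before the script is run in the CAD system.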
During the
research, a tentative algorithm was developed for applying this
approach in practice. Its scheme is presented in Fig. 1.
Fig. 1. Algorithmic diagram of the described methodology
1) Problem Statement: Clearly
define the requirements for the 3D model, including functionality, geometric
parameters and constraints.
2) Prompt generation:
Formulate a text query (prompt) for the text AI describing the
desired 3D part, taking into account the results of step 1.
3) Script generation: Use the text
AI to generate control code (a script) in a programming language compatible with
the selected CAD system (e.g. Python for Blender, or the programming languages supported by KOMPAS-3D or T-FLEX CAD).
4) Script processing in CAD: Run
the generated script in a CAD system to automatically build a 3D model.
5) Correction and validation: Have a specialist analyze the resulting 3D
model and make the necessary corrections to the prompt or script based on the
analysis. Repeat steps 3-5 until a satisfactory result is
achieved.
6) Save Model: Save the finished 3D model in a suitable format for further use (e.g. for stereoscopic display).
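The iterative part of the steps above (steps 3-5) can be sketched as a simple loop. This is only an outline under stated assumptions: the three callables stand in for the text AI, the CAD system, and the specialist's review, and all names here (DesignSession, design_loop, the fake_* stubs) are hypothetical and introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DesignSession:
    prompt: str                                        # current text query (step 2)
    history: List[str] = field(default_factory=list)   # scripts produced so far


def design_loop(session, generate_script, run_in_cad, review, max_iterations=5):
    """Repeat steps 3-5 until the reviewer accepts the model or the budget runs out.

    generate_script(prompt) -> script text        (step 3, text AI)
    run_in_cad(script)      -> model              (step 4, CAD system)
    review(model)           -> (accepted, note)   (step 5, specialist)
    """
    for _ in range(max_iterations):
        script = generate_script(session.prompt)
        session.history.append(script)
        model = run_in_cad(script)
        accepted, note = review(model)
        if accepted:
            return model  # the caller then saves the model (step 6)
        # Feed the specialist's remark back into the prompt for the next pass.
        session.prompt += "\nCorrection: " + note
    return None


# Stand-in stubs so the loop can be exercised without an AI service or a CAD system.
def fake_ai(prompt):
    radius = 0.1 if "radius must be 0.1" in prompt else 0.5
    return f"radius = {radius}"


def fake_cad(script):
    return {"radius": float(script.split("=")[1])}


def fake_review(model):
    if model["radius"] == 0.1:
        return True, ""
    return False, "radius must be 0.1"


session = DesignSession(prompt="Create a sphere")
result = design_loop(session, fake_ai, fake_cad, fake_review)
print(result, len(session.history))
```

Keeping every generated script in the session history is a design choice of this sketch: it lets the specialist compare iterations and return to an earlier script if a later correction makes the result worse.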
This methodology offers
a compromise between automation and controllability of the 3D modeling process,
combining the advantages of AI and proven engineering tools. Its benefits
include reduced modeling time due to process automation; unlike specialized
neural networks for 3D modeling, it also makes it possible to control the process and make the
necessary adjustments immediately. The entry threshold for users without deep
programming knowledge who perform similar production tasks is also lowered. At the
same time, the value of experienced specialists only increases, since validation
and verification tasks require in-depth professional knowledge and extensive
experience.
However, such
a methodology poses a new set of problems that must be taken into account.
Among them:
∙ dependence on the quality of the text AI and
the correctness of the prompt;
∙ the need for basic knowledge of
the selected CAD system;
∙ the possibility of errors in the generated
script;
∙ the task can be formulated only as a text
description, which is inconvenient for engineers.
It should be noted that there are two approaches to 3D modeling: manual modeling, which allows more
unique and detailed objects to be created, and scripted modeling, which
is more focused on automating the modeling process and creating parametric
models. Scripted modeling is suitable for creating complex structures or series
of similar objects with varying parameters that are difficult to model
manually.
Scripted modeling is used in the following areas:
∙ creation of parametric models (machine parts, architectural elements);
∙ generation of complex structures and patterns;
∙ automation of the modeling process;
∙ projects requiring high precision and reproducibility.
Scripted modeling
requires programming skills in addition to modeling
skills. Writing such a script requires knowledge of basic syntax
and an understanding of object construction algorithms; the time needed to debug
and test the program should also be taken into account.
Even a simple task
such as drawing primitives can take a novice programmer 1-2 hours to write and
debug. A professional developer with experience writing scripts
in Python or JavaScript can complete the same task in 10 to 30 minutes.
The proposed approach reduces the time needed to create an
individual script and increases work efficiency.
During the
experimental verification of the concept, the open software platform Blender [7] was used to synthesize
basic three-dimensional geometric shapes (a sphere, a cube, a cone, and a gear
wheel) by combining the two technological approaches.
Blender was
chosen as a platform for conducting preliminary experiments due to the
following factors:
1) Open source and free
availability: Blender is free of licensing restrictions and provides
unrestricted access to its source code;
2) Python scripting support;
3) Rich API and large community:
Blender's well-documented API and active developer community provide access to
a wide range of tools and libraries, making it easy to integrate with external
systems and extend the scope of experiments;
4) Parametric modeling tools: Blender makes it possible
to create 3D models with variable parameters
set through scripts. This is essential for testing the hypothesis about the
influence of parameters set by the text AI on the characteristics of the generated
models;
5) Manual editing tools: Blender provides a wide range
of tools for subsequent manual editing and modification of 3D models, which
makes it possible to analyze the results and make the necessary adjustments.
Automated modeling was
carried out by executing program code (a script) generated by an
external system. This approach is highly productive for generating primitive
geometric shapes and relatively simple composite objects whose geometry
is completely determined by a set of input parameters.
This approach also makes
it easy to vary the specified parameters in various combinations (Fig. 2).
Fig. 2. An example of using the control code for modeling a 3D figure (gear)
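Varying parameters in different combinations can be sketched as a simple parameter grid. The helper name parameter_grid and the particular values below are assumptions chosen for illustration; the parameter names follow the gear parameter set listed in Table 1.

```python
from itertools import product


def parameter_grid(**options):
    """Return one dict per combination of the supplied parameter values."""
    names = list(options)
    value_lists = [options[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical sweep: three tooth counts, two radii, one thickness.
variants = parameter_grid(
    teeth=[12, 16, 24],
    radius=[0.1, 0.15],   # meters
    thickness=[0.02],     # meters
)

for variant in variants:
    print(variant)
```

Each resulting dict could then be passed as keyword arguments to a model-building function, so that one script produces a whole series of similar parts.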
Initially, the gear
was defined by the following set of parameters (Table 1):
Table 1. Set of forming parameters for the 3D model of a toothed gear.
def create_gear_2_82(
    teeth=12,          # number of teeth
    radius=0.1,        # radius in meters (10 cm)
    thickness=0.02,    # thickness in meters (2 cm)
    tooth_depth=0.02,  # tooth depth in meters (2 cm)
    tooth_width=0.02,  # tooth width in meters (2 cm)
):
    ...  # function body is not shown in the source
Based on the
results of the script execution, the program generated a gear model, shown in
Fig. 3.
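The source lists only the parameters of the gear function, not its body. As a hedged illustration of how such parameters can fully determine the geometry, the sketch below computes a simplified 2D gear outline with rectangular teeth. It is not the script used in the experiments: the function name gear_outline and the interpretation of radius as the root (body) radius, with teeth protruding outward by tooth_depth, are assumptions.

```python
import math


def gear_outline(teeth=12, radius=0.1, tooth_depth=0.02, tooth_width=0.02):
    """Return (x, y) vertices of a simplified gear profile, counterclockwise.

    Assumptions: `radius` is the root radius, each tooth is a rectangle-like
    block rising to radius + tooth_depth, and `tooth_width` is measured along
    the root circle. Root arcs between teeth are approximated by straight chords.
    """
    if teeth < 3:
        raise ValueError("a gear needs at least 3 teeth")
    pitch = 2.0 * math.pi / teeth            # angle between neighbouring teeth
    half = tooth_width / (2.0 * radius)      # angular half-width of one tooth
    if 2.0 * half >= pitch:
        raise ValueError("teeth overlap: reduce tooth_width or the number of teeth")

    tip = radius + tooth_depth
    points = []
    for i in range(teeth):
        centre = i * pitch
        # Four corners per tooth: root, tip, tip, root.
        for r, angle in ((radius, centre - half), (tip, centre - half),
                         (tip, centre + half), (radius, centre + half)):
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


outline = gear_outline()
print(len(outline))
```

In a CAD system or in Blender, such an outline would then be extruded by the thickness parameter to obtain the solid part; the overlap check shows how invalid parameter combinations can be rejected before any geometry is built.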
This approach makes it
possible to modify the generated 3D model quickly by editing the script.
As an illustration, Fig. 4 shows an example of parametric modification of the
cone geometry (the parameters are given in Table 2). The modification consists of
a Boolean subtraction operation that removes a segment constituting a
quarter of the volume of the original cone.
Fig. 3. Visualization of the “gear” part
Table 2. Set of forming parameters for the 3D model of a cone.
import bpy

# Creating a cone
# Base radius = 0.1 meters (10 cm)
# Depth (height) = 0.2 meters (20 cm); can be changed as desired
bpy.ops.mesh.primitive_cone_add(
    radius1=0.1,           # base radius in meters
    radius2=0,             # radius of the top (0 for a sharp cone)
    depth=0.2,             # cone height
    location=(0, 0, 0.1),  # raised by half the height so that the base lies on the grid plane
)
Fig. 4. Cone with a cut-out segment
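The source does not show the code for the Boolean subtraction itself. One possible way to do it in Blender is sketched below, under stated assumptions: the cone base is centered at the origin in the plane z = 0 (as in Table 2), the removed segment is the quadrant x >= 0, y >= 0, and the helper names cutter_box, remaining_volume, and cut_quarter are hypothetical. The two pure-Python helpers run anywhere; cut_quarter works only inside Blender, where the bpy module is available.

```python
import math


def cutter_box(radius, depth):
    """Edge length and centre of a cube covering the quadrant x >= 0, y >= 0.

    The cube is deliberately oversized so that it spans the full height of the cone.
    """
    size = 2.0 * max(radius, depth)
    centre = (size / 2.0, size / 2.0, depth / 2.0)
    return size, centre


def remaining_volume(radius, depth):
    """Volume left after removing one quarter of a cone (cone volume = pi*r^2*h/3)."""
    return 0.75 * math.pi * radius ** 2 * depth / 3.0


def cut_quarter(cone, radius=0.1, depth=0.2):
    """Subtract the cutter cube from `cone` with a Boolean modifier (Blender only)."""
    import bpy  # available only inside Blender's Python interpreter

    size, centre = cutter_box(radius, depth)
    bpy.ops.mesh.primitive_cube_add(size=size, location=centre)
    cutter = bpy.context.active_object

    modifier = cone.modifiers.new(name="QuarterCut", type='BOOLEAN')
    modifier.operation = 'DIFFERENCE'
    modifier.object = cutter

    # The modifier is applied to the active object, so make the cone active first.
    bpy.context.view_layer.objects.active = cone
    bpy.ops.object.modifier_apply(modifier=modifier.name)
    bpy.data.objects.remove(cutter, do_unlink=True)  # the cutter is no longer needed
    return cone


size, centre = cutter_box(0.1, 0.2)
print(size, centre, remaining_volume(0.1, 0.2))
```

Because the cone is rotationally symmetric about its axis, removing one quadrant removes exactly a quarter of its volume, which gives the specialist a simple numeric check on the result.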
Visualization using
stereoscopic technologies significantly enhances the viewer's perception of
depth and spatial characteristics of an object, bringing its presentation
closer to real perception. The key advantage of this method is the creation of
conditions that are as close as possible to natural visual perception.
The project, conducted at the Keldysh Institute of Applied Mathematics of
the Russian Academy of Sciences, investigates methods for creating stereoscopic
representations of the results of scientific research. Two stereoscopic systems
are used for the experiments: a classic setup and an autostereoscopic Dimenco
monitor, which provides viewing of a stereo image without
special glasses. The autostereoscopic monitor allows the formation of an
integrated image, including multiple projections of an object, thereby
expanding the range of viewing angles. A detailed description of the
autostereoscopic display technology is given in [8-9].
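To illustrate the idea of multiple projections of one object, the sketch below places a row of viewpoints on a horizontal arc around the model, each of which could be rendered as one view of a multi-view frame. This is a generic outline and not the procedure used in the project: the function name multiview_cameras, the number of views, the viewing distance, and the angular span are all assumed values.

```python
import math


def multiview_cameras(n_views=9, distance=0.6, total_angle_deg=20.0,
                      target=(0.0, 0.0, 0.0)):
    """Return camera positions on a horizontal arc, all aimed at `target`.

    The arc is centred on the direction looking along +Y, so the middle
    camera sits at (target_x, target_y - distance, target_z).
    """
    if n_views < 2:
        raise ValueError("at least two views are needed for a stereo effect")
    span = math.radians(total_angle_deg)
    step = span / (n_views - 1)
    cameras = []
    for i in range(n_views):
        angle = -span / 2.0 + i * step
        x = target[0] + distance * math.sin(angle)
        y = target[1] - distance * math.cos(angle)
        cameras.append((x, y, target[2]))
    return cameras


cameras = multiview_cameras()
for position in cameras:
    print(position)
```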
Fig. 5 shows a sample of a composite stereoscopic frame constructed using the
multi-view method (on the right) and a separately enlarged
image allowing one to examine details (on the left). The left image is purely
illustrative and is not part of the stereo frame.
Fig. 5. Results of hybrid gear modeling (main frame + multi-view representation)
A modified cone model
with a removed segment constituting a quarter of the model's volume was also
shown in stereo (Fig. 6).
Fig. 6. Results of hybrid
modeling of a cone with a cut segment (main frame + multi-view representation)
The complex geometry of
3D models, characterized by the presence of curved surfaces, is of considerable
interest for studies of stereoscopic perception, since the subjective
perception of depth and shape can vary widely depending on many factors,
including individual characteristics of the visual system and the physical
dimensions of the object.
In this paper, a new
hybrid approach to automated 3D modeling was presented, combining natural
language processing (NLP) and traditional CAD methods. This approach, based on
AI-based script generation and subsequent processing in specialized software,
demonstrates a number of advantages and disadvantages that need to be discussed
to assess its practical applicability and prospects for further development.
On the one hand, the
proposed method demonstrates significant potential for improving the efficiency
and accuracy of the 3D modeling process. Automation of script generation based
on text descriptions significantly reduces the time required to create basic
models, and the ability to check script parameters at early stages allows
minimizing errors and increasing the reliability of the final result. The
flexibility of the iterative process, which allows you to adjust both text
queries and the generated code, makes this approach adaptable to various tasks
and requirements. A specialist can focus on adjusting and improving the model,
rather than on the routine creation of basic geometry. As the system is used
and the results are adjusted, it is possible to train AI to improve the quality
of the generated scripts.
However, it is necessary
to recognize certain limitations. The quality of the generated scripts directly
depends on the quality of the text request (prompt),
which requires certain skills and understanding of the capabilities of AI from
the user. In addition, the need for manual correction and validation of results
limits the degree of full automation of the process. There may also be errors
in the generated code that require specialist intervention. Finally, the
applicability of this approach may be limited by the capabilities of the CAD
systems and programming languages used. There are also problems with
scalability at this stage: the approach may be effective for creating
individual parts, but its capabilities for complex assemblies and projects may
be limited.
The proposed hybrid approach can be compared with existing approaches to 3D modeling using neural
networks, which can be roughly divided into several categories.
1. Fully generative models [10]: These models, such as PointNeRF and GAN-based
models (e.g. StyleGAN for 3D), use neural networks to
generate 3D models directly from noise or a latent space. Their advantages
include high generation speed and the ability to create new, unique shapes.
However, the "black box" problem is pronounced in their case:
it is difficult to control the generation process and make adjustments, so the
quality of the models can be unpredictable, and the final post-processing can
take as much time as designing from scratch. This method is suitable for
areas where precision and attention to detail are not required.
2. Models based on 2D-to-3D
transformation [11]: These models use neural networks to transform
2D images (one or several) into 3D models. Examples include multi-view
image-based and sketch-based methods. This makes it possible to create 3D models
from available 2D data (photographs, drawings). However, the quality of the 3D
models is highly dependent on the quality and quantity of the 2D data, and it can
be difficult to obtain accurate geometry and detail. The "black box"
problem is also present.
The proposed
hybrid approach occupies an intermediate position between fully generative
models and scripted modeling. It uses text-based AI to transform informal
requirements into parameters, which are then used to control the generation of
a 3D model. Once generated, the model can be manually refined.
There are a
number of promising areas for further research and improvement. Here are some
of them:
∙ development of specialized text AI for CAD;
∙ automation of prompt engineering:
development of algorithms that automatically generate optimal prompts based on
specified requirements for a 3D model;
∙ integration with knowledge bases and
ontologies;
∙ development of APIs for data exchange
between neural networks and CAD systems: creation of standardized APIs that
allow easy integration of neural networks with existing CAD systems;
∙ interactive model editing: development of
interfaces that allow professionals to interactively edit generated models
using CAD tools and automatically update scripts;
∙ integration with drawing recognition
systems: this will allow the creation of a mixed prompt based on a
graphic request and a text description.
The article
discusses a hybrid approach to 3D modeling that combines natural language
processing and traditional CAD methods and offers a promising
combination of automation and controllability. The methodology of the approach
is described and tested on modeling problems (a sphere, a cube, a cone, and a
gear wheel). The resulting models are displayed on an autostereoscopic monitor
using a multi-view representation.
Despite the
identified limitations related to the quality of input data and the need for
manual adjustments, the method showed significant potential for accelerating
and increasing the accuracy of the 3D model creation process, especially for
parametric tasks. Further research should be aimed at improving the quality of
script generation, automating the validation process, and expanding compatibility with various CAD systems. The results obtained open
up new opportunities for increasing the efficiency and accessibility of 3D
modeling for a wide range of users.
1. J. Ho, A. Jain, P. Abbeel, Denoising Diffusion Probabilistic Models, 2020, https://doi.org/10.48550/arXiv.2006.11239 (accessed 29.03.2023)
2. C. Meng, Y. He, Y. Song, J. Song, J. Wu, J. Zhu, S. Ermon, SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations, 2022, https://doi.org/10.48550/arXiv.2108.01073
3. Radford A., Kim J.W., Hallacy C., Ramesh A., Goh G., Agarwal S., Sastry G., Askell A., Mishkin P., Clark J., Krueger G., Sutskever I. 2021. Learning Transferable Visual Models From Natural Language Supervision. arXiv preprint arXiv:2103.00020 [cs.CV]. https://doi.org/10.48550/arXiv.2103.
4. Bondareva N.A. Graphic neural networks and image verification problems // Proceedings of the 33rd International Conference on Computer Graphics and Machine Vision GraphiCon 2023, V.A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia, September 19-21, 2023, pp. 317-327, DOI: 10.20948/graphicon-2023-317-327
5. KOMPAS-3D: Russian import-independent system of three-dimensional design. URL: https://kompas.ru/ (accessed 29.04.2025)
6. T-FLEX CAD: Russian engineering software for 3D design and development of design documentation. URL: https://www.tflexcad.ru/ (accessed 29.04.2025)
7. Blender. URL: https://www.blender.org/ (accessed 29.04.2025)
8. Andreev S.V., Bondareva N.A. Constructing a representation of textual information in stereo presentations // Proceedings of the 28th International Conference of Computer Graphics and Vision GraphiCon-2018, TUSUR Publishing, Tomsk, 24-27 September 2018, pp. 86-89.
9. Andreev S.V., Bondareva N.A., Bondarev A.E. Expansion of the Functions of the Multi-View Stereomaker Software for Automatic Construction of Complex Stereo Images // Scientific Visualization, 2021, Vol. 13, No. 2, pp. 149-156. DOI: 10.26583/sv.13.2.10
10. Masterpiece Studio. URL: https://masterpiecestudio.com/ (accessed 29.04.2025)
11. Kaedim: AI-powered Art Outsourcing. URL: https://www.kaedim3d.com/ (accessed 29.04.2025)