Evaluating the Performance of Large Language Models in 3D Architectural Modeling

Authors

  • Aris Budhiyanto, Department of Architecture, Petra Christian University, Siwalankerto No.121-131, Surabaya, Indonesia
  • Kevin Harsono, Graduate Institute of Architecture, National Yang Ming Chiao Tung University, No. 1001, Daxue Rd, East District, Hsinchu City, 30010, Taiwan
  • Sarai Montiel Escandon, Department of Architecture, Universidad Veracruzana, Calz Juan Pablo II 1193, Costa Verde, 94294 Veracruz, Ver., Mexico

Keywords:

Large language model, Rhino 3D model, ChatGPT, Claude

Abstract

This study evaluates the performance of two large language models (LLMs), Claude and ChatGPT, in generating 3D architectural models from textual and visual prompts. Using Pass@k metrics (k = 1, 3, 5) and error analysis, the research examines their accuracy and reliability. Results show that both models perform equally at Pass@1 (24%), but Claude achieves higher scores at Pass@3 (43% vs. 35%) and Pass@5 (47% vs. 40%), demonstrating greater consistency. While syntax errors are minimal (below 5%), structural errors dominate, particularly in complex tasks where outputs tend to be simplified, incomplete, or overlapping. Logical and visual-semantic errors occur less frequently but further reduce usability. Despite these limitations, LLMs demonstrate strong potential in multimodal processing and efficiency, as they can generate models in under one minute. The findings suggest that LLM-based modeling is still at an early stage but may evolve into a transformative tool in architectural design, bridging conceptual speed with structurally valid outputs.
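The abstract does not state how the Pass@k scores were computed. As a minimal sketch, assuming the estimator commonly used in code-generation benchmarks (each task is attempted n times, c attempts are judged correct, and Pass@k is the probability that at least one of k attempts drawn without replacement is correct), the metric could be calculated as follows. The function names and the (n, c) input format are hypothetical and are not taken from the study.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k for one task.

    n: total attempts generated for the task (assumed setup)
    c: attempts judged correct
    k: number of attempts drawn without replacement
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failures, every draw of k attempts contains a success.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average Pass@k over tasks, each given as an (n, c) pair."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Under this definition Pass@k cannot decrease as k grows, which is consistent with the reported rise from Pass@1 to Pass@5 for both models.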

Published

2026-05-15

Section

Articles