Key Learnings
- Learn about the challenges related to extraction, storage, and processing of BIM data.
- Discover the business-specific value embedded in your own intelligent models.
- Ideate the architecture of a BIM data processing system.
- Work with tools and technologies enabling interaction with BIM data stored in the Autodesk ecosystem.
Speaker
- Pawel Baran: Structural engineer by education, BIM and software engineer by passion and trade, Pawel bridges the gap between the two fields by specifying, designing, and developing digital workflows as well as supporting their implementation. Always interested in innovation in the AEC industry, Pawel has been working with various technologies for over 10 years, starting with BIM, through visual programming and large-scale 3D printing, to authoring software solutions and working with data infrastructure. He is always seeking opportunities to leverage data and automation to improve the quality and speed of work done by all participants of the design process, with a particular focus on BIM and interoperability workflows.
PAWEL BARAN: Welcome, everyone, to this Autodesk University session. The topic of today's presentation is "Understanding BIM Data at Scale: Data Processing Infrastructure for Revit." I'll be talking about it based on the journey we took on at Buro Happold trying to build out such an infrastructure. Today's presentation is a case study, so it will be very much about how we did it and the thought process that we went through.
First of all, a few words about myself. My name is Pawel Baran. I'm a software engineer consultant who has been working with Buro Happold for quite a few years already. Throughout most of my career, I was mainly focused on programming and computational design, as well as innovation such as digital fabrication and large-scale 3D printing. I also worked as a BIM coordinator for some time, which gave me part of the background for what I'm doing now.
Since 2019, I've been 100% focused on software engineering. Roughly 90% of that is the BHoM framework, which we've developed at Buro Happold, plus the Autodesk stack: Revit, Navisworks, Forge/APS, and so on. I started by building automation for Revit and BIM processes, but over time I drifted towards data-oriented solutions, and that's why I'm here and why I'm presenting this topic today.
Before we start: the presentation is split into two parts. First, we will go through the thought process behind the undertaking we took on at Buro Happold to build a private data processing infrastructure. Then we will go into the implementation, so how we actually did it.
A few words about Buro Happold itself. It's a relatively large engineering company, around 2,500 staff members, still growing in size, scope, and geographical reach. The portfolio is at this point global, spanning five continents, with many highlight projects such as Morpheus in Macau or the Museum of the Future in Dubai: very computationally advanced projects, which means the company is taking part in the tech race happening in the industry. And by taking part in the tech race, I don't only mean buying advanced off-the-shelf products but also building in-house capacity by training people, building our own tools, and scaling up the code.
Moving on to data in AEC and why it's so interesting, exciting, and challenging. First of all, it's about the depth. Our engineering processes are very complex: they consist of multiple expert processes that are stacked on top of each other and interconnected, which means we have multiple technologies and formats involved. This can be documents, drawings, models, code, scripts, everything we produce every day, and it's stored on different platforms like databases, cloud solutions, file storage, sometimes even external drives.
This all means we have high complexity and data silos that are really hard to break, especially since many of our workflows are very much non-standard. Each project is different, and many things are, I would say, manual to a large extent.
But still, this is the collective knowledge of every business in this industry, and that's the value. That's what we need to strive for: to capture this knowledge and be able to organize and digest it in a meaningful way. Of course, we can't do that to the full extent, but we have one common denominator which is very helpful, and that common denominator is BIM.
And why BIM? It's a common denominator because it captures the interdisciplinary engineering workflow on multiple levels. So we have the organization level, project level, and also model level. And on each level, we can draw different information from the BIM models and processes that we have. So any sort of business metrics, global overviews, aggregations on the organization level.
Then on the project level, we can measure progress, manage risk, and work with coordination between models, between stakeholders, and so on. And on the model level, it's simply the correctness and quality of the model itself. This also translates into different stakeholders on each level, which is pretty obvious.
Now, moving on to mining the BIM data. In this case, we'll be talking about Revit, because that's the main BIM authoring product distributed by Autodesk and also the main product used at Buro Happold, so it's our go-to solution.
We have three ways of extracting data from Revit. The first is manual: any sort of button-based solution where we export data to databases or to Excel, but it's pretty much based on clicking. Then we have the Revit API, which can help us automate processes at the level of desktop Revit as an application.
And finally, we have Autodesk Platform Services, which lets us extract the data directly from models hosted in the Autodesk cloud. There we can build scheduled, more consistent, cloud-based solutions that don't interact with the desktop anymore. There are many different products that allow us to extract data from Revit, and they usually have similar flaws, I would say.
The first type of solution focuses on extracting the geometry and a tabular list of properties of each element in a model. In other words, we extract everything from the model, build another blob from this data, and then start processing it in much the same way as we would in the model itself. That's not necessarily the most efficient way of doing things if we already have one format that holds all this data, and it's something we haven't found overly efficient so far at Buro Happold, so it's not the solution of preference for us.
The other type of product is the dashboards that are pushed top-down. As good as they are when objective rules are set in, they often become problematic because each project is slightly different: the conventions are different, the stakeholders are different, and you often end up comparing apples to pears, which is then reflected in dashboards generating false flags quite often.
And finally, the closed-formula solutions, like, for example, the Model Checker from Autodesk. That is a great tool, but it only gives you a fixed extent of queries that you can run and nothing more, because it's a closed formula. Based on all these considerations, we thought about what an ideal solution would look like, and we came to the conclusion that what is important is being able to run batch extractions as well as targeted queries in the language of Revit itself, so in the Revit API (a minimal sketch of such a query follows after this list of requirements). Then scheduled and on-demand execution: we can either schedule the execution or run it on demand in a desktop application, where the user just runs a check, sees what happens, and then actions it.
Next, execution against models published on the Autodesk cloud or in Cloud Collaboration for Revit. I haven't talked about this yet, but there are two levels of storing data in the Autodesk cloud. We have Autodesk Docs, where models need to be published to be accessible through APIs. And then we have Collaboration for Revit, where the live Revit models are stored; if we query at this level, we actually query the Revit model, not the model published to Autodesk Docs. That's a pretty important difference, because if you have a publish routine running, say, every week, but you want to check the model before publishing, then you need access to the Collaboration for Revit level of the model. For now, that can be done either via Design Automation for Revit, which is an APS feature, or by using the desktop solution and running the checks or extractions from there.
We would also like to minimize the massive blobs of data, because that's not the most efficient way of working with BIM data, at least in our experience so far. And finally, an open-ended formula, which is pretty obvious: we want to be able to extend the solution we build and to cross-reference the data with other sources.
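To give a flavour of what a targeted query "in the language of Revit itself" might look like, here is a minimal, hedged sketch using the standard Revit API. The naming-prefix convention and the helper name are illustrative assumptions, not Buro Happold's actual check code.

```csharp
// Hedged sketch of a targeted query written against the standard Revit API:
// find walls whose type name does not follow an assumed naming prefix.
using System.Collections.Generic;
using System.Linq;
using Autodesk.Revit.DB;

public static class NamingQuery
{
    public static List<ElementId> WallsWithUnexpectedNames(Document doc, string expectedPrefix)
    {
        return new FilteredElementCollector(doc)
            .OfCategory(BuiltInCategory.OST_Walls)
            .WhereElementIsNotElementType()
            .Where(w => !w.Name.StartsWith(expectedPrefix)) // "expectedPrefix" is an illustrative convention
            .Select(w => w.Id)
            .ToList();
    }
}
```

A check of this kind returns element ids that can then be stored, reported, or actioned directly in Revit, which is the pattern the rest of this talk builds on.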
Moving on to pre-design, the steps we took before we even started designing the product. Key considerations: strategy; the objectives that stem from the strategy; the type, format, and storage of the BIM and non-BIM data that will be involved in the process; risk assessment; and finally, tech stack and skills, so what we can do with the skills that we have.
Starting from strategy, the BH strategy consists of three horizons. The picture is blurred because it's confidential information. Apologies for that. But I highlighted the most important aspects in the context of this presentation.
So we have three horizons. The first one is BAU, business as usual. The second horizon is the foreseeable future, two to three years. And Horizon 3 is more of a vision, let's say six years ahead.
And as we can see in Horizon 1, we have a product, which is BIM Radar, as well as a process, which is high-performance BIM processes targeting the clients. This is what we are dealing with at the moment; these are our short-term targets. And in the long term, we have a vision to build this wider data analytics capability.
It may sound vague, this data analytics capability, but it's not necessarily so, because as you can see we have a well-defined vision in the context of, for example, BIM data, which you can see on the screen now, where the whole strategy is translated into a blueprint.
I will not go into the details of this blueprint, but what matters most is the second layer from the bottom, the curation of the data. The bottom layer is a given: that's the data we have, the sources, and the ways of extracting it. Curation, though, is the step that matters most from the perspective of this exercise, because we first need to build a way of understanding the data we are dealing with and persisting it so that it can be used by the workflows built on top of this infrastructure.
So the second layer, the curation part, is the key bit we are discussing today. Well, back to reality: Horizon 1, this is what we deal with on a daily basis.
So we've got various third-party tools as well as in-house workflows and dashboards that are often not very useful, as we hear from different stakeholders. Sometimes they show obvious information. Sometimes the information is irrelevant or, from the perspective of a particular project, even wrong. For example, a naming issue gets flagged on a project, while in practice the naming convention is enforced by the client, so the naming will always look wrong from the perspective of the wider organization, and the dashboard keeps flagging it as a threat.
We need to be aware of that. The same goes for information that is flagged as wrong but that users consistently say is OK and is just a different interpretation. We need to be aware of that and take it into account.
As mentioned, processing blobs containing full model data hasn't been successful so far, so we are aiming for something more precise, especially since users also prefer more targeted solutions that help them deal with their day-to-day problems. And, of course, there is the "I need it" culture across the whole industry: if you are working on something, there is usually a range of people waiting to use it the day after, so it's always a bit tricky to manage expectations against the time we need to build something.
And now, how did we phrase the objective in one sentence? It may sound a bit generic, but it is "Build an open-ended, customizable, and scalable Revit data extraction, persistence and processing framework leveraging the collective knowledge shared across Buro Happold." A beautiful sentence. Now what does it mean in practice?
In practice, it means everything I've mentioned so far in this presentation. In one picture, what we want is to enable targeted queries on demand and on schedule, and batch extractions on schedule, so that we can build both types of workloads: model checks targeted at individual models, and data extractions to build insights at a higher level. And we want to be able to run these queries and extractions against files in Autodesk Docs, against Cloud Collaboration for Revit, and finally on the desktop, for users who just want to click a button and see what's wrong in the model.
So this is our full-scale ideal solution. And what's the outcome? Dashboards; interactions with the users, so that they know what needs to be changed or what the status of the model is; and finally, cross-referencing this data with other data, for example business usage metrics, software analytics, or any other design data that we store in different formats.
Risk management, very quickly: we need to consider a few major items. Feasibility first, of course. Then security, which is mainly about data security. So far everything is meant to be in-house, which means we don't need to deal with external-facing solutions; that makes the problem less concerning because we can stay behind our firewall.
Cost-- and here, we are talking about the cost of building out the tool but also maintenance. Because if we have lots of data, then of course, the cost of storage and processing will grow exponentially. So that's something that needs to be taken into account.
And on the legal side, data security again: everything behind our firewall, plus access to data stored on third-party servers. That's very important because many models are stored on servers owned by clients who don't necessarily want to let us run our apps on their BIM 360 or Autodesk Docs hubs. We need to convince them that they can trust our code and that we will not be capturing any data beyond what we promise. So that's something that needs to be addressed and communicated clearly with the stakeholders who have access to, or own, the servers on which our data is stored.
And finally, tech stack and skills. Here we have mainly C#; our framework is C# native. We also have Python developers. For UI frameworks, it's mainly Blazor and WPF.
On the data persistence side, it's SQL Server and MongoDB, and for the reflection of data, any sort of dashboarding or insights, it's Power BI. And our biggest asset is our in-house codebase, which is BHoM. BHoM is an open-source framework developed by Buro Happold, myself included. You can find it on the bhom.xyz website; please feel free to visit it.
And what is the BHoM? BHoM is a framework founded on a central schema, which we call the object model, that defines how objects are represented within our workflows. For example, let's take a wall. If we build a workflow that pulls data from Revit and runs an LCA calculation using the BHoM framework, the wall representation in that workflow will be exactly the same as in a workflow where a user is building a generative design script in Grasshopper. The wall itself will of course be different, but the schema will be the same because we are using the same object definition, so the object will have the same properties.
Of course, for LCA the focus will be on LCA-related properties, and in generative design it may be on analytical aspects, but it's the same schema. That means we can compare apples to apples and also persist data coming from different sources or workflows in the same schema and a similar representation.
And just to show you what a sample object looks like: a circle, as you can see. It's super lightweight, with only the minimum amount of information stored on it, which means that if we want to persist these objects, they stay super lightweight. There are no methods on them, so they are almost like DTOs.
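As an illustration only, a BHoM-style geometry object boils down to a small, property-only class along these lines; this is a hedged sketch, not the exact BHoM source, and the property names are illustrative.

```csharp
// Hedged sketch of a lightweight, DTO-like schema object,
// loosely modelled on the BHoM Circle shown on the slide.
public class Point
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }
}

public class Vector
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; } = 1.0;
}

public class Circle
{
    public Point Centre { get; set; } = new Point();
    public Vector Normal { get; set; } = new Vector();
    public double Radius { get; set; }
    // No methods: behaviour lives in separate engine code,
    // so the object serializes cleanly for persistence.
}
```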
OK, we've got our object model, and then we want to interact with other software or platforms. So for each platform or piece of software, we build an adapter that allows us to push and pull data. For example, we build an adapter for Revit, for Robot, for MongoDB, for Excel, and so on, and each of those adapters lets us exchange information between that software and the BHoM.
So with the setup that we have on the screen, we can easily have Revit data passed on to Robot, for example, but also to GSA, or persisted in MongoDB. That's the power of BHoM; it's a very powerful tool.
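For illustration, the push/pull adapter pattern could be sketched roughly as below; the interface, method signatures, and request object are assumptions made for this example rather than the exact BHoM API.

```csharp
// Hedged sketch of the adapter pattern described above.
// Names and signatures are illustrative, not the real BHoM adapter API.
using System.Collections.Generic;

public interface IBHoMAdapter
{
    bool Push(IEnumerable<object> objects);   // send BHoM objects to the target software
    IEnumerable<object> Pull(object request); // query BHoM objects back from it
}

public static class AdapterExample
{
    public static void Run(IBHoMAdapter revitAdapter, IBHoMAdapter mongoAdapter)
    {
        // Pull walls (or any BHoM objects) out of Revit...
        IEnumerable<object> walls = revitAdapter.Pull(new { Category = "Walls" });

        // ...and persist the very same objects to MongoDB,
        // because both adapters speak the common object model.
        mongoAdapter.Push(walls);
    }
}
```

The key design point is that the same objects flow through every adapter, so swapping the target (Robot, GSA, MongoDB) does not change the data model.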
It's a toolset you can use in Grasshopper, but it's also a framework, so we can use it to empower our software solutions. And, of course, the sky's the limit: you can use it anywhere, for any purpose.
OK, design now. So we have our considerations; how do they translate into actual actions? As a high-level overview of the design we've sketched: we are building a code-and-infrastructure framework.
So it's not code only. It's not infrastructure only. It's a combined framework that is responsible for both of those aspects. So it's almost a platform.
The main capability of this platform is to run queries, either on demand in Revit or scheduled on the cloud, in this case via APS or Design Automation for Revit. The solution is meant to be accessible to users locally in Revit, if they are working with the Revit checks side of things, and on the web, if we're talking about global management, so globally scheduled extractions that are meant to build any sort of organization-scale dashboards or insights.
The solution uses BHoM as its core framework, empowering the ability to extract, persist, and process the data. We'll talk about that a bit more later.
And very important: the database is non-relational, because as you can imagine, the BHoM schema is layered. We have nested properties and nested objects in our schema, which means that any sort of relational database would be challenging to maintain and curate. So it's a non-relational database from the very beginning.
And if we want to reflect this data in any tabular form, let's say in Power BI, then we use the SQL Server Agent to create the views that are then consumed by the particular reflection implementations.
Architecture: at a high level, it looks like the diagram on the screen. We've got the web and desktop UIs that enable us to build the checks, which are stored on the cloud. Then we have the extraction and processing step, during which we interact with the Revit models. This is done on the desktop if we're speaking of checks run by individual users, or via APS endpoints if the extractions are more focused on metrics.
Then Design Automation for Revit, so headless Revit execution in the cloud, is still in the investigation stage, but it's definitely something we would like to enable over the next year, because then desktop-style Revit processing will also be possible on the cloud. That gives us a massive boost, because the checks can be run automatically without involving the user or setting up a virtual machine where we would schedule these tasks. Persistence is all on the cloud, and we have a second collection, which is the result collection.
And finally, post-processing and reflection: as already mentioned, a SQL agent transforms the data into tabular form to be reflected in Power BI. In parallel, we can also use the Mongo adapter that we have in the BHoM framework to extract the data from the database, deserialize it, and reflect it, for example in Grasshopper to inspect it, or in Revit to action it if the check or query is reactive and the user can make use of it in Revit.
Workflow: it's a different flavor of the previous flowchart. We start on the left with the query storage and extract the query from it. We run it against the model, the result is stored in the database, and then we can either preview it on the user's machine using the Mongo adapter that we have in the BHoM, or translate the data to reflect it in any other tabular form, or aggregate it with data coming from other sources.
And of course we will have more than one query: a naming query is one, a dimensioning query would be another, any sort of annotation query a third, correctness of annotation another. Each of them is a separate query whose output is stored as a result, and then they can be processed and reflected either in batches or individually, as in the sketch below.
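Putting those steps together, a rough sketch of the loop might look like this, reusing the adapter interface sketched earlier; the query interface, collection names, and result shape are illustrative assumptions, not the production code.

```csharp
// Hedged sketch of the query workflow: pull stored queries, run each
// against the model, persist the results. Names are illustrative.
using System.Collections.Generic;

public interface IModelQuery
{
    string Name { get; }
    object Run(object revitDocument); // executes Revit API logic against the open model
}

public static class QueryRunner
{
    public static void RunAll(IBHoMAdapter queryStore, IBHoMAdapter resultStore, object revitDocument)
    {
        // 1. Pull the stored query definitions from the query collection.
        foreach (object stored in queryStore.Pull(new { Collection = "queries" }))
        {
            if (stored is not IModelQuery query)
                continue;

            // 2. Run the query against the model (desktop Revit or, later, DA4R).
            object result = query.Run(revitDocument);

            // 3. Persist the output to the result collection for dashboards or reactive fixes.
            resultStore.Push(new[] { new { Query = query.Name, Result = result } });
        }
    }
}
```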
Code architecture: I will not talk in detail about this slide because I don't think we have time for it. But what is very important in this diagram is that it has two sides. On the left of the dotted line, we have the desktop solution, which is a fully native Revit solution without any involvement of APS or Forge. This is the buy-in that we prototyped at the beginning to start solving the problems users deal with on a daily basis.
So we've built this Revit side where you can create a query, persist it, and then run it. Once you run it, the output of the query is also persisted in the database. And then, as a Revit user, you can see the output.
You can see the result of the query and also action it if there is a way of actioning it: you can select the elements, show them in a view, or fix something, for example wrong worksets. Then you click a Fix Worksets button, and that's how it works.
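To illustrate what a reactive fix of this kind could look like under the hood, here is a minimal, hedged sketch using the standard Revit API; the target workset name, the helper name, and the element selection are assumptions for illustration, not the actual Buro Happold command.

```csharp
// Hedged sketch of a "fix worksets" style action using the Revit API.
// "targetWorksetName" and the element selection are illustrative assumptions.
using System.Collections.Generic;
using System.Linq;
using Autodesk.Revit.DB;

public static class WorksetFixer
{
    public static void MoveToWorkset(Document doc, IEnumerable<ElementId> elementIds, string targetWorksetName)
    {
        if (!doc.IsWorkshared)
            return;

        // Find the target user workset by name.
        Workset target = new FilteredWorksetCollector(doc)
            .OfKind(WorksetKind.UserWorkset)
            .FirstOrDefault(w => w.Name == targetWorksetName);

        if (target == null)
            return;

        using (Transaction t = new Transaction(doc, "Fix worksets"))
        {
            t.Start();
            foreach (ElementId id in elementIds)
            {
                // The element's workset is exposed as the built-in ELEM_PARTITION_PARAM parameter.
                Parameter p = doc.GetElement(id).get_Parameter(BuiltInParameter.ELEM_PARTITION_PARAM);
                if (p != null && !p.IsReadOnly)
                    p.Set(target.Id.IntegerValue);
            }
            t.Commit();
        }
    }
}
```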
And, of course, at this point we are already able to build the first insights. This also buys us time and support from the practice to build the cloud-based, automated, scheduled version of the queries and extractions. So this is the approach we took, and it gave us the initial boost to build this product.
About databases: as mentioned before, the semantic and persistence layers are the most important in this exercise, because the semantic layer is actually the object model, that is, how we define and represent the objects we work with. We are very conscious that the objects need to be well-defined and have an efficient schema, but at the same time one that doesn't lead us to the point where we keep regenerating similar schemas for similar objects coming from different sources.
So we need to be very conscious about how we create our schema, so that it works for multiple workflows in different contexts and we don't reinvent the wheel all the time by representing similar objects in slightly different ways. And thanks to the fact that we are doing it in the BHoM framework, we have BHoM-compliant data structures, which automatically gives us the BHoM adapters. That means the persistence layer almost sorts itself out, as long as we set up the database correctly, because the adapter comes for free.
And since the objects in the BHoM framework are very minimalistic, as you could see in the example of the circle definition, reproducing the schemas for use in relational databases is relatively simple, because we can derive the schema directly from the object definitions that we have.
For now, we have only two collections, which is very interesting because apparently you can achieve that much with two collections: one of them being the queries and the other one the results. They are in MongoDB, as mentioned before, because our team is sort of MongoDB native. The big value of having only two collections and a flexible schema (flexible in the sense that we can add more types and more object definitions without adding more schemas and collections) is that it promotes speed of prototyping and development, because we don't need to build another collection or another database for every type we add to our ecosystem.
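As an illustration of the two-collection setup, accessing it directly with the official MongoDB C# driver might look roughly like this; the connection string, database name, and document shapes are assumptions, and in the workflow described here the BHoM Mongo adapter plays this role.

```csharp
// Hedged sketch of the two-collection setup using MongoDB.Driver.
// Connection string, database and collection names are illustrative.
using MongoDB.Bson;
using MongoDB.Driver;

public static class BimStore
{
    public static void Example()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        IMongoDatabase db = client.GetDatabase("bim_processing");

        // One collection for query definitions, one for their results;
        // BsonDocument keeps the schema flexible for nested BHoM objects.
        IMongoCollection<BsonDocument> queries = db.GetCollection<BsonDocument>("queries");
        IMongoCollection<BsonDocument> results = db.GetCollection<BsonDocument>("results");

        // Store a query definition and a corresponding result document.
        queries.InsertOne(new BsonDocument { { "Name", "WallNamingCheck" }, { "Discipline", "Structural" } });
        results.InsertOne(new BsonDocument
        {
            { "Query", "WallNamingCheck" },
            { "Model", "ProjectA_STR.rvt" },
            { "Failures", new BsonArray { "Wall-001", "Wall-017" } }
        });
    }
}
```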
And again, the second layer is built on SQL Server, and we use the SQL Server Agent to translate the data so it can be reflected in dashboards. Processing has also been mentioned a few times already, but to repeat: we have the desktop Revit processing module that allows us to run the queries directly in Revit and automatically show their outputs. And if there is a particular action related to the query, we can expose that action to the user.
The APS (Autodesk Platform Services) APIs enable us to build scheduled, periodic extraction tasks run by a service set up on a virtual machine. Here we mainly make use of the Indexes API and the Model Derivative API; these are the two APIs that are most powerful, I would say, in terms of scraping the models at this point. I know there is a new one, I think still in beta, that uses GraphQL for extraction, so that's a massive boost that we will definitely investigate as well.
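As a hedged example of one building block of such a scheduled extraction, the sketch below requests a model's metadata through the Model Derivative API with a plain HttpClient; the access token and URN are assumed inputs, and the endpoint shape should be verified against the current APS documentation before relying on it.

```csharp
// Hedged sketch: fetching model metadata via the APS Model Derivative API.
// Token acquisition and the base64-encoded URN are assumed to exist elsewhere.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class ApsExtraction
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> GetModelMetadataAsync(string accessToken, string base64Urn)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Get,
            $"https://developer.api.autodesk.com/modelderivative/v2/designdata/{base64Urn}/metadata");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

        HttpResponseMessage response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Returns the JSON list of viewable metadata (model views / GUIDs),
        // which a scheduled service could then walk to pull element properties.
        return await response.Content.ReadAsStringAsync();
    }
}
```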
And finally, Design Automation for Revit: that's something we are still investigating but haven't gotten to yet. UI: a very simple WPF on the Revit side and Blazor on the website; nothing outside the standard stuff here. One important bit is that, to some extent, we are using the same view model for the checks, so that whenever we add a check we only need to modify one view model, minimizing the effort of adding new queries and workflows.
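As a small, hypothetical sketch of what such a shared check view model could look like (the property names are assumptions, not the actual implementation):

```csharp
// Hedged sketch of a single, shared view model for checks, so that adding
// a new check only means adding a new entry, not a new UI. Names are illustrative.
using System.Collections.ObjectModel;

public class CheckViewModel
{
    public string Name { get; set; }          // e.g. "Wall naming convention"
    public string Description { get; set; }   // shown in both the WPF and Blazor UIs
    public string Status { get; set; }        // e.g. "Passed", "Failed", "Not run"
    public ObservableCollection<string> FailingElementIds { get; } = new();
    public bool CanFix { get; set; }          // exposes a reactive fix command when true
}
```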
Interpretation has been mentioned many times today. There are also reactive fixes as Revit commands: as mentioned, if you have a Revit query whose output can be post-processed, then you have a Revit command to do the post-processing.
And finally, prototyping and investigation can be done using Grasshopper, because in Grasshopper we have a UI for the BHoM that lets us create a BHoM adapter. Literally with a few clicks, you can start querying the database for items and then manipulate and investigate them on the Grasshopper canvas, which is absolutely beautiful and super helpful.
Implementation: the actual implementation, mainly the organizational aspects of implementing the design. First of all, it's a long-term investment. It's very important to highlight that this is not something built overnight, or in one week, or even in one year. We started in 2022 with brainstorming, planning, diagramming, and some early prototyping.
This year has been mainly development and testing: lots of prototyping, research detours, and a very agile approach that will become more constrained in the upcoming years, when we will be releasing and deploying the tool across the entire practice. For now, we have been prototyping and working only with super users on chosen projects. So it's a really big undertaking that is still in progress.
Its current status, as mentioned, is the development and testing stage. And, very important, an agile approach: we are still learning from the practice what the exact needs are and what the exact shape of the final product should be. We have the vision, the framework, the large-scale solution.
But the implementation details-- what sort of checks, where the button should be, how users would prefer the UI set up-- are things that take time to research. If you ask a user, you don't get a definite answer right away, and if you ask 10 users, you usually get 10 different answers. So that's something we are still researching while developing, and it couldn't be done with a waterfall approach, which is why we are very agile about it.
And to be even more agile, we store everything on-prem. We haven't deployed anything to any cloud platform yet, because that would mean spending more effort on setting things up. But the ultimate goal is to migrate everything to Azure.
Just to give you a bit of a flavor of what I mean by being agile and learning from the practice: literally in August, so roughly two months ago, we had a community event called Automation Lab, during which one of the teams worked on QA/QC for MEP models. And even though I had spoken to many people about this topic many times in the past, we still had newcomers, as well as people we had talked to before but who came with new ideas, challenging each other, talking in one room for many hours, and coming up with weird and wonderful ideas of how things could be checked, how they could be actioned, and how we could improve our workflows. That's something you wouldn't get if you just asked people.
So this is the community aspect that you also need to involve in this process if you really want to make the tool useful, and it's still a part of our development process. We are very agile, and we are listening to the community to build something useful for the community. That's the core ideal of what we are doing in the BHoM team and at Buro Happold in general, within the technology team.
The future: first of all, we need to finalize and deploy this massive product that we are building. Then-- though it will probably start sooner-- the first step in making wider use of this tool is linking up data from other domains to gain more targeted and more complete insights. There is also assisted or semi-automated model fixing. I already mentioned that we can action certain queries in Revit to fix the models, but this is still pretty basic, like fixing worksets. In the long term, we could start thinking of more intelligent solutions.
There is also the Design Automation for Revit platform, which would allow us to do those checks and fixes on the cloud without user intervention. That is a massive opportunity that we are investigating while building this tool. And, of course, decision-making, machine learning, and AI as well. We do have undertakings in this area within our practice, but they are very targeted, and we don't have a common data platform for them.
So here, by talking about machine learning and AI, I mean a platform that enables doing machine learning and AI at this wider, more curated scale.
Finally, the summary; we're heading towards the end of this presentation. It's important to highlight that building such a product is not easy. It's not trivial to identify realistic needs and requirements, as you saw a few slides ago. Users keep pumping out ideas about what they need, and they also tend to forget what they needed a year, a month, or a week ago, which means we need to be very realistic about the needs and requirements and how to implement them.
It's a long-term undertaking. It can't be done quickly, because combining knowledge and skills from so many domains-- databases, programming, BIM as such, engineering-- while avoiding the pitfalls is not something that can be done quickly.
Design: our design is different from most tools on the market because, as mentioned, it's more complete. It allows not only batch extractions but also targeted queries executed in the language of Revit, which of course comes at the cost of complexity, because we need to build all these UIs and an engine that is very much complex-- or maybe not complicated, but complex-- and that needs to support multiple platforms and multiple adapters.
So that's the cost we pay for being able to support many different workflows. But we believe it's a more sustainable way to build such a data extraction and processing environment than blindly taking one direction with one solution, because if something changes along the way, it's hard to realign. And that's why we took the very specific approach that I presented today.
And will it be a success? I think time will tell. So thank you very much, and you're welcome to ask questions. Thank you.