BHP has spent the past 14 months establishing a “logical data fabric” that makes it easier to find and access data stores from across its worldwide operations for data-driven projects.
The miner has deployed Denodo data virtualisation clusters in all of its major geographies, and alongside its main cloud-based data lake.
Data virtualisation leaves data in source systems but exposes “an integrated view of all the data to data consumers”, according to Denodo. “As business users drill down into reports, data virtualisation fetches the data in real time from the underlying source systems.”
BHP principal solutions architect Joshua Fletcher told Denodo’s Datafest 2020 conference that the miner had previously physically copied data into a new repository every time it was needed for a new project.
“We were continually repeating the engineering effort required to access that information,” Fletcher said.
“Before our data scientists or data analysts could access this information, they had to rely on the ingestion or loading of that data, increasing the delivery timeframe as well as the delay in the value that we were going to get from that data solution.
“Because we were creating project-centric data repositories, that meant we had multiple copies of the same data. That also increased our total cost of ownership.”
Fletcher said BHP had recently created a data strategy to underpin its move to become a “data-driven organisation”.
“We wanted to improve the way data professionals within BHP access and consume that information, as well as improving the quality of the data that they use,” he said.
“We wanted to provide data plumbing that would enable our assets [mostly mines] within our organisation to leverage the data that they create as well as some of our global data stores.
“We also wanted to build reusable data assets that multiple people could access … in a timely and effective way.”
The data strategy led BHP to deploy a data virtualisation platform across its worldwide operations, comprising a number of interconnected Denodo clusters.
“We have data marts, data warehouses and data lakes spotted throughout the organisation,” Fletcher said.
“We put together two Denodo clusters in Perth and Brisbane, another Denodo cluster in Houston [for the US operations], a fourth cluster in our Santiago data centre [for BHP’s South American operations], and then we have a data lake in a cloud platform, so we also put a Denodo cluster co-located with that data lake.
“These aren’t standalone clusters; we wanted to make sure they were interconnected, so if you’re a user in Santiago and you request some global data that’s held in the Houston data centre, you don’t have to know that you get that data out of Houston.
“The user just queries their local cluster - the Santiago instance - and that will then be able to reference the Houston cluster to go and request that information.”
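The routing Fletcher describes can be pictured with a small sketch. It is illustrative only: the `Cluster` class, its methods, the dataset names and the sample values are all hypothetical, and none of it is Denodo's API or BHP's configuration. It simply shows a user querying a local cluster, which forwards the request to a connected peer when it does not hold the dataset itself.

```python
# Illustrative only: hypothetical names and placeholder data, not Denodo's API.
class Cluster:
    def __init__(self, name, local_datasets):
        self.name = name
        self.local_datasets = dict(local_datasets)  # dataset name -> rows
        self.peers = []

    def connect(self, other):
        # Interconnect two clusters in both directions.
        self.peers.append(other)
        other.peers.append(self)

    def query(self, dataset):
        """Return (rows, name of the cluster that served them)."""
        if dataset in self.local_datasets:
            return self.local_datasets[dataset], self.name
        # Not held locally: ask each connected peer on the user's behalf.
        for peer in self.peers:
            if dataset in peer.local_datasets:
                return peer.local_datasets[dataset], peer.name
        raise KeyError(f"dataset {dataset!r} not found in any connected cluster")


santiago = Cluster("santiago", {"local_output": [{"site": "A", "tonnes": 1.0}]})
houston = Cluster("houston", {"global_reference": [{"code": "X", "value": 2.0}]})
santiago.connect(houston)

# The user only ever talks to the Santiago cluster.
rows, served_by = santiago.query("global_reference")
print(served_by)
```

In this sketch the caller never names Houston; the local cluster works out where the data lives.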
Fletcher noted the data virtualisation model still meant that BHP had to ship data across regions.
“However, by using local Denodo instances in each of those data centres, any queries and combinations of multiple data sources there will be optimised, and the dataset will be aggregated where possible before its return to the user, so we’re reducing the overall footprint of the data that we’re moving,” he said.
“It also meant that we could make the same contextualised data assets visible in every one of those clusters, so no matter which one of them you log onto you see the same data assets that you can query.”
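A rough sketch of why aggregating close to the source shrinks the data that crosses regions is shown here. The function name, column names and readings are placeholders chosen for illustration, not anything from BHP or Denodo; the point is only that summarising at the cluster holding the data means fewer rows are shipped back.

```python
from collections import defaultdict


def aggregate_locally(rows, key, value):
    """Sum `value` per `key` at the cluster that holds the data."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, value: v} for k, v in sorted(totals.items())]


# Placeholder readings held at a remote cluster (hypothetical values).
remote_rows = [
    {"site": "A", "tonnes": 10.0},
    {"site": "A", "tonnes": 12.0},
    {"site": "B", "tonnes": 7.0},
    {"site": "B", "tonnes": 8.0},
    {"site": "B", "tonnes": 9.0},
]

# Only the aggregated rows would be sent across regions.
shipped = aggregate_locally(remote_rows, "site", "tonnes")
print(len(remote_rows), "rows held,", len(shipped), "rows shipped")
```

Here five detail rows stay in place and two summary rows travel to the user.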
The Denodo platform is designed to be queried directly by BHP data users; Fletcher said that “desktop connectors” were built “so that any user within the organisation is able to connect to the Denodo platform and query it with any tool they would like to.”
In addition, the company used “standardised and endorsed patterns” to provide BHP’s solution architects with “pre-built connectivity to some of our key data marts and data lakes.”
With the Denodo platform making some sensitive internal data stores more accessible, Fletcher said that security was made “a big focus”.
“Because we wanted this platform to handle our most highly confidential data, we made sure that we set up encryption both in transit - so any data transferred through the Denodo platform is encrypted - as well as anywhere we’re caching that information is also encrypted at rest,” he said.
“We use extensive auditing to ensure that we know that every single user we have [is] logged and [we] can report what queries they’ve used, and what data sources they’ve touched.”
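The kind of audit trail described, recording who ran which query against which sources, might look something like the sketch below. `AuditedGateway` and its fields are hypothetical names used for illustration and do not reflect how Denodo or BHP implement auditing.

```python
from datetime import datetime, timezone


class AuditedGateway:
    """Hypothetical query gateway that logs every access attempt."""

    def __init__(self, sources):
        self.sources = sources  # source name -> list of rows
        self.audit_log = []

    def query(self, user, query_text, source_names):
        # Record the attempt before touching any data.
        self.audit_log.append({
            "user": user,
            "query": query_text,
            "sources": list(source_names),
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return {name: self.sources[name] for name in source_names}

    def report(self, user):
        # Everything a given user has queried, in order.
        return [entry for entry in self.audit_log if entry["user"] == user]


gateway = AuditedGateway({"mart_a": [{"id": 1}], "lake_b": [{"id": 2}]})
gateway.query("analyst1", "SELECT * FROM mart_a", ["mart_a"])
gateway.query("analyst1", "SELECT * FROM lake_b", ["lake_b"])

for entry in gateway.report("analyst1"):
    print(entry["user"], entry["sources"], entry["query"])
```

Because the log entry is written first, even a query that fails on an unknown source still leaves a record.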
Fletcher said the data fabric was tested on several internal projects.
“One of the key projects we partnered with needed to connect multiple data sources,” Fletcher said.
“They originally were going to integrate that into a data mart.
“That was going to be quite a lengthy delivery timeframe. It also meant creating copies of the data before they were then able to build a dashboard on top of that data mart.
“As an alternative, we offered the use of data virtualisation to access that data in situ. We wouldn’t need to make a copy of it.
“Because we were using data virtualisation through Denodo, it also meant that we were able to iterate on the way that we were combining that information to present to the dashboard.”
Once a source is connected to Denodo, it becomes a standalone dataset that other teams can query more easily.
“We were able to publish that data as a standalone dataset that other users were then able to access, and that aligns with our data strategy in terms of build[ing] reusable data assets that multiple people can access,” Fletcher said.
BHP worked directly with system and data owners to smooth the path for those stores to be made accessible through Denodo.
Fletcher said that approach was already paying dividends.
“We had a really good example out of this project where one of the data owners started to direct all new requests for that same data through to the data virtualisation platform,” he said.
“They said, ‘I’ve published it there, it’s exactly the way that I want it to appear, and it’s secured in the most appropriate way’, so all users are now directed to that.
“We thought that was a really great win.”