Where is data stored?

A company’s data architecture describes how data is collected, stored, transformed, distributed, and consumed. It includes the rules governing structured formats, such as databases and file systems, and the systems that connect data with the business processes that consume it. In general, a data architecture is a master plan of the enterprise’s data locations, data flows, and data availability. It provides the conceptual infrastructure that supports data quality, data stewardship, data integration, data migration, and system collaboration.

[Figure: data architecture overview (Data_Architecture_mine2.png)]
  • ETL stands for “extract, transform, load”. It is the procedure by which data is copied from multiple sources, cleaned and transformed into the proper structure, and loaded into the target database, the data warehouse.

  • The data warehouse is a central repository of information. Data regularly flows into the data warehouse, where it is stored in a structured, integrated, and consistent format. This allows business intelligence (BI) analysts to build queries that pull the data from the warehouse into reports.

  • In contrast, the data lake includes unstructured and semi-structured data. To retrieve information from the data lake, you use methods such as big data analytics, full-text search, or machine learning, which do not require data in a tabular format.

  • A data mart is a data warehouse that serves the needs of a specific team or business unit, like finance, marketing, or sales. It is smaller, more focused, and may contain summaries of data that best serve its community of users.
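To make the components above more concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the data warehouse. The source records, the `orders` table, and the `sales_mart` view are all hypothetical names invented for illustration, and a real ETL pipeline would normally rely on dedicated tooling rather than a single script.

```python
import sqlite3

# Hypothetical source systems (assumption): a CRM export and a web-shop export.
crm_rows = [
    {"customer": " Alice ", "unit": "sales", "amount": "120.50"},
    {"customer": "Bob", "unit": "marketing", "amount": "80"},
]
shop_rows = [
    {"customer": "alice", "unit": "Sales", "amount": "30.00"},
    {"customer": "Carol", "unit": "finance", "amount": None},  # incomplete record
]


def extract():
    """Extract: copy the raw records from every source."""
    return crm_rows + shop_rows


def transform(rows):
    """Transform: drop incomplete records, normalise names, units and types."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append((
            row["customer"].strip().title(),
            row["unit"].strip().lower(),
            float(row["amount"]),
        ))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer TEXT, unit TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def build_warehouse():
    """Run the ETL steps and add a data mart on top of the warehouse."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load(conn, transform(extract()))
    # Data mart sketched as a view: a summarised subset for the sales unit only.
    conn.execute(
        """CREATE VIEW sales_mart AS
           SELECT customer, SUM(amount) AS total
           FROM orders WHERE unit = 'sales' GROUP BY customer"""
    )
    return conn


warehouse = build_warehouse()

# A BI-style query against the warehouse: revenue per business unit.
report = warehouse.execute(
    "SELECT unit, SUM(amount) FROM orders GROUP BY unit ORDER BY unit"
).fetchall()

# The sales team queries only its own data mart.
sales = warehouse.execute("SELECT customer, total FROM sales_mart").fetchall()

print(report)
print(sales)
```

The sketch models the data mart as a view on the warehouse, which is only one possible design; a mart can also be a separate, physically copied database. A data lake is not shown, because its contents would not fit this tabular layout in the first place.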

Here and here are some further explanations of what a data warehouse is.

We found a good article by McKinsey that provides some tips on how to build a data architecture that drives innovation:

Create a data architecture that is

  • Modular & simple, to ensure flexibility and scalability

  • Easily accessible, to ensure that managers can analyse data without the bottleneck of BI

  • Bottom-up, to ensure fit with business unit’s needs

  • Cloud-based, to lower the expertise required, speed up deployment and reduce operational overhead