Big Data System: Benefits and Introduction of Big Data Architecture

Dr. Manishika Jain- Join online Paper 1 intensive course. Includes tests and expected questions.

Topics

  • What is big data architecture?

  • Benefits of big data architecture

  • Working Model of big data architecture

  • Big data architecture components

  • MCQ

What is Big Data Architecture?

  • A big data architecture is designed to handle the ingestion, processing,

  • and analysis of data that is too large or complex for traditional database systems.

Big data architecture is designed to handle the following types of work:

[Figure: Big Data Architecture]

Benefits of Big Data Architecture

In big data, the volume of data is very large and keeps growing. Having the data is not enough; we need to be able to understand and use it in time to make important decisions, which is why a big data architecture is so important.

The benefits of big data architecture are:

  • Reducing costs

  • Saving time

  • Making faster, better decisions

  • Predicting future needs and creating new products

Introduction of Big Data Architecture

[Figure: Introduction of Big Data Architecture]

The components of big data architecture are:

  • Data sources.

  • Data storage.

  • Batch processing.

  • Real-time message ingestion.

  • Stream processing.

  • Analytical data store.

  • Analysis and reporting.

Data Sources

  • There are many sources of data. Big data consists of machine-generated and human-generated data, and it comes in many forms: structured, unstructured, and semi-structured.

  • Machine-generated data: cell phone GPS signals, machine logs, device data, etc.

  • Human-generated data: photos, video, audio, social media posts, text messages.

  • So, there is more than one source of big data.

For example:

[Figure: Data Sources]

Data Storage

Data for batch processing operations is usually kept in a distributed file store that can hold high volumes of large files in various formats. This type of store is often called a data lake.
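
The idea of a store that keeps files of different formats side by side can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a temporary local directory stands in for a distributed file store, and the helper names (`lake_path`, `store`), the `source=.../date=...` folder layout, and the sample files are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def lake_path(root: Path, source: str, day: str, filename: str) -> Path:
    """Build a partitioned path such as root/source=gps/date=2024-01-01/file."""
    return root / f"source={source}" / f"date={day}" / filename


def store(root: Path, source: str, day: str, filename: str, payload: bytes) -> Path:
    """Write raw bytes into the lake without changing their format."""
    target = lake_path(root, source, day, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


# A local temporary directory stands in for the distributed file store (assumption).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store(root, "machine_logs", "2024-01-01", "app.log", b"INFO start\nERROR disk full\n")
    gps_points = json.dumps([{"lat": 1.0, "lon": 2.0}]).encode()
    store(root, "gps", "2024-01-01", "points.json", gps_points)
    store(root, "photos", "2024-01-01", "img.bin", bytes([0, 1, 2, 3]))
    # List everything that landed in the lake, relative to its root.
    stored = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    print(stored)
```

Text logs, JSON, and binary files are stored as they arrive; nothing is converted to a common schema at write time, which is the property the term data lake usually refers to.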

For example:

Options for implementing this storage include:

[Figure: Data Storage]

Batch Processing

Because the data sets are so large, the architecture needs a solution that filters, aggregates, and otherwise prepares the data for advanced analysis. It does this with long-running batch jobs. These jobs usually read source files, process them, and write the output to new files.
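
The read, filter, aggregate, and write steps described above can be sketched as a small Python job. This is a minimal illustration, not a production batch framework: the function name `run_batch_job`, the CSV columns `region` and `amount`, and the sample files are assumptions made for the example.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def run_batch_job(source_dir: Path, output_file: Path) -> dict:
    """Read every CSV in source_dir, drop invalid rows, and total amount per region."""
    totals = defaultdict(float)
    for source in sorted(source_dir.glob("*.csv")):
        with source.open(newline="") as handle:
            for row in csv.DictReader(handle):
                try:
                    amount = float(row["amount"])
                except ValueError:
                    continue  # filter: skip rows whose amount is not a number
                totals[row["region"]] += amount  # aggregate per region
    # Write the result to a new file, leaving the source files untouched.
    with output_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "total"])
        for region in sorted(totals):
            writer.writerow([region, totals[region]])
    return dict(totals)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in"
    src.mkdir()
    (src / "day1.csv").write_text("region,amount\nnorth,10\nsouth,5\n")
    (src / "day2.csv").write_text("region,amount\nnorth,2.5\nsouth,bad\n")
    out = Path(tmp) / "totals.csv"
    result = run_batch_job(src, out)
    output_lines = out.read_text().splitlines()
    print(output_lines)
```

In a real deployment the same pattern would run over far larger files and usually on a cluster, but the shape of the job (sources in, new files out) stays the same.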

For example:

The batch processing is done in various ways such as:

[Figure: Batch Processing]

Real-Time Message Ingestion

  • If the solution includes real-time sources, the big data architecture must have a way to capture and store real-time messages.

  • This can be a simple data store or data mart that receives all incoming messages, which are dropped into a folder used for data processing.

  • Some solutions also require the message ingestion store to act as a buffer for messages and to support scale-out processing, reliable delivery, and other message queuing semantics.
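
The buffering role described above can be illustrated with Python's standard `queue` module. This is a sketch under stated assumptions: the class `IngestionBuffer` and its `publish` and `consume` methods are hypothetical names, the buffer lives in memory in one process, and it does not implement reliable delivery or the other queuing semantics a real message broker provides.

```python
import queue


class IngestionBuffer:
    """Hypothetical in-memory stand-in for a message ingestion store."""

    def __init__(self, capacity: int) -> None:
        self._queue = queue.Queue(maxsize=capacity)

    def publish(self, message: dict) -> bool:
        """Accept a message; return False instead of blocking when the buffer is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            return False

    def consume(self, max_messages: int) -> list:
        """Hand over up to max_messages in arrival order."""
        batch = []
        while len(batch) < max_messages:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch


buffer = IngestionBuffer(capacity=3)
accepted = [buffer.publish({"id": i}) for i in range(4)]  # the fourth message does not fit
first_batch = buffer.consume(max_messages=2)
second_batch = buffer.consume(max_messages=2)
print(accepted, first_batch, second_batch)
```

The buffer decouples producers from consumers: messages wait in arrival order until a downstream processor is ready to take them.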

For example:

[Figure: Real-Time Message Ingestion]

Stream Processing

  • Stream processing comes after real-time message ingestion.

  • It captures the real-time messages and processes them by filtering, aggregating, and otherwise preparing the data for analysis.

  • The processed stream data is then written to an output sink.
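
The filter, aggregate, and write-to-sink steps above can be sketched with Python generators. This is only an illustration: the message fields `ts` and `temp`, the function names, and the use of a plain list as the output sink are assumptions, and the windowing logic assumes messages arrive in timestamp order.

```python
from typing import Iterable, Iterator


def keep_valid(messages: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop messages without a numeric temperature reading."""
    for message in messages:
        if isinstance(message.get("temp"), (int, float)):
            yield message


def average_per_window(messages: Iterable[dict], window_seconds: int) -> Iterator[dict]:
    """Aggregate: emit the average temperature of each fixed-size time window."""
    current, readings = None, []
    for message in messages:
        window = message["ts"] // window_seconds
        if current is not None and window != current:
            # The window has closed, so emit its average and start a new one.
            yield {"window_start": current * window_seconds,
                   "avg_temp": sum(readings) / len(readings)}
            readings = []
        current = window
        readings.append(message["temp"])
    if readings:
        yield {"window_start": current * window_seconds,
               "avg_temp": sum(readings) / len(readings)}


incoming = [
    {"ts": 0, "temp": 20.0},
    {"ts": 4, "temp": 22.0},
    {"ts": 7, "temp": None},   # filtered out
    {"ts": 11, "temp": 30.0},
]
sink = []  # a list stands in for the output sink (assumption)
for result in average_per_window(keep_valid(incoming), window_seconds=10):
    sink.append(result)
print(sink)
```

Each message is handled as it arrives and results are emitted window by window, in contrast to a batch job, which waits for the whole input before producing output.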

For example:

[Figure: Stream Processing]

Analytics-Based Data Store

  • As soon as stream processing and batch processing are done, the data is sent for analysis.

  • For this, the data is collected in one place so that it can be analysed as a whole. Many big data solutions are designed to serve data for analysis.

  • This might take the form of a cloud-based data warehouse or a relational database, depending on your needs.
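
As a small-scale illustration of collecting processed data in one place for analysis, the sketch below loads batch and stream results into a relational table and queries it. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real analytical data store; the table name `sales`, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical outputs of the batch and stream processing stages.
batch_output = [("north", "2024-01-01", 12.5), ("south", "2024-01-01", 5.0)]
stream_output = [("north", "2024-01-02", 3.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day TEXT, total REAL)")
# Both processing paths land in the same table so they can be analysed together.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", batch_output + stream_output)
conn.commit()

rows = conn.execute(
    "SELECT region, SUM(total) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
print(rows)
```

A cloud-based data warehouse would be queried in much the same way, with SQL over structured tables, only at a much larger scale.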

Orchestration

  • Orchestration is an important part of a big data solution, because such solutions consist of repeated data processing operations. These repeated operations are encapsulated in one place, typically as a workflow.

  • The processed data is then loaded into the analytical data store, or the results are pushed straight to a report or dashboard.

  • Orchestration technology is used to make this whole process run automatically.
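
The idea of running a repeated sequence of processing steps automatically can be sketched with the standard `graphlib` module (Python 3.9 or later). This is a minimal illustration, not a real orchestration tool: the function `run_workflow`, the step names, and the shared `state` dictionary are all hypothetical, and there is no scheduling, retry, or failure handling.

```python
from graphlib import TopologicalSorter


def run_workflow(steps: dict, depends_on: dict) -> list:
    """Run each step once, after every step it depends on has finished."""
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        steps[name]()
    return order


state = {}  # shared data passed between steps (assumption for the sketch)
steps = {
    "process": lambda: state.update(processed=[x * 2 for x in state["raw"]]),
    "ingest": lambda: state.update(raw=[1, 2, 3]),
    "load": lambda: state.update(store=list(state["processed"])),
    "report": lambda: state.update(report=sum(state["store"])),
}
# Each step maps to the set of steps that must run before it.
depends_on = {"process": {"ingest"}, "load": {"process"}, "report": {"load"}}

executed = run_workflow(steps, depends_on)
print(executed, state["report"])
```

The steps are declared in arbitrary order, and the dependency graph alone decides the execution order, so the same workflow definition can be rerun every time new data arrives.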

For example:

[Figure: Analytics Based Data Store: Orchestration]

MCQ

Q1. _____ captures the real-time messages.

1. Batch processing

2. Real time message ingestion

3. Stream processing

4. Data store

Ans: 3. Stream processing

Q2. Where is the data generated by humans mostly stored in a big data architecture?

1. Batch processing

2. Real time message ingestion

3. Stream processing

4. None of these

Ans: 2. Real time message ingestion