Posts

Featured post

Make an Executable Bash Script

Since I am using Windows Subsystem for Linux (WSL), I have to run the same set of commands every time Ubuntu starts, and I thought of automating them with a bash script. Let's see how to create a bash script that can be executed from the command line with a few keystrokes.

1. Create an empty file with a .sh extension.
2. Add #!/bin/bash at the start of the script, followed by the commands you want to execute (see the sample script after this list).
3. Save the file by pressing CTRL+X, then Y, then Enter. The file is not yet executable, which you can verify with the ls -lrt command.
4. Make the script executable by changing its permissions with the chmod command.
5. Verify that the file is now executable by running ls -lrt again; the file name's color in the listing also changes from white to green.
6. Execute the script with ./script.sh
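As a minimal sketch (the commands inside are placeholders, since the original sample file is not shown in this excerpt), script.sh might look like this:

#!/bin/bash
# Commands to run at every Ubuntu/WSL start - placeholders, replace with your own
sudo service ssh start      # example: start the SSH daemon
sudo service cron start     # example: start the cron scheduler

Then make it executable and run it:

chmod +x script.sh   # add the execute permission
./script.sh          # run the script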

Download Learning Spark V2 from O'Reilly for FREE!!! (2021)

Apache Spark™ is a unified analytics engine for large-scale data processing. Spark can run up to 100x faster than Hadoop's MapReduce in memory and up to 10x faster when processing data on disk.

Learning Spark V2 is the best book for learning Apache Spark. It includes the latest updates on new features from the Apache Spark 3.0 release, to help you:

- Learn the Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets
- Inspect, tune, and debug your Spark operations with Spark configurations and the Spark UI
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow
- Use Koalas, the open source pandas framework, and Spark for data transformation and feature engineering

You can get it absolutely for FREE from Databricks!! Go to the link below and fill in your details to download it.

Download Learning Spark V2

Get AWS Certified: Solutions Architect Challenge - 50% Off (Valid until December 4, 2021) (Updated)

Join this challenge and set a goal of earning AWS Certified Solutions Architect - Associate. Follow a recommended preparation path to earn your certification before AWS re:Invent 2021. Get exam ready with free training, including our new series on Twitch, AWS Power Hour: Architecting. By signing up, you'll also be eligible for a 50% discount voucher for the exam. Terms and conditions apply.

Solutions Architect Challenge

This challenge ends December 4, 2021.

Why should you take the challenge? Set and achieve your goal with peers from around the world and support from AWS Training and Certification. You'll get free recommended training, suggested resources, Q&A opportunities, and encouragement along the way. You'll also be eligible for a 50% discount voucher for the exam (terms and conditions apply).

Why should you earn this AWS Certification? Embrace new skills, show your potential, and plan a career trajectory.

Install Apache Sqoop on Ubuntu 20.04 LTS (2021)

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. In this post, let's see how to install Apache Sqoop on Ubuntu.

Step 1: Download Apache Sqoop

Download the Apache Sqoop binary file from the official download page: Apache Sqoop

Note: As of June 2021, the Apache Sqoop project has been retired, since there has been no development after Sqoop version 1.4.7.

Step 2: Untar the Sqoop binary file

Untar and rename the Sqoop binary file using the commands below:

tar -xvzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop

Step 3: Configure Sqoop

Configure Sqoop by executing the commands below:

cd sqoop/conf
mv sqoop-env-template.sh sqoop-env.sh
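The excerpt stops here; as a rough sketch, the next step would be to edit sqoop-env.sh so Sqoop can find your Hadoop installation. The paths below are assumptions (Hadoop under /usr/local/hadoop); adjust them to match your environment:

# sqoop-env.sh - assumed paths, change to match your install
export HADOOP_COMMON_HOME=/usr/local/hadoop   # Hadoop common libraries
export HADOOP_MAPRED_HOME=/usr/local/hadoop   # MapReduce libraries
# export HIVE_HOME=/usr/local/hive            # optional: only needed for Hive imports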

Install Apache Spark on Ubuntu 20.04 LTS (2021)

Apache Spark™ is a unified analytics engine for large-scale data processing. Spark can run up to 100x faster than Hadoop's MapReduce in memory and up to 10x faster when processing data on disk.

Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and scale up to incredibly large data processing workloads.

In this post we will see how to install Apache Spark on Ubuntu 20.04. Before installing Spark, make sure you have Java and Python installed in your system.
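The excerpt is cut off here; as a rough sketch of the usual approach (the Spark version, download URL, and install location below are assumptions; check spark.apache.org/downloads for the current release):

# Download and extract a Spark release into your home directory (assumed version 3.1.2)
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
tar -xvzf spark-3.1.2-bin-hadoop3.2.tgz
mv spark-3.1.2-bin-hadoop3.2 spark

# Point SPARK_HOME at the install and put its binaries on PATH (e.g., in ~/.bashrc)
export SPARK_HOME=$HOME/spark
export PATH=$PATH:$SPARK_HOME/bin

spark-submit --version   # quick sanity check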

Install Hive on Ubuntu 20.04 with MySQL Integration (2021)

Apache Hive™ is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Hive internally converts SQL queries into MapReduce jobs, so developers don't have to learn Java programming and can use SQL to query the data in HDFS. All Hive implementations require a metastore where the metadata of all the Hive objects is stored. The metastore must be a JDBC-compliant relational database; by default Hive uses the Apache Derby database as its metastore. Apache Derby is suitable only where an application needs a small embedded RDBMS. In this post, we will see how to install Hive with MySQL as its metastore.

MySQL Installation

Step 1: Installing MySQL server

Update your Ubuntu package list by executing the command below.
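On Ubuntu, the usual commands for this step are:

sudo apt update                  # refresh the package list
sudo apt install mysql-server    # install the MySQL server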

Install Hadoop on Ubuntu 20.04 (2021)

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

Step 1: Installing Java

Hadoop is written in Java, so we need to install Java before installing Hadoop, as all the Hadoop daemons run as JVM processes. You can install OpenJDK 8 from the default apt repositories:

sudo apt-get update
sudo apt install openjdk-8-jdk

Once the installation is completed, you can verify it by checking the Java version.
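For example, a quick check of the installed version (the exact output varies with your build):

java -version   # prints the installed OpenJDK version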