How to build a CI/CD Pipeline in AWS using CodeCommit, CodeDeploy, CodePipeline: Hands-on!

This tutorial walks through setting up continuous integration and continuous deployment (CI/CD) of your application from your local system to a QA, staging, or production server. Follow the steps below to get it done.

Step-1: Install Git and configure it on your local system

Go to git-scm.com, download Git, and install it. Then configure Git on your local system using the commands below.

$ git config --global user.name "Trilochan Parida"
$ git config --global user.email "tri***@gmail.com"

Step-2: Create a CodeCommit repository (YouTube)

  1. Go to the CodeCommit service in the AWS console.
  2. Create a new repository with your project name.
  3. Copy the clone URL → Clone HTTPS.

Before cloning the repository to your local system, you need credentials to access it, so let’s create those first.

Step-3: Create a new AWS IAM user and generate Git credentials (YouTube)

  1. Add the user and create a group.
  2. Attach the IAM policies “AWSCodeCommitFullAccess” and “AWSCodePipelineFullAccess”.
  3. Open the user’s Security Credentials → HTTPS Git Credentials → Generate.

Step-4: Git clone (YouTube)

Using the credentials above, we can now clone the repository from AWS CodeCommit to the local system. The command is:

$ git clone <URL>

While we are still in the AWS IAM console, let’s create two roles for later use: one for CodeDeploy, and one that lets EC2 access the S3 artifacts.

Step-5: Create service roles for EC2 and CodeDeploy (YouTube)

  1. Create an IAM role for the CodeDeploy service and attach the existing IAM policy “AWSCodeDeployRole”.
  2. Create an IAM role for the EC2 service and attach the existing IAM policy “AmazonS3ReadOnlyAccess”.

Step-6: Push the code and understand appspec.yml (YouTube)

Now put your application into the folder you cloned from AWS. Remember to create an appspec.yml file in that folder to automate the code deployment. I recommend watching my video for a better understanding.

appspec.yml

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html/
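
The commit-and-push part of this step can be rehearsed locally before touching CodeCommit. The sketch below is only an illustration: it uses a local bare repository as a stand-in for the CodeCommit clone URL, and the directory names (remote.git, my-app), the index.html file, the placeholder identity, and the commit message are all made up for the example. Against the real repository you would run only the add, commit, and push commands inside your clone.

```shell
# Dry run of the Step-6 push, with a local bare repo standing in for CodeCommit.
workdir="$(mktemp -d)"          # scratch directory for the rehearsal
remote="$workdir/remote.git"    # hypothetical stand-in for the CodeCommit clone URL
app="$workdir/my-app"           # hypothetical local clone folder

git init --quiet --bare "$remote"
git clone --quiet "$remote" "$app" 2>/dev/null   # hides the "empty repository" warning

# Put the application and the appspec.yml from this step into the clone.
echo '<h1>Hello from CodeDeploy</h1>' > "$app/index.html"
cat > "$app/appspec.yml" <<'EOF'
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html/
EOF

# Stage, commit, and push whatever branch is currently checked out.
git -C "$app" add .
git -C "$app" -c user.name="Your Name" -c user.email="you@example.com" \
    commit --quiet -m "Add application code and appspec.yml"
git -C "$app" push --quiet origin HEAD
```

Pushing HEAD avoids hard-coding a branch name, which may be main or master depending on how Git is configured.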

Step-7: Set up an EC2 instance, install Nginx, and install the CodeDeploy agent (YouTube)

  1. Launch an EC2 instance in the default VPC with instance type t2.micro, select the EC2 IAM role created in Step-5, and add a tag (key and value) that CodeDeploy will use to identify the instance.
  2. Install Nginx (the apt commands below assume an Ubuntu or Debian AMI and a root shell; otherwise prefix them with sudo)
$ apt update
$ apt install nginx
  3. Install the CodeDeploy agent
$ apt install ruby
$ apt install wget
$ wget https://aws-codedeploy-ap-south-1.s3.ap-south-1.amazonaws.com/latest/install   # URL for the Mumbai region
$ chmod +x ./install
$ ./install auto

Now start the CodeDeploy agent service with the command below.

$ service codedeploy-agent start
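
The install URL in Step-7 embeds the region code twice, so it needs to change if your instance is not in Mumbai. As a small sketch, the helper below builds the URL from a region code. The function name codedeploy_install_url is made up for this example, and it assumes the bucket naming pattern seen in the Mumbai URL also applies to your region, so check the AWS documentation before relying on it.

```shell
# Build the CodeDeploy agent installer URL for a given AWS region code.
# Assumption: other regions follow the same pattern as the Mumbai URL in Step-7.
codedeploy_install_url() {
  region="$1"
  printf 'https://aws-codedeploy-%s.s3.%s.amazonaws.com/latest/install\n' "$region" "$region"
}

# Prints the same URL used in Step-7 for the Mumbai region.
codedeploy_install_url ap-south-1

# On the instance you could then download the installer with, for example:
#   wget "$(codedeploy_install_url ap-south-1)"
```

Once the agent has been started, running service codedeploy-agent status should report that it is running.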

Step-8: Create an application in CodeDeploy (YouTube)

  1. Go to the CodeDeploy service console.
  2. Create a new application with your application name and set the compute platform to EC2/On-Premises.
  3. Create a deployment group: give it a name, select the CodeDeploy service role created in Step-5, set the deployment type to “in-place”, and under environment configuration select Amazon EC2 instances with Key=Name and Value=“tag value of EC2”.

Step-9: Create a CodePipeline pipeline (YouTube)

  1. Go to the AWS CodePipeline service console.
  2. Create a pipeline with a name and keep the default settings.
  3. Add Source Stage → Source provider: AWS CodeCommit → Select your repository → Select your branch → Detection option: CloudWatch Events
  4. Add Build Stage → Skip
  5. Add Deploy Stage → Deploy provider: AWS CodeDeploy → Enter application name → Enter deployment group

Step-10: Test your application and watch the complete video

Open your browser and enter the public IP address of your EC2 instance. Congratulations! You have successfully deployed your code.

Please share and subscribe.
