Posts

Showing posts with the label aws

TOP PYSPARK INTERVIEW QUESTIONS 2023

What is Apache Spark and how does it differ from Hadoop?
What are the benefits of using Spark over MapReduce?
What is a Spark RDD and what operations can be performed on it?
How does Spark handle fault tolerance and data consistency?
Explain the difference between Spark transformations and actions.
What is a Spark DataFrame and how is it different from an RDD?
What is Spark SQL and how does it work?
How can you optimize a Spark job to improve its performance?
How does Spark handle memory management and garbage collection?
Explain the role of the Spark Driver and Executors.
What is PySpark and how does it differ from Apache Spark?
How do you create a SparkContext in PySpark? What is the purpose of SparkContext?
What is an RDD (Resilient Distributed Dataset)? How is it different from DataFrame and Dataset?
What are the different ways to create an RDD in PySpark?
What is the use of the persist() method in PySpark? How does it differ from the cache() method?
What is the use of broadcast variables in PySpark?
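The questions above are conceptual, but a short runnable example makes several of them concrete. The sketch below is not from the original post; the app name and sample data are made up. It shows how a SparkSession/SparkContext is created, how lazy transformations differ from actions, how persist() relates to cache(), and how a broadcast variable is used.

# A minimal PySpark sketch (not from the original post) illustrating a few of
# the concepts asked about above. The app name and sample data are placeholders.
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("interview-demo").getOrCreate()
sc = spark.sparkContext  # the SparkContext behind the session

# Creating an RDD from a local collection
rdd = sc.parallelize(range(1, 11))

squares = rdd.map(lambda x: x * x)                     # transformation (lazy)
even_squares = squares.filter(lambda x: x % 2 == 0)    # transformation (lazy)

# cache() is shorthand for persist(StorageLevel.MEMORY_ONLY) on an RDD;
# persist() lets you choose the storage level explicitly
even_squares.persist(StorageLevel.MEMORY_AND_DISK)

# Actions trigger execution (and populate the persisted cache)
print(even_squares.collect())   # -> [4, 16, 36, 64, 100]
print(even_squares.count())     # -> 5

# Broadcast variables ship a read-only value to every executor once
lookup = sc.broadcast({1: "one", 2: "two", 3: "three"})
named = rdd.filter(lambda x: x in lookup.value).map(lambda x: lookup.value[x])
print(named.collect())          # -> ['one', 'two', 'three']

# RDD vs. DataFrame: a DataFrame adds a schema and the Catalyst optimizer
df = spark.createDataFrame([(1, "one"), (2, "two")], ["id", "name"])
df.show()

spark.stop()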

TOP 50 AWS Glue Interview Questions

What is AWS Glue? AWS Glue helps prepare data for analysis through automated extract, transform, and load (ETL) processes. It supports MySQL, Microsoft SQL Server, and PostgreSQL databases running on Amazon EC2 (Elastic Compute Cloud) instances in an Amazon VPC (Virtual Private Cloud). AWS Glue is a managed ETL service that automates the time-consuming steps of data preparation for analytics.
What are the benefits of AWS Glue? The benefits of AWS Glue are as follows:
Fault tolerance - AWS Glue jobs can be retried, and their logs can be retrieved and debugged.
Filtering - AWS Glue can filter out bad data.
Maintenance and deployment - handled by AWS, since Glue is a fully managed service.
What are the components used by AWS Glue? AWS Glue consists of:
Data Catalog - a central metadata repository.
ETL engine - generates Python and Scala code.
Flexible scheduler - handles dependency resolution, job monitoring, and retries.
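As a rough illustration of how these components are driven from code (this sketch is not from the original post; the region, database name, and job name are placeholders), the boto3 snippet below lists Data Catalog databases and tables, starts a Glue ETL job, and checks its run state.

# A hedged boto3 sketch (not from the original post) showing how the Glue
# components above are typically used from code. Region and names are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Data Catalog: list the databases and the tables a crawler has registered
for db in glue.get_databases()["DatabaseList"]:
    print("Database:", db["Name"])

tables = glue.get_tables(DatabaseName="my_catalog_db")  # placeholder database
for t in tables["TableList"]:
    print("Table:", t["Name"])

# ETL engine + scheduler: start a Glue job run and check its status
run = glue.start_job_run(JobName="my-etl-job")  # placeholder job name
status = glue.get_job_run(JobName="my-etl-job", RunId=run["JobRunId"])
print("Job state:", status["JobRun"]["JobRunState"])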

How to build a CI/CD Pipeline in AWS using CodeCommit, CodeDeploy, CodePipeline: Hands-on!

This tutorial walks you through continuous integration and continuous deployment of your application from your local system to your QA, Staging, or Production server. Please follow the steps below to get it done.
Step-1: Install Git and configure it on your local system
Go to git-scm.com, download Git, and install it. Then configure Git on your local system using the commands below.
$ git config --global user.name “Trilochan Parida”
$ git config --global user.email “tri***@gmail.com”
Step-2: Create a CodeCommit repository (YouTube)
Go to the CodeCommit service in the AWS console. Create a new repository with your project name, then copy the clone URL → Clone HTTPS. Before cloning it to your local system, you need credentials to access this repository, so let’s create those first.
Step-3: Create a new AWS IAM user and generate Git credentials (YouTube)
Add the user and create a group. Attach the IAM policies “AWSCodeCommitFullAccess” and “AWSCodePipelineFullAccess”. Secu…
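The original tutorial performs Steps 2 and 3 in the AWS console; as a hedged alternative, the boto3 sketch below shows how the same setup could be scripted. The repository name, user name, and region are placeholders, not values from the tutorial.

# A hedged boto3 sketch (not part of the original console-based steps) that
# creates the CodeCommit repository, an IAM user with CodeCommit access, and
# HTTPS Git credentials. All names below are placeholders.
import boto3

codecommit = boto3.client("codecommit", region_name="us-east-1")
iam = boto3.client("iam")

# Step-2 equivalent: create the repository and grab its HTTPS clone URL
repo = codecommit.create_repository(
    repositoryName="my-project",
    repositoryDescription="Demo repo for the CI/CD pipeline",
)
print("Clone URL:", repo["repositoryMetadata"]["cloneUrlHttp"])

# Step-3 equivalent: create an IAM user and attach the CodeCommit policy
iam.create_user(UserName="cicd-user")
iam.attach_user_policy(
    UserName="cicd-user",
    PolicyArn="arn:aws:iam::aws:policy/AWSCodeCommitFullAccess",
)

# Generate service-specific Git credentials for HTTPS access to CodeCommit
creds = iam.create_service_specific_credential(
    UserName="cicd-user",
    ServiceName="codecommit.amazonaws.com",
)
print("Git username:", creds["ServiceSpecificCredential"]["ServiceUserName"])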

AWS 2nd Part (Top 200 AWS Interview Questions & Answers)

Q1) What is AWS?
Answer: AWS stands for Amazon Web Services. AWS is a platform that provides on-demand resources for hosting web services, storage, networking, databases, and other resources over the internet with pay-as-you-go pricing.
Q2) What are the components of AWS?
Answer: EC2 – Elastic Compute Cloud, S3 – Simple Storage Service, Route53, EBS – Elastic Block Store, CloudWatch, and key pairs are a few of the components of AWS.
Q3) What are key pairs?
Answer: Key pairs are the secure login information for your instances/virtual machines. To connect to an instance, you use a key pair that contains a public key and a private key.
Q4) What is S3?
Answer: S3 stands for Simple Storage Service. It is a storage service that provides an interface you can use to store any amount of data, at any time, from anywhere in the world. With S3 you pay only for what you use; the payment model is pay-as-you-go.
Q5) What are the pricing models for EC2 instances?
Answer: …
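To make a couple of these answers concrete, here is a small boto3 sketch (not from the original post; the bucket and key-pair names are placeholders) that creates an EC2 key pair (Q3) and stores and reads back an object in S3 (Q4).

# A hedged boto3 sketch (not from the original post) illustrating key pairs
# and S3 storage. Bucket and key-pair names are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

# Q3: a key pair is the public/private key used to log in to an instance
key = ec2.create_key_pair(KeyName="demo-key")
with open("demo-key.pem", "w") as f:
    f.write(key["KeyMaterial"])  # private key; AWS keeps only the public half

# Q4: S3 stores any amount of data, addressed by bucket + object key
s3.create_bucket(Bucket="my-demo-bucket-12345")
s3.put_object(Bucket="my-demo-bucket-12345", Key="hello.txt", Body=b"hello s3")
obj = s3.get_object(Bucket="my-demo-bucket-12345", Key="hello.txt")
print(obj["Body"].read())  # b'hello s3'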

Popular posts from this blog

Spark SQL “case when” and “when otherwise”

Top Hive Commands with Examples

SPARK : Ways to Rename column on Spark DataFrame