
Showing posts with the label Hive

Top PySpark Interview Questions 2023

1. What is Apache Spark, and how does it differ from Hadoop?
2. What are the benefits of using Spark over MapReduce?
3. What is a Spark RDD, and what operations can be performed on it?
4. How does Spark handle fault tolerance and data consistency?
5. Explain the difference between Spark transformations and actions.
6. What is a Spark DataFrame, and how is it different from an RDD?
7. What is Spark SQL, and how does it work?
8. How can you optimize a Spark job to improve its performance?
9. How does Spark handle memory management and garbage collection?
10. Explain the role of the Spark Driver and Executors.
11. What is PySpark, and how does it differ from Apache Spark?
12. How do you create a SparkContext in PySpark? What is the purpose of the SparkContext?
13. What is an RDD (Resilient Distributed Dataset)? How is it different from a DataFrame and a Dataset?
14. What are the different ways to create an RDD in PySpark?
15. What is the use of the persist() method in PySpark? How does it differ from the cache() method?
16. What is the use of broadcast variables in PySpark? … (a short PySpark sketch touching a few of these topics follows below)
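As a quick refresher for questions 5, 15 and 16, here is a minimal PySpark sketch, assuming a local SparkSession; the data and variable names are illustrative, not part of the post.

# Transformations vs. actions, cache()/persist(), and broadcast variables.
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("interview-prep").getOrCreate()
sc = spark.sparkContext

# Creating an RDD; map/filter are transformations (lazy), count/collect are actions.
rdd = sc.parallelize(range(1, 11))
evens = rdd.filter(lambda x: x % 2 == 0)      # transformation: nothing runs yet
print(evens.count())                          # action: triggers the job

# For RDDs, cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
# persist() lets you choose another storage level, e.g. memory and disk.
evens.persist(StorageLevel.MEMORY_AND_DISK)

# Broadcast variables ship a read-only value to every executor once.
lookup = sc.broadcast({2: "two", 4: "four"})
print(evens.map(lambda x: lookup.value.get(x, "other")).collect())

spark.stop()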

Top 70+ Hadoop Interview Questions and Answers: Sqoop, Hive, HDFS, and more

HDFS Interview Questions

1. What are the different vendor-specific distributions of Hadoop?
The different vendor-specific distributions of Hadoop are Cloudera, MapR, Amazon EMR, Microsoft Azure, IBM InfoSphere, and Hortonworks (now part of Cloudera).

2. What are the different Hadoop configuration files?
The main Hadoop configuration files are hadoop-env.sh, mapred-site.xml, core-site.xml, yarn-site.xml, hdfs-site.xml, and the masters and slaves files.

3. What are the three modes in which Hadoop can run?
Standalone mode: the default mode; it uses the local FileSystem and a single Java process to run the Hadoop services.
Pseudo-distributed mode: uses a single-node Hadoop deployment to execute all Hadoop services.
Fully distributed mode: uses separate nodes to run the Hadoop master and slave services.

4. What are the differences between a regular FileSystem and HDFS?
Regular FileSystem: in a regular FileSystem, data is maintained … (the teaser is cut off here; see the sketch below for reading a local path versus an HDFS path from PySpark)
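To make question 4 a little more concrete, here is a minimal PySpark sketch contrasting a local FileSystem path with an HDFS path; the file paths and the namenode host below are assumptions for illustration, not values from the post.

# Reading the same kind of file from the local FileSystem and from HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fs-vs-hdfs").getOrCreate()

# Local FileSystem: data lives on a single machine's disk.
local_df = spark.read.text("file:///tmp/sample.txt")

# HDFS: data is split into blocks and replicated across the cluster's DataNodes.
hdfs_df = spark.read.text("hdfs://namenode:8020/user/data/sample.txt")

print(local_df.count(), hdfs_df.count())
spark.stop()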

Hive failed renaming a table with the error "New location for this table already exists"?

How do you rename a Hive table without changing its location? If you hit an error of this type:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. New location for this table dbname.tablename already exists: hdfs://hanameservice/user/hive/warehouse/dbname.db/tablename

Answer: a table is normally renamed with ALTER TABLE table_name RENAME TO new_table_name; and you can still do that here — you just need to run the following three commands in sequence. Say you have an external table test_1 in Hive (currently pointing to the test_1 location) and you want to rename it to test_2 so that it points to the test_2 location, not test_1.

1. Convert the external table into a managed table:
ALTER TABLE db_name.test_1 SET TBLPROPERTIES('EXTERNAL'='FALSE');
2. Rename the table:
ALTER TABLE db_name.test_1 RENAME TO db_name.test_2;
3. After renaming, convert the managed table back into an external table:
ALTER TABLE db_name.test_2 SET TBLPROPERTIES('EXTERNAL'='TRUE');

The same three statements can also be issued from PySpark; see the sketch below.
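Here is a minimal sketch of the three-step rename run through Spark SQL with Hive support enabled; db_name, test_1 and test_2 are the post's placeholder names, so substitute your own database and table names.

# Three-step rename of an external Hive table via spark.sql().
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-rename")
         .enableHiveSupport()
         .getOrCreate())

# 1. Temporarily mark the external table as managed.
spark.sql("ALTER TABLE db_name.test_1 SET TBLPROPERTIES('EXTERNAL'='FALSE')")

# 2. Rename the table.
spark.sql("ALTER TABLE db_name.test_1 RENAME TO db_name.test_2")

# 3. Flip it back to an external table.
spark.sql("ALTER TABLE db_name.test_2 SET TBLPROPERTIES('EXTERNAL'='TRUE')")

spark.stop()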

Top Hive Commands with Examples

In this blog post, let's discuss the top Hive commands with examples. If you are familiar with SQL, it's a cakewalk, and many users can query the data simultaneously using Hive-QL.

What is HQL? Hive defines a simple SQL-like query language for querying and managing large datasets, called Hive-QL (HQL). It's easy to use if you're familiar with SQL. Hive also allows programmers who are familiar with MapReduce to plug in custom mappers and reducers to perform more sophisticated analysis.

Components of Hive:
Metastore: Hive stores the schema of Hive tables in the Hive Metastore, which holds all the information about the tables and partitions in the warehouse. By default, the Metastore runs in the same process as the Hive service, and the default Metastore database is Derby.
SerDe: the Serializer/Deserializer tells Hive how to process a record.

Hive Commands — Data Definition Language (DDL): DDL statements are used to … (the teaser is cut off here; a short example of Hive DDL and a query issued from PySpark follows below)
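As a quick illustration of Hive DDL and a query run through PySpark's Hive support, here is a minimal sketch; the database and table names are illustrative assumptions, not examples from the post.

# Hive DDL and a HiveQL query issued from PySpark with Hive support enabled.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-commands")
         .enableHiveSupport()
         .getOrCreate())

# DDL: create a database and a simple table registered in the Hive Metastore.
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_db.employees (
        id INT,
        name STRING,
        dept STRING
    )
""")

# DML and query: insert a row and read it back with HiveQL.
spark.sql("INSERT INTO demo_db.employees VALUES (1, 'Asha', 'Data')")
spark.sql("SELECT dept, COUNT(*) AS cnt FROM demo_db.employees GROUP BY dept").show()

spark.stop()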
