Monday, September 29, 2014

Database Design & Performance Tuning interview questions and answers

1. What is Normalization?

Keyword:
reduce redundancy, divide large tables to smaller tables, create relationships between tables.

Answer:
Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.

The objectives of normalization are:
a. Free the database of modification anomalies;
b. Minimize redesign when extending the database structure;
c. Make the data model more informative to users;
d. Avoid bias towards any particular pattern of querying.


2. What is First Normal Form?

Keyword:
Each column contains only atomic values

Answer:
First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

Let's assume we have the following employee table:
employee_name        telephone
Jack                         123-456-7890
Tom                         123-321-5678, 123-321-5679

In case of an employee has more than one telephone number, the column "telephone" contains more than one value. This design violates 1NF.
To make it comply with 1NF, we can change the employee table as follow:
employee_name        telephone
Jack                         123-456-7890
Tom                         123-321-5678
Tom                         123-321-5679

3. What is Second Normal Form?

Keyword:
1NF + each column depends on the whole candidate key

Answer:
Second normal form (2NF) is normal form higher than 1NF. A table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key.

Let's assume we have the following project table containing projects and their participants:
project_id    employee_id   project_name
1             1             Database upgrade
1             3             Database upgrade
2             4             Email System Setup
2             7             Email System Setup
2             9             Email System Setup

This table has a composite primary key [project_id, employee_id]. The non-key attribute is [project_name].
In this case, [project_name] only depends on [project_id], which is only part of the primary key. So this design violates 2NF.

To make it comply with 2NF, we can split the table into two tables:
project table:
project_id    project_name
1             Database upgrade
2             Email System Setup

project_participant table:
project_id    participant
1             1
1             3
2             4
2             7
2             9

4. How to store tree structure in a relational database?

Keyword:
Adjacency list
Materialized Path
Nested Set

Answer:
Generally, there are three ways to store tree structure in the database:
a. Adjacency list
Each record holds a reference (parent_id) to the parent as a foreign key.
For example:
node_id    node_value    parent_id
1               a                   0
2               b                   1
3               c                   1
4               d                   2

b. Materialized Path
Each record stores the full path from the root.
For example:
node_id     node_value     path
1                a                    1
2                b                    1/2
3                c                     1/3
4                d                    1/2/4

c. Nested Set
Number the records according to a tree traversal, which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes: left_id, right_id.
node_id    node_value    left_id    right_id
1               a                   1            8
2               b                   2            5
3               c                   6            7
4               d                   3            4

5. Give some examples of sql injection.

Answer:
Example 1:  SQL Injection Based on 1=1 is Always True
Let's say we have the following code snippet to build a SQL query to check username and password:
String userName = getUserInput("userName");
String password = getUserInput("password");
String sql = "SELECT * FROM user WHERE user_name= '" + userName + "' AND password =''' + password +"'";
If the user input userName: "a" and password: "a OR 1=1", then the sql query becomes:
SELECT * FROM user WHERE user_name= 'a' AND password= 'a' OR 1=1
Since 1=1 is always true, it will return all rows from the user table.

Example 2: SQL Injection Based on Batched SQL Statements
Let's say we have the following code snippet to get a user:
String userId = getUserInput("userId");
String sql = "SELECT * FROM user WHERE user_id=" + userId;
If the user input the string: "1; DROP table user",  then the sql query becomes:
SELECT * FROM user WHERE user_id=1; DROP table user

6. How to prevent SQL injection attacks?

Answer:
a. Reduce the attack surface. Ensure that all excess database privileges are revoked and that only those routines that are intended for end-user access are exposed. Though this does not entirely eliminate SQL injection vulnerabilities, it mitigates the impact of the attacks.

b. Avoid dynamic SQL with concatenated input. Dynamic SQL built with concatenated input values presents the easiest entry point for SQL injections. Avoid constructing dynamic SQL this way.

c. Use Parameterized statements. Parameterized statements, like JDBC PreparedStatement, eliminate the possibility of SQL injections and enhance performance.

d. Filter and sanitize input. Assume all user-submitted data is evil and validate and sanitize everything.

7. What is query execution plan?

Keyword:
an ordered set of steps used to access data,
SQL Query Analyzer, EXPLAIN PLAN statement

Answer:
A query execution plan is an ordered set of steps used to access data in a RDBMS. Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the different, correct possible plans for executing the query and returns what it considers the best alternative. Because query optimizers are imperfect, database users and administrators sometimes need to manually examine and tune the plans produced by the optimizer to get better performance.

A given database management system may offer one or more mechanisms for returning the plan for a given query. Some packages feature tools which will generate a graphical representation of a query plan, such as SQL Query Analyzer in SQL Server. Another mechanism for retrieving the query plan involves querying a virtual database table after executing the query to be examined, such as the EXPLAIN PLAN statement in Oracle.

8. Explain the general steps of database performance tuning.

Keyword:
Business Rules -> Data Design -> Application Design -> DB Logical Structure -> SQL Statements -> Query Plan -> Memory Allocation -> I/O and Storage -> Resource Contention -> OS

Answer:
Step 1. Tune the Business Rules
For optimal performance, you may need to adapt business rules. These concern the high-level analysis and design of an entire system.

Step 2. Tune the Data Design
In the data design phase, you need to determine what data is needed by your applications and consider what relations are important, and what their attributes are, and how to structure the information to best meet performance goals. For example, in this step, you need to determine the primary and foreign key indexes.

Step 3. Tune the Application Design
Business executives and application designers should translate business goals into an effective system design. Business processes concern a particular application within a system, or a particular part of an application.

Step 4. Tune the Logical Structure of the Database
This step primarily concerns fine-tuning the index design to ensure that the data is neither over- nor under-indexed. For example, you may need to create additional indexes to support the application.

Step 5. Tune Database Operations
In this step, you may need to tune your SQL statements to take full advantage of the SQL language and RDBMS specific features designed to enhance application processing.

Step 6. Tune the Access Paths
In this step, you may need to use query execution plan tools to analyze your SQL statements to determine the best query plan.

Step 7. Tune Memory Allocation
In this step, you may need to detect database memory allocation issues and tune memory allocation strategies.

Step 8: Tune I/O and Physical Structure
Tuning I/O and physical structure involves distributing data, storing data in data blocks for best access, creating extents large enough for your data, etc.

Step 9: Tune Resource Contention
Take care to reduce the following types of contention: Block contention, Shared pool contention, Lock contention, Latch contention, etc.

Step 10: Tune the Underlying Platform
In the last step, you may need to tune the underlying OS performance.

More Database Design & Performance Tuning interview questions and answers: SQL Interview Notes

  • What is Denormalization?
  • What is Third Normal Form?
  • What is Referential Integrity?
  • How to implement one-to-one, one-to-many and many-to-many relationships while designing tables?
  • Give some examples of optimize SQL query statements?
  • What is full table scan? When to use it and how to avoid it?
  • Give some tips on creating proper index?
  • Explain general guidelines that determine when to use clustered index and non-clustered index?
  • What is the advantage and disadvantage of storing binary data in database?
  • What is Database Partitioning?
  • ......

SQL Interview Notes

100+ frequently asked SQL and Database interview questions with concise summaries and detailed answers.  
Topics include: SQL Basic Concepts, SQL DDL & DML, Advanced SQL, Database Design and Performance Tuning.  




JavaScript Interview Notes

100+ frequently asked JavaScript interview questions with concise summaries and detailed answers. Topics include: JavaScript Basics, DOM, BOM, Object-Oriented JavaScript, Function, Scope, Closure, JSON, XML, Ajax, jQuery. 
Download on the AppStore    Get it On Google Play


Java Interview Notes

300+ frequently asked Java interview questions with concise summaries and detailed answers. Topics include: Java & OOP, Strings & Collections, IO JVM & GC, Multithreading, Generics Reflection & Annotations, Design Patterns, Java EE, Spring, JPA & Hibernate.

Advanced SQL interview questions and answers

1. What is View?

Keyword:
virtual table, does not contain actual data.
contains columns and rows from one or more tables.

Answer:
A View is a virtual table that contains columns and rows from one or more tables. A view does not contain actual data, it is a set of queries that are applied to one or more tables that is stored within the database as an object. After creating a view from some table(s), it used as a reference of those tables and when executed, it shows only those data which are already mentioned in the query during the creation of the View.

2. How to set auto increment column in SQL?

Answer:
In SQL Server, we use the IDENTITY keyword:
CREATE TABLE employee (
    employee_id int IDENTITY(1,1) PRIMARY KEY,
    first_name varchar(64) NOT NULL,
    last_name varchar(64)
);

In MySQL, we use the AUTO_INCREMENT keyword:
CREATE TABLE employee (
    employee_id int NOT NULL AUTO_INCREMENT,
    first_name varchar(64) NOT NULL,
    last_name varchar(64),
    PRIMARY KEY(employee_id)
);

In Oracle, we need to create a sequence and then use it in DML:
CREATE SEQUENCE seq_employee MINVALUE 1 START WITH 1 INCREMENT BY 1 CACHE 20;

INSERT INTO employee(employee_id, first_name, last_name)
VALUES (seq_employee.NEXTVAL,'Jack','Smith’);


3. What are the advantages and disadvantages of using stored procedure?

Keyword:
Reduce server overhead, Reduce network traffic and latency, Encapsulate business logic, Delegate access-rights, Protect against SQL injection attacks
vendor-specific, lack of tool support, track versions.

Answer:
Advantages:
a. Reduce server overhead. Stored procedure statements are stored directly in the database, which may remove all or part of the compilation overhead. Stored procedure execution plans also can be reused and cached.
b. Reduce network traffic and latency. Stored procedures run directly within the database engine and have direct access to the data being accessed, which can reduce network communication costs.
c. Encapsulate business logic. Stored procedures allow programmers to embed business logic as an API in the database, which can simplify data management and reduce the need to encode the logic elsewhere in client programs. The database system can ensure data integrity and consistency with the help of stored procedures.
d. Delegate access-rights. Stored procedures can be granted access rights to the database that users who execute those procedures do not directly have.
e. Protect against SQL injection attacks. Stored procedure parameters will be treated as data even if an attacker inserts SQL commands. Also, some DBMSs will check the parameter's type.

Disadvantages:
a. Stored procedure languages are quite often vendor-specific. Switching to another vendor's database most likely requires rewriting any existing stored procedures.
b. Tool support for writing and debugging stored procedures is often not as good as for other programming languages.
c. Changes to stored procedures are more difficult to keep track of within a version control system than other code. Changes must be reproduced as scripts to be stored in the project history to be included, and differences in procedures can be more difficult to merge and track correctly.


4.  Explain the general steps of using Cursor in SQL.

Answer:
To use cursors in SQL, there are the following steps:
a. Declare a cursor
DECLARE cursor_employee CURSOR FOR SELECT employee_id, first_name, last_name FROM employee;

b. Open the cursor
OPEN cursor_employee;

c. Fetch the data into local variables as needed from the cursor, one row at a time
FETCH cursor_employee INTO ......;

d. Close the cursor
CLOSE cursor_employee;

5. What is Trigger?

Keyword:
auto execute in response to certain events: INSERT, DELETE or UPDATE

Answer:
A trigger is procedural code that is automatically executed in response to certain events (such as INSERT, DELETE or UPDATE) on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database.

For example, when a new record is added to the employee table, new records should also be created in the tables of the taxes, vacations and salaries.

6. What are the differences between Triggers and Stored Procedures?

Answer:
a. Stored procedures can be explicitly called and executed, while triggers can only be executed when an event (insert, delete, and update) is fired on the table on which the trigger is defined.
b. Stored procedures can be scheduled through a job, while triggers cannot be scheduled.
c. Stored procedure can take input parameters, while triggers cannot.
d. Stored procedures can return values, while triggers annot return a value.
e. Transaction statements like begin transaction, commit transaction, and rollback can be used inside stored procedures, but cannot be used inside triggers.

7. What is ACID?

Keyword:
Atomicity, Consistency, Isolation, Durability

Answer:
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

Atomicity:
A transaction must be an atomic unit of work; either all of its data modifications are performed, or none of them is performed.

Consistency:
When completed, a transaction must leave all data in a consistent state. For example, all rules must be applied to the transaction's modifications to maintain all data integrity.

Isolation:
Modifications made by concurrent transactions must be isolated from the modifications made by any other concurrent transactions. A transaction either recognizes data in the state it was in before another concurrent transaction modified it, or it recognizes the data after the second transaction has completed, but it does not recognize an intermediate state.

Durability:
After a transaction has completed, its effects are permanently in place in the system. The modifications persist even in the event of a system failure.

8. What is Two-Phase Commit?

Keyword:
ensure the integrity of data in a distributed transaction
commit-request phase, commit phase

Answer:
Two-Phase Commit (2PC) is designed to ensure the integrity of data in a distributed transaction.
Two-Phase Commit mechanism consists of two phases:
a. The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion).
b. The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources and their respective portions.

More Advanced SQL interview questions and answers: SQL Interview Notes

  • How to create a View from multiple tables?
  • What are the advantages and disadvantages of views?
  • What is Sequence in SQL?
  • What is Cursor in SQL?
  • What is the difference between Stored Procedure and Function in SQL?
  • Explain some ways to optimize stored procedures.
  • How to invoke a trigger on demand?
  • What is a Transaction in SQL?
  • What is Lock in SQL?
  • What is Isolation Level in SQL?
  • ......

SQL Interview Notes

100+ frequently asked SQL and Database interview questions with concise summaries and detailed answers.  
Topics include: SQL Basic Concepts, SQL DDL & DML, Advanced SQL, Database Design and Performance Tuning.  




JavaScript Interview Notes

100+ frequently asked JavaScript interview questions with concise summaries and detailed answers. Topics include: JavaScript Basics, DOM, BOM, Object-Oriented JavaScript, Function, Scope, Closure, JSON, XML, Ajax, jQuery. 
Download on the AppStore    Get it On Google Play


Java Interview Notes

300+ frequently asked Java interview questions with concise summaries and detailed answers. Topics include: Java & OOP, Strings & Collections, IO JVM & GC, Multithreading, Generics Reflection & Annotations, Design Patterns, Java EE, Spring, JPA & Hibernate.

SQL DDL & DML interview questions and answers

1. What is the difference between TRUNCATE, DELETE and DROP?

Keyword:
DELETE removes some or all rows based on WHERE clause, can ROLLBACK;
TRUNCATE removes all rows, cannot ROLLBACK;
DROP removes a table, cannot ROLLBACK.

Answer:
DELETE is a DML statement used to remove some or all rows from a table. A WHERE clause can be used to only remove some rows. If no WHERE condition is specified, all rows will be removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it. Delete operation will cause all DELETE triggers on the table to fire.

TRUNCATE is a DDL statement used to remove all rows from a table. TRUNCATE operation cannot be rolled back and no triggers will be fired, so it is faster and doesn't use as much undo space as a DELETE.

DROP is a DDL statement used to remove a table from the database. All the tables' rows, indexes and privileges will also be removed. No DML triggers will be fired. DROP operation cannot be rolled back.


2. What is the difference between WHERE clause and HAVING clause?

Keyword:
HAVING specifies a search condition for an aggregate, used after a GROUP BY.

Answer:
WHERE clause is used to specify a search condition for the rows returned. WHERE can be used with SELECT, UPDATE and DELETE. WHERE is used before a GROUP BY clause.
HAVING clause is used to specify a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used after a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.

An example of using WHERE without GROUP BY: get all departments whose department id is greater than 2
SELECT department_id, department_name
FROM departments
WHERE department_id > 2

An example of using HAVING and aggregation function: get all departments with sales greater than $1000
SELECT department, SUM(sales)
FROM orders
GROUP BY department
HAVING SUM(sales) > 1000;

3. How to find all employees containing the word "Tom", regardless of whether it was TOM, Tom or tom?

Answer:
Use LIKE operator and UPPER() function to build the WHERE condition:
SELECT * FROM employee WHERE UPPER(employee_name) LIKE '%TOM%'


4. How to find the highest salary in each department from employee table?

Keyword:
MAX, GROUP BY

Answer:
Use MAX function and GROUP BY clause:
SELECT department_id, MAX(salary) AS max_salary FROM employee GROUP BY department_id;


5. How to select TOP n records from a table?

Answer:
In SQL Server, use SELECT TOP N clause:
SELECT TOP n * FROM employee;

In Oracle, use RUWNUM pseudo-column:
SELECT * FROM employee WHERE ROWNUM <= n;

In MySQL / PostgreSQL, use LIMIT N clause:
SELECT * FROM employee LIMIT n;


6. How to convert data types in SQL?

Keyword:
CAST()
CONVERT()

Answer:
To convert an expression of one data type to another, we can use CAST() or CONVERT() function.
The syntax for CAST:
CAST(expression AS data_type [(length)])
The syntax for CONVERT:
CONVERT(data_type [(length)], expression[, style])

For example, the following query find the records that have a 3 in the first digit of their price:
SELECT product_name, price
FROM product
WHERE CAST(price AS int) LIKE '3%';

or:
SELECT product_name, price
FROM product
WHERE CONVERT(int, price) LIKE '3%';

7. How to get department information and department total salary from table employee and department where total salary greater than 10,000?

Keyword:
INNER JOIN + GROUP BY + HAVING

Answer:
Let's assume employee and department table structures are as follow:
employee table: employee_id, first_name, last_name, salary, department_id
department table: department_id, department_name

To get the result, use SUM() function, INNER JOIN, GROUP BY clause and HAVING clause:
SELECT e.department_id, d.department_name, SUM(e.salary)
FROM employee e INNER JOIN department d ON e.department_id = d.department_id
GROUP BY e.department_id
HAVING SUM(e.salary) > 10000;

8. How to add and remove columns in an existing table?

Keyword:
ALTER TABLE tablename ADD COLUMN / DROP COLUMN

Answer:
Use the ALTER TABLE to add and remove columns in an existing table.

For example, to add a new column "department_id" with default value "1" to employee table:
ALTER TABLE employee ADD COLUMN department_id INTEGER DEFAULT 1 NOT NULL;
To remove the column "department_id" from employee table:
ALTER TABLE employee DROP COLUMN department_id;

9. How to find the nth highest record in a table?

Answer:
Let's assume we have an employee table and we need to find the nth highest salary from this table.

In SQL Server, use subquery + DISTINCT + TOP:
SELECT TOP 1 salary FROM (SELECT DISTINCT TOP n salary FROM employee ORDER BY salary DESC) e ORDER BY salary;

In Oracle, use ROW_NUMBER() function:
SELECT salary FROM (SELECT e.salary, row_number() OVER (ORDER BY salary DESC) rn FROM employee e) WHERE rn = n;

In MySQL / PostgreSQL, use LIMIT clause:
SELECT salary FROM employee ORDER BY salary DESC LIMIT n - 1, 1;

10. How to copy data from one table to another table?

Keyword:
INSERT INTO SELECT
SELECT INTO

Answer:
INSERT INTO SELECT statement is used to copy data from one table to an existing table.

For example:
INSERT INTO employee_backup SELECT * FROM employee;
INSERT INTO user (user_name, address) SELECT name, address FROM employee;

SELECT INTO statement is used to copy data from one table to a new table.
For example:
SELECT * INTO employee_backup FROM employee;
SELECT employee_id, employee_name INTO employee_backup FROM employee WHERE department_id = 1;

More SQL DDL & DML interview questions and answers: SQL Interview Notes

  • What is the difference between EXSITS and IN?
  • How to get the current date and time in SQL?
  • How to combine two columns into one column in a SQL query?
  • How to get department information and department total salary from table employee and department?
  • How to find duplicate records in a table?
  • How to create FOREIGN KEY Constraint on a table?
  • How to create an index on a table?
  • How to use Subquery with EXISTS?
  • ......

SQL Interview Notes

100+ frequently asked SQL and Database interview questions with concise summaries and detailed answers.  
Topics include: SQL Basic Concepts, SQL DDL & DML, Advanced SQL, Database Design and Performance Tuning.  




JavaScript Interview Notes

100+ frequently asked JavaScript interview questions with concise summaries and detailed answers. Topics include: JavaScript Basics, DOM, BOM, Object-Oriented JavaScript, Function, Scope, Closure, JSON, XML, Ajax, jQuery. 
Download on the AppStore    Get it On Google Play


Java Interview Notes

300+ frequently asked Java interview questions with concise summaries and detailed answers. Topics include: Java & OOP, Strings & Collections, IO JVM & GC, Multithreading, Generics Reflection & Annotations, Design Patterns, Java EE, Spring, JPA & Hibernate.