Ready to face an interview on Database System Concepts? Do not worry, we are here to help you with job interview preparation. If you are preparing for a Database System Concepts interview and are not sure which questions are likely to be asked, we suggest you go through the Wisdomjobs interview questions and answers page to crack your job interview. Database System Concepts is primarily targeted at people who are interested in learning database concepts. It covers the basic functionality of a database and the queries that can be performed on one. Strong technical skills are needed, as there is huge competition. Below is the list of frequently asked Database System Concepts interview questions and answers which gets you ready to face the interviews:
Question 1. List Four Significant Differences Between A File-processing System And A Dbms?
Answer :
Some main differences between a database management system and a file-processing system are:
a. A DBMS coordinates both the physical and the logical access to the data, whereas a file-processing system coordinates only the physical access.
b. A DBMS reduces data duplication by ensuring that a physical piece of data is available to all programs authorized to access it, whereas data written by one program in a file-processing system may not be readable by another.
c. A DBMS is designed to allow flexible access to data (i.e., ad hoc queries), whereas a file-processing system allows only predetermined access (i.e., compiled programs).
d. A DBMS is designed to coordinate multiple users accessing the same data at the same time; in a file-processing system, a file can generally be accessed by two programs concurrently only if both have read-only access.
Question 2. What Are Two Disadvantages Associated With Database Systems?
Answer :
Two disadvantages associated with database systems are listed below.
a. Setting up a database system requires more knowledge, money, skills, and time.
b. The complexity of the database may result in poor performance.
Question 3. Explain The Difference Between Physical And Logical Data Independence?
Answer :
Question 4. List Five Responsibilities Of A Database Management System?
Answer :
A general-purpose database manager (DBM) has five responsibilities:
a. interaction with the file manager;
b. integrity enforcement;
c. security enforcement;
d. backup and recovery;
e. concurrency control.
Question 5. What Are Five Main Functions Of A Database Administrator?
Answer :
Five main functions of a database administrator are:
a. to create the schema definition;
b. to define the storage structure and access methods;
c. to modify the schema and/or physical organization when necessary;
d. to grant authorization for data access;
e. to specify integrity constraints.
Answer :
Programming language classification:
Note: Lisp and Prolog support some procedural constructs, but the core of both these languages is non-procedural. In theory, non-procedural languages are easier to learn, because they let the programmer concentrate on what needs to be done, rather than how to do it. This is not always true in practice, especially if procedural languages are learned first.
Answer :
Six major steps in setting up a database for a particular enterprise are:
Answer :
Let tgrid be a two-dimensional integer array of size n × m.
Question 9. Explain The Distinctions Among The Terms Primary Key, Candidate Key, And Super Key?
Answer :
A super key is a set of one or more attributes that, taken collectively, allows us to identify uniquely an entity in the entity set. A super key may contain extraneous attributes. If K is a super key, then so is any super set of K. A super key for which no proper subset is also a super key is called a candidate key. It is possible that several distinct sets of attributes could serve as candidate keys. The primary key is one of the candidate keys that is chosen by the database designer as the principal means of identifying entities within an entity set.
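As a concrete illustration (this employee table is hypothetical, not part of the question), consider a relation in which both an ID and an email uniquely identify a row:

create table employee
  (employee-id char(10),
   email varchar(50) unique,
   name varchar(30),
   primary key (employee-id))

Here {employee-id} and {email} are both candidate keys; {employee-id, name} is a super key containing the extraneous attribute name; and employee-id is the candidate key the designer chose as the primary key.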
Answer :
Answer :
We have weak entities for several reasons:
a. We want to avoid the data duplication and the consequent possible inconsistencies caused by duplicating the key of the strong entity.
b. Weak entities reflect the logical structure of an entity being dependent on another entity.
c. Weak entities can be deleted automatically when their strong entity is deleted.
Answer :
Answer :
The primary key of a weak entity set can be inferred from its relationship with the strong entity set. If we add primary key attributes to the weak entity set, they will be present in both the entity set and the relationship set and they have to be the same. Hence there will be redundancy.
Question 14. Explain The Distinction Between Condition-defined And User-defined Constraints. Which Of These Constraints Can The System Check Automatically?
Answer :
In a generalization–specialization hierarchy, it must be possible to decide which entities are members of which lower-level entity sets. In a condition-defined design constraint, membership in the lower-level entity sets is evaluated on the basis of whether or not an entity satisfies an explicit condition or predicate. User-defined lower-level entity sets are not constrained by a membership condition; rather, entities are assigned to a given entity set by the database user.
Condition-defined constraints alone can be automatically handled by the system. Whenever any tuple is inserted into the database, its membership in the various lower-level entity sets can be automatically decided by evaluating the respective membership predicates. Similarly, when a tuple is updated, its membership in the various entity sets can be re-evaluated automatically.
Question 15. Explain The Distinction Between Total And Partial Constraints?
Answer :
In a total design constraint, each higher-level entity must belong to a lower-level entity set. The same need not be true in a partial design constraint. For instance, some employees may belong to no work-team.
Answer :
Underlined attributes indicate the primary key.
student (student-id, name, program)
course (courseno, title, syllabus, credits)
course-offering (courseno, secno, year, semester, time, room)
instructor (instructor-id, name, dept, title)
enrols (student-id, courseno, secno, semester, year, grade)
teaches (courseno, secno, semester, year, instructor-id)
requires (maincourse, prerequisite)
Question 17. List Two Reasons Why We May Choose To Define A View?
Answer :
Two reasons for choosing to define a view are:
a. Security: a view can give a user access to only the part of the database that concerns that user, hiding the rest.
b. Convenience: a view can present a frequently used complex query as if it were a simple relation, simplifying query writing.
Question 18. List Two Major Problems With Processing Update Operations Expressed In Terms Of Views?
Answer :
Views present significant problems if updates are expressed with them. The difficulty is that a modification to the database expressed in terms of a view must be translated to a modification to the actual relations in the logical model of the database. Two major problems are:
a. Since a view may not contain all the attributes of the underlying relations, inserting a tuple through the view may force null values into the underlying relations.
b. For views defined on joins or aggregates, there may be several ways, or no sensible way, to translate a view update into updates on the underlying relations.
Question 19. List Two Reasons Why Null Values Might Be Introduced Into The Database?
Answer :
Nulls may be introduced into the database because the actual value is either unknown or does not exist. For example, an employee whose address has changed and whose new address is not yet known should be retained with a null address. If employee tuples have a composite attribute dependents, and a particular employee has no dependents, then that tuple’s dependents attribute should be given a null value.
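A minimal sketch of how such a null arises (the employee table here is hypothetical):

insert into employee (employee-name, address)
values ('Jones', null)

A subsequent update can replace the null once the new address becomes known.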
Question 20. Show That, In Sql, <> All Is Identical To Not In?
Answer :
Let the set S denote the result of an SQL subquery. We compare (x <> all S) with (x not in S). If a particular value x1 satisfies (x1 <> all S), then for all elements y of S, x1 ≠ y. Thus x1 is not a member of S and must satisfy (x1 not in S). Similarly, suppose there is a particular value x2 which satisfies (x2 not in S). It cannot be equal to any element w belonging to S, and hence (x2 <> all S) will be satisfied. Therefore the two expressions are equivalent.
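To make the equivalence concrete, the two queries below (on hypothetical depositor and borrower tables) return the same set of customers:

select customer-name from depositor
where customer-name <> all (select customer-name from borrower)

select customer-name from depositor
where customer-name not in (select customer-name from borrower)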
Answer :
create view salinfo as
  select manager-name, avg(salary)
  from manages m, works w
  where m.employee-name = w.employee-name
  group by manager-name
Updates should not be allowed in this view because there is no way to determine how to change the underlying data. For example, suppose the request is “change the average salary of employees working for Smith to $200”. Should everybody who works for Smith have their salary changed to $200? Or should the first (or more, if necessary) employee found who works for Smith have their salary adjusted so that the average is $200? Neither approach really makes sense.
Answer :
We define triggers for each relation whose primary-key is referred to by the foreign-key of some other relation. The trigger would be activated whenever a tuple is deleted from the referred-to relation. The action performed by the trigger would be to visit all the referring relations, and delete all the tuples in them whose foreign-key attribute value is the same as the primary-key attribute value of the deleted tuple in the referred-to relation. This set of triggers will take care of the on delete cascade operation.
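A sketch of one such trigger, using the bank schema that appears elsewhere in these answers (names are illustrative):

create trigger loan-delete-cascade after delete on loan
referencing old row as orow
for each row
  delete from borrower
  where borrower.loan-number = orow.loan-number

In practice the same effect is more commonly obtained declaratively, by declaring the foreign key with on delete cascade.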
Answer :
The assertion-name is arbitrary. We have chosen the name perry. Note that since the assertion applies only to the Perryridge branch we must restrict attention to only the Perryridge tuple of the branch relation rather than writing a constraint on the entire relation.
create assertion perry check
  (not exists (select * from branch
    where branch-name = 'Perryridge'
    and assets <> (select sum (amount) from loan
      where branch-name = 'Perryridge')))
Answer :
create trigger check-delete-trigger after delete on account
referencing old row as orow
for each row
  delete from depositor
  where depositor.customer-name not in
    (select customer-name from depositor
     where account-number <> orow.account-number)
end
Answer :
To insert (account-number, name) into the view deer-park we insert the tuple (Deer Park, account-number, null) into the account relation and the tuple (name, account-number) into the depositor relation.
Updates to the views no-debt and avg-bal present serious problems. If we insert into the no-debt view, the system must reject the insertion if the customer has a loan. The overhead of updating through this view is so high that most systems would disallow update. The avg-bal view cannot be updated since the result of an aggregate operation depends on several tuples, not just one.
Answer :
Usually, a well-designed view and security mechanism can avoid conflicts between ease of access and security. However, as the following example shows, the two purposes can conflict if the mechanisms are not designed carefully.
Suppose we have a database of employee data and a user whose view involves employee data for employees earning less than $10,000. If this user inserts employee Jones, whose salary is $9,000, but accidentally enters $90,000, several existing database systems will accept this update as a valid update through a view. However, the user will be denied access to delete this erroneous tuple by the security mechanism.
Answer :
Index and resource authorization should be special categories to allow certain users to create relations (and the indices to operate on them) while preventing these time-consuming and schema-changing operations from being available to many users. Separating index and resource authorization allows a user to build an index on existing relations, say, for optimization purposes, but allows us to deny that user the right to create new relations.
Answer :
Database systems have special security requirements which are typically more refined than those of most operating systems. For example, a single user may have different privileges on different files throughout the system, including changing indices and attributes, which file systems typically do not monitor. The advantage of using the operating system's security mechanism is that it simplifies the database system and can be used for simple (read/write) security measures.
Question 29. What Are Two Advantages Of Encrypting Data Stored In The Database?
Answer :
Two advantages of encrypting data stored in the database are:
a. Even if an intruder bypasses the normal access-control mechanisms (for example, by stealing a disk or a backup tape), the data remains unreadable without the decryption key.
b. Encryption limits what insiders who handle the raw storage media, such as operators taking backups, can learn from the data.
Answer :
A scheme for storing passwords would be to encrypt each password, and then use a hash index on the user-id. The user-id can be used to easily access the encrypted password. The password being used in a login attempt is then encrypted and compared with the stored encryption of the correct password. An advantage of this scheme is that passwords are not stored in clear text and the code for decryption need not even exist!
Answer :
Question 32. Why Are Certain Functional Dependencies Called Trivial Functional Dependencies?
Answer :
Certain functional dependencies are called trivial functional dependencies because they are satisfied by all relations. A functional dependency α → β is trivial if β ⊆ α; for example, A → A holds in every relation containing attribute A, regardless of its contents.
Question 33. In Designing A Relational Database, Why Might We Choose A Non-bcnf Design?
Answer :
BCNF is not always dependency preserving. Therefore, we may want to choose another normal form (specifically, 3NF) in order to make checking dependencies easier during updates. This would avoid joins to check dependencies and increase system performance.
Question 34. Explain Why 4nf Is A Normal Form More Desirable Than Bcnf?
Answer :
4NF is more desirable than BCNF because it reduces the repetition of information. If we consider a BCNF schema not in 4NF (see Exercise 7.28), we observe that decomposition into 4NF does not lose information provided that a lossless-join decomposition is used, yet redundancy is reduced.
Question 35. Explain How Dangling Tuples May Arise. Explain Problems That They May Cause?
Answer :
Dangling tuples can arise when one tuple is inserted into a decomposed relation but no corresponding tuple is inserted into the other relations in the decomposition. They can cause incorrect values to be returned by queries which form the join of a decomposed relation, since the dangling tuple might not be included. Dangling tuples can be avoided by the specification of referential integrity constraints.
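For example, declaring the foreign key when creating the table (schema names follow the bank example used elsewhere in these answers) makes the system reject a depositor tuple that would dangle:

create table depositor
  (customer-name varchar(20),
   account-number char(10),
   foreign key (account-number) references account)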
Answer :
Each of the applications includes large, specialized data items (e.g., a program module, a graphic image, digitized voice, a document). These data items have operations specific to them (e.g., compile, rotate, play, format) that cannot be expressed in relational query languages. These data items are of variable length making it impractical to store them in the short fields that are allowed in records for such database systems. Thus, the data model, data manipulation language, and data definition language need to be changed. Also, long-duration and nested transactions are typical of these applications. Changes to the concurrency and recovery subsystems are likely to be needed.
Answer :
An entity is simply a collection of variables or data items. An object is an encapsulation of data as well as the methods (code) to operate on the data. The data members of an object are directly visible only to its methods. The outside world can gain access to the object’s data only by passing pre-defined messages to it, and these messages are implemented by the methods.
Question 38. Explain Why Ambiguity Potentially Exists With Multiple Inheritance?
Answer :
A class inherits the variables and methods of all its immediate superclasses. Thus it could inherit a variable or method of the same name from more than one superclass. When that particular variable or method of an object of the subclass is referenced, there is an ambiguity regarding which of the superclasses provides the inheritance.
For instance, let there be classes teacher and student, both having a variable department. If a class teachingAssistant inherits from both of these classes, any reference to the department variable of a teachingAssistant object is ambiguous.
Answer :
Tuple equality is determined by data values. Object identity is independent of data values, since object-oriented systems use built-in identity.
Answer :
An edge from class A to class B in the DAG representing inheritance means that an object of class B is also an object of class A. It has all the properties that objects of class A have, plus additional ones of its own. In particular, it inherits all the variables and methods of class A. It can of course provide its own implementations for the inherited methods. An edge from class A to class B in the object containment DAG means that an object of class A contains an object of class B. There need not be any similarities in the properties of A and B. Neither B nor A inherit anything from the other. They function as independent types, to the extent that an object of class A can access the variables of the B object contained in it only via the B object’s methods.
Answer :
Creation, destruction and access will typically be more time-consuming and expensive for persistent objects stored in the database than for transient objects in the transaction’s local memory. This is because of the overheads in preserving transaction semantics, security and integrity. Since a transient object is purely local to the transaction which created it and does not enter the database, all these overheads are avoided. Thus, in order to provide efficient access to purely local and temporary data, transient objects are provided by persistent programming languages.
Answer :
Persistent pointers can be implemented as Abstract Data Types (ADTs). These ADTs should provide the typical pointer operations like incrementing and dereferencing, so their usage and regular pointer usage is uniform. Regular pointers on the other hand are usually built-in types, implemented as part of the language.
Question 43. If An Object Is Created Without Any References To It, How Can That Object Be Deleted?
Answer :
If an object is created without any references to it, it can neither be accessed nor deleted via a program. The only way is for the database system to locate and delete such objects by itself. This is called garbage collection. One way to do garbage collection is by the method of mark and sweep. First, the objects referred to directly by programs are marked. Then references from these objects to other objects are followed, and those referred objects are marked. This procedure is followed repeatedly until no more unmarked objects can be reached by following reference chains from the marked objects. At this point, all these remaining unmarked objects are deleted. This method is correct; we can prove that if no new objects are marked after a round of mark and sweep, the remaining unmarked objects are indeed unreferenced.
Answer :
A database system must provide for such features as transactions, queries (associative retrieval of objects), security, and integrity. A persistent object system may not offer such features.
Answer :
If the type of an attribute is x, then in each tuple of the table, corresponding to that attribute, there is an actual object of type x. If its type is ref(x), then in each tuple, corresponding to that attribute, there is a reference to some object of type x. We choose a reference type for an attribute if that attribute’s intended purpose is to refer to an independent object.
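A rough SQL:1999-style sketch (the type and table names are illustrative, and exact syntax varies across systems):

create type Person as (name varchar(20))
create table people of Person
create table departments
  (dept-name varchar(20),
   head ref(Person) scope people)

Each departments tuple then holds a reference to an independently existing Person object in the people table; declaring head with type Person instead would store an actual Person value inside each tuple.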
Answer :
SQL functions are primarily a mechanism for extending the power of SQL to handle attributes of complex data types (like images), or to perform complex and non-standard operations. Embedded SQL is useful when imperative actions like displaying results and interacting with the user are needed. These cannot be done conveniently in an SQL-only environment. Embedded SQL can be used instead of SQL functions by retrieving data and then performing the function’s operations on the SQL result. However, a drawback is that a lot of query-evaluation functionality may end up getting repeated in the host language code.
Answer :
Your answer will be based on the computers and storage media that you use. Typical examples would be hard disk, floppy disks and CD-ROM drives.
Question 48. How Does The Remapping Of Bad Sectors By Disk Controllers Affect Data-retrieval Rates?
Answer :
Remapping of bad sectors by disk controllers does reduce data retrieval rates because of the loss of sequentiality amongst the sectors. But that is better than the loss of data in case of no remapping!
Answer :
RAID level 1 (mirroring) is the one which facilitates rebuilding of a failed disk with minimum interference with the on-going disk accesses. This is because rebuilding in this case involves copying data from just the failed disk’s mirror. In the other RAID levels, rebuilding involves reading the entire contents of all the other disks.
Answer :
In the reserved space method, a query comparing the last existing field in a record to some value requires only one read from the disk. This single read is preferable to the potentially many reads needed to chase down the pointers to the last field if the pointer method is used.
Answer :
Using the pointer method, a join operation on attributes which are only in the anchor block can be performed on only this smaller amount of data, rather than on the entire relation, as would be the case using the reserved space method. Therefore, in this join example, the pointer method is preferable.
Answer :
If we allocate related records to blocks, we can often retrieve most, or all, of the requested records by a query with one disk access. Disk accesses tend to be the bottlenecks in databases; since this allocation strategy reduces the number of disk accesses for a given operation, it significantly improves performance.
Answer :
The typical OS uses LRU for buffer replacement. This is often a bad strategy for databases. As explained in Section 11.5.2 of the text, MRU is the best strategy for nested loop join. In general no single strategy handles all scenarios well, and ideally the database system should be given its own buffer cache for which the replacement policy takes into account all the performance related issues.
Answer :
An overflow block is used in sequential file organization because a block is the smallest space which can be read from a disk. Therefore, using any smaller region would not be useful from a performance standpoint. The space saved by allocating disk storage in record units would be overshadowed by the performance cost of allowing blocks to contain records of multiple files.
Answer :
A physical OID needs to have a unique identifier in addition to a pointer to a physical storage location. This is required to prevent dereferences of dangling pointers.
Answer :
If an object gets forwarded multiple times, the retrieval speed will decrease because accessing it will require accessing the series of locations from which the object has been successively forwarded to the current location. Multiple accesses can be avoided by always keeping in the oldest location the latest address of the object. This can be done by checking while forwarding whether this object has already been forwarded and, in that case, updating the forwarding address at the oldest location. Thus, at most two accesses will be required.
Answer :
A dangling pointer is a pointer to an area which no longer contains valid data.
In the unique-id scheme to detect dangling pointers, physical OIDs may contain a unique identifier which is an integer that distinguishes the OID from the identifiers of other objects that happened to be stored at the same location earlier, and were deleted or moved elsewhere. The unique identifier is also stored with the object, and the identifiers in an OID and the corresponding object should match. If the unique identifier in a physical OID does not match the unique identifier in the object to which that OID points, the system detects that the pointer is a dangling pointer, and signals an error.
Question 58. When Is It Preferable To Use A Dense Index Rather Than A Sparse Index?
Answer :
It is preferable to use a dense index instead of a sparse index when the file is not sorted on the indexed field (such as when the index is a secondary index) or when the index file is small compared to the size of memory.
Answer :
Reasons for not keeping several search indices include:
a. Every index requires additional disk space.
b. Every insertion into or deletion from the relation requires updating each index, slowing down those operations.
Question 60. What Is The Difference Between A Primary Index And A Secondary Index?
Answer :
The primary index is on the field which specifies the sequential order of the file. There can be only one primary index while there can be many secondary indices.
Answer :
In general, it is not possible to have two primary indices on the same relation for different keys, because the tuples in the relation would have to be stored in a different order for each key to have the same values stored together. We could accomplish this by storing the relation twice and duplicating all values, but for a centralized system this is not efficient.
Answer :
Open hashing may place keys with the same hash function value in different buckets. Closed hashing always places such keys together in the same bucket. Thus in this case, different buckets can be of different sizes, though the implementation may be by linking together fixed-size buckets using overflow chains. Deletion is difficult with open hashing, as all the buckets may have to be inspected before we can ascertain that a key value has been deleted, whereas in closed hashing only the bucket whose address is obtained by hashing the key value need be inspected. Deletions are more common in databases and hence closed hashing is more appropriate for them. For a small, static set of data, lookups may be more efficient using open hashing. The symbol table of a compiler would be a good example.
Answer :
The causes of bucket overflow are:
a. Our estimate of the number of records the relation will hold was too low, so the number of buckets allotted is insufficient.
b. Skew in the distribution of records: many records may have the same search-key value, or the chosen hash function may distribute key values non-uniformly.
To reduce the occurrence of overflows, we can:
a. Choose the number of buckets with some slack, say (nr/fr)(1 + d), where nr is the number of records, fr the number of records per bucket, and d a fudge factor, typically around 0.2.
b. Choose a hash function that distributes search-key values uniformly and randomly over the range of bucket addresses.
Answer :
A range query cannot be answered efficiently using a hash index; we would have to read all the buckets. This is because key values in the range do not occupy consecutive locations in the buckets; they are distributed uniformly and randomly throughout all the buckets.
Answer :
Let us consider a two-dimensional grid array. When a bucket overflows, we can split the ranges corresponding to that row and column into two, in both the linear scales. Thus the linear scales will get one additional entry each, and the bucket is split into four buckets. The ranges should be split in such a way as to ensure that the four resultant buckets have nearly the same number of values.
There can be several other heuristics for deciding how to reorganize the ranges, and hence the linear scales and grid array.
Answer :
The existence bitmap for a relation can be calculated by taking the union (logical-or) of all the bitmaps on that attribute, including the bitmap for value null.
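As a small worked example (values invented for illustration): if a relation has five records and the bitmaps on an attribute are 10010 for value 1, 01001 for value 2, and 00100 for null, then the existence bitmap is their logical-or, 10010 ∨ 01001 ∨ 00100 = 11111, indicating that all five records exist.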
Answer :
Note that indices must operate on the encrypted data or someone could gain access to the index to interpret the data. Otherwise, the index would have to be restricted so that only certain users could access it. To keep the data in sorted order, the index scheme would have to decrypt the data at each level in a tree. Note that hash systems would not be affected.
Answer :
In general it is not desirable to force users to choose a query processing strategy, because naive users might select an inefficient strategy. The reason users would make poor choices about processing queries is that they would not know how a relation is stored, nor about its indices. It is unreasonable to force users to be aware of these details, since ease of use is a major objective of database query languages. If users are aware of the costs of different strategies, they could write queries efficiently, thus helping performance; this could happen if experts were using the system.
Answer :
Hash indices enable us to perform point lookups (e.g., σA=r(relation)) very fast, but for range searches the B+-tree index would be much more efficient. If there is a range query to be evaluated, and only a hash index is available, the better strategy might be to perform a file scan rather than using that index.
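For example (relation and attribute names hypothetical), a hash index on A answers the point query

select * from r where A = 5

in essentially one bucket access, but for the range query

select * from r where A between 5 and 10

a B+-tree index can traverse its leaf level in key order, while the hash index gives no guidance and a file scan may win.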
Answer :
We merge the leaf entries of the first sorted secondary index with the leaf entries of the second sorted secondary index. The result file contains pairs of addresses, the first address in each pair pointing to a tuple in the first relation, and the second address pointing to a tuple in the second relation. This result file is first sorted on the first relation’s addresses. The relation is then scanned in physical storage order, and addresses in the result file are replaced by the actual tuple values. Then the result file is sorted on the second relation’s addresses, allowing a scan of the second relation in physical storage order to complete the join.
Answer :
If there are multiple tuples in the inner relation with the same value for the join attributes, we may have to access that many blocks of the inner relation for each tuple of the outer relation. That is why it is inefficient. To reduce this cost we can perform a join of the outer relation tuples with just the secondary index leaf entries, postponing the inner relation tuple retrieval. The result file obtained is then sorted on the inner relation addresses, allowing an efficient physical order scan to complete the join.
Hybrid merge–join requires the outer relation to be sorted. The above algorithm does not have this requirement, but for each tuple in the outer relation it needs to perform an index lookup on the inner relation. If the outer relation is much larger than the inner relation, this index lookup cost will be less than the sorting cost, thus this algorithm will be more efficient.
Question 72. List The Acid Properties. Explain The Usefulness Of Each?
Answer :
The ACID properties, and the need for each of them, are:
a. Atomicity: either all operations of the transaction are reflected properly in the database, or none are. Without it, a failure in mid-transaction could leave the database with partial updates.
b. Consistency: execution of a transaction in isolation preserves the consistency of the database, so a correct transaction always takes the database from one consistent state to another.
c. Isolation: even though transactions execute concurrently, each is unaware of the others; intermediate results of an incomplete transaction stay hidden, so concurrent execution has the same effect as some serial execution.
d. Durability: after a transaction completes successfully, the changes it has made to the database persist, even if there are system failures afterwards.
Answer :
Even in this case the recovery manager is needed to perform roll-back of aborted transactions.
Answer :
There are several steps in the creation of a file. A storage area is assigned to the file in the file system, a unique i-number is given to the file, and an i-node entry is inserted into the i-list. Deletion of a file involves exactly the opposite steps.
For the file-system user in UNIX, durability is important for obvious reasons, but atomicity is generally not relevant, as the file system does not support transactions. To the file-system implementor, though, many of the internal file-system actions need to have transaction semantics. All the steps involved in creation/deletion of a file must be atomic, otherwise there will be unreferenceable files or unusable areas in the file system.
Answer :
Database systems usually perform crucial tasks whose effects need to be atomic and durable, and whose outcome affects the real world in a permanent manner. Examples of such tasks are monetary transactions, seat bookings etc. Hence the ACID properties have to be ensured. In contrast, most users of file systems would not be willing to pay the price (monetary, disk space, time) of supporting ACID properties.
Answer :
The possible sequences of states are:
a. active → partially committed → committed (the normal sequence for a successful transaction)
b. active → partially committed → failed → aborted
c. active → failed → aborted
Answer :
If a transaction is very long, or when it fetches data from a slow disk, it takes a long time to complete. In the absence of concurrency, other transactions will have to wait for a longer period of time, so average response time will increase. Also, when the transaction is reading data from disk, the CPU is idle, so resources are not properly utilized. Hence concurrent execution becomes important in this case. However, when the transactions are short or the data is available in memory, these problems do not occur.
Question 78. Explain The Distinction Between The Terms Serial Schedule And Serializable Schedule.
Answer :
A schedule in which all the instructions belonging to one single transaction appear together is called a serial schedule. A serializable schedule has a weaker restriction that it should be equivalent to some serial schedule. There are two definitions of schedule equivalence – conflict equivalence and view equivalence. Both of these are described in the chapter.
Answer :
Most of the concurrency control protocols (protocols for ensuring that only serializable schedules are generated) used in practice are based on conflict serializability—they actually permit only a subset of conflict serializable schedules. The general form of view serializability is very expensive to test, and only a very restricted form of it is used for concurrency control.
Answer :
A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the commit operation of Tj. Recoverable schedules are desirable because failure of a transaction might otherwise bring the system into an irreversibly inconsistent state. Nonrecoverable schedules may sometimes be needed when updates must be made visible early due to time constraints, even if they have not yet been committed, which may be required for very long duration transactions.
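A standard illustration (not part of the question): let T1 write data item A, let T2 read A and commit, and let T1 then abort. This schedule is not recoverable, because T2 has committed after reading a value of A that, once T1 aborts, never validly existed.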
Answer :
A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the read operation of Tj. Cascadeless schedules are desirable because the failure of a transaction does not lead to the aborting of any other transaction. Of course this comes at the cost of less concurrency. If failures occur rarely, so that we can pay the price of cascading aborts for the increased concurrency, noncascadeless schedules might be desirable.
Question 82. What Benefit Does Strict Two-phase Locking Provide? What Disadvantages Result?
Answer :
Because it produces only cascadeless schedules, recovery is very easy. But the set of schedules obtainable is a subset of those obtainable from plain two-phase locking, so concurrency is reduced.
Answer :
Rigorous two-phase locking has the advantages of strict 2PL. In addition it has the property that for two conflicting transactions, their commit order is their serializability order. In some systems users might expect this behavior.
Answer :
It is relatively simple to implement, imposes low rollback overhead because of cascadeless schedules, and usually allows an acceptable level of concurrency.
Answer :
The proof is in Buckley and Silberschatz, “Concurrency Control in Graph Protocols by Using Edge Locks,” Proc. ACM SIGACT-SIGMOD Symposium on the Principles of Database Systems, 1984.
Answer :
It would make no difference. The write protocol is such that the most recent transaction to write an item is also the one with the largest time stamp to have done so.
Answer :
A transaction is rolled back because a newer transaction has read or written the data which it was supposed to write. If the rolled-back transaction is re-introduced with the same timestamp, the same reason for rollback is still valid, and the transaction will have to be rolled back again. This will continue indefinitely.
Answer :
When a transaction explicitly locks a node in shared or exclusive mode, it implicitly locks all the descendants of that node in the same mode. The transaction need not explicitly lock the descendant nodes. There is no difference in the functionality of these locks; the only difference is in the way they are acquired and their presence tested.
Answer :
An exclusive lock is incompatible with any other lock mode. Once a node is locked in exclusive mode, none of its descendants can be simultaneously accessed by any other transaction in any mode. Therefore an exclusive and intention-shared declaration has no meaning.
Answer :
If a transaction needs to access a large set of items, multiple-granularity locking requires fewer locks, whereas if only one item needs to be accessed, the single-granularity system allows this with just one lock. Because all the desired data items are locked and unlocked together in the multiple-granularity scheme, the locking overhead is low, but concurrency is also reduced.
Answer :
In the concurrency-control scheme, choosing Start(Ti) as the timestamp of Ti gives a subset of the schedules allowed by choosing Validation(Ti) as the timestamp. Using Start(Ti) means that whoever started first must finish first. Clearly transactions could enter the validation phase in the same order in which they began executing, but this is overly restrictive. Since choosing Validation(Ti) causes fewer nonconflicting transactions to restart, it gives better response times.
Answer :
Using the commit bit, a read request is made to wait if the transaction which wrote the data item has not yet committed. Therefore, if the writing transaction fails before commit, we can abort that transaction alone. The waiting read will then access the earlier version in case of a multiversion system, or the restored value of the data item after abort in case of a single-version system. For writes, this commit bit checking is unnecessary. That is because either the write is a “blind” write and thus independent of the old value of the data item or there was a prior read, in which case the test was already applied.
Answer :
Deadlock avoidance is preferable if the consequences of abort are serious (as in interactive transactions), and if there is high contention and a resulting high probability of deadlock.
Question 94. If Deadlock Is Avoided By Deadlock Avoidance Schemes, Is Starvation Still Possible?
Answer :
A transaction may become the victim of deadlock-prevention rollback arbitrarily many times, thus creating a potential starvation situation.
Answer :
The phantom phenomenon arises when, due to an insertion or deletion, two transactions logically conflict despite not locking any data items in common. The insertion case is described in the book. Deletion can also lead to this phenomenon. Suppose Ti deletes a tuple from a relation while Tj scans the relation. If Ti deletes the tuple and then Tj reads the relation, Ti should be serialized before Tj . Yet there is no tuple that both Ti and Tj conflict on.
An interpretation of 2PL as just locking the accessed tuples in a relation is incorrect. There is also index and relation metadata that records which tuples are in the relation. This information is read by any transaction that scans the relation, and modified by transactions that update, insert into, or delete from the relation. Hence locking must also be performed on the index or relation metadata, and this will avoid the phantom phenomenon.
Question 96. Devise A Time Stamp-based Protocol That Avoids The Phantom Phenomenon?
Answer :
In the text, we considered two approaches to dealing with the phantom phenomenon by means of locking. The coarser granularity approach obviously works for timestamps as well. The B+-tree index based approach can be adapted to timestamping by treating index buckets as data items with timestamps associated with them, and requiring that all read accesses use an index. We now show that this simple method works. Suppose a transaction Ti wants to access all tuples with a particular range of search-key values, using a B+-tree index on that search-key. Ti will need to read all the buckets in that index which have key values in that range. It can be seen that any delete or insert of a tuple with a key-value in the same range will need to write one of the index buckets read by Ti. Thus the logical conflict is converted to a conflict on an index bucket, and the phantom phenomenon is avoided.
Answer :
Volatile storage is storage which fails when there is a power failure. Cache, main memory, and registers are examples of volatile storage. Nonvolatile storage is storage which retains its content despite power failures. An example is magnetic disk. Stable storage is storage which theoretically survives any kind of failure (short of a complete disaster!). This type of storage can only be approximated by replicating data.
In terms of I/O cost, volatile memory is the fastest and non-volatile storage is typically several times slower. Stable storage is slower than non-volatile storage because of the cost of data replication.
Answer :
The shadow-paging scheme is easy to implement for single-transaction systems, but difficult for multiple-transaction systems. In particular it is very hard to allow multiple updates concurrently on the same page. Shadow paging could suffer from extra space overhead, but garbage collection can take care of that. The I/O overhead for shadow paging is typically higher than the log based schemes, since the log based schemes need to write one record per update to the log, whereas the shadow paging scheme needs to write one block per updated block.
Question 99. Explain The Difference Between A System Crash And A "disaster."?
Answer :
In a system crash, the CPU goes down, and disk may also crash. But stable-storage at the site is assumed to survive system crashes. In a “disaster”, everything at a site is destroyed. Stable storage needs to be distributed to survive disasters.
Answer :
Porting is relatively easy to a shared memory multiprocessor machine. Databases designed for single-processor machines already provide multitasking, allowing multiple processes to run on the same processor in a time shared manner, giving a view to the user of multiple processes running in parallel. Thus, coarse-granularity parallel machines logically appear to be identical to single-processor machines, making the porting relatively easy. Porting a database to a shared disk or shared nothing multiprocessor architecture is a little harder.
Answer :
The drawbacks would be that two interprocess messages would be required to acquire locks, one for the request and one to confirm grant. Interprocess communication is much more expensive than memory access, so the cost of locking would increase. The process storing the shared structures could also become a bottleneck. The benefit of this alternative is that the lock table is protected better from erroneous updates since only one process can access it.
Answer :
In a client-server system with page shipping, when a client requests an item, the server typically grants a lock not on the requested item, but on the page having the item, thus implicitly granting locks on all the items in the page. The other items in the page are said to be prefetched. If some other client subsequently requests one of the prefetched items, the server may ask the owner of the page lock to transfer back the lock on this item. If the page lock owner doesn’t need this item, it de-escalates the page lock that it holds, to item locks on all the items that it is actually accessing, and then returns the locks on the unwanted items. The server can then grant the latter lock request. If the unit of data shipping is an item, there are no coarser granularity locks; even if prefetching is used, it is typically implemented by granting individual locks on each of the prefetched items. Thus when the server asks for a return of a lock, there is no question of de-escalation, the requested lock is just returned if the client has no use for it.
Answer :
With increasing scale of operations, we expect that the number of transactions submitted per unit time increases. On the other hand, we wouldn’t expect most of the individual transactions to grow longer, nor would we require that a given transaction should execute more quickly now than it did before. Hence transaction scale-up is the most relevant measure in this scenario.
Answer :
Since the part which cannot be parallelized takes 20% of the total running time, the best speed up we can hope for has to be less than 5.
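This is an instance of Amdahl's law. With serial fraction s = 0.2 and n processors, the achievable speedup is

speedup(n) = 1 / (s + (1 − s)/n) = 1 / (0.2 + 0.8/n) < 1 / 0.2 = 5

so even with unboundedly many processors the speedup approaches, but never reaches, 5.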
Answer :
Increasing contention for shared resources prevents linear scale-up with increasing parallelism. In a shared memory system, contention for memory (which implies bus contention) will result in falling scale-up with increasing parallelism. In a shared disk system, it is contention for disk and bus access which affects scale-up. In a shared-nothing system, inter-process communication overheads will be the main impeding factor. Since there is no shared memory, acquiring locks, and other activities requiring message passing between processes will take more time with increased parallelism.
Answer :
In a distributed system, all the sites typically run the same database management software, and they share a global schema. Each site provides an environment for execution of both global transactions initiated at remote sites and local transactions. The system described in the question does not have these properties, and hence it cannot qualify as a distributed database.
Answer :
With the central server, each site does not have to remember which site to contact when a particular data item is to be requested. The central server alone needs to remember this, so data items can be moved around easily, depending on which sites access which items most frequently. Other house-keeping tasks are also centralized rather than distributed, making the system easier to develop and maintain. Of course there is the disadvantage of a total shutdown in case the server becomes unavailable. Even if it is running, it may become a bottleneck because every request has to be routed via it.
Question 108. Discuss The Relative Advantages Of Centralized And Distributed Databases?
Answer :
Answer :
Answer :
Data transfer on a local-area network (LAN) is much faster than on a wide-area network (WAN). Thus replication and fragmentation will not increase throughput and speed-up on a LAN, as much as in a WAN. But even in a LAN, replication has its uses in increasing reliability and availability.
Question 111. When Is It Useful To Have Replication Or Fragmentation Of Data?
Answer :
Replication is useful when there are many read-only transactions at different sites wanting access to the same data. They can all execute quickly in parallel, accessing local data. But updates become difficult with replication. Fragmentation is useful if transactions on different sites tend to access different parts of the database.
Answer :
Autonomy is the amount of control a single site has over the local database. It is important because users at that site want quick and correct access to local data items. This is especially true when one considers that local data will be most frequently accessed in a database. Transparency hides the distributed nature of the database. This is important because users should not be required to know about location, replication, fragmentation or other implementation aspects of the database.
Answer :
In remote backup systems all transactions are performed at the primary site and the data is replicated at the remote backup site. The remote backup site is kept synchronized with the updates at the primary site by sending all log records. Whenever the primary site fails, the remote backup site takes over processing.
The distributed systems offer greater availability by having multiple copies of the data at different sites, whereas the remote backup systems offer lower availability at lower cost and execution overhead.
In a distributed system, transaction code runs at all the sites, whereas in a remote backup system it runs only at the primary site. Distributed-system transactions follow two-phase commit to keep the data in a consistent state, whereas a remote backup system does not follow two-phase commit and avoids the related overhead.
Answer :
Consider the balance in an account, replicated at N sites. Let the current balance be $100, consistent across all sites. Consider two transactions T1 and T2, each depositing $10 in the account; the balance should be $120 after both are executed. Let the transactions execute in sequence: T1 first and then T2. Suppose the copy of the balance at one of the sites, say s, is not yet consistent with the primary copy after transaction T1 is executed (due to the lazy replication strategy), and transaction T2 reads this stale copy of the balance. One can see that the balance at the primary site would then be only $110 at the end.
Answer :
The centralized approach has the problem of a possible bottleneck at the central site, and the problem of electing a new central site if it goes down. The distributed approach has the problem that many messages must be exchanged to keep the system fair, or one site can get ahead of all other sites and dominate the database.