Some common VSAM problems
Here we describe some problems that may affect the processing and the existence of your VSAM data sets. To simplify the search for a solution to each problem, we use the three “Whats”: occurrence, recovery, and avoidance.

In each subsection we cover the three “Whats”:

  • What happened to cause this problem?
  • What must you do in order to recover your data now?
  • What must you do to avoid this problem in the future?

Broken data sets can be caused by many different circumstances, including user errors. When diagnosing these types of problems, the first step is to identify what is actually wrong with the data set. The first sign of a problem is usually a VSAM or z/OS system message. A single error can often generate numerous messages. Focus your attention on the return code presented and its accompanying explanation. This return code is the one passed by the system component that first encountered the error.

In this section we group the errors by category. However, this is not an easy task. Some of the categories overlap and even interact with others. For example, a bad channel program may be caused by improper sharing, which in turn caused structural damage.

Lack of virtual storage
The following messages may indicate a lack of virtual storage:

  • IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code
    – 136 (Close): Not enough virtual storage was available in the program's address space for a work area for Close.
    – 132 (Open): One of the following errors occurred:
      • Not enough storage was available for work areas.
      • The format-1 DSCB or the catalog cluster record is incorrect.
    – 136 (Open): Not enough virtual storage space is available in the program's address space for work areas, control blocks, or buffers.
    – 40 (I/O): Insufficient virtual storage in the user's address space to complete the request.
  • IEC161I 001 [(087)]-ccc,jjj, sss,ddname,dev,ser,xxx, dsname,cat

What happened?
Due to the lack of virtual storage, an abend occurs. In this case, a symptom dump may be included.

What to do for recovery?
Because the data set processing was interrupted (abended) in apparently unknown circumstances, there are two cases:

  • The VSAM data set is being accessed by a subsystem such as CICS. In this case, CICS was performing sync points and journaling, and is able to recover your data by rolling it back.
  • The VSAM data set is being accessed by your program. In this case, correct the virtual storage problem and rerun the program (if possible), or restore the backup and rerun the job.

Sometimes, the message is not the result of an abend. It can be an alert, as with IEC161I, where the BLDVRP macro indicates that there was not enough virtual storage to satisfy the request made by system-managed buffering (SMB).

SMB gets the available storage, and processing goes on.

What to do to avoid future problems?
If possible, increase your region size (below or above the 16 MB line), decrease the common area below 16 MB, or force your software to run in R31 mode.

Determine whether the VSAM buffers and their control blocks are below or above the 16 MB line. If they are below, read “Locating VSAM buffers above 16 MB” to learn how to move them above with integrity.
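
As a minimal sketch of these options in JCL, the step below requests the largest region the installation allows and asks VSAM to obtain buffers and control blocks above the 16 MB line. The job name, accounting field, program name (MYPROG), DD name, and data set name (MY.KSDS) are hypothetical; installation exits or limits may override the region request.

```jcl
//VSAMRGN  JOB (ACCT),'VSAM STORAGE',CLASS=A,MSGCLASS=X
//* Hypothetical program MYPROG. REGION=0M requests the largest
//* region the installation allows, below and above 16 MB.
//STEP1    EXEC PGM=MYPROG,REGION=0M
//* RMODE31=ALL asks VSAM to obtain both buffers and control
//* blocks above the 16 MB line for this data set.
//VSAMDD   DD  DSN=MY.KSDS,DISP=SHR,AMP=('RMODE31=ALL')
```

RMODE31=ALL is only appropriate when the program can address storage above the line, which is the integrity concern that the referenced topic covers.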

Initial loading problems
The following messages may indicate initial load problems:

  • IDC3308I ** DUPLICATE RECORD xxx
    The output data set of a REPRO command already contains a record with the same key or record number.
  • IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code
    – 8 (I/O): You attempted to store a record with a duplicate key, or there is a duplicate record for an alternate index with the unique key option.
    – 12 (I/O): You attempted to store a record out of ascending key sequence in skip-sequential mode; record had a duplicate key; for skip-sequential processing, your GET, PUT, and POINT requests are not referencing records in ascending sequence; or, for skip-sequential retrieval, the key requested is lower than the previous key requested. For shared resources, the buffer pool is full.
    – 116 (I/O): During initial data set loading (that is, when records are being stored in the data set the first time it is opened), GET, POINT, ERASE, direct PUT, and skip-sequential PUT with OPTCD=UPD are not allowed. For initial loading of a relative record data set, the request was other than a PUT insert.

What happened?
These messages point to problems with the initial load or mass insertion (also called skip sequential) of a VSAM cluster.

Initial load of your data set can be done by IDCAMS REPRO or by a program of yours.

When loading a VSAM KSDS data set, the logical records must be sorted in ascending key sequence. No out-of-sequence or duplicate keys are allowed.

If the duplicate keys message applies to the initial load of an alternate index (AIX) cluster, remember that an AIX can have duplicate keys, but in that case you must not specify the UNIQUEKEY parameter, which would definitely cause this error.

Mass insertion resembles initial load in the sense that all the added logical records also need to be sorted by a key field. The difference is that in mass insertion, logical records are loaded sequentially into a data set that already contains data.

What to do for recovery?
No recovery is needed here. Your data is saved in the input file. It is a matter of sorting the input and rerunning the program.
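
The sort-and-rerun approach can be sketched as a two-step job. This is an illustration under stated assumptions, not a definitive procedure: the data set names (MY.INPUT.DATA, MY.KSDS), the job name, and the accounting field are hypothetical, and the sort statement assumes fixed-length records with a 10-byte character key in the first position.

```jcl
//VSAMLOAD JOB (ACCT),'SORT AND LOAD',CLASS=A,MSGCLASS=X
//* Step 1: sort the input into ascending key sequence.
//* Assumes fixed-length records with a 10-byte key at position 1.
//SORT     EXEC PGM=SORT
//SYSOUT   DD  SYSOUT=*
//SORTIN   DD  DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD  DSN=&&SORTED,DISP=(NEW,PASS),
//             UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD  *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 2: load the KSDS; bypassed unless the sort ended RC=0.
//LOAD     EXEC PGM=IDCAMS,COND=(0,NE,SORT)
//SYSPRINT DD  SYSOUT=*
//INDD     DD  DSN=&&SORTED,DISP=(OLD,DELETE)
//OUTDD    DD  DSN=MY.KSDS,DISP=OLD
//SYSIN    DD  *
  REPRO INFILE(INDD) OUTFILE(OUTDD)
/*
```

Sorting fixes the sequence but does not remove duplicate keys; any duplicates that remain in the input are still reported by REPRO and have to be resolved in the input data.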

What to do to avoid future problems?
After correcting the error, add a step to your production procedure (for example, a sort and a duplicate-key check before the load) so that the same error does not recur.

Mismatch between catalog and data set
The following error messages may indicate a mismatch between catalog and data set information:

  • IDC3351I ** VSAM OPEN RETURN CODE IS 108
    108: Attention message: the time stamps of a data component and an index component do not match. This indicates that either the data or the index has been updated separately from the other. Check for possible duplicate VVRs.
  • IDC3351I DATA SET IS ALREADY OPEN FOR OUTPUT OR WAS NOT CLOSED CORRECTLY
    The data set is already OPEN for output by a user on another system, or was not previously closed.
  • IDC11709I DATA HIGH-USED RBA IS GREATER THAN HIGH-ALLOCATED RBA
    The data component high-used relative byte address is greater than the high-allocated relative byte address. Supportive messages display pertinent data, and processing continues.
  • IDC11712I DATA HIGH-ALLOCATED RBA IS NOT A MULTIPLE OF CI SIZE
    The high-allocated relative byte address is not an integral multiple of the control interval size.
  • IDC11727I INDEX HIGH-USED RBA IS GREATER THAN HIGH-ALLOCATED RBA
    The index component high-used relative byte address is greater than the high-allocated relative byte address.
  • IDC3350I synad[SYNAD] NO RECORD FOUND from VSAM

What happened?
The data set may be intact, but the catalog information describing it does not match the data set. This sometimes results in an open failure.

The most common discrepancies between the catalog and cluster are these:

  • Different time stamps between the index and data components.
  • HURBA and HARBA not correctly updated, mainly caused by an abend without a normal close.
  • Open-for-output bit on for a closed cluster, mainly caused by an abend without a normal close, or by another task accessing the data set from another system.
  • RBA fields in the VVR that do not match the data set attributes, for example:
    – If the RBA of the high-level index CI is corrupted, you are not able to perform direct requests against the data set.
    – If the RBA of the sequence set index CI is corrupted, you are not able to perform sequential access.

    These last two discrepancies are not covered here.

A LISTCAT output (or CSI report) can help with the documentation for problem determination.
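
A minimal sketch of a job that produces the LISTCAT output, assuming a hypothetical cluster name MY.KSDS and a hypothetical accounting field:

```jcl
//LISTCJOB JOB (ACCT),'LISTCAT',CLASS=A,MSGCLASS=X
//* ALL lists every catalog field for the entry, including the
//* high-used and high-allocated RBAs and the time stamps.
//LISTC    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  LISTCAT ENTRIES(MY.KSDS) ALL
/*
```

Keeping the SYSPRINT output from before any repair attempt preserves the evidence of the mismatch.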

What to do for recovery?
IDCAMS VERIFY is required for this type of problem. VERIFY can correct the HURBA and check the open-for-output bit.

  • Different time stamps: Open does not abend your task, so whether the application continues or abends depends on the program that issues the OPEN. In general, it keeps processing, but another problem is likely to occur.

We suggest that you run VERIFY (to be sure, and to document the mismatch) and then run EXAMINE with the index test option only (INDEXTEST) to confirm that no structural damage exists in a KSDS or VRRDS.

Completion of EXAMINE without error proves that there are no structural damages. If the index component shows damage at this point, it must be restored before further use. Note that EXAMINE may provide messages containing only informational data that may not require restoring the cluster.
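
The VERIFY and EXAMINE sequence suggested above might look like the following sketch. The cluster name MY.KSDS, the job name, and the accounting field are hypothetical.

```jcl
//VSAMCHK  JOB (ACCT),'VERIFY EXAMINE',CLASS=A,MSGCLASS=X
//* VERIFY corrects the catalog end-of-data information.
//* EXAMINE then tests the index component only.
//CHECK    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  VERIFY DATASET(MY.KSDS)
  EXAMINE NAME(MY.KSDS) INDEXTEST NODATATEST
/*
```

INDEXTEST NODATATEST restricts EXAMINE to the index component; specify DATATEST as well if the data component is also suspect.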

  • HURBA and HARBA not correctly updated:
If the HURBA is not updated, when the data set is subsequently opened and the user's program attempts to process records beyond end-of-data or end-of-key range, a read operation results in a “no record found” error, and a write operation might write records over previously written records. To avoid this, you can use the VERIFY command, which corrects the catalog information.

  • Open-for-output bit on for a closed cluster:
At the next OPEN, VSAM implicitly issues a VERIFY command when it detects that the open-for-output indicator is on, and issues an informational message (maybe the one that you are seeing) stating whether the VERIFY command was successful.

If a subsequent OPEN is issued for output, VSAM turns off the open-for-output indicator at successful CLOSE. If the data set is opened for input, however, the open-for-output indicator is left on.

Hardware errors
Messages that indicate hardware errors include:

  • IOS000I dev,chp,err,cmd,stat,dcbctfd,ser,mbe,eod,jobname,sens text
    The system found an uncorrectable I/O error in device error recovery. Text is one of the following types:
    – Channel interface, or protocol error
    – Device has exceeded long busy timeout
    – Permanent error - volume fenced
    – Permanent error - device reported unknown message code = cde
    – Channel control, data, chaining, program, protection, interface check
    – Unable to obtain sense data from the device
  • IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code
    – 184 (Close): An uncorrectable I/O error occurred while VSAM was completing outstanding I/O requests.
    – 246 (Close): The compression management services (CMS) close function failed.
    – 184 (Open): An uncorrectable I/O error occurred while VSAM was completing an I/O request.
    – 245 (I/O): A severe error was detected by the compression management services (CMS) during compression processing.
    – 246 (I/O): A severe error was detected by the compression management services (CMS) during decompression processing.
    – 250 (I/O): A valid dictionary token does not exist for the compressed data set. The data record cannot be decompressed.

What happened?
Hardware I/O errors usually mean that the I/O hardware (channel, controller, device) had a problem executing that I/O. For compressed VSAM data sets, you may have hardware problems with the CPU compression assist function.

The first thing to do is break down the message to get more details on the error. An IDCAMS LISTCAT (if possible) of the data set is also helpful, because it gives the attributes of the file on the logical 3390/3380, such as the physical record size, the device type, and the CCHH of all extents.
LOGREC output is also valuable. A GTF CCW trace may be necessary to diagnose the problem; however, for such tools, you may need to re-create the situation.

IGGCSIVS is a program from SYS1.SAMPLIB that uses the Catalog Search Interface (CSI) to produce a list of data set names defined in a given catalog that reside on a specific volume. Such a list might be helpful in a recovery situation affecting that volume.

When a physical error is detected while a VSAM data set is being accessed with VSAM macros, register 15 contains return code 12.

What to do for recovery?
In the case of a media error, do not use ICKDSF to run the ANALYZE and INSPECT functions. The characteristics of the physical devices that make up the RAID device family do not allow the use of the ICKDSF commands that perform installation, media maintenance, and problem determination functions, such as INSTALL, ANALYZE, and INSPECT.

If the problem happens with the compress assist feature, run the program again, switching off data compression in the data class.

What to do to avoid future problems?
If the error is in the DASD controller, keep a log of such occurrences so that you can press the manufacturer for better quality or change to another vendor. Another solution (for some media problems) is to implement RAID-1 dual copy (in the same controller) or remote copy (in two controllers), mainly for your most critical data, such as logs.

Bad data or bad channel program
The following message may indicate bad data or bad channel program:
IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code

  • 140 (Open): The catalog indicates this data set has an incorrect physical record size.
  • 16 (I/O): Record not found.
  • 88 (EOV): A previous extend error has occurred during EOV processing of the data set.

What happened?
This problem may be pervasive, but it is usually caused by one of two things:

  • Duplicate VVRs
  • A bad channel, causing data overlays and corrupting indexes

The existence of damaged or duplicate VVRs on a volume may cause data sets to be overlaid with data from other data sets.

A VSAM volume record (VVR) is a logical record within the VSAM volume data set (VVDS). The VVDS is a data set that describes the dynamic characteristics of VSAM and system-managed data sets residing on a given DASD volume. Together with the BCS, it is a part of an integrated catalog facility catalog.

Many things can cause a channel program to be bad. The most common cause is that the data that describes the data set is bad. If a bad VVR is picked up at Open time, VSAM may try to access cylinders and tracks that do not belong to the data set, resulting in various I/O errors.

If another program overlays the VSAM data set, this can cause the channel program to fail at the spot where the other data exists. For instance, if the CI size of the broken VSAM file is 4 KB, the channel program is built to read records of that size. If another program has overlaid the file with records of, say, 16 KB, the channel programs built for the 4 KB record size fail on all cylinders/tracks/heads that do not have this record size. This situation is usually referred to as bad data.

Theoretically, system code problems can cause VSAM to build channel programs incorrectly, or in some cases the channel programs may be built correctly but then be modified incorrectly and redriven by ERP or even third-party products. Luckily, these causes are not too common, even with new device types.

With problems like these, it is important to get as much information about the data set as you can before the customer restores it. We suggest using:

  • LISTCAT or CSI.
  • DFSMSdss (PRINT command) or DITTO to print the 3390/3380 logical cylinder/track/head where the I/O error is occurring. The output can indicate whether there is any data at all on the track, or whether the data that is there belongs to a different data set.
  • SMF records, mainly the type 6x connected to catalog and VSAM data sets.

After the data set has been recovered, the only “history” that can be examined is the SMF records. If the problem “clears up” after the data set is closed, but without the data set being recovered, you might suspect a problem with an internal control block being overlaid, rather than something on DASD.

What to do for recovery?
This overlay is a hard failure and the data set has to be manually restored from a backup. Often, in this case, the bad data itself gives a clue as to what data set or application has caused the overlay. To avoid the restore for a bad KSDS data component, you may try to skip past the bad data records and recover only those records that can be properly read.

A DIAGNOSE command even after the data set has been recovered can check for this problem (since only a DELETE VVR can get rid of an orphan after one occurs). Luckily, many enhancements have been introduced to Catalog and Open processes in the last few years to check for duplicate VVRs at OPEN time, so this should be less of a problem.
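
As a sketch of such a check, the job below runs DIAGNOSE against the VVDS of a volume and compares it with a catalog. The volume serial 339000, the catalog name MY.USERCAT, the job name, and the accounting field are assumptions for illustration; substitute the volume and the BCS that actually own the data set.

```jcl
//DIAGJOB  JOB (ACCT),'DIAGNOSE VVDS',CLASS=A,MSGCLASS=X
//DIAG     EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//* The VVDS of a volume is named SYS1.VVDS.Vvolser.
//DIAGDD   DD  DSN=SYS1.VVDS.V339000,DISP=SHR,
//             UNIT=SYSDA,VOL=SER=339000,AMP='AMORG'
//SYSIN    DD  *
  DIAGNOSE VVDS INFILE(DIAGDD) -
           COMPAREDS(MY.USERCAT)
/*
```

DIAGNOSE only reports what it finds; any cleanup, such as a DELETE with the VVR parameter, is a separate and deliberate step.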

What to do to avoid future problems?
Experience has shown that the majority of such errors are caused by improper sharing.

Because such types of errors may also be caused by system errors, you may want to investigate the possibility of APARs and PTFs related to the problem.

Structural damage
The following message may indicate structural damage:
IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code

  • 128 (Close): Index search horizontal chain pointer loop encountered.
  • 190 (Open): An incorrect high-allocated RBA was found in the catalog entry for this data set. The catalog entry is bad and will have to be restored.
  • 76 (Open): Attention message: The interrupt recognition flag (IRF) was detected for a data set opened for input processing. This indicates that DELETE processing was interrupted.
  • 4 (I/O): End of data set encountered (during sequential retrieval), or the search argument is greater than the high key of the data set. Either no EODAD routine is provided, or one is provided and it returned to VSAM and the processing program issued another GET.
  • 32 (I/O): An RBA specified that does not give the address of any data record in the data set
  • 128 (I/O): A loop exists in the index horizontal pointer chain during index search processing.
  • 144 (I/O): Incorrect pointer (no associated base record) in an alternate index.
  • 156 (I/O): An addressed GET UPD request failed because the control interval flag was on, or an incorrect control interval was detected during keyed processing. In the latter case, the control interval is incorrect for one of the following reasons:
    – A key is not greater than the previous key.
    – A key is not in the current control interval.
    – A spanned record RDF is present.
    – A free space pointer is incorrect.
    – The number of records does not match a group RDF record count.
    – A record definition field is incorrect.
    – An index CI format is incorrect (logical I/O error).

What happened to cause this problem?
KSDS or VRRDS VSAM data set organizations can “break” in more ways than other data sets, because they have an index component with logical pointers to other data and index CIs. If these pointers become corrupted, data can be lost or duplicated.

Also, much structural information about such data sets is located in the ICF catalog. For example, two RBA fields in the VVR are very important in accessing a KSDS data set:

  • If the RBA of the high-level index CI is corrupted, you are not able to perform direct requests against the data set.
  • If the RBA of the sequence set index CI is corrupted, you are not able to perform sequential access.

Then, if these fields are corrupted, errors may prevent you from accessing the data even though the data is intact. Also, it is possible that the index CI’s horizontal chain is destroyed, inhibiting the access to the data. One common way these fields can get corrupted is due to overlays of the AMDSB control block while the data set was open, which then get updated back to the VVDS at close time. Another way is through improper sharing of the data set during initial load mode processing.

Because these fields are stored in the “Statistics Block” (AMDSB), jobs that only opened the data set for input still update this information at close time. For that reason, do not dismiss any possibility just because a job was not updating the file.

SMF records are very helpful when diagnosing VVR damage. By investigating the SMF record from all systems, improper access of the data set can be identified as well as the time frame of the corruption.

What to do for recovery?
When dealing with KSDS/VRRDS, it is crucial that EXAMINE is run on the data set as part of the diagnosis. Also, the sooner the EXAMINE is run after the data set is broken, the better, since some types of damage can actually cause more breakage until the data set is so badly broken it is impossible to tell what actually happened first. Remember that the EXAMINE command provides details about the nature of data set damage.

Sometimes, the IDCAMS DIAGNOSE command can be used to check for structural errors in the catalog itself.

When the index of a KSDS/VRRDS is lost, one possible recovery path is to read the data (in physical sequential mode) through its data component. Here you may use an assembler program (MACRF=ADR in the ACB and OPTCD=ADR in the RPL) or IDCAMS REPRO. Then sort the records by key and use IDCAMS DEFINE and REPRO to re-create the KSDS.
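
The IDCAMS variant of this recovery path can be sketched as three steps. Everything specific here is an assumption for illustration: the component name MY.KSDS.DATA (take the real name from LISTCAT), the work and target data set names, the volume serial, the space values, and the record layout (variable-length records of 100 to 200 bytes with a 10-byte key at offset 0).

```jcl
//KSDSRBLD JOB (ACCT),'REBUILD KSDS',CLASS=A,MSGCLASS=X
//* Step 1: copy the data component, read as if it were an ESDS.
//UNLOAD   EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SAVE     DD  DSN=MY.KSDS.UNLOAD,DISP=(NEW,CATLG),
//             UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE),
//             DCB=(RECFM=VB,LRECL=204,BLKSIZE=0)
//SYSIN    DD  *
  REPRO INDATASET(MY.KSDS.DATA) OUTFILE(SAVE)
/*
//* Step 2: sort by key. With RECFM=VB the 4-byte RDW shifts a
//* key at record offset 0 to sort position 5.
//SORT     EXEC PGM=SORT,COND=(0,NE,UNLOAD)
//SYSOUT   DD  SYSOUT=*
//SORTIN   DD  DSN=MY.KSDS.UNLOAD,DISP=SHR
//SORTOUT  DD  DSN=MY.KSDS.SORTED,DISP=(NEW,CATLG),
//             UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD  *
  SORT FIELDS=(5,10,CH,A)
/*
//* Step 3: define a new cluster and load the sorted records.
//* Bypassed if any earlier step ended with a nonzero RC.
//RELOAD   EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD  SYSOUT=*
//SORTED   DD  DSN=MY.KSDS.SORTED,DISP=SHR
//SYSIN    DD  *
  DEFINE CLUSTER (NAME(MY.KSDS.NEW) -
         INDEXED -
         KEYS(10 0) -
         RECORDSIZE(100 200) -
         CYLINDERS(10 10) -
         VOLUMES(339000))
  REPRO INFILE(SORTED) OUTDATASET(MY.KSDS.NEW)
/*
```

Loading into a new cluster name leaves the damaged data set untouched for diagnosis; the sort step is needed because records read through the data component are in physical order, not key order.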

What to do to avoid future problems?
Experience has shown that some of these errors are caused by improper sharing.

Because these types of errors may also be caused by system errors, you may want to investigate the possibility of APARs and PTFs related to the problem.

Improper sharing
The following message may indicate improper sharing:

  • IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code
    – 16 (I/O): Record not found.
    – 20 (I/O): Record already held in exclusive control by another requester.
    – 28 (I/O): Data set cannot be extended because VSAM cannot allocate additional DASD space.
    – 88 (Open): A previous extend error has occurred during EOV processing of the data set.
    – 96 (Open): Attention message: an unusable data set was opened for input.
    – 116 (Open): Attention message: the data set was not properly closed or was not opened. If the data set was not properly closed, then data may be lost if processing continues. The data set was not properly closed.
    – 236 (I/O): Validity.
  • IDC11705I INDEX RECORD CONTAINS DUPLICATE INDEX POINTERS pointer-value

What happened?
Improper sharing is one of the most common causes of broken data sets. This section covers some of the things to check to make sure that the share options are appropriate. Here are some causes of sharing problems:

  • Sharing a data set across regions (cross-region) or across systems (cross-system) without using proper enqueuing procedures to protect data set integrity.
  • Sharing a data set across systems, even with the appropriate share option, but without propagating ENQ name SYSVSAM around the GRS ring. This is the most common user error. It results in duplicate index pointers in the high-level index records.
  • When a data set is defined using the model parameter, and the original data set has SPEED as an attribute of the index, data set damage can occur. VSAM does not support speed as an attribute for the index (speed is only supported for the data).
  • Another area related to broken data sets that is specific to CBUF processing is the VSI control block. Every time a data set is opened on a system for CBUF processing, a VSI is built for the data set and added to the VSI chain. This control block is then updated by the user to communicate information from one region to another. If the user does this improperly, a broken data set can result.

Documentation is of paramount importance in addressing sharing problems. The necessary documents should be obtained at each system (or address space) accessing the troubled data set. The major document needed here is the IDCAMS LISTCAT or CSI report. List the catalog entry for the affected data set to show the allocation and RBA data.

The SMF 62 and 64 records can help you determine if the users have the data set open for output from different applications at the same time. A common user error in this area is applications or ISV products that were not intended to be run from multiple systems (and so they have no logic to serialize updates), but the customer is using them in this manner.

What to do for recovery?
Here is a list of recovery options:

  • Use the Access Method Services VERIFY command to attempt to close the data set properly. In a cross-system shared DASD environment, ACBERFLG = 116 may mean that the data set was not properly closed. The warning "data set not properly closed" may indicate an error in a VSAM data set. It may mean one of the following:
    – The "end of the data set" indicators (data set high-used RBA/CI, and so on) may be invalid.
    – There may be missing records.
    – There may be duplicate records.
    – The data set statistics fields in the catalog may be invalid.

    If VSAM OPEN cannot successfully do a VERIFY, then a user cannot do a VERIFY either.

  • Execute the IDCAMS EXAMINE command on the data set. Completion of EXAMINE without error proves that damage did not occur in a previous job. If the data set shows damage at this point, it must be restored before further use.
  • Proceed with the application job execution.
  • Execute IDCAMS EXAMINE on the data set when the job completes.
  • If damage to the cluster has occurred, examine the SMF records from all systems that can access the DASD volume. If shared access to the data set has occurred, correct or eliminate the contention for the data set.

What to do to avoid future problems?
You should issue the VERIFY command every time you open a VSAM cluster that is shared across systems or address spaces.
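
For batch jobs, one way to approximate this is to run IDCAMS VERIFY as the first step of the job, before the application step. This is a sketch only: the program name MYPROG, the data set name MY.SHARED.KSDS, the job name, and the accounting field are hypothetical, and a program that opens the cluster itself can issue the VERIFY macro instead.

```jcl
//SHRJOB   JOB (ACCT),'VERIFY FIRST',CLASS=A,MSGCLASS=X
//VERIFY   EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  VERIFY DATASET(MY.SHARED.KSDS)
/*
//* Hypothetical application program MYPROG runs after VERIFY.
//APPL     EXEC PGM=MYPROG
//VSAMDD   DD  DSN=MY.SHARED.KSDS,DISP=SHR
```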

Mismatch between catalog and VTOC
In this section, we do not cover much catalog recovery; however, some of the catalog problems are mentioned, such as those associated with the IDC3009I message.

  • IDC3351I ** VSAM {OPEN|CLOSE|I/O} RETURN CODE IS return-code
    – 132 (Open): One of the following errors occurred:
      • Unable to read JFCB or Scheduler Work Block
      • Unable to connect to redo log during DFSMStvs open
      • The forward recovery log associated with the data set was altered.
      • DFSMStvs failed to write a record to the forward recovery log.
      • Caller TASK cancelled
      • The data set is in a forward recovery required state.
  • IDC3009I VSAM CATALOG RETURN CODE IS return-code - REASON CODE IS IGG0CLaa - reason-code

VSAM does not produce expected output
Incorrect output failures can be identified by the following results:

  • Expected output is missing.
  • Output is different than expected.
  • Output should not have been generated.
  • System indicates damage to the VTOC or VTOC index.
  • ISMF panel information or flow is erroneous.

Incorrect output can be the result of a previous failure and can often be difficult to analyze because the component affected might not be the one that caused the problem. Review previous messages, abends, console logs, or other system responses. They could indicate the source of the failure.

Accumulating as much information as possible
Accumulating as much information as possible can help you isolate or resolve your problem. The IBM Support Center will also request this information if trap or trace data is needed. Consider the following:

  • When was the problem first noticed?
  • How was the problem identified (good output versus bad output)?
  • Were any system changes or maintenance recently applied? For example, a new device, software product, APAR, or PTF?
  • Does the problem occur with a specific data set, device, time of day, and so forth?
  • Does the problem occur in batch or TSO mode?
  • Is the problem solid or intermittent?
  • Can the problem be re-created?
  • Examine the system and console logs for failure-related abends, messages, or return codes. A damaged VSAM data set can also cause incorrect output.
  • Add any failure-related return codes to the keyword string, exactly as the system presents them. You can also add the abend or message type-of-failure keywords to the incorrect output keyword string to define the symptoms more closely:
  • Determine whether failure-related record management return codes and reason codes exist.
  • VSAM provides return codes in register 15 and reason codes in either the access method control block (ACB) or the request parameter list (RPL).

    Reason codes in the ACB indicate VSAM open or close errors. Reason codes in the RPL indicate VSAM record management errors returned to the caller of record management.

  • Determine whether you have a damaged VSAM data set. Some incorrect output failures involve a damaged VSAM data set. The EXAMINE command provides details about the nature of data set damage. If these service aids indicate that the data set is not damaged, inform the IBM Support Center if you call for assistance. If they indicate that the data set is damaged, keep a copy of the output for possible use by the IBM Support Center. Be prepared to describe the type of data set damage. You should attempt to recover the data set and rerun the failing job to determine whether the problem is resolved.

OEM problems
Overlay of control blocks, abends, loops, deadlocks and performance problems can occur due to errors in the use of VSAM data sets. A large number of problems have been reported by many IBM products and non-IBM products that use VSAM data sets. These problems are described in a number of information APARs in IBMLINK. At the time of writing this book, 270 problems have been documented in these APARs as follows.

  • II10001 - Problems 1-60
  • II11013 - Problems 61-104
  • II11513 - Problems 105-157
  • II12140 - Problems 158-200
  • II12615 - Problems 201-238
  • II13278 - Problems 239-270

Enqueue issues
OPEN message IEC161I 052-084 is a common informational message. Most often it simply means that another job already has the data set open when this job is trying to open it. It usually indicates a scheduling problem rather than a system problem. In this section we provide information on how to determine which jobs are in contention for the same data set.

VSAM OPEN processing determines this condition by checking the GRS data. Depending on the share options (SHR) of the data set and the attributes of the OPEN, VSAM enqueues against major name SYSVSAM with a minor name of the data set name/catalog name plus other information. Thus, GRS is the mechanism that OPEN processing uses to ensure serialization of the OPEN process itself as well as the share options of the file.

When VSAM issues an ENQUEUE for a SYSVSAM resource, VSAM will add an ENQRNIND indicator to the wanted resource name:

  • ENQRNIND = B = BUSY
  • ENQRNIND = I = INPUT
  • ENQRNIND = N = Non RLS Open
  • ENQRNIND = O = OUTPUT
  • ENQRNIND = R = RESERVE
  • ENQRNIND = S = Sphere
  • ENQRNIND = C = CLOSE
  • ENQRNIND = R = RLS Read
  • ENQRNIND = W = RLS Write

The ENQ for BUSY is held throughout the initial operation being performed, that is, OPEN, EOV, CLOSE, TCLOSE, or CHECKPOINT RESTART. When the operation completes or fails, the 'B' resource is dequeued. Example:

  • ENQ DATASET/CAT/B (OPEN) EXCLUSIVE
  • ENQ DATASET/CAT/O
  • DEQ DATASET/CAT/B
  • (process data)
  • ENQ DATASET/CAT/B (CLOSE) EXCLUSIVE
  • DEQ DATASET/CAT/O
  • DEQ DATASET/CAT/B

The first place to look to determine which job is enqueued on the data set is the GRS data. You can view this data with the DISPLAY GRS command or through a monitor program. Look at both the global and local queues under the major name of SYSVSAM for the data set that is getting the OPEN failure. Unfortunately, the GRS data is transient and may well change before you can find the job that was responsible for the message, but the GRS command is still worth trying. Issue this GRS command from all systems that can share the DASD:

D GRS,RES=(SYSVSAM,*)

The next place to look is the console log. You may be able to determine if there were any backups/ copies/ dumps being taken at the time or even if there were unexpected batch jobs running at the time.

Another place to look for information is the SMF type 62 (SMF62) and type 64 (SMF64) records. These show which jobs were accessing the data set at the time of the error. This also may not be conclusive, because some access to the data set does not produce SMF records (for example, Media Manager, DB2, and DFSMSdss).

Ensure the availability of the resource by means of JCL DD statements. That is, check your JCL and make sure that you have requested the proper disposition, for example, DISP=SHR.

Use the AMS command ALTER to reset the Update Inhibit indicator in the data set's catalog record, then rerun the job. To do this, run LISTCAT (LISTC) against the failed cluster and check for the INH-UPDATE attribute. If it is set, the data set is in read-only mode and an OPEN for update fails. Use the AMS command 'ALTER UNINHIBIT' to place the file in read/write mode, then rerun the job. This shows a sample job for cluster DFP1.JIMONE.KSDS (on a D/T3390 with volser=339000).

Example: Sample AMS JCL

In a VSAM catalog, the INHIBIT UPDATE bit is in the BCS DINC record. In an ICF catalog, the INHIBIT UPDATE bit is in the VVDS VVR record.

If you cannot determine which job is causing the IEC161I message, open a problem record with the Support Center and include your RMID of IDA0192B. You will be given a SLIP to capture a dump (which includes the GRSQ data) at the time the message is issued.

The SLIP is set in CSECT IDA0192B of load module IDA0192A, on entry to a procedure named PROBDT2B. IBM needs to know the PTF level of IDA0192B in order to resolve the offset xxx in the SLIP, and you need an AMBLIST of IDA0192A. A sample SLIP is shown in the following example.

Example: SLIP to determine the job causing IEC161I
SLIP SET,IF,A=SYNCSVCD,L=(IDA0192A,xxx),DATA=(13R?+F8,EQ,34),
SDATA=(ALLNUC,CSA,GRSQ,LPA,LSQA,RGN,SQA,SUM,TRT),END

IDA0192B is at an offset into load module IDA0192A, and PROBDT2B is at an offset into IDA0192B. xxx is the sum of these two offsets. The value of xxx and the values specified in the SLIP DATA parameters may change with different releases and maintenance levels of CSECT IDA0192B. Check with IBM Support personnel to ensure an accurate SLIP for your level.
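As a small worked example of the offset arithmetic, the following Python sketch adds two hexadecimal offsets and prints the resulting L= operand. Both offset values are hypothetical placeholders chosen for illustration; the real values come from your AMBLIST output and from IBM Support.

```python
# Hypothetical offsets, for illustration only. Real values depend on the
# release and maintenance level and must come from AMBLIST and IBM Support.
CSECT_OFFSET = 0x1A40  # assumed offset of CSECT IDA0192B within IDA0192A
PROC_OFFSET = 0x02C8   # assumed offset of PROBDT2B within IDA0192B

def slip_offset(csect_offset, proc_offset):
    """Return xxx, the sum of the two offsets, as an uppercase hex string."""
    return format(csect_offset + proc_offset, "X")

XXX = slip_offset(CSECT_OFFSET, PROC_OFFSET)
print(f"L=(IDA0192A,{XXX})")
```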

Migration issues
Please refer to the relevant migration manuals to be aware of changes you need to make when you migrate to a new release of z/OS.

The enhancement to the calculation of the default CISIZE is one example.

Performance considerations
In a sense, a performance problem may appear as a lack of availability.

Deadlocks
Deadlocks are both a performance subject and a problem determination subject. Here we define a deadlock, then cover how to prevent one, how to detect one, and what to do when one occurs.

What is a deadlock?
Locks are used in VSAM by the DFSMS lock manager when the same control block structures are shared by strings in task programs. In such a case, share options do not apply. If the VSAM data set is processed in non-RLS mode, there is one lock per CI; in RLS mode, there is one lock per logical record.

Contention for VSAM CI locks (or logical record locks) can therefore lead to a deadlock: A holds one lock that B is waiting for, while B holds another lock that A is waiting for.

How to prevent a deadlock

The major rule is to avoid locking more than one logical record concurrently. Another key recommendation is to avoid consistent reads (CR). If the RLS exploiter cannot follow these rules, it can create a hierarchy of locks. Then, when more than one lock is required concurrently, the locks must be acquired in a predetermined sequence.
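The lock hierarchy idea can be sketched with ordinary thread locks. The Python example below is an analogy, not VSAM RLS code: the record keys, `LOCKS`, `lock_in_order`, and `run_demo` are all names assumed for the illustration. Two tasks ask for the same two records in opposite orders, but each acquires the locks in sorted key order, so neither can hold one lock while waiting for the other in a cycle.

```python
import threading

# One lock per logical record; the record keys are hypothetical.
LOCKS = {key: threading.Lock() for key in ("REC001", "REC002")}

def lock_in_order(keys, log):
    """Acquire all requested record locks in sorted key order, then release."""
    ordered = sorted(set(keys))
    for key in ordered:
        LOCKS[key].acquire()
    try:
        log.append(tuple(ordered))  # stand-in for the real update work
    finally:
        for key in reversed(ordered):
            LOCKS[key].release()

def run_demo(rounds=100):
    """Run two tasks that name the same records in opposite orders."""
    log = []

    def worker(keys):
        for _ in range(rounds):
            lock_in_order(keys, log)

    tasks = [
        threading.Thread(target=worker, args=(["REC001", "REC002"],), daemon=True),
        threading.Thread(target=worker, args=(["REC002", "REC001"],), daemon=True),
    ]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join(timeout=10)
    if any(task.is_alive() for task in tasks):
        raise RuntimeError("tasks did not finish; possible deadlock")
    return log

LOG = run_demo()
print(f"{len(LOG)} updates completed without deadlock")
```

If `lock_in_order` acquired the locks in the order each caller listed them instead of sorting, the two tasks could each hold one lock and wait forever for the other, which is the A/B situation described earlier.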

How to detect a deadlock
The DFSMS lock manager provides a deadlock detection routine; the installation determines how frequently it runs. This routine considers a string task program to be in a deadlock if it has been waiting for a lock for more than an installation-specified amount of time. There are two deadlock detection routines:

  • Deadlocks within a system
  • Deadlocks between systems

The frequencies of the deadlock detection routines are specified in the IGDSMSxx member of SYS1.PARMLIB.

In a sysplex, the first system that is initialized with an IGDSMSxx member having a valid DEADLOCK_DETECTION specification determines this keyword for the other systems in the sysplex. You can change this value through the SETSMS or SET SMS commands.

This keyword specifies the intervals for running the local and global deadlock detection routines.

The first subparameter nnnn is the local deadlock detection cycle and specifies the interval in seconds for detecting deadlocks within a system.

The second subparameter is the global deadlock detection cycle and specifies the interval for detecting deadlocks between systems. This value is specified as the number of local detection cycles that occur before global deadlock detection is initiated.
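Because the second subparameter is a count of local cycles rather than a time, the effective global interval is the product of the two values. The Python sketch below works through that arithmetic; the function name and the sample values 15 and 4 are chosen for illustration only.

```python
def global_interval_seconds(local_seconds, local_cycles):
    """Seconds between global deadlock detection runs.

    local_seconds: first subparameter, the local detection interval in seconds.
    local_cycles:  second subparameter, the number of local cycles that occur
                   before global detection is initiated.
    """
    if local_seconds <= 0 or local_cycles <= 0:
        raise ValueError("both subparameters must be positive")
    return local_seconds * local_cycles

# Illustrative values: local detection every 15 seconds, global detection
# after every 4 local cycles.
print(global_interval_seconds(15, 4))
```

With these sample values, a deadlock between systems could go undetected for roughly four times as long as one within a single system.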

To determine whether a string task program is in a deadlock, the DFSMS lock manager compares the RLSTMOUT value with the amount of time the string task program has been waiting for a lock.

RLSTMOUT({nnnn|0})
Specifies the maximum time, in seconds, that a VSAM RLS request must wait for a required lock before the request is assumed to be in deadlock.

You can specify a value from 0 to 9999 (in seconds). A value of 0 means that the VSAM RLS request has no timeout value and waits for as long as necessary to obtain the required lock.

RLSTMOUT can be specified only once in a sysplex and applies across all systems in the sysplex.
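The timeout semantics described above (a bounded wait, with 0 meaning wait indefinitely) can be mimicked with a standard thread lock. The Python sketch below is an analogy only: `LockTimeout` and `wait_for_lock` are hypothetical names, and the code does not model the DFSMS lock manager or any real VSAM return code.

```python
import threading

class LockTimeout(Exception):
    """Hypothetical stand-in for a locking error response on a request."""

def wait_for_lock(lock, rlstmout):
    """Wait for a lock using RLSTMOUT-like rules (whole seconds, 0 to 9999).

    A value of 0 means no timeout: the caller waits as long as necessary.
    """
    if not 0 <= rlstmout <= 9999:
        raise ValueError("rlstmout must be between 0 and 9999 seconds")
    # threading.Lock.acquire uses timeout=-1 to mean "block indefinitely".
    timeout = -1 if rlstmout == 0 else rlstmout
    if not lock.acquire(timeout=timeout):
        raise LockTimeout(f"lock not obtained within {rlstmout} seconds")

record_lock = threading.Lock()
record_lock.acquire()            # another request already holds the lock
try:
    wait_for_lock(record_lock, 1)
except LockTimeout as err:
    print(f"request rejected: {err}")
finally:
    record_lock.release()
```

An application would handle `LockTimeout` in the same place where it handles other request errors, for example by backing out the current unit of work and retrying.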

What to do when a deadlock is detected
When a lock request is found to be in a deadlock, VSAM rejects the waiting request.

This results in the VSAM request completing with a deadlock error response. Your applications must be prepared to accept locking error return codes that may be returned on GET or POINT NRI requests. However, normally such errors do not occur.

Beware of some VSAM restrictions
Keep this important restriction in mind when using the REPRO command.
REPRO to empty VSAM data set
If you try to REPRO into an empty VSAM data set, you get the error IDC3351I ** VSAM OPEN RETURN CODE IS 160. This is a permanent restriction of VSAM. One workaround is to "prime" the data set by adding and then deleting one record. This raises the HI-U-RBA to a value other than zero and allows the REPRO to proceed.

