Usage Note 48705 states that in a one-to-many merge with common variables that are not the BY variables, the common variables take their values from the 'many' data set after the first observation of each BY group. Customers sometimes expect that the value of the common variable from the 'one' data set will be retained throughout the BY group.
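The behavior that the usage note describes can be reproduced with a small sketch. The data sets ONE and MANY, the BY variable Id, and the common variable X are hypothetical names and values chosen for illustration; they are not taken from the usage note itself.

```sas
/* Hypothetical 'one' data set: one observation per Id. */
data work.one;
   input Id X;
   datalines;
1 100
2 200
;

/* Hypothetical 'many' data set: several observations per Id. */
data work.many;
   input Id X;
   datalines;
1 11
1 12
1 13
2 21
2 22
;

/* ONE is listed last, so its X overwrites the X from MANY, but only  */
/* on the first observation of each BY group. On later observations   */
/* only MANY is read, so X comes from MANY: 100, 12, 13, 200, 22.     */
data work.overwritten;
   merge work.many work.one;
   by Id;
run;

/* Renaming the common variable in MANY leaves X from ONE untouched,  */
/* so it is retained for the whole BY group: 100, 100, 100, 200, 200. */
data work.retained;
   merge work.many (rename=(X=X_many)) work.one;
   by Id;
run;

proc print data=work.overwritten; run;
proc print data=work.retained; run;
```

Renaming is only one possible workaround; dropping the common variable from the 'many' data set with the DROP= data set option has a similar effect when its values are not needed.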
One-to-one merging is similar to one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets. In a one-to-one merge, the number of observations in the new data set equals the number of observations in the largest data set that was named in the MERGE statement. The MERGENOBY= SAS system option controls whether SAS issues a message when MERGE processing occurs without an associated BY statement.
Merging data from two or more data sets is a common task for SAS programmers. Most merges are one-to-one or one-to-many; that is, at least one data set has a sequence of variables that forms a unique record identifier. When neither data set has a unique record identifier, the merge is known as a many-to-many merge.
What You Need to Know before Combining Information Stored In Multiple SAS Data Sets |
Application requirements vary, but there are common factors for all applications that access, combine, and process data. Once you have determined what you want the output to look like, you must
- determine how the input data is related
- ensure that the data is properly sorted or indexed, if necessary
- select the appropriate access method to process the input data
- select the appropriate SAS tools to complete the task.
The Four Ways That Data Can Be Related |
You must be able to identify the existing relationships in your data. This knowledge is crucial for understanding how to process input data in order to produce desired results. All related data fall into one of these four categories, characterized by how observations relate among the data sets:
- one-to-one
- one-to-many
- many-to-one
- many-to-many.
To obtain the results you want, you should understand how each of these methods combines observations, how each method treats duplicate values of common variables, and how each method treats missing values or nonmatched values of common variables. Some of the methods also require that you preprocess your data sets by sorting them or by creating indexes. See the description of each method in Combining SAS Data Sets: Methods.
One-to-One
In the following example, observations in data sets SALARY and TAXES are related by common values for EmployeeNumber.
One-to-One Relationship
One-to-Many and Many-to-One
In the following example, observations in data sets ONE and TWO are related by common values for variable A. Values of A are unique in data set ONE but not in data set TWO.
One-to-Many Relationship
In the following example, observations in data sets ONE, TWO, and THREE are related by common values for variable ID. Values of ID are unique in data sets ONE and THREE but not in TWO. For values 2 and 3 of ID, a one-to-many relationship exists between observations in data sets ONE and TWO, and a many-to-one relationship exists between observations in data sets TWO and THREE.
One-to-Many and Many-to-One Relationships
Many-to-Many
In the following example, observations in data sets BREAKDOWN and MAINTENANCE are related by common values for variable Vehicle. Values of Vehicle are not unique in either data set. A many-to-many relationship exists between observations in these data sets for values AAA and CCC of Vehicle.
Many-to-Many Relationship
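When the relationship is many-to-many, a DATA step MERGE with a BY statement pairs observations positionally within each BY group and writes a note about repeated BY values to the log; it does not pair every observation with every other. If all combinations are wanted, a PROC SQL join is one option. The following is a minimal sketch: the variables BreakdownNo and ServiceNo and all data values are hypothetical, and only the data set names and the variable Vehicle come from the text above.

```sas
/* Hypothetical data: Vehicle values AAA and CCC repeat in both data sets. */
data work.breakdown;
   input Vehicle $ BreakdownNo;
   datalines;
AAA 1
AAA 2
CCC 3
CCC 4
;

data work.maintenance;
   input Vehicle $ ServiceNo;
   datalines;
AAA 10
AAA 11
CCC 12
CCC 13
;

/* The join pairs each BREAKDOWN row with each MAINTENANCE row that has */
/* the same Vehicle: 2 x 2 rows for AAA plus 2 x 2 rows for CCC.        */
proc sql;
   create table work.pairs as
   select b.Vehicle, b.BreakdownNo, m.ServiceNo
   from work.breakdown as b
        inner join work.maintenance as m
        on b.Vehicle = m.Vehicle
   order by b.Vehicle, b.BreakdownNo, m.ServiceNo;
quit;
```

With this sample data, WORK.PAIRS contains eight rows. Whether every combination is actually the desired result depends on the application.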
Access Methods: Sequential versus Direct |
Overview
Once you have established data relationships, the next step is to determine the best mode of data access to relate the data. You can access observations sequentially in the order in which they appear in the physical file. Or you can access them directly, that is, you can go straight to an observation in a SAS data set without having to process each observation that precedes it.
Sequential Access
The simplest and perhaps most common way to process data with a DATA step is to read observations in a data set sequentially. You can read observations sequentially using the SET, MERGE, UPDATE, or MODIFY statements. You can also use the SAS File I/O functions, such as OPEN, FETCH, and FETCHOBS.
Direct Access
You can access observations directly in one of two ways:
- by an observation number
- by the value of one or more variables through a simple or composite index.
To access observations directly by their observation number, use the POINT= option with the SET or MODIFY statement. The POINT= option names a variable whose current value determines which observation a SET or MODIFY statement reads.
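A minimal sketch of direct access with POINT= follows. The data set WORK.READINGS and the variables ObsId, Value, and Target are hypothetical names used only for illustration.

```sas
/* Hypothetical data set with five observations. */
data work.readings;
   do ObsId = 1 to 5;
      Value = ObsId * 10;
      output;
   end;
run;

/* Read observations 4 and 2 directly, in that order. */
data work.picked;
   do Target = 4, 2;
      set work.readings point=Target;   /* Target is not written to WORK.PICKED */
      output;
   end;
   stop;   /* direct access never reaches end of file, so STOP ends the step */
run;
```

The STOP statement matters here: because POINT= reads never trigger an end-of-file condition, a DATA step without it can loop indefinitely.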
To access observations directly based on the values of one or more specified variables, you must first create an index for the variables and then read the data set using the KEY= statement option with the SET or MODIFY statement. An index is a separate structure that contains the data values of the key variable or variables, paired with a location identifier for the observations containing the value.
Note: You can also use the SAS File I/O functions such as CUROBS, NOTE, POINT, and FETCHOBS to access observations by observation number.
Overview of Methods for Combining SAS Data Sets |
You can use these methods to combine SAS data sets:
- concatenating
- interleaving
- one-to-one reading
- one-to-one merging
- match-merging
- updating.
Concatenating
Concatenating Two Data Sets
Interleaving
Interleaving Two Data Sets
One-to-One Reading and One-to-One Merging
One-to-One Reading and One-to-One Merging
Match-Merging
Match-Merging Two Data Sets
Updating
UPDATE replaces an existing file with a new file, allowing you to add, delete, or rename columns. MODIFY performs an update in place by rewriting only those records that have changed, or by appending new records to the end of the file.
Note that by default, UPDATE and MODIFY do not replace nonmissing values in a master data set with missing values from a transaction data set.
Updating a Master Data Set
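The default handling of missing values in a transaction data set can be sketched as follows. The data sets WORK.MASTER and WORK.TRANS and their variables and values are hypothetical.

```sas
/* Hypothetical master data set, sorted by Id with unique Id values. */
data work.master;
   input Id Name $ Balance;
   datalines;
1 Ann 100
2 Bob 200
;

/* Hypothetical transactions; a period is read as a missing value. */
data work.trans;
   input Id Name $ Balance;
   datalines;
1 . 150
2 Rob .
3 Cy 50
;

/* UPDATE writes a new data set. Missing transaction values do not   */
/* replace nonmissing master values, so Id 1 keeps the name Ann and  */
/* Id 2 keeps the balance 200. Id 3 is added as a new observation.   */
data work.newmaster;
   update work.master work.trans;
   by Id;
run;

proc print data=work.newmaster; run;
```

Both data sets are already in Id order here; in practice they would need to be sorted or indexed on the BY variable first.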
Overview of Tools for Combining SAS Data Sets |
Using Statements and Procedures
Statement or Procedure | Action Performed | Sequential Access | Direct Access | Can Use with BY Statement | Comments |
---|---|---|---|---|---|
BY | controls the operation of a SET, MERGE, UPDATE, or MODIFY statement in the DATA step and sets up special grouping variables. | NA | NA | NA | BY-group processing is a means of processing observations that have the same values of one or more variables. |
MERGE | reads observations from two or more SAS data sets and joins them into a single observation. | X | | X | When using MERGE with BY, the data must be sorted or indexed on the BY variable. |
MODIFY | processes observations in a SAS data set in place. (Contrast with UPDATE.) | X | X | X | Sorted or indexed data are not required for use with BY, but are recommended for performance. |
SET | reads an observation from one or more SAS data sets. | X | X | X | Use KEY= or POINT= statement options for directly accessing data. |
UPDATE | applies transactions to observations in a master SAS data set. UPDATE does not update observations in place; it produces an updated copy of the current data set. | X | | X | Both the master and transaction data sets must be sorted by or indexed on the BY variable. |
PROC APPEND | adds the observations from one SAS data set to the end of another SAS data set. | X | | | |
PROC SQL | reads an observation from one or more SAS data sets; reads observations from up to 32 SAS data sets and joins them into single observations; manipulates observations in a SAS data set in place; easily produces a Cartesian product. | X | X | X | All three access methods are available in PROC SQL, but the access method is chosen by the internal optimizer. |
PROC SQL is the SAS implementation of Structured Query Language. In addition to expected SQL capabilities, PROC SQL includes additional capabilities specific to SAS, such as the use of formats and SAS macro language.
Using Error Checking
You can use the _IORC_ automatic variable and the SYSRC autocall macro to perform error checking in a DATA step. Use these tools with the MODIFY statement or with the SET statement and the KEY= option. For more information about these tools, see Error Checking When Using Indexes to Randomly Access or Update Data.
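The following sketch combines an indexed lookup through SET with KEY= and a check of _IORC_ through the SYSRC macro. The data sets WORK.PARTS and WORK.ORDERS, their variables, and the Found flag are hypothetical; _SOK and _DSENOM are the SYSRC mnemonics for a successful read and for no matching observation.

```sas
/* Hypothetical lookup table with a simple index on PartNo. */
data work.parts (index=(PartNo));
   input PartNo Description $;
   datalines;
101 Bolt
102 Nut
103 Washer
;

/* Hypothetical orders; PartNo 999 has no match in WORK.PARTS. */
data work.orders;
   input PartNo Quantity;
   datalines;
102 5
999 1
101 2
;

data work.priced;
   set work.orders;                       /* sequential read           */
   set work.parts key=PartNo / unique;    /* direct read through index */
   select (_iorc_);
      when (%sysrc(_sok)) Found = 1;      /* a match was located       */
      when (%sysrc(_dsenom)) do;          /* no matching PartNo        */
         Found = 0;
         Description = ' ';               /* discard the value left from the last match */
         _error_ = 0;                     /* reset the error flag set by the failed read */
      end;
      otherwise do;
         put 'Unexpected _IORC_ value: ' _iorc_=;
         stop;
      end;
   end;
run;
```

Resetting _ERROR_ after a handled nonmatch keeps SAS from dumping the program data vector to the log for each unmatched key.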
How to Prepare Your Data Sets |
Before you combine SAS data sets, follow these guidelines:
- Know the structure and the contents of the data sets.
- Look at sources of common problems.
- Ensure that observations are in the correct order, or that they can be retrieved in the correct order (for example, by using an index).
- Test your program.
Knowing the Structure and Contents of the Data Sets
To help determine how your data are related, look at the structure of the data sets. To see the data set structure, execute the DATASETS procedure, the CONTENTS procedure, or access the SAS Explorer window in your windowing environment to display the descriptor information. Descriptor information includes the number of observations in each data set, the name and attributes of each variable, and which variables are included in indexes. To print a sample of the observations, use the PRINT procedure or the REPORT procedure.
You can also use functions such as VTYPE, VLENGTH, and VLENGTHX to show specific descriptor information. For a short description of these functions, see the Variable Information functions in Functions and CALL Routines. For complete information about these functions, see 'Functions and CALL Routines' in SAS Language Reference: Dictionary.
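As a brief sketch of these inspection steps, the following code uses SASHELP.CLASS, a sample data set that ships with SAS; the choice of data set and the variables NameType and NameLength are assumptions made for illustration.

```sas
/* Display descriptor information: variables, attributes, indexes. */
proc contents data=sashelp.class;
run;

/* Print a small sample of the observations. */
proc print data=sashelp.class (obs=5);
run;

/* Query the type and length of one variable from inside a DATA step. */
data _null_;
   set sashelp.class (obs=1);
   NameType   = vtype(Name);     /* 'C' for character, 'N' for numeric */
   NameLength = vlength(Name);   /* length of the variable in bytes    */
   put NameType= NameLength=;
run;
```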
Looking at Sources of Common Problems
- variables that have the same name but that represent different data
SAS includes only one variable of a given name in the new data set. If you are merging two data sets that have variables with the same names but different data, the values from the last data set that was read are written over the values from other data sets.
To correct the error, you can rename variables before you combine the data sets by using the RENAME= data set option in the SET, UPDATE, or MERGE statement, or you can use the DATASETS procedure.
- common variables with the same data but different attributes
The way SAS handles these differences depends on which attributes are different:
- type attribute
If the type attribute is different, SAS stops processing the DATA step and issues an error message stating that the variables are incompatible.
To correct this error, you must use a DATA step to re-create the variables. The SAS statements you use depend on the nature of the variable.
- length attribute
If the length attribute is different, SAS takes the length from the first data set that contains the variable. In the following example, all data sets that are listed in the MERGE statement contain the variable Mileage. In QUARTER1, the length of the variable Mileage is four bytes; in QUARTER2, it is eight bytes; and in QUARTER3 and QUARTER4, it is six bytes. In the output data set YEARLY, the length of the variable Mileage is four bytes, which is the length derived from QUARTER1.
To override the default and set the length yourself, specify the appropriate length in a LENGTH statement that precedes the SET, MERGE, MODIFY, or UPDATE statement.
- label, format, and informat attributes
If any of these attributes are different, SAS takes the attribute from the first data set that contains the variable with that attribute. However, any label, format, or informat that you explicitly specify overrides a default. If all data sets contain explicitly specified attributes, the one specified in the first data set overrides the others. To ensure that the new output data set has the attributes you prefer, use an ATTRIB statement.
You can also use Variable Information functions such as VLABEL and VLABELX to access this information. For a short description of these functions, see the Variable Information functions in Functions and CALL Routines by Category. For complete information about these functions, see 'Functions and CALL Routines' in SAS Language Reference: Dictionary.
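The corrections described in this list can be sketched together in one small program. Only the names QUARTER1, QUARTER2, YEARLY, and Mileage come from the text above; the variables Id and Notes, the label text, the format, and all data values are hypothetical. The sketch uses two quarters rather than four to stay short.

```sas
/* Hypothetical inputs. Id is character in QUARTER1 but numeric in     */
/* QUARTER2, Mileage is 4 bytes in QUARTER1 but 8 bytes in QUARTER2,   */
/* and Notes has the same name in both but holds different data.       */
data work.quarter1;
   length Id $ 4 Mileage 4;
   input Id $ Mileage Notes $;
   datalines;
1 1200 ok
2 3400 late
;

data work.quarter2;
   input Id Mileage Notes $;
   datalines;
1 1500 ok
3 2100 new
;

/* Type attribute: re-create Id as a numeric variable in a DATA step. */
data work.quarter1_fixed;
   set work.quarter1 (rename=(Id=IdChar));
   Id = input(IdChar, 4.);
   drop IdChar;
run;

data work.yearly;
   /* Length attribute: LENGTH precedes MERGE, so Mileage is 8 bytes   */
   /* instead of the 4 bytes that QUARTER1 would otherwise supply.     */
   length Mileage 8;
   /* Label and format attributes: set the preferred ones explicitly.  */
   attrib Mileage label='Miles driven' format=comma8.;
   /* Same name, different data: rename so that neither is overwritten. */
   merge work.quarter1_fixed (rename=(Notes=NotesQ1))
         work.quarter2       (rename=(Notes=NotesQ2));
   by Id;
run;

proc contents data=work.yearly; run;
```

Mileage is left as a common variable here to mirror the example in the text, so for an Id present in both quarters the output holds the value from the last data set read.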
Ensuring Correct Order
If you use BY-group processing with the UPDATE, SET, and MERGE statements to combine data sets, ensure that the observations in the data sets are sorted in the order of the variables that are listed in the BY statement, or that the data sets have an appropriate index. If you use BY-group processing in a MODIFY statement, your data does not need to be sorted, but sorting the data improves efficiency. The BY variable or variables must be common to both data sets, and they must have the same attributes. For more information, see BY-Group Processing in the DATA Step.
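A minimal sketch of sorting before a match-merge follows. The data set names SALARY and TAXES and the BY variable EmployeeNumber echo the one-to-one example earlier in this section; the variables Salary and TaxBracket and all data values are hypothetical.

```sas
/* Hypothetical inputs that are not yet in EmployeeNumber order. */
data work.salary;
   input EmployeeNumber Salary;
   datalines;
3401 40000
1234 55000
;

data work.taxes;
   input EmployeeNumber TaxBracket;
   datalines;
1234 0.28
3401 0.18
;

/* Sort both data sets by the BY variable before merging. */
proc sort data=work.salary;
   by EmployeeNumber;
run;

proc sort data=work.taxes;
   by EmployeeNumber;
run;

data work.combined;
   merge work.salary work.taxes;
   by EmployeeNumber;
run;
```

An index on EmployeeNumber in each data set is an alternative to sorting, at the cost of building and maintaining the index.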
Testing Your Program