Subset Selection

The Subset Selection command allows you to select a subset of the current database. The subset can be used to “sample” a large database, a process that is useful in various statistical procedures. You can select n records at random or select every nth record starting with the first. The dialog box displayed by this command lets you choose the selection method and specify the value for n.

Randomly select n records sets the variable #SUBSET to 1 for n randomly chosen records in the current database.

Select every nth record sets the variable #SUBSET to 1 for every nth record in the current database, starting with the first one. Records are selected in order of RecID.

n specifies how many records to select (if records are selected randomly) or the selection interval (if every nth record is selected).
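The two selection methods can be illustrated with a short Python sketch. This is not ProVal's implementation. It only mimics the documented behavior under stated assumptions: the database is represented as a plain list of integer RecIDs, the result is a hypothetical mapping from RecID to a 0/1 flag standing in for #SUBSET, and Python's own random number generator replaces whatever generator ProVal uses. The function names are hypothetical.

```python
import random


def subset_random(rec_ids, n, seed=None):
    """Flag n distinct, randomly chosen records with 1 and all others with 0.

    Hypothetical stand-in for "Randomly select n records".
    Raises ValueError if n exceeds the number of records.
    """
    rng = random.Random(seed)          # seedable so a sample can be reproduced
    chosen = set(rng.sample(rec_ids, n))  # n records, no repeats
    return {rid: 1 if rid in chosen else 0 for rid in rec_ids}


def subset_every_nth(rec_ids, n):
    """Flag every nth record in RecID order, starting with the first.

    Hypothetical stand-in for "Select every nth record".
    """
    ordered = sorted(rec_ids)  # selection follows RecID order
    # Position 0 is the first record; then positions n, 2n, ... are flagged.
    return {rid: 1 if i % n == 0 else 0 for i, rid in enumerate(ordered)}


rec_ids = list(range(1, 11))  # RecIDs 1 through 10
every_third = subset_every_nth(rec_ids, 3)
print([rid for rid, flag in every_third.items() if flag == 1])  # [1, 4, 7, 10]
```

With n = 3 and ten records, the every-nth method flags RecIDs 1, 4, 7, and 10, while the random method always flags exactly n records but a different set on each run unless a seed is fixed.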

The output from this command is a variable named #SUBSET, which has ones marking the selected records and zeros marking unselected records. You can use #SUBSET in a selection expression to work with the subset of the database. For example, to display the selected records, use the Print Data command and enter “#SUBSET” in the selection expression field. The #SUBSET variable can be used in a more complex selection expression, if you wish. For example, the expression:

#SUBSET #AND (Sex = 1)

selects the male employees (Sex = 1) who are in the subset.
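To make the logic of that selection expression concrete, here is a minimal Python sketch that applies the same test to a few made-up records. The record layout (dictionaries with RecID, Sex, and #SUBSET keys), the sample data, and the `select` helper are all hypothetical; in ProVal you would simply type the expression into the selection expression field.

```python
def select(records, predicate):
    """Return the records for which predicate(record) is true,
    playing the role of a selection expression (hypothetical helper)."""
    return [r for r in records if predicate(r)]


# Made-up records: #SUBSET holds 1 for selected records and 0 otherwise.
records = [
    {"RecID": 1, "Sex": 1, "#SUBSET": 1},
    {"RecID": 2, "Sex": 2, "#SUBSET": 1},
    {"RecID": 3, "Sex": 1, "#SUBSET": 0},
    {"RecID": 4, "Sex": 1, "#SUBSET": 1},
]

# Equivalent of the selection expression: #SUBSET #AND (Sex = 1)
males_in_subset = select(records, lambda r: r["#SUBSET"] == 1 and r["Sex"] == 1)
print([r["RecID"] for r in males_in_subset])  # [1, 4]
```

Record 2 is in the subset but fails the Sex test, and record 3 is male but was not selected, so only records 1 and 4 satisfy both conditions.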

The value established for #SUBSET remains in effect until you use the Subset Selection command again or exit ProVal.

To preserve the value of #SUBSET for use in a later ProVal session, use the Database | Edit Data | Define Field by Expression command. Define a variable by specifying #SUBSET as the expression. The resulting variable has the same values as #SUBSET and is saved in the database.