Descriptive Statistics

The Descriptive Statistics command allows you to compute one or more of the following statistics for values in a numeric field or fields in the current database:

Count
Sum
Mean
Standard Deviation
Geometric Mean
Minimum
1st Quartile
Median
3rd Quartile
Maximum

Select the desired statistics and the fields to which the statistics will be applied, for instance, the Mean of the Salary field. Click the Run button to see the results; or use some of the other options to fine-tune the statistics even further (such as grouping them into age groups).

Other options include:

Grouping Fields

If you wish to group results (such as by status or by age and service), click the Grouping Fields button. The first dialog box displayed lets you select the fields used to define the groups. When coded fields (such as Sex or Status) are used for grouping, each distinct value defines a group. If you select a field (such as Salary or Date of Termination) without discrete values, use break point values to transform the field values into intervals, such as less than 100, 100 to 150, 150 to 200, and 200 and over. To enter break points, click the Breakpoint button and enter a list of numbers or dates separated by spaces.

For example, if you enter the break points

100 150 200

the field (call it “x”) will be divided into the following intervals:

x < 100

100 <= x < 150

150 <= x < 200

200 <= x .

The break point values specify the inclusive lower bound for each interval. A shorthand notation is available for entering break points. The following expression is equivalent to “100 150 200”:

From 100 to 200, step 50

Break points are optional; if you omit them, the raw numeric or date values will be counted. This might make sense, for example, if there were only a few different salaries or termination dates.

Table Layout

Click the Table Layout button to arrange the way output will be displayed. When this button is clicked, a dialog box displays several spreadsheets, the first of which controls table arrangement.

Following is a sample spreadsheet:

Grouping Field	Dimension	Totals?
<Statistics>	Rows2
<Data Fields>	Table
Age	Rows	Yes
Service	Columns	Yes

The grouping field column is for reference and cannot be changed, but the other columns can be modified as follows:

Dimension: This column controls where each Statistic, Data Field, and Grouping Field (Age and Service in the example above) is placed. In the example above, one table will be displayed for each Data Field (for example, Salary). Within each table, Age groups will be displayed in Rows and Service groups in Columns. Within each Age group, Statistics (such as Count and Mean) will be displayed in Rows2.

Note: This is the method used to create an age/service scatter matrix.

Totals?: This column lets you specify whether statistics should be generated in total or just for groups. Fields with “º” cannot be changed; totals do not apply to Statistics or Data Fields.

Two additional spreadsheets allow you to specify the sequence in which Statistics and Data Fields are displayed. The spreadsheet for reordering Statistics is illustrated below:

Order	Statistic
1	Count
2	Mean

To change the order, move the cursor to the desired item and enter a new sequence number. For example, if you wanted the Statistic Mean to be listed first, you would move to the second row under Seq and enter a 1. The row for Mean will move the top of the list. This process can be repeated until the items are in the desired order.

Historical Data

Click the Hist. Data button to include fields from other databases, such as prior year data, as if they were fields in the current database. These fields can be used as part of a selection expression (e.g., "salary / py.salary > 1.05"), as grouping fields, or as data fields to count, sum, etc. The syntax is “prefix.field”, as in “py.salary” where “py” is the prefix associated with a database and “salary” is a field in that database. To use, check Display fields from historical databases and specify:

Any number of databases along with prefixes
Key field(s) to match records between the current database and other databases. If a key field is of character type, check Match case in keys if upper and lower case letters should be considered distinct. Alt key lets you optionally specify a secondary key field (or fields). If a matching record cannot be found based on the primary key field, a secondary check is performed using the alternate key. This can be helpful in situations where employee ID numbers are corrected from one year to the next. For example, suppose the primary key field is EEID and the alternate key field is OLDEEID. If no match is found in EEID, ProVal will look for a match between the alternate key field (OLDEEID) in the current year data and the primary key field (EEID) in the prior year data.

Selection Expression

If you want to list only a subset of the records in the database, enter a Selection Expression in the appropriate field of the main Print Data dialog box. To recall an expression you previously saved in the library, click the Selection Library button. Note that some of the expressions in the library may not be usable because they refer to fields that are not defined in the current database. An error message will be displayed if you attempt to recall such an expression.

Writing Output to a File

When you are viewing the output, the File button can be used to write the output to an ASCII file. Clicking this button opens a dialog box that lets you enter the name of the file and specify a format for the output. In the file name field, enter a full DOS-style drive, path, name, and extension for the output file, as in C:\DATA\FILE1.TXT. The Clip button at the bottom of the dialog box can be used to insert the name of the clipboard file into the file name field. Under Format, the following three options are available:

1. As printed, with line drawing characters

2. As printed, but using only ASCII symbols

3. Comma-delimited values

The first two choices produce output that is similar to the listing displayed on the screen. The first choice writes to the file the same IBM PC line-drawing characters that are used to draw boxes and lines on the screen. The second choice is similar to the first but uses only standard ASCII symbols (| - and +) instead of line-drawing characters. This choice should be used if your printer or text editor will not accept line-drawing characters. The third choice is quite different; it is used when you will be importing the data into another software package that reads files in comma-delimited format. If you choose the third choice, a comma will be placed between each value, and character values will be enclosed in double-quotes. Each line of the output will look something like this:

"35 <= x < 40 Count","",6.00,2.00,"",2.00,"","","","",10.00

Name may be any descriptive phrase, including spaces, under which to save your Descriptive Statistics settings.