Row Weight

Formulize allows you to give more weight (i.e., importance) to some rows of data than to others. This comes into play in two different contexts: first, when smoothing your data (see the Prepare Data page); second, when running a formula search (see the Set Target page). This page will discuss row weighting within the context of a formula search, but the same principles and techniques can be applied when smoothing data.

Row weight can be specified either by using a row-weight variable or by using a row-weight expression.

Using a row-weight variable

Once a row-weight variable has been established, the rest of the data in each row will be weighted in proportion to the value of the row-weight variable in that row. To establish a row-weight variable, first enter the name and values of the variable in a column of the spreadsheet, then select that variable name in the "Row Weight" drop-down box on the Set Target tab.

The following are some common usage scenarios:

  • Suppose that for each data point you have a confidence value—perhaps you determined it while collecting the data or computed it in some other program. Create a variable (i.e., a column) containing those values, designate it as the row-weight variable, and Formulize will weight the data accordingly.
  • Suppose you want to give extra weight to a few important data points—the zero points of some variable, for example. You could fill a column with ones (by entering "=1") then manually change the value in the important rows to 10, or 100, or 451, or whatever value represents the relative weight you want to bestow on that row.
  • Suppose you want to balance your data by giving more weight to rare events than to common ones. More specifically, suppose you want to model credit-card fraud, and 99.99% of the data points are legitimate transactions while 0.01% are fraudulent. You could create a variable whose value is 1 in rows representing legitimate transactions and 9999 (i.e., 99.99% / 0.01%) in rows representing fraud, thereby creating equal pressure to model both legitimate and fraudulent cases.

Using a row-weight expression

Some row-weighting schemes can be more easily achieved with a row-weight expression than with a row-weight variable. It works like this: for each row, the row-weight expression is evaluated using the values in that row, and the row is weighted with the result. You can enter your own expression, but before we get to that let's look at the built-in options.

1/occurrences(yourVariable)

You'll find expressions of this form, one for each of your variables, in the "Row Weight" drop-down box. These expressions provide a quick way to balance data. To illustrate, let's imagine a toy data set containing just three values of one variable.

var x
1 99
2 99
3 86

The value returned by occurrences(x) is the number of times a given row's value of x occurs in the data set, so in this case it would return 2 in the first row, 2 in the second row, and 1 in the third row. Selecting 1/occurrences(x) as your row weight would therefore give the first row a weight of 1/2, the second row a weight of 1/2, and the third row a weight of 1.

Returning to the credit-card-fraud example, we could simplify things by creating variable z with a value of 0 (or whatever) in rows representing legitimate transactions and 1 (or whatever) in rows representing fraudulent ones. Selecting a row weight of 1/occurrences(z) would then create equal pressure to model legitimate and fraudulent transactions, and no one would have to do any nasty counting or calculation to determine row weights. And if new data were added, the weights would be automatically adjusted to maintain the balance.

Other expressions

Other expressions can be entered into the editable field of the "Row Weight" drop-down box. Any function on the Building Blocks List can be used as part of a row-weight expression. There's also a special variable available, <row>, which takes on the value of the row number. Here are some examples, assuming a data set containing variables x and y:

  • abs(x) gives row weights in proportion to the absolute value of x.
  • 1/abs(x-y) gives row weights in inverse proportion to the difference between x and y.
  • 1/<row> gives row 1 a weight of 1, row 2 a weight of 1/2, row 3 a weight of 1/3…
  • 0.5 + 0.5*(<row> <= 100) gives rows 1 through 100 a weight of 1 and remaining rows a weight of 0.5. (Note: <= returns 1 if satisfied, 0 if unsatisfied.)