Set Target Tab

This view lets you choose what type of formula to search for and how to search for it.


Search relation

Edit the formula to specify the type of relationship you want to model. For example, if you want to model the variable z as a function of x and y, enter z=f(x,y). If you want to ignore y, you could enter just z=f(x). More complex expressions such as z=f(x*x+y*y), z=f(x)+f(y), and z=f(x)*g(y) are also possible. The Target Expression Examples page provides many additional examples, including the modeling of differential equations, polynomial equations, and binary classification. This flexibility in specifying the form of the target solution gives you a lot of power to search for complex relationships.

Building blocks

Addition, subtraction, modulus, floor, factorial, Gaussian, If-Then-Else—these are a few of the 46 (and counting) building blocks that Formulize will be happy to combine in a few trillion ways as it seeks out good solutions. Check the boxes next to building blocks you want included in the mix. (See the Building Blocks List.)

Which building blocks should you choose? Expert knowledge will help you here. Which of the building blocks tend to show up in your field? Which ones are found in solutions to problems related to yours? Which ones are suggested by graphs of your data? Which ones just seem like good candidates based on your intuition (expert or otherwise)?

The trade-off to keep in mind is this: Limiting the number of building blocks will speed up your search and may increase the likelihood that Formulize will find an exact solution; on the other hand, disabling too many building blocks could preclude the discovery of an exact solution if a necessary operation is disabled. So choose carefully.
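To get a feel for why fewer building blocks means a faster search, the sketch below counts how many syntactically distinct expression trees can be built from a given set of blocks. It is only an illustration: the function name and parameters are hypothetical, and Formulize does not necessarily enumerate expressions this way.

```python
def count_expressions(depth, variables, unary_blocks, binary_blocks):
    """Count syntactically distinct expression trees of at most `depth` levels.

    Illustrative assumption: every tree is either a bare variable, a unary
    block applied to a smaller tree, or a binary block applied to two
    smaller trees.
    """
    count = variables  # depth 0: a bare variable
    for _ in range(depth):
        count = variables + unary_blocks * count + binary_blocks * count * count
    return count


# Two variables, a small block set versus a larger one, same depth limit.
small = count_expressions(depth=3, variables=2, unary_blocks=1, binary_blocks=2)
large = count_expressions(depth=3, variables=2, unary_blocks=3, binary_blocks=4)
print(small, large)
```

Even at this shallow depth, the larger block set yields a far bigger space of candidate formulas, which is the cost side of the trade-off described above.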

Error metric

This drop-down box allows you to choose how potential solutions are assessed. The default setting, where absolute error is minimized, works well in most cases, but you can also choose to minimize squared error, worst-case error, logarithm error, median error, interquartile absolute error, or signed difference. Additionally, you can choose to maximize the correlation coefficient or the R-squared goodness of fit, or you can try our experimental hybrid that considers both absolute error and correlation. The Error Metrics page gives details.

You can also create an error metric of your own. Find details in this blog post: "Custom Error Metrics and Special Search Relations".
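To make the differences between some of these metrics concrete, here is a minimal sketch using common textbook definitions. The function names are illustrative, and the exact formulas Formulize uses may differ; the Error Metrics page is the authority.

```python
from statistics import mean, median


def abs_residuals(actual, predicted):
    """Absolute difference between each actual value and its prediction."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def mean_absolute_error(actual, predicted):
    return mean(abs_residuals(actual, predicted))


def mean_squared_error(actual, predicted):
    return mean([r * r for r in abs_residuals(actual, predicted)])


def worst_case_error(actual, predicted):
    return max(abs_residuals(actual, predicted))


def median_absolute_error(actual, predicted):
    return median(abs_residuals(actual, predicted))


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Three perfect predictions and one large miss.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]
print(mean_absolute_error(actual, predicted))    # 1.0
print(mean_squared_error(actual, predicted))     # 4.0
print(worst_case_error(actual, predicted))       # 4.0
print(median_absolute_error(actual, predicted))  # 0.0
```

The single outlier dominates the squared and worst-case errors but leaves the median error at zero, which is why the choice of metric changes which solutions look good.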

Row weight

You can designate one of your variables as an indicator of how much relative weight (i.e., importance) you want Formulize to give to the data in each row. For example, if the designated row-weight variable has a value of 10 in the first row and 20 in the second row, data in the second row will be given twice the weight of the data in the first row. You can also set row weight by entering an expression. For detailed instructions and usage scenarios, see the Row Weight page.
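One way to picture the effect of row weights is as a weighted average of per-row errors. The sketch below assumes the weights enter an absolute-error metric in exactly that way; this is an illustrative assumption, not a statement of how Formulize computes it, and the function name is hypothetical.

```python
def weighted_absolute_error(actual, predicted, weights):
    """Weighted mean of absolute errors: sum(w * |error|) / sum(w)."""
    total = sum(w * abs(a - p) for a, p, w in zip(actual, predicted, weights))
    return total / sum(weights)


actual = [0.0, 0.0]
predicted = [3.0, 6.0]   # row 1 is off by 3, row 2 is off by 6
weights = [10.0, 20.0]   # row 2 counts twice as much as row 1

print(weighted_absolute_error(actual, predicted, weights))        # 5.0
print(weighted_absolute_error(actual, predicted, [1.0, 1.0]))     # 4.5
```

Only the ratio between weights matters here: weights of 1 and 2 would give the same result as 10 and 20.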

Data splitting

Formulize divides your data into a training set, which it uses to generate solutions, and a validation set, which it uses to check the accuracy of those solutions. This drop-down box offers three options. With "finding a global model", training rows and validation rows are distributed randomly. With "predicting future values", training rows come first and validation rows follow. With "custom mode", you choose between these two modes of distribution (by checking or not checking "shuffle") and also specify the percentage of rows to be used in each set. Note that training and validation sets can overlap. In fact, in certain situations (for example, when the data set is small or has very little or no noise) you may want both sets to include 100% of the data.
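The splitting behavior can be sketched in a few lines. The function below is a hypothetical illustration: its name, parameters, and 50% defaults are assumptions, and Formulize's actual row selection may differ. It takes training rows from the front of the row order and validation rows from the back, so the two sets overlap whenever the fractions add up to more than 100%.

```python
import random


def split_rows(rows, train_fraction=0.5, validation_fraction=0.5,
               shuffle=True, seed=0):
    """Return (training, validation) lists drawn from `rows`.

    shuffle=True  mimics a random distribution of rows ("global model").
    shuffle=False keeps the original order, so training rows come first
                  and validation rows follow ("predicting future values").
    """
    order = list(rows)
    if shuffle:
        random.Random(seed).shuffle(order)
    n = len(order)
    n_train = round(n * train_fraction)
    n_valid = round(n * validation_fraction)
    training = order[:n_train]        # taken from the front
    validation = order[n - n_valid:]  # taken from the back
    return training, validation


rows = list(range(10))

# Ordered split: first half trains, second half validates.
train, valid = split_rows(rows, 0.5, 0.5, shuffle=False)
print(train, valid)

# Full overlap: both sets contain every row.
train_all, valid_all = split_rows(rows, 1.0, 1.0, shuffle=False)
print(train_all == valid_all == rows)
```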

Base and prior solutions

You can start Formulize off on the right track by entering equations that give partial solutions or that express relationships you believe will play some role in an eventual solution. The Prior Solutions page gives details.