Symbolic regression

From Wiki @ Karl Jones dot com
Revision as of 08:29, 9 September 2016 by Karl Jones (Talk | contribs) (Created page with "'''Symbolic regression''' is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in te...")

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Symbolic regression is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.

Description

No particular model is provided as a starting point to the algorithm. Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. (Usually, a subset of these primitives will be specified by the person operating it, but that's not a requirement of the technique.) New equations are then formed by recombining previous equations, using genetic programming.

By not requiring a specific model to be specified, symbolic regression isn't affected by human bias, or unknown gaps in domain knowledge.

It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective.

The fitness function that drives the evolution of the models takes into account not only error metrics (to ensure the models accurately predict the data), but also special complexity measures, thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. This facilitates reasoning and favors the odds of getting insights about the data-generating system.

See also

External links