Data
In short
The raw information used to train models. No data, no AI. It’s the fuel behind everything.
Data is to a model what textbooks are to a student. The better the textbooks, the better the student learns. Give them wrong examples and they’ll be confused.
Training, whether supervised or unsupervised, always needs data, and preferably good data (see Data Quality). Before training, the data should be split into training, validation, and test sets so the model can be checked on examples it has not seen (see Data Splitting).
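As a rough illustration of the train / validation / test split, here is a minimal sketch using only the Python standard library. The function name `split_dataset`, the 80/10/10 fractions, and the fixed seed are assumptions made for this example, not a recommendation from this note; real projects often rely on a library helper instead.

```python
import random


def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists.

    The fractions and seed are illustrative defaults (assumptions).
    Whatever is left after the train and validation cuts becomes the test set.
    """
    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Toy dataset: the integers 0..99 stand in for real examples.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before cutting matters when the data is stored in some order (by date or by label, say); otherwise one of the three sets could end up unrepresentative.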
Anything that can be represented in a numerical format can be used as data for a model. In practice that covers almost everything: text, images, and audio are all stored as numbers, because numbers are the only thing computers work with (see Numerical Representation).
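To make the idea of numerical representation concrete, the sketch below turns a short string and a few category labels into plain numbers. The sample values (`"data"`, `"cat"`, `"dog"`) and the choice of UTF-8 bytes and one-hot vectors are assumptions picked for illustration; real models typically use richer encodings.

```python
# Text as numbers: every character is stored as one or more bytes.
text = "data"
text_as_numbers = list(text.encode("utf-8"))
print(text_as_numbers)  # [100, 97, 116, 97]

# Categories as numbers: give each distinct label an index,
# then mark that index with a 1 in an otherwise all-zero vector (one-hot).
labels = ["cat", "dog", "cat"]
vocabulary = sorted(set(labels))  # ['cat', 'dog']
index_of = {label: i for i, label in enumerate(vocabulary)}

one_hot = [
    [1 if index_of[label] == i else 0 for i in range(len(vocabulary))]
    for label in labels
]
print(one_hot)  # [[1, 0], [0, 1], [1, 0]]
```

The same principle extends to images (grids of pixel intensities) and audio (sequences of amplitude samples).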
Because data gets hard to manage as it grows, there's Data Engineering to clean and prepare it, Data Analysis to draw conclusions from it, and Data Science to combine it with models for actual business value.
Related
- Data Quality - garbage in, garbage out
- Data Splitting - train / validation / test
- Data Engineering - managing and preparing data
- Training - data powers the training process