July 5, 2019
An AI and Machine Learning Glossary
Artificial intelligence presents fantastic opportunities for many industries, and sophilabs is excited to be a part of this growing field in technology. For the average layperson, though, it can be a little tricky to keep up with the terminology. We've put together this short glossary to define some of the most commonly used terms in the field.
Accuracy
Metric that indicates the fraction of correct predictions out of the total number of predictions a machine learning model made. A highly accurate model produces few false positives and few false negatives. 1
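As a rough illustration, the fraction of correct predictions can be computed in a few lines of Python. This is only a sketch: the `accuracy` function and the sample labels are hypothetical and do not come from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = accuracy(y_true, y_pred)
print(score)  # 6 of the 8 predictions are correct
```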
Algorithm
In machine learning, a mathematical procedure that allows a computer to make predictions or decisions based on the data it receives, rather than only the predictions or decisions it was specifically programmed to make. 2
Artificial Intelligence (AI)
Technology that gives computers properties of human intelligence, such as the ability to reason, solve problems, make decisions, and learn from past experience. Some applications of artificial intelligence include robotics, autonomous vehicles, pattern recognition, and machine learning. 3
Big Data
Very large sets of data, usually from multiple sources, that may present data management challenges due to the data's volume, inconsistent quality, the variety of types of data, and the high velocity at which the data is received. In machine learning, big data can be used to train more accurate models that can make better predictions and sounder decisions. 4
Chatbot
Also virtual assistant. Software that can have a conversation with a human user via text or voice by understanding the user's input, identifying the user's intent, and responding accordingly. Simple chatbots understand the user's input based on pre-programmed keywords, whereas smart chatbots use artificial intelligence to understand and adapt to the user's requests. 5
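To make the "simple chatbot" case concrete, here is a minimal keyword-matching sketch in Python. The keywords, replies, and the `reply` function are all made up for illustration; a smart chatbot would replace the lookup table with a trained model.

```python
import re

# Hypothetical keyword-to-reply table; a real bot would have many more entries.
RESPONSES = {
    "hours": "We are open from 9 am to 5 pm.",
    "price": "Our plans start at $10 per month.",
    "hello": "Hello! How can I help you?",
}
FALLBACK = "Sorry, I did not understand that."


def reply(message):
    """Return the reply for the first known keyword found in the message."""
    # Lowercase the message and keep only alphabetic words,
    # so punctuation such as "hours?" does not block a match.
    words = re.findall(r"[a-z]+", message.lower())
    for keyword, answer in RESPONSES.items():
        if keyword in words:
            return answer
    return FALLBACK


print(reply("What are your hours?"))
print(reply("Tell me a joke"))
```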
Computer Vision
A subfield of artificial intelligence and machine learning that is concerned with helping computers recognize images and understand their content the way a human would. Some areas of computer vision are image classification, image captioning, and object localization, detection, and segmentation. 6
Data Analysis
A field that aims to summarize, describe, and visualize data in order to understand it better and find ways to approach a problem. 7
Data Visualization
Subfield in data analysis that involves creating graphs, charts, and other visuals in order to get a better understanding of data. 8
Dataset
A collection of separate but related sets of information that a computer manipulates as a single unit. 9
Deep Learning
Deep learning refers to a machine learning model built from a neural network with multiple layers. These multiple layers allow the model to deal with high levels of complexity and abstraction. For example, a trained deep learning model is able to discern and classify unlabelled and unstructured data, such as a collection of millions of photos, videos, texts, and audio recordings. 10
Face Detection
Type of computer vision focused on the task of finding human faces in a photo.
Face Recognition
Type of computer vision focused not only on detecting a human face, but also on identifying whose face it is.
False Negative
A false negative occurs when a machine learning model fails to predict or identify something it is supposed to find. For example, in a machine learning model that is meant to detect cancer, a true negative result would occur when the model states there is no cancer, and the patient is in fact cancer-free. A false negative, however, would state that there is no cancer when cancer is actually present. False negatives and false positives can be used to measure how well a machine learning algorithm performs.
False Positive
A false positive occurs when a machine learning model predicts something that doesn't happen or identifies something that is not actually there. In a cancer-detecting machine learning model, for example, a true positive result states that a patient has cancer when the patient does indeed have cancer. A false positive states that a patient has cancer when they are in fact cancer-free. False positives, along with false negatives, can be used to measure a machine learning algorithm's performance.
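The four outcomes in the cancer example (true positive, false positive, true negative, false negative) can be tallied with a short Python sketch. The `confusion_counts` function and the sample data are hypothetical; here 1 stands for "cancer present" and 0 for "cancer-free".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # model says cancer, patient has cancer
        elif pred == positive:
            counts["fp"] += 1  # model says cancer, patient is cancer-free
        elif truth == positive:
            counts["fn"] += 1  # model says no cancer, patient has cancer
        else:
            counts["tn"] += 1  # model says no cancer, patient is cancer-free
    return counts


# Hypothetical diagnoses (y_true) and model outputs (y_pred)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```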
Image Recognition
Subfield of computer vision that enables a computer to identify people, objects, buildings, or other items in a photo.
Machine Learning (ML)
A field in artificial intelligence that uses algorithms to enable a computer to make decisions or predictions based on the data it receives, thus allowing a computer to "learn" from data rather than follow programmed instructions.
Named-Entity Recognition (NER)
A data extraction task that finds and categorizes named entities in a text. Named entities may include names of individuals, organizations, locations, and products, or numerical expressions of time and monetary value. NER is an application of natural language processing (NLP). 11
Natural Language Processing (NLP)
Field in artificial intelligence that enables computers to read, understand, and manipulate human language. NLP allows computers to understand both written text and speech. Applications of NLP include email filtering systems, speech-to-text conversion, voice commands, automatic translations, named-entity recognition (NER), and sentiment analysis. 12
Neural Network
Set of algorithms in a machine learning model, organized as layers of connected neurons (nodes). Some capabilities of neural networks include classifying and labeling data (e.g., detecting and recognizing faces and voices); finding similarities and anomalies (e.g., comparing documents or detecting fraud); and making predictions (e.g., about human health, consumer trends, or a variety of other topics). 13
Neuron
Also node. The basic unit in a neural network. A neuron receives an input, does some computation with it, produces an output, and determines whether the signal should travel further within the network. 14
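One common way to picture a single neuron is as a weighted sum of its inputs passed through an activation function. The Python sketch below assumes a sigmoid activation, which is only one of several possible choices; the `neuron` function and its sample values are hypothetical.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid maps any number to a value between 0 and 1; values near 1
    # mean the signal is passed on strongly, values near 0 mean it is damped.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs and weights
output = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)
```

In a full network, the output of one neuron becomes an input to neurons in the next layer.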
Optimization
Also mathematical optimization or mathematical programming. Branch of applied mathematics in which the goal is to select the best option from a set of alternatives. Optimization is essential to many machine learning algorithms, which aim to approximate the optimal solution without calculating it exactly. 15
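Gradient descent is one widely used way of approximating an optimum step by step. The Python sketch below is illustrative only: it minimizes the made-up function f(x) = (x - 3)^2, whose best value is at x = 3, and the `gradient_descent` helper and its parameters are hypothetical.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move x a small step against the gradient."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Gradient of f(x) = (x - 3)**2 is 2 * (x - 3)
best_x = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(best_x)  # very close to 3, but found by approximation
```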
Overfitting
In machine learning, overfitting occurs when the program models the training data "too well," meaning that anomalies in the training data are learned and acquired as concepts. This has a negative effect on the machine learning program's accuracy on new data, and as a result it may identify or classify that data incorrectly. 16
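A toy Python sketch can show the effect. The data, the `memorizer` model (which copies the label of the closest training point), and the `threshold_rule` model are all hypothetical, and the threshold is hard-coded rather than learned to keep the example short.

```python
# Hypothetical training set: the true rule is "label 1 when x >= 5",
# but the point (3, 1) is an anomaly carrying the wrong label.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
# New data the models have not seen before
test = [(2.9, 0), (3.2, 0), (5.5, 1), (8.5, 1)]


def memorizer(x):
    """Overfit model: copy the label of the closest training point."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def threshold_rule(x):
    """Simpler model: a single cut-off that ignores the anomaly."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Fraction of points in data that the model labels correctly."""
    return sum(1 for x, label in data if model(x) == label) / len(data)


print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold:", accuracy(threshold_rule, train), accuracy(threshold_rule, test))
```

The memorizer is perfect on the training data because it has learned the anomaly too, and that is exactly what hurts it on the new points near x = 3.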
Precision
Metric that considers true positive predictions over the total number of positive predictions (true and false) that a machine learning model makes. A very precise machine learning model has a low number of false positives. 17
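In code, precision is the number of true positives divided by all positive predictions. The Python sketch below uses a hypothetical `precision` function and sample labels; returning 0.0 when the model makes no positive predictions is an assumed convention, not a universal rule.

```python
def precision(y_true, y_pred, positive=1):
    """True positives divided by all positive predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumed convention when nothing was predicted positive
    return tp / (tp + fp)


# Hypothetical labels: the model makes three positive predictions, two correct
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]

print(precision(y_true, y_pred))
```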
Python
Programming language that is widely used in the field of machine learning. To read more about why Python is a great language for machine learning, check out our recent blog post.
Reinforcement Learning
In machine learning, a way to train an algorithm through experience by using rewards (when it performs well) and punishments (when it performs poorly). The algorithm learns by trying to maximize its rewards and minimize its punishments. 18
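One of the simplest settings for this idea is the "multi-armed bandit," where an agent repeatedly picks one of several actions and learns which pays off best. The Python sketch below uses an epsilon-greedy strategy, which is just one possible approach; the `train_bandit` function, the reward probabilities, and the +1/-1 rewards are hypothetical.

```python
import random


def train_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Estimate the average reward of each action by trial and error."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # running average reward per action
    counts = [0] * len(reward_probs)    # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best average so far
            action = max(range(len(values)), key=values.__getitem__)
        # Reward (+1) with the action's probability, punishment (-1) otherwise
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values


# Hypothetical environment: action 1 is rewarded far more often than action 0
values = train_bandit([0.2, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print(values, best_action)
```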
Sentiment Analysis
Also opinion mining. A natural language processing (NLP) problem that aims to understand attitudes and opinions expressed in written or spoken language and turn them into structured data. Sentiment analysis has practical applications in customer service, marketing, and public relations, and is often used to analyze product reviews and social media content. 19
Speech Recognition
Ability of a computer to understand human speech. Applications of speech recognition technology include voice commands and dictation via speech-to-text conversion. Not to be confused with voice recognition (see definition below).
Supervised Learning
The most common process by which a machine learning algorithm is trained. During this process, the algorithm is given training data, and its predictions are corrected by an answer key. In this way, the algorithm is able to learn and then adjust and improve its performance. 20
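The "predict, check against the answer key, adjust" loop can be sketched with a perceptron, one of the oldest supervised learning algorithms. This Python example is a minimal illustration, not a production method: the `train_perceptron` and `predict` functions are hypothetical, and the training data teaches the logical OR of two inputs.

```python
def predict(weights, bias, inputs):
    """Output 1 when the weighted sum is positive, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=10, learning_rate=1.0):
    """Adjust weights whenever a prediction disagrees with the answer key."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)  # 0 when correct
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data with its answer key: the logical OR of two inputs
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_perceptron(examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in examples]
print(predictions)
```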
Training Data
The data a computer receives that allows it to build a machine learning model. For example, to teach a computer to recognize a handwritten letter A, it has to be fed many examples of what a handwritten A looks like, as well as examples of handwritten letters that are not A.
Underfitting
Occurs when a machine learning model is too simple to capture the patterns in the training data, so it performs poorly even on that data. The solution is usually to try different algorithms. 21
Unsupervised Learning
Training process for a machine learning algorithm in which the data has not been labelled beforehand, so there is no answer key. The goal of unsupervised learning is not to produce correct answers as in supervised learning, but rather to discover ways to understand the data by finding associations or putting it into groups. 22
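Clustering is a typical example of "putting data into groups" without an answer key. The Python sketch below runs a bare-bones k-means on one-dimensional data; the `kmeans_1d` function and the sample points are hypothetical, and starting the two centroids at the smallest and largest values is a simplification chosen to keep the example deterministic.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group numbers around the nearest centroid, then recenter, repeatedly."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its closest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled measurements that happen to form two groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

centroids, clusters = kmeans_1d(points, centroids=[min(points), max(points)])
print(centroids, clusters)
```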
Voice Recognition
The ability of a computer to identify the individual speaker based on the pitch of their voice and patterns in their speech. Voice recognition can be applied to a variety of areas, including security and authentication systems and criminal investigations. 23