What is AI?

Supervised learning

  • email --> is_spam (spam filtering)

  • audio --> text transcription (speech recognition)

  • english --> french (machine translation)

  • visual inspection --> defect detection (quality control)

  • sequence of words --> next word prediction (chatbot)

LLMs

  • Repeatedly predict the next word.

  • Each step takes an input (the text so far) and produces an output (the next word).

  • my favorite drink --> is

  • my favorite drink is --> coffee

  • my favorite drink is coffee, and I like it because it is ... (continuing one word at a time)

  • Output is sanitised so that it is not offensive or harmful.

  • Neural networks have improved the quality of LLMs; as you train them on more data, they get better.
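
A minimal sketch of the "repeatedly predict the next word" loop. The lookup table NEXT_WORD and the function generate are hypothetical stand-ins for illustration; a real LLM replaces the table with a neural network trained on large amounts of text.

```python
# Toy "language model": a hypothetical table from the previous word to the
# most likely next word. A real LLM learns this mapping from data.
NEXT_WORD = {
    "my": "favorite",
    "favorite": "drink",
    "drink": "is",
    "is": "coffee",
}


def generate(prompt: str, max_new_words: int = 5) -> str:
    """Repeatedly append the predicted next word to the text so far."""
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        next_word = NEXT_WORD.get(words[-1])
        if next_word is None:  # no prediction available: stop generating
            break
        words.append(next_word)
    return " ".join(words)


print(generate("my favorite drink"))  # my favorite drink is coffee
```

Each pass through the loop is one input --> output step: the text so far goes in, one more word comes out, and the extended text becomes the next input.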

What is data? (dataset)

  • Data is unique to your business.

  • What is input and what is output?

  • size_of_house + no_of_bedrooms + location --> price_of_house
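
A minimal sketch of learning an input --> output mapping from data, assuming a made-up dataset and using only size_of_house as the input to keep it short. The numbers and the names fit_line and predict_price are illustrative assumptions, not part of the notes.

```python
# Hypothetical examples: (size_of_house in square feet, price in $1000s).
examples = [(1000, 200), (1500, 290), (2000, 410), (2500, 500)]


def fit_line(data):
    """Least-squares fit of price = slope * size + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(examples)


def predict_price(size):
    """Output (B) for a new input (A) the model has not seen before."""
    return slope * size + intercept


print(round(predict_price(1800), 1))  # 360.2
```

A real model would also use no_of_bedrooms and location; the idea of fitting parameters to examples and then predicting for unseen inputs stays the same.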

How do you acquire data?

  • Manual labeling

  • Observing user behaviors: user_id, time, price, has_purchased?

  • Observing machine behaviors: machine_id, time, temperature, pressure, has_failed? From this data you can predict whether a machine is going to fail.

  • Downloading it from websites or getting it through partnerships, e.g. Kaggle, UCI Machine Learning Repository, etc. (keep licensing and copyright in mind)

  • Not all data is created equal; some data is more valuable than other data.

  • More data doesn't always mean better results; the quality of data often matters more than the quantity.

  • Data can have missing values, outliers, and noise. (We need to clean up data before processing it.)

  • Some types of data, like images, audio, and text, are unstructured, which means they don't have a predefined format.

  • Techniques for dealing with unstructured data are different from those for structured data.
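
A small sketch of the clean-up step for structured data, assuming made-up rows of (size_of_house, price) where None marks a missing value. The outlier rule (more than five times the median price) is an arbitrary choice for illustration, not a recommended threshold.

```python
from statistics import median

# Hypothetical raw rows: (size_of_house, price). None marks a missing value.
raw_rows = [
    (1000, 200), (1500, None), (2000, 410), (None, 300),
    (1800, 360), (1200, 9999), (2500, 500),
]


def clean(rows, max_ratio=5.0):
    """Drop rows with missing values, then drop price outliers.

    A price counts as an outlier here if it is more than `max_ratio`
    times the median price of the complete rows.
    """
    complete = [r for r in rows if None not in r]
    typical_price = median(price for _, price in complete)
    return [r for r in complete if r[1] <= max_ratio * typical_price]


print(clean(raw_rows))
```

In this sample the two rows with missing values and the row priced at 9999 are removed before any model sees the data.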

The terminology of AI

ML vs data science

  • ML: algorithms that use datasets to learn how to produce an output from an input, e.g. size_of_house + no_of_bedrooms + location --> price_of_house. It is the field of study that gives computers the ability to learn without being explicitly programmed. The result is a software model that can be run to generate outputs for a given input.

  • Data science: a team analyses datasets to find insights, e.g. "Did you know newly renovated houses sell for 20% more than non-renovated houses?" It is the science of extracting knowledge and insights from data. The result is often a slide deck or pitch deck that shows the insights to investors.

Deep learning

  • Artificial neural networks.

  • Nodes and connections, loosely inspired by the human brain.

  • The details of how it works are almost completely unrelated to how the actual human brain works.

  • Takes input and outputs a prediction.

Nesting: machine learning contains deep learning plus other tools; AI contains machine learning plus other tools like knowledge graphs, rule-based systems, etc.

What makes an AI company?

What makes a good internet company?

Shopping mall + Website != Internet company

  • A/B testing

  • Short iteration cycles

  • Data-driven decision making (engineers and product managers make decisions based on data, not executive opinions)

What makes a good AI company?

  • Strategic data acquisition

  • Unified data platform

  • Using AI to automate repetitive tasks

  • New roles and responsibilities (ML engineers, data scientists, etc.)

What ML can and cannot do?

Input (A) --> Output (B) (Application)

  • email --> spam? (0/1) (spam filtering)

  • audio --> text transcripts (speech recognition)

  • English --> Chinese (machine translation)

  • ad, user info --> click? (0/1) (online advertising)

  • image, radar info --> position of other cars (self-driving car)

  • image of phone --> defect? (0/1) (visual inspection)

  • sequence of words --> the next word (chatbot)

Imperfect rule of thumb: if a human can do it with less than one second of thought, ML can probably do it too.

Feasible

  • Learning simple concepts

  • Having lots of data

E.g.:

  • A self-driving car can estimate where the other cars it has seen are, based on their previous positions and what is in front of them.

Not feasible

  • Learning complex concepts

  • Having little sample data

E.g.:

  • A model trained on one type of data may not work on other types. (E.g., a model trained on lateral chest X-rays to detect pneumonia may not work on frontal chest X-rays, or on X-rays that are not aligned properly.)

  • Interpreting human gestures, like a road worker signalling a car to stop, is not feasible for AI to learn, as it requires a complex understanding of human behavior and context, and a large amount of data. (Even we struggle to understand it sometimes.)

Non-technical explanation of deep learning

```mermaid
flowchart LR
    price[Price] --> n1((N1))
    shipping_cost[Shipping Cost] --> n1
    n1 --> |affordability| d1((Demand))

    marketing[Marketing] --> n2((N2))
    material[Material] --> n3((N3))

    n2 --> | awareness | d1
    n3 --> | perceived quality | d1
    marketing --> n3
    price --> n3
```

  • The above neural network is a simple example of how deep learning works. It takes multiple inputs (price, shipping cost, marketing, material) and processes them through nodes (N1, N2, N3) to produce an output (demand).

  • It figures out the relationships on its own.

  • Feed it lots of examples of (price, shipping cost, marketing, material, demand) and training does the rest; you don't have to define the intermediate concepts (affordability, awareness, perceived quality) yourself.

  • E.g., a face detection model figures out the relationships between pixels, edges, and shapes to identify parts of the face (eyes, nose, mouth, etc.), and then combines them to identify the face as a whole.
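
A rough sketch of a forward pass through the demand network described above, assuming all inputs are scaled to the range 0 to 1. The weights and biases are made-up placeholders; in practice, training would learn them from examples. The helper names neuron and predict_demand are assumptions for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One node: weighted sum of its inputs passed through a sigmoid."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def predict_demand(price, shipping_cost, marketing, material):
    # Placeholder weights; a trained network would have learned values.
    affordability = neuron([price, shipping_cost], [-1.0, -1.0], 1.0)  # N1
    awareness = neuron([marketing], [2.0], -1.0)  # N2
    perceived_quality = neuron(
        [material, marketing, price], [1.5, 0.5, 0.5], -1.0
    )  # N3
    # Output node combines the three hidden nodes into a demand score.
    return neuron(
        [affordability, awareness, perceived_quality], [1.0, 1.0, 1.0], -1.5
    )


print(predict_demand(price=0.2, shipping_cost=0.1, marketing=0.5, material=0.5))
```

The labels affordability, awareness, and perceived quality are only a reading of what the hidden nodes might represent; nothing in the code forces the nodes to learn those concepts.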

Starting point of an AI project

What is the workflow of a machine learning project?

Key steps of a machine learning project

Alexa

  1. Collect data of people saying (Alexa or other trigger words)

  2. Train the model to recognize the trigger words (iterate many times until it's good enough)

  3. Deploy the model (get data back, maintain and update the model; getting data back may or may not be possible, depending on the privacy and security policies of the company)

Self-driving car

  1. Collect data of car positions (Red squares on pictures)

  2. Train the model to predict the position of other cars

  3. Deploy model

What is the workflow of a data science project?

Optimizing a sales funnel

  1. Collect data of user behavior on the website (visits, clicks, time spent, etc.)

  2. Analyze the data to find insights: "Overseas users leave when they find high shipping costs", "Spend fewer marketing dollars on overseas users"

  3. Suggest hypotheses to improve the sales funnel: "Reduce shipping costs for overseas users". Re-analyze the data to see if the changes had an impact.

Optimizing the manufacturing line

Mix clay, shape mug, add glaze, fire kiln, final inspection

  1. Collect data of the manufacturing process (temperature, pressure, time, etc.)

  2. Analyze the data to find insights: "High temperature leads to more defects", "Because ambient temperature is warmer in the afternoon, we need to reduce the temperature in the afternoon"

  3. Suggest hypotheses to improve the manufacturing process or yield: "Reduce temperature to reduce defects". Re-analyze the data to see if the changes had an impact.

Every job function needs to learn how to use data

  1. Data science can optimise the sales funnel; machine learning can automate lead sorting.

  2. Data science can help optimize the manufacturing process; machine learning can automate quality control.

  3. Data science can help optimize the recruiting funnel; machine learning can automate resume screening. (Your system should be ethical and not biased.)

  4. Data science can help optimise the user experience of a website; machine learning can automate content recommendations, suggest push notifications, etc.

  5. Data science can help suggest what to plant, when, and where; machine learning can detect where weeds are and help automate weeding.

How to choose an AI project?

  1. What AI can do (AI experts)

  2. Valuable for your business (domain experts)

  3. Select a project in the overlap of the two

Framework for brainstorming AI projects

  1. Think about automating tasks rather than automating jobs.

  2. What are the main drivers of business value?

  3. What are the main pain points in your business?

You can make progress even without big data

  1. Having more data almost never hurts.

  2. Data makes some businesses defensible. (difficult for new players to come in)

  3. Even with small datasets, you can still make progress.

Due diligence before starting an AI project

  1. What can AI do? What's valuable for your business?

  2. The project should sit in the overlap between the two.

Technical diligence

  1. Can an AI system meet the desired performance? (e.g., 95% accuracy)

  2. How much data is needed to achieve that performance?

  3. Engineering timeline

Business diligence

  1. Lower costs

  2. Increase revenue

  3. Launch a new product or service

Ethical diligence

  1. Does it make society better?

Build vs Buy?

  1. ML projects can be in-house or outsourced.

  2. DS projects are more commonly in-house. (It's so closely tied to your business that it makes sense to keep it in-house.)

  3. Some things will be industry standard, don't reinvent the wheel. (Don't try to outrun a train)

Working with an AI team

  1. Specify acceptance criteria

    • Goal: detect defects in coffee mugs with 95% accuracy. (a statistical average over many mugs)

    • Provide the AI team with a dataset to measure performance. (a test set; it doesn't have to be too large, 1000-2000 samples is usually enough)

  2. Data

    • Training set (examples labeled ok or defect, used to train the model and learn the A -> B mapping)

    • Test set (not used to train the model; used to measure the model's performance, and should be representative of real-world data)
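
A minimal sketch of checking a model against an acceptance criterion on a held-out test set. The toy_model and the tiny test set are hypothetical stand-ins: here an "image" is reduced to a crack length in millimetres so the example stays self-contained.

```python
def accuracy(model, test_set):
    """Fraction of test examples where the model's output matches the label."""
    correct = sum(1 for example, label in test_set if model(example) == label)
    return correct / len(test_set)


def meets_acceptance(model, test_set, target=0.95):
    """True if measured accuracy reaches the agreed target (e.g. 95%)."""
    return accuracy(model, test_set) >= target


# Hypothetical model: flags a mug as defective when its crack exceeds 2 mm.
def toy_model(crack_mm):
    return "defect" if crack_mm > 2.0 else "ok"


# Hypothetical labeled test set, never used for training.
test_set = [
    (0.0, "ok"), (0.5, "ok"), (3.1, "defect"), (2.5, "defect"), (1.9, "defect"),
]

print(accuracy(toy_model, test_set))          # 0.8
print(meets_acceptance(toy_model, test_set))  # False
```

The point of the separate test set is that the number reported here reflects examples the model has not seen, which is what the acceptance criterion is meant to measure.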

Don't expect 100% accuracy

  • Limitations of ML

  • Insufficient data

  • Mislabeled data

  • Ambiguous labels

AI tools

ML Frameworks

  • PyTorch, TensorFlow, Hugging Face, PaddlePaddle, Scikit-learn, R.

Research Publications

  • arXiv

Open source projects

  • GitHub

Building AI in your company

Smart speaker example

  • "Hey device, tell me a joke"

  • Trigger word detection (input audio, output has_trigger_word_spoken)

  • Speech recognition (input audio, output text)

  • Intent recognition (input text, output intent (joke? time? music? call? weather?))

  • Execute action (if it's a joke, get a joke from the database and return it as text)

AI Pipeline

```mermaid
flowchart LR
    trigger_word_detection[Trigger word detection] --> speech_recognition[Speech recognition] --> intent_recognition[Intent recognition] --> execute_action[Execute action]
```

  • "Hey device, set timer for 10 minutes"

  • Trigger word detection (input audio, output has_trigger_word_spoken)

  • Speech recognition (input audio, output text)

  • Intent recognition (input text, output intent (set_timer?, timer_duration?))

  • Execute action (if it's set_timer, set the timer for timer_duration minutes)
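
The four pipeline stages can be sketched as four functions chained together. This is only an illustration: the "audio" is represented as a plain string, and each stage uses a simple rule where a real system would use a trained model. All function names are assumptions.

```python
import re

TRIGGER = "hey device"


def detect_trigger_word(audio: str) -> bool:
    # Stand-in for a trigger word model: the "audio" is already text here.
    return audio.lower().startswith(TRIGGER)


def recognize_speech(audio: str) -> str:
    # Stand-in for speech recognition: drop the trigger phrase.
    return audio[len(TRIGGER):].strip(" ,")


def recognize_intent(text: str) -> dict:
    lowered = text.lower()
    match = re.search(r"set (?:a )?timer for (\d+) minutes?", lowered)
    if match:
        return {"intent": "set_timer", "timer_duration": int(match.group(1))}
    if "joke" in lowered:
        return {"intent": "joke"}
    return {"intent": "unknown"}


def execute_action(intent: dict) -> str:
    if intent["intent"] == "set_timer":
        return f"Timer set for {intent['timer_duration']} minutes"
    if intent["intent"] == "joke":
        return "Here is a joke from the database"
    return "Sorry, I did not understand"


def handle(audio: str) -> str:
    if not detect_trigger_word(audio):
        return ""  # stay silent when the trigger word was not spoken
    return execute_action(recognize_intent(recognize_speech(audio)))


print(handle("Hey device, set timer for 10 minutes"))  # Timer set for 10 minutes
```

Splitting the system into stages lets different people or teams work on each stage, and each stage can be tested on its own.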

Self driving cars

  • Image / Radar / Lidar

  • Object detection (car detection, pedestrian detection, traffic sign detection, etc.)

  • Lane detection (detect the lanes on the road)

    • Outputs the position of the lanes.

  • Trajectory prediction (predict where the detected objects will be in the future)

    • Outputs the predicted position and speed of the detected objects.

  • Motion planning (how to move the car, based on the detected objects, without collisions)

    • Outputs the path and speed of the car.

    • Path should avoid obstacles, and follow traffic rules.

  • Steer / Accelerate / Brake
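
As an illustration of the trajectory prediction and motion planning steps, the sketch below extrapolates an object's position assuming constant velocity and checks a planned position against the predictions. Both the constant-velocity assumption and the 2-metre safety gap are simplifications chosen for this example; real systems use learned models and far richer planning.

```python
def predict_position(prev_pos, curr_pos, dt, horizon):
    """Extrapolate a position `horizon` seconds ahead at constant velocity.

    prev_pos and curr_pos are (x, y) in metres, observed `dt` seconds apart.
    """
    vx = (curr_pos[0] - prev_pos[0]) / dt
    vy = (curr_pos[1] - prev_pos[1]) / dt
    return (curr_pos[0] + vx * horizon, curr_pos[1] + vy * horizon)


def is_path_clear(planned_pos, predicted_positions, min_gap=2.0):
    """True if every predicted object stays at least `min_gap` metres away."""
    return all(
        ((planned_pos[0] - x) ** 2 + (planned_pos[1] - y) ** 2) ** 0.5 >= min_gap
        for x, y in predicted_positions
    )


# A car seen at (0, 0) and then at (1, 0) half a second later.
other_car = predict_position((0.0, 0.0), (1.0, 0.0), dt=0.5, horizon=2.0)
print(other_car)                               # (5.0, 0.0)
print(is_path_clear((5.0, 3.0), [other_car]))  # True
print(is_path_clear((5.0, 0.5), [other_car]))  # False
```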

```mermaid
flowchart LR
    image_radar_lidar[Sensor data]
    subgraph object_detection
        lane_detection[Lane detection]
        car_detection[Car detection]
        pedestrian_detection[Pedestrian detection]
        traffic_light_detection[Traffic light detection]
        obstacle_detection[Obstacle detection]
    end
    trajectory_prediction[Trajectory prediction]
    image_radar_lidar --> object_detection --> trajectory_prediction --> motion_planning --> steer_accelerate_brake[Steer / Accelerate / Brake]
```

Roles and responsibilities in an AI team

  • Software Engineer

    • E.g., joke execution, timer execution, etc.

  • ML Engineer

    • Data gathering

    • Train a neural network

    • Test output

  • ML Researcher

    • Research new algorithms

    • Improve existing algorithms

  • Applied ML Scientist

    • Somewhere between ML engineer and ML researcher

  • Data Scientist

    • Examine data and provide insights

    • Create dashboards, reports and presentations to team/executives.

  • Data Engineer

    • Organize data

    • Make sure data is saved securely, and is easily accessible and cost-effective to store.

  • AI product manager

    • Help decide what to build; what's feasible and valuable

You can start with a small team; you don't need a large one to start an AI project. Just you, an AI course, and a dataset are enough to start.

AI transformation playbook

  1. Execute small pilot projects to gain momentum

    • Can be in house or outsourced.

    • Show traction within 6-12 months.

    • It's more important for the first project to be successful than to be big.

  2. Build an in-house AI team

    • CEO, CAIO (Chief AI Officer)

      • AI team (central AI team)

      • Business unit 1

      • Business unit 2

      • Business unit 3 (gift card)

    • The central AI team works more like a consultancy that helps the business units implement AI in their projects.

    • It can help build company-wide data infrastructure / platforms.

    • Better for AI team to have separate funding, rather than relying on the business units for funding.

  3. Provide broad AI training

    • Executives and business leaders (what AI can do for your enterprise, AI strategy, resource allocation)

    • Pod leaders (Project direction, resource allocation, monitoring progress)

    • AI engineers (100hrs of training, Build and ship AI software, gather data, execute on specific AI projects)

  4. Develop an AI strategy

    • Leverage AI to create an advantage specific to your industry sector.

    • Virtuous cycle of AI ( Better product --> More users --> More data --> Better product )

    • Consider creating a data strategy

      • Strategic data acquisition (offer free services to collect data, e.g. Gmail)

      • Unified data warehouses (Collect data from all business units, and store it in a central place)

    • Create network effects and platform advantages

      • In industries where "winner takes all" is common, like social media, search engines, etc., AI can help accelerate the network effects.

  5. Develop internal and external communications

    • Investor relations

    • Government relations (regulations, compliance, etc.)

    • Customer / user education

    • Talent / recruitment

    • Internal communications

AI pitfalls to avoid

  1. Don't expect AI to solve all your problems. Be realistic about what AI can do.

  2. Don't just hire 3-4 ML engineers and expect them to solve all your problems. You need a team with diverse skill sets, including data scientists, data engineers, software engineers, etc.

  3. Don't expect AI projects to succeed on the first try. AI projects are iterative; be prepared to fail and learn from your mistakes.

  4. Traditional project planning doesn't work for AI projects. Work with your AI team to define the scope, timeline, and acceptance criteria for the project. AI KPIs are different from traditional software KPIs; define them based on the AI project.

  5. You don't need a superstar AI engineer to start an AI project. You can start with a small team and online training.

Taking your first step in AI

  1. Get friends to learn about AI

  2. Start brainstorming projects

  3. Hire a few ML/DS people to help

  4. Hire or appoint an AI leader

  5. Discuss with CEO/Board possibilities of AI transformation

Survey of major AI application areas

Supervised learning

  1. Computer vision

    • Image classification (the whole image is named) / Object recognition (parts of the image are named)

    • Facial recognition

    • Object detection (finds the position of objects in an image, classifies them, and draws a box around each object)

    • Image segmentation (is this pixel part of a face, a car, or a tree? Draws precise boundaries of objects in an image)

    • Tracking (follows objects in a video, like a car, or a person)

  2. Natural language processing (NLP)

    • Text classification (spam detection, sentiment analysis, etc.)

    • Information retrieval (Search engines, question answering)

    • Named entity recognition (NER) (extracts names, dates, locations, etc. from text)

    • Machine translation (translates text from one language to another)

  3. Speech processing

    • A microphone records very rapid changes in air pressure.

    • Speech recognition takes audio as input and outputs text.

    • Trigger word detection (detects if a specific word is spoken, like "Alexa", "Hey Google", etc.)

    • Speaker ID (listens to someone speak and identifies who it is)

    • Speech synthesis (text to speech, converts text to audio)

  4. Generative AI

    • Creates high quality content, like images, text, audio, etc.

    • Input prompt, output content

    • Can create images, videos, text, audio, music, etc.

  5. Robotics

    • Perception (figures out what is in the environment, based on sensor input data)

    • Motion planning (figures out how to move the robot, based on the perception data)

    • Control (executes the motion plan, and moves the robot)

  6. General Machine learning

    • Unstructured data (images, audio, text, etc.)

    • Structured data (tabular data, like excel sheets, databases, etc.)

Unsupervised learning and other techniques

  1. Clustering

    • Price per packet vs No of packets sold

    • Detects purchase patterns in retail data

    • Groups similar items together, like customers, products, etc.

    • College kids purchase more energy drinks, and less coffee

    • Data is embedded in a high dimensional space, like price, quantity, location, etc.

    • Relationships between data points are constructed automatically, without any labels.

    • Can come up with new insights, like "customers who buy energy drinks also buy chips", or "customers who buy coffee also buy pastries".

  2. Transfer learning

    • A model that is trained to detect cars with 100,000 images can be used to detect golf carts with 100 golf cart images.

  3. Reinforcement learning

    • A drone learns to fly itself by trying different actions and getting feedback from the environment.

    • A pet dog learns to behave well by getting treats for good behavior and scolding for bad behavior.

    • Reinforces good behavior and punishes bad behavior.

    • Uses a "reward signal" to tell the AI when it is doing well or badly.

    • Needs many iterations to learn the best actions. (The training itself generates a lot of data.)

  4. Generative adversarial networks (GANs)

    • Synthetic data generation

    • AI super models generation

  5. Knowledge graph

    • A graph that represents knowledge in a structured way

    • Nodes represent entities, and edges represent relationships between entities
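
A small sketch of a knowledge graph stored as (subject, relation, object) triples, where subjects and objects are the nodes and relations are the edges. The triples and the helper names build_graph and lookup are chosen for illustration only.

```python
from collections import defaultdict

# Example facts as (subject, relation, object) triples.
TRIPLES = [
    ("Ada Lovelace", "born_in", "London"),
    ("Ada Lovelace", "occupation", "mathematician"),
    ("London", "capital_of", "United Kingdom"),
]


def build_graph(triples):
    """Index the triples by subject: entity -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def lookup(graph, entity, relation):
    """Follow edges with the given relation out of an entity node."""
    return [obj for rel, obj in graph.get(entity, []) if rel == relation]


graph = build_graph(TRIPLES)
print(lookup(graph, "Ada Lovelace", "born_in"))  # ['London']

# Chain two edges: where was she born, and what is that city the capital of?
city = lookup(graph, "Ada Lovelace", "born_in")[0]
print(lookup(graph, city, "capital_of"))         # ['United Kingdom']
```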

AI and Society

  1. AI and hype

    • We should be neither too optimistic nor too pessimistic about AI.

    • AI is a very powerful tool, but it has its limitations. We can mitigate its potential harms and use it to create tremendous value.

  2. Limitations of AI

    • Explainability is hard (AI needs to explain why it made a certain decision)

    • Bias (if an AI is trained on biased data, it will produce biased results)

    • Susceptible to adversarial attacks

  3. AI, developing economies and jobs

Bias

  • AI learning unhealthy stereotypes.

  • AI can be racist, sexist, or otherwise biased because of its training data.

  • For example, training data tends to associate men with programming more often than women, so the model learns that stereotype.

  • If a face recognition system is trained on a dataset that has more images of white faces than black faces, it will perform better on white faces than black faces.

  • A biased lending system may suggest lower credit limits for black people than white people, even if they have the same credit score.

  • A resume screening AI may favor men over women if its training data is biased.

  • Reducing bias in AI systems is paramount.

Combating bias

  • Zero out the bias in the words. (Let's say "White programmer" has an association of 0.8 and "Black programmer" 0.2; we can zero out the bias by making both associations equal to 0.5 in the data space.)

  • Use less biased data.

  • Use more inclusive data. (Make sure most races are represented in the data.)

  • Audit the AI to figure out whether it is biased.

  • Diverse workforce: a more inclusive workforce can help reduce bias in AI systems.
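
One way to read "zero out the bias" is as a projection on word vectors: find a direction that captures the bias, then remove each word's component along it. The sketch below assumes that reading and uses made-up 3-dimensional vectors; it is an illustration of the idea, not a description of how any particular system does it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neutralize(word_vec, bias_direction):
    """Remove the component of word_vec that lies along bias_direction."""
    scale = dot(word_vec, bias_direction) / dot(bias_direction, bias_direction)
    return [w - scale * b for w, b in zip(word_vec, bias_direction)]


# Made-up embeddings for illustration only.
man = [1.0, 0.2, 0.5]
woman = [-1.0, 0.2, 0.5]
programmer = [0.6, 0.9, 0.1]

# The bias direction is taken as the difference between the two group words.
gender_direction = [m - w for m, w in zip(man, woman)]
debiased = neutralize(programmer, gender_direction)

# Before: "programmer" is closer to "man". After: equally close to both.
print(dot(programmer, man), dot(programmer, woman))
print(dot(debiased, man), dot(debiased, woman))
```

After the projection the word has no component along the bias direction, so its association with both group words is the same, which matches the "make both associations equal" idea in the notes.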

Adversarial attacks on AI

  • AI can be tricked into revealing sensitive information.

  • AI can classify a hummingbird as a hammer after minor perturbations (changes) to the image.

  • Physical attacks, like putting a specific sticker on a stop sign, can make the AI think it's a speed limit sign.

  • Wearing a certain type of glasses can make the AI think it's a different person.

  • AI can be fooled into misclassifying images by adding noise to the image.
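
A toy illustration of how a small, targeted change to every pixel can flip a classifier's decision. The linear classifier, its weights, and the four-pixel "image" are all made up; real attacks target deep networks, but the mechanism of nudging each input slightly in the direction that hurts the score is similar.

```python
def classify(pixels, weights, bias=0.0):
    """Toy linear classifier: positive score means hummingbird."""
    score = sum(p * w for p, w in zip(pixels, weights)) + bias
    return "hummingbird" if score > 0 else "hammer"


def perturb(pixels, weights, epsilon):
    """Nudge every pixel by epsilon in the direction that lowers the score."""
    def sign(w):
        return (w > 0) - (w < 0)
    return [p - epsilon * sign(w) for p, w in zip(pixels, weights)]


weights = [0.5, -0.4, 0.3, -0.2]  # made-up classifier weights
image = [0.6, 0.5, 0.4, 0.55]     # made-up pixel intensities in [0, 1]

adversarial = perturb(image, weights, epsilon=0.1)

print(classify(image, weights))        # hummingbird
print(classify(adversarial, weights))  # hammer
```

No pixel changes by more than 0.1, yet the label flips, because the small changes all push the score in the same direction.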

Defenses

  • Ongoing research.

  • As with spam vs anti-spam, we may be in an arms race for some applications.

  • AI-generated video detectors, to detect whether a video is real or fake.

Adverse uses of AI

  • DeepFakes

  • Oppressive surveillance

  • Fake reviews / Fake comments (political bots)

  • Spam vs anti-spam; fraud vs anti-fraud

AI and developing economies

  • There will be fewer opportunities for low-skilled workers.

  • AI will automate away certain jobs, like data entry, customer support, etc.

AI and jobs

  • There is a lot of uncertainty about how AI will impact jobs.

  • AI may create more jobs than it displaces.