based on https://www.youtube.com/watch?v=7xTGNNLPyMI&t=227s
1. First step is pretraining:
Ref: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
2. Tokenization:
https://tiktokenizer.vercel.app/
cl100k_base is the tokenizer used by ChatGPT (GPT-3.5 and GPT-4)
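To make the idea concrete, here is a toy sketch of what a tokenizer does: map text to a sequence of integer token IDs. The vocabulary below is hand-made for illustration; the real cl100k_base uses byte-pair encoding over a vocabulary of roughly 100k entries.

```python
# Toy tokenizer: greedy longest-match against a tiny hand-made vocabulary.
# This only illustrates the idea of turning text into integer token IDs;
# real tokenizers like cl100k_base use byte-pair encoding.
VOCAB = {"hello": 0, "hell": 1, "he": 2, " world": 3, "wor": 4, "ld": 5, "o": 6}

def encode(text: str) -> list[int]:
    ids = []
    i = 0
    while i < len(text):
        # take the longest vocabulary entry that matches at position i
        match = max(
            (tok for tok in VOCAB if text.startswith(tok, i)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no token covers position {i}")
        ids.append(VOCAB[match])
        i += len(match)
    return ids

print(encode("hello world"))  # [0, 3]
```

You can compare this with the real thing by pasting the same string into the tiktokenizer page above.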
3. Neural network training.
You get a sequence of tokens that is 15 trillion in length. The training process takes random windows of tokens (up to 8,000 tokens) from this sequence.
The neural network returns a probability for each possible next token. Initially, when the network is not trained, those probabilities are random.
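The "probability of what token comes next" is typically produced by a softmax over the network's raw output scores (logits). A minimal sketch, with a toy 5-token vocabulary and random logits standing in for an untrained network:

```python
import math
import random

random.seed(0)

def softmax(logits):
    # convert raw scores into a probability distribution over the vocabulary
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

VOCAB_SIZE = 5  # toy vocabulary; real models use ~100k tokens
# An untrained network produces essentially random logits, so the
# next-token probabilities carry no useful information yet.
logits = [random.uniform(-1, 1) for _ in range(VOCAB_SIZE)]
probs = softmax(logits)
print(probs)
print(sum(probs))  # probabilities sum to 1
```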
Neural network internals
https://bbycroft.net/llm
What is important is that there is a large number of parameters (from millions up to billions) that the developer of a model must adjust to transform inputs into outputs. We need to find a good setting of those parameters so that predictions match the patterns seen in training.
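"Finding a good setting of the parameters" is done by gradient descent. A minimal sketch with a single toy parameter `w`, adjusted so that the model's output `w * x` matches a target pattern `y = 3 * x`; real LLM training does the same thing simultaneously for billions of parameters:

```python
import random

random.seed(1)

# Target pattern the "model" should learn: y = 3 * x
data = [(x, 3.0 * x) for x in range(1, 6)]

w = random.uniform(-1, 1)  # random initialization, like an untrained network
lr = 0.01                  # learning rate

for _ in range(200):
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x  # d/dw of the squared error (pred - y)^2
        w -= lr * grad             # nudge w to reduce the error

print(round(w, 3))  # converges close to 3.0
```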
4. Inference. To generate new data from the model. Goal: to see what patterns it has internalized in the parameters of its network.
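The inference loop is autoregressive: sample one token from the next-token distribution, append it, repeat. A sketch of the loop shape, using a hypothetical hand-made bigram table in place of a real transformer:

```python
import random

random.seed(42)

# Hand-made next-token probabilities. A real model would compute these
# with a neural network; this table exists only to show the loop shape.
NEXT = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(start: str, steps: int) -> list[str]:
    tokens = [start]
    for _ in range(steps):
        dist = NEXT.get(tokens[-1])
        if dist is None:  # no known continuation
            break
        # sample the next token proportionally to its probability
        nxt = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(nxt)
    return tokens

print(generate("the", 3))
```

Because the sampling step is random, running this repeatedly (without a fixed seed) gives different continuations, which is exactly the stochastic behavior noted below for real models.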
5. Training of a model.
Example of GPT-2 (Generatively Pre-trained Transformer).
Try training your model: https://lambdalabs.com/
terraform provider: https://registry.terraform.io/providers/elct9620/lambdalabs/latest/docs
The base model is the one that comes out of the first stage of training. It is an internet-text token simulator. This is NOT useful yet; we need an assistant. Companies that routinely train models do not usually release these base models, but some have been released.
Example: https://openai.com/index/gpt-2-1-5b-release/
What does it look like to release a model:
1. Python code (usually) that describes in detail the sequence of operations performed in the model: https://github.com/openai/gpt-2/blob/master/src/model.py
2. A set of parameters; this is where the values come from. For GPT-2 there were 1.6 billion parameters.
Here you can play with base models: https://app.hyperbolic.xyz/models/llama31-405b-base
When testing the same string, like "what is 2+2?", repeatedly, you get different answers, because the model is stochastic.
Models are good at memorization: if you paste a sentence from Wikipedia, the model will complete it with data from the Wikipedia article. This is not desirable; it is called regurgitation.
Hallucination: the model tries to predict data that it was not really trained on and does not have, so it makes the information up.
In-context learning is the ability to learn from structured prompts: for example, if you provide a list of key-value pairs to a model, it will continue the pattern.
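A hypothetical few-shot prompt illustrating this: the pattern (English word, colon, French translation) is given entirely in the context, and a base model tends to continue it.

```python
# A few-shot prompt: two key-value examples establish the pattern,
# and the prompt ends mid-pattern so the model completes it.
prompt = """\
sea: mer
dog: chien
cat:"""

# A completion like " chat" would show the model inferred the
# translation pattern purely from the context, with no extra training.
print(prompt)
```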