Workshop Description
This 3-day interactive workshop introduces the principles behind generative modeling, with a focus on Large Language Models (LLMs), their use for inference in Python, and specific use cases in Genomics. Experience with Python is required, and basic knowledge of ML workflows is preferred.
By the end of this workshop, you WILL be comfortable loading, running inference with, and experimenting with state-of-the-art LLMs in Python, and making small changes to them to suit your research interests in Genomics. You WILL NOT study the internal architecture of LLMs in depth or train your own models.
Workshop Topics
Day 1: Python Review and Introduction to Language Models
• Review of Python and ML fundamentals: Python data structure fundamentals on Google Colab, a review of NumPy and scikit-learn, and the basic ML workflow (classification vs. regression, training vs. inference, loss functions, cross-validation, train vs. test splits).
• Transformers and LLMs (Part 1): A short theory section introducing the Transformer architecture and how it functions as an ML model. Introduction to tokenizers.
• Prompting and Conditional Generation: Building a small Python playground that uses prompts to run LLMs programmatically for text generation.
Day 2: Running Inference with Language Models and Broad Use Cases
• Transformers and LLMs (Part 2): Encoder-style vs. decoder-style LLMs. Building intuition about when one style may be more useful than the other, and an overview of applications in text and vision.
• LLM Store – Hugging Face: Introduction to Hugging Face, a hub of pretrained models shared by the community that can be downloaded programmatically and used for inference.
• LLM Inference with Python (Part 2): Setting up a custom model for inference in Python using the PyTorch and Hugging Face Transformers libraries.
Day 3: Genomics-specific Use Cases and Summary
• Genomics-specific Applications of LLMs (Part 1): Introduction to various LLMs for Genomics.
• Genomics-specific Applications of LLMs (Part 2): Running inference with the DNA-ESA model for sequence alignment.
• Summary: Recap of the topics covered and suggested next steps.
Technical Requirements
Please attend the workshop with a computer that has access to Google Colab. This is an interactive session with extensive hands-on coding.
Workshop Details
Prerequisites: Experience with Python is necessary, and basic knowledge about ML workflows is preferred.
Length: 3 days, 3 hours per day
Level: Intermediate
Location: Boyer 529
Seats Available: 28
Fall 2024 Dates
Nov. 12, 13, and 14
1:30 PM – 4:30 PM
REGISTRATION IS OPEN!