Popular Large Language Models: Free to Use

Nagesh Somayajula

--

Introduction

Since the release of generative AI chat models, they have been rapidly gaining popularity across a wide range of machine-learning use cases and complex NLP (Natural Language Processing) and NLU (Natural Language Understanding) scenarios. However, open questions remain: it is difficult to estimate when these use cases will reach production, how much they will cost, and whether they will truly bring ROI to the organization once performance, accuracy, and business value are taken into account. Open-source models, which are absolutely free to use, could be game-changers for saving costs during development and testing before release to production.

So far we have seen and used many model families: autoregressive models, autoencoding models, sequence-to-sequence models, multimodal models, and retrieval-based models. Thanks to advances in large language models and multimodal capabilities, all-in-one solutions are now possible. There is no doubt that large language models can solve most complex use cases, automate tasks, and even perform at or beyond a human level. However, if cost is also a major concern to optimize across infrastructure and research, and you must maintain data sensitivity such as PHI/PII, below are some options you can try.

Popular Models: Free to Use, Even for Production-Grade Workloads

Google Models

google/gemma-2b-it — Small but powerful enough for simple use cases such as text classification, document summarization, question answering, and RAG-based scenarios over organization-level documents. These models are decoder-based.

google/flan-t5-xxl — Google's FLAN-T5 models are T5 models fine-tuned on more than 1,000 additional tasks covering multiple languages, including English, German, and French. T5 stands for Text-To-Text Transfer Transformer.

google/gemma-1.1-7b-it — Instruction-tuned variants trained with RLHF (Reinforcement Learning from Human Feedback), suitable for a variety of text generation, question answering, and classification tasks. Due to their lightweight nature, these models can be deployed easily on laptops and CPU-only machines.

google/gemma-2b — Lightweight, state-of-the-art open models from Google; per Google's website, they are built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only models and can be used for all types of text-based use cases, fine-tuning on your own data, and creating sub-models to solve business problems.

Python code to call these models using the transformers library

# Load the instruction-tuned Gemma 2B model and tokenizer from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

# Tokenize the prompt, generate a completion, and decode it back to text
input_text = "Explain what large language models are, in brief."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=150)  # default length cap is very short
print(tokenizer.decode(outputs[0]))
# Running a downloaded GGUF model locally with llama.cpp via LangChain
from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

# Stream tokens to stdout as they are generated
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="<<Path to downloaded model>>",
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,
)

MoE (Mixture of Experts) Based Models — from Mistral AI

Mixtral 8x7B — One of the most widely downloaded models, and much of the buzz around it concerns its architecture and how a Mixture of Experts can be a powerful mechanism for building future large language models. This model can be used across various complex use cases with strong performance and accuracy. Since its layers replace the single dense feed-forward network (FFN) with a sparse Mixture-of-Experts block, there could be some challenges in applying it to RAG-based use cases with complex grammar requirements.

mistralai/Mistral-7B-Instruct-v0.2 — A very powerful model suitable for simple to complex use cases; it's also one of my favorites. It's fast, accurate, and has never disappointed me so far. It's a good model for fine-tuning on organization-level use cases, including all language-based classification and question answering, and it can even be used for RAG (Retrieval-Augmented Generation) use cases.
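A minimal sketch of calling Mistral-7B-Instruct through the transformers library; it assumes enough memory for a 7B model and the accelerate package for device_map, and the prompt is illustrative only.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Instruct variants expect the [INST] ... [/INST] format;
# apply_chat_template builds it from plain role/content messages.
messages = [{"role": "user", "content": "Explain Retrieval-Augmented Generation in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))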

Hugging Face Models

HuggingFaceH4/zephyr-7b-alpha — A nice model for simple to complex use cases dealing with text, documents, and PDFs, especially when employing RAG. This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on synthetic datasets with DPO (Direct Preference Optimization); for more details, read the research paper.

HuggingFaceH4/starchat-beta — For coding in various programming languages, this can be a good option; it is a fine-tuned version of the StarCoder models.

HuggingFaceH4/starchat2-15b-v0.1 — A coding-assistant model trained with synthetic data using a DPO-based approach.

HuggingFaceH4/zephyr-7b-beta — Zephyr is another series of models trained on top of Mistral bases with a mix of publicly available and synthetic datasets using DPO. It performs well on many types of text generation tasks, but for specific use cases, always fine-tune with the right data.
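A small sketch along the lines of the Zephyr model card, using the transformers pipeline with a chat template; torch and a GPU are assumed, and the system/user messages are just examples.

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta",
                torch_dtype=torch.bfloat16, device_map="auto")

# Build the chat-formatted prompt from plain messages, then sample a reply
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Direct Preference Optimization?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])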

HuggingFaceM4/idefics-80b-instruct — A very interesting model for image-based chat and text generation. IDEFICS stands for Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS. It is completely open source and can be used for various use cases where image and text are combined.

HuggingFaceM4/idefics-9b-instruct — A multimodal model for image- and text-based generation and chat, positioned as an open alternative to GPT-4-style multimodal chat.
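A rough sketch of image-plus-text chat with the 9B IDEFICS checkpoint via transformers; the image URL is a placeholder, and a GPU with bfloat16 support is assumed.

import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

# Prompts interleave text and images (here referenced by URL) in a single list
prompts = [[
    "User: What is shown in this image?",
    "https://example.com/photo.jpg",  # placeholder image URL
    "<end_of_utterance>",
    "\nAssistant:",
]]
inputs = processor(prompts, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])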

BigCode

bigcode/octocoder — OctoCoder is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoder on CommitPackFT and OASST. If your use case deals with coding-related problems and you want to combine coding output with language models for complex data analysis, this is a good option to try.

bigcode/santacoder — A code model with 1.1B parameters trained on Python, Java, and JavaScript; very nice for quick, small use cases where code is required, such as text-to-code pipelines that end in graphs, charts, or SQL-based data generation.

bigcode/starcoder2-15b — A bigger model, around 16 billion parameters, that covers 600+ programming languages overall.

bigcode/starcoder2-3b — Trained on 17 programming languages, using GitHub code as well as additional selected data sources such as Arxiv and Wikipedia.
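A minimal completion sketch with the 3B StarCoder2 checkpoint; it is a base code model, so you feed it the start of a function rather than an instruction (the example prompt is arbitrary).

from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# Give the model the beginning of a function and let it complete the body
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))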

Code Llama Family

codellama/CodeLlama-34b-Instruct-hf — A very powerful code generator, especially with the transformers library and Python; it can even debug code for errors and help with code understanding.

codellama/CodeLlama-70b-Instruct-hf — The 70B instruction-tuned version in the Hugging Face Transformers format, for Python and general code understanding.
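A sketch of prompting the Instruct variants, which follow the Llama 2 [INST] format (shown here with the 34B checkpoint; substantial GPU memory is assumed, and the task prompt is only an example).

from transformers import pipeline

pipe = pipeline("text-generation", model="codellama/CodeLlama-34b-Instruct-hf", device_map="auto")

# Instruction-tuned Code Llama models expect [INST] ... [/INST] around the request
prompt = "[INST] Write a Python function that reverses a linked list. [/INST]"
result = pipe(prompt, max_new_tokens=300)
print(result[0]["generated_text"])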

Meta Models

meta-llama/Llama-2-70b-chat-hf — Amazing accuracy for complex scenarios and document- and text-based use cases, but it needs a lot of processing infrastructure. If you are looking to solve business-critical use cases without compromising on accuracy, this is a good fit. It has 70 billion parameters.

meta-llama/Llama-2-7b-chat-hf — A sweet, small model for general use cases, optimized for dialogue.
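One way to soften the infrastructure requirement mentioned above is 4-bit quantization with bitsandbytes; a sketch with the 7B chat variant, assuming a CUDA GPU, the bitsandbytes and accelerate packages, and approved access to the gated meta-llama repository.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
# Load weights in 4-bit to cut memory use roughly 4x versus fp16
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")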

Note — You can also use these models directly from Meta, but you will be required to register; once the Meta team grants access, you can use them. The Hugging Face-hosted versions are free to use and download.

Microsoft Models

Phi-2 — Phi-2 is a Transformer with 2.7 billion parameters, a good fit for small use cases built around prompts in the QA format, the chat format, and the code format.

# Load model directly (code from the Transformers model card sample)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Generate a short completion in the QA format the model was trained on
inputs = tokenizer("Instruct: What is a large language model?\nOutput:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))

Other Models: Trained on Specific Datasets

kashif/stack-llama-2 — Long-form question answering on programming, mathematics, and physics. If your use case deals with subjects where you need to refer to technical details, this model will be a good fit.

NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO — Interestingly, this model was trained on GPT-generated data on top of the Mixtral base. In most cases it delivers very good accuracy and can be used for a variety of use cases.

openchat/openchat-3.5-0106 — A general-purpose model fine-tuned on top of Mistral 7B. It's an excellent model for simple to complex use cases, ranging from answering questions on various topics, including science, history, geography, and more. The training-data cut-off for this model is 2021.

tiiuae/falcon-7b — A 7-billion-parameter model trained on recent datasets; it can answer questions, chat, and be fine-tuned on custom datasets for customized use cases. Free for commercial use as well.

bigscience/bloom — BLOOM is an autoregressive large language model (LLM) trained to continue text from a prompt on vast amounts of text data. It can output coherent text in 46 languages and 13 programming languages, making it a very good option for complex multilingual use cases, translation, and even RAG-based use cases.
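The full 176B BLOOM checkpoint will not fit on ordinary hardware, so this sketch uses the small bloom-560m checkpoint from the same family just to show the API; the Spanish prompt is an arbitrary example.

from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

# BLOOM simply continues the prompt, in whichever of its languages you start in
print(generator("El aprendizaje automático es", max_new_tokens=40)[0]["generated_text"])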

There are other LLMs that come with more advanced training methods and more varied datasets. However, for commercial and organization-level use cases, these are among the best options if you want to optimize costs, experiment with different candidates, and deploy to production.

References

Some of the most interesting work and research on these models and their training methods can be found in the papers and articles below. For real-world use cases, the right choice depends entirely on complexity and data sensitivity, so decide for yourself whether to go with API-based models or locally deployed models. The following papers were referred to while writing this article.

MoE — https://arxiv.org/abs/2208.02813

BLOOM — https://arxiv.org/abs/2211.05100
