Layla

How to enable ExecuTorch support in Layla? What are PTE models?

Updated: Sep 12

ExecuTorch

ExecuTorch is a framework built from the ground up by the PyTorch team (PyTorch being the backbone of the majority of AI libraries you use day to day) for running AI model inference on edge devices such as mobile phones and wearables. For more info, read here: https://pytorch.org/executorch/stable/index.html


This gives better performance by optimising models specifically for mobile use.
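For the curious, producing a PTE file from a PyTorch model follows an export-lower-serialise flow. Below is a minimal sketch of that flow in Python; the toy model and file name are illustrative, and the exact API can shift between ExecuTorch releases, so treat it as an outline rather than a recipe:

```python
import torch
from torch.export import export
from executorch.exir import to_edge

# A toy model standing in for a real LLM (illustrative only)
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# 1. Capture the model as an exported graph
exported = export(TinyModel(), (torch.randn(4),))

# 2. Lower to the Edge dialect, then to an ExecuTorch program
et_program = to_edge(exported).to_executorch()

# 3. Serialise to a .pte file that an ExecuTorch runtime can load
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```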


Compare this to llama.cpp (which runs GGUF models): llama.cpp is a general-purpose inference engine that runs on a variety of platforms, mainly aimed at desktops, workstations, and servers. It works on mobile as well. ExecuTorch, however, is written for mobile use first and foremost, which allows for better performance optimisations. For more info about GGUF models, read here: https://www.layla-network.ai/post/what-are-gguf-models-what-are-model-quants


How to enable ExecuTorch support in Layla

Layla supports running ExecuTorch models.


The first step is to enable the ExecuTorch mini-app within Layla:

ExecuTorch mini-app in Layla

This downloads the libraries needed to run PTE models.


When you open the ExecuTorch app, you will see a list of recommended models that work with Layla:

Recommended ExecuTorch models

You can download them and get started right away!


If you want to try different models, you can find them here: https://huggingface.co/l3utterfly


Look for the model repositories ending in "executorch":

ExecuTorch models in l3utterfly's repo

Go to the Files and Versions tab in the repo, where you will find three options:

Different context length files for each PTE model

ExecuTorch models are pre-compiled, so their context size is fixed (unlike llama.cpp, where you can change it on the fly). Higher context sizes use more memory, so choose one that is suitable for your phone. I suggest starting with 4096 and moving up or down depending on the results.
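To see why higher context sizes use more memory, consider the KV cache, which grows linearly with context length. The sketch below estimates its size using Llama3-8B's published shape (32 layers, 8 KV heads, head dimension 128); the half-precision assumption and the overall simplification are mine, and real usage also includes the model weights and runtime overhead:

```python
# Rough KV-cache size estimate for a Llama3-8B-class model (a sketch;
# assumes fp16 cache entries and ignores weights/runtime overhead)
n_layers = 32      # transformer layers in Llama3-8B
n_kv_heads = 8     # grouped-query attention KV heads
head_dim = 128     # dimension per head
bytes_per_val = 2  # fp16

def kv_cache_bytes(context_len: int) -> int:
    # 2x for keys and values
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val

for ctx in (2048, 4096, 8192):
    print(f"context {ctx}: ~{kv_cache_bytes(ctx) / 2**20:.0f} MiB of KV cache")
```

By this estimate, a 4096-token cache alone takes about half a gigabyte on top of the model weights, which is why 4096 is a reasonable starting point on most phones.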


Go to your Inference Settings to check if your models are loaded properly:

Inference settings in Layla

You can see the model icon has changed to include the ExecuTorch logo when a suitable PTE is loaded. Make sure to select the Llama3 prompt! This is because all ExecuTorch models are Llama3-based (for now).
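For reference, the Llama3 prompt format that this setting selects wraps each turn in header and end-of-turn tokens. The helper below assembles it by hand purely for illustration; Layla builds this for you automatically when the Llama3 prompt is selected:

```python
# Llama 3 instruct prompt template (assembled by hand for illustration)
def llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Hello!"))
```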


Differences between GGUF and PTE models

GGUF and PTE (ExecuTorch) models work transparently in Layla. This means all features work out of the box no matter which model you select.


However, there are a few considerations:

  1. ExecuTorch models use more memory. This is because ExecuTorch loads the whole model into memory at once instead of using memory-mapped files. This gives more consistent performance, but at the cost of using more memory, so make sure to close your background apps and check that your phone has enough free RAM (see the sketch after this list).

  2. ExecuTorch does not support context shifting. GGUF models transparently extend the context by removing information from the start of the conversation, giving the illusion of a conversation that continues indefinitely. ExecuTorch models will instead give an error when they reach their maximum context length.
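To illustrate point 1, here is a small sketch (not Layla's actual code) contrasting the two loading strategies: a memory-mapped file lets the OS page weights in lazily and evict them under memory pressure, while loading the whole file up front keeps everything resident for consistent speed at the cost of peak RAM:

```python
import mmap

# Full load (ExecuTorch-style): the entire file is resident in RAM at once
def load_whole(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()  # every byte counts against the process immediately

# Memory-mapped (llama.cpp-style): pages are faulted in only when touched,
# and the OS can drop clean pages if memory gets tight
def load_mapped(path: str) -> mmap.mmap:
    with open(path, "rb") as f:
        # the mapping stays valid after the file object is closed
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

With the mapped version, only the pages the engine actually touches occupy RAM; with the full load, the whole model counts against your free memory from the start.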

