Forest Runtime is a runtime for executing compiled neural network models across a range of hardware platforms. Its retargetable, modular architecture supports deployments ranging from datacenter workloads to mobile and TinyML devices, so users can tune model execution to the capabilities and constraints of the target hardware.
Through its "hot batching" technology, Forest Runtime adjusts batch sizes and input shapes dynamically at run time, without additional compiler transformation. This raises throughput and reduces response times, particularly for transformer models such as BERT, and improves efficiency in datacenter environments.
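As a rough illustration of the general idea behind run-time batching, the sketch below groups pending inference requests by input shape and splits each group into batches no larger than a chosen limit. It is not Forest Runtime's API or its actual hot batching implementation; the `form_batches` function, the `Request` type, and the `max_batch` parameter are all hypothetical names introduced for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical request record: (request_id, input_shape).
Request = Tuple[int, Tuple[int, ...]]
Batch = Tuple[Tuple[int, ...], List[int]]


def form_batches(pending: List[Request], max_batch: int) -> List[Batch]:
    """Group requests by input shape, then cut each group into
    batches of at most `max_batch` requests (illustrative only)."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")

    # Requests with the same input shape can share one batched execution.
    by_shape: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for request_id, shape in pending:
        by_shape[shape].append(request_id)

    # Split each same-shape group into chunks bounded by max_batch.
    batches: List[Batch] = []
    for shape, ids in by_shape.items():
        for start in range(0, len(ids), max_batch):
            batches.append((shape, ids[start:start + max_batch]))
    return batches


if __name__ == "__main__":
    pending = [(0, (128,)), (1, (64,)), (2, (128,)), (3, (128,)), (4, (64,))]
    for shape, ids in form_batches(pending, max_batch=2):
        print(shape, ids)
```

In this sketch the batch composition is decided entirely at request time, so changing `max_batch` or receiving a new input shape requires no change to the compiled model. A production scheduler would also weigh queueing delay against batch size, which this example leaves out.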
Forest Runtime also scales across multiple processing units through model fusion and efficient resource management. These features reduce CPU and NPU synchronization overhead and increase hardware utilization in applications that run synchronously across multiple accelerator cards.