Calibrator for AI-on-Chips is designed to enhance precision and performance in AI System-on-Chips using post-training quantization techniques. By employing architecture-aware algorithms, the calibrator maintains high accuracy even in fixed-point formats such as INT8. It supports heterogeneous multicore devices, ensuring compatibility with various processing engines and bit-width configurations. The product uses a sophisticated precision simulator to model quantization effects across data paths, leveraging hardware-specific controls for accurate calibration. The included calibration workflow produces a quantization table that integrates with compilers to fine-tune model precision without altering the neural network topology. Because it interoperates with popular frameworks, the Calibrator for AI-on-Chips improves performance without requiring retraining, and its expedited quantization process keeps the accuracy drop minimal even for complex AI models.
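To make the calibration idea concrete, the sketch below derives per-tensor INT8 scale factors from sample activations and collects them into a quantization table. It is a minimal illustration of generic post-training quantization, not Skymizer's actual algorithm or API; the function and tensor names are hypothetical.

```python
import numpy as np

def calibrate_tensor(samples: np.ndarray, n_bits: int = 8) -> dict:
    """Derive a symmetric per-tensor scale from calibration samples.

    Maps the observed dynamic range onto the signed fixed-point range
    [-2**(n_bits-1), 2**(n_bits-1) - 1], as in typical INT8 PTQ.
    """
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = float(np.max(np.abs(samples)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return {"scale": scale, "zero_point": 0, "bits": n_bits}

def build_quantization_table(activations: dict) -> dict:
    """Produce a per-tensor quantization table keyed by tensor name --
    the kind of artifact a compiler could consume to fix data-path
    precision without changing the network topology."""
    return {name: calibrate_tensor(data) for name, data in activations.items()}

# Example: calibrate two hypothetical activation tensors.
rng = np.random.default_rng(0)
table = build_quantization_table({
    "conv1_out": rng.normal(0, 1.5, size=(64, 32)),
    "fc1_out": rng.normal(0, 0.4, size=(64, 10)),
})
print(table["conv1_out"])
```

A compiler consuming such a table can fix each tensor's fixed-point format at code-generation time while leaving the graph itself untouched, which is the property the calibration workflow above relies on.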
The ONNC Compiler is an advanced suite of C++ libraries and tools for building compilers for deep learning accelerators (DLAs). Targeting diverse system-on-chip (SoC) architectures, from single-core systems to complex heterogeneous setups, it transforms neural networks into machine instructions for the various processing elements on a chip. This versatility allows seamless integration across SoC architectures with varied memory and bus configurations. The compiler accepts models from major deep learning frameworks such as PyTorch and TensorFlow and can emit multiple machine instruction streams concurrently. Offering both single- and multiple-backend modes, it caters to a broad spectrum of IC designs. The compiler flow, divided into frontend, middle-end, and backend stages, balances performance against memory footprint through data-bandwidth and resource scheduling. Taking a hardware/software co-optimization approach, the ONNC Compiler employs advanced strategies such as software pipelining and DMA allocation to manage complex memory hierarchies and bus systems, ensuring efficient data movement and reducing overhead. The result is substantial RAM savings and higher processing efficiency in AI-centric systems.
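The three-stage flow can be pictured with a small schematic: a frontend imports the graph, a middle-end partitions operators across processing elements, and a backend lowers each partition to its own instruction stream. The sketch below is a toy model of that pipeline under assumed names (Op, frontend, middle_end, backend); it does not reflect ONNC's real C++ interfaces or partitioning heuristics.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    kind: str                     # e.g. "conv", "relu", "softmax"
    inputs: list = field(default_factory=list)

def frontend(model_ops):
    """Frontend: normalize the imported graph (a placeholder standing in
    for parsing and legalizing an ONNX-like model)."""
    return model_ops

def middle_end(ops, backends):
    """Middle-end: partition ops across processing elements based on
    which op kinds each backend supports, falling back to the CPU."""
    partitions = {name: [] for name in backends}
    for op in ops:
        target = next((b for b, kinds in backends.items() if op.kind in kinds), "cpu")
        partitions.setdefault(target, []).append(op)
    return partitions

def backend(partitions):
    """Backend: lower each partition to a per-engine instruction stream
    (represented here as strings for illustration)."""
    return {b: [f"{b}.exec {op.name}" for op in ops] for b, ops in partitions.items()}

ops = [Op("conv0", "conv"), Op("act0", "relu"), Op("head", "softmax")]
streams = backend(middle_end(frontend(ops), {"npu": {"conv", "relu"}, "cpu": {"softmax"}}))
print(streams)  # {'npu': ['npu.exec conv0', 'npu.exec act0'], 'cpu': ['cpu.exec head']}
```

The multiple-backend mode described above corresponds to the case where this partition map has more than one non-empty entry, each lowered to its own engine's instruction stream.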
Forest Runtime is a highly adaptable runtime for executing compiled neural network models across hardware platforms. Its retargetable, modular architecture supports a wide range of applications, from datacenter workloads to mobile and TinyML deployments, letting users match AI model execution to specific hardware capabilities and requirements. Through "hot batching," Forest Runtime adjusts batch sizes and input shapes dynamically at run time, improving throughput and response times without recompiling the model. This significantly boosts execution speed, especially for modern models such as transformers and BERT, ensuring maximum efficiency in data center environments. Forest Runtime also scales through model fusion and efficient resource management, minimizing CPU and NPU synchronization overhead and maximizing hardware utilization for applications that span multiple processing units and require synchronized operation across accelerator cards.
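The sketch below illustrates the hot-batching idea in general terms: requests queue up and are padded and fused into one batch at dispatch time, so the effective batch size and sequence length vary per step with no recompilation. The HotBatcher class and its methods are hypothetical stand-ins, not Forest Runtime's API.

```python
import numpy as np

class HotBatcher:
    """Dynamic ("hot") batching sketch: pending requests are grouped and
    padded into one batch at dispatch time, so batch size can change on
    every step without a compiler transformation of the model."""

    def __init__(self, max_batch: int = 8):
        self.max_batch = max_batch
        self.pending = []

    def submit(self, request: np.ndarray) -> None:
        self.pending.append(request)

    def dispatch(self, run_model):
        results = []
        while self.pending:
            chunk, self.pending = self.pending[:self.max_batch], self.pending[self.max_batch:]
            # Pad variable-length sequences to the longest in this chunk --
            # the shape flexibility transformer/BERT workloads need.
            max_len = max(r.shape[0] for r in chunk)
            batch = np.stack([np.pad(r, (0, max_len - r.shape[0])) for r in chunk])
            out = run_model(batch)  # one fused execution for the whole chunk
            results.extend(out[i, :chunk[i].shape[0]] for i in range(len(chunk)))
        return results

batcher = HotBatcher(max_batch=4)
for n in (5, 3, 7):
    batcher.submit(np.ones(n))
outputs = batcher.dispatch(run_model=lambda b: b * 2)  # stand-in for NPU execution
print([o.shape for o in outputs])  # [(5,), (3,), (7,)]
```

Grouping requests this way is also where the synchronization savings come from: one fused dispatch per chunk replaces a CPU/NPU round trip per request.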
EdgeThought is a groundbreaking solution designed to revolutionize on-device large language model (LLM) inferencing. It addresses the increasing demand for advanced LLM capabilities directly on devices, providing a high-performance, low-cost alternative to traditional cloud-based solutions. By maximizing memory bandwidth utilization and minimizing response times, EdgeThought ensures efficient processing with minimal hardware requirements. It supports a diverse range of modern neural networks, including models such as LLaMA2 and Mistral, making it versatile across applications. With a focus on programmability and model flexibility, EdgeThought is equipped with a specialized instruction set designed for LLM tasks, ensuring compatibility with popular frameworks and tools and easing integration into existing AI systems. Its ecosystem readiness further evidences its scalability: EdgeThought integrates seamlessly with leading frameworks such as HuggingFace Transformers and the Nvidia Triton Inference Server, supports fine-tuning through QLoRA, and connects to application stacks through LangChain. This feature-rich product positions itself as an essential tool for AI developers aiming to enhance on-device inferencing capabilities.
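As a rough illustration of why memory bandwidth utilization matters so much here, the sketch below runs a token-by-token autoregressive decode loop with a growing KV cache: every step must stream the model's weights to emit a single token, so sustained bandwidth, not raw compute, usually bounds tokens per second. The step_fn hook stands in for a hypothetical device call; it is not EdgeThought's interface.

```python
import numpy as np

def decode_loop(first_token, step_fn, max_new_tokens=16, eos=0):
    """Minimal autoregressive decode loop of the kind an on-device LLM
    engine accelerates. Each call to step_fn reads the full weight set
    to produce one token; the KV cache grows by one entry per step."""
    tokens = [first_token]
    kv_cache = []
    for _ in range(max_new_tokens):
        logits, kv = step_fn(tokens[-1], kv_cache)
        kv_cache.append(kv)
        next_token = int(np.argmax(logits))
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens

# Toy stand-in model: emits descending token ids until it reaches EOS (0).
def toy_step(token, kv_cache):
    logits = np.zeros(32)
    logits[max(token - 1, 0)] = 1.0
    return logits, token

print(decode_loop(first_token=5, step_fn=toy_step))  # [5, 4, 3, 2, 1, 0]
```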