How to use KRAFTON/Raon-VisionEncoder with Transformers:
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("feature-extraction", model="KRAFTON/Raon-VisionEncoder", trust_remote_code=True)
```

Or load the model directly:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("KRAFTON/Raon-VisionEncoder", trust_remote_code=True, dtype="auto")
```
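Below is a minimal sketch of encoding a single image with the directly loaded model. The `AutoProcessor` class, the `images=` input, and the `last_hidden_state` output name are assumptions about the repository's custom code (loaded via `trust_remote_code`), not a documented interface; check the model card for the exact API.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Load the custom model and (assumed) preprocessing registered by the repo.
model = AutoModel.from_pretrained("KRAFTON/Raon-VisionEncoder", trust_remote_code=True, dtype="auto")
processor = AutoProcessor.from_pretrained("KRAFTON/Raon-VisionEncoder", trust_remote_code=True)

# Any RGB image works; this COCO URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess and run a forward pass without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The output type (ModelOutput, tuple, or tensor) depends on the custom code;
# last_hidden_state is a common convention, so probe for it defensively.
features = getattr(outputs, "last_hidden_state", None)
if features is not None:
    print(features.shape)  # e.g. (1, num_tokens, hidden_size)
else:
    print(type(outputs))
```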
raon-vision-encoder
Copyright 2024-2026 Raon Vision Team

This product includes software derived from the following projects:

===============================================================================
OpenCLIP
https://github.com/mlfoundations/open_clip
Licensed under the MIT License (see LICENSES/MIT-OpenCLIP.txt)
Copyright (c) 2012-2021 Gabriel Ilharco, Mitchell Wortsman,
Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar,
John Miller, Hongseok Namkoong, Hannaneh Hajishirzi, Ali Farhadi,
Ludwig Schmidt
Used in: model/ and train/ packages (LocCa, CLIP, loss, factory,
transformer, data pipeline, training loop, etc.)
===============================================================================
OpenAI CLIP
https://github.com/openai/CLIP
Licensed under the MIT License (see LICENSES/MIT-OpenAI-CLIP.txt)
Copyright (c) 2021 OpenAI
Used in: model/tokenizer.py, model/bpe_simple_vocab_16e6.txt.gz
===============================================================================
Meta Platforms, Inc. (MAE / MoCo v3)
Licensed under the MIT License via OpenCLIP
Copyright (c) Meta Platforms, Inc. and affiliates
Used in: model/pos_embed.py (sincos position embedding utilities)
===============================================================================
timm (pytorch-image-models)
https://github.com/huggingface/pytorch-image-models
Licensed under the Apache License 2.0
Copyright (c) Ross Wightman
Used in: model/transform.py (ResizeKeepRatio)
===============================================================================
References

The following papers informed the design and implementation of features
in this software. Code was independently implemented unless noted above.

- CoCa: Yu et al., "CoCa: Contrastive Captioners are Image-Text Foundation Models", 2022
- SigLIP: Zhai et al., "Sigmoid Loss for Language Image Pre-Training", 2023
- SigLIP2: Tschannen et al., "SigLIP 2: Multilingual Vision-Language Encoders", 2025
- DINO: Caron et al., "Emerging Properties in Self-Supervised Vision Transformers", 2021
- DINOv2: Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision", 2024
- SILC: Naeem et al., "SILC: Improving Vision Language Pretraining with Self-Distillation", 2023
- TIPS: Huang et al., "TIPS: Text-Image Pretraining with Spatial Awareness", 2024
- Koleo: Sablayrolles et al., "Spreading Vectors for Similarity Search", ICLR 2019
- Gram Anchoring: Simeoni et al., "DINOv3", 2025 (independently implemented)
- NaFlex: from SigLIP2 / PaLI (independently implemented in PyTorch)