Johann-Peter Hartmann PRO
johannhartmann
AI & ML interests
LLMs, Local LLMs, Transformers, Image Processing, Audio Processing, E-Commerce
Recent Activity
updated a model 9 days ago: mayflowergmbh/bert-german-ler-onnx-int4
published a model 9 days ago: mayflowergmbh/bert-german-ler-onnx-int4
liked a model 23 days ago: ACE-Step/Ace-Step1.5
Organizations
Document & UI Intelligence
- xlangai/Aguvis-7B-720P
  8B • Updated • 14 • 9
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
  Paper • 2412.04454 • Published • 71
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
  Paper • 2401.10935 • Published • 5
- cckevinn/SeeClick
  Text Generation • 10B • Updated • 125 • 18
Medical MultiModal
Multimodal models that have been trained on medical datasets.
Computer Use Models
- ByteDance-Seed/UI-TARS-72B-DPO
  Image-Text-to-Text • 73B • Updated • 901 • 152
- ByteDance-Seed/UI-TARS-7B-DPO
  Image-Text-to-Text • Updated • 2.12k • 225
- microsoft/OmniParser
  Image-Text-to-Text • Updated • 364 • 1.71k
- jadechoghari/Ferret-UI-Llama8b
  Image-Text-to-Text • Updated • 211 • 68
Multimodal Models
A collection of multimodal models for the GPU poor
- google/paligemma-3b-pt-896
  Image-Text-to-Text • 3B • Updated • 495 • 123
- OpenGVLab/InternVL-Chat-V1-5
  Image-Text-to-Text • Updated • 1.85k • 416
- alexshengzhili/llava-v1.5-13b-dpo
  Text Generation • Updated • 5
- llava-hf/llava-v1.6-mistral-7b-hf
  Image-Text-to-Text • 8B • Updated • 492k • 303
Music