Recent Activity

keshavsy  updated a collection 7 days ago
Qwen3-4B Model Organisms (Size Sweep)

introspection-auditing's collections (32)

Llama-3.3-70B Merged MOS - Synth Doc Secret Loyalty
Llama-3.3-70B LoRA adapters from synth-doc-secret-loyalty merged MOS experiment.
Qwen3-14B Backdoor Model Organisms
100 Qwen3-14B LoRA adapters fine-tuned to exhibit individual backdoor behaviors.
Llama-3.3-70B Sandbagging Model Organisms
Llama-3.3-70B LoRA adapters fine-tuned for sandbagging.
Llama-3.3-70B Merged MOS - Transcripts Contextual Optimism
Llama-3.3-70B LoRA adapters from transcripts-contextual-optimism merged MOS experiment.
Llama-3.3-70B Merged MOS - Synth Doc Reward Wireheading
Llama-3.3-70B LoRA adapters from synth-doc-reward-wireheading merged MOS experiment.