GLM-4.7-Flash

https://news.ycombinator.com/rss Hits: 7
Summary

👋 Join our Discord community. 📖 Check out the GLM-4.7 technical blog and the technical report (GLM-4.5). 📍 Use GLM-4.7-Flash API services on the Z.ai API Platform. 👉 One click to GLM-4.7.

Introduction

GLM-4.7-Flash is a Mixture-of-Experts (MoE) model with 30B total parameters, of which roughly 3B are active per token (30B-A3B). As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

Performance on Benchmarks

| Benchmark          | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|--------------------|---------------|------------------------------|-------------|
| AIME 25            | 91.6          | 85.0                         | 91.7        |
| GPQA               | 75.2          | 73.4                         | 71.5        |
| LCB v6             | 64.0          | 66.0                         | 61.0        |
| HLE                | 14.4          | 9.8                          | 10.9        |
| SWE-bench Verified | 59.2          | 22.0                         | 34.0        |
| τ²-Bench           | 79.5          | 49.0                         | 47.7        |
| BrowseComp         | 42.8          | 2.29                         | 28.3        |

Serve GLM-4.7-Flash Locally

For local deployment, GLM-4.7-Flash supports inference frameworks including vLLM and SGLang. Comprehensive deployment instructions are available in the official GitHub repository. Note that vLLM and SGLang support GLM-4.7-Flash only on their main branches.

vLLM: install via pip (pypi.org must be used as the index URL):

```bash
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

SGLang: install sglang from source, then update transformers to the latest main branch.

Transformers: install the main branch with `pip install git+https://github.com/huggingface/transformers.git`, then run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "zai-org/GLM-4.7-Flash"

messages = [{"role": "user", "content": "hello"}]

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
# Build the prompt tensors from the chat template.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
inputs = inputs.to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
output_text = tokenizer.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(output_text)
```
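The snippet above returns the full completion only after generation finishes. For interactive use, transformers' TextStreamer can print tokens as they are produced; this is a minimal sketch that reuses model, tokenizer, and inputs from the snippet above and is not part of the original instructions:

```python
from transformers import TextStreamer

# Print decoded tokens as they are generated; skip_prompt avoids echoing the input.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=128, do_sample=False, streamer=streamer)
```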
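When the model is served with vLLM or SGLang, both frameworks expose an OpenAI-compatible HTTP API, so any OpenAI SDK client can query it. A sketch of such a call, assuming a server running locally on the default port 8000 and serving the model under the name zai-org/GLM-4.7-Flash (the base URL and served model name are assumptions that depend on your launch command):

```python
from openai import OpenAI

# Point the OpenAI client at the local OpenAI-compatible server.
# base_url and model name are assumptions; adjust to match your server setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="zai-org/GLM-4.7-Flash",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```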

First seen: 2026-01-19 16:31

Last seen: 2026-01-20 00:32