Exploring the Deployment of Multimodal Large Models in the Robotics Industry
Published: 2024-10-31 Source: http://www.quanyimoxing.com/
Several Chinese companies have recently achieved technical breakthroughs in "large models + robots".
Industry observers expect that, as the technology advances and application scenarios expand, demand for multimodal large models and robots will keep growing, opening up a broad market for companies. Cooperation with other industries, such as healthcare and manufacturing, should also bring new opportunities by enabling a wider range of applications and greater commercial value.
Technical breakthroughs in multimodal robots
As of the close on December 13, several robotics concept stocks, including Kinco (步科股份), EFORT (埃夫特), and Leaderdrive (绿的谐波), had risen by more than 4%. On the news front, Tesla released a video of its Optimus-Gen 2 (second-generation Optimus) humanoid robot. It carries actuators and sensors designed by Tesla, walks 30% faster, and shows improved balance and full-body control.
"Multimodal" AI refers to large models that can process many forms of content, including text, audio, images, video, and code. As multimodal large models iterate rapidly, major international technology companies have been paying close attention to their use in robotics and have explored core tasks such as robot planning, control, and navigation.
He Li, General Manager of Zhiyushan Investment (止于善投资), told Securities Daily: "Multimodal large models integrate vision, speech, and sensor data processing, which greatly enriches robots' cognition and decision-making. Applying this technology to robots is expected to bring major progress in complex interaction, natural language understanding, and environmental adaptation, and to unlock their potential as highly autonomous assistants or workers."
Some Chinese companies have already moved early in this field. On the evening of December 12, Orbbec (奥比中光) released its Large Model Robotic Arm 1.0. The product takes voice prompts as input and uses the comprehension and visual perception capabilities of several large models to generate spatial semantic information, so that the arm can understand and carry out actions. In a video released at the same time, the arm successfully carried out a series of spoken commands, including "Put the green block in the yellow box" and "Please restore the initial state".
Xiao Zhenzhong, co-founder and CTO of Orbbec, told Securities Daily: "Through engineering research, the company hopes to bring large-model robotic arms into real-world use. That includes improving the arm's ability to automatically route around complex obstacles while carrying out human instructions, and addressing the generalization problem of large models combined with robotic arms, so that deployment in general-purpose scenarios is ultimately achieved."
According to incomplete statistics, listed companies including ThunderSoft (中科创达) and Yijiahe (亿嘉和) have also recently disclosed progress on robot research and development based on multimodal large models.
Large-scale commercial application still needs time
China's robotics industry already has a solid industrial base. Multimodal robots, with smarter "brains" and far more agile "limbs", are becoming a new arena in which many players compete for a place in the industries of the future.
He Li believes that companies in the domestic market are actively investing in the development and production of key technologies, especially sensors, precision mechanical components, actuators, innovative materials, and lightweight structural parts, and that these areas are showing strong momentum.
The harmonic reducer is a core component of industrial robots. Leaderdrive has disclosed that it completed the development of harmonic reducer technology for industrial robots relatively early and has reached volume production. It was the first in this field to replace imported products, which greatly reduced procurement costs and lead times for domestic robot makers. According to the company, its new-generation Y series harmonic reducer doubles the stiffness of other existing products through innovations in the mathematical model and optimization of the bearing design and machining process.
Xiao Zhenzhong agreed, telling Securities Daily: "Large language models (LLMs) combined with visual sensing will bring all kinds of robots and robotic arms into more scenarios, such as industrial manufacturing, flexible logistics, and commercial services. For now, there is still a gap in combining large models with real-world data, and running large models consumes a lot of computing power. Applications will need three to five years to be rolled out gradually, and mature business operations may take even longer."
"But the company firmly believes this is the right direction, with broad prospects," Xiao said. Orbbec is building a robotics and AI vision platform. By developing multimodal vision models and intelligent algorithms and combining them with robot vision sensors, it aims to form a complete product solution for autonomous mobile positioning, navigation, and obstacle avoidance, in preparation for the era of intelligent robots.