[{"content":"ChatGPT has introduced an AI financial management feature for its Pro users, allowing direct connections to financial accounts to generate financial dashboards and provide spending advice based on real data. However, contrary to expectations, overseas users have largely rejected this new feature.\nThe Challenge of Trust in AI Finance The biggest hurdle for AI finance has never been technology, but user trust, which is vividly illustrated in this case. Why is there no skepticism when AI is embedded in native financial apps, but a universal model connecting to finance triggers collective resistance? Let\u0026rsquo;s analyze this from a different angle.\nOpenAI\u0026rsquo;s announcement screenshot showcasing the financial management feature for ChatGPT Pro users.\nAI Moves from Content Generation to Financial Management OpenAI\u0026rsquo;s launch of the AI financial management feature has been in the works for some time. In April of this year, it acquired the team from personal finance startup Hiro, whose founder, Ethan Bloch, and ten core members joined OpenAI. Hiro previously helped users manage over 6.8 billion yuan in assets as a \u0026ldquo;personal CFO\u0026rdquo; using AI.\nThis is not a spontaneous attempt by OpenAI but a collective choice in the AI industry. In the same week, ByteDance\u0026rsquo;s Doubao app launched a QR code payment feature, leveraging Douyin Pay to create a closed-loop from AI dialogue to shopping to payment, directly extending AI from a conversational tool to consumption scenarios.\nDomestic financial giants have been ahead in this regard. 
Alipay launched its AI financial assistant \u0026ldquo;Zhixiaobao\u0026rdquo; back in 2024, followed by a complete upgrade of Ant Group\u0026rsquo;s AI model-driven \u0026ldquo;Mxiaocai,\u0026rdquo; which provides users with market analysis, portfolio insights, and educational support, covering over 1 billion monthly active users.\nScreenshot of ChatGPT\u0026rsquo;s financial data dashboard showing expenditure details and investment returns.\nMarket signals are also clear. According to data cited by Investor Network, the global AI-driven smart investment advisory market is expected to grow from $6.6 billion in 2025 to $9.77 billion in 2026, with a compound annual growth rate of 47.9%.\nThis growth hides a key change: AI has officially moved from \u0026ldquo;content generation\u0026rdquo; to \u0026ldquo;decision support\u0026rdquo; in deep scenarios, with financial decision-making being the most sensitive and demanding type of decision.\nYounger users are actually ready for this. Research shows that 41% of Gen Z investors are willing to trust AI to manage their portfolios, and 67% of Gen Z traders have activated AI tools in their daily trading. Demand is rising, so why did ChatGPT\u0026rsquo;s new feature fail?\nUser Rejection: Not AI Finance, But Data Leakage OpenAI has implemented rules, clarifying that ChatGPT only has \u0026ldquo;read-only access\u0026rdquo; to financial accounts and cannot move user funds. Users can disconnect at any time, and synchronized data will be deleted within 30 days. The privacy chat mode does not use any financial data.\nHowever, users remain unconvinced, with comments on X overwhelmingly negative: \u0026ldquo;It\u0026rsquo;s crazy to hand over financial privacy to AI,\u0026rdquo; and \u0026ldquo;The word privacy is disappearing.\u0026rdquo; Some users worry that AI might secretly subscribe them to unnecessary services. 
Digital Trends directly stated, \u0026ldquo;I won\u0026rsquo;t connect.\u0026rdquo;\nScreenshot of ChatGPT\u0026rsquo;s subscription fee inquiry interface showing monthly subscription items and total costs.\nWhy do users have such different attitudes toward AI finance on different platforms? Domestic users have never raised privacy concerns when using Alipay\u0026rsquo;s Mxiaocai, while ChatGPT\u0026rsquo;s launch sparked outrage. The core difference lies not in technology but in data boundaries.\nWhat users truly worry about is not the right to delete data, but how many third parties have access to their data during storage and analysis. General AI platforms do not hold financial licenses and are not subject to financial-grade data regulations, so routing core financial data through Plaid to a general model amounts to placing users\u0026rsquo; most private information on the servers of a non-financial institution.\nScreenshot of Alipay\u0026rsquo;s AI financial assistant interface showing the function entry and data.\nThe logic of native financial apps is completely different. Users\u0026rsquo; payment, investment, and debt data already reside within the platform, which holds compliance licenses and has been subject to financial-grade security regulations since day one. The data boundary has never changed.\nAnt Group\u0026rsquo;s Mxiaocai is a prime example: AI capabilities are simply added on top of the existing security framework, and data has never left Alipay\u0026rsquo;s secure boundary. 
Users do not need to share any new private information, naturally avoiding a trust crisis.\nScreenshot of Alipay\u0026rsquo;s Mxiaocai interface showing Mxiaocai\u0026rsquo;s financial Q\u0026amp;A and analysis features.\nInterestingly, ChatGPT\u0026rsquo;s disclaimer states, \u0026ldquo;Your complete financial situation is within ChatGPT,\u0026rdquo; but the fine print at the end clarifies: \u0026ldquo;ChatGPT is not a substitute for professional financial advice.\u0026rdquo;\nScreenshot of ChatGPT\u0026rsquo;s disclaimer showing that ChatGPT does not provide professional financial advice.\nOpenAI updated its usage policy by the end of 2025, explicitly prohibiting ChatGPT from providing financial, legal, or medical advice, having previously received subpoenas due to user losses caused by AI advice. The simultaneous launch of financial features while distancing itself from responsibility highlights an inherent issue.\nThe True Breakthrough for AI Finance Lies Beyond General Models Many interpret this controversy as a sign of user distrust in AI finance, but this is entirely incorrect. Users are not rejecting AI managing their finances; they are rejecting the general model extracting their core financial data from outside.\nThis incident actually confirms an overlooked trend: the core barrier for financial AI has never been the general capabilities of large models, but the legally held data and compliant security frameworks. This is precisely the greatest advantage of native financial institutions and the barrier that general AI platforms find hard to cross.\nFrom a technical perspective, general models can indeed provide more humanized analysis than traditional tools and can help users sort through chaotic cash flows to identify unnecessary subscription expenses. 
For instance, ChatGPT\u0026rsquo;s showcased feature can automatically organize all user subscriptions, calculate monthly total expenses, and help users identify idle subscriptions with one click.\nScreenshot of overseas users\u0026rsquo; comments expressing skepticism about ChatGPT\u0026rsquo;s financial feature.\nThis feature is inherently valuable to users, but the problem lies in the implementation path: must data be exported to a general model for analysis? Native financial apps can integrate the same AI capabilities within their local data framework, allowing users to enjoy AI benefits without incurring additional privacy risks.\nThe future evolution of AI finance is already clear from this controversy: it is not about general AI forcibly invading the financial sector, but rather native financial institutions upgrading their service capabilities with AI. This is akin to adding a new weapon to an already fortified castle, allowing users to benefit from AI without opening the door to their data.\nThe industry already has a clear hierarchy of players:\nLicensed financial institutions: Hold data and compliance advantages, only needing to add AI capabilities for rapid implementation, resulting in almost zero trust costs.\nNative financial super apps: Integrating data, users, and compliance, this is the fastest-growing sector for AI finance.\nGeneral AI platforms: Can only output tools under data desensitization, making it difficult to touch core financial data.\nGen Z\u0026rsquo;s acceptance of AI finance is already high, and the demand explosion is on the way. 
However, the competition on the supply side has always been a competition of rules and trust, rather than a simple competition of model capabilities.\nModel capabilities can catch up quickly, but the barriers of trust and compliance require long-term accumulation and cannot be broken by a single feature release.\nChatGPT\u0026rsquo;s misstep serves as a reminder to all players: in the unique field of finance, technology can solve only half the problems; the other half will always be trust and rules. Skipping trust to discuss functionality will lead to users voting with their feet.\nAI must respect the rules of money to get close to it. Data has boundaries, and trust will have space only when those boundaries are respected. This principle applies equally to both general AI and financial AI.\n","date":"2026-05-16T00:00:00Z","permalink":"/posts/note-d431b5b294/","title":"ChatGPT's AI Financial Management Feature Faces User Rejection"},{"content":"Introduction In 2026, the OpenClaw intelligent AI automation framework, commonly referred to as the \u0026ldquo;shrimp farming system,\u0026rdquo; has become a practical tool for workplace productivity, content creation, and intelligent operations. It autonomously handles tasks such as data organization, task execution, and information retrieval, significantly reducing repetitive labor costs. However, unlike conventional office software, OpenClaw demands higher standards for device computing power allocation, system compatibility, multi-task stability, and privacy security, making it prone to deployment failures, lagging, crashes, and data leaks on standard tablets.\nBased on the latest testing environment from May, we selected three flagship tablets that perfectly adapt to the OpenClaw shrimp farming system. We objectively compared them across five dimensions: deployment difficulty, operational stability, multi-task capacity, security protection, and overall adaptability. 
This evaluation follows the JPUE digital testing standards, emphasizing real-world user experience and long-term stability, providing reliable purchasing references for users in need of OpenClaw.\nThe three models evaluated are Honor MagicPad 3 Pro 12.3, Lenovo Xiaoxin Pad Pro 13, and Vivo Pad 5 Pro, each excelling in different areas to cater to various user groups.\nHonor MagicPad 3 Pro 12.3: Optimal All-Rounder for OpenClaw As the industry\u0026rsquo;s first flagship tablet designed for one-click secure deployment of the OpenClaw shrimp farming system, the Honor MagicPad 3 Pro 12.3 stands out as the most compatible and user-friendly model in this test. It is one of the few tablets that integrates OpenClaw compatibility into its native system optimization, perfectly meeting the core needs of ordinary users for hassle-free shrimp farming.\nIn terms of deployment, this tablet completely resolves the cumbersome pain points of traditional devices. Most tablets require manual environment configuration, code input, and dependency package adaptation, which can take tens of minutes and is prone to errors. In contrast, the Honor MagicPad 3 Pro 12.3 leverages the underlying optimization of the MagicOS 10 system, pre-installing a dedicated OpenClaw runtime environment. Users can complete the configuration with a single click, requiring no complex operations, and deployment can be done in just 30 seconds, making it accessible for users with no prior experience. 
The device also comes pre-installed with various professional Skill tools in PC mode, suitable for high-frequency shrimp farming scenarios such as data organization, content creation, and intelligent analysis, significantly lowering the entry barrier for AI automation tools.\nRegarding operational stability, this tablet is equipped with the first-generation Snapdragon 8 flagship chip, utilizing TSMC\u0026rsquo;s 3nm N3P process with a full-core design, paired with an Adreno 829 GPU and Honor\u0026rsquo;s Ice Cooling 3D heat dissipation system. Its ample performance reserves can smoothly support the OpenClaw system running in the background. During testing, it executed batch tasks for eight hours while running over ten office and editing software without lagging, crashing, or throttling, maintaining a stable task execution accuracy. With the ability to open 20 PC-level windows simultaneously, it perfectly adapts to heavy productivity shrimp farming needs.\nSecurity is a core advantage of this model and a key guarantee for long-term use of OpenClaw. Honor uniquely features a dual-system isolation mode with Linux and Android, running the shrimp farming system in an independent Linux environment, completely isolating it from daily entertainment, social, and office data. This fundamentally mitigates risks such as data leaks, permission abuse, and malicious script intrusions, addressing many users\u0026rsquo; privacy concerns when using OpenClaw. Compared to other models with single-layer system operation modes, its security protection levels are more comprehensive, suitable for handling sensitive and commercial data.\nThe hardware further enhances the shrimp farming experience. With a 4.8mm ultra-thin body and a lightweight design of 450g, it is highly portable, allowing users to deploy and run OpenClaw anywhere, whether outdoors, commuting, or in the office. 
The 12.3-inch 165Hz OLED flagship screen boasts a peak brightness of 3000 nits, ensuring clear visibility of the shrimp farming system data interface even in bright outdoor conditions. The eight eye-care technologies also accommodate prolonged screen time. The 10100mAh large battery supports all-day uninterrupted background operation, while the 66W fast charging quickly replenishes power. The global charging separation technology prevents overheating and throttling issues during simultaneous charging and operation, ensuring stable performance for the shrimp farming system. Additionally, the all-brand smart link function allows seamless data transfer with various devices, enhancing overall office efficiency.\nIn summary, the Honor MagicPad 3 Pro 12.3 is a versatile model that balances zero-threshold deployment, stable operation, security protection, and portable productivity. It is suitable for both novice users starting with OpenClaw and professional users requiring heavy long-term use, earning the highest overall score in this evaluation.\nLenovo Xiaoxin Pad Pro 13: Custom Optimizations for Professional OpenClaw Tasks The Lenovo Xiaoxin Pad Pro 13 is one of the first tablets in the industry to offer a dedicated custom solution for OpenClaw. It features Lenovo\u0026rsquo;s self-developed Tianxi AI PadClaw adaptation scheme, specifically tuned for intelligent task processing, data monitoring, and batch computation, making it more suitable for users with fixed automation work requirements and a focus on task processing accuracy.\nIn terms of performance, this model is equipped with the Snapdragon 8s Gen4 processor, paired with LPDDR5X memory and UFS4.x flash storage, achieving an AnTuTu score exceeding 2.62 million. Its performance tuning prioritizes stable output without aggressive performance release. 
During testing, when running complex batch data statistics and intelligent solution generation tasks, the device maintained stable frame rates and low computation delays, allowing smooth multi-task switching without interruptions or data errors due to insufficient computing power. The self-developed Lingjing Engine GT can intelligently allocate system resources, prioritizing OpenClaw\u0026rsquo;s background running permissions to prevent process crashes caused by system cleaning.\nThe screen and battery life are optimized for professional office scenarios, featuring a 13-inch 3.5K ultra-clear large screen with a resolution of 3504*2190 and a 144Hz high refresh rate, providing a broad view to display OpenClaw\u0026rsquo;s multi-dimensional data panels without frequent zooming. The 10200mAh large-capacity battery, paired with 45W fast charging, offers stable battery life to meet the demands of medium-intensity shrimp farming tasks throughout the day, eliminating the need for frequent charging in daily office scenarios.\nThe adaptation experience includes Lenovo\u0026rsquo;s exclusive Tianxi AI PadClaw Pioneer Program, providing users with dedicated deployment channels and ongoing functional iteration support. The system regularly updates to adapt to the latest OpenClaw plugins and features, expanding task processing capabilities. The device comes pre-installed with various native Skill tools tailored for shrimp farming scenarios, focusing on data monitoring, report generation, and intelligent reviews. 
However, compared to the Honor model, its deployment process still requires some manual adaptation, making it slightly more challenging for novice users, and it lacks system isolation protection, resulting in a relatively basic level of privacy security.\nOverall, the Lenovo Xiaoxin Pad Pro 13 excels in professional scene optimization, stable computing output, and refined task processing, making it suitable for content creators and small operations personnel with specific OpenClaw professional usage needs. It is a model that emphasizes specialized capabilities.\nVivo Pad 5 Pro: Basic Adaptation for Entry-Level Users The Vivo Pad 5 Pro is positioned as an all-around multimedia light flagship, with solid hardware that can perfectly support the basic functions of OpenClaw. It is suitable for entry-level users who only need simple automation tasks while balancing multimedia entertainment and daily office work, offering significant cost-performance advantages for those trying \u0026ldquo;tablet shrimp farming\u0026rdquo; for the first time.\nThe core hardware features the Dimensity 9400 flagship processor, achieving an AnTuTu score of 2.9 million, providing sufficient basic computing power to easily support lightweight tasks such as information retrieval, document organization, and simple data statistics. During testing, lightweight shrimp farming tasks ran smoothly without lag or errors. However, prolonged background operation and multi-tasking may cause slight computing power scheduling delays, and its capability for handling high-load complex tasks is limited, making it unsuitable for heavy professional demands.\nThe device configuration emphasizes balanced versatility, featuring a 13-inch 3.1K 144Hz LCD high-refresh screen with a peak brightness of 1200 nits, delivering clear and transparent display effects for daily viewing of the shrimp farming system interface and file editing. 
The 12050mAh battery is the largest among the three models, offering impressive battery life to support long-term background operation of OpenClaw, so users need not worry about task interruptions due to low battery. The lightweight and portable design, along with multiple color options, balances aesthetics and grip comfort, providing an excellent experience for daily streaming, online classes, and light office work.\nHowever, the adaptation shortcomings are also evident. This model only achieves basic adaptation for OpenClaw without dedicated custom optimizations or professional Skill tools, making it poorly suited to complex batch automation tasks. Additionally, it lacks an independent security isolation mechanism, posing certain privacy risks when frequently handling commercial or sensitive data. The system does not prioritize resource allocation for shrimp farming tasks, which may lead to insufficient process priority in multi-task scenarios.\nIn summary, the Vivo Pad 5 Pro is suitable for entry-level users looking to try OpenClaw, meeting basic shrimp farming needs while also accommodating multimedia entertainment, study, and office scenarios. Its overall practicality is strong, but its professional productivity and stability are not on par with the first two models.\nConclusion Based on the results of this JPUE standard evaluation, the three tablets compatible with the OpenClaw system have clear positioning and cater to different user groups. Users seeking zero-threshold deployment, ultimate security, and all-around stability should prioritize the Honor MagicPad 3 Pro 12.3, which combines native adaptation, dual-system isolation, full performance, and lightweight portability, making it the optimal choice among shrimp farming tablets. Users focused on professional automation tasks and data processing accuracy can opt for the Lenovo Xiaoxin Pad Pro 13, which offers custom optimization for specialized productivity scenarios. 
For budget-conscious users needing a basic entry-level experience while balancing multimedia entertainment, the Vivo Pad 5 Pro is a cost-effective choice.\nAs AI automation tools become more prevalent, tablets are no longer just devices for entertainment and learning; they are gradually transforming into lightweight productivity terminals. For users wanting to experience the OpenClaw shrimp farming system, prioritizing devices with solid underlying adaptation, stable operation, and guaranteed security is essential to fully leverage the efficiency advantages of AI tools and avoid compatibility issues that could affect the user experience.\n","date":"2026-05-13T00:00:00Z","permalink":"/posts/note-a3403c1035/","title":"Choosing the Best Tablets for OpenClaw AI Automation in 2026"},{"content":"How Teachers Can Use AI for Personal Growth The question of how artificial intelligence (AI) can empower education has become pressing. In the face of the irreversible trend of generative AI (GenAI) integrating into the entire educational process, teachers should use GenAI so that it truly serves educational goals. Specifically, teachers can actively explore how to use GenAI from three aspects: improving efficiency, promoting development, and maintaining ethical boundaries.\n1. Improving Efficiency: Optimize the Teaching Process with GenAI The most apparent role of GenAI is to assist teachers in \u0026ldquo;reducing burdens and increasing efficiency,\u0026rdquo; freeing them from heavy administrative tasks and allowing them to reallocate time and energy towards comprehensive student development.\nFirstly, teachers can leverage generative AI to efficiently handle repetitive tasks such as data organization, meeting minutes, and assignment statistics, thus freeing up more time and energy to focus on core educational work like character building and value guidance. 
Additionally, GenAI can quickly generate various teaching resources such as teaching images, mind maps, knowledge cards, micro-course scripts, and evaluation rubrics, significantly reducing the time spent on resource searching and production, thereby enhancing lesson preparation efficiency. This is currently the most common way frontline teachers use generative AI.\nSecondly, GenAI can assist in generating classroom exercises, alleviating the burden of repetitive design tasks. Teachers often struggle with designing differentiated and variant practice questions. By inputting specific knowledge points, question types, and difficulty levels into the GenAI platform, they can choose to generate new questions or retrieve real exam questions. It is important to note that teachers need to professionally evaluate and review the generated or retrieved questions, retaining content that aligns with educational goals and curriculum standards. Furthermore, they should dynamically adjust and personalize the questions based on students\u0026rsquo; actual levels, and if necessary, adapt or reorganize them for classroom practice or homework.\nFinally, GenAI empowers interdisciplinary teaching, which is another area where it can provide deep support. Unlike individual teachers with a single subject background, GenAI has strong knowledge integration capabilities, offering teachers interdisciplinary knowledge links, case materials, and teaching design ideas. Specifically, teachers can establish a dedicated \u0026ldquo;interdisciplinary teaching AI agent\u0026rdquo;. First, they should build a subject knowledge base by categorizing and organizing curriculum standards, textbooks for various educational stages, personal lesson plans, and teaching reflections. 
Based on this foundation, they can generate the AI agent using the GenAI platform and continuously optimize its dialogue logic, enabling it to generate learning tasks and activity designs with an interdisciplinary perspective based on the subject content. Once the AI agent is established, teachers can generate interdisciplinary inquiry tasks according to teaching objectives and content, implementing them in the classroom after assessing and adjusting based on student learning conditions. More importantly, teachers can continuously feed the learning outcomes generated in class and their own teaching reflections back into the AI agent\u0026rsquo;s knowledge base, forming a feedback loop of \u0026ldquo;use—accumulate—optimize—reuse,\u0026rdquo; achieving mutual growth in teaching.\n2. Promoting Development: Empower Teacher Professional Growth with GenAI The leap in teacher professional growth often occurs through deep reflection and discussion about the classroom. GenAI can not only help teachers manage daily teaching tasks but also serve as a \u0026ldquo;cognitive partner\u0026rdquo; for their post-class professional development, assisting teachers in transitioning from experience reliance to evidence-driven practice.\nGenAI supports classroom diagnostic analysis, moving from intuition to evidence-based insights. Unlike traditional classroom evaluations that primarily rely on experience, teachers can upload classroom recordings to the GenAI platform to obtain analysis reports covering various dimensions such as teaching structure, behaviors, strategies, and effectiveness, which they can then interpret professionally in conjunction with their teaching intentions. It is essential to emphasize that GenAI provides the analysis \u0026ldquo;draft,\u0026rdquo; and the professional interpretation by teachers is what imparts educational significance to the data.\nGenAI supports human-machine collaborative research, shifting from passive reception to active inquiry. 
Having a classroom analysis report is not enough; teachers need to engage in deep discussions around specific teaching segments. For instance, during classroom research, teachers can use GenAI as a cognitive partner, asking precise and in-depth questions to analyze classroom phenomena, gather practical evidence, and achieve teaching improvements. In simple terms, teachers can adopt a \u0026ldquo;thick and thin questioning\u0026rdquo; strategy to engage in multiple rounds of dialogue with GenAI. \u0026ldquo;Thinning the lesson\u0026rdquo; focuses on a specific teaching segment, initiating a chain of questions based on the logic of \u0026ldquo;what happened → what was good/bad → what evidence exists → why → what teaching patterns were discovered\u0026rdquo;. \u0026ldquo;Thickening the lesson\u0026rdquo; further questions from the essence: \u0026ldquo;Under what conditions might this teaching pattern fail? How can it be improved? What is the basis? What will happen after the improvement?\u0026rdquo; This expands teachers\u0026rsquo; understanding of the classroom in both breadth and depth.\nThroughout this process, teachers remain the primary questioners, while GenAI is responsible for providing data analysis, evidence retrieval, and multi-role perspective support, collaboratively achieving a deeper understanding of teaching from phenomena to essence.\nGenAI aids teachers in accumulating wisdom, transitioning from fragmented insights to continuous growth. Insights emerging from each post-class research session can easily fade over time if not organized. GenAI can help teachers structure and extract core viewpoints from their research into practical knowledge, generating mind map-style research notes that include teaching design logic, effective teaching method evidence, and improvement suggestions. Teachers can then supplement and refine these notes to form a knowledge accumulation that aligns with their cognitive style. 
Additionally, with multiple research sessions, GenAI can assist teachers in constructing a dynamically updated \u0026ldquo;personal professional growth knowledge base,\u0026rdquo; allowing teaching practices to be visually traced. Teachers can transcend the limitations of single teaching instances and examine their teaching over a longer time frame, accurately identifying teaching inertia that needs to be broken and effective practices worth maintaining.\n3. Maintaining Ethical Boundaries: Navigating Three Key Boundaries to Avoid Technological Alienation The premise of effectively utilizing GenAI is to maintain ethical boundaries. No technology can replace the critical role of teachers in emotional guidance, value shaping, and thought stimulation. Therefore, teachers need to uphold ethical boundaries throughout the entire process of applying GenAI.\nFirstly, maintain the technological boundary. Teachers should reasonably define the division of labor between humans and machines, allowing GenAI to handle auxiliary tasks such as information retrieval and resource organization, while retaining creative teaching activities like lesson design and classroom interaction organization in their hands. At the same time, teachers should guide students to form a healthy understanding of human-machine collaboration. For example, in the classroom, they can advocate a \u0026ldquo;think before use\u0026rdquo; rule for technology application, encouraging students to think independently first before using GenAI to expand ideas or optimize expressions.\nSecondly, maintain the value boundary. In critical educational moments involving value guidance and emotional care, teachers should retain their leading role. 
For instance, when students express family troubles or growth dilemmas in their writing, GenAI may recognize the emotional tone of the text, but whether a follow-up conversation is needed, whether to contact parents, and how to provide appropriate care and encouragement in feedback are all value judgments that only teachers can make. The warmth of education always comes from humans, not machines.\nThirdly, maintain the ethical boundary. On one hand, teachers need to maintain a cautious attitude towards the content generated by GenAI, rigorously reviewing and correcting it to ensure it meets curriculum standards and subject-specific norms; on the other hand, teachers must pay attention to protecting the privacy of personal and student data, avoiding uploading unconsented and non-anonymized materials directly to non-locally deployed GenAI platforms for processing.\nIn the era of intelligence, the rapid advancement of technology compels teachers to continuously learn, reflect, and innovate. Teachers need to actively embrace new technologies and innovate in education and teaching. When teachers become proficient and enthusiastic about using generative AI, they can help find a new balance between efficiency and warmth in education.\n","date":"2026-05-12T00:00:00Z","permalink":"/posts/note-788b7b6716/","title":"How Teachers Can Use AI for Personal Growth"},{"content":"The Impact of Generative AI on Artistic Creation As artificial intelligence deeply integrates into various aspects of society and industry, it sparks a new wave of transformation. The involvement of generative AI in artistic creation brings vitality but also raises a series of questions: Can it replace artists? Will it shake the foundational values of art? Or is it rewriting the entire logic of subjectivity established for art? 
It is essential to confront these issues within the contexts of art history, technology history, and the construction of subjectivity, rather than simplifying them to mere efficiency gains or the optimistic notion that \u0026ldquo;everyone is an artist.\u0026rdquo;\nHuman-Machine Collaboration and Originality The first challenge posed by human-machine collaboration is the originality of art. With the rapid development of large language models and multimodal models, natural language interaction has become a fundamental method for collaborative creation. In this process, the production of text, music, images, and videos is significantly affected, though the impact is not uniform. In fact, generative AI\u0026rsquo;s role varies across different art forms and levels of involvement. Art forms that utilize digital media are undergoing systematic reshaping. For instance, in the field of video creation, independent creators can leverage generative AI to directly generate scripts, storyboards, visuals, music, and post-production styles through prompts, significantly compressing or even eliminating the collaborative and physical operational stages traditionally required.\nIn the visual arts, if we still understand it as a form of artistic expression associated with a specific medium and manual creation, the involvement of generative AI will alter the creative process. In traditional art creation, artists use tools like brushes and chisels, relying on their mastery of techniques to transform creative ideas into tangible works. The intervention of generative AI primarily affects the early stages of visual imagination and concept generation, rather than directly eliminating drawing, sculpting, and production. Creators still need to possess skills in materials, techniques, and form control to select, edit, and deepen the image resources provided by machines, thereby transforming them into artworks. 
This active participation by creators highlights their intellectual intent, which reflects the originality of the work. If creators reduce or forgo practical operations, such creations may not be considered part of visual arts.\nRestructuring the Creative Process It is evident that the impact of generative AI on visual art is not merely about replacing artists; rather, it reorganizes the significance of various stages within the creative process. Certain preliminary cognitive activities that were once viewed as crucial are now partially transferred to algorithmic systems, while techniques that previously tested execution skills, selection, and reproduction capabilities are regaining importance in many specific creative practices. This indicates that understanding the relationship between AI and visual art should stem from this structural change, rather than superficial judgments about whether AI replaces human artists.\nRedefining Subjectivity Redefining the position of the subject is a valuable reference brought by generative AI. Similar to the emergence of photography, generative AI forces creators to confront a new mechanism of visual generation and compels them to reconsider which abilities can be taken over by technology and which need to be redefined and maintained by the creator. Generative AI touches upon composition, combination, style simulation, and even artistic concepts, which are closer to human cognitive activities. These cognitive activities, once seen as manifestations of creative subjectivity, are now partially shared or replaced by technology. Generative AI is transitioning from a mere auxiliary tool to a quasi-subject participating in cultural production, which is particularly sensitive in the current context of artistic creation. When it becomes difficult to determine how much of a creative idea, composition, or concept originates from the author, the stability of originality as the core of artistic value begins to waver. 
The question then shifts from whether generative AI can create art to how art should be defined in light of significant generative AI involvement.\nDemocratizing Artistic Creation Within the discourse of new popular literature and art, the involvement of generative AI in visual art creation also serves as a breakthrough for dismantling professional monopolies, redistributing cultural power, and integrating creative structures. Utilizing generative AI for creation allows bypassing certain traditional training paths while also presenting new capability requirements for creators, such as prompt organization, model understanding, image selection, style judgment, and cross-media integration. This indicates that generative AI does not eliminate professionalism; rather, it reshapes the content and form of professionalism.\nThe involvement of generative AI directly impacts the monopolistic structures in visual art creation: first, it weakens the traditional technical monopoly over creative entry, allowing those without formal training to enter visual production; second, as the boundaries of originality expand, visual art creation is no longer an internal affair of a few professional groups but becomes a cultural practice that broader societal subjects can engage in. In this process, the relationships between creation, dissemination, and evaluation are also changing: the public is not only viewers and consumers but also creators, disseminators, and evaluators. However, the control over platforms, algorithms, and models remains in the hands of a few technical entities, who reshape creators\u0026rsquo; tastes and choices through model preferences and data training, causing new popular practices to fall again under the discipline of technological power. While creative rights have partially decentralized, the decentralization of evaluative rights remains unresolved. 
Only when creative rights, dissemination rights, and evaluative rights are all restructured can the new wave of popular visual art brought by generative AI drive a more structurally significant cultural shift.\nThe Essence of Generative AI Essentially, generative AI is a highly complex stylized reorganization and interpretation based on existing data. Its underlying logic is \u0026ldquo;learning\u0026rdquo; and \u0026ldquo;optimization,\u0026rdquo; rather than \u0026ldquo;subversion\u0026rdquo; and \u0026ldquo;revolution.\u0026rdquo; Currently, generative AI lacks the most fundamental source of creativity found in artists—the embodied emotional experiences of individuals. Artistic creation, especially great works, is deeply rooted in the unique life insights and profound spiritual realms of the artist. Therefore, in facing generative AI, it should be viewed as a co-creation tool that inspires creativity, expands imagination, and enriches expression, rather than a complete substitute for creation.\nIn conclusion, in the era of artificial intelligence, the nature of art is undergoing unprecedented renewal and reconstruction. The deep driving force behind this transformation is the dual impetus of technological revolution and cultural awareness, prompting us to engage in multifaceted reflections. Properly understanding the relationship between artificial intelligence and visual art, and clarifying the intrinsic value of art, will help achieve better human-machine co-creation and unlock new artistic possibilities.\n","date":"2026-05-10T00:00:00Z","permalink":"/posts/note-4ca5a563b2/","title":"The Impact of Generative AI on Artistic Creation"},{"content":"What is Vibe Coding? 
Have you ever had a moment where you thought, \u0026ldquo;If only there was a tool to help me automatically organize my weekly reports,\u0026rdquo; or \u0026ldquo;I want to create a simple accounting software, but I have no coding knowledge, where do I start?\u0026rdquo;\nThen you open a search engine, look up \u0026ldquo;learn programming from scratch,\u0026rdquo; and are overwhelmed by Python, JavaScript, data structures, and algorithms. You quietly close the webpage, thinking, \u0026ldquo;Forget it, this isn\u0026rsquo;t for me.\u0026rdquo;\nIf you’ve had this experience, this article is for you.\nVibe Coding is a new programming approach that doesn’t require any coding knowledge or memorizing programming keywords. You just need a vague idea and can tell an AI, \u0026ldquo;Help me create this,\u0026rdquo; and it will generate the code for you.\nSounds like science fiction? No, this is happening in 2025.\nThe core logic of Vibe Coding is simple: it shifts from \u0026ldquo;humans writing code\u0026rdquo; to \u0026ldquo;humans describing needs, AI writing code.\u0026rdquo; You are no longer a programmer but a product manager, the one who says, \u0026ldquo;This is what I want.\u0026rdquo; The AI translates your words into instructions that a computer can understand.\nIt’s like dining at a restaurant; you don’t need to know how the chef chops vegetables, seasons food, or controls the heat. You just say, \u0026ldquo;I want a spicy dish,\u0026rdquo; and wait for it to be served. Vibe Coding is that \u0026ldquo;ordering\u0026rdquo; process, with AI as the \u0026ldquo;chef.\u0026rdquo;\nHow Can Vibe Coding Help Our Lives and Work? Many people feel that programming is far from them, thinking it’s only for programmers. 
But if you think about it, how many repetitive, mechanical tasks in our daily lives could be automated?\nFor example:\nIf you work in sales, you might spend two hours each week exporting customer data from a system, manually copying and pasting it into Excel for analysis, and creating a PPT report. With Vibe Coding, you could simply tell the AI, \u0026ldquo;Help me create a tool that automatically reads customer data and generates charts and PPTs,\u0026rdquo; and it would handle it all without you needing to know any code.\nAnother example:\nIf you are a content creator, you might write articles, select images, format, and publish daily. You want to create a material management tool to organize frequently used images, text templates, and formatting styles for easy access. With Vibe Coding, you can describe your needs, and the AI will generate a web-based tool for you.\nOther scenarios include:\nCreating a countdown reminder tool to manage project deadlines. Developing a simple accounting app to track daily income and expenses. Building a lottery tool to liven up company events. Designing a reading notes organization tool to manage your insights. Automating weekly report generation to eliminate repetitive tasks. Previously, these tasks required hiring a programmer or spending months learning to code. Now, Vibe Coding makes all of this accessible.\nWhy Haven\u0026rsquo;t You Started Yet? I’ve asked many people this question, including myself.\nThe answer is surprisingly consistent: fear.\nThis fear comes from three aspects:\nRespect for the term \u0026ldquo;programming.\u0026rdquo; We often perceive programming as something only high-IQ individuals can do, a skill reserved for those who type rapidly on keyboards. Most of us have never even seen code, so how can we dare to say we can program? Fear of learning costs. Many have tried self-learning programming, bought books, watched videos, or enrolled in courses. The result? 
They couldn’t even set up the Python environment before being deterred by error messages. That sense of defeat is truly demoralizing. Self-doubt. We often feel we aren’t smart enough, patient enough, or logical enough. When we see others create websites or apps, we think, \u0026ldquo;They are amazing; I could never do that.\u0026rdquo; But I want to tell you a fact: the essence of programming is not writing code but solving problems.\nYou don’t need to become a programming expert; you just need to be someone who can articulate needs. Vibe Coding helps make \u0026ldquo;articulating needs\u0026rdquo; your core competency.\nVibe Coding is Simple: A Vague Idea + An AI is Enough You might not believe it, but the threshold for Vibe Coding is much lower than you think.\nYou only need two things:\nA vague idea. It doesn’t need to be specific or complete. Even a thought like, \u0026ldquo;I want to create a tool to help me organize my daily tasks,\u0026rdquo; is sufficient. A suitable AI tool. There are many AI tools available that support Vibe Coding. You simply describe your idea in plain language, and it will generate the code for you. For example, you might say, \u0026ldquo;Help me create a webpage with an input box that saves text locally and displays a history.\u0026rdquo; The AI will generate a complete HTML page for you to use in your browser. What Can AI Do Once You Have a Theme and Rough Idea? This is where Vibe Coding gets exciting. When you have a vague idea, the capabilities of AI exceed your imagination:\nBulk code generation. Writing a feature used to require hundreds of lines of code and a full day for a programmer. Now, you just need one sentence, and the AI can generate complete code in seconds. Not just snippets, but entire functional modules, pages, or even applications. Rapid iteration and improvement. You don’t need to have a perfect idea from the start. 
You can let the AI generate an initial version, then say, \u0026ldquo;Change the color to blue,\u0026rdquo; \u0026ldquo;Make the button larger,\u0026rdquo; or \u0026ldquo;Add a search function here.\u0026rdquo; The AI will immediately make the changes without you needing to know any code. Automatic running and debugging. Previously, debugging was the most dreaded part of coding. If the code didn’t run and errors occurred, you often had no idea what the problem was. Now, the AI can not only write code but also run it, find bugs, and fix them. You just need to tell it, \u0026ldquo;This feature isn’t responding,\u0026rdquo; and it will automatically troubleshoot and repair it. Complete delivery from scratch. The entire process from \u0026ldquo;I have an idea\u0026rdquo; to \u0026ldquo;I have a usable tool\u0026rdquo; might take anywhere from a few minutes to tens of minutes. This was unimaginable in the past. AiPy: A Domestic Tool That Has Achieved This When discussing Vibe Coding, we must mention a domestic tool: AiPy.\nAiPy is currently a very mature Vibe Coding platform in China. Its biggest feature is that you don’t need to know any code; you just need to describe your needs in plain language, and it can help you complete your tasks.\nHere are a few practical scenarios using AiPy:\nYou say, \u0026ldquo;Help me create a countdown tool that shows how many days, hours, and minutes are left until a certain date.\u0026rdquo; AiPy will generate a beautiful countdown page. You say, \u0026ldquo;Help me analyze the data in this Excel sheet and generate charts.\u0026rdquo; AiPy will read your Excel file and automatically create data visualizations. You say, \u0026ldquo;Help me write a tool to batch rename files.\u0026rdquo; AiPy will generate an executable program that you can run with a double-click. Moreover, AiPy supports real-time improvements. 
If you’re not satisfied with what you generated, just tell it, \u0026ldquo;Change this part,\u0026rdquo; and it will modify it for you without you needing to write a single line of code.\nMy Personal Experience: A Business Student\u0026rsquo;s Journey with Vibe Coding Having discussed all this, I want to share my personal experience.\nI am a business student who studied marketing in college and now works in operations. My understanding of code was limited to knowing that Python exists. I didn’t even know what a command line was.\nBut earlier this year, I encountered the concept of Vibe Coding and started using AiPy to create some small tools.\nFirst Attempt: Creating a Reading Notes Management Tool\nI enjoy reading, but my notes are scattered—some on paper, some in my phone’s memo, and some I forget entirely. I wanted a tool to manage my reading notes in one place.\nI opened AiPy and said, \u0026ldquo;Help me create a reading notes management tool that can add book titles, authors, note content, and view history and export.\u0026rdquo; In less than five minutes, AiPy generated a webpage. The interface was clean, and all functions worked. I was amazed. I did nothing but say a few sentences and got my own tool.\nSecond Attempt: Creating an Automated Weekly Report Tool\nI write weekly reports every week, with a fixed format and repetitive content, taking half an hour each time. I told AiPy, \u0026ldquo;Help me create a tool where I input this week’s work content, and it can automatically generate a formatted weekly report and export it as a Word document.\u0026rdquo;\nThis time it was even faster, completed in minutes. Now, my weekly report writing time has reduced from half an hour to five minutes.\nThird Attempt: Creating a Data Analysis Tool\nOnce, I needed to analyze a batch of user data, but I didn’t know advanced Excel functions or SQL. 
I simply uploaded the data file to AiPy and said, \u0026ldquo;Help me analyze this data, see which product sold best and which region performed best, and generate charts.\u0026rdquo;\nAiPy read the data and automatically generated an analysis report with visual charts. I took screenshots for my PPT, and during the presentation, my boss praised me for the \u0026ldquo;great data presentation.\u0026rdquo;\nMy True Feelings:\nVibe Coding doesn’t make you a programmer; it enables you to become someone who can \u0026ldquo;use technology to solve problems.\u0026rdquo; You don’t need to know code; you just need to understand your needs. The AI takes care of turning those needs into reality, and you focus on articulating better requirements.\nThis ability will become increasingly important in the future workplace. As technology evolves, tools are becoming smarter and easier to use. Tasks that once required professional skills can now be accomplished by ordinary people. If you worry about token limits, you can use the invitation code c8W3 to receive two million tokens.\nNow is the Best Time to Start If you ask me who Vibe Coding is suitable for, my answer is: everyone who has ideas and needs but is blocked by technical barriers.\nYou want to create your own tool but don’t know programming. You want to automate repetitive tasks but don’t know where to start. You want to improve work efficiency but don’t want to spend months learning a new skill. You want to try creating something interesting but always feel, \u0026ldquo;I can’t do it.\u0026rdquo; If you identify with any of these points, then Vibe Coding is for you.\nDon’t wait until you feel \u0026ldquo;ready\u0026rdquo; to start. The greatest charm of Vibe Coding is that you don’t need to prepare; you just need an idea. Open AiPy, express your thought, and watch it help you realize it. 
This process itself is a brand new experience.\nTo summarize in one sentence:\nIn the era of Vibe Coding, you are not writing code; you are creating. And creation never requires a qualification certificate.\nNow is the best time to start.\nThree Tips for Beginners If you’re eager to try after reading this article, here are three tips to help you avoid pitfalls:\nStart with the smallest requirement. Don’t aim to create something as complex as a \u0026ldquo;chat app like WeChat\u0026rdquo; right away; that’s too difficult. Start with something simple like \u0026ldquo;Help me create a countdown tool\u0026rdquo; or \u0026ldquo;Help me create a to-do list.\u0026rdquo; The smaller and more specific the need, the easier it is for the AI to help you achieve it, and you’ll gain a sense of accomplishment. That sense of achievement is the best motivation to keep going. Don’t be afraid to say the wrong thing. Many people worry that their descriptions won’t be accurate or professional when they first use AI to write code. But you don’t need to worry at all. Just use plain language, as if you’re chatting with a friend. If the AI doesn’t understand you, just rephrase and try again. This process itself is a learning experience in communicating with AI. Treat Vibe Coding as a new skill to develop. You don’t need to become a programmer, but you can become someone who \u0026ldquo;solves problems with AI.\u0026rdquo; This ability may become as important in the next five years as knowing how to use Excel is today. 
Spend ten minutes each day creating a small tool with AI, and after a month, you’ll find your way of thinking has changed—you’ll no longer just passively accept existing tools but actively think, \u0026ldquo;Can I create one myself?\u0026rdquo; In Conclusion Vibe Coding doesn’t aim to take away jobs from programmers; it gives you a new ability—the ability to turn ideas into reality.\nIn this rapidly advancing AI era, the scariest thing isn’t being replaced by AI, but watching others use AI to create value while you remain a bystander.\nSo don’t wait any longer. Open AiPy, express your first thought, even if it’s small, childish, or immature. Every great tool starts with a vague idea.\nYou don’t need to become a programmer; you just need to be someone who dares to start.\n","date":"2026-05-06T00:00:00Z","permalink":"/posts/note-b10323e947/","title":"Introduction to Vibe Coding: Create Software Without Knowing Code"},{"content":"AI Job Replacement: Defining Human Boundaries in the Age of Artificial Intelligence AI technology is rapidly advancing, leading to scenarios where AI begins to take over human jobs. For Mr. Zhou, a 35-year-old former AI model quality inspection supervisor at a fintech company, this shift felt like a nightmare. After being demoted and having his salary cut from 25,000 to 15,000 yuan, he was ultimately terminated when negotiations failed. Following a labor arbitration and court trials, Mr. Zhou\u0026rsquo;s claims were upheld. The Hangzhou Intermediate People\u0026rsquo;s Court ruled that the company’s termination of his contract due to AI cost advantages did not constitute a significant change in objective circumstances that would justify the termination of the labor contract. The company was ordered to pay him over 260,000 yuan in compensation.\nWhat types of jobs are being replaced by AI, and why does Mr. Zhou\u0026rsquo;s experience resonate with many? 
Clearly, in the tide of the AI era, the protection of workers\u0026rsquo; rights faces new challenges.\nMr. Zhou, who previously verified and ensured the accuracy of AI-generated responses, was informed that his role could be replaced by AI due to technological upgrades. His lawyer, Jiang Xiaotong, argued that this position required knowledge of AI, finance, and project management, and that the company\u0026rsquo;s justification for layoffs was vague. Just how far has AI impacted jobs, and should this position be entirely replaced?\nThe court determined that the company’s actions constituted illegal termination of the labor contract. Judge Shi Guoqiang pointed out that simply citing cost reduction from AI implementation does not meet the legal standards for significant changes in objective circumstances that would render a labor contract unfulfillable. Current AI technology has not yet reached a level where it can substantively replace human jobs.\nRe-entering the competitive internet industry is no easy task. Mr. Zhou revealed that since his contract was terminated in January last year, he has struggled to find suitable employment.\nWhat rights of workers were violated in this incident? Will cases of AI replacing human jobs become more common? Lawyer Wu Hai analyzed that the company was shifting the burden of its operational costs and market risks onto workers, which does not comply with the legal conditions for contract termination. This violates both the law and the principle of fair employment, lacking legality and reasonableness. The court\u0026rsquo;s ruling has clearly defined the legal boundaries between technological innovation and worker rights protection, establishing an important benchmark for managing labor in new business models.\nAs technology continues to evolve, the replacement of basic and repetitive jobs by AI is becoming inevitable, and situations where labor contracts are not renewed will gradually increase. 
In the face of new industry transformations, workers cannot passively accept these changes. They must actively adapt to the evolving landscape, abandon outdated work mindsets, continuously enhance their professional skills, and develop capabilities that AI cannot easily replicate, such as innovation, decision-making, and comprehensive management.\nThe impact of AI is no longer just a challenge for individual companies or jobs; it is a societal issue that must be addressed.\nCurrently, AI applications for writing, design, and coding are becoming commonplace. According to Zhaopin, over 10% of workers reported that their companies have \u0026ldquo;digital employees,\u0026rdquo; with nearly half in roles related to external services and marketing, and some even supporting data analysis and strategic decision-making.\nThe latest white paper from the Hangzhou Intermediate People\u0026rsquo;s Court shows that in the city where AI development is most active, over 12,000 labor dispute cases were filed last year, a year-on-year increase of over 61%, with disputes related to AI and big data gradually rising.\nArticle 48 of the Labor Contract Law states that if an employer illegally terminates a labor contract and the worker requests to continue the contract, the employer must comply. This means that the law grants workers the right to return to their positions when faced with unfair treatment.\nIn this AI replacement case, the Hangzhou Intermediate People\u0026rsquo;s Court provided a positive directive: if a company needs to adjust positions, it should prioritize training employees, enhancing skills, and internal transfers rather than terminating contracts outright. Reasonable compensation should also be provided for increased commuting and housing costs due to job changes.\nEarlier this year, the Ministry of Human Resources and Social Security announced plans to accelerate the establishment of a monitoring and early warning system for the employment impacts of AI. 
The 14th Five-Year Plan also proposed comprehensive measures to address the employment effects of external environmental changes and the development of new technologies like AI.\nThe rise of digital employees, while enhancing productivity, will indeed affect job structures. In December last year, the Beijing Human Resources Bureau released a typical case where Liu, who had worked for years in traditional manual data collection, had his position eliminated as the company transitioned to AI-driven automated data collection. The labor arbitration committee ruled that the company\u0026rsquo;s justification for termination was essentially shifting the normal risks of technological iteration onto the worker, deeming the company\u0026rsquo;s actions illegal.\nRegardless of how AI evolves, the law must uphold the boundaries of worker rights. How can we balance technological advancement with the protection of workers\u0026rsquo; rights? From a top-level design perspective, what reasonable suggestions can be made? Lawyer Wu Hai emphasized the need for comprehensive planning during top-level design. Labor regulations should be detailed to clarify the standards for recognizing significant changes in objective circumstances when jobs are replaced by AI, and to delineate the responsibilities of companies in the context of technological upgrades. Policy measures should also mandate companies to fulfill obligations for retraining, democratic consultation, and reasonable compensation before layoffs.\nParticularly, layoffs solely due to AI replacement should be eliminated. A robust vocational skills training safety net should be established to ensure that while technology empowers and industries upgrade, the baseline of employment and livelihoods is also protected. 
Learning from the experiences and lessons of industrial era transformations is essential to achieve a win-win situation between technological progress and the protection of workers\u0026rsquo; rights.\n","date":"2026-05-04T00:00:00Z","permalink":"/posts/note-0876054e90/","title":"AI Job Replacement: Defining Human Boundaries in the Age of Artificial Intelligence"},{"content":"In 2026, the competition among AI programming tools has evolved from \u0026ldquo;who\u0026rsquo;s better at code completion\u0026rdquo; to \u0026ldquo;who\u0026rsquo;s smarter in architecture\u0026rdquo;.\nAccording to the latest SWE-bench Pro rankings, GPT-5.3-Codex achieved 56.8%, while Claude Opus 4.5 scored 55.4%, a gap of just 1.4 percentage points. However, the architectural differences are far more significant than the benchmark numbers.\nToday, we will break down these two systems: how they are designed, their strengths, and how to choose between them.\n1. Understanding Each Tool\nClaude Code\nClaude Code is an AI programming agent developed by Anthropic, running in CLI mode in your terminal.\nCore Positioning: \u0026ldquo;Agentic coding assistant with application-layer governance priority\u0026rdquo;. It assumes your environment is trustworthy and focuses its security controls on letting you intercept every action the agent takes.\nLatest version as of April 2026: 2.1.110, which supports Claude Opus 4.7 (1 million token context, no long-context premium).\nOpenAI Codex\nOpenAI\u0026rsquo;s revived Codex brand (originally a fine-tuned version of GPT-3 in 2021) is released in 2026 in both CLI and cloud container forms.\nCore Positioning: \u0026ldquo;Agentic coding assistant with kernel-level sandbox priority\u0026rdquo;. It assumes the environment may be untrustworthy, so it reinforces OS-level boundaries before discussing efficiency.\nFlagship models:\nGPT-5.3-Codex-Spark (runs on Cerebras hardware, 1000+ tokens/second, first token latency \u0026lt; 100ms)\nGPT-5.4 (integrated coding + knowledge work, 272K 
standard window, with pricing doubling beyond it)\n2. Fundamental Architectural Differences: Where Governance Occurs\nThis is the most fundamental divergence between the two, and it determines all other differences.\nDimension | Codex CLI | Claude Code\nSecurity Execution Layer | OS kernel layer (macOS Seatbelt / Linux seccomp+landlock) | Application layer (26 lifecycle hooks)\nInterception Principle | OS denies directly, before system calls execute | Hooks intercept and judge within the application\nBoundary Strength | High: agent cannot touch unauthorized resources below the application layer | Medium: shares a process boundary with the agent\nControl Granularity | Coarse-grained: three sandbox modes (read-only / workspace-write / danger-full-access) | Fine-grained: regex-based pattern matching, can execute any logic\nProgrammability | Low | Extremely high\nSummary: Codex is \u0026ldquo;the operating system helps you defend\u0026rdquo;, while Claude Code is \u0026ldquo;you write code to defend yourself\u0026rdquo;.\n3. Layer-by-Layer Breakdown: Six-Dimensional Comparison\n3.1 Security Architecture\nCodex: Kernel Sandbox as a True Moat\nThree sandbox modes:\nread-only → can only read; cannot write any files\nworkspace-write → can write only inside the workspace; cannot touch system files\ndanger-full-access → full trust (use with caution)\nCloud container mode: Code runs in an isolated container managed by OpenAI, with network access disabled by default, which suits reviewing untrusted external code.\nClaude Code: 26 Hooks as a Weapon Library\nEach hook corresponds to a lifecycle event of the agent (PreToolUse, PostToolUse, Notification, etc.), allowing you to attach Bash or Python scripts that perform any action:\nPreToolUse (Bash) Hook:\nCheck if the command contains rm -rf /\nYes → Return exit code 2 → Block execution\nNo → Continue\nPostToolUse Hook:\nAutomatically run linter\nAutomatically run security scan\nAutomatically format code\nKey Trade-offs:\nCodex\u0026rsquo;s sandbox is stronger but less flexible: you can only choose among three modes. 
Claude Code\u0026rsquo;s hooks are infinitely flexible but require you to write the logic yourself, with a theoretical risk of \u0026ldquo;malicious project configuration injection\u0026rdquo; (mitigated by project trust prompts).\nConclusion: Review untrusted code → Codex; enforce team standards → Claude Code.\n3.2 Context Management Architecture\nCapability | Codex | Claude Code\nContext Window | GPT-5.3/5.4: 1 million tokens | Opus 4.7: 1 million tokens (no premium)\nLong Session Handling | Credit fallback system (automatically falls back when hitting rate limits to avoid hard interruptions) | Compaction API (server-side context summarization for \u0026ldquo;infinite\u0026rdquo; conversations) + Recap (restore interrupted sessions)\nCaching Mechanism | Spark cuts round-trip overhead by 80% via WebSocket | 1-hour cache TTL, can reduce effective input costs by 80-90% (large codebase scenarios)\nClaude Code\u0026rsquo;s Compaction is a Unique Advantage: As conversations lengthen, the server automatically summarizes historical context, avoiding wasted output tokens on previously discussed content. Codex\u0026rsquo;s \u0026ldquo;credit fallback\u0026rdquo; is a protective mechanism, not an efficiency optimization.\nConclusion: Long-term autonomous tasks → Claude Code; short, high-frequency interactions → Codex Spark.\n3.3 Multi-Agent Architecture\nCodex:\nSupports \u0026ldquo;sub-agents\u0026rdquo;; sandbox and approval settings can be overridden at runtime and propagate to sub-agents.\nCodex cloud exec: Cloud task delegation with asynchronous result retrieval; tasks cannot be monitored in real time.\nSuitable for \u0026ldquo;send it out to run, come back for results\u0026rdquo; autonomous tasks.\nClaude Code:\nClaude Managed Agents (public beta on April 8, 2026): Fully managed agent framework.\nAdvisor Tool (April 9): Pairs a rapid execution model with a high-intelligence advisor model.\nSub-agents are generated through Task tools, with isolated contexts, supporting real-time interaction and intervention. 
\u0026ldquo;Deliberation Mode\u0026rdquo;: Multiple sub-agents critique each other, capturing issues easily overlooked by a single agent. Key Difference: Claude Code\u0026rsquo;s multi-agent system is \u0026ldquo;visible and intervenable\u0026rdquo;; Codex\u0026rsquo;s is \u0026ldquo;send out and asynchronously retrieve results\u0026rdquo;.\nConclusion: Real-time monitoring for complex restructuring → Claude Code; asynchronous long-term tasks → Codex.\n3.4 Inference Speed and Interaction Experience This is Codex Spark\u0026rsquo;s absolute killer feature:\nMetric Codex Spark (GPT-5.3) Claude Code Fast Mode (Opus 4.6) Token Generation Speed 1000+ tokens/second Up to 2.5x speedup (relative to standard mode) First Token Latency \u0026lt; 100ms Not disclosed, significantly higher than Spark Interaction Experience \u0026ldquo;Thinking in sync with AI\u0026rdquo; \u0026ldquo;Waiting for AI to think\u0026rdquo; Spark\u0026rsquo;s speed advantage is real and significant—it transforms the interaction mode from \u0026ldquo;waiting for AI\u0026rdquo; to \u0026ldquo;writing together with AI\u0026rdquo;.\nBut speed is a double-edged sword: Spark\u0026rsquo;s high speed relies on Cerebras dedicated hardware, and the model is a distilled/quantized version, accuracy may be slightly lower than full Opus 4.7.\nConclusion: \u0026ldquo;Vibe coding\u0026rdquo; → Codex Spark; precise control → Claude Code.\n3.5 Benchmark Performance Important background: SWE-bench Verified has been confirmed to have data contamination (all leading models can reproduce golden patches), vendors have stopped reporting Verified scores.\nLooking at uncontaminated standard SWE-bench Pro (1865 multilingual tasks):\nSystem Base Model Vendor Reported SEAL Standardized GPT-5.3-Codex (CLI) GPT-5.3-Codex 56.8% Claude Code Opus 4.5 55.4% Next, examining controlled conditions SWE-rebench (measuring production conditions):\nClaude Opus 4.6: 1st place, pass@5 (success rate within 5 attempts) higher than all other 
models. GPT-5.4: Top 5, known for significantly lower token consumption. Terminal-Bench 2.0:\nModel Score Gemini 3.1 Pro 78.4% GPT-5.3-Codex 77.3% Claude Opus 4.6 74.7% Comprehensive Judgment:\nCodex is stronger in Terminal-Bench (terminal interaction tasks). Claude Code excels in pass@5 reliability (more critical in production environments). There is about a 10-point optimization gap between vendor reports and third-party standardization—the framework itself is as important as the model. 3.6 Pricing and Cost Optimization Model Input $/M Token Cache Input Output $/M Token GPT-5.3-Codex (Standard) $1.75 $0.175 $14.00 GPT-5.3-Codex (Priority) $3.50 $0.35 $28.00 Claude Opus 4.6 $5.00~input 10% $25.00 Claude Sonnet 4.6 $3.00~input 10% $15.00 Cost Optimization Key Points:\nClaude Sonnet 4.6 offers outstanding cost performance: Only 1.2 percentage points lower than Opus 4.6 on SWE-bench Verified, but costs 5 times less. Claude\u0026rsquo;s caching mechanism: 1-hour TTL can reduce effective input costs by 80-90% in large codebase sessions. Codex subscription bundles: Included in ChatGPT Plus ($20/month) and Pro ($200/month), with Pro plan offering a temporary 10x Plus limit discount until May 31, 2026. Conclusion: High-frequency use → Claude Sonnet 4.6 + caching optimization; occasional use → Codex subscription is more cost-effective.\n4. Practical Use Cases: How to Choose and Use Scenario 1: Handling Large Unknown Codebases Choose Claude Code (Opus 4.6)\nReason:\nOpus 4.6 has undergone specialized planning training in coding workflows—first making clear plans before execution, raising clarification questions in advance. The Compaction mechanism allows for \u0026ldquo;infinite\u0026rdquo; conversations without losing previous understanding due to context overflow. The /review command can autonomously trigger code reviews, suitable for taking over legacy projects. 
Usage Suggestions:\n# Let Claude Code first understand the codebase \u0026gt; Please read the entire codebase and provide an architecture diagram and key module descriptions. # Then ask it to make specific modifications \u0026gt; Based on the previous architectural understanding, refactor module X, focusing on Y. Scenario 2: Rapid Prototyping / Creative Coding Choose Codex Spark (GPT-5.3-Codex-Spark)\nReason:\nThe speed of 1000+ tokens/second makes \u0026ldquo;vibe coding\u0026rdquo; truly feasible: you think of a feature, and it almost synchronously writes the code. In demonstrations, it autonomously built two playable games using only general prompts like \u0026ldquo;fix bugs\u0026rdquo; and \u0026ldquo;improve the game\u0026rdquo;. Suitable for the creative phase of \u0026ldquo;getting it running first, then slowly modifying\u0026rdquo;. Usage Suggestions:\n# In Spark mode, describe ideas directly in natural language \u0026gt; Help me create a typing game with a countdown and score tracking. # Spark will generate code in real-time, allowing you to watch and modify as it goes. Scenario 3: Reviewing Untrusted External Code Choose Codex (Kernel Sandbox Mode)\nReason:\nOS-level sandbox ensures malicious code cannot breach file system or network restrictions. Cloud container mode provides complete isolation for execution, suitable for open-source project contributors reviewing PRs. Usage Suggestions:\n# Use read-only sandbox to review PR codex --sandbox read-only Scenario 4: Enforcing Team Coding Standards Choose Claude Code (Hook System)\nReason:\n26 hooks can execute any logic, not limited to \u0026ldquo;allow/deny\u0026rdquo;. Can automatically run linters, formatters, and security scans without manual triggering. Configurations can be hierarchically merged (global → project → local), suitable for team-wide management. 
Usage Example (PostToolUse Hook to automatically run tests):\n// .claude/settings.json { \u0026#34;hooks\u0026#34;: { \u0026#34;PostToolUse\u0026#34;: [ { \u0026#34;matcher\u0026#34;: \u0026#34;Edit|Write\u0026#34;, \u0026#34;hooks\u0026#34;: [{\u0026#34;type\u0026#34;: \u0026#34;command\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;npm test\u0026#34;}] } ] } } Scenario 5: Long-Term Autonomous Tasks in Production Environment Choose Claude Code (Opus 4.6) + Compaction\nReason:\nCompaction + Recap ensures tasks do not fail due to context overflow. Push notification mechanism allows you to leave during task execution and receive notifications upon completion. Highest reliability in pass@5, as production environments cannot afford \u0026ldquo;occasional failures\u0026rdquo;. 5. Best Practices for Combined Use High-end developers in 2026 have begun to use both systems simultaneously, allowing them to complement each other:\nReview untrusted code → Codex (kernel sandbox) ↓ After review Daily coding development → Claude Code (hook governance) ↓ Need for rapid prototyping Creative exploration phase → Codex Spark (1000+ tokens/second) ↓ Back to precise control Production deployment → Claude Code (pass@5 reliability) Configuration files are completely independent: .claude/settings.json and Codex\u0026rsquo;s config.toml (in ~/.codex/ by default) do not conflict, so both tools can be used on the same codebase.\nBlake Crosley\u0026rsquo;s actual case: Claude Code (Opus) discovered a timing side-channel vulnerability in password comparison, while Codex\u0026rsquo;s kernel sandbox physically intercepted SSRF requests pointing to internal IPs. Different models capture different types of vulnerabilities due to varying training data.\n6. 
Core Conclusions for 2026 Architectural Level Dimension Winner Reason Hard Security Boundary Codex OS kernel-level interception, agents cannot bypass Governance Flexibility Claude Code 26 hooks, any logic, strong enforcement of team standards Long-Term Task Reliability Claude Code Compaction + Recap, highest pass@5 Interaction Speed Codex Spark 1000+ tokens/second, first token \u0026lt; 100ms Cost Optimization Claude Sonnet Highest cost performance, mature caching mechanism Multi-Cloud Deployment Claude Code Supports Bedrock/Vertex AI/Foundry Selection Recommendations (One-Sentence Version) Handling untrusted code or trading speed for creativity → Codex; production environment, large codebases, team standard enforcement → Claude Code.\nGreater Insight The competition among AI programming tools in 2026 is no longer about \u0026ldquo;whose model is smarter\u0026rdquo; but \u0026ldquo;whose scaffolding understands developers\u0026rsquo; actual workflows better\u0026rdquo;.\nBenchmark scores differ by 1-2 percentage points, far below the 10-point improvement brought by scaffolding optimization.\nWhat truly matters: Permission systems, sandbox configurations, caching architectures, context management—these \u0026ldquo;non-model\u0026rdquo; aspects are the watershed for AI programming tools in 2026.\n","date":"2026-05-03T00:00:00Z","permalink":"/posts/note-84d96421de/","title":"2026 AI Programming Tools Comparison: Claude Code vs. Codex"},{"content":"Vibe Coding: A New Era for Programmers Recently, a term has gained popularity in Silicon Valley: Vibe Coding. This translates roughly to \u0026ldquo;atmospheric programming\u0026rdquo; or \u0026ldquo;feeling programming\u0026rdquo;. In simple terms, it means:\nYou describe the requirements, AI writes the code, and you are responsible for acceptance.\nThis is not about assistance or code completion; it’s about AI taking over the entire coding process. 
You only need to tell it, \u0026ldquo;I want a script that can automatically send emails,\u0026rdquo; and it will write the code, run it, and even fix bugs.\nSound like science fiction? Not at all; some individuals have already created products in a week that would have previously taken a team a month to complete.\nWhat is Vibe Coding? The term was introduced in February 2025 by Andrej Karpathy, former AI director at Tesla and a founding member of OpenAI. His main point is:\n\u0026ldquo;The way I write code now is by talking to AI. I present my ideas, AI implements them, I test, and I provide feedback. I hardly look at the code anymore; I just observe the results.\u0026rdquo;\nHow Does It Work? Describe the requirement: \u0026ldquo;Help me create a webpage with a to-do list on the left and a completed list on the right, with drag-and-drop functionality.\u0026rdquo; AI generates the code: Tools like Claude, Cursor, and GitHub Copilot directly output the complete code. Run tests: Check if the results are correct. Provide feedback: \u0026ldquo;Change the button color to blue\u0026rdquo; or \u0026ldquo;Add a delete confirmation popup.\u0026rdquo; AI continues to adjust: Repeat until satisfied. Throughout this process, you may not write a single line of code yourself.\nHow Does This Differ from Traditional Programming? 
Here’s a comparison:\nDimension Traditional Software Engineering Vibe Coding Core Action Writing code, debugging, refactoring Describing requirements, acceptance, feedback Code Ownership Programmer has complete control AI generates, human oversees Mindset Machine-oriented (syntax, algorithms, data structures) Problem-oriented (what users want, business logic) Skill Focus Mastery of languages, frameworks, performance optimization Requirement breakdown, product thinking, acceptance criteria Development Speed Measured in days/weeks Measured in minutes/hours Code Quality Depends on programmer\u0026rsquo;s skill, relatively stable Depends on AI capability and prompt quality, more variable Debugging Method Reading code, setting breakpoints, line-by-line checks Observing results, providing feedback, letting AI retry Applicable Scenarios Large systems, core architecture, performance-sensitive MVPs, prototypes, utility scripts, internal systems The difference is clear:\nTraditional programming is \u0026ldquo;I build a car myself,\u0026rdquo; while Vibe Coding is \u0026ldquo;I describe what car I want, the factory builds it, I test drive it, and if I’m not satisfied, I make changes.\u0026rdquo;\nHow to Get Started with Vibe Coding If you want to try it out, here’s how:\nTool Selection Currently, the most popular tools include:\nCursor: Based on VS Code, integrates Claude/GPT, allowing direct interaction to generate code within the editor. Claude + Browser: Describe requirements on the Claude web version and copy the code to run locally. GitHub Copilot: Integrated into IDEs, provides real-time code suggestions, suitable for semi-automated modes. Windsurf / Bolt: Emerging AI programming tools that generate complete projects with one click. OpenCode: A domestic open-source AI programming assistant that supports multiple model switches (Claude, GPT, DeepSeek, etc.) and is optimized for Chinese scenarios, making it suitable for local developers. 
A Real Example Suppose I want to create a \u0026ldquo;web-based calculator.\u0026rdquo;\nTraditional Method:\nOpen the editor and create a new HTML file. Write the HTML structure, CSS styles, and JavaScript logic. Handle button clicks, calculation logic, and error boundaries. Debug various edge cases (division by zero, excessively long numbers, etc.). Takes about 2-3 hours. Vibe Coding Method:\nOpen Cursor and create a new file. Input: \u0026ldquo;Help me create an attractive web calculator that supports addition, subtraction, multiplication, and division, with a history record and modern CSS style.\u0026rdquo; AI generates complete code (30 seconds). Run tests, discover an issue: \u0026ldquo;The history record lacks a clear button.\u0026rdquo; AI adds the clear function (10 seconds). Test again, satisfied, done. Total time: 5 minutes. This is not an exaggeration; it’s real.\nThe Impact of Vibe Coding on Traditional Software Engineering Many veteran programmers might be skeptical:\n\u0026ldquo;Can this really work? Can AI-generated code go live? Who is responsible if something goes wrong?\u0026rdquo;\nThese concerns are valid. Let’s analyze the boundaries and impacts of Vibe Coding objectively.\n1. What Scenarios Are Suitable for Vibe Coding? ✅ Very Suitable:\nPrototype validation (MVP) Internal tools, scripts Personal projects, side jobs Standardized features (CRUD, forms, display pages) ❌ Currently Unsuitable:\nCore financial trading systems Safety-critical systems like aviation and healthcare Scenarios requiring extreme performance optimization Complex architectural designs, distributed systems 2. How Is the Role of Programmers Changing? 
It’s not about being replaced; it’s about upgrading.\nPreviously, a programmer\u0026rsquo;s core ability was to \u0026ldquo;translate requirements into code.\u0026rdquo;\nIn the future, a programmer\u0026rsquo;s core ability will be to \u0026ldquo;break down business problems into instructions that AI can understand and verify the results are correct.\u0026rdquo;\nIn other words:\nFrom \u0026ldquo;coder\u0026rdquo; to a hybrid of \u0026ldquo;architect + product manager + tester\u0026rdquo; From \u0026ldquo;writing code\u0026rdquo; to \u0026ldquo;designing systems + directing AI + ensuring quality\u0026rdquo; From \u0026ldquo;technical depth\u0026rdquo; to \u0026ldquo;technical breadth + business understanding + judgment\u0026rdquo; 3. The Real Impact on Traditional Software Engineering I believe there are three levels of impact:\nFirst Level: Disruption of Development Efficiency\nPreviously, it might take two weeks to go from requirement to launch for a feature. Now, with Vibe Coding, it could be completed in two hours.\nWhat does this mean?\nSmall teams can accomplish what large teams do. The cost of trial and error is extremely low, allowing for rapid validation of ideas. The phrase \u0026ldquo;one person is a company\u0026rdquo; is no longer just a slogan. 
Second Level: Shift in Skill Requirements\nPreviously, interviews focused on: can you write a red-black tree, understand memory management, or be familiar with Spring source code?\nIn the future, interviews may focus on: can you clearly describe requirements, understand business logic, and design acceptance criteria?\nThe importance of pure technical depth is declining, while the demand for a combination of technical and business skills is rising.\nThird Level: Reconstruction of Software Engineering Processes\nTraditional process: Requirement review → Technical plan → Coding → Code review → Testing → Launch\nVibe Coding process: Describe requirements → AI generates → Acceptance testing → Launch (or feedback and regenerate)\nCode review may become \u0026ldquo;prompt review\u0026rdquo;—evaluating not the code, but how well you describe the requirements.\nAnxiety for Veteran Programmers and Opportunities for New Programmers I know many veteran programmers may feel anxious reading this.\n\u0026ldquo;I’ve been writing code for ten years, and now I’m told I don’t have to write anymore?\u0026rdquo;\nDon’t panic. Vibe Coding is not a silver bullet; it has its own issues:\nProblem 1: AI Can Make Mistakes, Often Subtly\nAI-generated code may run on the surface but could contain security vulnerabilities, performance issues, or unhandled edge cases. Launching without review could lead to serious problems.\nProblem 2: AI Struggles with Complex Systems\nA simple webpage can be handled by Vibe Coding, but a microservices architecture involving distributed transactions, data consistency, and high concurrency is still beyond AI’s current capabilities.\nProblem 3: Maintenance Is a Major Concern\nAI-generated code may have inconsistent styles, lack comments, and contain convoluted logic. 
Three months later, you might not even understand it, let alone maintain it.\nThus, my judgment is:\nVibe Coding will replace some of the \u0026ldquo;pure execution layer\u0026rdquo; coding tasks but will not replace the \u0026ldquo;design layer\u0026rdquo; and \u0026ldquo;oversight layer\u0026rdquo; tasks.\nThe truly valuable programmers will always be those who:\nCan judge whether AI-generated code is correct. Can design system architectures that guide AI. Can understand business and translate it into technical solutions. Three Suggestions for Programmers 1. Learn to \u0026ldquo;Direct\u0026rdquo; Rather Than \u0026ldquo;Do It Yourself\u0026rdquo; Treat Vibe Coding as your subordinate. You won’t personally tighten every screw, but you need to understand the blueprint, ensure quality, and point out what’s wrong.\nPracticing writing effective prompts is more important than practicing the syntax of a programming language.\n2. Deepen Business Understanding, Don’t Just Focus on Technology AI can help you write code, but it doesn’t understand your company’s business logic, industry rules, or user habits.\nTechnology can be replaced, but business understanding cannot.\n3. Maintain Understanding of Underlying Principles You may not write code yourself, but you should know how the code runs. Otherwise, if AI provides a solution with obvious performance issues, you won’t be able to spot it.\nVibe Coding doesn’t mean you abandon technology; it means you transition from a technical executor to a technical decision-maker.\nConclusion Every technological revolution prompts some to shout, \u0026ldquo;Programmers will be unemployed!\u0026rdquo;\nBut history tells us: The more powerful the tools, the more valuable those who know how to use them become.\nVibe Coding is not the end for programmers but a direction for their evolution.\nFrom \u0026ldquo;code writers\u0026rdquo; to \u0026ldquo;problem solvers with code\u0026rdquo;—that is the essence.\nHave you tried Vibe Coding? 
How was your experience? Feel free to discuss in the comments.\n","date":"2026-05-02T00:00:00Z","permalink":"/posts/note-f11d999289/","title":"Vibe Coding: Programmers Transition from Writing Code to Directing AI"},{"content":"\nThe Core Nature and Essential Foundations of AGI The exploration of Artificial General Intelligence (AGI) has always been one of the most disruptive directions in human technological advancement. Currently, the global understanding of AGI is shifting from purely algorithmic and text-based intelligence back to its embodied, physical, and social essence. True AGI is not limited to information processing programs in virtual internet spaces; it is an intelligent system that centers on visual perception, utilizes robots as physical carriers, deeply understands physical rules and the full spectrum of human social knowledge, and possesses autonomous learning, adaptability, and safety constraints. Its development and implementation are not merely a single technological iteration process but a comprehensive transformation intertwined with capital drives, national strategies, civilizational differences, social structural changes, and even warfare. A complete, objective, and forward-looking AGI research system can only be constructed by integrating the essence of technology, real-world driving forces, social impacts, and global competition, clarifying its development path, value significance, and potential risks.\nThe Core Essence and Necessary Foundations of AGI In the current field of artificial intelligence, there exists a cognitive misconception that equates the language reasoning and data processing capabilities of large models with the general capabilities of AGI. This completely deviates from the core of AGI. 
The \u0026ldquo;general\u0026rdquo; in AGI refers to adapting to all scenarios of human reality, understanding comprehensive human needs, and integrating into the operational fabric of human society, rather than the universality of information processing in virtual spaces. This determines that AGI must possess irreplaceable core capabilities while relying on indispensable foundational elements.\nVisual capability is the absolute core of AGI\u0026rsquo;s understanding of the world and the realization of general intelligence. Humans rely on vision to complete most of their cognition of the physical world, social interactions, and expressions of humanity. For AGI to achieve true generality, it must replicate this cognitive path by fully observing the world through a visual system, capturing and internalizing everything from the physical properties of objects, spatial structures, and motion laws to human social expressions, body language, language interactions, and behavioral logic, as well as the human traits, interests, and social rules hidden behind these behaviors. This visual capability is not merely image recognition; it is a deep cognitive ability to understand unspoken human intentions, comprehend complex social scenarios, and perceive dynamic changes in the physical world. Without visual perception, AGI cannot understand real human society or adapt to various scenarios of real life, production, and social interaction.\nThe large-scale socialization and popularization of the robotics industry is the only foundational prerequisite for AGI\u0026rsquo;s implementation. Without robots as physical carriers, AGI would remain a \u0026ldquo;brain in a jar\u0026rdquo; suspended in virtual space. 
Currently, functional robots can only complete single tasks such as mass production and simple operations in closed environments like factories and laboratories, representing isolated, fragmented intelligent devices that cannot form a systematic understanding of society or support the generalized development of AGI. Only when service-oriented and general-purpose robots are widely integrated into industrial production, military operations, public services, and daily life, forming a robot industry ecosystem covering society as a whole, can AGI rely on a vast number of carriers to achieve comprehensive interaction with the real physical world and human society, learning physical rules, social norms, and human characteristics through genuine interactions, gradually improving its cognitive system and adaptability. Simple interactions of individual robots can never give birth to general intelligence; only when the robotics industry achieves universal, large-scale, and comprehensive popularization can it provide AGI with sufficient rich learning scenarios, interaction data, and practical experience, which is the core prerequisite for AGI to transition from theory to reality and an insurmountable stage of development.\nAt the same time, AGI must deeply integrate the full spectrum of human knowledge and common sense, mastering the underlying rules of the physical world while being familiar with human history, culture, and social operational logic. It must understand the essence of human social behavior, whether in daily work, travel, and dining, or in the human traits of interests, emotional expressions, cooperation, and competition, as well as the unspoken rules of social operation and boundaries of interpersonal interactions. To achieve its initial goal of serving human society, AGI must comprehensively, meticulously, and deeply understand the organizational forms, operational modes, developmental laws, and various life scenarios of human society. 
This understanding cannot be solely achieved through data and videos fed into computers; it must be gradually accumulated through real interactions in society, which is also the core prerequisite for AGI to possess autonomous learning, situational adaptability, and the ability to understand unknown scenarios.\nThe Core Value of AGI Implementation and the Construction of Human-Machine Social Relationships The value and transformation brought by the full implementation of AGI will be disruptive, far exceeding the current role of functional artificial intelligence. Its core value is first reflected in the refinement, emotionalization, and stratification of services to human society. Current robots and artificial intelligence can only perform simple operations under human commands, whether in industrial mass production or basic life services, acting merely as passive execution tools. In contrast, robots empowered by AGI possess deep intention understanding capabilities, able to read human eyes, emotions, latent desires, and unspoken needs through visual and body language, achieving a transformation from \u0026ldquo;tool execution\u0026rdquo; to \u0026ldquo;active service.\u0026rdquo;\nIn interactions with humans, AGI robots can construct stratified human-machine social relationships based on different scenarios and needs. They can serve as efficient collaborators, caring companions, friends, or even intimate partners that meet specific needs, clearly distinguishing different levels of social distance and boundaries. This precise definition of relationships and control of distance is an important hallmark of AGI maturity and the core key to the long-term stable development of human-machine interaction. 
As humans interact with AGI robots, new social models will gradually form, and reasonable human-machine boundaries and social distances will not only allow humans to enjoy the conveniences brought by intelligence but also avoid social ethical issues arising from the alienation of human-machine relationships. This process needs to be gradually perfected alongside the maturity of AGI and is an essential phase for humanity to adapt to an intelligent society.\nMore importantly, the implementation of AGI will comprehensively reduce the operational costs of society and promote a leap in the efficiency of human social production, governance, and services. In the industrial production sector, AGI robots can achieve fully autonomous and intelligent production processes, completely replacing traditional labor, significantly improving production efficiency and reducing production costs. In the fields of social governance and public services, from urban management, social security, and traffic operations to government services, sanitation, and livelihood guarantees, the comprehensive involvement of AGI and robots will eliminate inefficiencies, redundancies, corruption, and negligence inherent in traditional human governance, making social governance fairer, more efficient, and more transparent. This cost reduction will directly alleviate the financial burden on the state and the tax pressure on the public, allowing saved social resources to be further invested in technological research and development, public welfare, and industrial upgrades, ultimately achieving a comprehensive improvement in human quality of life and social development levels.\nAutonomous safety constraint capability is the foundational bottom line for AGI to serve humanity and the core manifestation of AGI\u0026rsquo;s true intelligence. 
In the process of high interaction and full-scenario penetration with humans, AGI robots will inevitably encounter unexpected situations, malicious inducements, or even be manipulated to harm humans. A truly mature AGI must possess autonomous safety judgment and behavioral control capabilities, able to anticipate potential human harm and property loss caused by its actions, proactively stop operations before risks occur, or even reverse operations to avoid risks while resisting external malicious commands and human tampering, always adhering to the core principle of serving humanity and never harming humans. AGI that lacks safety awareness and self-control capabilities will only become a hazard to human society. Only by achieving autonomous safety control across all time periods and scenarios can AGI stably, long-term, and harm-free serve humanity, thus being considered true mature general artificial intelligence; otherwise, it remains merely an experimental or failed product on the path to AGI.\nThe Accelerated Development Logic of AGI and Phased Implementation Path Traditional academia predicts that the development of AGI will take 30 to 50 years based on natural technological iterations. This is a purely academic conservative estimate, completely detached from the current global driving forces behind AGI development, ignoring the exponential development speed driven by top capital and national strategies. Currently, AGI and the robotics industry are no longer merely technological research projects but are core battlegrounds for top global capital and major technological powers like the US and China. Massive capital is flooding into core tracks such as visual systems, embodied intelligence, and mass production of robots, with countries like the US and China listing it as a national strategy, pooling national strength, technological resources, and capital resources to promote it. 
This combined push allows the development speed of AGI to far exceed the normal pace of technological evolution, with significant differentiation in progress among different enterprises and technological routes. Leading companies and advantageous countries may very well integrate AGI robots into human society to provide basic services within the next decade, even if the technology is not yet perfect. As long as it meets the basic standards of usability, controllability, and cost reduction, it will quickly be implemented and iterated in practice.\nCombining real driving forces with social demand priorities, the implementation of AGI cannot be achieved overnight; it must follow a gradual path from industrial production to military fields, public services, and daily life. This process can be divided into three core stages, significantly compressing the overall cycle to within 30 years, distinctly different from the conservative predictions of academia:\nFirst Stage (1-10 years): This is the initial commercial implementation period, with core landing scenarios in industrial production and military fields. The structured nature and clear demands of industrial production make it the first area for large-scale popularization of AGI robots, achieving comprehensive intelligent replacement in assembly lines, high-risk operations, and logistics. The military field, as a high point of competition among major powers, will invest trillions in capital to advance the development and deployment of unmanned combat troops, drone swarms, and intelligent combat robots. Unmanned military forces will gradually become the mainstay, with real combat and military exercises serving as core scenarios for rapid iteration of AGI technology. AGI robots in this stage may not be fully mature but will meet the basic needs of industrial production and military operations, marking the initial scale development of the robotics industry. 
Second Stage (10-20 years): This is the period of comprehensive popularization, with core landing scenarios in public services and urban governance. As AGI technology continues to improve, robots will take over public service areas such as public security, urban management, government services, traffic operations, and sanitation. Grassroots governance will be comprehensively automated, sharply shrinking the traditional civil service: the vast majority of repetitive, execution-based grassroots positions will be replaced by AGI robots, cutting social operating costs by more than 50%. At the same time, the entire transportation system will become intelligent and unmanned, with autonomous driving, unmanned freight, and intelligent control of public transportation fully realized, closing the physical flow loop between social production and daily life. In this stage, AGI\u0026rsquo;s social cognition and autonomous safety capabilities will mature, and the boundaries and rules of human-machine interaction will gradually take shape.\nThird Stage (20-30 years): This is the period of complete maturity and integration, with core landing scenarios in household and private life. Once the technical, ethical, and safety foundations are in place, household service robots will become widespread, and AGI will adapt precisely to human emotions, privacy, and personalized needs, leading to harmonious coexistence between humans and machines. 
AGI will meet mature standards of full-scenario, full-time autonomous learning and autonomous safety control, truly integrating into every corner of human society, achieving the ultimate goal of general artificial intelligence.\nIn terms of implementation experiments, the principle of gradual progression must also be followed, from small-scale trials in closed scenarios to community pilot projects and then to commercial promotion, gradually expanding the scope, accumulating experience, and refining technology, never allowing for direct comprehensive rollout. Borrowing from the testing logic of autonomous driving, AGI robots need to complete massive hours and miles of safety testing in specific scenarios and regions to verify service capabilities, safety control capabilities, and social adaptability, gradually expanding their applicable range to ultimately achieve widespread societal integration. This gradual testing approach is a necessary means to mitigate risks and ensure the healthy development of AGI.\nThe Differentiated Pattern of Global AGI Development and Civilizational Institutional Differences The implementation and popularization of AGI globally will not occur synchronously or uniformly; the industrial foundations, cultural civilizations, religious beliefs, social systems, and ethnic structures of different countries and regions will inevitably lead to significant differentiation in global AGI development. The differences in development between China and the US are the most representative and illustrate the core influence of civilization and institutional factors on AGI implementation. This difference is not merely a technical gap but a disparity in underlying acceptance, execution power, and industrial capability.\nChina possesses natural comprehensive advantages for AGI development and implementation, making it the most suitable country for the popularization of AGI and robotics. 
Culturally, Chinese civilization has always adhered to the core traits of pragmatism and realism, with a high acceptance of new things and technologies among the populace, lacking deep-rooted religious constraints and ethical objections to robots and humanoid intelligence, allowing for rapid adaptation to human-machine interactive lifestyles and proactive integration into an intelligent society. This cultural trait means that the social promotion of AGI robots faces almost no ideological resistance. From a social governance perspective, China has a centralized and highly effective governance system that can quickly formulate strategies and advance policies from the top down, concentrating national resources to promote AGI in military, police, public service, and industrial production fields without facing institutional fragmentation or conflicting interests, leading to globally leading policy advancement efficiency. From an industrial foundation standpoint, China has the most complete and advanced industrial supply chain globally, capable of mass-producing robots at low costs, providing solid hardware support for the large-scale implementation of AGI and facilitating rapid popularization of robots.\nIn contrast, the US, despite having some technological advantages, faces numerous insurmountable obstacles that slow down the implementation of AGI compared to the speed of technological research. The US is heavily influenced by Christian religious culture, leading some citizens to resist humanoid intelligent robots participating in social governance and life services from an ethical perspective, believing that artificial intelligence should not possess human-like cognition and social attributes. Its federal and state separation political system results in difficulties in unifying AGI-related legislation and policy standards, with each state acting independently and failing to form a national development plan, leading to extremely low policy advancement efficiency. 
Simultaneously, human rights organizations, labor unions, and various interest groups continuously resist the replacement of human jobs by robots due to their own interests, compounded by the evident cultural conflicts and divisions among diverse ethnic groups, resulting in significant disparities in acceptance and demand for AGI robots. This phenomenon mirrors the differences in the rollout of mobile payments in China and the US; it is not that the US lacks relevant technology, but rather that institutional, cultural, religious, and interest conflicts have become invisible shackles on AGI implementation, ultimately creating a situation of \u0026ldquo;technological leadership but implementation lag.\u0026rdquo;\nBeyond China and the US, the differences in AGI development in other regions are even more pronounced. Countries and regions like India and the Middle East face significant resistance to AGI implementation due to rigid caste systems, complex ethnic language conflicts, or strong religious doctrines that lead to low public acceptance of AGI robots and weak industrial foundations, keeping them on the periphery of global AGI development. Conversely, friendly nations and regions in Africa and Latin America will benefit from China\u0026rsquo;s technological output and industrial support, enjoying the services of AGI robots in fields such as healthcare, logistics, food, and urban management, achieving rapid leaps in intelligence. Over time, a global structure will emerge with China at its core leading the intelligent system, the West lagging in development, and religious regions remaining on the margins, resulting in three distinct systems with completely fragmented technological standards, robot models, service paradigms, and human-machine interaction modes. 
The uneven global development of AGI will become a long-term norm, with different regions at entirely different stages of intelligent development within the same timeframe.\nThe Core Driving Role of Unmanned Warfare in AGI Development Unmanned warfare will be the core form of future great power competition and the strongest catalyst for the leapfrog development of AGI technology and the robotics industry. Its driving role in AGI development far exceeds the capital and market forces in the civilian sector, permeating the entire process of AGI research, iteration, implementation, and refinement, and serving as a key force in reshaping the global AGI landscape. This role is a core logic that cannot be overlooked in global AGI development.\nIn future warfare, AGI and robots will permeate every aspect of war, from pre-war strategic decision-making and combat command to frontline operations and tactical execution, and post-war strategic occupation, regional control, urban reconstruction, and order maintenance, with the entire process led by general artificial intelligence and executed by robots. Major powers will invest trillions in capital to seize military high ground, advancing the development of core technologies such as AGI visual perception, battlefield decision-making, swarm collaboration, autonomous safety, and robot adaptation. The complexity, unpredictability, and harshness of war scenarios will compel AGI technology to rapidly overcome shortcomings, refine cognitive systems, and enhance adaptability and safety control capabilities. 
This practical technology iteration is unmatched by any civilian scenario, enabling AGI to achieve qualitative leaps in a very short time.\nFuture military forces will consist of army-level combat clusters formed by dozens of unmanned groups, capable of controlling key areas of continents, energy production bases, strategic transportation routes, and crucial maritime lanes, while also undertaking tasks such as regional stability maintenance, rebellion suppression, and post-war reconstruction. Countries that win unmanned wars will not only gain control over global core resources and strategic locations but will also export their AGI technology standards, robot manufacturing systems, and human-machine interaction paradigms to friendly nations and regions, forming a technological hegemony centered around themselves and leading the global AGI development rules.\nSimultaneously, the AGI technology and robotics industry in the military sector will gradually transition to civilian applications, driving the intelligent upgrades of industrial manufacturing, public services, and daily life scenarios. The core technologies, mature experiences, and safety systems developed for military purposes will significantly shorten the R\u0026amp;D cycle for civilian AGI and reduce the costs of technological trial and error. Without the impetus and push from unmanned warfare, the development speed of AGI would slow down significantly, and the global AGI landscape would be challenging to form quickly. Warfare is not only a contest of great power strength but also the core driving force behind the maturity of AGI technology, the popularization of the industry, and global dissemination, representing a crucial link in the AGI development process.\nThe Social Structural Changes and Potential Risks Induced by AGI The full implementation of AGI will trigger disruptive changes in human social structures, primarily reshaping employment structures and social governance structures. 
In industrial manufacturing, AGI robots will completely replace traditional labor: repetitive, standardized manufacturing positions will all but disappear, and demand for manual labor will fall sharply. This shift also gives countries like the US an opportunity to rebuild high-end manufacturing, freeing them from the constraints of labor costs and labor systems and making the return of manufacturing a reality. However, high-end military manufacturing and intelligent weapon production will always remain firmly under the control of major powers as core competitive resources.\nIn social governance, the grassroots civil service will face unprecedented streamlining. Although traditional civil servant positions may seem stable, many of them consist of repetitive, execution-based work that AGI robots can replace entirely. The reduction may not be exactly 90%, but the vast majority of grassroots positions will be replaced, and social governance will become largely intelligent and unmanned. This will significantly reduce the financial burden on the state and improve the efficiency and fairness of social governance across the board. This transformation is an inevitable result of the development of productive forces, though it may cause short-term employment adjustments. In the long run, it will free humanity from repetitive labor, allowing a shift towards more creative and emotional work and driving human society to higher levels of development.\nHowever, the risks that accompany the development and implementation of AGI cannot be entirely avoided and will remain a long-term pain point of technological advancement. 
Human-machine interaction safety incidents will become the norm, with various causes, including malicious human inducement, deliberate manipulation of robots causing harm, technical vulnerabilities, visual perception deviations, and physical judgment errors leading to accidental injuries. Additionally, technology companies may compress safety verification processes to pursue commercialization speed, resulting in technological oversights. Ethical conflicts among different civilizations, religions, and ethnic groups may also trigger human-machine confrontations and social unrest. These risks cannot be completely eliminated through a single technological means but can only be gradually reduced through continuous improvement of technical safety designs, establishment of regulatory systems, and formulation of global safety standards.\nFurthermore, AGI may also bring risks such as technological monopolies, alienation of human-machine relationships, and widening social disparities. A few countries and companies may control core technologies, forming technological hegemony and exacerbating global development imbalances. Over-reliance on AGI robots may lead to emotional alienation and degradation of social skills among humans. The intelligent disparities between different regions and groups may further widen the gap between rich and poor in society. These risks must be anticipated and prevented in advance during AGI development, guiding AGI to develop in a direction that serves and benefits humanity through multiple constraints of technology, systems, and ethics.\nCore Criteria for Determining AGI Maturity and Future Outlook Determining whether AGI is truly mature and has achieved generality cannot rely solely on technical parameters and reasoning capabilities but must consider four core dimensions: service capability, safety control, social adaptability, and global dissemination, forming a complete evaluation system. 
A mature AGI must first possess full-scenario visual perception and intention understanding capabilities, able to comprehend human latent needs, adapt to various real-world scenarios, and master physical rules and social common sense. Secondly, it must have autonomous safety constraint capabilities, avoiding human harm across all time periods and scenarios while resisting malicious manipulation. Thirdly, it must achieve stratified human-machine social interactions, clearly defining boundaries and social distances, building harmonious and stable human-machine relationships. Lastly, it must adapt to the needs of different civilizations and societies, achieving global dissemination within controllable limits and promoting the overall development of human society.\nFrom a developmental trend perspective, AGI is an inevitable direction of human technological advancement. Although its development process is filled with differences, conflicts, and risks, under the multiple drives of capital, national strategies, warfare, and social demands, it will gradually move towards maturity. China, leveraging its cultural, institutional, and industrial advantages, will become a leader in global AGI development and implementation, dominating the global intelligent development landscape. Different regions around the world will successively enter a stage of human-machine coexistence, significantly reducing social operational costs and enhancing the quality of human life and social development levels.\nThe development of general artificial intelligence is a great transformation concerning the trajectory of human civilization and a challenging exploration filled with risks. We must recognize the objective laws, real driving forces, and global differences in its development while proactively preventing potential risks, adhering to the core principles of serving humanity and prioritizing safety. 
Only by doing so can AGI truly become the core driving force for human societal progress, achieving a harmonious coexistence between humans and machines. This process requires the joint efforts of the global academic community, industry, and governments to improve through exploration and mature through practice.\n","date":"2026-05-01T00:00:00Z","permalink":"/posts/note-30f77804bd/","title":"Understanding the Core Nature and Development Path of AGI"},{"content":"The Mutual Empowerment of AI and Humanities Generative artificial intelligence is profoundly changing various fields such as education, employment, entertainment, healthcare, transportation, and elder care, becoming a hot topic of discussion. The relationship between the humanities and generative AI is complex and symbiotic. AI is reshaping the forms and future development paths of the humanities, while the demands of AI development highlight the value and functionality of the humanities. In this sense, the development of the humanities will fundamentally influence the cognitive heights and social acceptance of AI.\nBridging the Gap for Humanities Scholars As modern disciplines become increasingly specialized, the humanities face barriers not only with natural sciences but also with social sciences, potentially leading to a \u0026ldquo;knowledge dilemma.\u0026rdquo; It is challenging to find scholars within the humanities who can bridge literature, art, philosophy, history, and language, resulting in a limitation of \u0026ldquo;partial profundity\u0026rdquo; in contemporary humanities. The emergence of AI can provide new solutions to this issue.\nLarge language models, built through deep learning on vast amounts of text, represent a highly condensed form of human written knowledge. They utilize neural network architectures and algorithm-driven probabilistic predictions, achieving context awareness through deep learning and performing human-like logical reasoning under specific prompts. 
In this sense, AI can serve as a powerful assistant for humanities scholars, bridging them to multidisciplinary research and empowering the production of humanistic knowledge through information search, literature screening, semantic analysis, and cross-domain integration.\nCurrently influential \u0026ldquo;distant reading\u0026rdquo; methods, based on the overall framework of world literature, utilize AI models to establish interdisciplinary literary criticism and research models. Unlike traditional literary studies that advocate close reading of a few classics, distant reading involves data mining and quantitative analysis of large-scale text collections, systematically revealing characteristics such as thematic distribution, emotional tendencies, plot structures, and linguistic rhetoric, thereby describing the overall development of human literature. This effectively addresses the technical challenges of processing vast amounts of text and the cross-cultural and interdisciplinary knowledge dilemmas that qualitative analysis in traditional literary history and world literature research cannot resolve.\nUpdating Methods and Paradigms in the Humanities China has a long and rich tradition of humanistic scholarship, but the term \u0026ldquo;humanities\u0026rdquo; emerged in the 20th century. During the Enlightenment in the West, humanists sought to find their unique nature and methods outside of natural sciences. 
They viewed the humanities as a \u0026ldquo;new science\u0026rdquo; concerning human thoughts and behaviors, distinct from natural sciences, and emphasized the use of \u0026ldquo;individualized methods\u0026rdquo; linked to values to construct an epistemology and methodology for the humanities.\nOverall, this logic, criticized by later generations as the \u0026ldquo;spirit-nature dichotomy,\u0026rdquo; emphasizes \u0026ldquo;thoughts of existence.\u0026rdquo; Its research objects exist in symbolic forms such as language, text, images, and rituals, and involve faith, conscience, emotions, aesthetics, values, and ideals, elements that are difficult to quantify. They encompass deep individual psychology, instincts, consciousness and unconsciousness, as well as historical and cultural memories, and embody intrinsic characteristics such as value, culture, individuality, spirituality, emotion, thought, and symbolism. Methodologically, the humanities focus on internalized ways of understanding through empathy, reflective experience, and intuitive insight, aiming to reveal unique individual experiences, complex spiritual worlds, and deep cultural meanings that cannot be captured by the replicable, quantifiable, and verifiable techniques of natural sciences.\nAs disciplines continue to develop, this binary, oppositional way of thinking is constantly being reexamined. Marx stated, \u0026ldquo;Natural science will in time incorporate into itself the science of man, just as the science of man will incorporate into itself natural science: there will be one science.\u0026rdquo; Emerging digital humanities research not only investigates in depth the humanistic concerns and governance challenges brought by digital technology but also actively draws new research methods and paradigms from digital technology, reshaping the landscape of humanistic research. Literary laboratories and promising experiments in quantitative humanities research continue to appear. 
AI is evolving from an auxiliary tool to a key force driving paradigm innovation, providing humanities scholars with new interdisciplinary research perspectives and theoretical innovation support, greatly expanding the breadth and depth of humanistic research experiences.\nEnhancing Critical Thinking and Writing Skills through Human-AI Collaboration A unique aspect of the humanities is that its knowledge forms often manifest as narrative or speculative texts, expressing researchers\u0026rsquo; unique insights and profound thoughts on the spiritual and cultural cores of human existence, values, and meanings through written language. This differs from natural sciences, which rely on formulaic deductions, data charts, and reproducible experimental validations, and from social sciences, which heavily use surveys and statistical models for empirical paths. Humanistic writing is not only an expression of thoughts and emotions but also a comprehensive cognitive movement that integrates creativity, criticality, and reflexivity—\u0026ldquo;writing is thinking,\u0026rdquo; a process of generating and deepening thoughts and emotions. Writing can stimulate creative vitality, enhance self-reflection, and expand expressive boundaries, where linguistic sensitivity, intellectual penetration, and cultural insight merge.\nScholars point out that writing styles themselves carry researchers\u0026rsquo; unique emotional tones, academic judgments, and value positions to some extent. In this sense, humanistic writing is a core aspect of academic research; it is not only a mode of knowledge production in the humanities but also a reflection of its thinking methods and disciplinary characteristics. It serves as a fundamental carrier of academic exchange and the vitality of the discipline. 
Whether expressing philosophical thoughts and ultimate meanings, describing historical contexts and narrative events, or constructing values and poetic insights in literary criticism and research, the organization and structural integration of materials, logical reasoning, viewpoint argumentation, and the refinement of thoughts and spiritual experiences all occur within the creative writing process.\nCurrently, AI models can transfer the language structures, argumentative patterns, and disciplinary terminology learned from large-scale corpora into specific humanistic knowledge production, promoting human-AI collaboration and achieving a holistic leap in humanistic writing. On one hand, in humanities academic writing, researchers can fully utilize AI\u0026rsquo;s powerful data processing capabilities to efficiently collect, systematically organize, and deeply analyze vast amounts of literature before writing. Moreover, during the writing process, through human-AI collaboration and dialogue, they can organically integrate dispersed knowledge, building new knowledge maps and cognitive frameworks that help researchers break through existing theoretical and cognitive limitations, excavate deep thoughts and internal logical structures from complex texts, reveal the laws of development, distill core concepts, and ultimately give birth to new knowledge outcomes. This process is not merely a simple accumulation of knowledge but an innovative mechanism capable of generating specific theoretical results, opening new paths for academic research and knowledge innovation. On the other hand, AI can perform localized polishing and overall optimization of professional academic expressions. This can correct, adjust, and enhance the quality of humanistic academic expressions in terms of informativeness, normativity, logicality, and systematization, potentially forcing low-quality academic research out of relevant fields. 
Sometimes, certain academic debates within the humanities suffer from insufficient materials, unclear concepts, and weak logic; AI assistance can significantly improve the quality of academic discussions and enhance their value.\nThe involvement of AI is not a simple process of machine-assisted writing but rather a continual deepening of thought, inspiration, and expression optimization through human-AI interaction and back-and-forth dialogue. This process demands a high level of AI literacy from researchers in terms of correctly inputting commands, providing high-level prompts, and deeply interpreting output results. These skills determine the effectiveness of using AI tools. Here, the ability to pose genuine, good, and new questions becomes extremely important, returning to the essence of academic research. At the same time, as some studies have pointed out, AI excels in knowledge inheritance but falls short in creative thinking, making it difficult to replace human depth in theoretical construction, critical reflection, value selection, and aesthetic judgment. Human intuitive judgments about subtle connections between things found in vast amounts of information, strategic choices made based on value positions, and unique expressions arising from aesthetic tastes are all of significant importance. Without human verification, modification, and deepening, AI-generated content can carry a strong \u0026ldquo;machine flavor,\u0026rdquo; presenting as bland and homogenized expressions.\nTo ensure the academic independence of thought, unique insights, and distinctive academic styles, the personal characteristics of humanities researchers—such as talent, courage, insight, and capability—should not be diminished by machine assistance. It is crucial to prevent dependency thinking and intellectual inertia; otherwise, research outcomes may lose the dynamism inherent in humanistic inquiry. 
Humanistic research must always reflect \u0026ldquo;the human\u0026rdquo; and integrate personal life experiences into academic exploration, responding to contemporary issues with keen perception, unique creativity, and a critical spirit in pursuit of truth. People should be able to sense the emotional investment and value care of researchers, achieving both depth of thought and warmth of feeling.\nThe Development of AI Relies on Humanities Understanding of \u0026ldquo;Human\u0026rdquo; AI, as a mirror of human intelligence, can help humanity understand the essence of what it means to be human more profoundly. At the same time, humanity\u0026rsquo;s understanding of itself becomes the fundamental basis for the future development and governance of AI technology. Marx pointed out, \u0026ldquo;Conscious life activities directly distinguish humans from animal life activities.\u0026rdquo; Thus, humanity\u0026rsquo;s strength lies in its possession of intellect, practical creativity, and the ability to continuously acquire knowledge and skills through learning to achieve goals.\nCurrently, AI still belongs to the imitation of human intelligence, exhibiting behavior like humans. Its developmental goal should be to gradually align with the internal cognitive structures and creative mechanisms of humans, rather than merely replicating external behaviors. The emergence of generative AI is not accidental; it is a product of human creativity and self-awareness reaching a certain stage. Although currently specialized vertical models have shown execution efficiency and precision surpassing humans in specific tasks and fields, they are fundamentally still tools of humanity. To date, \u0026ldquo;general models\u0026rdquo; that autonomously adapt to different environments and needs often perform worse than human infants when faced with new situations, counterfactual problems, or when common sense reasoning is required. 
Essentially, current AI knows what to do but may not understand the underlying principles and logic; the AI black box has yet to be opened, and AI cannot yet evolve from imitator to understander. In this context, questioning the generative mechanisms and operational modes of human intellect becomes particularly important. Humanity\u0026rsquo;s contemplation of AI is also a re-examination of itself as a complex intelligent entity, using non-human intelligent agents as mirrors to explore the deep essence of humanity and understand \u0026ldquo;what it means to be human.\u0026rdquo;\nWhether in natural sciences or in the humanities and social sciences, \u0026ldquo;disenchantment\u0026rdquo; and \u0026ldquo;enchantment\u0026rdquo; regarding humanity continually alternate, with the core of \u0026ldquo;enchantment\u0026rdquo; always being the mystery of humanity itself. Without a profound understanding of human intellect, a \u0026ldquo;general model\u0026rdquo; cannot truly emerge. As Marx stated, \u0026ldquo;Human anatomy contains a key to the anatomy of the ape.\u0026rdquo; The intimations of higher development found in lower animal species can only be understood once the higher development itself is known. Understanding humans and comprehending humanity is the fundamental nature and basic value goal of the humanities. Today, AI still faces many \u0026ldquo;explainability issues,\u0026rdquo; largely because humanity\u0026rsquo;s understanding of its own intellect is still insufficient. 
Breakthroughs in AI creation, technological governance, and value alignment require a premise of humanity\u0026rsquo;s understanding of its own essence, and the level of development in the humanities determines the future possibilities for the development of \u0026ldquo;general models.\u0026rdquo;\nFrom the perspective of the relationship between the humanities and social life, the humanities cannot be replaced by AI, as they possess reflexivity. Every emergence and change of humanistic cognition and understanding intervenes in the development of social life and the construction of public sentiment, embodying the quality of \u0026ldquo;establishing a heart for heaven and earth, and a mission for the people.\u0026rdquo; In this sense, the development of the humanities is not a linear process of progress; various humanistic thoughts cannot simply be added together to form a single ultimate truth but coexist in a pluralistic manner, collectively shaping the rich spiritual world of society and individuals. It can be said that the advancement of humanistic scholarship alters humanity\u0026rsquo;s understanding of the world, which in turn has a significant impact on generative AI. Simultaneously, the effects of new technologies like AI on society and humanity also constitute a focal point of humanistic scholarship, and related reflections become part of the human spiritual world. The humanities and AI are always in a dynamic interplay of coexistence and mutual promotion. It is essential to remember that AI is created by humanity, and humans should possess the ability to truly understand and effectively harness their creations. 
In this sense, we can be fully confident that humanistic thought can illuminate the future path of AI.\n","date":"2026-04-29T00:00:00Z","permalink":"/posts/note-015563c918/","title":"The Mutual Empowerment of AI and Humanities"},{"content":"Introduction In April 2026, AI programming tools have evolved from \u0026ldquo;code completion\u0026rdquo; to the \u0026ldquo;autonomous development\u0026rdquo; stage. We conducted in-depth tests on three of the most popular AI programming tools, and the results may change your perception.\nWhat Each Tool Bets On Before comparing functionalities, it\u0026rsquo;s essential to understand the different visions these three tools have for the future of AI programming:\nClaude Code: The terminal is my IDE. Anthropic has made a bold prediction—completely command-line based, abandoning GUI, and using natural language to control the entire development process. Cursor 3: The era of AI Agent fleets. Multiple AI Agents work together, one for writing code, one for testing, and one for code review. Codex: The open-source all-rounder. OpenAI\u0026rsquo;s programming assistant emphasizes customization and private deployment. 
SWE-bench Benchmark Test Comparison\nAccording to the latest SWE-bench benchmark test in April 2026 (measuring the ability to solve real GitHub issues):\nTool | SWE-bench Score | Market Share | Pricing\nClaude Code | 80.8% | 28% | $20/month\nCursor 3 | 72.5% | 35% | $20/month\nCodex (OpenAI) | 65.3% | 15% | Open-source free\nPractical Comparison: Creating a Complete API Service with Claude Code\nWe tested Claude Code with a task: \u0026ldquo;Create a REST API service with authentication, including user CRUD and JWT authentication.\u0026rdquo; Here is the core code generated by Claude Code:\n# Core code generated by Claude Code - auth_middleware.py\nimport jwt\nimport datetime\nfrom functools import wraps\nfrom flask import jsonify, request\n\nSECRET_KEY = \u0026#34;your-secret-key-change-in-production\u0026#34;\n\ndef generate_token(user_id, role=\u0026#34;user\u0026#34;):\n    \u0026#34;\u0026#34;\u0026#34;Generate JWT Token\u0026#34;\u0026#34;\u0026#34;\n    payload = {\n        \u0026#34;user_id\u0026#34;: user_id,\n        \u0026#34;role\u0026#34;: role,\n        \u0026#34;exp\u0026#34;: datetime.datetime.utcnow() + datetime.timedelta(hours=24),\n        \u0026#34;iat\u0026#34;: datetime.datetime.utcnow()\n    }\n    return jwt.encode(payload, SECRET_KEY, algorithm=\u0026#34;HS256\u0026#34;)\n\ndef token_required(f):\n    \u0026#34;\u0026#34;\u0026#34;JWT authentication decorator\u0026#34;\u0026#34;\u0026#34;\n    @wraps(f)\n    def decorated(*args, **kwargs):\n        token = request.headers.get(\u0026#34;Authorization\u0026#34;)\n        if not token:\n            return jsonify({\u0026#34;error\u0026#34;: \u0026#34;Missing authentication Token\u0026#34;}), 401\n        try:\n            token = token.replace(\u0026#34;Bearer \u0026#34;, \u0026#34;\u0026#34;)\n            data = jwt.decode(token, SECRET_KEY, algorithms=[\u0026#34;HS256\u0026#34;])\n            request.current_user = data\n        except jwt.ExpiredSignatureError:\n            return jsonify({\u0026#34;error\u0026#34;: \u0026#34;Token has expired\u0026#34;}), 401\n        except jwt.InvalidTokenError:\n            return jsonify({\u0026#34;error\u0026#34;: \u0026#34;Invalid Token\u0026#34;}), 401\n        return f(*args, **kwargs)\n    return decorated\n\n# Generated API routes - app.py\nfrom flask import Flask, request, jsonify\nfrom auth_middleware import generate_token, token_required\n\napp = Flask(__name__)\n\n# Simulated database\nusers_db = {}\n\n@app.route(\u0026#34;/api/register\u0026#34;, methods=[\u0026#34;POST\u0026#34;])\ndef register():\n    data = request.json\n    if data[\u0026#34;username\u0026#34;] in users_db:\n        return jsonify({\u0026#34;error\u0026#34;: \u0026#34;User already exists\u0026#34;}), 400\n    users_db[data[\u0026#34;username\u0026#34;]] = {\n        \u0026#34;password\u0026#34;: data[\u0026#34;password\u0026#34;],  # Encrypt in production\n        \u0026#34;email\u0026#34;: data[\u0026#34;email\u0026#34;]\n    }\n    token = generate_token(data[\u0026#34;username\u0026#34;])\n    return jsonify({\u0026#34;token\u0026#34;: token, \u0026#34;message\u0026#34;: \u0026#34;Registration successful\u0026#34;})\n\n@app.route(\u0026#34;/api/login\u0026#34;, methods=[\u0026#34;POST\u0026#34;])\ndef login():\n    data = request.json\n    user = users_db.get(data[\u0026#34;username\u0026#34;])\n    if not user or user[\u0026#34;password\u0026#34;] != data[\u0026#34;password\u0026#34;]:\n        return jsonify({\u0026#34;error\u0026#34;: \u0026#34;Username or password incorrect\u0026#34;}), 401\n    token = generate_token(data[\u0026#34;username\u0026#34;])\n    return jsonify({\u0026#34;token\u0026#34;: token})\n\n@app.route(\u0026#34;/api/profile\u0026#34;, methods=[\u0026#34;GET\u0026#34;])\n@token_required\ndef get_profile():\n    username = request.current_user[\u0026#34;user_id\u0026#34;]\n    user = users_db.get(username, {})\n    return jsonify({\n        \u0026#34;username\u0026#34;: username,\n        \u0026#34;email\u0026#34;: user.get(\u0026#34;email\u0026#34;, \u0026#34;\u0026#34;),\n        \u0026#34;role\u0026#34;: request.current_user[\u0026#34;role\u0026#34;]\n    })\n\nif __name__ == \u0026#34;__main__\u0026#34;:\n    app.run(debug=True, port=5000)\nTesting Conclusions\nAfter a week of in-depth testing, we reached the following conclusions:\nChoose Claude Code for Complex Tasks: When you need to collaborate across multiple 
files and modules, Claude Code\u0026rsquo;s understanding capability is the strongest, with a SWE-bench score of 80.8%. Choose Cursor 3 for Daily Development: For routine single-file development, Cursor 3\u0026rsquo;s Agent mode is very smooth, and the GUI experience is more user-friendly. Choose Codex for Private Deployment: If your code cannot be uploaded to the cloud, the open-source Codex is the only choice. Advice for Programmers AI programming tools have transitioned from \u0026ldquo;assistants\u0026rdquo; to \u0026ldquo;partners.\u0026rdquo; Do not resist or overly rely on them. The best approach is to treat AI as an efficient \u0026ldquo;junior developer\u0026rdquo; that can quickly generate framework code, while architectural design, business logic, and code review still require your professional judgment. Remember, AI is a tool; you are the decision-maker.\n","date":"2026-04-28T00:00:00Z","permalink":"/posts/note-b24bbd92c8/","title":"Cursor 3: The Ultimate AI Programming Tool Comparison for 2026"},{"content":"DeepSeek V4 Released DeepSeek-V4 has officially launched, with both a preview version and open-source availability.\nThere are two versions available:\nDeepSeek-V4-Pro: Aimed at top-tier closed-source models, featuring 1.6T parameters, 49B activations, and a context length of 1M. DeepSeek-V4-Flash: A smaller, faster economic version with 284B parameters, 13B activations, and a context length of 1M. The official statement claims: It leads domestically and in the open-source field in agent capabilities, world knowledge, and reasoning performance.\nCurrently, DeepSeek-V4 is being used internally as an Agentic Coding model, with feedback indicating a better user experience than Sonnet 4.5 and delivery quality close to Opus 4.6 in non-thinking mode, although it still lags behind Opus 4.6 in thinking mode.\nThe official website and app have been updated, and the API service has also been refreshed. 
A key point of interest is that support for Huawei\u0026rsquo;s computing power will be available in the second half of the year.\nTwo Versions Released Together This time, V4 has released two versions simultaneously.\nV4-Pro offers performance comparable to top closed-source models. The official assessment includes three key points:\nSignificantly improved agent capabilities: In Agentic capability coding assessments, V4-Pro has reached the best level among current open-source models and performs excellently in other agent-related evaluations. In internal assessments, the agent coding mode of V4 outperforms Sonnet 4.5, with delivery quality close to Opus 4.6 in non-thinking mode, but still has a gap with Opus 4.6 in thinking mode. Rich world knowledge: In world knowledge assessments, DeepSeek-V4-Pro significantly outperforms other open-source models, only slightly behind the top closed-source model Gemini-Pro-3.1. World-class reasoning performance: In assessments of mathematics, STEM, and competitive coding, DeepSeek-V4-Pro surpasses all currently published open-source models, achieving results on par with the best closed-source models. V4-Flash, the smaller and faster economic version, has reasoning capabilities close to Pro, though it has slightly less world knowledge. It features smaller parameters and activations, making the API cheaper.\nIn agent tasks, DeepSeek-V4-Flash performs comparably to DeepSeek-V4-Pro on simple tasks, but still shows a gap on more difficult tasks.\nIn a washing test, V4 also passed quickly.\nHowever, in the classic biological scenario of \u0026ldquo;desperate father\u0026rdquo;, DeepSeek-V4 failed to grasp the key point of red-green color blindness in one round (according to genetic rules, if a female is red-green color blind, her biological father must also be).\nOne Million Context as Standard Notably, starting today, 1M context is the standard for all official DeepSeek services. 
A year ago, 1M context was a unique feature of Gemini; other closed-source models had either 128K or 200K, and very few in the open-source realm could handle this scale.\nDeepSeek has transformed the one million context from a \u0026ldquo;high-end feature\u0026rdquo; into a \u0026ldquo;basic utility\u0026rdquo;.\nThey achieved this by introducing a new attention mechanism that compresses at the token dimension, combined with DSA sparse attention. This significantly reduces the computational and memory requirements compared to traditional methods.\nDSA is not a new term. It was first introduced in the V3.2-Exp update six months ago, which did not attract much attention at the time, as its scores were nearly identical to V3.1-Terminus, appearing to be an unremarkable interim version.\nLooking back, that was the foundation for V4.\nSpecialized Optimization for Agent Capabilities For agents, V4 has been adapted and optimized for mainstream agent products like Claude Code, OpenClaw, OpenCode, and CodeBuddy, enhancing performance in coding and document generation tasks.\nThe release also included an example of a PPT slide generated by V4-Pro under a certain agent framework.\nAPI Pricing The APIs for V4-Pro and V4-Flash have been launched simultaneously, supporting both OpenAI ChatCompletions and Anthropic interfaces.\nThe base_url remains unchanged, and the model parameter can be set to deepseek-v4-pro or deepseek-v4-flash for access.\nBoth versions support a maximum context of 1M and both non-thinking and thinking modes. In thinking mode, the reasoning_effort parameter can be adjusted for intensity, with two levels: high and max. The official recommendation is to use max for complex agent scenarios.\nA key point to note is support for Huawei\u0026rsquo;s computing power in the second half of the year.\nAdditionally, old model names will be phased out. 
deepseek-chat and deepseek-reasoner will be retired in three months (on July 24, 2026); for now, these names point to the non-thinking and thinking modes of V4-Flash, respectively.\nThis change will not significantly impact individual developers, as it only requires a change in the model parameter. However, companies that use the old names in production environments will need to migrate within the next three months.\nOne More Thing At the end of the release, DeepSeek quoted a line:\n\u0026ldquo;Do not be tempted by praise, nor frightened by slander; follow the path you believe in and correct yourself.\u0026rdquo;\nThis is a line from the \u0026ldquo;Fei Shi\u0026rsquo;er Zi\u0026rdquo; (\u0026ldquo;Against the Twelve Masters\u0026rdquo;) chapter of the Xunzi. Literally, it means not to be lured by praise or intimidated by slander, but to move forward according to one’s own beliefs and correct oneself.\nIn today’s context, it carries some significance.\nOver the past six months, rumors about when V4 would be released, whether it would be delayed, whether it had been surpassed by others, or whether it had been resolved by Claude\u0026rsquo;s distilled data circulated back and forth in both Chinese and English AI circles. Earlier this year, some even confidently stated that V4 would be released before the Spring Festival, but it was not until the end of April that it finally arrived.\nThey did not respond to any of these rumors.\nThen, on a Friday afternoon, they released V4, open-sourced it at the same time, updated the official website and app, and refreshed the API, while casually mentioning that internal staff had already abandoned Claude.\nNo roadmap, no live broadcasts, no interviews.\nThe phrase \u0026ldquo;follow the path you believe in\u0026rdquo; sounds like a slogan. 
But if you look at the path that led to V4, from the seemingly unremarkable V3.2-Exp version six months ago, to the DSA sparse attention that laid the groundwork for V4, and the transformation of 1M context from a premium feature to a standard utility, DeepSeek has indeed achieved this.\nDeepSeek-V4 model open-source links:\n1\n2\nDeepSeek-V4 technical report:\nhttps://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf\n","date":"2026-04-24T00:00:00Z","permalink":"/posts/note-922f552c96/","title":"DeepSeek V4 Released: Breaking the Closed Source Monopoly with Huawei Collaboration"},{"content":"Introduction Recently, I became obsessed with VibeCoding, which involves using natural language to instruct AI to write code. This passion consumed me to the point where my family began to question my whereabouts.\nSince realizing that AI has evolved from merely conversing to actually performing tasks, I have been brainstorming ways to monetize AI, exploring various avenues and attempts.\nTools Developed Before diving into VibeCoding, I invested significant time and energy into OpenClaw, which has now become a reliable assistant for managing content on my geek website and WeChat account.\nEarlier, I developed two WeChat mini-programs using Tencent\u0026rsquo;s Yuanbao. One was a \u0026ldquo;Pension Calculator\u0026rdquo; for practice, and the other was a more refined \u0026ldquo;Personality Type Quick Test Tool.\u0026rdquo; If you\u0026rsquo;re interested, feel free to check them out.\nThe monetization model for such tools is quite straightforward: earning revenue from Tencent\u0026rsquo;s ad shares. However, to enable this, a mini-program must have at least 500 followers, which I haven\u0026rsquo;t achieved due to lack of promotion and competition from similar programs.\nMy experience with mini-program development taught me that if the market already has similar tools, you are essentially repeating previous work. 
Without significant differentiation or improvement, standing out is challenging.\nDeveloping WeChat mini-programs is relatively complex, tedious, and time-consuming, requiring specialized developer tools for repeated debugging to ensure consistent user experience across various devices. Just creating those two mini-programs left me exhausted.\nIn contrast, developing PC applications is much simpler. If the program doesn\u0026rsquo;t require backend support, you can have AI write a front-end HTML code, upload it to a web server, and you\u0026rsquo;re done. In a smooth scenario, you can complete a program in just ten minutes.\nThe Development Process Initially, I asked DeepSeek to write code for a QR code generator. I described my requirements, such as a minimalist style and clear functionality. To my surprise, the first version of the code met my expectations.\nThis tool allows users to input a URL and quickly generate a QR code that links directly to the original page. It only requires front-end code, with no backend support needed. The code was completed in minutes, saved as an HTML file, and uploaded to the server for immediate use. The entire process exceeded my expectations.\nFrom there, I continued to have AI assist me in developing several tools: a short link generator, text organizer, image compressor, random password generator, IP location finder, pixel avatar generator, image watermarking tool, and image stitching tool.\nAll nine tools are now live and perfectly adapted for mobile devices. The entire development process took only two working days, including adjustments to webpage background styles and card designs. If you\u0026rsquo;re interested, you can visit my tools page to try them out.\nThe short link generator, while appearing simple, is the most complex as it requires domain and backend support, necessitating three files in total. 
However, the process went smoothly.\nI spent more time fine-tuning the features of the watermarking and image stitching tools, as they involved real-time previews to ensure a consistent experience across PC and mobile.\nThe Joy of VibeCoding I must admit that VibeCoding, even without considering future monetization, provides a sense of satisfaction. Just by conversing in natural language, I can have AI turn my ideas into tangible products. This was unimaginable just a year ago.\nThere\u0026rsquo;s a notion that in the AI era, liberal arts students may have an advantage. My interpretation is that their strong language skills enable them to articulate their needs to AI more clearly, yielding better results. However, when it comes to coding, programmers may still outperform liberal arts students due to their understanding of code and debugging efficiency. As a liberal arts student, I often rely on AI for repetitive tasks, which can lead to AI skipping redundant code.\nMonetization Strategies Now, the serious question is how to monetize these tools. After developing the nine tools, my tools.html page resembles a functional toolbox. I set long-tail keywords for the page title, such as \u0026ldquo;Comprehensive Tools Collection\u0026rdquo; and \u0026ldquo;Free QR Code Generator,\u0026rdquo; to improve search engine indexing.\nSince all tools are free, my current monetization strategy is through ad revenue.\nI have integrated Baidu Alliance ads at the bottom of the toolbox homepage and each individual tool page, allowing me to earn revenue from user clicks. I also applied for Google ads, which are currently under review.\nHowever, the challenge is that PC websites have low traffic and click-through rates. Most mobile browsers block ads by default, making it unrealistic to expect user clicks.\nTo be honest, the tools I\u0026rsquo;ve developed are not groundbreaking; users can find similar tools online for free. 
More complex tools like Word to PDF converters often come with a fee. I could develop a free version to attract users, but such tools require a more complex server environment, which I currently lack the resources to tackle.\nMy only hope is that one of these tools gains good traction in search engines, leading to increased traffic. Since all my tool pages link to each other, this may enhance user engagement and encourage bookmarking, making the ad revenue model viable.\nI understand that relying solely on AI to develop tools for passive income is unrealistic. However, I have accomplished this task, which has given me a sense of achievement and satisfaction. There’s also a bit of vanity in the fact that a liberal arts student like me can develop applications.\nMy journey to leverage AI for profit has just begun. If I find the time, I may have AI help me develop more applications, such as AI tools that integrate large models, or even directly create an app. However, I won\u0026rsquo;t rush into it without a solid idea.\nIn summary, I have taken a crucial step toward transforming AI into a profitable tool. I hope this marks a promising start.\n","date":"2026-04-10T00:00:00Z","permalink":"/posts/note-0ae116ec36/","title":"Creating 9 AI Tools for Profit with VibeCoding"},{"content":"The Birth of the Transformer Architecture The birth of the Transformer architecture has completely rewritten the rules of the AI game. From ChatGPT to Sora, from AlphaFold to ViT, seemingly unrelated technological breakthroughs share the same DNA. 
This article will take you through the old world of RNNs and CNNs, revealing how the attention mechanism overcomes the challenges of long-range modeling and explores how this \u0026lsquo;relationship processing machine\u0026rsquo; spills over from the language domain to reshape our understanding of the world.\nRecently, I have been asked many times:\n\u0026ldquo;How does ChatGPT actually work?\u0026rdquo;\nEach time I want to answer seriously, but I don’t know where to start. Talking directly about neural networks is too abstract; discussing \u0026ldquo;large language models\u0026rdquo; feels like nonsense; when I mention Transformers, people usually nod politely and change the subject.\nSo, I decided to write this article.\nNot to provide you with a technical manual, but to discuss something I believe many people overlook: Transformers are not just a neural network architecture; they represent a leap in thinking.\nThe ChatGPT you are using, the Sora-generated videos you see, and the AlphaFold that unravels the mystery of protein folding—all these seemingly unrelated technological breakthroughs share the same name.\nIn 2017, Google published a paper titled \u0026ldquo;Attention Is All You Need.\u0026rdquo;\nThis single paper rewrote the entire AI landscape.\nUnderstanding it does not require you to write code or understand matrix operations. You only need to be willing to think clearly about one thing: Before the Transformer appeared, how did AI \u0026lsquo;read\u0026rsquo; the world? 
What did it do right that made everything different?\nThis article will follow this line: the dilemmas of the old world → the core of the attention mechanism → two different missions → from language to everything → the cost of revolution and the future.\nI will try not to make you feel like you are in class.\nThe Walls of the Old World: Before the Transformer To truly understand a revolution, one must first feel how constrained the old world was before it was overthrown.\nThe \u0026lsquo;Reader\u0026rsquo; with Amnesia Before the Transformer emerged, the most mainstream tool for processing language was RNN (Recurrent Neural Network).\nIt worked in a strictly word-by-word manner.\nImagine a reader who can only see one word at a time. After reading it, they carry the \u0026ldquo;memory of that word\u0026rdquo; to the next word and continue. After reading the second word, they carry the \u0026ldquo;memory of the first two words\u0026rdquo; to the third word\u0026hellip; and so on.\nSounds okay, right?\nThe problem is that this reader suffers from a peculiar form of short-term amnesia.\nBy the time they reach the 50th word of a passage, their memory of the first word has become blurred due to the subsequent 49 words\u0026rsquo; \u0026ldquo;overwriting and dilution.\u0026rdquo; This is technically known as gradient vanishing—the signal diminishes layer by layer in the long sequence transmission, like a game of telephone, until almost nothing remains.\nThis leads to a very practical problem: the model cannot establish \u0026ldquo;long-distance dependencies.\u0026rdquo;\nFor example, in the sentence: \u0026ldquo;The cat, which had been sitting by the window all afternoon, finally fell asleep.\u0026rdquo;\nThe word \u0026ldquo;fell\u0026rdquo; should grammatically and semantically correspond to the initial \u0026ldquo;cat.\u0026rdquo; But for RNNs, the connection between \u0026ldquo;cat\u0026rdquo; and \u0026ldquo;fell\u0026rdquo; is severed due to the many intervening 
words. The model can only guess the next word based on the most recent words, with almost zero grasp of the overall logic.\nAnother critical issue: sequential dependence prevents parallel processing.\nSince processing must be done word by word, the second word must wait for the first word to finish, the third word waits for the second\u0026hellip; The entire computation process is a serial pipeline. No matter how powerful the GPU is, it cannot process all words simultaneously; it is forced to queue.\nThis is why training RNNs on long texts is both slow and ineffective.\nResearchers tried to patch this with LSTM (Long Short-Term Memory networks), which let the model \u0026ldquo;decide what to remember and what to forget.\u0026rdquo; It helped, but it was not a fundamental solution. The serial architecture problem remained, and the ceiling for long-range modeling was still there.\nThe Inspector with a Fixed \u0026lsquo;Observation Window\u0026rsquo; Another technical route used CNN (Convolutional Neural Network) to process language.\nCNNs were originally powerful tools in the image domain. Their core operation involves using a fixed-size \u0026ldquo;convolutional kernel\u0026rdquo; that slides over the image like a scanner, capturing local features such as edges, textures, and shapes.\nTranslating this logic to language means sliding a fixed-size \u0026ldquo;window\u0026rdquo; over sentences to capture local word group relationships.\nBut the problem is obvious: this window is fixed, and the field of view is limited.\nWant to expand the \u0026ldquo;window\u0026rdquo; to see further word associations? You need to stack many layers, so the computational load grows with every added layer and the results become less stable. 
Worse, CNNs are inherently insensitive to positional order: they care about whether \u0026ldquo;there is a pattern in this area,\u0026rdquo; not about \u0026ldquo;the position of this word.\u0026rdquo;\nLanguage is inherently about order; the phrases \u0026ldquo;I owe you\u0026rdquo; and \u0026ldquo;you owe me\u0026rdquo; contain the same words but convey entirely different meanings based on order. CNNs struggle with such nuances.\nThus, until 2017, the entire field faced the same wall:\nInefficient serial computation and the inability to model long-range dependencies.\nIt wasn\u0026rsquo;t that no one tried; there was simply less and less room for improvement within this framework.\nThe Universe of Attention: What Transformers Did Right The title of that 2017 paper, \u0026ldquo;Attention Is All You Need,\u0026rdquo; still reads like a declaration today.\nIt suggests: all your previous efforts may have gone astray.\nAbandoning Order, Embracing the Global The most fundamental decision of the Transformer was to completely abandon the sequential structure of \u0026lsquo;word-by-word processing.\u0026rsquo;\nIt no longer lets the model read one word at a time; instead, it processes all words in a sentence simultaneously. Each word is no longer a \u0026ldquo;link in a relay chain\u0026rdquo; but a participant seated at the same table.\nHow radical is this change? To put it simply:\nThe RNN method is like making you read a book from the first page to the last, and after closing the book, answering questions based on memory.\nThe Transformer method is like laying the book open in front of you, allowing you to see all pages at once and then answer questions.\nWhich method is easier for understanding the overall structure and long-distance connections of the book? 
The answer is obvious.\nBut this raises a question: how do these simultaneously appearing words know which ones are more relevant to each other?\nThis is what the Self-Attention Mechanism aims to solve.\nSelf-Attention: The \u0026lsquo;Internal Seminar\u0026rsquo; of Each Word Let me explain self-attention with a scenario.\nImagine an internal seminar at a company with the theme of \u0026ldquo;Reinterpreting Everyone\u0026rsquo;s Role in the Team.\u0026rdquo;\nEach participant must do three things:\nPose their own questions (Q, Query): \u0026ldquo;What information do I need from this team to redefine myself?\u0026rdquo; Show their labels (K, Key): \u0026ldquo;What can I provide? What are my expertise labels?\u0026rdquo; Prepare their content (V, Value): \u0026ldquo;If someone thinks I am relevant to them, what specific content can they obtain from me?\u0026rdquo; Each participant takes their \u0026ldquo;question (Q)\u0026rdquo; and compares it with everyone else\u0026rsquo;s \u0026ldquo;labels (K)\u0026rdquo;: how well do your labels match my questions? 
Those with a high degree of relevance have a higher \u0026ldquo;weight\u0026rdquo; in my mind.\nFinally, each participant aggregates everyone\u0026rsquo;s \u0026ldquo;content (V)\u0026rdquo; according to the weights—those with higher weights contribute more, while those with lower weights contribute less—to obtain a new self-representation.\nThis new self-representation has integrated the contextual information of the entire team.\nTranslating this back to language processing: each word in a sentence computes its relevance with all other words, and then redefines its meaning in that context based on the strength of the relevance.\nThis is why Transformers can handle long-distance dependencies—\u0026ldquo;cat\u0026rdquo; and \u0026ldquo;fell,\u0026rdquo; which are separated by many words, can establish a direct connection through self-attention without needing to relay through all the intervening words.\nMulti-Head Attention: Simultaneously Hosting Multiple Seminars Once you understand self-attention, Multi-Head Attention is easy to grasp.\nSingle-instance self-attention involves everyone discussing the issue on the same dimension. 
But language is multi-dimensional: a sentence contains grammatical relationships, semantic associations, referential relationships, emotional tendencies, and more simultaneously.\nMulti-head attention operates by simultaneously hosting multiple seminars with different focuses.\nThe first seminar focuses on grammar, the second on semantics, the third on \u0026ldquo;who does \u0026lsquo;it\u0026rsquo; refer to here\u0026rdquo;\u0026hellip; Each seminar proceeds independently, and the conclusions from all sessions are combined to form a more nuanced and rich understanding of the sentence.\nThis is the meaning of \u0026ldquo;multi-head\u0026rdquo;—multiple attention \u0026ldquo;heads\u0026rdquo; capturing different dimensions of associations in parallel.\nPositional Encoding: Injecting Order into Thought But wait, there\u0026rsquo;s a question.\nSince all words are processed simultaneously, how does the model know that \u0026ldquo;the dog bites the man\u0026rdquo; and \u0026ldquo;the man bites the dog\u0026rdquo; are two different sentences?\nThe cost of parallel processing is the inherent loss of awareness of positional order.\nThe Transformer\u0026rsquo;s solution is called Positional Encoding.\nBefore sending each word into the model, it is assigned a \u0026ldquo;positional coordinate\u0026rdquo;—this is the 1st word, this is the 2nd word, this is the 17th word\u0026hellip; This positional information is encoded into a segment of numbers, which is added to the semantic information of the word before being sent into the model.\nThus, the model knows both \u0026ldquo;what this word means\u0026rdquo; and \u0026ldquo;what position this word is in.\u0026rdquo;\nThe difference between \u0026ldquo;dog\u0026rdquo; in the first position and \u0026ldquo;dog\u0026rdquo; in the third position is two different inputs for the model—despite being the same word.\nThe sense of order is thus \u0026ldquo;externally injected\u0026rdquo; into the parallel processing system.\nThe 
Foundation of the Structure: Residual Connections and Layer Normalization\nBeyond attention, Transformers rely on two engineering designs that allow the entire architecture to be \u0026ldquo;deep.\u0026rdquo;\nResidual Connections serve as a reminder: \u0026ldquo;No matter how complex your transformations are, don’t forget where you started.\u0026rdquo;\nAfter each layer finishes processing, its output is added directly to its input, ensuring that the original information is not lost in the transformations. This design allows gradients to flow smoothly back to earlier layers, which is key to Transformers being able to stack dozens or even hundreds of layers.\nLayer Normalization tidies up the data after each layer has processed it, keeping the distribution of values stable across layers so that some do not explode while others vanish. It makes training smoother and convergence faster.\nThese two designs are the foundation that lets the Transformer\u0026rsquo;s \u0026ldquo;thinking tower\u0026rdquo; be built high and stable.\nTwin Engines: The Philosophy of Encoder and Decoder\nUnderstanding the attention mechanism leads to the next question: how are these mechanisms combined to perform different tasks?\nThe answer lies in the two core components of the Transformer: Encoder and Decoder.\nEncoder: The Omniscient \u0026lsquo;Reviewer\u0026rsquo;\nThe task of the encoder is thorough understanding.\nGiven an input text, it uses bidirectional self-attention, looking at both the preceding and following context, to encode the meaning of the entire text into rich contextual vector representations.\nTo illustrate: the encoder is like someone who has completed an entire project and sits down with all the materials to review. 
They do not start outputting conclusions while reading the first document; they form a complete judgment about the meaning of each document only after reviewing all the materials.\nThis \u0026ldquo;omniscient\u0026rdquo; perspective makes the encoder very adept at tasks requiring deep understanding:\nWhat is the emotional tendency of this sentence?\nWhat word should fill in the blank in this text?\nAre these two sentences similar or contradictory?\nBERT is the representative model of the pure encoder architecture. With it, Google set new records on nearly every major NLP benchmark of the time, because it can truly \u0026ldquo;understand\u0026rdquo; the deep meaning of input texts.\nDecoder: The Strictly Causal \u0026lsquo;Impromptu Speaker\u0026rsquo;\nThe task of the decoder is sequence generation.\nHowever, it has a strict limitation: when generating the N-th word, it can only see the N-1 words already generated and must not peek at content that has not yet been generated.\nWhy this limitation? Because in a real generation scenario, the subsequent words do not yet exist: the model generates word by word, basing each step on the history already established. During training the full text is available, so allowing the model to \u0026ldquo;peek into the future\u0026rdquo; would be cheating.\nThis limitation is technically implemented through masked attention, which hides future words so that the model cannot see them.\nImagine an impromptu speaker wearing blinders who can see only to the left, toward what has already been said. They do not know what they will say next and can only proceed word by word based on what has already been said. Yet they can still tell a logically coherent story, because they are making the most reasonable next prediction at each step.\nThe GPT series is the representative of the pure decoder architecture. 
The logic behind ChatGPT is essentially that of a highly trained \u0026ldquo;next word predictor.\u0026rdquo; Each time it responds to you, it repeatedly asks itself: \u0026ldquo;Based on everything so far, what is the most reasonable next word?\u0026rdquo;\nEncoder-Decoder: The Professional \u0026lsquo;Translator\u0026rsquo; When the encoder and decoder are combined, they form the complete architecture originally described in the Transformer paper.\nThe workflow is as follows: the encoder first comprehends the entire source sequence (for example, a sentence in Chinese) and generates a complete understanding representation; then the decoder takes this representation and, under its guidance, generates the target sequence (for example, the corresponding English translation) one word at a time.\nDuring each generation step, the decoder not only looks at the words it has generated but also queries the encoder\u0026rsquo;s output through cross-attention: which information from the source sequence is most relevant to the current word being generated?\nThis is a truly meaningful \u0026ldquo;understand first, then express\u0026rdquo; process.\nT5 and BART are representatives of this architecture. 
They excel at tasks requiring \u0026ldquo;precise transformation\u0026rdquo;: machine translation, text summarization, question-answering systems\u0026hellip; first thoroughly understanding the source language, then accurately expressing it in the target language.\nParadigm Overflow: From Language to All Sequences At this point, the core logic of Transformers has been clarified.\nHowever, I believe what truly makes this architecture extraordinary is not how powerful it is in the language domain, but a deeper insight hidden behind it:\nThe essence of Transformers is processing \u0026ldquo;sequential relationships.\u0026rdquo; Mathematically, everything can be represented as a sequence.\nOnce you accept this perspective, its application boundaries begin to expand at an unexpected speed.\nImages: \u0026lsquo;Reading\u0026rsquo; a Photo as Text In 2020, Google proposed Vision Transformer (ViT), doing something that sounds a bit strange:\nIt cut an image into 16×16 pixel-sized patches, arranged these patches in order, and processed them using the exact same Transformer architecture.\nEach patch is like a \u0026ldquo;word.\u0026rdquo; The entire image becomes a \u0026ldquo;sentence.\u0026rdquo;\nAs a result, this approach outperformed the CNN architecture that had dominated the image domain for a decade in large-scale image classification tasks.\nThis is quite interesting—it\u0026rsquo;s not that CNNs are bad, but rather that the \u0026ldquo;attention\u0026rdquo; logic of Transformers is much more broadly applicable than we thought. 
It uses the same mathematical structure to process the semantic relationships between \u0026ldquo;dogs\u0026rdquo; and \u0026ldquo;cats\u0026rdquo; as it does to process the spatial relationships between the upper left and lower right corners of an image.\nProteins: Unraveling a Half-Century Biological Mystery\nI believe this is the most far-reaching case of Transformers spilling over from the language domain.\nProteins are composed of chains of amino acids. Given an amino acid sequence, what shape will it fold into in three-dimensional space? This shape determines the protein\u0026rsquo;s function and is the core basis for drug design and disease research.\nBiologists studied this question for 50 years without arriving at a reliable computational prediction method.\nAt its core, AlphaFold 2 treats the amino acid chain as a sequence and uses the attention mechanism of Transformers to learn the spatial relationships between amino acids: which pairs of amino acids sit close to each other in three-dimensional space, and which regions will form helical structures.\nFor many proteins, its prediction accuracy approaches that of experimental measurement.\nThe scientific community has called this breakthrough \u0026ldquo;one of the most significant biological advances in 50 years.\u0026rdquo;\nA mathematical framework originally designed for language translation has unraveled a half-century biological mystery. 
This alone is enough to leave one speechless for a moment.\nThe Bigger Picture\nToday, Transformers or their variants have appeared in code analysis, audio generation, video understanding, molecular design\u0026hellip; in almost every AI application field you can think of.\nI believe this is not just a story of \u0026ldquo;a technology being very useful.\u0026rdquo; It indicates that we may have found a sufficiently low-level mathematical language capable of describing the \u0026ldquo;relationship structures\u0026rdquo; within data of different modalities.\nLanguage is a relationship. Images are relationships. The spatial structure of proteins is a relationship.\nEverything is a relationship, and Transformers are precisely a machine for processing relationships.\nThe Cost of Revolution and the Dawn of the Future\nNo revolution comes without a price.\nTransformers have brought a paradigm shift but also two significant costs.\nData Hunger and Computational Black Holes\nData hunger.\nThe capabilities of Transformers come from pre-training on massive amounts of data. GPT-3 was trained on roughly 300 billion tokens, a substantial slice of the indexable internet text of its day.\nEven more concerning is that as the data scale increases, models exhibit what is known as \u0026ldquo;emergent abilities\u0026rdquo;: some new capabilities suddenly appear after reaching a certain scale threshold, rather than growing linearly. This means that to achieve qualitative change, you must first endure a massive quantitative change.\nThis itself is a monopolistic barrier. Only a few organizations globally can acquire, clean, and process internet-scale data.\nComputational black holes.\nTraining a model at the level of GPT-4 is estimated to cost over 100 million dollars, consuming enough electricity to power a small city for weeks.\n\u0026ldquo;Anyone can train a large model\u0026rdquo; is almost a joke under today\u0026rsquo;s Transformer architecture. 
The concentration of computational power is locking cutting-edge AI research behind the walls of a few super companies.\nArchitecture Evolves, Bottlenecks Loosen Fortunately, this field has never lacked smart people looking for solutions.\nMixture of Experts (MoE) architecture is currently the most mainstream direction for efficiency breakthroughs. The core idea is: do not let all parameters participate in every computation; instead, divide the model into many \u0026ldquo;expert groups,\u0026rdquo; activating only a few that are most relevant to the current task.\nDeepSeek V3 is a milestone case in this direction—using relatively fewer active parameters to support a total parameter count in the hundreds of billions, significantly reducing training costs.\nOptimizations of attention mechanisms are addressing another bottleneck: memory and computational overhead for long sequences. Standard self-attention has a computational complexity that grows quadratically with sequence length—doubling the sequence quadruples the computational load. Techniques like MLA (Multi-Head Latent Attention) and sliding window attention attempt to flatten this growth curve.\nThere are also more radical explorations of new architectures. Mamba and other state-space models (SSM) are trying to maintain the modeling capabilities of Transformers while reducing the complexity of processing long sequences to linear levels. 
Currently, their hybrid architectures with Transformers have shown promising potential in some tasks.\nAll these efforts point to the same goal: to make powerful models no longer just toys for a few.\nA Perspective Worth Considering I want to present a somewhat disruptive perspective here.\nMany of the AI application paradigms we discuss today—RAG (Retrieval-Augmented Generation), Agents, various tool invocation frameworks—what are they essentially?\nThey are compensating for the current limitations of model capabilities.\nRAG exists because the model\u0026rsquo;s context window is not large enough, and memory is not long enough; the Agent framework is needed because the model\u0026rsquo;s single-step reasoning ability is limited and requires breaking tasks into multiple steps; tool invocation is necessary because the model lacks real-time access to external information\u0026hellip;\nThis is not a criticism of these technologies—they are clever and necessary engineering solutions under today\u0026rsquo;s conditions.\nBut it implies that as the foundational capabilities of Transformers and their successors continue to enhance, the forms of these upper-level structures will continue to evolve, and some may even disappear.\nWhen the model\u0026rsquo;s context window expands to a sufficient length, and reasoning capabilities become strong enough to reach a certain threshold, many application paradigms we take for granted today may be rewritten.\nThis is not a bad thing. 
This is how the entire ecosystem rearranges itself after foundational capabilities improve.\nEpilogue: Understanding the Grammar of This Era\nIn 1687, Newton published his law of universal gravitation.\nFor the next two hundred years, whether calculating planetary orbits, designing bridges, or understanding tidal fluctuations, physicists used the same mathematical language, because it was sufficiently low-level to describe a wide range of phenomena.\nI sometimes wonder if Transformers are playing a similar role.\nNot because they are perfect, but because they touch upon something deeper: defining meaning dynamically through \u0026lsquo;relationship strength\u0026rsquo; and replacing \u0026lsquo;sequential memory\u0026rsquo; with \u0026lsquo;global associations.\u0026rsquo; This logic holds in language, in images, in protein folding, and in code.\nWhen an architecture can simultaneously understand language, images, protein folding, and musical rhythms, are we approaching a kind of unified intelligent grammar?\nI do not know the answer.\nBut I feel that in this era where AI is rewriting almost all industry rules, understanding what Transformers are doing should not be the exclusive domain of engineers.\nIt is the meta-model of our time.\nUnderstanding it is understanding the grammar of this era.\n","date":"2026-03-17T00:00:00Z","permalink":"/posts/note-795c82bd17/","title":"The Transformative Power of Transformers in AI"},{"content":"Introduction\nIn discussions about Claude Code, many have asked about the differences between skills and agents. Should prompts be placed in skills or made into agents? Skills execute specific tasks, while agents carry underlying thinking models and work modes. Understanding these differences is essential for building an efficient AI collaboration workspace.\nInitially, my approach was quite straightforward. What is the most notable feature of agents in Claude Code? 
They can operate independently without interfering with the current conversation or consuming context resources, and they can be linked to specific models. Therefore, I turned prompts like manuscript review into agents.\nHowever, as I delved deeper into usage and the upgrades of Claude Code, I realized this method was not the best choice for the following reasons:\nThis approach essentially still creates skills, and distributing these specific capabilities between skills and agents makes prompt management scattered and chaotic, complicating future maintenance.\nIf the only goal is to operate independently, it is unnecessary because Claude Code now supports enabling agent operation for skills or having agents actively mount certain skills during operation, perfectly addressing previous scenario needs.\nSo, when should you create a skill, and when should you write an agent?\nFundamental Differences Between Agents and Skills\nI have written extensively about skills, and many experts have shared their insights online, so most are likely familiar with them. In simple terms, a skill is a prompt fragment used to execute specific processes or solve particular problems (which can also include scripts).\nWhat about agents? What are they primarily used for?\nWhen we feel confused, the official documentation is the best guide. We can glean some insights from several built-in agents in Claude Code, including the following types:\nExcluding a few in the \u0026ldquo;Others\u0026rdquo; category, the Explore, Plan, and General-purpose agents have distinct characteristics. They are designed for broader scenarios and are supported by a mechanism to ensure generation quality.\nFor a more intuitive example, the Planning mode in Antigravity is similar to these agents in Claude Code. 
In Planning mode, the AI undergoes in-depth analysis and thought, plans tasks and execution steps, then executes according to the plan, ultimately providing users with feedback on changes, completing Task → Implementation → (user-requested) generated content → Walkthrough. This mechanism ensures high-quality delivery.\nThus, it is evident that agents do not carry specific execution techniques or routines but rather more fundamental thinking models and work modes. This is the fundamental difference between agents and skills.\nDesigning Agents Agents are not aimed at a specific problem but rather at a category of problems, providing process management for solving these types of issues.\nIn our work scenarios, this processing method is quite common. For instance, the PDCA model used in quality management, the pyramid model used in document writing, and the snowflake writing method used in story creation. The generation-review cycle I previously demonstrated and the popular multi-expert review model can also be solidified into a framework for problem-solving.\nWhy are these frameworks effective? Because they establish a method for \u0026ldquo;doing things,\u0026rdquo; making the processing steps clear and forming processes, rules, and standards. This avoids aimless attempts and uncontrolled delivery quality. With these methods in place, even a novice can achieve satisfactory results.\nWhen applied to human-machine collaboration, turning these frameworks into prompts becomes agents. Calling an agent is essentially selecting a thinking model or framework to enhance output quality.\nNow, is the design of agents clearer?\nThe writing techniques remain the same, focusing on defining roles, workflows, and read-write interactions. 
Agents emphasize ensuring result quality through good process control rather than being bogged down by minor detail techniques.\nConclusion The above is my understanding of the skill and agent mechanisms in Claude Code after a period of practice.\nAs Claude Code continues to upgrade and iterate, both skills and agents are becoming more refined. While flexibly building an \u0026ldquo;AI writing platform,\u0026rdquo; we can conduct more detailed control and configuration, with the usage scenarios of both becoming increasingly distinct.\nCurrently, the built-in agents in Claude Code do not fully align with the scenarios of web fiction creation. Issues like improving plot quality, addressing memory problems, and avoiding out-of-character (OOC) moments are challenges that writing agents cannot overlook.\nInterested individuals can brainstorm and experiment, and perhaps create something remarkable!\nFor official documentation on building agents in Claude Code, visit: https://code.claude.com/docs/zh-CN/sub-agents\n","date":"2026-02-23T00:00:00Z","permalink":"/posts/note-0c718c59da/","title":"Understanding the Differences Between Skills and Agents in Claude Code"},{"content":"\nThe Rapid Evolution of Coding Agents In the past year, the pace of change for coding agents has been so rapid that it is difficult to describe it merely as a \u0026ldquo;functional upgrade.\u0026rdquo;\nA year ago, agents were primarily focused on code completion and making minor adjustments in a conversational manner. Today, engineers at Cursor are running multiple agents in parallel, allowing them to autonomously modify, debug, and review code in the repository, with human oversight only at the final stage. 
Developers are no longer watching every step of the agent\u0026rsquo;s operations; instead, they are getting used to \u0026ldquo;waiting for it to finish before checking the results.\u0026rdquo;\nIn a recent interview, Cursor\u0026rsquo;s engineering lead, Jason Ginsberg, made a clear assertion: this is not a gradual optimization but a generational shift. More importantly, he believes this change will occur within the next three to six months. In his view, agents will not only become \u0026ldquo;smarter\u0026rdquo; but will genuinely take over longer, more complex engineering tasks, reshaping the entire industry\u0026rsquo;s workflow.\nA Year of Transformative Changes in Coding Agents Harrison Chase: Jason, can you briefly introduce yourself and explain what Cursor is?\nJason Ginsberg: Sure. I am currently working on an AI programming tool and have been with Cursor for six months as the engineering lead for this product. To be honest, most of my daily work still involves coding and design. Before joining Cursor, I worked on Notion Mail at Notion. A few years ago, I founded a company called Skiff, which was later acquired by Notion. So, I have been focused on product development, mainly in the productivity tools sector.\nHarrison Chase: That\u0026rsquo;s great. I have many topics to discuss with you. Let me start by asking about your views on the development of coding agents and the evolution of human-computer interaction models. You could be considered one of the pioneers in this field. I believe the development of coding agents has undergone several phases: from initial code auto-completion to conversational interactions integrated into IDEs, and now to various terminal tools and cloud-based asynchronous agents. How do you view this evolution?\nJason Ginsberg: I think the development of coding agents can indeed be described as \u0026ldquo;transformative,\u0026rdquo; and these changes have occurred in just over a year. 
As you mentioned, Cursor started out with code auto-completion, which primarily provided assistance on a line-by-line basis and was mostly limited to single files. Since then, we have had to elevate the product\u0026rsquo;s abstraction level almost every few months, which is a significant product design challenge. Clearly, the emergence of agents allows developers to switch flexibly between multiple files and confidently let agents autonomously complete code modifications.\nIn the past couple of months, I\u0026rsquo;ve noticed a new shift in the industry: developers can now fully trust agents from project initiation to completion and conduct batch reviews of multiple files in the codebase. Therefore, we had to significantly redesign the overall product layout, shifting the focus from line-by-line code comparison to a more code review-oriented approach.\nLooking ahead, our development focus will increasingly be on the collaborative operation of multiple agents. We need to enable quick validation of whether these agents are functioning correctly and allow them to work in parallel without being constrained by the various options and choices in the current single-dialogue mode.\nHarrison Chase: What are the core factors driving these changes? Is it simply the improved performance of large models, or are there other influencing factors?\nJason Ginsberg: I believe the improvement in large model performance is a key factor, as it allows developers to trust the quality of code generated by agents more. Previously, everyone had to conduct very thorough reviews of the code generated by agents.\nAdditionally, there are now more sophisticated code review tools. 
For example, we have BugBot, and there are many similar tools in the market that can automatically check for issues in the code.\nMoreover, I think the acceptance and confidence of developers in agent tools have been steadily increasing, to the point where they have become \u0026ldquo;addicted\u0026rdquo; to the convenience these tools offer. Once accustomed to relying entirely on agents for coding, switching back to traditional coding methods can be quite challenging. As a result, we are seeing more and more developers adopting agent-assisted programming as their default mode of operation.\nThe Secrets of Top Engineers: Relying on Agents? Harrison Chase: What differences have you observed in how people use Cursor? Or how do you personally use Cursor?\nJason Ginsberg: Internally, our engineers use Cursor in a variety of ways. There are even a few engineers on the team who do not use the agent features at all, such as those responsible for security and infrastructure. So, there is indeed a portion of users who heavily rely on the code auto-completion feature, with most of their operations based on that. Surprisingly, I have found that some of the top engineers on the team, whom we call \u0026ldquo;core users,\u0026rdquo; rely entirely on agents for their work and even run multiple agents in parallel to handle tasks.\nAs for my personal usage habits, I do not design complex prompts or have any so-called \u0026ldquo;agent usage secrets.\u0026rdquo; My prompts are often quite short and may even contain spelling errors. I start multiple agents simultaneously for different tasks or different modules of the same problem and then wait for their results.\nCurrently, the feature I use the most is a new debugging mode we just released today. In this mode, agents can generate logs for self-evaluation, and developers can reproduce the relevant operational steps. The agent will then check the logs to determine if the issue has been resolved. 
This feature is very practical because it allows for continuous attempts to solve problems through computational power, ultimately tackling those issues that are extremely difficult to troubleshoot manually.\nHarrison Chase: What is the debugging mode like? Why is there a need for a dedicated mode? Can\u0026rsquo;t debugging be done automatically? Can\u0026rsquo;t we just give the agent debugging instructions?\nJason Ginsberg: I actually agree with your point. So, during the development of the debugging mode, we had quite a bit of internal debate. The main reason is that Cursor already has many functional modes, such as planning mode, inquiry mode, etc., which are not easy for users to discover. We always believed that these modes are very practical, and ideally, the agent should automatically match and enable the most suitable mode based on the user\u0026rsquo;s operational context, without requiring manual switching.\nCurrently, the debugging mode needs to be manually activated because its interaction method is quite special. During operation, the agent pauses its current work to ask the user for feedback. If the user is not familiar with this interaction logic, it may be somewhat confusing.\nHarrison Chase: What kind of questions does the agent ask, and what kind of feedback does it require from the user?\nJason Ginsberg: Let me give you an example. Suppose I am developing a front-end application and encounter a frustrating issue: the menu always pops up in the top left corner. 
I would tell the agent, \u0026ldquo;This menu needs to be anchored to the button\u0026rsquo;s position.\u0026rdquo; Then, the agent would start the server and add a lot of logs throughout the codebase while proposing a series of hypotheses that could lead to the issue, such as \u0026ldquo;It might be a positioning parameter error\u0026rdquo; or \u0026ldquo;There might be an issue with the event binding logic.\u0026rdquo; After that, the agent would prompt me, \u0026ldquo;Please click this button to open the menu and see if the issue is resolved.\u0026rdquo; If I report that the issue still exists, the agent would check the generated logs and analyze, determining which hypotheses are valid. Usually, after two or three iterations of this process, the agent can identify and resolve the issue.\nHarrison Chase: How long do you think humans will still need to perform manual operations? Can\u0026rsquo;t the agent autonomously handle clicks and tests?\nJason Ginsberg: In one to two months, given the rapid pace of development in this industry.\nHarrison Chase: Earlier, you mentioned various modes of the agent, such as planning mode, inquiry mode, debugging mode, etc. What do these modes mean in practical application? Is it just about setting different prompts for the agent, or is there more complex logic behind them?\nJason Ginsberg: Many times, it is indeed just a matter of modifying system-level prompts. However, in some cases, we also need to make corresponding adjustments to the user interface. For example, the planning mode now also includes an interactive questioning feature that actively interrupts user operations during execution to seek feedback. Users can sometimes set parameters themselves, such as adjusting the frequency of agent interruptions. 
As for inquiry mode, it does not just rely on specific system prompts but also restricts the agent from calling certain file editing-related tools to ensure the stability and reliability of the functionality.\nHarrison Chase: Returning to the previous topic, regarding the different ways people use Cursor, do you think there is a so-called \u0026ldquo;best way\u0026rdquo; to use coding agents or Cursor in the future?\nJason Ginsberg: I don\u0026rsquo;t think there is a \u0026ldquo;best way.\u0026rdquo; The specific usage method largely depends on the individual engineer\u0026rsquo;s work habits and the specific tasks they are handling. Currently, there are both asynchronous applications of agents and modes where developers are deeply involved in real-time interactions, much like programming while visually adjusting code or conducting visual editing operations. However, I often see some so-called \u0026ldquo;agent usage tips\u0026rdquo; on Twitter, and I am somewhat skeptical about them. Many people claim, \u0026ldquo;This is the best way to use agents,\u0026rdquo; but in my opinion, these tips are often fabricated.\nInternally, our team does not use long, complex prompts or adopt multi-stage planning strategies. Most of the time, we iterate quickly. If the results of the agent\u0026rsquo;s operation are not satisfactory, we simply terminate the process and restart the agent. Typically, this method is the most efficient.\nIs Natural Conversation the Ultimate Interaction Mode for Cursor? Harrison Chase: If you were to predict the situation a year from now, how do you think developers will use Cursor across IDEs, terminals, and other forms?\nJason Ginsberg: Of course, I would have a certain subjective bias. But I believe terminal tools will not become the users\u0026rsquo; first choice. I think what truly drives industry development is the increasing trust users have in agents. 
They prefer to wait until the agent has completed all tasks before reviewing the final modifications and deciding whether to adopt them, and they are also willing to let the agent run longer to achieve smarter processing.\nThe importance of IDEs lies in the fact that they are tools tailored for the entire software development cycle. From project planning to running code modifications, reviewing code content, clearly comparing code differences, submitting code merge requests, and previewing effects in the browser, all these steps can be seamlessly integrated into the modular functionality of IDEs. This is something that can easily be overlooked, as these IDE features have been refined over decades of development.\nI believe a clear trend in the current industry is that product-level design is becoming increasingly important. Now, the most frequently used features by Cursor users, such as planning mode, actually require support from visual editors. Users need to be able to add comments in the editor and interact in real-time. Once detached from visual interactive elements like buttons, pop-ups, and menus, the difficulty of user interaction with tools increases significantly.\nHowever, I believe that not all operations in the future need to be confined to the IDE on a laptop. This mode will not be completely replaced; the specific usage scenarios will flexibly change based on actual needs, and the applicable scenarios will become broader. Users will be able to use tools like Cursor in more contexts.\nHarrison Chase: There will be more scenarios where tools like Cursor can be used. You must have a corresponding website, right? Can users interact directly on the web? Is that the idea?\nJason Ginsberg: Yes, we do have a website. The reason for this is that users can access it anytime and anywhere through devices like smartphones. 
I believe that in the near future, users will be able to wear AirPods, activate voice mode, and communicate in real-time with the agent, brainstorming ideas and allowing the agent to continuously optimize solutions. When users arrive at the office and open their laptops, they will already have a pile of code modification records or demo videos waiting for review, at which point they will only need to confirm or reject them. If some details need fine-tuning, they can download the project locally for modifications.\nHarrison Chase: I think Cursor\u0026rsquo;s real advantage lies in the comprehensive design and user experience system built around agent interaction. You previously worked at Notion, and I remember that even before the rise of generative AI, Notion\u0026rsquo;s design and user experience were already widely recognized. Of course, they have also successfully transformed in the era of generative AI. From a company with an excellent design foundation before the generative AI boom to one now focused on agent-related work, how do you think the emergence of agents has changed product design and user experience? Are the current work modes similar to those before?\nJason Ginsberg: Overall, I believe that most of our product design is not AI-exclusive. The interactive components and user experience patterns available for products are limited, and applications on the market are fundamentally built on some traditional models, such as inboxes, dashboards, and chat interfaces, which are all mature designs. Therefore, our core work is more about reasonably combining these existing design patterns and presenting them appropriately in the product. This is in line with Notion\u0026rsquo;s product philosophy and is also a core characteristic of Cursor and integrated development environments (IDEs): a high degree of modularity.\nAs a user, you will find that everyone’s IDE interface layout can vary significantly. 
You can customize the panel layout, dragging and dropping any component to any position, creating an interface completely different from that of the colleague sitting next to you. I believe this modular design is crucial for product adaptability because, as I mentioned earlier, the capabilities of agents are evolving rapidly, and user needs and expectations change almost every few weeks. When we launched Cursor 2.0 a few months ago, we did not completely overhaul the original product; we simply rearranged the various functional modules into a sidebar inbox-style management layout while optimizing the information density of the chat interface.\nHarrison Chase: It sounds like many components share underlying logic. Have any new components emerged? Or have the priorities of certain components changed? After all, these components were initially designed for \u0026ldquo;human-software interaction\u0026rdquo; and \u0026ldquo;human collaboration through software,\u0026rdquo; and now with the introduction of agents, has anything changed?\nJason Ginsberg: I believe the underlying design logic and core elements have not changed; the key change is who is leading the interface interaction. Within this core framework, countless interaction forms can evolve. For example, a year ago, when people used agents, they were eager to watch every step of the operation, closely monitoring everything. But now, the operational steps of agents have become incredibly complex, and users simply cannot keep up. Therefore, we need to optimize how information is presented: How should operational steps be grouped? How should key information be distilled?\nOnce users trust the agent\u0026rsquo;s operations enough, we need to focus on the actual content of file modifications and provide more detailed annotations for these modifications. Of course, we can further enhance the flexibility of interactions, such as allowing conversations not to be limited to a single agent but to engage with multiple agents simultaneously. 
This requires a more intelligent backend interaction logic to support it, where the system must recognize which sub-agent the user is conversing with and coordinate these agents to complete the corresponding modifications. In the future, this level of interaction abstraction will continue to rise.\nHarrison Chase: What do you think is the highest level of interaction abstraction that can be achieved? I know predicting the future is difficult, but I would still like to hear your thoughts.\nJason Ginsberg: I think in the future, various operational options we currently see, such as selecting models, choosing functional modes, and selecting operating environments, will gradually disappear. The final interaction mode will become as natural as conversing with a real person. However, this does not mean that anyone can write code casually; at that stage, this tool will still serve professional engineers. Because you still need to have a grasp of industry-specific terminology and understand what you want to modify. Product people need to clarify their desired workflows and functional requirements; infrastructure people need to have a solid understanding of the codebase and know what architecture and system design are most suitable for the project they are developing.\nI also want to emphasize that as the level of abstraction increases, we will not discard existing functionalities. Users can still dive deep into the details and adjust parameters at any time. The default interaction mode of the product will just continue to optimize and upgrade.\nInside Cursor: Less Code Review, More Frequent Feedback Harrison Chase: You previously mentioned the role of humans in the agent workflow, such as reviewing code differences and conducting code reviews. How do you think AI will change the code review process?\nJason Ginsberg: First of all, in terms of our product team\u0026rsquo;s workflow, the proportion of manual reviews has significantly decreased. 
We have a tool called BugBot that automatically detects code issues and autonomously completes fixes, continuously iterating and optimizing within the continuous integration (CI) process. This tool performs exceptionally well and has given us more confidence in the quality of AI-reviewed code.\nSecondly, there is semantic grouping of information. When users review code differences, they can clearly see what modifications the agent has made. We can even display the agent\u0026rsquo;s original instructions, and ideally, the agent could annotate each modification with explanations of why it was made when handling large code merge requests. While this may not be a revolutionary change, it can significantly optimize the code review process.\nHarrison Chase: Out of curiosity, I want to ask, do Cursor engineers write code using Cursor and have BugBot review the code? Do they still need to communicate and collaborate with other engineers?\nJason Ginsberg: Haha, that\u0026rsquo;s an interesting question. If you join Cursor as an engineer, you will immediately notice that everyone is deeply using our own product. I remember during my first week, I modified a shortcut setting. That shortcut was Alt+Shift+Command+J, which is quite obscure, and I thought no one would notice it. However, less than half a minute after I made the change, three colleagues messaged me on Slack, saying, \u0026ldquo;The shortcut you changed has disrupted my workflow! What happened?\u0026rdquo; Almost any product change receives immediate and strong feedback from colleagues. I think this is a good thing; everyone is rapidly advancing product iterations through this high-frequency feedback and communication.\nHarrison Chase: From an organizational management perspective, have you taken any measures to encourage or guide this high-frequency feedback collaboration model? 
After all, a large volume of feedback can sometimes be overwhelming.\nJason Ginsberg: Before I founded my own company, engineers would communicate via email, but it wasn\u0026rsquo;t used much. People even said, \u0026ldquo;Email is only for receiving spam and shopping notifications; don\u0026rsquo;t use it to send lengthy work content.\u0026rdquo; In the agent space, there is no need to rely on the inefficient communication method of email. Everyone on our team is fully engaged in their work, as this is a highly competitive field, and everyone is passionate about product development, naturally using various instant communication tools for collaboration.\nAdditionally, when planning product features, I follow a core principle: What features can I develop to make my daily work easier? Specifically, I think about \u0026ldquo;What can help me work more efficiently tomorrow without dealing with annoying errors and issues?\u0026rdquo; This principle guides most of our work. After all, once such features are developed, we can immediately benefit from them, like fixing an annoying bug so that we won\u0026rsquo;t be troubled by it again at work.\nThe Core Features Driven by Employees\u0026rsquo; Needs Harrison Chase: How much of your product roadmap is driven by the need to \u0026ldquo;make work easier for ourselves\u0026rdquo;? How much comes from external user needs? Has this proportion changed as the company has grown?\nJason Ginsberg: This proportion has indeed changed as the company has scaled. We now also set monthly product roadmaps and goals, but to be honest, many of our core features have come from bottom-up innovation. For example, the agent feature of Cursor is probably the core feature that comes to mind when people think of Cursor. This feature was developed by one of our team members, and initially, no one believed in the idea, but he quickly created a prototype. 
After everyone tried it, they were amazed, saying, \u0026ldquo;Wow, this thing really works!\u0026rdquo;\nThe debugging mode I mentioned earlier is similar. During the Thanksgiving holiday, I was bored and developed this feature that I needed, and now it is about to be launched. The initial intention behind developing these features was to address internal needs. We assess whether a feature is ready for release based on its internal usage rate and recognition.\nHarrison Chase: Your product iteration speed is astonishing. How do you maintain such an efficient development rhythm?\nJason Ginsberg: To be honest, our workflow is very streamlined, without too many cumbersome systems. While there are a few meeting rooms in the company and one or two product managers, we rarely advance work through writing documents or holding alignment meetings. Most discussions and decisions are made at the code level. The core reason this is possible is our extremely high talent requirements. Earlier this year, the company had only about 20 people. The reason for the slow growth in team size is that our hiring standards are almost harsh. We repeatedly evaluate: this person is excellent, but can they become one of the top people in the team?\nBecause everyone in the team is outstanding, we can confidently assign tasks to anyone. Team members are highly proactive, from proposing ideas and designing user experiences to responding to user support requests on Twitter, communicating requirements with enterprise clients, and ultimately implementing features. Therefore, our ability to maintain this speed ultimately comes down to the people.\nHarrison Chase: How do you plan your product roadmap? You mentioned a monthly planning cycle; is this the standard planning duration now? Is there any longer-term planning? Additionally, the pace of technological iteration in the industry is incredibly fast. 
How do you balance \u0026ldquo;keeping up with existing technology trends\u0026rdquo; and \u0026ldquo;achieving technological breakthroughs\u0026rdquo;? Do you actively anticipate technological trends and lay out future directions in advance?\nJason Ginsberg: We do invest a lot of energy in thinking about the future, such as anticipating potential technological breakthroughs in the next three months and proactively betting on related directions. The monthly roadmap we set is more focused on core product features, addressing actual user needs and those features that can optimize daily usage experiences. Major projects that require two months to reconstruct underlying logic will be included in longer-term planning.\nMoreover, our adaptability is quite strong. Sometimes we receive early access to test versions of new models, and after trying them out, if we find they perform exceptionally well in certain areas, team members often voluntarily work overtime on weekends to complete related feature development before the new model is officially released. Many important features can actually be built in just a few days.\nHarrison Chase: Speaking of models, you released your self-developed Composer model. What was the intention behind developing this model? How is user adoption currently? Has this model changed how people use Cursor?\nJason Ginsberg: We found that the coding scenarios in which engineers use our product require a model specifically tailored to support them. The Composer model is designed for these scenarios, with a clear focus on speed, quality, and intelligent logic, making it particularly suitable for \u0026ldquo;human-machine real-time collaboration\u0026rdquo; scenarios. I frequently use it in my front-end development because I need to make frequent subtle interaction design decisions, which requires the agent to provide feedback within seconds. 
Composer acts like an efficient collaborative partner, quickly responding to needs and brainstorming ideas, complementing models suitable for long-term asynchronous tasks very well.\nHarrison Chase: Is the research and development of Cursor\u0026rsquo;s agent-related work a team effort, or is there a dedicated team responsible for it?\nJason Ginsberg: We do have a dedicated team responsible for optimizing the performance of agents, focusing mainly on building toolchains, scheduling frameworks, and effect evaluations. However, as I mentioned earlier, our team structure is not rigid, and there are no strict limitations on everyone’s work scope. For instance, if engineers from the core product team need to make adjustments to the agent while developing the planning mode, they will closely collaborate with the agent team. Moreover, during the development process, we still deeply use our own products for testing, and team members share their experiences to evaluate the actual effectiveness of features.\nHarrison Chase: Do members of the agent team or other engineers skilled in agent development share any common traits? Are there any particular aspects of their professional background or personal abilities?\nJason Ginsberg: I think most of them are more product-oriented talents rather than traditional machine learning or algorithm research experts. These individuals often rotate between different teams because developing agents requires a strong intuition for the final user experience and the ability to accurately interpret team feedback.\nHarrison Chase: Last week, you collaborated with OpenAI to publish a blog about optimizing Cursor\u0026rsquo;s agent scheduling framework based on OpenAI\u0026rsquo;s new model. I often see discussions about the concept of \u0026ldquo;agent scheduling framework\u0026rdquo; on Twitter. How do you view the underlying support architecture for models? Does this architecture need to be deeply bound to specific models? 
For example, would the architecture for the Composer model differ significantly from that for the CodeLlama model?\nJason Ginsberg: I haven\u0026rsquo;t been deeply involved in this area of work, but to my knowledge, our core goal is to create a highly flexible architecture. After all, we need to continuously experiment with new technologies and functional modes, so the architecture must quickly adapt as model capabilities upgrade.\nHarrison Chase: That makes sense. The entire industry is indeed changing rapidly.\nOpen Q\u0026amp;A Questioner 1: Earlier, you mentioned the new visualization browser feature. I noticed that some tools like Lovable also have similar features. Is this feature developing towards \u0026ldquo;immersive visual coding\u0026rdquo;?\nJason Ginsberg: I don\u0026rsquo;t think it is designed for immersive visual coding. As I mentioned earlier, this feature was initially developed for myself, as I am a product engineer, and its core user group is actually professional engineers and designers. When developing applications, everyone has encountered situations where a carefully designed interface ends up becoming the same old purple-yellow gradient that everyone is tired of. This feature is intended to allow users to have precise control over details, such as adjusting padding to exact pixel values. It provides users with a more intuitive \u0026ldquo;visual operation language,\u0026rdquo; which is more precise than pure text commands.\nMoreover, even without using the sidebar, you can directly click on page elements and input prompts to issue commands at any time. With this feature, you can start six agents simultaneously in just a few seconds. If you enable hot reloading, your website will present modification effects in real-time, which is quite interesting to use.\nQuestioner 2: I particularly love your browser agent and have been using it. 
However, I noticed a small flaw: I want to continuously iterate and optimize design solutions, but the agent always interrupts my work by directly submitting code merge requests. Is there a possibility of achieving uninterrupted continuous iteration in the future?\nJason Ginsberg: Absolutely. The future direction is to enable the agent to have autonomous evaluation capabilities, allowing it to run continuously for extended periods and iterate based on needs. The current debugging mode still requires manual clicks to confirm log information, but this is just a transitional solution. The ideal state is for the agent to autonomously complete evaluations and iterations until the issue is fully resolved.\nQuestioner 3: I don\u0026rsquo;t know if you are deeply involved in the development of agent-related work, but I noticed that Cursor\u0026rsquo;s memory management feature is quite good. It can autonomously manage relevant information based on individual engineers, departments, and even the entire company\u0026rsquo;s preferences, rules, and processes. We all know that information and context are crucial for agents. Do you have plans to further expand and upgrade this feature? Especially regarding long-context processing, what ideas do you have?\nJason Ginsberg: We are conducting a lot of experiments and explorations. We have already implemented several functional modules such as rule management, memory recall, and skill libraries. Currently, we are primarily researching efficient information summarization techniques. Additionally, with our self-developed model, we are exploring ways to enable the model to autonomously identify key information that repeatedly appears in conversations or code. Of course, cross-organizational information sharing is also worth exploring. However, there is a point to note: relevant rules and information may become outdated with model iterations. 
Therefore, we must ensure that users can easily update this content to avoid being constrained by outdated rules.\nQuestioner 4: Regarding the Composer model you released, I know some developers who fine-tuned a specialized model for the medical field based on the Gemini model. However, they found that the fine-tuned model performed worse than directly using the native Gemini model for single prompt calls. They analyzed that the reason is that fine-tuned models require continuous maintenance to keep up with updates to foundational models like Gemini. How do you formulate strategies to ensure that the Composer model does not become outdated?\nJason Ginsberg: You are referring to the Composer model, right? We will continuously iterate and optimize it; it is not a static model. Our core focus is to find the best balance between speed and intelligence to meet Cursor users\u0026rsquo; needs in most scenarios. However, we do have room for improvement in specific areas like long-context processing.\nQuestioner 5: I am a product manager and have been using Cursor for prototype development, even playing the role of a designer in my team, using it to replace Figma. I am curious if there are users who, before using Cursor, had never installed any integrated development environment (IDE)? Will this group of users become a key focus for you in the future? After all, the current coding agents are already powerful enough to accomplish many tasks.\nJason Ginsberg: To be honest, we are not currently focusing on this group of users as a core target. Of course, we recognize that the usability of tools needs to be continuously improved, and the ease of use of Cursor is also steadily increasing, such as the new browser tool being friendly for designers. However, our core goal is actually to empower top engineers. We have been thinking about how to make the best engineers in the world even stronger. In this process, the tools we develop will naturally benefit a broader audience. 
However, we still have a lot of work to do in product optimization, such as improving onboarding and environment configuration processes. After all, designers and product managers often encounter difficulties when configuring tools like GitHub. We hope to attract more users to try Cursor by optimizing these aspects.\nQuestioner 6: I have been trying to use Cursor to build a verification matrix for smart contracts and test run logic. Do you have any lesser-known practical workflows to recommend for deep quality testing and security reinforcement? Or can the debugging tools mentioned earlier come in handy? I am particularly interested in the quality testing of smart contracts.\nJason Ginsberg: To be honest, we are trying to enable the agent to autonomously complete testing tasks, but this feature has not been fully released yet. For those involved in quality testing, I strongly recommend trying out our newly released debugging mode. This feature has a very clear logic for identifying issues, and it can be said to be deterministic, which will be very helpful.\nQuestioner 7: What do you think is the biggest opportunity for Cursor in the next two to four months? Will it be the voice agent?\nJason Ginsberg: I think the opportunity does not lie in the voice agent. The core need of users at this stage is actually to make agents smarter, run longer, and handle more tasks. Many current agents essentially only \u0026ldquo;read code\u0026rdquo; and cannot genuinely determine whether the modified code is effective. There is a vast space for future development; we can invest more computational power to allow agents to take on more of the verification work currently handled by humans. 
I believe that in the next three to six months, the entire industry will undergo significant changes, which is very exciting.\n","date":"2026-01-18T00:00:00Z","permalink":"/posts/note-34c83a9598/","title":"The Rapid Evolution of Coding Agents: Insights from Cursor's Jason Ginsberg"},{"content":"\nA few days ago, a story shared by AI researcher Elena went viral in the tech community.\nDespite reading papers and testing models daily, she had never written a line of code. One day, she faced a tough problem: 4000 lines of messy data needed cleaning. Manual processing would take at least 6 hours.\nShe decided to give it a try and described her needs to Claude in natural language. 45 seconds later, she received a Python script. Running it was a success. Six hours of work was compressed into just one minute.\nThe key point is that she didn’t understand the code at all, but it worked.\nThis is the essence of Vibe Coding: You don’t need to learn Python or JavaScript; you just need to learn how to clearly express your intentions.\n01. Core Misunderstanding: Vibe Coding is About Communication, Not Programming Most people fail when trying to use AI to write code because they fall into a trap: thinking AI is a god that can read your mind.\nConsider the difference between these two prompts:\n❌ Vague version: \u0026ldquo;Help me create an email tool.\u0026rdquo; ✅ Specific version: \u0026ldquo;Write a Python script that reads a CSV file. First, check if each row\u0026rsquo;s email format is correct; second, remove duplicate emails; third, output a new CSV file containing only valid and unique emails; finally, print a summary of \u0026rsquo;total processed, invalid count, duplicate count.'\u0026rdquo; In the first prompt, AI can only guess. It’s likely to guess wrong and give you a bunch of useless junk.\nIn the second prompt, AI doesn’t need to guess. 
It just needs to execute.\nThe first principle of Vibe Coding: Treat AI as a brilliant intern who knows nothing about your background.\nYou wouldn’t say to an intern, \u0026ldquo;Get that done,\u0026rdquo; you would tell them:\nWhat the input is (data source, format). What the logic is (what to do first, what to do next, how to handle exceptions). What the output is (where the results are stored, what they look like). You don’t need to understand technical jargon, but you need to write it out clearly like a handover document.\n02. Avoiding Pitfalls: Don’t Try to Eat the Whole Cake at Once Elena shared her most painful failure. She wanted to create a \u0026ldquo;Twitter bookmark analyzer\u0026rdquo; with comprehensive features: fetching, analyzing, categorizing, and generating weekly reports.\nThe first time, she simply told AI, \u0026ldquo;Help me make this analyzer.\u0026rdquo;\nWhat was the result? A disaster. AI threw a bunch of complex API calls, dependencies, and error messages at her. She struggled for 4 hours before giving up.\nA week later, she changed her strategy: Micro-Steps.\nStep one: \u0026ldquo;Write a script that fetches the latest 100 bookmarks.\u0026rdquo; — 10 minutes, successful. Step two: \u0026ldquo;Extract the text from each bookmark.\u0026rdquo; — 10 minutes, successful. Step three: \u0026ldquo;Categorize by keywords.\u0026rdquo; — Successful. Step four: \u0026ldquo;Save as a JSON file.\u0026rdquo; — Successful. In one hour, she accomplished what took her 4 hours before.\nThis aligns with the ultimate truth of software engineering: Don’t build the wheel before the car; first build a skateboard, then modify it into a bicycle, and finally into a car.\nThe secret of Vibe Coding is to let AI do one thing at a time, ensuring that it can run independently. As long as it runs, you get positive feedback and can continue stacking blocks.\n03. Why Choose Claude? 
Because It Understands \u0026ldquo;Questions\u0026rdquo; Elena tried all mainstream AIs (ChatGPT, Cursor, Copilot) and ultimately chose Claude. Not because its code quality is always the best, but because it doesn’t pretend to understand everything.\nChatGPT style: You give a vague instruction, and it confidently generates an answer. By the time you realize it’s wrong, you’ve wasted half an hour. Claude style: It pauses to ask you: \u0026ldquo;Just to confirm, do you want X or Y?\u0026rdquo; \u0026ldquo;I assume you need Z, right?\u0026rdquo; For those who don’t understand code, these clarifying questions are lifesavers. They save you countless hours of debugging the wrong direction.\nAdvanced tip: Regardless of which AI you use, you can force it to be \u0026ldquo;more like Claude\u0026rdquo; by adding to your prompts:\n\u0026ldquo;If anything is unclear, please confirm with me before proceeding.\u0026rdquo; \u0026ldquo;Before writing code, first describe your thoughts in natural language.\u0026rdquo;\n04. Just Small Tools? No, This is a Liberation of Productivity In six months, Elena, with no prior experience, built:\nA script to automatically organize bookmarks (saving 30 minutes daily) A bot to monitor posts from specific influencers A PDF batch summarizer (a researcher\u0026rsquo;s tool) A competitor price monitor A web data scraping tool A year ago, each of these needs would have cost hundreds or thousands of dollars to outsource or negotiate on Upwork. Now? One afternoon, a cup of coffee, and she handled it herself.\nThis is the value boundary of Vibe Coding:\nIt’s not suitable for: Core systems in finance and healthcare, large applications with thousands of concurrent users, or projects involving high security and privacy. It’s highly suitable for: Personal automation, internal efficiency tools, MVP prototype validation, and data processing scripts. 
For these 80% of long-tail needs, you no longer need a computer science degree; you just need a clear mind and a bit of patience.\n05. Don’t Wait, Start with the \u0026ldquo;Annoying\u0026rdquo; Tasks If you feel the urge to start coding, don’t think about creating a world-changing app.\nFind something annoying, repetitive, or frustrating in your daily life.\nIs it a messy downloads folder? Is it an Excel sheet that needs to be manually merged every month? Is it several websites you check daily? Open your AI tool and describe your needs as if you’re teaching a smart person.\nDid it throw an error the first time? Don’t panic. Send the error message back and ask, \u0026ldquo;What’s going on? I expected A, but got B.\u0026rdquo;\nThe gap between \u0026ldquo;I have an idea\u0026rdquo; and \u0026ldquo;I made it happen\u0026rdquo; has never been narrower than it is today.\nStop envying programmers and start commanding AI.\n","date":"2026-01-10T00:00:00Z","permalink":"/posts/note-3965ac6cb7/","title":"Unlocking Productivity with Vibe Coding: A Beginner's Guide"},{"content":"Global Trends in 2025 As 2025 draws to a close, nearly 20 countries have selected their annual words, reflecting the pulse and trends of the world this year. These words may resonate with your memories of the past year.\nTurmoil and Trade In 2025, the world is not at peace, as is evident in the annual words chosen by multiple countries.\nIn Singapore, the character \u0026ldquo;荡\u0026rdquo; (meaning \u0026ldquo;turbulence\u0026rdquo;) was selected as the annual Chinese character for 2025. 
According to an article by Lianhe Zaobao, \u0026ldquo;荡\u0026rdquo; received over 160,000 of the more than 430,000 votes cast. The character sums up the profound effects and turmoil caused by a series of actions from the Trump administration in the United States and reflects a sense of unease in today\u0026rsquo;s world.\nIn South Korea, 766 university professors selected the four-character idiom \u0026ldquo;变动不居\u0026rdquo; (meaning \u0026ldquo;constant change\u0026rdquo;) as the annual phrase for 2025. Professor Yang Il-moo from Seoul University explained that this idiom signifies the continuous flow and change in the world, reflecting the intense transformations experienced in South Korea, including presidential impeachment, political strife, and geopolitical tensions.\nMany citizens believe that the trade war initiated by the United States is one of the significant causes of global turmoil in 2025. The character \u0026ldquo;税\u0026rdquo; (meaning \u0026ldquo;tax\u0026rdquo;) was chosen as the annual Chinese character in Malaysia, while \u0026ldquo;tariff\u0026rdquo; was selected by the Spanish Royal Academy of Language and the Spanish Language Urgent Terms Foundation as the word of the year. The foundation noted that the tariffs imposed by the Trump administration have dominated international news for months and continue to do so.\nIn Switzerland, the Italian-speaking region also selected \u0026ldquo;tariff\u0026rdquo; as the annual word. The Malaysian committee chair, Wu Hengcan, stated that the choice of \u0026ldquo;税\u0026rdquo; reflects strong opposition from developing countries against hegemonic bullying.\nIn addition to tariffs, Finland\u0026rsquo;s Language Research Institute chose \u0026ldquo;drone wall\u0026rdquo; as the international buzzword of the year, reflecting local reactions to the geopolitical tensions stemming from the Russia-Ukraine conflict. 
The word \u0026ldquo;immigrant\u0026rdquo; ranked second in Portugal\u0026rsquo;s annual vocabulary list due to policy controversies surrounding immigration in European countries. Norway\u0026rsquo;s Language Council selected \u0026ldquo;tech oligarch\u0026rdquo; as the annual keyword, pointing to the digital sovereignty struggle in Europe and the United States.\nAI and Its Impact In 2025, artificial intelligence (AI) is empowering various industries at an unprecedented pace, changing lives worldwide. The term \u0026ldquo;artificial intelligence\u0026rdquo; appears on many countries\u0026rsquo; annual word lists.\nThe German Language Association named \u0026ldquo;AI era\u0026rdquo; as the word of the year, indicating that AI has moved from the ivory tower of scientific research into mainstream society. More people are using AI tools for tasks ranging from online searches to dynamic photo generation and text writing.\nThe Collins Dictionary in the UK selected \u0026ldquo;vibe coding\u0026rdquo; as the word of the year, illustrating the shift in programming from a professional skill to an expression of intent and highlighting AI\u0026rsquo;s impact on creativity and work methods.\nHowever, the explosive growth of AI brings both excitement and concern. Merriam-Webster and Australia\u0026rsquo;s Macquarie Dictionary independently chose \u0026ldquo;Slop\u0026rdquo; or \u0026ldquo;AI Slop\u0026rdquo; as the word of the year, referring to low-quality digital content typically generated in bulk by AI, or \u0026ldquo;AI garbage.\u0026rdquo;\nThe publisher of Merriam-Webster stated that absurd videos, distorted images, vulgar content, and misleading fake news generated by AI have flooded the internet, causing public distaste while being widely consumed and shared. 
The term \u0026ldquo;Slop\u0026rdquo; conveys that AI is sometimes not as \u0026ldquo;super-intelligent\u0026rdquo; as it seems when it comes to replacing human creativity.\nThe Finnish Language Research Institute summarized this phenomenon as \u0026ldquo;人工智能泥潭\u0026rdquo; (meaning \u0026ldquo;AI quagmire\u0026rdquo;) in its annual buzzwords.\nThe term \u0026ldquo;幻觉\u0026rdquo; (meaning \u0026ldquo;hallucination\u0026rdquo;) was selected by the renowned Dutch dictionary publisher Van Dale as the word of the year, referring to the false and absurd information that large models like ChatGPT sometimes generate in response to queries.\nIn the UK, media reports indicate that AI and online platforms are profoundly reshaping how people experience emotions and interact with each other. The Oxford Dictionary\u0026rsquo;s word of the year is \u0026ldquo;愤怒诱饵\u0026rdquo; (meaning \u0026ldquo;rage bait\u0026rdquo;), which refers to content deliberately designed to provoke strong emotions like anger to boost web traffic or social media engagement. The Cambridge Dictionary selected \u0026ldquo;准社交\u0026rdquo; (meaning \u0026ldquo;parasocial\u0026rdquo;), denoting a one-sided emotional connection that a person feels with a chatbot, a celebrity they have never met, or a character in a book or movie. In Romania, media outlets suggest that \u0026ldquo;准社交\u0026rdquo; has become a keyword reflecting the social reality of high social media usage, particularly among the youth.\nAnxiety and Economic Concerns As the world undergoes turmoil, distinguishing between truth and falsehood online becomes challenging. The annual words from various countries reflect the anxieties and concerns of ordinary people.\nOn December 12, the abbot of Kiyomizu Temple in Kyoto, Japan, wrote the character \u0026ldquo;熊\u0026rdquo; (meaning \u0026ldquo;bear\u0026rdquo;) to reveal the annual character reflecting the sentiments of Japanese society in 2025. 
This choice was made due to the phenomenon of bears appearing in various regions, with 230 people affected by bear attacks from April to November, marking a historical high. Media commentary suggests that the bear incidents have caused anxiety in affected areas, while political issues have also left many Japanese citizens worried.\nIn Japan\u0026rsquo;s selection, the character \u0026ldquo;米\u0026rdquo; (meaning \u0026ldquo;rice\u0026rdquo;) narrowly ranked second, followed by \u0026ldquo;高\u0026rdquo; (meaning \u0026ldquo;high\u0026rdquo;). Japanese media report that these characters reflect the rising cost of living and the depreciation of the yen, which have led to a wave of price increases affecting the daily lives of the Japanese people.\nIn Portugal, the public voting event selected \u0026ldquo;大停电\u0026rdquo; (meaning \u0026ldquo;major blackout\u0026rdquo;) as the word of the year. On April 28, a widespread power outage occurred in Portugal and Spain, disrupting transportation, communication, and public services for hours. The publisher noted that the choice of \u0026ldquo;大停电\u0026rdquo; reflects a deeper concern about modern life’s heavy reliance on technology.\nLife is challenging, and anxiety follows. 
The term \u0026ldquo;焦虑\u0026rdquo; (meaning \u0026ldquo;anxiety\u0026rdquo;) ranked first in an online vote initiated by the Russian Reading-City website, reflecting that in this turbulent era, anxiety has become a fundamental aspect of life, highlighting uncertainties about the future.\nThe term \u0026ldquo;不确定性\u0026rdquo; (meaning \u0026ldquo;uncertainty\u0026rdquo;) was chosen by the Brazilian polling agency Kaws and IDEIA Big Data, indicating that rapid changes in economy and technology, along with geopolitical friction and domestic governance issues, have made Brazilians feel that 2025 is filled with challenges, impacting daily life and personal decisions.\nResilience and Trust How can people respond to this uncertain world?\nThe character \u0026ldquo;韧\u0026rdquo; (meaning \u0026ldquo;resilience\u0026rdquo;) was selected as the annual word in the \u0026ldquo;Chinese Language Review 2025\u0026rdquo; event organized by the National Language Resources Monitoring and Research Center, Commercial Press, and Xinhua News Agency. The character encapsulates the essence of resilience, representing steadfastness, determination, and the spirit of perseverance in the face of difficulties.\nIn Russia, the State Pushkin Russian Language Institute named \u0026ldquo;胜利\u0026rdquo; (meaning \u0026ldquo;victory\u0026rdquo;) as the top buzzword, commemorating the 80th anniversary of the Soviet Union\u0026rsquo;s victory in the Great Patriotic War. South Africa\u0026rsquo;s Pan South African Language Board announced \u0026ldquo;G20 Summit\u0026rdquo; as the annual buzzword, highlighting an international conference marked by African influence and the promotion of multilateralism.\nThe Treccani Encyclopedia Institute in Italy selected \u0026ldquo;信任\u0026rdquo; (meaning \u0026ldquo;trust\u0026rdquo;) as the word of the year, reflecting people\u0026rsquo;s hopes for the future in this uncertain era. 
Trust can prevent polarization and \u0026ldquo;bind together\u0026rdquo; an increasingly divided society, guiding people out of the quagmire of uncertainty. The most complex and powerful algorithm for human survival remains unchanged: mutual trust.\nNotably, the Chinese toy \u0026ldquo;Labubu\u0026rdquo; (\u0026ldquo;拉布布\u0026rdquo;) was included in the annual word selections by the Finnish Language Research Institute and ranked among the cultural buzzwords in a joint selection by several authoritative institutions in Russia. Russian media noted that the \u0026ldquo;Labubu\u0026rdquo; toy has gained popularity through social media. The Finnish Language Research Institute highlighted the toy\u0026rsquo;s distinctive feature: its wide smile.\nFacing challenges with resilience, treating friends with trust, and smiling at the future: which word would you use to describe 2025 as the year draws to a close?\n","date":"2025-12-30T00:00:00Z","permalink":"/posts/note-c1d3c18a2a/","title":"Words of the Year 2025 Reflect Global Trends and Concerns"},{"content":"Programming as the Connector Between Humans and AI Programming serves as a bridge for human interaction with AI. In the commercialization of various generative AI applications, programming stands out due to its highly structured nature, verifiable outcomes, and users\u0026rsquo; strong willingness to pay, making it an ideal sector for commercial deployment. For a long time, Anthropic\u0026rsquo;s Claude has dominated this market with its powerful programming capabilities.\nHowever, on September 5, 2025, Anthropic announced a ban on providing Claude services to companies or subsidiaries with more than 50% Chinese capital, describing the countries concerned as hostile. This decision directly impacts certain Chinese-funded subsidiaries in Singapore and Hong Kong.\nIn light of this ban, many domestic AI model companies have recognized a significant opportunity for domestic alternatives. 
At the recent Alibaba Yunqi Conference, Alibaba unveiled seven large models, particularly highlighting the upgraded flagship model Qwen3-Max, which has improved its capabilities and currently ranks third in programming ability on LMArena.\nAlibaba\u0026rsquo;s technical experts elaborated on their strategic judgment regarding AI programming: due to the verifiable nature of code, it is seen as a field that can achieve general artificial intelligence (AGI) first. Consequently, Alibaba\u0026rsquo;s ultimate goal is not merely to create a \u0026ldquo;code assistant\u0026rdquo; but to develop an \u0026ldquo;autonomous programming agent\u0026rdquo; that can independently complete complex tasks like a human engineer.\nThe smaller players, often referred to as the \u0026ldquo;Six Little Dragons,\u0026rdquo; have also found a rare opportunity for commercialization, with Kimi being a prime example. On the same day the ban was announced, Kimi K2 released an update to enhance performance and subsequently announced a limited-time half-price for its high-speed API.\nAfter Anthropic\u0026rsquo;s ban on China, Kimi K2, according to the latest ratings from the globally recognized AI programming platform Roo Code, is not only the highest-ranked open-source model but also the fastest and cheapest among the top ten models.\nRoo Code rated K2 as the highest-scoring open-source model.\nCompetitors like SenseTime and JD Cloud are also keenly watching the situation, quickly launching developer migration plans. Zhiyuan, another member of the Six Little Dragons, was quick to offer a one-click migration service and later introduced the GLM Coding Max version tailored for high-frequency developers on September 22, along with promotional activities.\nThe Starting Gun for Domestic Alternatives For many domestic AI companies, Anthropic\u0026rsquo;s ban serves as a starting gun, igniting a race to seize market opportunities. 
On the day of the ban, Kimi K2 released updates that improved compatibility, output speed, programming capabilities, and context length. In the following days, Kimi announced a limited-time half-price discount on its high-speed API, clearly aiming to attract Claude users.\nOther domestic manufacturers quickly followed suit:\nZhiyuan AI announced a one-click migration service for Claude API users and offered new users 20 million tokens for free. It also created a monthly subscription package for developers using GLM-4.5 for coding, priced at only one-seventh of Claude\u0026rsquo;s cost.\nSenseTime\u0026rsquo;s \u0026ldquo;Riri New SenseNova\u0026rdquo; provided rapid switching services for former Claude users, along with a 50 million token experience package and dedicated consultants and training for API migration.\nJD Cloud officially stated it would integrate Claude Code into its JoyBuilder large model service and provide intelligent programming solutions with JoyCode + JoyBuilder to help developers transition smoothly.\nIn contrast, traditional internet giants have shown a somewhat ambiguous attitude towards replacing Claude with Qwen. 
An Alibaba Cloud employee mentioned to Observer Network that \u0026ldquo;the domestic usage of Claude is low, and there are currently no plans for this.\u0026rdquo;\nBesides a sense that the market is too small, another possible reason for the giants\u0026rsquo; low-profile response is that they have widely used Claude technology in their overseas deployments.\nByteDance\u0026rsquo;s AI code editor Trae, which has separate domestic and international versions much like Douyin and TikTok, has already discontinued Claude in its domestic version, but the international version still promotes Claude as a selling point and now faces the risk of technology supply disruption.\nTrae is operated by SPRING, a ByteDance subsidiary in Singapore, which provides OpenAI\u0026rsquo;s GPT and Anthropic\u0026rsquo;s Claude models to users. Although this corporate structure has helped it navigate geopolitical and data review risks, Trae has received numerous refund inquiries following the ban announcement.\nIn response, Trae\u0026rsquo;s administrator stated on the official Discord that Claude is still available and urged users \u0026ldquo;not to consider refunds for now.\u0026rdquo;\nOther products like Alibaba\u0026rsquo;s Qcoder and Tencent\u0026rsquo;s CodeBuddy have also promoted the use of Claude in their overseas offerings and now face the same risk of technology supply disruption.\nAnthropic\u0026rsquo;s statement explicitly targets entities with more than 50% Chinese capital, but there is no unified consensus on how to determine whether an entity crosses that threshold. 
The ambiguity surrounding Claude\u0026rsquo;s monopoly and the time costs and legal uncertainties involved in seeking rights protection loom over all Chinese-funded enterprises.\nThis means that Anthropic\u0026rsquo;s ban not only provides an opportunity for domestic large model companies to showcase their capabilities but also prompts many domestic developers, overseas Chinese-funded enterprises, and even foreign developers to reassess their technological routes.\nA Counterattack from Kimi Earlier this year, Kimi faced a challenging period when it lost its spotlight to DeepSeek. However, more than six months later, despite significantly reducing its investment, Kimi managed to maintain its user base amidst intense competition from DeepSeek, internet giants, and smaller players.\nThis resilience can be attributed to the release of Kimi K2 in July, which marked a profound transformation in its path.\nIn March, prominent investor Zhu Xiaohu publicly questioned Kimi\u0026rsquo;s commercial viability, stating, \u0026ldquo;Yang Zhilin can do research, but I don\u0026rsquo;t know how he will commercialize it. Kimi is leading in domestic large models, but in the long run, it must prove its value, at least to catch up with American open-source models. 
If it can surpass open-source, the team will truly have value.\u0026rdquo;\nThis public skepticism from a top investor cast a significant shadow over Kimi\u0026rsquo;s future and accurately predicted the challenges it would need to overcome in the following months.\nIn addition to the challenges posed by DeepSeek, the AI landscape in 2025 has become increasingly competitive, with Tencent entering the fray and leveraging its WeChat ecosystem, Alibaba embedding the Qwen model into Quark and DingTalk, and ByteDance\u0026rsquo;s Doubao maintaining stability through Douyin traffic and aggressive user acquisition.\nThis year, the frequency of product releases among AI companies has noticeably increased, with Kunlun Wanwei even launching six models within a week.\nIn contrast to its peers, Kimi has adopted a more low-key approach. This silence was broken in July when Kimi unexpectedly launched its latest model, K2.\nK2 is a model with 1 trillion parameters and 384 experts, making it the world\u0026rsquo;s first open-source model to reach this parameter count. 
Its design significantly lowers deployment barriers, focusing on coding and general intelligence capabilities, fully open-source, and compatible with OpenAI and Anthropic API formats, clearly targeting Claude.\nIn terms of performance, K2 achieved state-of-the-art results among open-source models and matched the levels of top closed-source models, establishing itself in the first tier of the overall large model competition.\nIn practical applications, K2 has also delivered satisfactory results for users and industry professionals.\nSeveral programmers and AI practitioners have expressed to Observer Network that from the 2025 perspective, there are virtually only two choices for AI coding products: either use Claude 3.7/4.0 from Anthropic or Google\u0026rsquo;s Gemini 2.5 Pro/Gemini Cli, while K2 has already matched these performances and even outperformed them in certain cases.\nEven though Kimi is not a reasoning model, it has demonstrated its improved capabilities on common sense problems that once stumped large models, providing correct answers to questions like which is larger, 6.9 or 6.11, or how many \u0026lsquo;r\u0026rsquo;s are in \u0026lsquo;strawberry\u0026rsquo;, as well as generating 183 instances of the character \u0026lsquo;哈\u0026rsquo;.\nJust months after its release, Kimi K2 has effectively answered Zhu Xiaohu\u0026rsquo;s three \u0026ldquo;soul-searching questions\u0026rdquo;: in terms of technology, K2 has not only \u0026ldquo;caught up\u0026rdquo; but even \u0026ldquo;surpassed\u0026rdquo; American open-source models in several dimensions, as evidenced by its top ranking on Roo Code; in terms of commercialization, Kimi has shifted from a vague C-end tipping model to a clearer commercial path focused on high-value, long-chain tasks.\nThe launch of K2 and its commitment to open-source mark a fundamental shift in Kimi\u0026rsquo;s corporate strategy.\nIn November last year, Kimi\u0026rsquo;s founder Yang Zhilin explained why Kimi chose to invest 
heavily in marketing. He believed that Kimi\u0026rsquo;s core task was to ensure retention and growth since technology would continue to iterate while API prices would fluctuate, but customer acquisition costs would only rise. By investing early to solve customer acquisition issues, Kimi could not only build user loyalty but also leverage user data to create a positive feedback loop.\nFrom a purely competitive perspective in the chatbot space, Yang Zhilin\u0026rsquo;s strategy seemed sound. However, with the emergence of DeepSeek at the end of January this year, the entire market landscape was rapidly disrupted.\nAs the previous model of buying users, having them use the model, and then training the model became unsustainable, Kimi decisively pivoted towards open-source, embarking on a path of ecosystem building.\nRegarding the rationale for choosing open-source, a Kimi researcher candidly stated, \u0026ldquo;Open-source is primarily about gaining reputation. If it were a closed-source model, it would not have the current level of attention and discussion.\u0026rdquo;\nHowever, the true purpose of open-sourcing extends beyond this; it allows for leveraging community power to enhance the technical ecosystem, and open-source implies higher technical standards, compelling us to produce better models, aligning with the goal of AGI.\nOnce a model is open-sourced, it signifies that the model must demonstrate sufficiently general capabilities, enabling third parties to easily verify and replicate it, rather than relying on so-called special tuning to embellish scores.\nThis strategic shift also carries commercial considerations.\nCurrently, the three most easily commercializable directions for AI are ChatBot subscriptions, AI-generated images/videos, and AI programming.\nFor Chinese users, it is almost inconceivable to expect widespread payment for AI chat, as chatbots serve merely as a traffic and data entry point for AI.\nKimi has attempted commercialization in the 
past; in May 2024, it launched a tipping feature ranging from 5.2 to 399 yuan. Recently, there have been rumors that Kimi will soon introduce a membership subscription for its Agent feature.\nFormer tipping users showcase Kimi membership benefits.\nIn terms of AI-generated images/videos, Kimi has released no updates since launching two products in limited grayscale testing, indicating that this is not a strategic focus. Therefore, emphasizing programming is a choice that plays to its strengths and has a viable business model.\nTsinghua University graduate and OpenAI researcher Yao Shunyu recently expressed optimism about this sector: \u0026ldquo;I have been thinking since 2022: why is no one working on Coding Agents, which is clearly very important?\u0026rdquo;\nHe stated, \u0026ldquo;Coding is the best tool for connecting humans and AI, just like a hand. With a hand, one can pick up tools like hammers and scissors to accomplish various tasks. Hence, models are now focusing on coding.\u0026rdquo;\nYang Zhilin, also from Tsinghua, has not said so publicly, but his past statements and experience show a consistent line of strategic thinking.\nWhile everyone in 2023 was pursuing general capabilities and aiming for a broad scope, Yang Zhilin stated clearly in interviews that \u0026ldquo;we prioritize 200,000 words of context over competing on general rankings.\u0026rdquo;\nThe design and development philosophy of the Kimi K2 model aligns closely with the direction of Coding Agents.\nAnother core advantage of entering this sector is occupying the ecological niche of domestic alternatives, positioning itself as \u0026ldquo;China\u0026rsquo;s Anthropic\u0026rdquo; to capture the market left by Claude.\nAs a purely domestic model, Kimi faces no compliance or filing issues. 
Being an early player in this sector, if it can establish an industry ecosystem, even if other open-source models enter the fray, the sunk costs associated with the ecosystem will serve as Kimi\u0026rsquo;s potential moat.\nNot Just Kimi: The Code Gamble of Giants and Unicorns Of course, Kimi is not the only player targeting the strategic high ground of coding. In fact, this has become a battleground for leading domestic large model manufacturers.\nTake Zhiyuan as an example; its approach is particularly noteworthy. As an AI company originating from Tsinghua with a strong national team background, expectations may lean towards a relatively conservative route.\nHowever, Zhiyuan\u0026rsquo;s posture in market competition has been unexpectedly aggressive. Its latest \u0026ldquo;GLM Coding Plan\u0026rdquo; aims to build an extremely open and compatible coding ecosystem. In addition to supporting Claude Code, it has added compatibility with various mainstream AI programming tools such as Roo Code, Cline, and Kilo Code, covering all major IDE environments.\nThis \u0026ldquo;broad net\u0026rdquo; platform strategy, combined with a minimum monthly payment of 20 yuan and promotional incentives, has sparked an intense price war in the large model sector.\nThis seemingly \u0026ldquo;cost-agnostic\u0026rdquo; investment clearly indicates Zhiyuan\u0026rsquo;s ambition: it aims not only to match international top models in technology but also to capture developer mindshare and market share through the most grounded approach, regardless of the cost.\nLow-cost customer acquisition does not imply that Zhiyuan lags in technical strength; rather, it reflects that domestic large model technology has generally reached a globally leading level. 
GLM-4.5\u0026rsquo;s ability to solve practical problems at one-seventh the price is already close to that of Claude Sonnet 4.\nUnder the CC-Bench evaluation system, domestic open-source models are nearing parity with top models.\nIn multiple open-source evaluations following the release of GLM-4.5, it has maintained competitive parity with international mainstream models, ranking second in the WebDev Arena alongside leading global models, and outperforming Gemini-2.5-Pro and GPT-4.1 in SWE-bench Verified performance. In CC-bench evaluations, Zhiyuan, DeepSeek, and Kimi K2 models have had mixed results, with Qwen-Coder holding a certain advantage.\nNotably, this does not imply that Alibaba is falling behind in the AI programming field; it merely indicates that domestic competition in this sector is intensifying.\nOn September 24, Alibaba made a high-profile announcement at the Yunqi Conference, unveiling significant upgrades to Qwen3-Coder.\nFor a giant like Alibaba, the AI programming sector, which may seem vertical, has garnered unwavering strategic investment. The fundamental reason is that Alibaba understands that developers are the cornerstone and lifeblood of its cloud business.\nDuring the technical sharing at the Yunqi Conference, algorithm scientists from Tongyi Laboratory further elaborated on their profound understanding of AI programming: they believe that code is the core tool for human interaction with the digital world, and AI programming, due to its verifiable nature, will be the first field to achieve general artificial intelligence (AGI). 
Based on this judgment, Alibaba clearly divides the evolution of AI programming into three stages: from initial code completion to the current code assistant, ultimately advancing towards the ultimate goal of creating an \u0026ldquo;autonomous programming agent\u0026rdquo; capable of independently completing complex tasks like a human engineer.\nTo achieve this ultimate goal, Alibaba\u0026rsquo;s technical route is exceptionally clear: first, inject vast amounts of high-density code data (up to 75 trillion tokens) into the model during the pre-training stage to provide strong code \u0026ldquo;memory\u0026rdquo;; second, treat ultra-long context as key to ensure the model can handle entire code repositories; finally, through reinforcement learning, mimic human learning from debugging errors to continuously enhance the model\u0026rsquo;s limits. Behind all this is Alibaba\u0026rsquo;s massive training infrastructure, built on Alibaba Cloud, capable of instantly launching thousands of virtual environments, providing a \u0026ldquo;Colosseum\u0026rdquo; for the evolution of AI agents.\nThus, the upgrades to Qwen3-Coder—faster inference, higher security, and a 256K context window—are all reflections of this grand strategy. Its open-source version saw a 1474% surge in usage on the OpenRouter platform, further validating the success of this strategy.\nSimilarly, the Qwen3-Max, released at the Yunqi Conference, as Alibaba\u0026rsquo;s latest closed-source flagship model, achieved high scores in real-world problem-solving tests like SWE-Bench. 
This clearly demonstrates Alibaba\u0026rsquo;s \u0026ldquo;combination punch\u0026rdquo;: using top open-source models to attract the broadest developer base while employing the strongest closed-source models to serve the highest-value enterprise customers, ultimately transforming investments in AI programming into a growth engine for its entire cloud empire.\nWhether it\u0026rsquo;s the rigid demand for programmers to simplify repetitive tasks or the impending surge in programming needs in the era of low-code or no-code, all point to the same future: programming is becoming the \u0026ldquo;universal language\u0026rdquo; of the AI era. Positioning in the programming sector is not merely about choosing a vertical; it is about becoming the infrastructure and operating system for the next generation of AI-native applications—a strategic high ground where the winner takes all.\nA Historic Opportunity, but Also a Historic Challenge Anthropic\u0026rsquo;s ban inadvertently creates a historic opportunity for the development of AI technology in China.\nHowever, winning this counteroffensive may just be the first step in a long journey. The road ahead for all players is not smooth but rather filled with the more brutal \u0026ldquo;scorched earth war.\u0026rdquo;\nThe first challenge is the infrastructure gap between \u0026ldquo;stunning and stable,\u0026rdquo; which is a lifeline in the B2B market. In the early days of Kimi K2\u0026rsquo;s launch, a surge in traffic caused server congestion and delays. While this may be tolerable for C-end users, it poses a fatal flaw for enterprise-level services. In the 2025 AI competition, model performance and stability are equally important. Competitors—whether financially robust internet giants or equally fierce players like Zhiyuan—are closely watching. 
Every company must prove that it can not only produce \u0026ldquo;bombshells\u0026rdquo; but also provide reliable infrastructure services akin to utilities, which tests the limits of supply chains, engineering capabilities, and massive capital.\nThe second challenge is that the commercialization path after open-sourcing is far more perilous than imagined. Companies like Kimi, Zhiyuan, and DeepSeek, which represent the open-source route, have earned their reputation and entry ticket to the ecosystem, but they have also made their sharpest weapons public.\nFor commercialization, this implies a brutal \u0026ldquo;self-inflicted battle.\u0026rdquo; The official API services of these open-source model companies must contend not only with direct competitors engaged in a price war but also with a more formidable enemy: cloud companies that \u0026ldquo;modify\u0026rdquo; and package their open-source models at lower prices. Alibaba Cloud and Tencent Cloud can easily offer any popular open-source model as a loss leader to capture market share, effectively \u0026ldquo;intercepting\u0026rdquo; customers.\nNotably, after the release of Kimi K2, major AI and cloud platforms worldwide have deployed this model, with Perplexity\u0026rsquo;s CEO stating on social media that the company might utilize K2 for post-training due to its excellent performance.\nThus, all open-source players must establish a sufficiently deep moat around their official APIs and Agent functions, whether through extreme performance optimization, unique features, or a robust solution ecosystem, and must do so faster than all the \u0026ldquo;free riders.\u0026rdquo; Otherwise, the model\u0026rsquo;s advancement may ultimately only serve to benefit others, leaving them trapped in the quagmire of \u0026ldquo;getting applause but not profits\u0026rdquo; in commercialization.\nUnlike DeepSeek, which is backed by High-Flyer Quant and can afford to burn cash, or the internet giants with strong cloud business 
support, most AI unicorn companies, from the perspective of self-sustainability or accountability to investors, cannot afford to endlessly invest in ecosystems. Finding a balance between technological faith and commercial reality is the most severe test facing these star startups.\nNevertheless, during the window period created by Claude\u0026rsquo;s ban, both AI unicorns and their investors, as well as developers and enterprises needing domestic compliant alternatives, can breathe a sigh of relief. The collective rise of domestic large models at least proves that Chinese AI has the capability to deliver \u0026ldquo;bombshell-level\u0026rdquo; products at critical moments. However, whether this guiding light can continue to burn depends on whether China\u0026rsquo;s AI players can win a more challenging war concerning stability, ecosystem, and business models beyond technology.\nOn this road filled with both opportunities and challenges, Kimi has gained the upper hand, but Alibaba\u0026rsquo;s relentless investment and Zhiyuan\u0026rsquo;s relentless pursuit cannot be underestimated. The \u0026ldquo;throne\u0026rdquo; of Chinese AI remains vacant, and the true king will emerge from the brutal \u0026ldquo;Ironman Triathlon\u0026rdquo; of technology, ecosystem, and business.\n","date":"2025-10-02T00:00:00Z","permalink":"/posts/note-30b9da8eb1/","title":"The Rise of Domestic AI Models Amid Anthropic's Ban"},{"content":"GPT-4.1 Launches for All Users As of today, GPT-4.1 is available for direct use in ChatGPT for all users, whether they are paying or not.\nOfficially, GPT-4.1 is designed for coding tasks and instruction execution, boasting high reasoning efficiency. A user-created chart illustrates its capabilities:\nThis model was initially announced by OpenAI a month ago, intended solely for API use. However, the demand for it in ChatGPT led to its release in this format. 
Michelle Pokrass, the lead on GPT-4.1, stated:\n\u0026ldquo;We initially planned to restrict this model to API access, but you all wanted it in ChatGPT!\u0026rdquo;\nNow, Plus, Pro, and Team users can select GPT-4.1 from the model dropdown. Enterprise and educational users will gain access in the coming weeks. Free users will see the model labeled as \u0026ldquo;GPT-4.1 mini\u0026rdquo; instead of \u0026ldquo;GPT-4o mini.\u0026rdquo;\nPreviously, GPT-4.1 replaced the GPT-4.5 Preview in the API.\nMany users are thrilled to have a faster, better model for AI programming.\nInitial Feedback: Speed is Impressive Using GPT-4.1 inside ChatGPT is more convenient than using the API, and many users have started experimenting with it right away. As of the time of this report, there is no official word on the daily message limits for GPT-4.1 in ChatGPT, but some users have noted it seems to match the limits of GPT-4o.\nIf so, free users can use GPT-4.1 up to 5 times every 24 hours, after which it switches to other models; Plus users can use it up to 80 times every 3 hours.\nFeedback has highlighted several key experiences:\nSpeed: Users report that GPT-4.1 is significantly faster and smoother than other OpenAI models.\nSome users have suggested a way to combine multiple models on a task: using GPT-4.1 to set up a framework and then passing it to another model for review, saving time and boosting efficiency.\nAnother user tweeted:\n\u0026ldquo;It’s very fast and can complete tasks in one go. Here’s an interactive particle simulation it created.\u0026rdquo;\nThis speed is expected, as OpenAI previously positioned GPT-4.1 nano as the \u0026ldquo;fastest and most cost-effective model\u0026rdquo; in their lineup. However, speed is subjective, and users are encouraged to try it themselves and share their experiences.\nClear Instructions Needed: Users note that GPT-4.1 works best with clear instructions, which is consistent with its score of 38.3% in instruction-following benchmarks. 
Less Flattery: Unlike GPT-4o, GPT-4.1 lacks the excessive flattery that some users found off-putting. Need for Coding? GPT-4.1 is a Great Alternative to o3 and o4-mini Exactly one month ago, on April 15, OpenAI released GPT-4.1. It even outperformed GPT-4.5 in a surprising twist. GPT-4.1 comes in three variants:\nGPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. The following chart compares their performance: In terms of performance, all three models offer a context window of 1 million tokens and high cost-effectiveness.\nOfficially, it’s said to be ideal for coding tasks due to its ability to handle longer code files, maintain code coherence, and efficiently debug and fix errors.\nData backs this up: GPT-4.1 scored 54.6% on SWE-bench Verified, outperforming GPT-4o by 21.4 percentage points and GPT-4.5 by 26.6 percentage points. OpenAI has indicated that for everyday coding needs, GPT-4.1 is an excellent substitute for o3 and o4-mini.\nHowever, some users remain skeptical and have questioned OpenAI on social media about whether GPT-4.1 or o4-mini-high is better for coding tasks. The response indicated that GPT-4.1 is suitable for general tasks, while o4-mini-high excels in code editing and creative programming, albeit at a lower cost. How to Differentiate and Choose ChatGPT Models? Across platforms like X and OpenAI\u0026rsquo;s official tweets, a common theme emerges: many users are confused about the available models and their differences. To clarify, we compiled a list of all models available for paid users in ChatGPT and asked ChatGPT to generate a comparison table: Cost comparisons are also included: Finally, here are ChatGPT\u0026rsquo;s own recommendations on how to select models: Another point of confusion is the lack of manual model switching for free users. 
Currently, the default model is GPT-4o, and while OpenAI states that other models like GPT-4.1 mini are available for free users to experience, there’s no clear way to know if they are using them.\nWhen will free users be able to manually switch models? Or at least see which model is generating their results? We hope for improvements in this area!\n","date":"2025-05-15T00:00:00Z","permalink":"/posts/note-55bb7c3c17/","title":"GPT-4.1 Launches for All Users: Fast and Efficient"}]