Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd., a DeepSeek affiliate, today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses already-downloaded content to predict the quality of links that have not yet been crawled, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
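To make the idea of quality-based link prioritization concrete, here is a minimal sketch in Python of a generic priority-queue crawler. It is not the method described in the patent, whose details are not given here. The in-memory `PAGES` table, the word-count `page_quality` heuristic, and the `crawl` function are all hypothetical, chosen only for illustration: each undiscovered link inherits the quality score of the page that linked to it, and a download budget caps the number of requests.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a": ("long informative article about crawling " * 5, ["b", "c"]),
    "b": ("spam", ["d"]),
    "c": ("detailed technical reference on link discovery " * 5, ["e"]),
    "d": ("spam spam", []),
    "e": ("useful notes on metadata " * 5, []),
}


def page_quality(text):
    """Toy quality score in [0, 1] based only on word count (an assumption)."""
    return min(len(text.split()), 20) / 20


def crawl(seed, budget):
    """Download up to `budget` pages, always taking the best-scored link first."""
    frontier = [(-1.0, seed)]  # min-heap, so scores are stored negated
    seen = {seed}              # avoids queuing the same URL twice
    downloaded = []
    while frontier and len(downloaded) < budget:
        _, url = heapq.heappop(frontier)
        text, links = PAGES[url]
        downloaded.append(url)
        # Predict the quality of undiscovered links from the page just downloaded.
        predicted = page_quality(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-predicted, link))
    return downloaded


print(crawl("a", budget=4))  # ['a', 'b', 'c', 'e']
```

With a budget of four downloads, the link found on the low-quality page `b` is never fetched, while the link found later on the high-quality page `c` is.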