Yeying Xiong | 熊晔颖

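Personal Summary

Conscientious and responsible at work and in school, with strong execution and self-motivation. I believe in lifelong learning without limits, enjoy exploring and solving valuable and interesting problems, and like helping others grow. Seeking roles in multimodal large models, machine learning, and related areas.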

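Education

Beijing University of Posts and Telecommunications, Computer Technology, Master of Engineering 2015.9 – 2018.6

Wuhan University, Software Engineering, Bachelor of Engineering 2011.9 – 2015.6

Wuhan University, Marketing, Minor 2013.3 – 2014.2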

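Work Experience

Urbanic, AI Native Algorithm Engineer 2024.5 – Present

Responsible for R&D of visual search, product-attribute and other multimodal models, AI product selection prediction, and AI model generation and face swapping for cross-border women's fashion e-commerce; worked as a remote part-time consultant from 2024.12 to 2025.04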

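AI Product Selection Prediction (DressGATv2 + LightGBM): Addressed new-product sales prediction for cross-border women's fashion, where new items have no sales history, category SKUs are extremely sparse (7 categories, 143 subcategories), 14d sales follow a long-tail distribution (skew=6.5), and the industry has no mature solution; designed a Star Graph + GATv2 dual-head architecture that borrows sales signals from historically similar items, combined with LightGBM multi-objective fusion (7d regression + 14d ratio + SAB tier classification), and used calibrated prior correction to mitigate bias at the S/B ends; over 6 rounds of iteration (alpha / quantile loss / category priors, etc.), 14d WMAPE went 1.232→1.157 (-6.1%), with 7d WMAPE=1.042, S-tier F1=0.342, B-tier F1=0.469, Dresses 7d WMAPE=0.898
Visual Search System Optimization: Built an in-house visual search system to replace Alibaba Cloud image search, improving search conversion and GMV through GroundingDINOv2 + SegFormer subject recognition and category routing optimization. Phase 1, subject recognition optimization (launched): non-garment F1 0.608→0.780 (+28.3%), UCVR period-over-period IN +14.68% (4.77%→5.47%), IQ +14.55% (8.04%→9.21%); Phase 2, system optimization (AB test): visual search GMV +9.54%, UV conversion +7.6%; Find Similar GMV +18.57%, UV conversion +23.77%
Trend Analysis based on a Multimodal Product Understanding Model: Designed a 16-attribute product understanding model based on BLIP-2 ITM, expanding women's apparel Category recognition from 34 to 143 classes (F1=93.57%), with multi-label classification of key design elements (Design Detail) at F1=84.68%; used GPT-4 for data relabeling (Neckline F1 +25.5%, Length F1 +12.6%); combined the model with GPT-4 for multi-source analysis of multiple women's fashion brands and runway shows to surface market trends, providing references for seasonal best-seller design
AI Model Generation and Face Swapping: Developed ComfyUI-based text-to-image / image-to-image workflows for model generation and face swapping, and established a face tag taxonomy for prompts; the generation pipeline integrates FLUX+IP-Adapter to produce premium-looking, diverse faces, SD1.5 post-processing to enhance photorealism, and IC-Light for skin-tone brightening, with FaceID consistency constraints aligning outputs with reference images/prompts; face swapping for non-professional models uses PuLID for facial feature encoding, integrates a Canny ControlNet module to handle face orientation, and applies local (head) inpainting to improve facial clarity; adapted to styles such as plus-size / sweet / sporty / sexy / cool, substantially reducing model photo shoot costs and earning positive feedback from the in-house photography / buyer / brand operations teams, with PV-CTR consistently about 10% higher than with human models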

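ByteDance | TikTok, Machine Learning Algorithm Engineer 2021.10 – 2024.4

Responsible for pretraining and fine-tuning of multimodal large models for ad content, and for improving labor efficiency and accuracy on the internal labeling platform; role involved English communication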

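Ad Violation Category Recommendation based on Multimodal Pretraining: 1. Designed a Top-N recommendation model (~200 labels) for the labeling platform, trained on million-scale data: a Transformer model for multi-label classification with KL loss, built on pretrained large-model features for images and text plus content features such as video and audio embeddings, aggregated with bucketed features such as advertiser basic info / risk info / industry, first-stage vertical model hits, and target market; 2. Multimodal pretraining: built ten-million-scale visual / fused-text pairs from unsupervised multimodal data such as ads and livestreams, performed contrastive learning based on CLIP + classification + MLM, then fine-tuned on business data and froze the model for downstream recommendation / classification / retrieval tasks; to address cold start and poor long-tail category performance, added contrastive learning for fine ranking on top of the coarse recall from the multimodal classifier; 3. Continuously improved the recommender system: designed key launch metrics, built experiment dashboards, improved interaction, and added event tracking to aid attribution; set up mechanisms for offline/online feature alignment, automatic model iteration, and dynamic label updates; extended the model from video to landing pages and ad creatives across all markets, improving labor efficiency by 9% and labeling accuracy by 3.8%
Ads Industry GPT Classification Model: 1. Designed a LLaVA-based ad industry classification model (800+ labels), fine-tuned on million-scale business data, using Swin-T to extract image/video features and encoding categories into the prompt as special tokens; generated reasons with Video-LLaMA and Vicuna for CoT reasoning, designed multi-level category prediction and multi-turn dialogue using the hierarchical category relations, and cleaned and relabeled data to improve quality, raising overall F1 by 6.7%; 2. Prompt engineering: to improve head-category performance, designed a two-stage Top-N fine-ranking experiment combined with the multimodal classification model (improved prompt), raising F1 on the relevant categories by 1.3%; used GPT-4 binary classification to correct labels and generate more accurate CoT, then fine-tuned Yi-VL, raising F1 on the relevant categories by 5%+
Ads/Live Fine Ranking: To improve risk recall, introduced a fine-ranking stage after the first-stage vertical coarse-recall models; guided by feature importance analysis, fused content embeddings and bucketed features with an MMT model, improved structures such as feature pooling (LOUPE) and MoE, designed risk soft labels to reduce data noise, and improved automatic model iteration, reducing overall labeling volume by 2.5% and raising recall by 9%
Multimodal Risk Coarse Recall: Built text classification models based on TextCNN, BERT, etc. for the relevance between report comments and high-severity risks; iterated multimodal models and Shuffle-T-based image multi-class models for multiple risk categories according to online precision/recall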

Alibaba Damo Academy, Senior Algorithm Engineer 2018.8 – 2021.9

Vision algorithm R&D and deployment; served as the lead owner on some projects

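Monocular 3D Object Detection: 1. Detected 3D bounding boxes of vehicles in surveillance video based on YOLO-v3, with multi-task learning combining angle regression, corner-coordinate regression, and classification; borrowed corner pooling from anchor-free detection methods to constrain corner degrees of freedom, used adjacent-frame motion for enhancement, and, based on the rigid-body property of vehicles, applied the regression relationship between vehicle position and 3D box width and yaw angle across all vehicles for fine-grained correction; 2. For data, generated one million simulated samples with GTA and implemented automated 3D labeling of traffic data, combining the two with a CC-SSL strategy for hybrid training; compared with LiDAR methods this gave lower-cost perception, reaching recall comparable to 2D detection; 3. Combined the bottom face of the 3D box with vehicle physical information to recover the correspondence between real road distance and pixels for road modeling, and used known road markings to support vehicle speed calculation, achieving an improvement of over 15% across all scenarios
Vehicle Abnormal Behavior Detection: Designed a two-stage algorithm. First, combined a RetinaNet trained on traffic datasets with background modeling to extract static candidate targets; on this basis, further characterized vehicle behavior and relations via abnormal trajectories, performed fine-grained detection on distant congestion ROIs, computed inter-vehicle spacing using road modeling, and clustered vehicles with abnormal behavior, reducing error interference from upstream detection and tracking and improving online F1 by over 20%
Traffic Flow Statistics: Designed a trajectory-based line-crossing algorithm, with adaptive N-crossing computation per lane to handle unstable tracking and semi-automated trajectory similarity computation to handle turning at intersections; deployed in highway, urban road, and other scenarios; participated in NVIDIA AI City Challenge 2021 with score=91.57%
Few-Shot Recognition of Special Vehicle Types: Performed few-shot recognition of special vehicle types, introducing attention modules such as PAM and CAM to focus on key discriminative features such as the front, rear, and cargo compartment; used triplet/center loss and label smoothing to increase inter-class distance and reduce intra-class variance; used Multi-Task learning for multi-level category labels
Road Segmentation: Performed semantic segmentation of camera footage based on HRNet-OCR to detect the drivable area for vehicles, currently reaching IoU=97.39% for the road class on the Cityscapes dataset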

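Amazon Lab126, Algorithm Intern 2017.10 – 2018.2

Human Orientation Estimation, in collaboration with overseas mentors and interns

Merged multiple public datasets for human pose, action recognition, etc., and computed angles from camera intrinsics and extrinsics to generate human orientation GT; designed and tested human orientation regression models on several mainstream backbones; improved on the initial approach of classifying orientation directly from images by using an hourglass network for multi-scale human keypoint detection and computing orientation with a keypoint + MLP network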

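Horizon Robotics, Algorithm Intern 2017.4 – 2017.9

Object detection for model-assisted labeling on the data platform, supporting the security and autonomous driving businesses

Reproduced FasterRCNN COCO metrics and trained on business data; achieved fine-grained annotation through manual adjustment of model-generated detection boxes; connected the online model-assisted labeling pipeline for security and autonomous driving data; implemented and promoted a general object detection evaluation tool within the team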

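Project Experience

ClarityX - Meeting Room AI Decision System & Hardware Terminal 2024

SaaS plus meeting-room hardware terminal for real-time speech transcription and AI decision analysis, using multi-turn dialogue, viewpoint extraction, and consensus generation algorithms to help teams move from debate to decision, deeply integrated with knowledge bases and office suites (Feishu, etc.)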

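A-Stock - A-Share Sentiment Quant Assistant 2025

A-share market sentiment quantification and stock-picking assistance platform, integrating multi-dimensional real-time data (limit-up, consecutive limit-up boards, call auction, themes, public sentiment) and supporting short-term decisions through sentiment indices, a risk dashboard, and AI research-report chat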

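Ignite - AI Full-Chain E-Commerce Assistant 2024

Full-chain AI new-product tool for small and medium Taobao merchants, integrating six Skills (selection / listing / pricing / reviews / promotion / sales campaigns) and maximizing GMV through a self-built clustering engine and operations-research pricing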

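cv.run - AI Resume Generator 2025

AI-driven online resume generator: paste any text to have it automatically structured, polished, and turned into a well-designed web resume, with PDF export and link sharing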

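Cyber Oracle - AI Divination App 2026.2

AI tarot / I Ching divination app positioned as entertainment plus psychology education, with a compliance routing mechanism designed to keep it within safe boundaries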

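Taste Oracle - MBTI Interactive Platform 2024

Social platform for user taste assessment and MBTI cross-analysis, using AI-generated personalized commentary to drive frequent interaction and sharing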

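Generation, Representation and Feature Extraction of Flow Field Data, Graduate Thesis 2017.10 – 2018.6

Studied Tecplot-related fluid mechanics material; generated multiple input formats from flow field data (2D mapping, point cloud, voxelization); used unsupervised methods such as clustering, AutoEncoder, and several GAN variants for data augmentation; fine-tuned an FCN to regress per-pixel (per-voxel) pressure and temperature values; used weakly supervised learning to detect flow feature structures (shock waves, vortices)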

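Captcha Character Recognition 2016.10 – 2016.12

Collected captcha images from open sources and script generation (including distortion, noise, etc.); became familiar with image-processing operations such as grayscale conversion and binarization; used opencv closing and black-hat operations for preprocessing; built a character recognition model based on SSD+VGGNet, and extended it with LSTM+CTC to handle variable-length character sequences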

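Technical Skills

Programming Languages: Python > C++ > Java
Algorithms and Frameworks: Familiar with Transformer, CNN and other mainstream models, and with quantization and acceleration
Development Tools: Claude Code, OpenClaw, Cursor, Google AI, GPT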

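Certification

HackPKU, 4th place (2018.5)
Microsoft Innovation Cup, Best Innovation Group National Grand Prize (2014.4)
"Chuang Qing Chun" Hubei Province College Students Entrepreneurship Competition, Mobile Internet Special Track, Gold Award (2014.4)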
IBM University Program Academic Qualification (2013.10)

Personal Summary

Interested in lifelong learning and solving valuable problems, willing to help others grow. Looking for machine learning, AIGC and other related positions.

Education

Beijing University of Posts and Telecommunications, Computer Technology, Master of Engineering 2015.9 – 2018.6

Wuhan University, Software Engineering, Bachelor of Engineering 2011.9 – 2015.6

Wuhan University, Marketing, Minor degree 2013.3 – 2014.2

Work Experience

Urbanic, AI Native Engineer 2024.5 – Present

Pioneered women's fashion visual search, product-attribute multimodal models, AI product selection prediction, and an AI model generation and face swapping pipeline; served in a remote consulting role from Dec 2024 to Apr 2025

New Product Sales Prediction (DressGATv2 + LightGBM): Built an end-to-end S/A/B tier classification system for newly launched SKUs before real sales data is available. Designed DressGATv2 (GATv2Conv, 2-layer, 192d) constructing star-shaped subgraphs per new product to aggregate sales signals from historically similar items while preventing data leakage. Engineered a 1,030-dim feature pipeline combining product attributes, visual embeddings (PCA, 77.9% variance), market trend snapshots, neighbor behavior statistics, and target encoding. Developed a hybrid GAT + LightGBM ensemble with seasonal sample weighting and calibrated dual-gat prior correction for robust SAB classification across 15 apparel categories. Achieved avg SAB F1 ~0.54 (spring) across 12 rolling monthly windows; outperformed human buyer labeling on S-tier F1 by +0.40 for Dresses (Jan 2026); single-item inference ~87ms p50 on GPU
AI Product Selection (DressGATv2 + LightGBM): Tackled the cold-start sales prediction problem for cross-border women's fashion, where new items have zero historical data, SKUs are extremely sparse across 7 categories / 143 subcategories, 14d sales follow a long-tail distribution (skew=6.5), and no established industry solution exists; designed a Star Graph + GATv2 dual-head architecture borrowing sales signals from historically similar items, combined with LightGBM multi-objective fusion (7d regression + 14d ratio + SAB tier classification) and calibrated prior correction to mitigate S/B-end bias; after 6 rounds of iteration (alpha tuning / quantile loss / category priors), final 14d WMAPE 1.232→1.157 (-6.1%), 7d WMAPE=1.042, S-tier F1=0.342, B-tier F1=0.469, Dresses 7d WMAPE=0.898
Visual Search System Optimization: Built in-house visual search system to replace Alibaba Cloud service, improving conversion rate and GMV through object detection and category routing optimization. Phase 1 object detection optimization (live): non-garment F1: 0.608→0.780 (+28.3%), UCVR MoM: IN +14.68% (4.77%→5.47%), IQ +14.55% (8.04%→9.21%); Phase 2 system optimization (AB test): Visual Search GMV +9.54%, UV conversion +7.6%; Find Similar GMV +18.57%, UV conversion +23.77%
Trend Prediction: Developed a BLIP-2 ITM-based attribute prediction system expanding women's apparel recognition from 34 to 143 categories while maintaining F1-score (93.52%→93.67%), leveraging GPT-4 for data refinement (neckline F1 +25.5pp, length F1 +12.6pp); derived market trend insights through multi-source analysis of e-commerce platforms and runway collections, informing seasonal design strategies
AI Fashion Photography: Developed ComfyUI-based text-to-image/image-to-image workflows for face generation and swapping, with FLUX+IP-Adapter integration for high-variety premium face synthesis, SD post-processing for photorealism enhancement, and IC-Light components for skin tone optimization, while implementing FaceID consistency control to align outputs with reference images/prompts; engineered face swapping workflows utilizing PuLID for facial feature encoding and integrated Canny/Pose ControlNet modules to resolve facial orientation challenges

TikTok, Machine Learning Engineer 2021.10 – 2024.4

Responsible for the improvement of moderation efficiency and accuracy, as well as the development of Multimodal pretraining and MLLM finetuning, involving English communication

Policy Recommendation based on Multimodal Pretraining: 1. Led end-to-end design and optimization of a multi-platform AI recommender system (videos/urls/ad creatives), collaborating cross-functionally with product/engineering/data/operations teams, which achieved a 9% operational efficiency gain and a 3.8% accuracy boost in content moderation; 2. Developed a million-scale Top-N recommendation model integrating multimodal pre-trained features (visual, textual, audio) and business-specific attributes (advertiser profiles, risk factors, ad industry, vertical model hits), implemented KL Divergence loss for multi-label classification, and deployed via TensorRT-optimized Faster Transformer; solved cold-start challenges for policy updates through a contrastive refinement layer after multimodal recall; 3. Designed a CLIP-based contrastive learning framework with multi-task objectives (classification + MLM), trained on 10M+ ad/livestream visual-text pairs, with frozen vision/text encoders for downstream tasks
Ads Industry GPT: Developed an 800-category ad understanding system via LLaVA fine-tuning, leveraging Swin-T for visual feature extraction from 2M+ ad creatives. 1. Engineered multimodal prompts with category-specific tokens, integrating Video-LLaMA for video understanding and Vicuna for Chain-of-Thought captioning; implemented hierarchical prediction with multi-round QA and data quality optimization (F1 +6.7%); 2. To address head-category challenges, engineered a two-stage pipeline in which the GPT model selects Top-N candidates from multimodal predictions (F1 +1.3%); later upgraded the model to Yi-VL, leveraging more accurate CoT and labels generated by GPT-4 (F1 +5%+)
Ads/Live Ranking: Designed a dynamic moderation system that reduced labeling costs by 2.5% while boosting recall by 9%; combined content embeddings with bucketing features guided by feature importance analysis, implemented LOUPE pooling for multimodal representation, and introduced a Mixture-of-Experts (MoE) architecture with risk-aware soft labels for noise-robust modeling
Multimodal Risk Recall: Built and iterated NLP models, multimodal models, and Shuffle-T-based image models for predicting multiple risk categories

Alibaba Damo Academy, Senior Algorithm Engineer 2018.8 – 2021.9

Traffic vision algorithm research and development; led some projects, covering on-site communication and feature delivery

Monocular 3D Object Detection: Pioneered road-aware 3D vehicle detection and road modeling, achieving a 15% speed estimation improvement; architected an anchor-free detector with geometry constraints (corner pooling + rigid body priors); scaled training data via a GTA simulation pipeline (1M+ auto-labeled samples) with CC-SSL strategy and hybrid training
Vehicle Abnormal Behavior Detection: Trained RetinaNet with background modeling and dynamic ROI refinement, then characterized vehicle behavior and inter-vehicle relations using abnormal trajectories and vehicle spacing computed from tracking, 3D, and ML models (F1 +20%)
Traffic Flow Statistics: Designed a trajectory-based line-crossing algorithm, applying adaptive N-crossing computation to handle unstable tracking and trajectory similarity to handle turning at intersections; participated in NVIDIA AI City Challenge 2021 with score = 91.57%
Hazardous Chemicals Vehicles Recognition: Used Multi-Task training, introduced PAM/CAM attention modules to focus on the head and tail of the vehicle, and applied triplet/center loss and label smoothing to improve class separation
Road Segmentation: Implemented HRNet-OCR to detect the drivable area for vehicles; IoU for the road class on the Cityscapes dataset reaches 97.39%

Amazon Lab126, Algorithm Intern 2017.10 – 2018.2

Human Orientation Estimation, working with overseas mentors and interns

Integrated multiple public datasets for human pose and behavior recognition, leveraged camera parameters to calculate the orientation angle, and built models in which an hourglass network performs multi-scale human keypoint detection and a keypoint+MLP network computes the orientation

Horizon Robotics, Algorithm Intern 2017.4 – 2017.9

Object detection for model-assisted labeling on the data platform, supporting the autonomous driving business

Reproduced FasterRCNN COCO metrics and trained on business data; manually refined the detection boxes generated by the model for labeling; implemented and promoted a general object detection evaluation tool

Project Experience

ClarityX - Meeting Room AI Decision System & Hardware Terminal 2024

Real-time speech-to-text + AI decision analysis SaaS + meeting room hardware terminal with multi-turn dialogue, viewpoint extraction, and consensus generation algorithms to help teams move from debate to decision, deeply integrated with knowledge base and office suite (Feishu, etc.)

A-Stock - A-Share Sentiment Quant Assistant 2025

A-share market sentiment quantification and stock-picking platform integrating multi-dimensional real-time data (limit-up, consecutive board, auction, themes, news sentiment), assisting short-term decisions via sentiment indices, risk dashboards and AI research chat

Ignite - AI E-Commerce Full-Stack Assistant 2024

Full-chain product launch tool for Taobao SMEs integrating selection, listing, pricing, review analysis, and promotion, powered by a self-built clustering engine and operations-research-driven dynamic pricing

cv.run - AI Resume Generator 2025

AI-powered online resume generator: paste any text to auto-structure, polish language, and generate a beautiful web resume in seconds, with PDF export and link sharing

Cyber Oracle - AI Divination App 2026.2

AI tarot/I Ching divination app positioned as entertainment + psychology education with compliance-aware routing to ensure safe boundary management

Taste Oracle - MBTI Interactive Platform 2024

User taste assessment and MBTI cross-analysis social platform with AI-generated personalized reviews driving high-frequency interaction and sharing

Generation, Representation and Feature Extraction of Flow Field Data, Graduate Thesis 2017.10 – 2018.6

Studied Tecplot-related fluid mechanics material and transformed the flow field data into 2D mapping, point cloud, and voxelized formats; augmented data using AutoEncoder and several GAN variants; fine-tuned an FCN to regress per-pixel (per-voxel) pressure and temperature values, realizing automatic feature extraction of flow field data

Captcha Character Recognition 2016.10 – 2016.12

Collected and preprocessed a variety of captcha images (including distortion, noise, etc.), implemented opencv preprocessing, and used LSTM + CTC to solve variable-length character sequence recognition

Technical Skills

Programming language: Python > C++ > Java
Algorithms and Frameworks: Familiar with Transformer, CNN and quantization
Development Tools: Claude Code, OpenClaw, Cursor, Google AI, GPT
Others: Short video and podcast creation; one video exceeded 100,000 views

Certification

PADI OW and AOW Diver Certificate (2023.6)
HackPKU, 4th place (2018.5)
Microsoft Innovation Cup, Best Innovation Group National Grand Prize (2014.4)
Hubei Province College Students Entrepreneurship Competition, Gold Award (2014.4)
IBM University Program Academic Qualification (2013.10)