Pioneered women's fashion trend analysis models and AI model generation and face-swapping pipelines; transitioned to a remote consulting role in Dec 2024
- Visual Search System Optimization: Built an in-house visual search system to replace the Alibaba Cloud service, improving conversion rate and GMV through object detection and category routing optimization. Phase 1, object detection optimization (live): non-garment F1 0.608→0.780 (+28.3%); UCVR MoM: IN +14.68% (4.77%→5.47%), IQ +14.55% (8.04%→9.21%). Phase 2, system optimization (A/B test): Visual Search GMV +9.54%, UV conversion +7.6%; Find Similar GMV +18.57%, UV conversion +23.77%
- Trend Prediction: Developed a BLIP-2 ITM-based attribute prediction system that expanded women's apparel recognition from 34 to 143 categories while maintaining F1 score (93.52%→93.67%), using GPT-4 for data refinement (neckline F1 +25.5pp, length F1 +12.6pp); derived market trend insights through multi-source analysis of e-commerce platforms and runway collections to inform seasonal design strategies
- AI Fashion Photography: Developed ComfyUI-based face generation and swapping workflows: text-to-image/image-to-image pipelines with FLUX + IP-Adapter integration for high-variety, premium face synthesis, SD post-processing for photorealism enhancement, and IC-Light components for skin tone optimization, with FaceID consistency control to align outputs with reference images/prompts; engineered face-swapping workflows using PuLID for facial feature encoding, with Canny/Pose ControlNet modules integrated to resolve facial orientation challenges
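The bullets above mix relative gains (e.g., non-garment F1 0.608→0.780, +28.3%) with absolute percentage-point gains (e.g., neckline F1 +25.5pp). A minimal Python sketch of the two calculations, applied to the Phase 1 figures quoted above; the helper names are hypothetical.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


def point_change(before_pct: float, after_pct: float) -> float:
    """Absolute change in percentage points (pp) between two percentages."""
    return after_pct - before_pct


# Phase 1 figures from the visual search bullet
print(f"non-garment F1: {relative_change_pct(0.608, 0.780):+.1f}%")  # +28.3%
print(f"UCVR IN: {relative_change_pct(4.77, 5.47):+.2f}%")           # +14.68%
print(f"UCVR IQ: {relative_change_pct(8.04, 9.21):+.2f}%")           # +14.55%
print(f"UCVR IN: {point_change(4.77, 5.47):+.2f}pp")                 # +0.70pp
```

The same UCVR move is +14.68% in relative terms but only +0.70pp in absolute terms, which is why the unit (% vs pp) matters when comparing figures.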
Responsible for improving moderation efficiency and accuracy and for developing multimodal pretraining and MLLM fine-tuning; the role involved English-language communication
- Policy Recommendation based on Multimodal Pretraining: 1. Led end-to-end design and optimization of a multi-platform AI recommender system (videos/URLs/ad creatives), collaborating cross-functionally with product/engineering/data/operations teams and achieving a 9% operational efficiency gain and a 3.8% accuracy boost in content moderation; 2. Developed a million-scale Top-N recommendation model integrating multimodal pre-trained features (visual, textual, audio) with business-specific attributes (advertiser profiles, risk factors, ad industry, vertical model hits), trained with a KL divergence loss for multi-label classification and deployed via TensorRT-optimized FasterTransformer; solved cold-start challenges for policy updates with a contrastive refinement layer applied after multimodal recall; 3. Designed a CLIP-based contrastive learning framework with multi-task objectives (classification + MLM), trained on 10M+ ad/livestream visual-text pairs, with the vision/text encoders then frozen for downstream tasks
- Ads Industry GPT: Developed an 800-category ad understanding system via LLaVA fine-tuning, using Swin-T for visual feature extraction from 2M+ ad creatives. 1. Engineered multimodal prompts with category-specific tokens, integrating Video-LLaMA for video understanding and Vicuna for Chain-of-Thought captioning; implemented hierarchical prediction with multi-round QA and data quality optimization (F1 +6.7%); 2. To address head-category challenges, engineered a two-stage pipeline in which a GPT model selects Top-N candidates from the multimodal predictions (F1 +1.3%); later upgraded the model to Yi-VL, leveraging more accurate CoT and labels generated by GPT-4 (F1 +5% or more)
- Ads/Live Ranking: Designed a dynamic moderation system that reduced labeling costs by 2.5% while boosting recall by 9%, by combining content embeddings with bucketed features selected via feature importance analysis, implementing LOUPE pooling for multimodal representation, and introducing a Mixture-of-Experts (MoE) architecture with risk-aware soft labels for noise-robust modeling
- Multimodal Risk Recall: Built and iterated NLP models, multimodal models, and Shuffle-T backbones for multi-risk prediction
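The policy recommendation bullet mentions a KL divergence loss for multi-label classification. One common formulation, assumed here because the bullet does not give the exact one, normalizes the multi-hot label vector into a target distribution and minimizes KL(target || softmax(logits)). A dependency-free Python sketch; the function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multilabel_kl_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    """KL(target || softmax(logits)), where target is the multi-hot label
    vector normalized to sum to 1 (an assumed formulation)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    target = [y / n_pos for y in labels]
    probs = softmax(logits)
    # Labels with target 0 contribute nothing (convention: 0 * log 0 = 0)
    return sum(t * math.log(t / p) for t, p in zip(target, probs) if t > 0)


# Two of four policies are hit; aligned logits give a lower loss
labels = [1, 0, 1, 0]
print(multilabel_kl_loss([2.0, -2.0, 2.0, -2.0], labels))
print(multilabel_kl_loss([-2.0, 2.0, -2.0, 2.0], labels))
```

Compared with per-label binary cross-entropy, this treats the label set as a single distribution, so probability mass is shared among co-occurring labels; which variant fits better depends on the task.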
Traffic vision algorithm research and development; led several projects, handling on-site communication and feature delivery
- Monocular 3D Object Detection: Pioneered road-aware 3D vehicle detection and road modeling, achieving a 15% improvement in speed estimation; architected an anchor-free detector with geometry constraints (corner pooling + rigid-body priors) and scaled training data via a GTA simulation pipeline (1M+ auto-labeled samples) with a CC-SSL strategy and hybrid training
- Vehicle Abnormal Behavior Detection: Trained RetinaNet with background modeling and dynamic ROI refinement, then characterized vehicle behavior and inter-vehicle correlation through abnormal trajectories and vehicle spacing computed with tracking, 3D, and ML models (F1 +20%)
- Traffic Flow Statistics: Designed a trajectory-based crossing algorithm and an adaptive N-crossing method to handle vehicle turns at intersections; participated in the NVIDIA AI City Challenge 2021 (score: 91.57%)
- Hazardous Chemicals Vehicle Recognition: Used multi-task training, introduced PAM/CAM attention modules to focus on the vehicle head and tail, and applied triplet/center loss with label smoothing to improve classification
- Road Segmentation: Implemented HRNet-OCR to detect the drivable area, reaching a road-class IoU of 97.39% on the Cityscapes dataset
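The traffic flow bullet describes a trajectory-based crossing algorithm. A minimal sketch of the core geometric test, assuming each tracked vehicle's trajectory is a list of (x, y) image points and that a vehicle is counted when any consecutive pair of its points crosses a fixed counting line. The adaptive N-crossing logic for intersection turns is not reproduced, and all names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product (b - a) x (c - a): > 0 if c is left of a->b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Strict crossing test: endpoints lying exactly on a segment do not count."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def crosses_line(track: Sequence[Point], line: Line) -> bool:
    """True if any step of the trajectory crosses the counting line."""
    q1, q2 = line
    return any(segments_cross(track[i], track[i + 1], q1, q2)
               for i in range(len(track) - 1))


def count_crossings(tracks: Dict[int, Sequence[Point]], line: Line) -> int:
    """Number of tracks (keyed by track ID) that cross the line at least once."""
    return sum(crosses_line(t, line) for t in tracks.values())


# Horizontal counting line at y = 5; only track 1 crosses it
line = ((0.0, 5.0), (10.0, 5.0))
tracks = {
    1: [(2.0, 0.0), (3.0, 4.0), (4.0, 8.0)],
    2: [(6.0, 0.0), (6.0, 3.0), (7.0, 4.0)],
}
print(count_crossings(tracks, line))  # 1
```

Testing segment-against-segment rather than checking which side of the line each point lies on keeps vehicles that pass beyond the line's endpoints from being counted.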
Human orientation estimation, working with overseas mentors and interns
- Integrated multiple public datasets (e.g., human pose and behavior recognition) and used camera parameters to calculate orientation angles; built models using an hourglass network for multi-scale human keypoint detection, with orientation computed by a keypoint + MLP network
Object detection for the data platform to enable auto-labeling, supporting the autonomous driving business
- Reproduced Faster R-CNN on COCO and trained it on business data, with model-generated detection boxes then refined manually; built and promoted general-purpose object detection and evaluation tools
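Object detection evaluation tools of the kind mentioned above typically match predictions to ground truth by box intersection-over-union (IoU). A minimal sketch of IoU for axis-aligned boxes in (x1, y1, x2, y2) format, plus a greedy matcher at an assumed 0.5 threshold. These are illustrative helpers, not the tools described in this résumé.

```python
from typing import List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(preds: Sequence[Box], gts: Sequence[Box],
                 thr: float = 0.5) -> List[Tuple[int, int]]:
    """Pair each prediction (in the given order, e.g. sorted by score) with
    the still-unmatched ground-truth box of highest IoU, if that IoU >= thr."""
    matched_gt: Set[int] = set()
    pairs: List[Tuple[int, int]] = []
    for i, p in enumerate(preds):
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            iou = box_iou(p, g)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched_gt.add(best_j)
            pairs.append((i, best_j))
    return pairs


preds = [(0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 12.0, 12.0)]
gts = [(0.0, 0.0, 2.0, 2.2)]
pairs = greedy_match(preds, gts)
tp = len(pairs)
print(pairs, tp / len(preds), tp / len(gts))  # precision and recall
```

With the matched pairs as true positives, precision is TP / len(preds) and recall is TP / len(gts); COCO-style AP repeats this over score-sorted predictions and several IoU thresholds.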