New Ideation for Domino / Sherlock extension
Created: 03 Apr 2023, 10:10 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
- ControlNet for data augmentation: https://github.com/lllyasviel/ControlNet. Need to work on the prompt to make sure the object stays in the same position, as well as further cropping and rearranging.
- SOTA zero-shot object detection as of now: https://blog.roboflow.com/grounding-dino-zero-shot-object-detection/
- Outpainting for zooming in and out with the same or different background images: https://getimg.ai/tools/outpainting
- Target journal: TPAMI, basically the top-1 ML journal
- Flamingo, potentially better than CLIP in this process: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model. There is OpenFlamingo as well.
Monday, 3 April 2023, 10:10 am
Shen thought of something in terms of technical detail, not just practicality (it must be AI, and must be practically useful)
- System that will:
- During data processing / labelling stage - have a zero-shot OD tool based on LLMs + zero-shot models to label / detect images without knowing the classes. Use CLIP to get contextual info from the image, detect the object automatically, then label it immediately, especially in rare cases which humans might miss. (Near-real-time use during inference is not very mature yet, but for preprocessing it is still possible.) See the sketches after this list.
- During training - StyleGAN / use prompts to change the images, as data augmentation to get more samples for the rare cases / those that are not doing well, by adding more samples to the database. Select the samples not doing well during training, cluster them, then for each cluster use a text prompt to generate images: zoom in and out, crop and change, generate new images from the list of underperforming images (see the clustering sketch after this list). The assumption is that training will improve - a data approach is effective and easier to do than a model approach for the current stakeholders. Use text to regulate how to modify the original image. There are recent developments that might be better than CLIP.
- During evaluation - same as / similar to Domino's current state, but add flavours like a spatio-temporal knowledge graph
- During deployment - real time inference, what can be added?
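A minimal sketch of the zero-shot auto-labelling step, assuming the Hugging Face transformers zero-shot-object-detection pipeline with OWL-ViT (Grounding DINO from the link above would be an alternative); the image path and candidate labels here are placeholders:

```python
from transformers import pipeline
from PIL import Image

# Zero-shot object detection: no training on these classes is needed,
# the candidate labels are free-form text.
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

image = Image.open("frame.jpg")  # placeholder path
candidate_labels = ["forklift", "safety helmet", "pallet"]  # hypothetical classes
for det in detector(image, candidate_labels=candidate_labels):
    # each det is {"score": float, "label": str, "box": {xmin, ymin, xmax, ymax}}
    print(det["label"], round(det["score"], 3), det["box"])
```

And a sketch of the clustering step from the training bullet, assuming the underperforming images have already been embedded (random vectors stand in for real CLIP features here):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 512))  # placeholder for CLIP features

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(embeddings)
for c in range(8):
    members = np.flatnonzero(kmeans.labels_ == c)
    # one text-to-image / outpainting prompt per cluster would be derived here
    print(f"cluster {c}: {len(members)} underperforming images to augment")
```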
- NLP, LLM, SOTA models, image-to-text, text-to-image, StyleGAN-ish approaches to augment images
- Research aspect - how to tune to increase accuracy, how small pieces can come together to become an effective framework
- It is more of a systems paper than a methodological piece of research
- If solidified, it might become great because it is quite new
- From her motivation ⇒ file an ID on it. If it can be developed, it could be a nice ML paper published in high-rank journals (higher chance with a journal because there are many pieces).
- Before the end of the year - the intention is a journal, so no hard time constraint
- If pushing for contivation - can use this scope, but it is based on Shen's current knowledge
- This is something that aligns well with the team scope
- It is engineering, not lightweight AI - unless we add something on the post-deployment side, if something innovative can be added there to give more detail
If pushing this towards EOY, maybe we can:
- Wrap it as an extension of Domino / further work
- Add it into MS4; there are 3 months in MS4
- Explore all these things
- Wait till 2024 and see if it can be parked into another project - contivation
- Or do this when we have time, since it is pushing for a publication in B3
Aligning well with both our interests
- Shen is more on modelling side
- Mine is more on lightweight AI side
Try to make it something that we both find interesting
Ideation
- From Shen's part of the story, she has something already; now I need to see if I can add anything to the story
For me:
- Think about lightweight AI in evaluation, and how to make this go into lightweight AI deployment
- No need to think about LLMs, since that aspect is already covered in the previous steps
- Should be practically useful
Am I enthusiastic about writing my own Domino paper for a local conference?
- No, go for more worthwhile venues; in Singapore they are mostly sales conferences, without much value
We should start this soon, so that we can propose it
Better to plan on getting it done by this Thursday
The earlier the ID starts the better, since there are 4 months
- Has this idea been presented? (for contivation, need to put it there)
- All black boxes, which might affect things
When deployed on edge devices, how to be data-centric?
How to look at it from the data side of things?
https://cms.tinyml.org/wp-content/uploads/summit2022/Situnayake-Daniel.pdf



Ideas:
- Training-aware quantization / compression / pruning
- Instead of offline quantization at the end, think of a way to build quantization awareness into training, so that the model performs better once quantized later (see the sketch below)
- So that deployment will already have some knowledge baked in
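A minimal eager-mode quantization-aware training sketch using PyTorch's torch.ao.quantization workflow; TinyNet and the random-data loop are placeholders for the real model and training loop:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where tensors convert to int8
        self.fc1 = nn.Linear(64, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 10)
        self.dequant = DeQuantStub()  # back to float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")  # x86; "qnnpack" for ARM edge targets
prepare_qat(model, inplace=True)  # inserts fake-quant ops so training "feels" quantization error

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):  # stand-in training loop with random data
    x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
quantized = convert(model)  # real int8 model for deployment
```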
- Low-rank adaptation / matrix factorisation / compression
- How to think of something related to deployment / lightweight AI?
- If it is an add-on, it is still fine, but if it is integrated then it will be better
- Loss function during training incorporating hardware specs / deployment-specific info (a rough sketch below)
- But it is model-centric
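A very rough, hypothetical sketch of that idea: add a differentiable deployment-cost proxy to the task loss. A real method would use measured per-layer latency/energy; an L1 weight-magnitude penalty stands in for that cost here:

```python
import torch
import torch.nn as nn

# Stand-in for a hardware cost model: penalise total weight magnitude.
def deployment_cost(model: nn.Module) -> torch.Tensor:
    return sum(p.abs().sum() for p in model.parameters()) / 1e6

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))

task_loss = nn.functional.cross_entropy(model(x), y)
lam = 0.01  # trades accuracy against the deployment-cost proxy
loss = task_loss + lam * deployment_cost(model)
loss.backward()
```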
- Streaming data (real-world performance)
- What about streaming data?
- Loop back into the labelling portion to assist the data flywheel?
- On-device finetuning / training? On-device incremental learning?
- Improve the accuracy over time by exploiting fresh information coming from the field, and deal with concept drift
- On-device out-of-distribution detection / concept drift detection ⇒ finetuning
- On-device zero-shot object detection (on classes not previously trained) after OOD detection, then looping back into on-device model finetuning to augment the model for increased performance. Or it could also loop back into the very first labelling / model training step to assist the data flywheel.
Problem with edge cases that are not even in the validation dataset
- As much as you want to detect additional classes in the dataset, in the real world you still face edge cases / out-of-distribution cases
- How to help?
- In real-world inference you don't know the GT
The model is fixed and already known to be deployable on the edge device
Human-assisted on-device finetuning?
- During device trials - assist the algorithm with GT labelling on device? Gradually finetune the model based on real data?
- Like reinforcement learning from human feedback (RLHF), but here it is model finetuning from human feedback
- But requires user testing
On-device finetuning
- https://arxiv.org/abs/2211.01163 (Click Through Rate finetuning in ecommerce)


- On-device finetuning mitigates the privacy concern, since no data is collected back into the main training process
- https://www-users.cse.umn.edu/~fengqian/paper/deeptype_ubicomp19.pdf (finetuning a local model based on user input on the local device, where the local model was first obtained from a global pretrained model) https://xumengwei.github.io/

- https://arxiv.org/pdf/2204.11786.pdf Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [Review paper]

- https://arxiv.org/pdf/2007.11622.pdf (TinyTL)
- https://arxiv.org/pdf/2107.14759.pdf (TinyML for Concept Drift)
- “improve the accuracy over time by exploiting fresh information coming from the field, and to deal with concept drift”
- “Passive solutions adapt the model at each incoming data, disregarding the fact that a concept drift has occurred in the data-generating process (or not)”
- Deep learning-based passive solutions examples can be found in [16], [17], [18].
- [16]: H. Li, P. Barnaghi, S. Enshaeifar, and F. Ganz, “Continual learning using Bayesian neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
- [17]: G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54–71, 2019.
- [18]: B. Pérez-Sánchez, O. Fontenla-Romero, and B. Guijarro-Berdiñas, “A review of adaptive online learning for artificial neural networks,” Artificial Intelligence Review, vol. 49, no. 2, pp. 281–299, 2018.
- “active solutions aim at detecting concept drift in the data generation process and, only in that case, they adapt their model to the new conditions”
- Deep learning-based active approaches (integrating deep learning solutions with active adaptive solutions) can be found in [31], [32].
- [31]: S. Disabato and M. Roveri, “Learning convolutional neural networks in presence of concept drift,” in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
- [32]: Z. Yang, S. Al-Dahidi, P. Baraldi, E. Zio, and L. Montelatici, “A novel concept drift detection method for incremental learning in nonstationary environments,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 1, pp. 309–320, 2019.
- https://arxiv.org/pdf/2103.08295.pdf (TinyML with Online Learning - TinyOL)
- https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_Rep-Net_Efficient_On-Device_Learning_via_Feature_Reprogramming_CVPR_2022_paper.pdf (Rep-Net: Efficient On-Device Learning via Feature Reprogramming)
- https://towardsdatascience.com/machine-learning-sensors-truly-data-centric-ai-8f6b9904633a (Machine Learning Sensors)
- https://arxiv.org/abs/2206.15472 MCUNetV3? - on-device training under 256 KB memory
See [Outlier-Detection-vs-Data-Drift-Detection-vs-Co]
https://cleanlab.ai/blog/outlier-detection/
- OOD detection based on feature embeddings (KNN in the embedding space, look at the distances; sketch below)
- OOD detection based on classifier predictions (classification problem - use the predicted class probabilities)
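A minimal sketch of the embedding-based OOD check, assuming the embeddings come from a feature extractor like CLIP (random vectors stand in for real features here):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 512))        # in-distribution features
query_emb = rng.normal(loc=4.0, size=(5, 512))  # deliberately shifted queries

# Calibrate a threshold on the training set itself (k+1 neighbours, drop self).
train_dist, _ = NearestNeighbors(n_neighbors=11).fit(train_emb).kneighbors(train_emb)
threshold = np.quantile(train_dist[:, 1:].mean(axis=1), 0.99)  # flag top 1% as OOD

# Score new samples by mean distance to their 10 nearest training neighbours.
knn = NearestNeighbors(n_neighbors=10).fit(train_emb)
query_dist, _ = knn.kneighbors(query_emb)
is_ood = query_dist.mean(axis=1) > threshold    # True -> likely out-of-distribution
print(is_ood)
```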
Anyway for the ideation, I was thinking along the lines of on-device finetuning.
I've been reading the literature on what people have done there, and found some works that do on-device online finetuning for click-through rate or for next-word prediction.
For our use case, new labels are not as easily obtained in real time as in CTR / word prediction, where user feedback is part of the system. Will it be viable to get human feedback during real-world trials and conduct finetuning then?
But some papers also state that finetuning the full network requires large memory and is difficult to run on the edge, so they propose finetuning certain portions like BN / biases and freezing the rest to be more memory efficient (see the sketch below).
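A minimal PyTorch sketch of that BN-and-bias-only finetuning trick, using a torchvision ResNet-18 as a stand-in model:

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()

# Freeze everything, then re-enable only BatchNorm parameters and biases.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True  # BN scale/shift stay trainable
for name, p in model.named_parameters():
    if name.endswith(".bias"):
        p.requires_grad = True      # biases are cheap to update

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} params")  # a tiny fraction of the model
```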
⇒ after discussion with Shen
- Could still look into feeding the detected problem images back into the first training stage for finetuning, since what we are looking at is still at POC level and we are not directly interacting with customers, for example
- The out-of-distribution detection part may be possible to run on device
On-device tuning
- If going there, it might be close to federated learning (have a pretrained model with local updates, but that might be a bit big as a topic for this ID)
- LoRA (Low-Rank Adaptation of Large Language Models) ⇒ this specific design is super close to / fits well with on-device tuning (freeze the pretrained weights, only train the low-rank portion with the new model on device); see the sketch after this list
- See https://github.com/huggingface/peft for other parameter-efficient finetuning methods
- Only 1 / 2 samples of OOD
- If finetuning batchnorm - since the previously trained data is unavailable, it might deteriorate the results and shift the entire distribution to the new one
- The labelling part will be difficult for on-device finetuning - it requires large models, so this part may not be very useful
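A minimal LoRA-style layer along the lines of that bullet (a sketch, not the official peft implementation): the pretrained weight is frozen and only the small low-rank update trains, so the on-device trainable state stays tiny:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank update: B @ A has shape (out, in) but only rank*(in+out) params.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512), rank=4)
out = layer(torch.randn(2, 512))  # drop-in replacement for the base nn.Linear
```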
Search more, and if nothing applies because of the labelling issue, go back to a certain stage and incorporate LoRA
- If all is OK except for some OOD on device, then do tuning there
- If that cannot work, then go back to offline finetuning
Multiple-image / group-of-images descriptions
- See Distinctive Image Captioning
- Distinctive Image Captioning via CLIP Guided Group Optimization
- GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints
- Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
- Group-based Distinctive Image Captioning with Memory Attention
- Distribute these ideas to other notes