New Ideation for Domino / Sherlock extension
Created: 03 Apr 2023, 10:10 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
- ControlNet for data augmentation: https://github.com/lllyasviel/ControlNet. Need to work on the prompt to make sure the object stays in the same position, as well as further cropping and rearranging.
- SOTA zero-shot object detection as of now: https://blog.roboflow.com/grounding-dino-zero-shot-object-detection/
- Outpainting for zooming in and out with the same or different background images: https://getimg.ai/tools/outpainting
- Target journal: TPAMI, basically the top-1 ML journal
- Flamingo, potentially better than CLIP in this process: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model. There is OpenFlamingo as well.
Monday, 3 April 2023, 10:10 am
Shen thought of something in terms of technical detail, not just practicality (it must be AI, and must be practically useful)
- System that will:
- During data processing / labelling stage - have a zero-shot OD tool based on LLMs + zero-shot models to label / detect images without knowing the classes. Use CLIP to get contextual info from the image, detect the object automatically, then label it immediately, especially in rare cases which humans might miss. (Near-real-time use during inference is not very mature yet, but for preprocessing it is still possible.) See the sketches after this list.
- During training - StyleGAN / use prompts to change the images, as data augmentation to get more samples for the rare cases / those that are not doing well, by adding more samples to the database. Select the samples not doing well during training, cluster them, then for each cluster use a text prompt to generate images: zoom in and out, crop and change, generate new images from the list of underperforming images (see the clustering sketch after this list). The assumption is that training will improve - a data approach is effective and easier to do than a model approach for the current stakeholders. Use text to regulate how to modify the original image. There are recent developments that might be better than CLIP.
- During evaluation - same as / similar to Domino's current state, but add flavours like a spatio-temporal knowledge graph
- During deployment - real time inference, what can be added?
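A minimal sketch of the zero-shot auto-labelling step, assuming the Hugging Face transformers zero-shot-object-detection pipeline with OWL-ViT (Grounding DINO from the link above would be an alternative); the image path and candidate labels here are placeholders:

```python
from transformers import pipeline
from PIL import Image

# Zero-shot object detection: no training on these classes is needed,
# the candidate labels are free-form text.
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

image = Image.open("frame.jpg")  # placeholder path
candidate_labels = ["forklift", "safety helmet", "pallet"]  # hypothetical classes
for det in detector(image, candidate_labels=candidate_labels):
    # each det is {"score": float, "label": str, "box": {xmin, ymin, xmax, ymax}}
    print(det["label"], round(det["score"], 3), det["box"])
```

And a sketch of the clustering step from the training bullet, assuming the underperforming images have already been embedded (random vectors stand in for real CLIP features here):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 512))  # placeholder for CLIP features

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(embeddings)
for c in range(8):
    members = np.flatnonzero(kmeans.labels_ == c)
    # one text-to-image / outpainting prompt per cluster would be derived here
    print(f"cluster {c}: {len(members)} underperforming images to augment")
```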
- NLP, LLM, SOTA models, image-to-text, text-to-image, StyleGAN-ish approaches to augment images
- Research aspect - how to tune to increase accuracy, how small pieces can come together to become an effective framework
- It is more of a systems paper than a methodological piece of research
- If solidified, it might become great because it is quite new
- From her motivation ⇒ file an ID on it. If it can be developed, it could be a nice ML paper published in high-rank journals (higher chance with a journal because there are many pieces).
- Before the end of the year - the intention is a journal, so no hard time constraint
- If pushing for contivation - can use this scope, but it is based on Shen's current knowledge
- This is something that aligns well with the team scope
- It is engineering, not lightweight AI - unless we add something on the post-deployment side, if something innovative can be added there to give more detail
If pushing this towards EOY, maybe we can:
- Wrap it as an extension of Domino / further work
- Add it into MS4; there are 3 months in MS4
- Explore all these things
- Wait till 2024 and see if it can be parked into another project - contivation
- Or do this when we have time, since it is pushing for a publication in B3
Aligning well with both our interests
- Shen is more on modelling side
- Mine is more on lightweight AI side
Try to make it something that we both find interesting
Ideation
- From Shen's part of the story, she has something already; now I need to see if I can add anything to the story
For me:
- Think about lightweight AI in evaluation, and how to make this go into lightweight AI deployment
- No need to think about LLMs, since that aspect is already covered in the previous steps
- Should be practically useful
Am I enthusiastic about writing my own Domino paper for a local conference?
- No, go for more worthwhile venues; in Singapore they are mostly sales conferences, without much value
We should start this soon, so that we can propose it
Better to plan on getting it done by this Thursday
The earlier the ID starts the better, since there are 4 months
- Has this idea been presented? (for contivation, need to put it there)
- All black boxes, which might affect things
When deployed on edge devices, how to be data-centric?
How to look at it from the data side of things?
https://cms.tinyml.org/wp-content/uploads/summit2022/Situnayake-Daniel.pdf



Ideas:
- Training-aware quantization / compression / pruning
- Instead of offline quantization at the end, think of a way to build quantization awareness into training, so that the model performs better once quantized later (see the sketch below)
- So that deployment will already have some knowledge baked in
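A minimal eager-mode quantization-aware training sketch using PyTorch's torch.ao.quantization workflow; TinyNet and the random-data loop are placeholders for the real model and training loop:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where tensors convert to int8
        self.fc1 = nn.Linear(64, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 10)
        self.dequant = DeQuantStub()  # back to float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")  # x86; "qnnpack" for ARM edge targets
prepare_qat(model, inplace=True)  # inserts fake-quant ops so training "feels" quantization error

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):  # stand-in training loop with random data
    x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
quantized = convert(model)  # real int8 model for deployment
```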
- Low-rank adaptation / matrix factorisation / compression
- How to think of something related to deployment / lightweight AI?
- If it is an add-on, it is still fine, but if it is integrated then it will be better
- Loss function during training incorporating hardware specs / deployment-specific info (a rough sketch below)
- But it is model-centric
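A very rough, hypothetical sketch of that idea: add a differentiable deployment-cost proxy to the task loss. A real method would use measured per-layer latency/energy; an L1 weight-magnitude penalty stands in for that cost here:

```python
import torch
import torch.nn as nn

# Stand-in for a hardware cost model: penalise total weight magnitude.
def deployment_cost(model: nn.Module) -> torch.Tensor:
    return sum(p.abs().sum() for p in model.parameters()) / 1e6

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))

task_loss = nn.functional.cross_entropy(model(x), y)
lam = 0.01  # trades accuracy against the deployment-cost proxy
loss = task_loss + lam * deployment_cost(model)
loss.backward()
```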
- Streaming data (real-world performance)
- What about streaming data?
- Loop back into the labelling portion to assist the data flywheel?
- On-device finetuning / training? On-device incremental learning?
- Improve the accuracy over time by exploiting fresh information coming from the field, and deal with concept drift
- On-device out-of-distribution detection / concept drift detection ⇒ finetuning
- On-device zero-shot object detection (on classes not previously trained) after OOD detection, then looping back into on-device model finetuning to augment the model for increased performance. Or it could also loop back into the very first labelling / model training step to assist the data flywheel.
Problem with edge cases that are not even in the validation dataset
- As much as you want to detect additional classes in the dataset, in the real world you still face edge cases / out-of-distribution cases
- How to help?
- In real-world inference you don't know the GT
The model is fixed and already known to be deployable on the edge device
Human-assisted on-device finetuning?
- During device trials - assist the algorithm with GT labelling on device? Gradually finetune the model based on real data?
- Like reinforcement learning from human feedback (RLHF), but here it is model finetuning from human feedback
- But requires user testing
On-device finetuning
- https://arxiv.org/abs/2211.01163 (Click Through Rate finetuning in ecommerce)


- On-device finetuning mitigates the privacy concern, since no data is collected back into the main training process
- https://www-users.cse.umn.edu/~fengqian/paper/deeptype_ubicomp19.pdf (finetuning a local model based on user input on the local device, where the local model was first obtained from a global pretrained model) https://xumengwei.github.io/

- https://arxiv.org/pdf/2204.11786.pdf Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [Review paper]

- https://arxiv.org/pdf/2007.11622.pdf (TinyTL)
- https://arxiv.org/pdf/2107.14759.pdf (TinyML for Concept Drift)
- “improve the accuracy over time by exploiting fresh information coming from the field, and to deal with concept drift”
- “Passive solutions adapt the model at each incoming data, disregarding the fact that a concept drift has occurred in the data-generating process (or not)”
- Deep learning-based passive solutions examples can be found in [16], [17], [18].
- [16]: H. Li, P. Barnaghi, S. Enshaeifar, and F. Ganz, “Continual learning using Bayesian neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
- [17]: G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54–71, 2019.
- [18]: B. Pérez-Sánchez, O. Fontenla-Romero, and B. Guijarro-Berdiñas, “A review of adaptive online learning for artificial neural networks,” Artificial Intelligence Review, vol. 49, no. 2, pp. 281–299, 2018.
- “active solutions aim at detecting concept drift in the data generation process and, only in that case, they adapt their model to the new conditions”
- Deep learning-based active approaches (integrating deep learning solutions with active adaptive solutions) can be found in [31], [32].
- [31]: S. Disabato and M. Roveri, “Learning convolutional neural networks in presence of concept drift,” in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
- [32]: Z. Yang, S. Al-Dahidi, P. Baraldi, E. Zio, and L. Montelatici, “A novel concept drift detection method for incremental learning in nonstationary environments,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 1, pp. 309–320, 2019.
- https://arxiv.org/pdf/2103.08295.pdf (TinyML with Online Learning - TinyOL)
- https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_Rep-Net_Efficient_On-Device_Learning_via_Feature_Reprogramming_CVPR_2022_paper.pdf (Rep-Net: Efficient On-Device Learning via Feature Reprogramming)
- https://towardsdatascience.com/machine-learning-sensors-truly-data-centric-ai-8f6b9904633a (Machine Learning Sensors)
- https://arxiv.org/abs/2206.15472 MCUNetV3? - on-device training under 256 KB memory
See [Outlier-Detection-vs-Data-Drift-Detection-vs-Co]
https://cleanlab.ai/blog/outlier-detection/
- OOD detection based on feature embeddings (KNN in the embedding space, look at the distances; sketch below)
- OOD detection based on classifier predictions (classification problem - use the predicted class probabilities)
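A minimal sketch of the embedding-based OOD check, assuming the embeddings come from a feature extractor like CLIP (random vectors stand in for real features here):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 512))        # in-distribution features
query_emb = rng.normal(loc=4.0, size=(5, 512))  # deliberately shifted queries

# Calibrate a threshold on the training set itself (k+1 neighbours, drop self).
train_dist, _ = NearestNeighbors(n_neighbors=11).fit(train_emb).kneighbors(train_emb)
threshold = np.quantile(train_dist[:, 1:].mean(axis=1), 0.99)  # flag top 1% as OOD

# Score new samples by mean distance to their 10 nearest training neighbours.
knn = NearestNeighbors(n_neighbors=10).fit(train_emb)
query_dist, _ = knn.kneighbors(query_emb)
is_ood = query_dist.mean(axis=1) > threshold    # True -> likely out-of-distribution
print(is_ood)
```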
Anyway for the ideation, I was thinking along the lines of on-device finetuning.
I've been reading the literature on what people have done there, and found some works that do on-device online finetuning for click-through rate or for next-word prediction.
For our use case, new labels are not as easily obtained in real time as in CTR / word prediction, where user feedback is part of the system. Will it be viable to get human feedback during real-world trials and conduct finetuning then?
But some papers also state that finetuning the full network requires large memory and is difficult to run on the edge, so they propose finetuning certain portions like BN / biases and freezing the rest to be more memory efficient (see the sketch below).
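A minimal PyTorch sketch of that BN-and-bias-only finetuning trick, using a torchvision ResNet-18 as a stand-in model:

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()

# Freeze everything, then re-enable only BatchNorm parameters and biases.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True  # BN scale/shift stay trainable
for name, p in model.named_parameters():
    if name.endswith(".bias"):
        p.requires_grad = True      # biases are cheap to update

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} params")  # a tiny fraction of the model
```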
⇒ after discussion with Shen
- Could still look into feeding the detected problem images back into the first training stage for finetuning, since what we are looking at is still at POC level and we are not directly interacting with customers, for example
- The out-of-distribution detection part may be possible to run on device
On-device tuning
- If going there, it might be close to federated learning (have a pretrained model with local updates, but that might be a bit big as a topic for this ID)
- LoRA (Low-Rank Adaptation of Large Language Models) ⇒ this specific design is super close to / fits well with on-device tuning (freeze the pretrained weights, only train the low-rank portion with the new model on device); see the sketch after this list
- See https://github.com/huggingface/peft for other parameter-efficient finetuning methods
- Only 1 / 2 samples of OOD
- If finetuning batchnorm - since the previously trained data is unavailable, it might deteriorate the results and shift the entire distribution to the new one
- The labelling part will be difficult for on-device finetuning - it requires large models, so this part may not be very useful
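A minimal LoRA-style layer along the lines of that bullet (a sketch, not the official peft implementation): the pretrained weight is frozen and only the small low-rank update trains, so the on-device trainable state stays tiny:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank update: B @ A has shape (out, in) but only rank*(in+out) params.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512), rank=4)
out = layer(torch.randn(2, 512))  # drop-in replacement for the base nn.Linear
```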
Search more, and if nothing applies because of the labelling issue, go back to a certain stage and incorporate LoRA
- If all is OK except for some OOD on device, then do tuning there
- If that cannot work, then go back to offline finetuning
Multiple-image / group-of-images descriptions
- See Distinctive Image Captioning
- Distinctive Image Captioning via CLIP Guided Group Optimization
- GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints
- Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
- Group-based Distinctive Image Captioning with Memory Attention
- Distribute these ideas to other notes