0

Extract certain variable numbers from attached file title

Hello everybody,

I'm working on an archive of graphics files.

 

I've imported preview graphics as images and extracted the title with this formula:

 

extractx(text(Preview), "(.+)\/(.+)", "i", "$2")

 

Each file is named like this

 

NAME-Nr001-200x400cm-Backlight-2021

 

Each number and text in this title is variable.

Is there a way to extract 001 , 200 and 400 in three different fields?

 

001 in field1 named "Numero intero"

200 in field2 named "Larghezza"

400 in field3 named "Altezza"

 

Each group of numbers can have from 1 to 4 numbers.

In any case, the first group of numbers will always go to field 1, the second to field 2, etc

 

Thanks for any help. Hope there is a solution

18 replies

null
    • Fred
    • 1 yr ago
    • Reported - view

    Seems like you are comfortable with extractx so you would use that to find the correct sequence of numbers. I could never get the hang of regEx so I can’t post the code here.

    Good luck

      • thomas.4
      • 1 yr ago
      • Reported - view

      Fred thank you. Let's say I try and I'm stubborn but this goes too far beyond my fairly limited knowledge 😅

    • Fred
    • 1 yr ago
    • Reported - view

    I guess I'm going to learn a bit of regex. :)

    Try this in a formula field:

    extractx("NAME-Nr001-200x400cm-Backlight-2021", "(\d+).(\d+).(\d+)", "", "$2")
    

    You can replace the first text in quotes with your fieldname.

    The second set of quotes is the regex expression. This code only works if there is one character between the first three number sets, i.e. in your example the second "-" and the first "x" work as would any other character/symbol. But if some names have multi-character then this breaks. Someone smarter than me can make the easy change to allow an unlimited amount of characters between. This regex creates three groups (001,200,400). Which you can use later.

    The last set of quotes "$2", tells Ninox which group to display/use.

    • Fred
    • 1 yr ago
    • Reported - view

    So now i'm on a regex kick. You can modify the regex part of the formula below to say:

    extractx("NAME-Nr001-200x400cm-Backlight-2021", "(\d+)[^0-9]+(\d+)[^0-9]+(\d+)", "", "$2")
    

    And now it will look for any non-digit character of any length between the groups of numbers.

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      Fred good job !!!

      I just add 3 groups to remove the 0 before the significant digit. 
      I also replaced the "[^0-9]+" by "-" and "+" which are signs directly present in the filename structure : (0*)(\d+)-(0*)(\d+)x(0*)(\d+)

      Now the 0 at the beginning of the number is removed. For example, the file number goes from 001 to 1. It can be converted directly by Ninox to create a JSON.

      the group $0 contain the begin of name.
      $1 all zero before significant digit of Numero intero. $2 contain Numero intero.
      $3 all zero before significant digit of Larghezza. $4 contain Larghezza
      $5 all zero before significant digit of Altezza. $6 contain Altezza

      var values := parseJSON(
             extractx("NAME-Nr001-200x400cm-Backlight-2021",
                      "(0*)(\d+)-(0*)(\d+)x(0*)(\d+)",
                      "",
                      "{""Numero intero"":$2, ""Larghezza"" :$4, ""Altezza"" : $6}")
                  );
      
      this.'Numero intero' := values.'Numero intero';
      this.Larghezza := values.Larghezza;
      this.Altezza := values.Altezza;

      This new formula returns directly a JSON variable which contains the three values extracted from the file : " Numero intero ", " Larghezza ", " Altezza ". It is easy to assign them to the corresponding fields in the record.

      If you want to test another regular expression, I invite you to try this web site : https://regex101.com/r/cO8lqs/3

      • Fred
      • 1 yr ago
      • Reported - view

      I stayed away with hard coding the "-" and the "x" as I wanted to make something that was more flexible. I guess dropping the leading 0's are ok. I went the with what was in front of me. I like the use of the JSON. It is easier to read and keep track of.

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      Fred of course, you're right is better to stay away "-" and "x" 👍.

      Also, if you replace [^0-9]+ by ^[1-9]+, it's replace my (0*) and it stay only 3 groups (not 6) : 

      [^1-9]+(\d+)[^1-9]+(\d+)[^1-9]+(\d+)

      var values := parseJSON(
             extractx("NAME-Nr001-200x400cm-Backlight-2021",
                      "[^1-9]+(\d+)[^1-9]+(\d+)[^1-9]+(\d+)",
                      "",
                      "{""Numero intero"":$1, ""Larghezza"" :$2, ""Altezza"" : $3}")
                  );
      

      warning : it work only if each number is not over 0.

    • thomas.4
    • 1 yr ago
    • Reported - view

    I am speechless! You have all been really great and I am immensely grateful to you for making me learn something new and for finding a solution to my problem

    With the following formula by @fred I immediately found myself very well

    extractx("NAME-Nr001-200x400cm-Backlight-2021", "(\d+)[^0-9]+(\d+)[^0-9]+(\d+)", "", "$2")

    But now I have one last "problem" that I hadn't thought of. The measurements in the file can also have millimeters.

    For example 200x400cm could be "200,5x400,4cm". With this formula it shows the number before the "," but not the other one after it

    How do I do in this case? The comma, if present in the title of the file, is inserted only and exclusively for the measures

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      thomas try it : 

      [^1-9]+(\d+,?\d+)[^1-9]+(\d+,?\d+)[^1-9]+(\d+,?\d+)

      • thomas.4
      • 1 yr ago
      • Reported - view

      Jacques TUR that works like a charme! 🤩 But can you please explain me how this formula works? I need to edit it to add the year 2020 in the field "ANNO" and the the quantity (only number without STK) that is in the end of the name. 

      G22-Nr035-100x150cm-Backlight-2020-4STK.jpg

      If you check the name of the file, you can see that here I have the name of the customer that is G22. Problem is that here I have name and letters in the name. Is there a way to tell in the formula that the customer's name is till the first "-" or that it can be alphanumeric? Otherwise, for obvious reasons, everything gets messed up 😅

      This is the formula that I'm using now:

      extractx(Titel, "(\d+)[^1-9]+(\d+,?\d+)[^1-9]+(\d+,?\d+)[^1-9]+(\d+,?\d+)", "", "$3 ")

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      thomasJust add (\d+) at the end of RegExp. But with this new file name example, you have another problem: the file name contains a number and the formula is disturbed. It considers the number that is first in the file name :

      To make RegExp usable for all filenames, you need to define which part of the filename never changes and which part can be different. 

      For this file name, I use RegExp with constant separators: n-n,nxn,n..n :

      This works fine, but if in another file name there is a number between Altezza and Year, it will fail : 

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      thomas I think this RegExp should recover all cases. Try it with different representative file names

      -[^0-9]*(\d+)[^1-9]*(\d+,?\d*)x[^1-9]*(\d+,\d*)cm-[\w]*-(\d+)-?(\w+)?

      • thomas.4
      • 1 yr ago
      • Reported - view

      Jacques TUR I think you are near the final solution 😝 the problem now is that if there is no comma in the measure, it doesn't work. With the previous formula this was working.

      Next problem here is that "backlight" can be changed also in "backlight-nicht-hinterleuchtet-. For example:

       

      FRUTMAC-Nr791-335x500cm-Backlight-nicht-hinterleuchtet-2022-4STK.jpg

       

      With your formula it will mess up after "Backlight" because it is considering the position of  "-". So I think that here we need to tell the formula considering the numbers after "Backlight" as group 4 and group 5. The last number"4" in 4STK will not be always present and I think that if not present it would simply not considered because it's the last number but is here possibile to take only the number and not the "STK"?

       

      I think that we have to mash up your formula with the formula of Fred to find the solution

       

      Here are more filenames to see the variables:

       

      G22-Nr035-100x150cm-Backlight-2020

      HOLLU-Nr357-33,5x500cm-Backlight-nicht-hinterleuchtet-2021

      GRIBA-Nr651-200x300cm-Blockout-2018

      GRAMM-Nr470-200x100cm-Backlight-nicht-hinterleuchtet-2021-4Stk

      HAGELSCHUTZ-Nr1082-150x400cm-Backlight-nicht-hinterleuchtet-mit Schlaufe-2021

       

      Thanks a lot for your help

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      thomas 

      -[^0-9]*(\d+)[^1-9]*(\d+,?\d*)x[^1-9]*(\d+,?\d*)cm-[^1-9]+(\d+)-?(\d+)?

      It works for all files. I invite you to try it with this site https://regex101.com/r/Tyz1yY/2. This example is recorded. You can play and learn how RegExp works to be more autonomous. I think it could help you in the future.

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      thomas  Because the year at the end is each time with 4 digits, and in case the file name contains the number, you could use this RegExp. I modified it so that it returns all parts of the complete file name:

      (.*)-Nr(\d+)-(\d+,?\d*)x(\d+,?\d*)cm-(.*)(\d{4}\b)-?(\d+)?(.*)?

      https://regex101.com/r/ZJcvFo/1

    • Fred
    • 1 yr ago
    • Reported - view
    Jacques TUR said:
    Also, if you replace [^0-9]+ by ^[1-9]+, it's replace my (0*) and it stay only 3 groups (not 6) : 

     I'm not sure why you started with 1 instead of 0.

      • Ninox developper
      • Jacques_TUR
      • 1 yr ago
      • Reported - view

      Fred It to convert string number to JSON number...

      extractx("NOM-Nr001-200x400cm-Backlight-2021", "(\d+)[^0-9]+(\d+)[^0-9]+(\d+)", "", "{""Numero intero"":$1}"), return {"Numero intero":001}.

      parseJSON("{""Numero intero"":001}") return nothing because 001 is not a number.

      While extractx("NOM-Nr001-200x400cm-Backlight-2021", "[^1-9]?+(\d+)[^1-9]+(\d+)[^1-9]+(\d+)", "", "{""Numero intero"":$1}"), return {"Numero intero":1}.

      And parseJSON("{""Numero intero"":1}") return {"Numero intero":1}.

    • thomas.4
    • 1 yr ago
    • Reported - view

    Hi to all,

    I've posted a new question about renaming part of the title using a button.

    If you want take a look, maybe you can help me since you already know the formulas applied previously

    https://forum.ninox.com/t/83hsagj/renaming-part-of-title-attachment

    Regarding regular expressions, can someone recommend me a guide. I'm trying to figure out the formulas you sent me but I'm freaking out

     

    Thank you :)

Content aside

  • 1 yr agoLast active
  • 18Replies
  • 154Views
  • 3 Following