You can convert PDF file data into cloud-based Google Sheets without losing formatting, especially for table data, by using built-in formulas. This method helps preserve the layout and structure when transferring data from a PDF to a spreadsheet. The process involves using a combination of 13 formulas across 9 steps. In the final step, shown in the image (above), the PDF data is pasted on the right side, while the result on the left is displayed using the QUERY function. A sample of PDF data, which is paste in Google Sheet is provided below for demonstration.
Q1:Q4 | Q |
---|---|
1 | 👇 | Paste PDF Data here | 👇 |
2 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 |
3 | 3 2021311149S Alkesh Dindral SENIOR ASSISTANT 68000 37700 105700 50412 55288 300.00 0 0 54988 |
4 | 2 2021110866L Raj Rani Devi Supdt Gd-II 83600 60548 144148 54060 90088 300.00 0 0 89788 |
In Google Sheets, the ideal scenario would be to copy data from a PDF and have it automatically split by default into a neatly formatted table. But in some type of data format not split neatly, as shown below in column D:
If you split data by a custom delimiter, such as a space, the result often appears unsymmetrical, with data values misaligned and not matching the headers in each row, as shown below. This inconsistency typically requires additional manual adjustments to organize the data correctly and ensure that each value aligns with its respective header.
The unsymmetrical formatting is caused by varying name lengths in the first row, where one name might have three words, another four, and another five. This variation in word count shifts the data, causing the rest of the values to misalign across rows, leading to inconsistent columns and data placement. Each row ends up with different lengths, making it difficult to align the values correctly with their headers.
Here, Google Sheets has a solution or some formulas combination for converting PDF data into spreadsheets using combination of 13 formulas. The below formulas bunch can assist you in the conversion process:
Here’s a practice sheet to help you explore and understand the process better. Click the link below to access it. To edit, make a copy and save it for your use.
https://docs.google.com/spreadsheets/d/1XQuuIK7zoMmlKbd7wGxv4KKZ5RAkQ7eefTt9lbB-FJA/edit?gid=0#gid=0
The following formula combinations can be used to manipulate, clean, and structure data transferred from a PDF to a spreadsheet, ensuring the final result is well-organized and meets your desired format:
- IF(logical_expression, value_if_true, value_if_false)
- LEN(text)
- FIND(search_for, text_to_search, [starting_at])
- TRIM(text)
- SPLIT(text, delimiter)
- RIGHT(text, [num_chars])
- QUERY(data, query, [headers])
- LOWER(text)
- IFERROR(value, [value_if_error])
- TEXTJOIN(delimiter, ignore_empty, text1, [text2], …)
- SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])
- REGEXREPLACE(text, regular_expression, replacement)
- REGEXEXTRACT(text, regular_expression)
Step 1, REGEXEXTRACT:
This formula is particularly useful in scenarios where you need to extract and work with only the alphabetic part of a text string within a cell. It ensures that even if no alphabetic characters are present, the spreadsheet will not show an error but rather leave the cell empty or proceed without interruption.
Q1:Q12 | Q | T |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | G |
=REGEXEXTRACT(text, regular_expression)
=IFERROR( REGEXEXTRACT(Q3,”[[:alpha:]]+”))
If you have a text string and you want to extract only the sequences of letters (ignoring numbers and symbols), this pattern “[[:alpha:]]+” will match and capture those sequences. e.g. the text “2021110848G“ from above sheet Q3, the said pattern will match only “G“ as result cell T3 after ignoring the number values.
Other formulas may help you
- =IFERROR(LEFT(REGEXREPLACE(Q3, “[^a-zA-Z]+.*”, “”), LEN(Q3)), “”)
- =IFERROR(TEXTJOIN(“”, TRUE, ARRAYFORMULA(FILTER(MID(Q3, ROW(INDIRECT(“1:” & LEN(Q3))), 1), REGEXMATCH(MID(Q3, ROW(INDIRECT(“1:” & LEN(Q3))), 1), “[a-zA-Z]”)))), “”)
Step 2, REGEXREPLACE:
REGEXREPLACE(text, regular_expression, replacement), replaces parts of a text string that match a regular expression pattern with another string. It’s ideal for making complex replacements based on patterns, such as removing unwanted characters.
This formula REGEXREPLACE(Q3,”[^a-zA-Z]”, “”) is particularly useful in scenarios where you need to removes non-alphabetic characters from a text string within a cell.
Q1:Q12 | Q | U |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | GKajolGobarASSISTANTG |
Here LEN
function match the value in cell T3 and REGEXREPLACE
from below sheet Q3, the [^a-zA-Z] pattern will match only “Alphabetically value“ as result cell U3, GKajolGobarASSISTANT after ignoring the number values.
=IFERROR( IF(LEN(T3)=1,REGEXREPLACE(Q3,"[^a-zA-Z]", ""),""))
Other formulas may help you
- =IFERROR(IF(LEN(T2)=1, TEXTJOIN(“”, TRUE, ARRAYFORMULA(MID(Q3, ROW(INDIRECT(“1:” & LEN(Q3))), 1)*(ISNUMBER(FIND(MID(Q3, ROW(INDIRECT(“1:” & LEN(Q3))), 1), “abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ”)))), “”))
- =IFERROR(IF(LEN(T2)=1, TEXTJOIN(“”, TRUE, FILTER(MID(Q3, SEQUENCE(LEN(Q3)), 1), ISNUMBER(FIND(MID(Q3, SEQUENCE(LEN(Q3)), 1), “abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ”)))), “”))
Step 3, IF, LEN, SPLIT and LOWER:
=IFERROR( IF(LEN(T3)=1,SPLIT(lower(Q3),”qwertyuiopasdfghjklzxcvbnm`-=[]\;’,./!@#$%^&*()”),””))
LOWER
converts all letters in a text string to lowercase. This function is useful for standardizing text, especially when consistency in letter case is required. This formula attempts to filter out alphabetic characters from a string and return only the numeric values, while its extract the alphabetical parts into separate cells as (W3, X3, Y3) or W3:Y3.
Q1:Q12 | Q | V |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | 1 2021110848 |
The LEN
function calculates the number of characters in a text string, including spaces. It’s useful for text processing, such as determining the length of a string for validation or formatting purposes.
SPLIT
divides a text string into separate parts based on a specified delimiter, such as a comma or space. It’s ideal for breaking down data into components, like separating names or addresses.
=IFERROR( IF(LEN(T3)=1,SPLIT(lower(Q3),"qwertyuiopasdfghjklzxcvbnm`-=[]\;',./!@#$%^&*()"),""))
Other formulas may help you
- =IFERROR(TRIM(REGEXEXTRACT(Q3, “([a-zA-Z]+)”)), “”)
- =IFERROR(REGEXREPLACE(Q3, “[a-zA-Z`-=[]\;’,./!@#$%^&*()]”, “”), “”)
Step 4, IF, RIGHT, LEN, FIND and SUBSTITUTE:
FIND
locates the position of a substring within a text string, returning the starting character’s position. It’s case-sensitive and often used to extract or manipulate parts of a string.
Q1:Q12 | Q | AM |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | 2021110848G |
=IF(V3="","",( RIGHT(V3,LEN(V3)-FIND("*",SUBSTITUTE(V3," ","*",LEN(V3)-LEN(SUBSTITUTE(V3," ",""))))))&""&T3)
The RIGHT
function extracts a specified number of characters from the end of a text string. It’s helpful for isolating portions of a string, such as the last few digits of a code or ID.
SUBSTITUTE
replaces occurrences of a specific substring within a text string with another substring. It’s useful for modifying text, such as correcting or updating specific parts of a string.
Step 5, IF, RIGHT and LEN:
The IF
formula checks a condition and returns one value if the condition is true and another if it’s false. It’s commonly used for applying logic to perform different actions based on specific criteria.
Q1:Q12 | Q | AN |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | KajolGobarASSISTANT |
=IF(U3="","", RIGHT(U3, LEN(U3)-1))
Step 6, TRIM and TEXTJOIN:
TRIM
removes all leading, trailing, and excess spaces from a text string, leaving only single spaces between words. It’s essential for cleaning up data that may contain irregular spacing.
Q1:Q12 | Q | AO |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | 70000 38700 108700 51861 56839 300 00 0 0 56539 |
=TRIM( TEXTJOIN(" ",1,W3:AH3))
Step 7, TEXTJOIN:
TEXTJOIN
combines multiple text strings into one, using a specified delimiter to separate them. It’s commonly used to concatenate data from different cells while controlling how they’re joined. Here cells in the range AM3:AO3 are simplifies the process of combining text, especially when working with ranges that may contain empty cells.
Q1:Q12 | Q | AQ |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | 2021110848G KajolGobarASSISTANT 70000 38700 108700 51861 56839 300 00 0 0 56539 |
The formula would output "2021110848G KajolGobarASSISTANT 70000 38700 108700 51861 56839 300 00 0 0 56539"
in the cell where the formula is placed.
=TEXTJOIN(" ",1,AM3:AO3)
Step 8, IFERROR and SPLIT:
IFERROR
returns a specified value if a formula results in an error, otherwise it returns the result of the formula. It’s used to handle errors gracefully, providing a fallback value instead of an error message.
This formula is useful when you need to split a range of text data into multiple columns based on spaces, making it easier to work with individual words or elements from each cell. Its take the text from each cell in the range AQ3:AQ and split it across columns wherever there is a space.
E1:P3 | E | F | G | H | I | J | K | L | M | N | O | P |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | emp-id | Name and Rank | Basic Pay | Alwn. | Gross | Deduct | Sub Net | NG Rec | O. Gpf | O. Gis | e- Salary | |
2 | FORMULA | |||||||||||
3 | 2021110848G | KajolGobarASSISTANT | 70000 | 38700 | 108700 | 51861 | 56839 | 300 | 0 | 0 | 0 | 56539 |
Note:
- If some cells in the range AQ3:AQ are empty or do not contain spaces, the
IFERROR
function will prevent errors from being displayed and instead show empty cells in the output. - Ensure that the range where you’re placing this formula has enough columns to accommodate the split data. If the text in AQ3:AQ has more words than available columns, it might overwrite existing data.
=ArrayFormula(IFERROR( SPLIT('Convert PDF to Sheet'!AQ3:AQ," ",0,0)))
Step 9, QUERY:
QUERY
enables advanced data manipulation and retrieval using SQL-like syntax. It’s powerful for filtering, sorting, and summarizing data within Google Sheets, allowing for complex operations on datasets.
The final output process involves converting the data from the PDF into Google Sheets, where the extracted content is then organized, cleaned, and structured within the spreadsheet for further analysis or use and we find the results shown in image as below:
A2:M3 | A | B | C | D | E | F | G | H | IJ | K | L | M | N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Sr.no. | emp-id | Name and Rank | Basic Pay | Alwn. | Gross | Deduct | Sub Net | NG Rec | – | O. Gpf | O. Gis | e- Salary |
3 | 1 | 2021110848G | KajolGobarASSISTANT | 70000 | 38700 | 108700 | 51861 | 56839 | 300 | 0 | 0 | 0 | 56539 |
=QUERY({'Converted Data Sheet'!E3:Q},"Select * where Col3 is not null",0)
It’s not always necessary to use formulas for converting PDF data into Google Sheets. In some cases, you can simply copy the data from the PDF and paste it into Google Sheets, and the formatting will remain intact. This method works well when the PDF has a straightforward structure that aligns easily with spreadsheet formatting.
However, some PDFs have complex formatting that doesn’t translate well when copied directly into Google Sheets. In such cases, copy-pasting the data might not give you the desired result, as the formatting could be altered or lost in the process.
For PDFs with more intricate layouts or when precise data extraction is crucial, specialized tools available online can be very helpful. These tools are designed to handle complex tables and ensure that the data is accurately extracted from the PDF. Once the data is extracted, you can export it and paste it into Google Sheets.
Other types of way or steps to convert the PDF file data into the Google sheets:
These methods should help you convert PDF data into spreadsheet data efficiently, depending on your specific needs and the tools at your disposal.
- Using Google Docs (for Simple Data), Simple tables with few rows and columns.
- Using Data Extraction Tools (for Complex Data), Tabula, Camelot, or Adobe Acrobat Pro etc. offer more advanced data extraction. Complex tables, multiple pages, or when accuracy is crucial.
- Using OCR (Optical Character Recognition) Tools, PDFs with scanned images of tables or text. Google Drive’s built-in OCR, Adobe Acrobat Pro, or online OCR services.
- Direct PDF to Excel Conversion (Online Tools), Quick conversions with minimal manual effort by Smallpdf, PDF2Go, Soda PDF, etc.
- Manual Entry (for Highly Customized or Complex Data), PDFs with non-standard layouts or where tools can’t accurately capture the data.
Did you like this article?
If you found this post helpful, feel free to share it with your network using the social media buttons on X or via email located on the left below.
YOU MUST KNOW?
- Google Sheets: A Cloud-Based Spreadsheet Solution
- IMPORTRANGE Function in Google Sheets, Benefits and Drawbacks
For any questions or assistance regarding function or formulas to convert PDF data in Google Sheets, feel free to reach out in the comments. The CountLen team is dedicated to providing prompt and effective solutions, ensuring that all your queries are addressed swiftly. If you come across any inaccuracies or misleading information, don’t hesitate to share your feedback in the comments. We’re here to help you make the most of your data!